HSL and its out-of-core solver




HSL and its out-of-core solver
Jennifer A. Scott (j.a.scott@rl.ac.uk)
Prague, November 2006

Sparse systems
Problem: we wish to solve Ax = b, where A is LARGE.
Informal definition: A is sparse if many entries are zero and it is worthwhile to exploit these zeros.
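To make the informal definition concrete, here is a small illustrative sketch (using SciPy rather than HSL; the matrix is randomly generated) of why exploiting the zeros is worthwhile: a compressed sparse format stores space proportional to the number of nonzeros, not to n^2.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# A 1000 x 1000 matrix with ~0.1% nonzeros: dense storage would need
# n^2 = 10^6 entries, while CSR storage is proportional to nz.
n = 1000
A = sparse_random(n, n, density=0.001, format="csr", random_state=0)

print("stored nonzeros:", A.nnz)       # around 1000
print("dense entry count:", n * n)     # 1000000
```

The same contrast drives the O(n) + O(nz) target complexity mentioned later in the talk.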

Sparse matrices
Many application areas in science, engineering, and finance lead to sparse systems: computational fluid dynamics, chemical engineering, circuit simulation, economic modelling, fluid flow, oceanography, linear programming, structural engineering, ... But all have different patterns and characteristics.

Example sparsity patterns (nz = number of nonzeros):
Circuit simulation (matrix circuit3): nz = 48137
Reservoir modelling: nz = 3474
Economic modelling: nz = 7682
Structural engineering: nz = 428650
Acoustics: nz = 342828
Chemical engineering: nz = 14677
Linear programming: nz = 4841

Direct methods
Direct methods involve explicit factorization, e.g. PAQ = LU, where L and U are lower and upper triangular matrices and P and Q are permutation matrices. The solution process is completed by the triangular solves Ly = Pb and Uz = y, then x = Qz. If A is sparse, it is crucial to try to ensure that L and U are sparse.
Suppose A is n x n with nz nonzeros. Gaussian elimination for a dense problem requires O(n^2) storage and O(n^3) flops, and hence is infeasible for large n. The target complexity for sparse matrix computations is O(n) + O(nz).
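The factor-solve approach can be sketched as follows (an illustrative example using SciPy's SuperLU interface, not HSL; SuperLU's row and column orderings play the role of P and Q):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Small sparse system Ax = b solved via an explicit factorization PAQ = LU.
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)      # ordering, analysis, and numerical factorization
x = lu.solve(b)   # solve: forward and back triangular substitutions

print(np.allclose(A @ x, b))  # True
```

Once the factorization is held, further right-hand sides can be solved cheaply by repeated calls to `lu.solve`, which is why direct solvers separate the factorize and solve phases.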

Direct solvers
Most sparse direct solvers have a number of phases, typically:
ORDER: preorder the matrix to exploit structure
ANALYSE: analyse the matrix structure to produce data structures for the factorization
FACTORIZE: perform the numerical factorization
SOLVE: use the factors to solve one or more systems
Writing an efficient direct solver is non-trivial, so let someone else do it!

Mathematical software libraries
Benefits and advantages of using high-quality mathematical software libraries include:
Shorter application development cycle, cutting time-to-market and gaining competitive advantage
Reduced overall development costs
More time to focus on specialist aspects of applications
Improved application accuracy and robustness
Fully supported and maintained software

HSL
HSL began as the Harwell Subroutine Library in 1963. It is a collection of portable, fully documented and tested Fortran packages, primarily written and developed by the Numerical Analysis Group at RAL. Each package performs a basic numerical task (e.g. solve a linear system, find eigenvalues) and has been designed to be incorporated into programs. Particular strengths: sparse matrix computations, optimization, large-scale system solution.
HSL has an international reputation for reliability and efficiency. It is used by academics and commercial organisations and has been incorporated into a large number of commercial products.

Development of HSL
HSL is both revolutionary and evolutionary.
Revolutionary: some codes are radically different in technique and algorithm design, including
MA18: first sparse direct code (1971)
MA27: first multifrontal code (1982)
Evolutionary: some codes evolve (major algorithm developments, language changes, added functionality, ...), e.g.
MA18 -> MA28 -> MA48 (unsymmetric sparse systems)
MA17 -> MA27 -> MA57 (symmetric sparse systems)

Organisation of HSL
Since 2000, HSL has been divided into the main HSL library and the HSL Archive.
The HSL Archive consists of older packages that have been superseded either by improved HSL packages (e.g. MA28 superseded by MA48, and MA27 by MA57) or by public domain libraries such as LAPACK. The HSL Archive is free to all for non-commercial use, but its use is not supported.
There is a new release of HSL every 2-3 years; currently HSL 2004.
HSL is marketed by HyproTech UK (part of AspenTech): www.hyprotech.com/hsl/

The latest HSL sparse solver
Problem sizes constantly grow larger: 40 years ago, "large" might have meant order 10^2; today, order > 10^7 is not unusual. For direct methods, storage requirements grow more rapidly than problem size.
Possible options:
Iterative method... but what preconditioner?
Combine iterative and direct methods?
Buy a bigger machine... but expensive and inflexible
Use an out-of-core solver
An out-of-core solver holds the matrix factors in files and may also hold the matrix data and some work arrays in files.

Out-of-core solvers
The idea of out-of-core solvers is not new: band and frontal solvers developed in the 1970s and 1980s held the matrix data and factors out-of-core, for example, MA32 in HSL (superseded in the 1990s by MA42). 30 years ago, John Reid developed a Cholesky out-of-core multifrontal code TREESOLV for element applications.
More recent codes include:
BCSEXT-LIB (Boeing)
Oblio (Dobrian and Pothen)
TAUCS (Toledo and students)
Our new out-of-core solver is HSL_MA77.

Key features of HSL_MA77
HSL_MA77 is designed to solve LARGE sparse symmetric systems.
Matrix data, the matrix factor, and the main work space are (optionally) held in files.
The first release is for positive definite problems (Cholesky A = LL^T); the next release will also handle indefinite problems.
The matrix A may be either in assembled form or a sum of element matrices

    A = sum_{k=1}^{m} A^(k)

where A^(k) has nonzeros in a small number of rows and columns and corresponds to the matrix from element k.
Reverse communication interface with input by rows or by elements.
HSL_MA77 implements a multifrontal algorithm.
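The element form A = sum of A^(k) can be sketched as follows (illustrative NumPy code, not HSL; the two elements and their variable lists are made up). Each element matrix is dense on a small set of variables and is added into the global matrix at those rows and columns:

```python
import numpy as np

# Global 4x4 symmetric matrix assembled from two element matrices,
# each nonzero only on the variables listed in vars_.
n = 4
A = np.zeros((n, n))
elements = [
    (np.array([0, 1]), np.array([[2.0, -1.0],
                                 [-1.0, 2.0]])),        # element 1
    (np.array([1, 2, 3]), np.array([[2.0, -1.0, 0.0],
                                    [-1.0, 2.0, -1.0],
                                    [0.0, -1.0, 2.0]])),  # element 2
]
for vars_, Ak in elements:
    A[np.ix_(vars_, vars_)] += Ak   # scatter-add A^(k) into A

print(A)
```

Note that variable 1 appears in both elements, so its diagonal entry is the sum of two element contributions; this overlap is exactly what the multifrontal assembly tree organises.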

Basic multifrontal algorithm
Assume that A is a sum of element matrices. The basic multifrontal algorithm may be described as follows. Given a pivot sequence:

do for each pivot
   assemble all elements that contain the pivot into a dense matrix
   eliminate the pivot and any other variables that are found only here
   treat the reduced matrix as a new generated element
end do

Multifrontal method: the assembly tree
Each leaf node represents an original element. Each non-leaf node represents a set of eliminations and the corresponding generated element.

Multifrontal method
At each non-leaf node, the frontal matrix has the form

    F = ( F_11    F_12 )
        ( F_12^T  F_22 )

Pivots can only be chosen from the F_11 block, since F_22 is NOT fully summed. After the eliminations, F_22 is overwritten by the Schur complement

    F_22 <- F_22 - F_12^T F_11^{-1} F_12
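The eliminations at one node can be sketched in NumPy (an illustrative dense computation, not HSL's kernel; the numbers are made up and F_11 is taken positive definite): factor the fully summed F_11 block and form the generated element as the Schur complement.

```python
import numpy as np

# Frontal matrix [[F11, F12], [F12^T, F22]]; only F11 is fully summed.
F11 = np.array([[4.0, 1.0],
                [1.0, 3.0]])
F12 = np.array([[1.0],
                [2.0]])
F22 = np.array([[5.0]])

# Cholesky of the fully summed block: F11 = L11 L11^T.
L11 = np.linalg.cholesky(F11)
# Off-diagonal block of the factor (the TRSM step): L21 = (L11^{-1} F12)^T.
L21 = np.linalg.solve(L11, F12).T
# Generated element (the GEMM step): the Schur complement
# S = F22 - F12^T F11^{-1} F12 = F22 - L21 L21^T.
S = F22 - L21 @ L21.T

print(S)
```

S is the "generated element" that is passed up the assembly tree and assembled at the parent node.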

Summary of the multifrontal method
Pass elements from children to parent.
At the parent, perform ASSEMBLY into a dense matrix.
Then perform ELIMINATIONS using dense Gaussian elimination (allows Level 3 BLAS: TRSM and GEMM).

Language
HSL is a Fortran library. HSL_MA77 is written in Fortran 95, PLUS we use allocatable structure components and allocatable dummy arguments (part of Fortran 2003, implemented by current compilers).
Advantages of using allocatables:
more efficient than using pointers: a pointer must allow for the array being associated with an array section (e.g. a(i,:)) that is not a contiguous part of its parent
optimization of a loop involving a pointer may be inhibited by the possibility that its target is also accessed in another way in the loop
avoids the memory-leakage dangers of pointers

Language (continued)
Other features of F95 that are important in the design of HSL_MA77:
Automatic and allocatable arrays significantly reduce the complexity of the code and the user interface (especially in the indefinite case)
We selectively use long (64-bit) integers (selected_int_kind(18))
The multifrontal algorithm can be naturally formulated using recursive procedures:

    ...
    call factor (root)
    ...
    recursive subroutine factor (node)
      ! Loop over children of node
      do i = 1, number_children
        call factor (child(i))
      end do
      ! Assemble frontal matrix and partially factorize
      ...
    end subroutine factor

Virtual memory management
Essential to our code design is our virtual memory management system. This was part of the original TREESOLV package. A separate package, HSL_OF01, handles all i/o:
Provides read/write facilities for one or more direct access files through a single in-core buffer (work array)
The aim is to avoid actual input-output operations whenever possible
Each set of data is accessed as a virtual array, i.e. as if it were a very long array
Any contiguous section of the virtual array may be read or written
Each virtual array is associated with a primary file; if too large for a single file, one or more secondary files are used

Virtual memory management
[Diagram: a single buffer serving several virtual arrays, backed by the superfiles main_file (with secondaries main_file1 and main_file2) and temp_file.]
In this example, two superfiles are associated with the buffer. The first superfile has two secondaries; the second has none.

Use of the buffer
The buffer is divided into fixed-length pages, and the most recently accessed pages of the virtual arrays are held in the buffer. For each page in the buffer, we store:
the unit number of its primary file
the page number within the corresponding virtual array
The required page(s) are found using a simple hash function.
We aim to minimise the number of i/o operations by:
first using wanted pages that are already in the buffer
if the buffer is full, freeing the least recently accessed page
only writing a page to file if it has changed since entry into the buffer
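The page-handling policy described above can be sketched like this (illustrative Python, not HSL_OF01; the class name and the dict-as-file backing store are inventions for the sketch): a fixed-capacity buffer of pages with least-recently-used eviction and write-back only for pages that changed.

```python
from collections import OrderedDict

class PageBuffer:
    """Toy in-core buffer over a backing store of fixed-length pages.

    Mimics the policy above: keep the most recently accessed pages,
    evict the least recently used, and write a page back to the "file"
    only if it changed while in the buffer.
    """
    def __init__(self, backing, npages):
        self.backing = backing          # dict: page number -> page data
        self.npages = npages            # buffer capacity in pages
        self.pages = OrderedDict()      # page number -> (data, dirty flag)
        self.writes = 0                 # count of actual write-backs

    def _evict_if_full(self):
        if len(self.pages) > self.npages:
            old, (data, dirty) = self.pages.popitem(last=False)  # LRU page
            if dirty:                   # only changed pages hit the file
                self.backing[old] = data
                self.writes += 1

    def read(self, p):
        if p in self.pages:             # already in buffer: no i/o
            self.pages.move_to_end(p)   # mark as most recently used
            return self.pages[p][0]
        data = self.backing.get(p, 0)   # actual read from the file
        self.pages[p] = (data, False)
        self._evict_if_full()
        return data

    def write(self, p, data):
        self.pages[p] = (data, True)    # dirty: must be written on eviction
        self.pages.move_to_end(p)
        self._evict_if_full()
```

With a 2-page buffer, writing pages 0, 1, 2 in turn forces exactly one write-back (page 0, the least recently used), while clean pages are simply dropped on eviction.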

Advantages
Advantages of this approach for developing sparse solvers:
All i/o is isolated, which assists with code design, development, debugging, and maintenance
The user is shielded from i/o but can control where files are written and can save data for future solves
i/o is not needed if the user has supplied a long buffer
HSL_OF01 can be used in the development of other solvers

Use of HSL_OF01 within HSL_MA77
HSL_MA77 has an integer buffer and a real buffer.
The integer buffer is associated with a file that holds the integer data for the input matrix and the matrix factor.
The real buffer is associated with two files: one holds the real data for the input matrix and the matrix factor; the other is used for the multifrontal stack.
The indefinite case will use two further files (to hold the integer and real data associated with delayed pivots).
The user must supply pathnames and filenames for all the files.
NOTE: We include an option for the files to be replaced by in-core arrays (faster for problems for which the user has enough memory).

Numerical experiments
Test set of 26 problems of order up to 10^6 from a range of applications, all available in the University of Florida Sparse Matrix Collection.
Tests used double precision (64-bit) reals on a single 3.6 GHz Intel Xeon processor of a Dell Precision 670 with 4 Gbytes of RAM; g95 compiler with the -O option, and ATLAS BLAS and LAPACK.
Comparisons with the flagship HSL solver MA57 (Duff). All times are wall clock times in seconds.

Effect of varying npage and lpage (factorization times in seconds)

npage   lpage   af_shell3   crankseg_2   m_t1   shipsec1
3200    2^9     124.1       62.8         43.1   59.9
1600    2^10    116.3       62.5         42.4   58.1
800     2^11    115.5       59.6         39.9   55.1
100     2^14    128.0       66.2         45.0   65.3
50      2^15    154.8       73.7         50.2   74.9
1600    2^11    111.1       58.1         40.6   53.0
800     2^12    110.9       58.4         40.8   53.5
400     2^13    112.7       59.8         41.5   55.6
1600    2^12    107.3       55.7         39.1   49.7
800     2^13    108.2       56.0         39.2   49.9
400     2^14    113.1       57.3         40.7   51.2
in-core         50.9        27.8         19.1   24.5

Times for the different phases of HSL_MA77

Phase             af_shell3       cfd2            fullb           thread
                  (n = 504,855)   (n = 123,440)   (n = 199,187)   (n = 29,736)
Input             2.18            0.42            0.55            0.45
Ordering          2.57            3.54            1.30            0.66
MA77_analyse      2.18            3.71            0.63            0.68
MA77_factor(0)    70.5            29.3            82.0            24.2
MA77_factor(1)    81.5            34.4            88.3            27.0
MA77_solve(1)     15.7            6.23            11.5            3.65
MA77_solve(10)    20.8            7.44            14.2            3.81
MA77_solve(100)   73.1            23.7            45.4            10.3
AFS               91.8            43.5            95.9            29.8
AF S              88.4            42.7            90.8            28.7

Factorization time compared with MA57
[Plot: time relative to the MA77 out-of-core time (vertical axis, roughly 0.25 to 2) for MA57 and MA77 in-core, over the 26 test problems.]

Solve time compared with MA57
[Plot: time relative to the MA77 out-of-core time (vertical axis, roughly 0.03 to 0.1) for MA57 and MA77 in-core, over the 26 test problems.]

Complete solution time compared with MA57
[Plot: time relative to the MA77 out-of-core time (vertical axis, roughly 0.25 to 2) for MA57 and MA77 in-core, over the 26 test problems.]

Concluding remarks
Writing the solver has been (and still is) a major project.
The positive definite code is performing well; out-of-core working adds an overhead, but it is not prohibitive.
The indefinite kernel is currently under development (the need for pivoting adds to the complexity).
A version for complex arithmetic will be developed, and we also plan a version for unsymmetric problems that have an (almost) symmetric structure.
References:
An out-of-core sparse Cholesky solver, J. K. Reid and J. A. Scott, RAL-TR-2006-013
HSL_OF01, a virtual memory system in Fortran, J. K. Reid and J. A. Scott, RAL-TR-2006-026
http://www.numerical.rl.ac.uk/reports/reports.shtml