Mathematical Libraries and Application Software on JUROPA and JUQUEEN

Member of the Helmholtz Association
Mathematical Libraries and Application Software on JUROPA and JUQUEEN
JSC Training Course, May 2014, I.Gutheil

Outline
- General Information
- Sequential Libraries
- Parallel Libraries and Application Systems:
  - Threaded Libraries
  - MPI Parallel Libraries
- Application Software
- Further Information

General Information: JUROPA (I)
- Three major compiler versions:
  - Default: Intel 11.1.072 with MKL 10.2.5.035
  - Module: Intel 11.0.074 with MKL 10.1.0
  - Module: Intel 11.1.059 with MKL 10.2.2.025
  - Module: Intel 12.0.[3, 4, or 5] with MKL included
  - Module: Intel 12.1.[0, 1, 2, or 4] with MKL included
  - Module: Intel 13.1.0 or 13.1.3, MKL included
- Run module unload intel and module unload mkl before loading a new intel (and MKL) module
- module switch parastation parastation/mpi2-intel12-5.0.27-1 is necessary for some new libraries

General Information: JUROPA (II)
- Most libraries are compiled with Intel 11.1.059 and MKL 10.2.2.025; new versions are available with 12.0, 12.1, or 13.1 only
- module unload and module load must also be called in batch scripts before execution
- Starting with intel/12.0.3, MKL is included with module load intel/12.*
- For most libraries, versions compiled with intel/12.0.* and/or 12.1.* are available
- The latest libraries are compiled with intel/13.1.3

General Information: JUROPA (III)
- module avail lists the names of the available libraries
- module help name shows how to use a library
- module load name prepends to LD_LIBRARY_PATH, LIBRARY_PATH, and INCLUDE, and sets NAME_ROOT to the correct directory
- The link sequence is important: .o files always come before the libraries, and sometimes double linking is necessary

General Information: JUQUEEN (I)
- All libraries are available as modules in /bgsys/local/name
- module avail lists the names of the available libraries
- module help name tells how to use a library
- module load name sets environment variables for -L$(*_LIB) and -I$(*_INCLUDE) to include in the makefile
- The link sequence is important: .o files always come before the libraries, and sometimes double linking is necessary

General Information: JUQUEEN (II)
- At first, all libraries are compiled with -O3 -qstrict -g -qsimd=noauto
- Additional versions compiled without -g have been added, as well as some versions with SIMD
- See module avail for the available versions

Sequential Libraries and Packages (I)
Vendor-specific libraries
- JUROPA only: MKL, the Intel® Math Kernel Library, in the versions mentioned under General Information
- JUQUEEN only: ESSL (Engineering and Scientific Subroutine Library), version 5.1, in /bgsys/local/lib

Sequential Libraries and Packages (II)
Public domain libraries
- LAPACK (Linear Algebra PACKage)
- ARPACK (Arnoldi PACKage), planned
- GSL (GNU Scientific Library)
- GMP (GNU Multiple Precision Arithmetic Library)

Contents of Intel® MKL 10.*
- BLAS, Sparse BLAS, CBLAS
- LAPACK
- Iterative Sparse Solvers, Trust Region Solver
- Vector Math Library
- Vector Statistical Library
- Fourier Transform Functions
- Trigonometric Transform Functions

Contents of Intel® MKL 10.* (continued)
- GMP routines
- Poisson Library
- Interface for FFTW
- For more information see http://www.fz-juelich.de/ias/jsc/en/expertise/support/Software/SystemDependentLibraries/MKLJuropa.html

Contents of ESSL Version 5.1
- BLAS level 1-3 and additional vector, matrix-vector, and matrix-matrix operations
- Sparse vector and matrix operations
- LAPACK computational routines for linear equation systems and eigensystems
- Banded linear system solvers
- Linear Least Squares
- Fast Fourier Transforms

Contents of ESSL Version 5.1 (II)
- Numerical Quadrature
- Random Number Generation
- Interpolation
- For further information see IBM Engineering and Scientific Subroutine Library for Linux on POWER V5.1: Guide and Reference
- http://www.fz-juelich.de/ias/jsc/en/expertise/support/Software/SystemDependentLibraries/ESSL_ESSLSMP.html (link to the IBM Guide and Reference)

Usage of MKL (I)
- Callable from FORTRAN, C, and C++
- Arrays are FORTRAN-like, i.e. column-first (except CBLAS)
- Three versions: the old 10.1.0 and two variants of 10.2; starting with intel/12.0.3, MKL is included in the intel module
- Compilation and linking of a program name.f calling sequential MKL routines, default version:
    ifort name.f -o name -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
  or
    ifort name.f -o name -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread

Usage of MKL (II)
- Compilation and linking of a program name.f calling sequential MKL routines, starting with intel/12.0.3:
    module unload mkl
    module switch intel intel/12.0.3
    ifort name.f -o name -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread
- MKL is always linked dynamically, so the modules must also be switched before execution
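Because MKL is linked dynamically, the module state at run time has to match the one used at link time. The following is a minimal sketch of a job-script preamble under that assumption. The `run` helper, the `setup_modules` function, and the `DRY_RUN` variable are hypothetical names introduced for illustration; scheduler directives and the launch command are omitted because the slides do not give them.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Select the same compiler/MKL environment that was used when linking "name".
setup_modules() {
    run module unload mkl
    run module switch intel intel/12.0.3
}

# In a real job script: call setup_modules first, then start ./name.
```

Setting `DRY_RUN=1` before calling `setup_modules` only prints the two module commands, which makes it possible to check the preamble on a machine without the module system.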

Usage of MKL (III)
- To use CBLAS, include mkl.h in the source code
- Compilation and linking of a program name.c calling sequential MKL routines:
    [module unload mkl
     module switch intel intel/12.0.3]
    icc name.c -o name -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread [-lifcore -lifport]

Usage of ESSL (I)
- Callable from FORTRAN, C, and C++
- Arrays are FORTRAN-like, i.e. column-first
- Header file essl.h for C and C++
- Installed in /bgsys/local/lib (not as a module)

Usage of ESSL (II)
- Compilation and linking of a program name.f calling ESSL routines:
    mpixlf90_r name.f -L/bgsys/local/lib -lesslbg
- Compilation and linking of a program name.c calling ESSL routines:
    mpixlc_r name.c -I/opt/ibmmath/essl/5.1/include -L/bgsys/local/lib -lesslbg -L/opt/ibmcmp/xlf/bg/14.1/lib64 -lxl -lxlopt -lxlf90_r -lxlfmath -lm -lrt

LAPACK (I)
- Part of MKL on JUROPA: a separate file until intel/11.1.*, in libmkl_core.a starting with intel/12.0.3
- Public domain versions 3.3 and 3.4.2 on JUQUEEN
- Experimental C-LAPACK, liblapacke.a, in version 3.4.2 on JUQUEEN
- Must be used together with ESSL (or ESSLsmp)
- Some routines are already in ESSL. Attention: some calling sequences are different!
- Experimental LAPACK header file available for C usage of LAPACK 3.3 on JUQUEEN (may also be tried with 3.4.2)

LAPACK (II)
Compilation and linking of a FORTRAN program name.f calling LAPACK routines
- JUROPA: see Usage of MKL; -lmkl_lapack only up to intel/11.1.072
- JUQUEEN:
    module load lapack/3.4.2[_g][_simd]
    mpixlf90_r name.f -Wl,-allow-multiple-definition -L/bgsys/local/lib [-lessl[smp]bg] -L$(LAPACK_LIB) -llapack -lessl[smp]bg
- ESSL must be linked after LAPACK to resolve references
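The bracket notation in the JUQUEEN link line can be expanded as in the following sketch, which prints the command for the two ESSL flavours without the optional leading ESSL entry. The function name `lapack_link_cmd` is hypothetical, and the shell form `$LAPACK_LIB` is assumed to correspond to the makefile form `$(LAPACK_LIB)` used on the slide; it is printed literally and not expanded here.

```shell
#!/bin/bash
# Hypothetical helper: print the JUQUEEN link command for name.f.
# $1 selects the ESSL flavour: "essl" (default) or "esslsmp".
lapack_link_cmd() {
    essl_flag="-l${1:-essl}bg"
    # Source file first, then LAPACK, then ESSL last so that references
    # coming from LAPACK can still be resolved.
    echo "mpixlf90_r name.f -Wl,-allow-multiple-definition" \
         "-L/bgsys/local/lib -L\$LAPACK_LIB -llapack $essl_flag"
}

lapack_link_cmd            # sequential ESSL
lapack_link_cmd esslsmp    # threaded ESSLsmp
```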

ARPACK
- ARPACK (ARnoldi PACKage), version 2.1
- Iterative solver for sparse eigenvalue problems
- Reverse communication interface
- FORTRAN 77
- Calls LAPACK and BLAS routines

GSL: GNU Scientific Library
- Versions 1.13, 1.14 (default), and 1.15 with intel/12.0.4 on JUROPA; 1.15 on JUQUEEN
- Provides a wide range of mathematical routines
- Not recommended for performance reasons
- Often used by configure scripts
- JUROPA: module load gsl[/1.13][/1.15]
- JUQUEEN: module load gsl/1.15_O3

NAG Libraries
- NAG Fortran 77: Mark 23 and 24 on JUROPA in /usr/local/lib, Mark 22 on JUQUEEN as module; more than 1600 user-callable routines
- NAG C Mark 23 (JUROPA only) as module; more than 1000 user-callable routines
- fl90 Release 4 (JUROPA only) in /usr/local/lib; 43 new generic user-callable routines

Parallel Libraries: Threaded Parallelism
- MKL (JUROPA)
  - Multi-threaded or at least thread-safe
  - Usage as with sequential routines
  - If OMP_NUM_THREADS is not set, 8 threads are used
  - Always use:
      ifort name.f -o name -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
- ESSLsmp 5.1 (JUQUEEN)
  - Usage: mpixlf90_r name.f -L/bgsys/local/lib -lesslsmpbg
- FFTW 3.3 (Fastest Fourier Transform in the West)
  - On JUQUEEN, threaded and OpenMP versions
  - http://www.fftw.org
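Since threaded MKL falls back to 8 threads when OMP_NUM_THREADS is unset, it can be clearer to set the value explicitly in the job script. A minimal sketch, assuming 8 is the desired fallback as stated on the slide:

```shell
#!/bin/bash
# Keep a value that the user or the job script has already set; otherwise
# fall back to 8, the default the slide gives for threaded MKL on JUROPA.
export OMP_NUM_THREADS="${OMP_NUM_THREADS:-8}"
echo "threaded MKL will use up to ${OMP_NUM_THREADS} threads"
```

Setting the variable before launching the program makes the thread count visible in the job output instead of leaving it to the library default.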

Parallel Libraries: MPI Parallelism
- ScaLAPACK (Scalable Linear Algebra PACKage)
- ELPA (Eigenvalue SoLvers for Petaflop-Applications)
- Elemental, C++ framework for parallel dense linear algebra
- FFTW (Fastest Fourier Transform in the West)
- MUMPS (MUltifrontal Massively Parallel sparse direct Solver)
- ParMETIS (Parallel Graph Partitioning)
- hypre (high performance preconditioners)

MPI Parallelism (II)
- PARPACK (Parallel ARPACK), eigensolver, planned
- SPRNG (Scalable Parallel Random Number Generator)
- SUNDIALS (SUite of Nonlinear and DIfferential/ALgebraic equation Solvers)
Parallel Systems, MPI Parallelism
- PETSc, toolkit for partial differential equations

ScaLAPACK
- JUROPA: part of MKL
- JUQUEEN: ScaLAPACK Release 2.0.2, which already contains BLACS
- FORTRAN; there is also a C interface, but scalapack.h is incomplete
- LAPACK has to be linked, too; $LAPACK_DIR is set together with scalapack
- http://www.netlib.org/scalapack/index.html

Contents of ScaLAPACK
- Parallel BLAS 1-3, PBLAS Version 2
- Dense linear system solvers
- Banded linear system solvers
- Solvers for Linear Least Squares Problems
- Singular value decomposition
- Eigenvalues and eigenvectors of dense symmetric/Hermitian matrices

Usage on JUROPA
- Linking a program name.f calling routines from ScaLAPACK, default version:
    mpif77 name.f -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
- Intel compiler version 12.0.1 and later:
    module unload mkl
    module switch intel intel/12.0.1 (for example)
    mpif77 name.f -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

Usage on JUQUEEN
- Compilation and linking of a program name.f calling ScaLAPACK routines:
    module load scalapack/2.0.2[_g][_simd]
    mpixlf90_r name.f -L$SCALAPACK_LIB -lscalapack -L$LAPACK_LIB -llapack -L/bgsys/local/lib -lessl[smp]bg

ELPA: Eigenvalue SoLvers for Petaflop-Applications
- ELPA uses ScaLAPACK and must be linked together with scalapack
- FORTRAN 95, same data distribution as ScaLAPACK
- http://elpa.rzg.mpg.de/elpa-english?set_language=en
- JUROPA: development version from November 2012, compiled with intel 12.1.4, and version from November 2013, compiled with intel 13.1.3; allows hybrid parallelization
- JUQUEEN: MPI and hybrid version from November 2013

Elemental
- C++ framework, two-dimensional data distribution element by element
- http://libelemental.org/about/
- JUROPA: version 0.81 compiled with intel/13.1.3, pure MPI version and hybrid version
- JUQUEEN: 0.78-p1, MPI-only
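The element-by-element distribution can be illustrated with a small index calculation. This is only a sketch under assumptions: 0-based indices, a process grid of R rows and C columns, and zero alignment, so that element (i, j) lives on the process with grid coordinates (i mod R, j mod C). The `owner` function is a hypothetical helper and not part of the Elemental API.

```shell
#!/bin/bash
# owner I J R C: print the grid coordinates "row col" of the process that
# stores matrix element (I, J) on an R x C process grid (0-based indices).
owner() {
    echo "$(( $1 % $3 )) $(( $2 % $4 ))"
}

# Walk over a 4 x 4 matrix on a 2 x 2 grid and show the owner of each entry.
for i in 0 1 2 3; do
    row=""
    for j in 0 1 2 3; do
        row="${row}($(owner "$i" "$j" 2 2)) "
    done
    echo "$row"
done
```

Neighbouring elements land on different processes, in contrast to a distribution that assigns contiguous blocks of the matrix to each process.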

MUMPS: MUltifrontal Massively Parallel sparse direct Solver
- Solution of linear systems with symmetric positive definite matrices, general symmetric matrices, and general unsymmetric matrices
- Real or complex
- Parallel factorization and solve phase, iterative refinement and backward error analysis
- F90 and MPI
- Versions 4.8.4, 4.9.2, and 4.10.0 on JUROPA, version 4.10.0 on JUQUEEN
- http://graal.ens-lyon.fr/mumps/

ParMETIS
- Parallel Graph Partitioning and Fill-reducing Matrix Ordering
- Developed in the Karypis Lab at the University of Minnesota
- Versions 3.1.1, 3.2.0, and 4.0.2 on JUROPA; 3.2.0 and 4.0.2 on JUQUEEN
- http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
Hypre
- High performance preconditioners
- Versions 2.0.0, 2.6.0b, 2.7.0b, and 2.8.0b on JUROPA; 2.8.0b and 2.9.0b, also a version with bigint, on JUQUEEN
- bigint cannot be used together with ESSL
- http://www.llnl.gov/casc/hypre/software.html

FFTW
- Version 2.1.5 on JUROPA and JUQUEEN; this old version contains an MPI-parallel version of FFTW
- Versions 3.2 and 3.3 on JUROPA, compiled with intel/12.1.1
- Version 3.3 on JUQUEEN
- http://www.fftw.org

PARPACK
- ARPACK version 2.1 and PARPACK MPI version
- Must be linked with LAPACK and BLAS
- Reverse communication interface; the user has to supply the parallel matrix-vector multiplication
- http://www.caam.rice.edu/~kristyn/parpack_home.html

SPRNG
- The Scalable Parallel Random Number Generators Library for ASCI Monte Carlo Computations
- Version 2.0: various random number generators in one library
- Version 1.0: separate library for each random number generator
- http://sprng.cs.fsu.edu/
Sundials (CVODE)
- Package for the solution of ordinary differential equations
- Versions 2.3.0, 2.4.0, and 2.5.0 on JUROPA; 2.5.0 on JUQUEEN
- https://computation.llnl.gov/casc/sundials/main.html

PETSc
- Portable, Extensible Toolkit for Scientific Computation
- Numerical solution of partial differential equations
- Version 3.4.2 on JUROPA and on JUQUEEN
  - basic real and complex versions
  - advanced versions that include other downloaded packages
  - on JUQUEEN also a version with 8-byte integers
- http://www.mcs.anl.gov/petsc/
- Usage at Research Centre Juelich:
    module avail petsc
    module help petsc/[whatever version you want]

Software for Materials Science
Simulation package, available on (columns: JUQUEEN, JUROPA, Workstations)
ADF        yes
Amber      yes
CP2K       yes yes
CPMD       yes yes
Dalton     yes
Gromacs    yes yes
GPAW       planned yes
LAMMPS     yes yes
Molpro     yes yes
NAMD       yes yes
NWChem     yes
Tremolo    yes
TURBOMOLE  yes

Further Information and JSC People
- http://www.fz-juelich.de/ias/jsc/juropa
- http://www.fz-juelich.de/ias/jsc/juqueen
- http://www.fz-juelich.de/ias/jsc/en/expertise/Support/Software/Software_node.html
Mail to
- I.Gutheil, parallel mathematical libraries: i.gutheil@fz-juelich.de
- J.Mextorf, NAG libraries: j.mextorf@fz-juelich.de
- F.Janetzko, software for materials science: f.janetzko@fz-juelich.de
- B.Körfgen, physics and engineering software: b.koerfgen@fz-juelich.de
- Software: sc@fz-juelich.de