Automated Testing of Installed Software




Automated Testing of Installed Software, or, so far: How to validate the MPI stacks of an HPC cluster?
Xavier Besseron, HPC and Computational Science @ FOSDEM 2014, February 1, 2014

Outline
1 Context & Motivations
2 Basic tests & Automation
3 ATIS
4 Main issues with MPI stacks
5 Quick overview / demo
6 Future work

Context: HPC clusters and Software
Large variety of software on HPC clusters (example: HPCBIOS), which is a huge amount of work to install, maintain, update, etc.
Tools to manage software:
- EasyBuild: build, (re-)install
- Modules: switch from one flavor to another
I counted 2211 EasyConfig files in EasyBuild.
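Switching flavors with environment modules looks roughly like this (a sketch; the exact module names are site-specific assumptions):

# See which OpenMPI flavors are installed (output format varies by site)
module avail OpenMPI

# Load one flavor, then swap it for another; the version strings
# here are illustrative, not the actual module names on any cluster
module load OpenMPI/1.6.4
module switch OpenMPI/1.6.4 OpenMPI/1.7.3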

Example: HPC platform of the University of Luxembourg
General statistics:
- 2 clusters, Chaos and Gaia, providing 1115 modules
- 376 different software packages/libraries
- 25 different flavors of zlib
- 15 different flavors of GCC
- 10 different flavors of GROMACS, OpenBLAS, ScaLAPACK
- 9 different flavors of WRF
- ...
An explosion of the number of available software packages.

Let's focus on MPI stacks
On the Gaia cluster at the University of Luxembourg:
- 4 MPI families: OpenMPI, MVAPICH2, MPICH, IntelMPI
- 5 versions of OpenMPI: 1.4.5, 1.6.3, 1.6.4, 1.6.5, 1.7.3
- 3 versions of MVAPICH2: 1.7, 1.8.1, 1.9
- 3 versions of MPICH: 2-1.1, 3.0.3, 3.0.4
- 8 versions of IntelMPI: 3.2.2.006, 4.0.0.028, 4.0.2.003, 4.1.0.027, 4.1.0.030, 4.1.1.036, 4.1.2.040, 4.1.3.045
- over 14 toolchains
In total, 31 different modules provide MPI.
And so what? Some of them do not work out-of-the-box. Why? Let's try to find out. What can we do? Spam/complain to the sysadmins, or fix it!
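Counting those stacks by hand is tedious; on a modules-based system you can get a similar inventory with something like the following (a sketch; module naming conventions differ between sites):

# List every module whose name suggests it provides MPI.
# 'module avail' prints to stderr on many systems, hence the redirect.
module avail 2>&1 | grep -iE 'openmpi|mvapich|mpich|impi'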

How to test an MPI stack?
Check for the binaries:
which mpicc mpirun
Compile and run a small example:
mpicc hello.c -o hello
mpirun -np 2 -machinefile <hostfile> hello
Compile and run micro-benchmarks:
tar -xzf osu-micro-benchmarks-3.9.tar.gz
cd osu-micro-benchmarks-3.9
./configure && make
cd mpi/pt2pt
mpirun -np 2 -machinefile <hostfile> osu_bw
mpirun -np 2 -machinefile <hostfile> osu_latency
Check that the performance is correct. Run HPL? ...
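Put together, a minimal smoke test for one MPI stack could look like the sketch below (the hostfile location and the hello-world source are illustrative assumptions, not the actual ATIS test script):

#!/bin/bash
# Minimal MPI smoke test: a sketch only.
set -e
HOSTFILE=${HOSTFILE:-$HOME/hostfile}   # assumed location

# 1. Check that the binaries are present
which mpicc mpirun

# 2. Write, compile and run a tiny MPI program on 2 processes
cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc hello.c -o hello
mpirun -np 2 -machinefile "$HOSTFILE" ./hello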


How to test many MPI stacks?
Repeat the previous slides multiple times! That is:
- Make a script that tests one MPI stack
- List the MPI stacks you want to test
- Run the script for all of them (see the driver sketch below)
- Collect data from all the tests
- Present the results in a synthetic way
- Repeat all this periodically
This is the ATIS framework (Automated Testing of Installed Software).
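A driver for that loop might look like the following sketch (the module names and the per-stack script name are assumptions, not the actual ATIS code):

#!/bin/bash
# Run one smoke-test script against every MPI module; a sketch only.
source /etc/profile.d/modules.sh   # module init; location varies per site

# Illustrative module names, not the real list on any cluster
MPI_STACKS="OpenMPI/1.6.4 OpenMPI/1.7.3 MVAPICH2/1.9 MPICH/3.0.4"
mkdir -p results

for stack in $MPI_STACKS; do
    module purge
    module load "$stack"
    # test_one_mpi_stack.sh is the hypothetical per-stack script above
    if ./test_one_mpi_stack.sh > "results/${stack//\//_}.log" 2>&1; then
        echo "$stack: OK"
    else
        echo "$stack: FAILED"
    fi
done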

Not reinventing the wheel!
ATIS is based on an existing testing framework: CTest.
CTest:
- Testing tool distributed as a part of CMake
- Automates updating, configuring, building, testing, memory checking and coverage
- Submits results to a CDash or Dart dashboard system
CDash:
- Open source, web-based software testing server
- Aggregates, analyzes and displays the results of software testing
- Nice feature: can spam the sysadmins when tests fail
But also: shell scripts, R, numdiff, cron, ...
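The CTest/CDash combination means a periodic run can be a one-line cron job; for instance, something along these lines (the driver-script name is hypothetical):

# From a configured CMake build tree: run the Experimental dashboard
# (update, configure, build, test) and submit the results to CDash
ctest -D Experimental

# Or drive the whole run from a CTest script, e.g. out of a nightly
# cron job; 'atis_driver.cmake' is an assumed file name
ctest -S atis_driver.cmake -V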

ATIS: Current status
Current focus:
- Only on MPI testing
- Only on the general behavior of MPI
- Only testing a couple of nodes, i.e. not the whole cluster
User-oriented testing:
- Run in the same environment as a user
- Try to mimic what a normal user would do
Source code: about 15 files; 247 lines of CMake/CTest, 212 lines of Bash, 98 lines of R. https://github.com/besserox/atis

Main issues with MPI stacks
Configuration issues:
- specific connector (e.g. oarsh instead of ssh)
- InfiniBand interface
- ...
Dynamic library issues, i.e. LD_LIBRARY_PATH not set properly:
- for the MPI library itself
- for other dependencies (hwloc, CUDA, ...)
Bugs in the MPI stacks:
- bashism in IntelMPI 3.X
- ...
Performance issues: need better tuning?
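The dynamic-library failures are easy to spot once you know to look for them; a quick check along these lines (a sketch, run with the stack's module loaded) flags unresolved dependencies:

# Ask the runtime linker whether every shared library needed by the
# compiled test binary resolves; a 'not found' line usually means
# LD_LIBRARY_PATH is missing an entry
mpicc hello.c -o hello
if ldd ./hello | grep 'not found'; then
    echo 'ERROR: unresolved shared libraries' >&2
fi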

Quick Demonstration / Overview
HPC @ Uni.lu CDash dashboard

Future directions
Test other software/features:
- Checkpoint/Restart of a process using BLCR
- ...
Test features specific to a given MPI stack:
- alternative launcher (e.g. mpirun_rsh for MVAPICH2)
- disabling InfiniBand
- distributed Checkpoint/Restart of an MPI job
More reliable detection of performance issues:
- how to tolerate temporary variations in performance?

Thank you for your attention!
Any feedback, comments, questions? New ideas or features?