Streamline Computing Linux Cluster User Training (Nottingham University)




User Training Agenda
- System Overview
- System Access
- Description of Cluster Environment
- Code Development
- Job Schedulers
- Running Codes

High-Level View
- Gigabit Ethernet networking (128.243.253.XX network)

Clone Clusters
- 1x V20 master node
- 4x V20 compute nodes
- 1x StorEdge 3310 disk array
- 1x 3Com switch
- Cluster names: Nereid, Phobos, Callisto, Europa, Deimos, Titan, Triton and Ganymede

Main Cluster - Jupiter

System Organization

Compute/Login Nodes
Compute node:
- 2x 2.2GHz Opteron
- High memory bandwidth
- Gigabit Ethernet
- 2GB RAM

Software Layers

Software Stack - Example

System Access
- Security
- Logging in
- Shell environment
- Transferring files

System Access - Security
Use only SSH (secure shell) where possible; it provides:
- Secure access
- Data compression
- X tunnelling (for remote graphics)
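The SSH features above can be requested with standard OpenSSH options; as a sketch (the hostname and username below are illustrative, not the site's actual login details):

```shell
# Illustrative login command -- hostname and username are examples only
ssh -C -X username@jupiter.example.ac.uk
# -C enables data compression
# -X enables X11 tunnelling for remote graphics
```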

Login Environment I
- Paths and environment variables have been set up (change things with care)
- BASH, CSH and TCSH are set up by default; more exotic shells may need additional variables for things to work correctly

Login Environment II
- Default shell is bash
- User-modifiable environment variables are set in .bashrc in the home directory
- System-wide variables come from /etc/profile and each of /etc/profile.d/*.sh (job scheduler, SCore etc.)
- Default home directory is usually /home/<username>
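As a sketch of the user-level customisation described above, a user might append lines like these to ~/.bashrc (the variable values here are illustrative examples, not site defaults):

```shell
# Illustrative additions to ~/.bashrc -- values are examples only
export PATH="$HOME/bin:$PATH"   # put personal scripts first on the search path
export EDITOR=vim               # editor used by tools that honour $EDITOR
```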

Compilers
Options are GNU, PGI or Pathscale:
- GNU: g77, gcc
- PGI: pgf77, pgcc, pgCC
- Pathscale: pathf90, pathcc
All are in your path upon login, and all are available from the clones and Jupiter.

Compilers - PGI
- Support AMD 32/64-bit architecture
- Support f77/f90/C/C++ languages
- Examples of production/debug flags:
    pgf77 -fast -Mvect=sse   (optimise for host)
    pgf77 -O0 -g             (compile to debug)

Job Scheduling
The job scheduler's task is to improve throughput and system utilisation for a wide range of users' jobs across multiple systems. To do this it requires the following information:
- System loads
- Available resources
- Resource specifications for users' jobs
- Scheduling policy (site-based)

Job Scheduling
- Job schedulers work predominantly with batch jobs
- Batch jobs require no user input or intervention once started
- Most job schedulers now support load management and scheduling of interactive applications

Sun Grid Engine - Overview
- Sun Grid Engine is a resource management system similar to PBS and LSF
- Can schedule serial and MPI jobs
- Serial jobs run in individual host queues
- Parallel jobs must include a parallel environment request
- Freely available, with a good GUI and documentation

Working with SGE Jobs
There are a number of commands for querying and modifying the status of a job running or queued by SGE:
- qsub (submit a job to SGE)
- qstat (query job status)
- qdel (delete a job)

Submitting a Serial Job
Create a submit script (example.sh):

#!/bin/sh
# Scalar benchmark
echo "This code is running on:"
/bin/hostname
/bin/date

The job is submitted to SGE using the qsub command:

$ qsub example.sh

Submitting a Job - QSUB
qsub arguments on the command line:

$ qsub -o outputfile -j y -cwd ./submit.sh

or embedded in the submit script:

#!/bin/bash
#$ -o outputfile
#$ -j y
#$ -cwd
/home/horace/my_app

Monitoring a Job - QSTAT
To list the status and node properties of all nodes:

$ qstat       (add -f to get a full listing)

Information about a user's own jobs and queues is provided by qstat -u <username>, e.g.:

$ qstat -u fred

Monitoring a Job - QSTAT
In the qstat output, any job running or pending in the queue will have a number (job identifier) and a job status, one of:
- qw (queued and waiting)
- t (job transferring and about to start)
- r (job is running on listed hosts)
- d (job has been marked for deletion)

Monitoring a Job - QSTAT
qstat example:

job-ID prior name       user     state  submit/start at      queue     master  ja-task-ID
------------------------------------------------------------------------------------------
1791   0     myjob0.sh  grahame  dr     03/30/2004 12:49:17  comp05.q  MASTER
1791   0     myjob0.sh  grahame  dr     03/30/2004 12:49:17  comp05.q  SLAVE
1791   0     myjob0.sh  grahame  dr     03/30/2004 12:49:17  comp05.q  SLAVE
1791   0     myjob0.sh  grahame  dr     03/30/2004 12:49:17  comp05.q  SLAVE
1792   0     myjob1.sh  grahame  r      03/30/2004 12:49:17  comp00.q  MASTER
1792   0     myjob1.sh  grahame  r      03/30/2004 12:49:17  comp00.q  SLAVE
1792   0     myjob1.sh  grahame  r      03/30/2004 12:49:17  comp01.q  SLAVE
1792   0     myjob1.sh  grahame  r      03/30/2004 12:49:17  comp01.q  SLAVE
1794   0     myjob3.sh  grahame  qw     03/30/2004 17:10:42
1795   0     myjob4.sh  grahame  qw     03/30/2004 17:10:42

Deleting a Job - QDEL
Individual job:

$ qdel 151
gertrude has registered the job 151 for deletion

List of jobs:

$ qdel 151 152 153

All jobs running under a given username:

$ qdel -u <username>

Output Produced by Jobs Running Under SGE
When a job is queued it is allocated a job number. Once it starts to run, output sent to standard output and standard error is spooled to files called:

<script>.o<jobid>
<script>.e<jobid>

Output Produced by Jobs Running Under SGE
In addition to the <>.o and <>.e files, parallel jobs also produce <>.po and <>.pe files, which contain output produced by the parallel environment's start and stop scripts. If the job fails for any reason, it is the <>.o and <>.e files you should examine to determine why. The <>.o file can often be used to check on the progress of the job.

Debugging Job Failures in SGE
Common reasons for a job to fail are:
- SGE cannot find the binary file specified in the job script
- Required input files are missing from the startup directory
- An environment variable is not set (LM_LICENSE_FILE etc.)
- Hardware failure (e.g. MPI ch_p4 or ch_gm errors)

MPI Codes
- All MPI implementations support F77 and C bindings (some F90/C++ also)
- Bindings act as wrappers, usually mpif77, mpif90 and mpicc; this saves linking in extra libraries manually and specifying MPI header file paths
- The wrappers pass options through to the underlying compiler (GNU, PGI etc.)

MPI Codes - Examples
On the command line (Intel):

mpif77 -O3 -o mympi mycode.f
mpicc -O3 -o mympi mycode.c

Within a (typical) Makefile, set the compiler (F77, F90, FC or CC) and the linker command:

F77 = mpif77
LD  = mpif77

Specify generic compiler options using the FFLAGS (Fortran) or CFLAGS (C) variables.
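Putting those variables together, a minimal Makefile for an MPI Fortran code might look like the sketch below (the target and file names are illustrative, not from the training material):

```makefile
# Illustrative Makefile sketch for an MPI Fortran code -- names are examples
F77    = mpif77
FFLAGS = -O3
LD     = mpif77

mympi: mycode.o
	$(LD) -o mympi mycode.o

mycode.o: mycode.f
	$(F77) $(FFLAGS) -c mycode.f
```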

Using MPISUB to Submit Jobs to SGE
mpisub is a wrapper script developed by Streamline to automatically generate and submit SGE job scripts, given an MPI binary and a number of processors, e.g.:

mpisub nodes=8 <myapp>
mpisub nodes=16x1 <myapp>

Parallel MPI Jobs and SGE
SGE uses the concept of a parallel environment (PE) to execute MPI jobs:
- Each host has an associated queue and resources (CPU, memory)
- A PE is a list of hosts along with a set number of job slots and pre/post execution scripts
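A generic SGE parallel submit script requesting a PE might look like this sketch; note this is a plain-SGE illustration rather than the script mpisub generates, and the PE name "mpich" is site-specific (the configured PEs can be listed with qconf -spl):

```shell
#!/bin/bash
# Illustrative parallel submit script -- the PE name "mpich" is site-specific
#$ -pe mpich 8        # request 8 slots in the "mpich" parallel environment
#$ -cwd               # run from the submission directory
#$ -j y               # merge stderr into stdout
mpirun -np $NSLOTS ./myapp   # NSLOTS is set by SGE to the granted slot count
```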

MPI SGE Job Scripts
- The job script synchronizes the nodes allocated by SGE with the number of processes and the list of machines usually specified to the mpirun command
- mpisub creates your job script
- Within the job script the final line will be of the format (MPICH):

scout -wait -F <host.list> -e scrun -nodes=<Nnodes>x<Nprocs> <application>

Any Questions?