Streamline Computing Linux Cluster User Training (Nottingham University)
User Training Agenda
  - System Overview
  - System Access
  - Description of Cluster Environment
  - Code Development
  - Job Schedulers
  - Running Codes
High-Level View
  [Diagram: Gigabit Ethernet network, addresses 128.243.253.XX]
Clone Clusters
  - 1 x V20 master node
  - 4 x V20 compute nodes
  - 1 x StorEdge 3310 disk array
  - 1 x 3Com switch
  - Names: nereid, phobos, callisto, europa, deimos, titan, triton and ganymede
Main Cluster - Jupiter
System Organization
Compute/Login Nodes
  Compute node:
  - 2 x 2.2 GHz Opteron
  - High memory bandwidth
  - Gigabit Ethernet
  - 2 GB RAM
Software Layers
Software Stack - Example
System Access
  - Security
  - Logging In
  - Shell Environment
  - Transferring files
System Access - Security
  Use SSH (secure shell) only, where possible. It provides:
  - Secure access
  - Data compression
  - X tunnelling (for remote graphics)
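A minimal sketch of what this looks like in practice, assuming the OpenSSH client. The host name jupiter.example.ac.uk and the user fred are placeholders, not the real login details. -X (X11 forwarding) and -C (compression) are standard OpenSSH options. The script only prints the commands, so it can be run anywhere.

```shell
#!/bin/sh
# Placeholder login host and user name; substitute your own.
CLUSTER_HOST="jupiter.example.ac.uk"
CLUSTER_USER="fred"

# -X tunnels X11 (remote graphics), -C compresses the data stream.
SSH_CMD="ssh -X -C ${CLUSTER_USER}@${CLUSTER_HOST}"

# scp copies files over the same secure channel; input.dat is a
# placeholder file name.
SCP_CMD="scp -C input.dat ${CLUSTER_USER}@${CLUSTER_HOST}:"

# Dry run: show the commands instead of connecting.
echo "Log in with:     ${SSH_CMD}"
echo "Copy a file with: ${SCP_CMD}"
```

Printing the commands rather than running them keeps the sketch independent of any particular host.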
Login Environment I
  - Paths and environment variables have already been set up (change them with care).
  - BASH, CSH and TCSH are set up by default.
  - More exotic shells may need additional variables to work correctly.
Login Environment II
  - The default shell is bash.
  - User-modifiable environment variables are set in .bashrc in your home directory.
  - System-wide variables come from /etc/profile and from each of /etc/profile.d/*.sh (job scheduler, SCore, etc.).
  - The default home directory is usually /home/<username>.
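As an illustration, these are the kind of lines a user might add to .bashrc. This is a sketch only: the directory and licence file locations are hypothetical examples, not site defaults.

```shell
# Lines a user might add to ~/.bashrc (sketch; the paths below are
# hypothetical examples, not site defaults).

# Put a personal bin directory at the front of the search path.
export PATH="$HOME/bin:$PATH"

# Point a licensed application at its licence file (hypothetical location).
export LM_LICENSE_FILE="$HOME/licenses/license.dat"

# Show the result.
echo "PATH starts with: ${PATH%%:*}"
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"
```

Prepending to PATH rather than overwriting it keeps the system-wide settings from /etc/profile intact.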
Compilers
  Options are GNU, PGI or PathScale:
  - GNU: g77, gcc
  - PGI: pgf77, pgcc
  - PathScale: pathf90, pathcc
  All are in your path upon login.
  All are available on the clones and on Jupiter.
Compilers - PGI
  - Supports the AMD32/64 architecture
  - Supports the f77/f90/C/C++ languages
  - Examples of production / debug flags:
      pgf77 -fast -Mvect=sse    (optimise for host)
      pgf77 -O0 -g              (compile to debug)
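The two flag sets can be kept in a small helper so that a build script switches between them by name. This is a sketch: pgi_flags is a hypothetical helper, not part of the PGI tools, and mycode.f is a placeholder source file. The commands are printed, not run, so no compiler is needed to try it.

```shell
#!/bin/sh
# pgi_flags is a hypothetical helper, not part of the PGI tools.
# It prints the flag set from the slide for the requested build mode.
pgi_flags() {
    case "$1" in
        debug) echo "-O0 -g" ;;            # no optimisation, debug symbols
        *)     echo "-fast -Mvect=sse" ;;  # production: optimise for host
    esac
}

SRC="mycode.f"   # placeholder source file

# Dry run: print the compile command for each mode.
for mode in production debug; do
    echo "$mode build: pgf77 $(pgi_flags "$mode") -o mycode $SRC"
done
```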
Job Scheduling
  A job scheduler's task is to improve throughput and system utilisation for a wide range of users' jobs across multiple systems.
  To do this it requires the following information:
  - System loads
  - Available resources
  - Resource specifications for users' jobs
  - Scheduling policy (site based)
Job Scheduling
  - Job schedulers work predominantly with batch jobs.
  - Batch jobs require no user input or intervention once started.
  - Most job schedulers now support load management and scheduling of interactive applications.
Sun Grid Engine - Overview
  - Sun Grid Engine (SGE) is a resource management system similar to PBS and LSF.
  - Can schedule serial and MPI jobs.
  - Serial jobs run in individual host queues.
  - Parallel jobs must include a parallel environment request.
  - Freely available, yet comes with a good GUI and documentation.
Working with SGE jobs
  There are a number of commands for submitting jobs to SGE and for querying and modifying the status of running or queued jobs:
  - qsub (submit a job to SGE)
  - qstat (query job status)
  - qdel (delete a job)
Submitting a serial job
  Create a submit script (example.sh):
    #!/bin/sh
    # Scalar benchmark
    echo "This code is running on"
    /bin/hostname
    /bin/date
  The job is submitted to SGE using the qsub command:
    $ qsub example.sh
Submitting a Job - QSUB
  qsub arguments:
    qsub -o outputfile -j y -cwd ./submit.sh
  OR in the submit script:
    #!/bin/bash
    #$ -o outputfile
    #$ -j y
    #$ -cwd
    /home/horace/my_app
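Putting the pieces together, a commented submit script might look like the sketch below. The job name, output file name and application path are placeholders. Because the "#$" lines are ordinary comments to the shell, the script also runs outside SGE, which is a convenient way to check it before submitting.

```shell
#!/bin/sh
# Sketch of an SGE submit script. Lines starting with "#$" are read by
# qsub as options; to the shell itself they are ordinary comments.

# Job name (placeholder):
#$ -N example_job
# File for standard output (placeholder name):
#$ -o example_job.out
# Merge standard error into standard output:
#$ -j y
# Run in the directory the job was submitted from:
#$ -cwd

APP="./my_app"            # placeholder for your application

# uname -n prints the node name, like /bin/hostname.
JOB_HOST=$(uname -n)
echo "Job started on $JOB_HOST at $(date)"

# Run the application only if it is actually executable here.
if [ -x "$APP" ]; then
    "$APP"
else
    echo "Cannot execute $APP" >&2
fi
```

It would then be submitted with qsub followed by the script name, as for any other job.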
Monitoring a job - QSTAT
  To list the status and node properties of all nodes:
    qstat    (add -f to get a full listing)
  Information about a user's own jobs and queues is provided by the qstat -u <username> command, e.g.
    qstat -u fred
Monitoring a job - QSTAT
  With the qstat command, any job running or pending in the queue is listed with a number (job identifier) and a job status, one of:
  - qw (queued and waiting)
  - t (job transferring and about to start)
  - r (job is running on listed hosts)
  - d (job has been marked for deletion)
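The state codes can be turned into readable text with a small lookup. This is a sketch: describe_state is a hypothetical helper, not an SGE command. The combined code dr (a running job marked for deletion) is included because qstat can show the letters together.

```shell
#!/bin/sh
# describe_state is a hypothetical helper, not an SGE command.
# It maps a qstat state code to the description given on the slide.
describe_state() {
    case "$1" in
        qw) echo "queued and waiting" ;;
        t)  echo "transferring and about to start" ;;
        r)  echo "running on listed hosts" ;;
        d)  echo "marked for deletion" ;;
        dr) echo "running, marked for deletion" ;;
        *)  echo "unknown state: $1" ;;
    esac
}

# Print the description of each code.
for code in qw t r d dr; do
    echo "$code: $(describe_state "$code")"
done
```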
Monitoring a job - QSTAT
  qstat example:
    job-id prior name      user    state submit/start at     queue    master ja-task-id
    -----------------------------------------------------------------------------------
    1791   0     myjob0.sh grahame dr    03/30/2004 12:49:17 comp05.q MASTER
    1791   0     myjob0.sh grahame dr    03/30/2004 12:49:17 comp05.q SLAVE
    1791   0     myjob0.sh grahame dr    03/30/2004 12:49:17 comp05.q SLAVE
    1791   0     myjob0.sh grahame dr    03/30/2004 12:49:17 comp05.q SLAVE
    1792   0     myjob1.sh grahame r     03/30/2004 12:49:17 comp00.q MASTER
    1792   0     myjob1.sh grahame r     03/30/2004 12:49:17 comp00.q SLAVE
    1792   0     myjob1.sh grahame r     03/30/2004 12:49:17 comp01.q SLAVE
    1792   0     myjob1.sh grahame r     03/30/2004 12:49:17 comp01.q SLAVE
    1794   0     myjob3.sh grahame qw    03/30/2004 17:10:42
    1795   0     myjob4.sh grahame qw    03/30/2004 17:10:42
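A listing like this is easy to summarise with standard tools. The sketch below uses an abridged copy of the sample rows instead of calling qstat, so it runs without SGE. It assumes the job id is field 1 and the state is field 5, as in this listing; the column layout may differ between SGE versions.

```shell
#!/bin/sh
# Abridged copy of the sample listing, used in place of a real qstat call.
qstat_sample() {
    cat <<'EOF'
1791 0 myjob0.sh grahame dr 03/30/2004 12:49:17 comp05.q MASTER
1791 0 myjob0.sh grahame dr 03/30/2004 12:49:17 comp05.q SLAVE
1792 0 myjob1.sh grahame r 03/30/2004 12:49:17 comp00.q MASTER
1792 0 myjob1.sh grahame r 03/30/2004 12:49:17 comp01.q SLAVE
1794 0 myjob3.sh grahame qw 03/30/2004 17:10:42
1795 0 myjob4.sh grahame qw 03/30/2004 17:10:42
EOF
}

# Field 1 is the job id and field 5 the state; sort -u collapses the
# MASTER/SLAVE rows so that each job appears once.
summary=$(qstat_sample | awk '{ print $1, $5 }' | sort -u)
echo "$summary"
```

Against a live system, the qstat_sample call would be replaced by qstat with its header lines skipped.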
Deleting a job - QDEL
  Individual job:
    $ qdel 151
    gertrude has registered the job 151 for deletion
  List of jobs:
    $ qdel 151 152 153
  All jobs running under a given username:
    $ qdel -u <username>
Output produced by jobs running under SGE
  When a job is queued it is allocated a job number. Once it starts to run, anything it writes to standard output and standard error is spooled to files called
    <script>.o<jobid>
    <script>.e<jobid>
Output produced by jobs running under SGE
  In addition to the <>.o and <>.e files, parallel jobs also produce a <>.po and a <>.pe file, which contain output from the start and stop scripts.
  If the job fails for any reason, the <>.o and <>.e files are the ones to examine to determine why.
  The <>.o file can often be used to check on the progress of the job.
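To make the naming concrete, the sketch below prints the files to look for, given a script name and a job number. sge_output_files is a hypothetical helper, and it assumes the default SGE naming in which the job number follows the .o, .e, .po or .pe suffix directly; a job submitted with an explicit -o option writes elsewhere.

```shell
#!/bin/sh
# sge_output_files is a hypothetical helper, not an SGE command.
# Usage: sge_output_files <script> <jobid> [serial|parallel]
sge_output_files() {
    name=$1
    jobid=$2
    kind=${3:-serial}
    echo "$name.o$jobid"       # standard output
    echo "$name.e$jobid"       # standard error
    if [ "$kind" = "parallel" ]; then
        echo "$name.po$jobid"  # output of the PE start/stop scripts
        echo "$name.pe$jobid"  # errors from the PE start/stop scripts
    fi
}

# Example with a placeholder script name and job number.
sge_output_files example.sh 151 parallel
```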
Debugging job failures in SGE
  Common reasons for a job to fail are:
  - SGE cannot find the binary file specified in the job script
  - Required input files are missing from the startup directory
  - An environment variable is not set (LM_LICENSE_FILE etc.)
  - Hardware failure (e.g. MPI ch_p4 or ch_gm errors)
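The first three causes can be caught before a job is submitted. The sketch below is one possible pre-flight check: preflight is a hypothetical helper, and treating an unset LM_LICENSE_FILE as a warning rather than an error is an assumption, since not every application needs a licence.

```shell
#!/bin/sh
# preflight is a hypothetical helper, not an SGE command.
# Usage: preflight <binary> [input-file ...]
# Returns non-zero if the binary or any input file is missing.
preflight() {
    binary=$1
    shift
    status=0
    if [ ! -x "$binary" ]; then
        echo "binary not found or not executable: $binary" >&2
        status=1
    fi
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing input file: $f" >&2
            status=1
        fi
    done
    # Only a warning: not every application needs a licence file.
    if [ -z "${LM_LICENSE_FILE:-}" ]; then
        echo "warning: LM_LICENSE_FILE is not set" >&2
    fi
    return $status
}

# Example: /bin/sh stands in for a real application binary.
if preflight /bin/sh; then
    echo "checks passed, ready to submit"
fi
```

Hardware failures cannot be checked this way; for those, the job's .e file remains the place to look.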
MPI Codes
  - All MPI implementations support F77 and C bindings (some also F90/C++).
  - The bindings are used through compiler wrappers, usually mpif77, mpif90 and mpicc, which save you from linking in extra libraries and specifying MPI header files manually.
  - Each wrapper is compiled to support the options of the underlying compiler (GNU, PGI, etc.).
MPI Codes - Examples
  On the command line (Intel):
    mpif77 -O3 -o mympi mycode.f
    mpicc -O3 -o mympi mycode.c
  Within a (typical) Makefile, set the compiler variable (F77, F90, FC or CC) and the linker command:
    F77 = mpif77
    LD = mpif77
  Specify generic compiler options using the FFLAGS (Fortran) or CFLAGS (C) variables.
Using MPISUB to submit jobs to SGE
  mpisub is a wrapper script developed by Streamline that automatically generates and submits SGE job scripts, given an MPI binary and a number of processors, e.g.
    mpisub nodes=8 <myapp>
    mpisub nodes=16x1 <myapp>
Parallel MPI jobs and SGE
  - SGE uses the concept of a parallel environment (PE) to execute MPI jobs.
  - Each host has an associated queue and resources (CPU, memory).
  - A PE is a list of hosts along with a set number of job slots and pre/post execution scripts.
MPI SGE job scripts
  - The job script matches the nodes allocated by SGE to the number of processes and the list of machines usually specified to the mpirun command.
  - mpisub creates your job script.
  - Within the job script, the final line will be of the format (mpich):
      scout -wait -F <host.list> -e scrun -nodes=<nnodes x Nprocs> <application>
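For comparison, a hand-written parallel job script for a plain mpirun launch might look like the sketch below. This is not the script that mpisub generates, and it does not use scout or scrun. The PE name mpich, the slot count and the binary name are site-specific assumptions. NSLOTS and PE_HOSTFILE are set by SGE for parallel jobs; the defaults here only let the script run outside SGE as a dry run.

```shell
#!/bin/sh
# Generic sketch of a parallel SGE job script (not what mpisub generates).
# The PE name "mpich" and the slot count are site-specific assumptions.
#$ -cwd
#$ -pe mpich 8

# SGE sets NSLOTS (slots granted) and PE_HOSTFILE (hosts granted) for
# parallel jobs. The defaults allow a dry run outside SGE.
NSLOTS=${NSLOTS:-1}
PE_HOSTFILE=${PE_HOSTFILE:-/dev/null}
APP="./mympi"             # placeholder MPI binary

# Each PE_HOSTFILE line begins "<hostname> <slots> ...". Expand it to a
# machine file with one line per slot.
MACHINES="${TMPDIR:-/tmp}/machines.$$"
awk '{ for (i = 0; i < $2; i++) print $1 }' "$PE_HOSTFILE" > "$MACHINES"

# Dry run: print the launch command rather than executing it.
LAUNCH="mpirun -np $NSLOTS -machinefile $MACHINES $APP"
echo "$LAUNCH"

# Remove the temporary machine file.
rm -f "$MACHINES"
```

The point of the sketch is the synchronisation step: the process count and machine list passed to the launcher come from what SGE allocated, not from fixed values in the script.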
Any Questions?