Transcription

2 Until now:
- access the cluster
- copy data to/from the cluster
- create parallel software
- compile code and use optimized libraries
How to run the software on the full cluster
tl;dr:
- submit a job to the scheduler

3 What is a job?

4 What is a job scheduler?

5 Job scheduler/resource manager: a piece of software which
- manages and allocates resources;
- manages and schedules jobs;
- sets up the environment for parallel and distributed computing.
(Illustration: "Two computers are available for 10h. You go, then you go. You wait.")

6 Resources: CPU cores, memory, disk space, network, accelerators, software licenses

10 Slurm
- Free and open-source
- Mature
- Very active community
- Many success stories
- Runs 50% of TOP10 systems, including 1st
- Also an intergalactic soft drink

11 Other job schedulers: PBSpro, Torque/Maui, Oracle (ex Sun) Grid Engine, Condor, ...

12 You will learn how to:
- Create a job
- Monitor the jobs
- Control your own job
- Get job accounting info
with

13 1. Make up your mind
- resources you need, e.g. 1 core, 2GB RAM for 1 hour (job parameters);
- operations you need to perform, e.g. launch 'myprog' (job steps).

14 2. Write a submission script
It is a shell script (Bash). Annotations on the example script:
- "#SBATCH" lines: Bash sees these as comments; Slurm takes them as commands
- regular Bash comment
- regular Bash commands
- job step creation
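
The example script itself is not part of the transcript. As a minimal sketch of what such a submission script could look like, here is one that uses the request from step 1 (1 core, 2GB RAM, 1 hour, launch 'myprog'). The file name `submit.sh` is a placeholder, and `myprog` is assumed to sit in the submission directory. The snippet only writes the script and checks its syntax, so it can be tried on a machine without Slurm.

```shell
# Write the submission script to a file (submit.sh is a placeholder name).
cat > submit.sh <<'EOF'
#!/bin/bash
# Submission script for a serial job.
# Lines starting with #SBATCH are comments for Bash, directives for Slurm.
#
#SBATCH --job-name=myjobname
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2048
#SBATCH --time=01:00:00
#SBATCH --output=result-%j.txt

# Regular Bash commands
echo "Job started on $(hostname)"

# Job step creation
srun ./myprog
EOF

# Syntax check only; this does not contact the scheduler.
bash -n submit.sh && echo "submit.sh looks syntactically valid"
```

The memory request is written in megabytes (2048 MB = 2GB), which is the default unit for `--mem-per-cpu`. On the cluster the script would then be handed to the scheduler with `sbatch submit.sh`.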

15 Other useful parameters
You want: You ask
- To set a job name: --job-name=myjobname
- To attach a comment to the job: --comment="Some comment"
- To get emails: --mail-type=BEGIN|END|FAIL --mail-user=<address>
- To set the name of the output file: --output=result-%j.txt --error=error-%j.txt
- To delay the start of your job: --begin=16:00 or --begin=now+1hour or --begin=<date>T12:34:00
- To specify an ordering of your jobs: --dependency=after(ok|notok|any):jobids or --dependency=singleton
- To control failure options: --no-kill, --no-requeue, --requeue

16 Constraints and resources
You want: You ask
- To choose a specific feature (e.g. a processor type or a NIC type): --constraint
- To use a specific resource (e.g. a GPU): --gres
- To reserve a whole node for yourself: --exclusive
- To choose a partition: --partition

17 3. Submit the script
- I submit with 'sbatch'
- Slurm gives me the JobID
- One more job parameter

18 So you can play
- Download with wget and untar it on hmem
- Compile the 'stress' program
- You can use it to burn CPU time and memory: ./stress --cpu 1 --vm-bytes 128M --timeout 30s
- Write a job script
- Submit a job
- See it running
- Cancel it
- Get it killed

19 4. Monitor your job: squeue, sprio, sstat, sview

23 A word about backfill
The rule: a job with a lower priority can start before a job with a higher priority if it does not delay that job's start time.
(Figure: jobs drawn on resources vs. time axes, each labelled with the job's priority.)
A low-priority job with a short maximum run time and smaller requirements starts before a higher-priority job.

28 5. Control your job: scancel, scontrol, sview

33 6. Job accounting: sacct, sreport, sshare

39 The rules of fairshare
- A share is allocated to you: 1/nbusers
- If your actual usage is above that share, your fairshare value is decreased towards 0.
- If your actual usage is below that share, your fairshare value is increased towards 1.
- The actual usage taken into account decreases over time.

41 A word about fairshare
Assume 3 users and a 3-core cluster:
- Red uses 1 core for a certain period of time
- Blue uses 2 cores for half that period
- Red uses 2 cores afterwards
(Figure axes: #nodes vs. time)
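
To put numbers on this scenario, here is a rough sketch. It assumes the "classic" Slurm fair-share formula F = 2^(-usage/share), ignores the decay of past usage, and looks at the moment just before Red asks for 2 more cores. The choice of formula, the function name `fairshare_table` and the name of the third user (Green) are assumptions rather than something stated on the slides, and a given cluster may be configured differently.

```shell
# Core-time consumed so far, in units of "cores x period":
#   Red:   1 core  x 1 period   = 1.0
#   Blue:  2 cores x 0.5 period = 1.0
#   Green: nothing yet          = 0.0   (hypothetical third user)
fairshare_table() {
    printf 'Red 1.0\nBlue 1.0\nGreen 0.0\n' |
    awk '{ name[NR] = $1; used[NR] = $2; total += $2 }
         END {
             share = 1 / NR                      # 1/nbusers
             for (i = 1; i <= NR; i++) {
                 usage = (total > 0) ? used[i] / total : 0
                 # Assumed formula: F = 2^(-usage/share), no decay
                 printf "%s usage=%.2f share=%.2f fairshare=%.3f\n",
                        name[i], usage, share, 2 ^ (-usage / share)
             }
         }'
}

fairshare_table
```

Under these assumptions Red and Blue have each consumed half of the used core-time, which is more than their 1/3 share, so both end up with the same value below 0.5, while the idle user stays at 1.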

45 Getting cluster info: sinfo, sjstat

47 Interactive work: salloc
salloc --ntasks=4 --nodes=2

49 Summary
- Explore the environment
  - Get node features (sinfo --Node --long)
  - Get node usage (sinfo --summarize)
- Submit a job:
  - Define the resources you need and determine what the job should do (job script)
  - Submit the job script (sbatch)
  - View the job status (squeue)
  - Get accounting information (sacct)

52 You will learn how to:
- Create a parallel job
- Request distributed resources
with

53 Concurrent - Parallel - Distributed
- Master/slave vs SPMD
- Synchronous vs asynchronous
- Message passing vs shared memory

54 Typical resource request
You want: You ask
- 16 independent processes (no communication): --ntasks=16
- MPI and do not care about where cores are distributed: --ntasks=16
- cores spread across distinct nodes: --ntasks=16 --nodes=16
- cores spread across distinct nodes and nobody else around: --ntasks=16 --nodes=16 --exclusive
- 16 processes to spread across 8 nodes: --ntasks=16 --ntasks-per-node=2
- 16 processes on the same node: --ntasks=16 --ntasks-per-node=16
- one process that can use 16 cores for multithreading: --ntasks=1 --cpus-per-task=16
- 4 processes that can use 4 cores: --ntasks=4 --cpus-per-task=4
- more constrained requests: --distribution=block|cyclic|arbitrary

55 Use case 1: Random sampling
- Your program draws random numbers and processes them sequentially
- Parallelism is obtained by launching the same program multiple times simultaneously
- Every process does the same thing
- No inter-process communication
- Results appended to one common file

56 Use case 1: Random sampling
- You want: 16 independent processes (no communication)
- You ask: --ntasks=16
- You use: srun ./myprog

57 Use case 1: Random sampling
- You want: 16 independent processes (no communication)
- You ask: --array= --output=res%a
- You merge with: cat res*
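
The job-array variant could be sketched as below. The array range 1-16 is borrowed from the "Multiple datafiles" use case, and the file name `sampling_array.sh` and the job name are placeholders. As written, the snippet only creates the script and checks its syntax, so it does not need Slurm.

```shell
# Write a job-array submission script (sampling_array.sh is a placeholder name).
cat > sampling_array.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=sampling
#SBATCH --ntasks=1
#SBATCH --array=1-16
#SBATCH --output=res%a

# Each array task is an independent job running one copy of the program;
# %a in the output file name is replaced by the array index.
srun ./myprog
EOF

# Syntax check only; this does not contact the scheduler.
bash -n sampling_array.sh && echo "sampling_array.sh looks syntactically valid"
```

Once all array tasks have finished, the per-task outputs can be merged, for example with `cat res* > merged.txt`.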

58 Use case 2: Multiple datafiles
- Your program processes data from one datafile
- Parallelism is obtained by launching the same program multiple times on distinct data files
- Everybody does the same thing on distinct data stored in different files
- No inter-process communication
- Results appended to one common file

59 Use case 2: Multiple datafiles
- You want: 16 independent processes (no communication)
- You ask: --ntasks=16
- You use: srun ./myprog $SLURM_PROCID

60 Use case 2: Multiple datafiles
Useful commands: xargs and find/ls
- Single node: ls data* | xargs -n1 -P $SLURM_NPROCS myprog
- Multiple nodes: ls data* | xargs -n1 -P $SLURM_NTASKS srun -c1 myprog
- Safer: find . -maxdepth 1 -name 'data*' -print0 | xargs -0 -n1 -P...
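
The "safer" find/xargs pattern can be tried without a cluster. In the sketch below, the directory `xargs_demo`, the data files and the stand-in `myprog` are all made up for illustration. The degree of parallelism comes from `SLURM_NTASKS` inside a job and falls back to 2 elsewhere.

```shell
# Create a few small data files and a stand-in for 'myprog'
# (both are hypothetical; replace them with the real data and program).
mkdir -p xargs_demo
for i in 1 2 3 4; do
    seq "$i" > "xargs_demo/data$i"      # data$i contains $i lines
done

cat > xargs_demo/myprog <<'EOF'
#!/bin/bash
# Stand-in program: count the lines of the data file given as argument.
wc -l < "$1" | tr -d ' ' > "$(dirname "$1")/res-$(basename "$1")"
EOF
chmod +x xargs_demo/myprog

# Inside a job, SLURM_NTASKS holds the number of tasks; default to 2 elsewhere.
nproc_max="${SLURM_NTASKS:-2}"

# -print0 / -0 keep file names with spaces or newlines intact;
# -n1 passes one file per invocation, -P runs up to nproc_max at once.
find xargs_demo -maxdepth 1 -name 'data*' -print0 |
    xargs -0 -n1 -P "$nproc_max" xargs_demo/myprog
```

Each invocation writes its own `res-data*` file, so concurrent processes never write to the same file. For the multi-node variant shown on the slide, `srun -c1` would be placed in front of the program name.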

61 Use case 2: Multiple datafiles
- You want: 16 independent processes (no communication)
- You ask: --array=1-16
- You use: $SLURM_ARRAY_TASK_ID

62 Use case 3: Parameter sweep
- Your program tests something for one particular value of a parameter
- Parallelism is obtained by launching the same program multiple times with a distinct identifier
- Everybody does the same thing except for a given parameter value based on the identifier
- No inter-process communication
- Results appended to one common file

63 Use case 3: Parameter sweep
- You want: 16 independent processes (no communication)
- You ask: --ntasks=16
- You use: srun ./myprog $SLURM_PROCID

64 Use case 3: Parameter sweep
- You want: 16 independent processes (no communication)
- You ask: --array= --output=res%a
- You use: $SLURM_ARRAY_TASK_ID
- You merge with: cat res*
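
One possible way to turn the array index into a parameter value is sketched below. The list of values is invented for illustration, and the array is assumed to be numbered from 1 (e.g. `--array=1-6`). Outside a job the index defaults to 3 so the snippet can be tried anywhere.

```shell
# Hypothetical list of parameter values to sweep over.
params="0.1 0.2 0.5 1.0 2.0 5.0"

# Slurm sets SLURM_ARRAY_TASK_ID in each array task;
# default to 3 so the snippet can also be tried outside a job.
task_id="${SLURM_ARRAY_TASK_ID:-3}"

# cut numbers fields from 1, which matches an array starting at 1.
param=$(echo "$params" | cut -d' ' -f"$task_id")

echo "task $task_id runs with parameter $param"
# In the job script, the program would then be started with, e.g.:
#   srun ./myprog "$param"
```

Keeping the value list inside the script means a single file describes the whole sweep, and rerunning one failed point only requires resubmitting that array index.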

65 Use case 3: Parameter sweep
Useful command: GNU Parallel
- Single node: parallel -j $SLURM_NPROCS myprog ::: {1..5} ::: {A..D}
- Multiple nodes: parallel -j $SLURM_NTASKS srun -c1 myprog ::: {1..5} ::: {A..D}
- Useful: parallel --joblog runtask.log --resume (for checkpointing)
- Useful: parallel echo data_{1}_{2}.dat ::: ::: 1 2 3

66 Use case 4: Multithread
- Your program uses OpenMP or TBB
- Parallelism is obtained by launching a multithreaded program
- One program spawns itself on the node
- Inter-process communication by shared memory
- Results managed in the program, which outputs a summary

67 Use case 4: Multithread
- You want: one process that can use 16 cores for multithreading
- You ask: --ntasks=1 --cpus-per-task=16
- You use: OMP_NUM_THREADS=16 srun myprog
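
Put together as a submission script, this request might look like the sketch below. The file name `openmp_job.sh` and the job name are placeholders, and `myprog` is assumed to be an OpenMP program in the submission directory. The snippet only writes the script and checks its syntax.

```shell
# Write a submission script for a multithreaded job
# (openmp_job.sh is a placeholder name).
cat > openmp_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=openmp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

# Slurm exports the value of --cpus-per-task as SLURM_CPUS_PER_TASK,
# so the thread count follows the request instead of being hard-coded.
export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"

# The core count is repeated for the job step, since recent Slurm
# versions do not pass it on from sbatch to srun automatically.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" ./myprog
EOF

# Syntax check only; this does not contact the scheduler.
bash -n openmp_job.sh && echo "openmp_job.sh looks syntactically valid"
```

Deriving `OMP_NUM_THREADS` from the allocation avoids the common mismatch where the script requests one number of cores and the program starts a different number of threads.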

68 Use case 5: Message passing
- Your program uses MPI
- Parallelism is obtained by launching a multi-process program
- One program spawns itself on several nodes
- Inter-process communication by the network
- Results managed in the program, which outputs a summary

69 Use case 5: Message passing
- You want: 16 processes for use with MPI
- You ask: --ntasks=16
- You use: module load openmpi, then mpirun myprog

70 Use case 6: Master/slave
- You have two types of programs: master and slave
- Parallelism is obtained by launching several slaves, managed by the master
- The master launches several slaves on distinct nodes
- Inter-process communication by the network or the disk
- Results managed in the master program, which outputs a summary

71 Use case 6: Master/slave
- You want: 16 processes, 16 threads
- You ask: --ntasks=16 --cpus-per-task=16
- You use: --multi-prog + conf file
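
A sketch of what the `--multi-prog` configuration file and the matching job script might contain is given below. The file names `multi.conf` and `master_slave.sh` and the program names `./master` and `./slave` are placeholders. The `--cpus-per-task` request from the slide is left out to keep the example small.

```shell
# Configuration file: one line per group of task ranks.
# Comment lines must start with '#' in the first column.
cat > multi.conf <<'EOF'
# rank(s)   program to run
0           ./master
1-15        ./slave
EOF

# Submission script that starts all 16 tasks in one job step.
cat > master_slave.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=master_slave
#SBATCH --ntasks=16

# Task 0 runs the master, tasks 1-15 run the slaves.
srun --multi-prog multi.conf
EOF

# Syntax check only; this does not contact the scheduler.
bash -n master_slave.sh && echo "master_slave.sh looks syntactically valid"
```

The ranks listed in the configuration file should cover every task of the job step, here 0 to 15 for `--ntasks=16`.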

73 Summary
- Choose number of processes: --ntasks
- Choose number of threads: --cpus-per-task
- Launch processes with srun or mpirun
- Set multithreading with OMP_NUM_THREADS
- You can use $SLURM_PROCID and $SLURM_ARRAY_TASK_ID

76 Try
- Download the MPI hello world from Wikipedia, compile it, write a job script and submit it
- Rewrite the 'Multiple datafiles' examples using xargs
- Rewrite the 'Parameter sweep' example using GNU Parallel