Using the general purpose computing clusters at EPFL

scitas.epfl.ch, March 5, 2015

Bellatrix
- Frontend at bellatrix.epfl.ch
- 16 x 2.2 GHz cores per node
- 424 nodes with 32GB
- Infiniband QDR network
- The batch system is SLURM
- RedHat 6.5

Castor
- Frontend at castor.epfl.ch
- 16 x 2.6 GHz cores per node
- 50 nodes with 64GB
- 2 nodes with 256GB
- For sequential jobs (Matlab etc.)
- The batch system is SLURM
- RedHat 6.5

Deneb
- Frontend at deneb.epfl.ch
- 16 x 2.6 GHz cores per node
- 376 nodes with 64GB
- 8 nodes with 256GB
- 2 nodes with 512GB and 32 cores
- 16 nodes with 4 Nvidia K40 GPUs
- Infiniband QDR network
- The batch system is SLURM
- RedHat 6.5

Storage
/home
- filesystem has per-user quotas
- will be backed up: use it for important things (source code, results and theses)
- shared by all clusters
/scratch
- high performance temporary space
- is not backed up
- is organised by laboratory
- local to the cluster
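As a quick sanity check of the two areas, something like the following can be run from a frontend (my_run is just a hypothetical directory name; the <group>/<username> layout is the convention described above):

  quota                                          # check your /home usage against the quota
  mkdir -p /scratch/<group>/<username>/my_run    # create a working directory in scratch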

Connection
- Start the X server (automatic on a Mac)
- Open a terminal
- ssh -Y username@castor.epfl.ch
Try the following commands:
  id
  pwd
  quota
  ls /scratch/<group>/<username>

The batch system
- Goal: to take a list of jobs and execute them when appropriate resources become available
- SCITAS uses SLURM on its clusters: http://slurm.schedmd.com
- The configuration depends on the purpose of the cluster (serial vs parallel)

sbatch
- The fundamental command is sbatch
- sbatch submits jobs to the batch system
- Suggested workflow: create a short job-script, then submit it to the batch system

sbatch - exercise
Copy the first two examples to your home directory:
  cp /scratch/examples/ex1.run .
  cp /scratch/examples/ex2.run .
Open the file ex1.run with your editor of choice.

ex1.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 1024

sleep 10
echo "hello from $(hostname)"
sleep 10

ex1.run - the directives
- #SBATCH is a directive to the batch system
- --nodes 1: the number of nodes to use (on Castor this is limited to 1)
- --ntasks 1: the number of tasks (in an MPI sense) to run per job
- --cpus-per-task 1: the number of cores per aforementioned task
- --mem 4096: the memory required per node, in MB
- --time 12:00:00: the time required (12 hours)
- --time 2-6: the time required (two days and six hours)
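A sketch of how these directives might look together at the top of a job script; the values are illustrative, not recommendations:

  #!/bin/bash
  #SBATCH --nodes 1            # one node (the only choice on Castor)
  #SBATCH --ntasks 1           # one task in the MPI sense
  #SBATCH --cpus-per-task 1    # one core for that task
  #SBATCH --mem 4096           # 4096 MB of memory on the node
  #SBATCH --time 12:00:00      # 12 hours of walltime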

Running ex1.run
The job is assigned a default runtime of 15 minutes.

$ sbatch ex1.run
Submitted batch job 439
$ cat /scratch/<group>/<username>/slurm-439.out
hello from c03

What went on?
  sacct -j <JOB_ID>
  sacct -l -j <JOB_ID>
Or, more usefully:
  Sjob <JOB_ID>
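sacct also accepts a --format option to choose which columns are reported; a possible invocation (the field names are standard sacct columns, picked here purely as an illustration) is:

  sacct -j <JOB_ID> --format=JobID,JobName,Partition,Elapsed,State,ExitCode,MaxRSS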

Cancelling jobs
To cancel a specific job:
  scancel <JOB_ID>
To cancel all your jobs:
  scancel -u <username>
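scancel can also filter on job state; for example, a sketch that cancels only your pending jobs using the standard --state option:

  scancel --state=PENDING -u <username>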

ex2.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH --mem 122880
#SBATCH --time 00:30:00
#SBATCH --mail-type=all
#SBATCH --mail-user=<email_address>

/scratch/examples/linpack/runme_1_45k

What's going on?
  squeue
  squeue -u <username>
  Squeue
  Sjob <JOB_ID>
  scontrol -d show job <JOB_ID>
  sinfo

squeue and Squeue
  squeue
Job states: Pending, Resources, Priority, Running
  squeue | grep <JOB_ID>
  squeue -j <JOB_ID>
  Squeue <JOB_ID>
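For a job that is still pending, squeue can report the scheduler's estimated start time with the standard --start option, for example:

  squeue --start -j <JOB_ID>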

Sjob

$ Sjob <JOB_ID>

       JobID    JobName    Cluster    Account  Partition  Timelimit      User     Group
------------ ---------- ---------- ---------- ---------- ---------- --------- ---------
       31006    ex1.run     castor  scitas-ge     serial   00:15:00     jmenu scitas-ge
 31006.batch      batch     castor  scitas-ge

             Submit            Eligible               Start                 End
------------------- ------------------- ------------------- -------------------
2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:56:08
2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:56:08

   Elapsed ExitCode      State
---------- -------- ----------
  00:00:20      0:0  COMPLETED
  00:00:20      0:0  COMPLETED

     NCPUS   NTasks        NodeList    UserCPU  SystemCPU     AveCPU  MaxVMSize
---------- -------- --------------- ---------- ---------- ---------- ----------
         1                      c04   00:00:00  00:00.001
         1        1             c04   00:00:00  00:00.001   00:00:00    207016K

scontrol

$ scontrol -d show job <JOB_ID>

$ scontrol -d show job 400
JobId=400 Name=s1.job
   UserId=user(123456) GroupId=group(654321)
   Priority=111 Account=scitas-ge QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:03:39 TimeLimit=00:15:00 TimeMin=N/A
   SubmitTime=2014-03-06T09:45:27 EligibleTime=2014-03-06T09:45:27
   StartTime=2014-03-06T09:45:27 EndTime=2014-03-06T10:00:27
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=serial AllocNode:Sid=castor:106310
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=c03 BatchHost=c03
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   Nodes=c03 CPU_IDs=0 Mem=1024
   MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/<user>/jobs/s1.job
   WorkDir=/scratch/<group>/<user>

Modules
Modules make your life easier:
  module avail
  module show <take your pick>
  module load <take your pick>
  module list
  module purge
  module list
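A minimal sketch of a typical module session, assuming the intel/14.0.1 module mentioned later in this course and a hypothetical source file hello.f90:

  module purge                # start from a clean environment
  module load intel/14.0.1    # load a compiler suite
  module list                 # check what is now loaded
  ifort -o hello hello.f90    # compile with the compiler the module provides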

ex3.run - Mathematica
Copy the following files to your chosen directory:
  cp /scratch/examples/ex3.run .
  cp /scratch/examples/mathematica.in .
Submit ex3.run to the batch system and see what happens...

ex3.run

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --nodes 1
#SBATCH --mem 4096
#SBATCH --time 00:05:00

echo STARTING AT $(date)

module purge
module load mathematica/9.0.1

math < mathematica.in

echo FINISHED at $(date)

Compiling the ex4.* source files
Copy the following files to your chosen directory:
  /scratch/examples/ex4_readme.txt
  /scratch/examples/ex4.c
  /scratch/examples/ex4.cxx
  /scratch/examples/ex4.f90
  /scratch/examples/ex4.run
Then compile them with:
  module load intelmpi/4.1.3
  mpiicc -o ex4_c ex4.c
  mpiicpc -o ex4_cxx ex4.cxx
  mpiifort -o ex4_f90 ex4.f90

The 3 methods to get interactive access - 1/3
To schedule an allocation, use salloc with exactly the same resource options as sbatch.
You then arrive at a new prompt which is still on the submission node, but by using srun you can access the allocated resources:

eroche@castor:hello > salloc -N 1 -n 2
salloc: Granted job allocation 1234
bash-4.1$ hostname
castor
bash-4.1$ srun hostname
c03
c03

The 3 methods to get interactive access - 2/3
To get a prompt on the allocated machine, use the --pty option with srun and then bash -i (or tcsh -i) to get the shell:

eroche@castor > salloc -N 1 -n 1
salloc: Granted job allocation 1235
eroche@castor > srun --pty bash -i
bash-4.1$ hostname
c03

The 3 methods to get interactive access - 3/3
This is the least elegant method, but it is the one that allows X11 applications to be run:

eroche@bellatrix > salloc -n 1 -c 16 -N 1
salloc: Granted job allocation 1236
bash-4.1$ srun hostname
c04
bash-4.1$ ssh -Y c04
eroche@c04 >

The SLURM job environment
SLURM defines a number of variables, among them:

jmenu@castor: > srun env | grep SLURM
SLURM_SUBMIT_DIR=/home/jmenu
SLURM_SUBMIT_HOST=castor
SLURM_JOB_ID=90771
SLURM_JOB_NUM_NODES=1
SLURM_JOB_NODELIST=c05
SLURM_JOB_CPUS_PER_NODE=1
SLURM_MEM_PER_CPU=1024
SLURM_JOBID=90771
...
SLURM_NODELIST=c05
SLURM_TASKS_PER_NODE=1
SLURM_NTASKS=1
SLURM_NPROCS=1
...
SLURM_CPUS_ON_NODE=1
SLURM_TASK_PID=72868
...
SLURMD_NODENAME=c05
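These variables can be used directly inside a job script; a minimal sketch (the directives mirror ex1.run, and the echo lines are purely illustrative):

  #!/bin/bash
  #SBATCH --nodes 1
  #SBATCH --ntasks 1
  #SBATCH --cpus-per-task 1
  #SBATCH --mem 1024

  # report where and how the job is running, using SLURM's variables
  echo "Job ${SLURM_JOB_ID} running on ${SLURM_JOB_NODELIST}"
  echo "Submitted from ${SLURM_SUBMIT_DIR} with ${SLURM_NTASKS} task(s)"
  srun hostname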

Dynamic libs used in an application
ldd displays the libraries an executable file depends on:

jmenu@castor:~/COURS > ldd ex4_f90
linux-vdso.so.1 => (0x00007fff4b905000)
libmpigf.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpigf.so.4 (0x00007f556cf88000)
libmpi.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpi.so.4 (0x00007f556c91c000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003807e00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003808a00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003808200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003807600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003807a00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000380c600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003807200000)

ex4.run

#!/bin/bash
...
module purge
module load intelmpi/4.1.3
module list
echo

LAUNCH_DIR=/scratch/scitas-ge/jmenu
EXECUTABLE="./ex4_f90"

echo "--> LAUNCH_DIR = ${LAUNCH_DIR}"
echo "--> EXECUTABLE = ${EXECUTABLE}"
echo
echo "--> ${EXECUTABLE} depends on the following dynamic libraries:"
ldd ${EXECUTABLE}
echo

cd ${LAUNCH_DIR}
srun ${EXECUTABLE}
...

The debug partition
To get priority access for debugging:
  sbatch --partition=debug ex1.run
Limits on Castor:
- 1 hour of walltime
- 16 cores shared between all users
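Equivalently, the partition can be requested inside the job script itself with the same directive syntax as the other options, for example:

  #SBATCH --partition=debug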

General: cgroups (Castor)
cgroups ("control groups") is a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups.
In SLURM:
- Linux cgroups apply constraints to the CPUs and memory that can be used by a job
- They are automatically generated using the resource requests given to SLURM
- They are automatically destroyed at the end of the job, thus releasing all resources used
- Even if there is physical memory available, a task will be killed if it tries to exceed the limits of its cgroup!

System process view
Two tasks running on the same node, as seen with ps auxf:

root 177873  slurmstepd: [1072]
user 177877    \_ /bin/bash /var/spool/slurmd/job01072/slurm_script
user 177908        \_ sleep 10
root 177890  slurmstepd: [1073]
user 177894    \_ /bin/bash /var/spool/slurmd/job01073/slurm_script
user 177970        \_ sleep 10

Check memory, thread and core usage with htop.

Fair share - 1/3
The scheduler is configured to give all groups a share of the computing power.
Within each group the members have an equal share by default:

jmenu@castor:~ > sacctmgr show association where account=lacal format=account,cluster,user,grpnodes,QOS,DefaultQOS,Share tree
   Account    Cluster       User  GrpNodes           QOS   Def QOS     Share
---------- ---------- ---------- --------- ------------- --------- ---------
     lacal     castor                              normal                  1
     lacal     castor   aabecker            debug,normal    normal         1
     lacal     castor   kleinjun            debug,normal    normal         1
     lacal     castor   knikitin            debug,normal    normal         1

- Priority is based on recent usage; this is forgotten about with time (half-life)
- Fair share comes into play when the resources are heavily used

Fair share - 2/3
Job priority is a weighted sum of various factors:

jmenu@castor:~ > sprio -w
  JOBID PRIORITY   AGE  FAIRSHARE     QOS
Weights            1000      10000  100000

jmenu@bellatrix:~ > sprio -w
  JOBID PRIORITY   AGE  FAIRSHARE  JOBSIZE     QOS
Weights            1000      10000      100  100000

To compare job priorities:

jmenu@castor:~ > sprio -j80833,77613
  JOBID PRIORITY   AGE  FAIRSHARE     QOS
  77613      145   146          0       0
  80833     9204    93       9111       0

Fair share - 3/3
FairShare values range from 0.0 to 1.0:

Value  Meaning
0.0    you used much more resources than you were granted
0.5    you got what you paid for
1.0    you used nearly no resources

jmenu@bellatrix:~ > sshare -a -A lacal
Accounts requested: lacal
   Account       User  Raw Shares  Norm Shares    Raw Usage  Effectv Usage  FairShare
---------- ---------- ----------- ------------ ------------ -------------- ----------
     lacal                    666     0.097869   1357691548       0.256328   0.162771
     lacal   boissaye           1     0.016312            0       0.042721   0.162771
     lacal     janson           1     0.016312            0       0.042721   0.162771
     lacal    jetchev           1     0.016312            0       0.042721   0.162771
     lacal   kleinjun           1     0.016312   1357691548       0.256328   0.000019
     lacal   pbottine           1     0.016312            0       0.042721   0.162771
     lacal    saltini           1     0.016312            0       0.042721   0.162771

More information at: http://schedmd.com/slurmdocs/priority_multifactor.html

Helping yourself
man pages are your friend!
  man sbatch
  man sacct
  man gcc
  module load intel/14.0.1
  man ifort

Getting help
If you still have problems, then send a message to: 1234@epfl.ch
- Please start the subject with HPC for automatic routing to the HPC team
- Please give as much information as possible, including:
  - the jobid
  - the directory location and name of the submission script
  - where the slurm-*.out file is to be found
  - how the sbatch command was used to submit it
  - the output from the env and module list commands
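A possible way to capture that environment information before writing (the file names are just hypothetical suggestions):

  env > my_env.txt
  module list 2>&1 | tee my_modules.txt    # module list may write to stderr, hence 2>&1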

Appendix
Change your shell at: https://dinfo.epfl.ch/cgi-bin/accountprefs
SCITAS web site: http://scitas.epfl.ch