Biowulf2 Training Session




Biowulf2 Training Session
9 July 2015
Slides at: http://hpc.nih.gov/docs/b2training.pdf
HPC@NIH website: http://hpc.nih.gov

System hardware overview
What's new/different
The batch system & submitting jobs
Monitoring and development tools
Q&A

Shorthand for this presentation:
B1 = the current Biowulf cluster (Biowulf1)
B2 = the new expansion part of the cluster (Biowulf2)

Biowulf2:
12 racks
576 nodes
46 network switches
Full containment cooling

Compute Node Summary

[Network diagram: NIHnet core connected to the Biowulf2 10G/40G/100G Ethernet network; NFS storage (Helixnet VLAN) and GPFS (Clusternet VLAN); compute nodes on 56G FDR and QDR InfiniBand fabrics; service nodes (login, DTNs, etc.)]

Final B1 to B2 Transition
June 27: Network cutover (all nodes and storage on the same network segment)
July 9 (TODAY): All Biowulf users have B2 accounts; new accounts are B2 only; login to biowulf2.nih.gov
July 10: B1 c2 & c4 nodes retired
July/August: Transition remaining B1 nodes (ib, c16 & c24) to B2 (i.e. SLURM)
Buy-in nodes (NCI/CCR, NIMH, NIDDK/LCP) will be coordinated separately

Changes for Biowulf2
NODES: RedHat/CentOS 6.6; /scratch is in shared space (same on all nodes); /lscratch is local to the compute node (SSD, 800 GB); /tmp is RAM (6 GB); GPFS filesystems accessible by all nodes; Infiniband compilation on the login node
BATCH: SLURM batch system; core-based allocation (nodes can be shared by multiple jobs); walltime limit required for most jobs
DATA TRANSFER: Data Transfer Nodes/Globus (nihhpc#globus); wget from compute nodes via web proxy (not yet available)
New web site: http://hpc.nih.gov

Unchanged on Biowulf2
No CPU- or memory-intensive programs on the login node
Same /home, /data, and group directories
Login node for smaller file transfers (Globus for larger ones)
Environment modules

SLURM Hardware Terminology
[Diagram: node/socket/core/CPU terminology; in Slurm notation each physical core presents 2 CPUs (hyperthreads)]

Submitting Jobs
Before, under PBS (B1): qsub, swarm
Now, under SLURM (B2): sbatch, swarm

Partitions (Queues)

[biowulf2 ~]$ batchlim
Partition     MaxCPUsPerUser  DefWalltime  MaxWalltime
---------------------------------------------------------------
norm               1024       04:00:00     10-00:00:00
interactive          64       08:00:00      1-12:00:00
quick               256       01:00:00     01:00:00
largemem            128       04:00:00     10-00:00:00
ibfdr               768       10-00:00:00  10-00:00:00
gpu                 128       10-00:00:00  10-00:00:00
unlimited           128       UNLIMITED    UNLIMITED
niddk               512       04:00:00     10-00:00:00

[biowulf2 ~]$ sbatch --partition ibfdr ...

Coming in August:
$ sbatch --partition niddk,norm ...   [for buy-in users]
$ sbatch --partition quick,norm ...   [for quick users]

Show available resources

[steve@biowulf2 ~]$ freen
                                      ...per-node Resources...
Partition   FreeNds   FreeCores   Cores  CPUs   Mem    Disk  Features
--------------------------------------------------------------------------------
norm*       306/310   9822/9920    16     32    125g   800g  cpu32,core16,g128,ssd800,x2650
unlimited    14/14     448/448     16     32    125g   800g  cpu32,core16,g128,ssd800,x2650
niddk        86/86    2752/2752    16     32    125g   800g  cpu32,core16,g128,niddk,ssd800,x2650
ibfdr       173/192   5536/6144    16     32     62g   800g  cpu32,core16,ibfdr,g64,ssd800,x2650
gpu          24/24     768/768     16     32    125g   800g  cpu32,core16,g128,gpuk20x,ssd800,x2650
largemem      4/4      256/256     32     64   1009g   800g  cpu64,core32,g1024,ssd800,x4620
quick        86/86    2752/2752    16     32    125g   800g  cpu32,core16,g128,niddk,ssd800,x2650

Basics
Default compute allocation = 1 physical core = 2 CPUs in Slurm notation (2 hypercores in old notation)
DefMemPerCPU = 2 GB
Therefore, default memory allocation = 4 GB
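A worked example of the defaults (the CPU count is illustrative, not from the slides):

sbatch --cpus-per-task=4 jobscript
# allocated memory = 4 CPUs x DefMemPerCPU (2 GB) = 8 GB, unless --mem is given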

Single-threaded app

Required CPU/Memory      Command                       Allocated CPU/Memory
1 CPU, < 4 GB            sbatch jobscript              2 CPUs, 4 GB
1 CPU, M GB (M > 4)      sbatch --mem=Mg jobscript     2 CPUs, M GB
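As a minimal sketch of such a jobscript (the program, module, and paths are placeholders, not from the slides):

#!/bin/bash
cd /data/$USER/mydir
module load myprog            # hypothetical module
myprog input.dat > output.txt

Submitted with, e.g., sbatch --mem=8g jobscript to get 2 CPUs and 8 GB of memory.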

Multi-threaded app

Required CPU/Memory             Command                                        Allocated CPU/Memory
C threads on C CPUs, M GB       sbatch --cpus-per-task=C --mem=Mg jobscript    C CPUs (rounded up to nearest 2), M GB

Default memory allocation for C CPUs would be (C*2) GB

Multi-threaded app
Use $SLURM_CPUS_PER_TASK inside the script

#!/bin/bash
cd /data/mydir
module load hmmer
hmmsearch --cpu $SLURM_CPUS_PER_TASK globins4.hmm /fdb/fastadb/nr.aa.fas

#!/bin/bash
cd /data/mydir
module load mapsplice
mapsplice.py -1 Sp_ds.10k.left.fq -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK
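A matching submission for either script (the CPU and memory values are illustrative) would be something like:

sbatch --cpus-per-task=8 --mem=16g jobscript   # $SLURM_CPUS_PER_TASK is then 8 inside the script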

Parallel (MPI) processes
Many parallel apps do better with --ntasks-per-core=1 (ignore hyperthreading)

Required: C MPI processes, < 4 GB per process
Command:  sbatch --partition=ibfdr --ntasks=C [--constraint=nodetype] --exclusive --ntasks-per-core=1 jobscript
Allocated: 2*C CPUs (1 core per MPI process; rounded up to nearest 2 if necessary), 4 GB per MPI process

Required: C MPI processes, G GB per process
Command:  sbatch --partition=ibfdr --ntasks=C [--constraint=nodetype] --exclusive --mem-per-cpu=Gg --ntasks-per-core=1 jobscript
Allocated: 2*C CPUs (1 core per MPI process; rounded up to nearest 2 if necessary), 2*G GB per MPI process

Parallel (MPI) processes
Use $SLURM_NTASKS within the script

#!/bin/bash
module load meep/1.2.1/mpi/gige
cd /data/username/mydir
meme mini-drosoph.s -oc meme_out -maxsize 600000 -p $SLURM_NTASKS

OpenMPI has been built with Slurm support, so -np need not be specified [or the equivalent of $PBS_NODEFILE].

#!/bin/bash
# this file is called meep.par.bat
# module for ethernet runs
module load meep/1.2.1/mpi/gige
cd /data/$USER/mydir
mpirun `which meep-mpi` bent_waveguide.ctl

Auto-threading apps

sbatch --exclusive --cpus-per-task=32 jobscript

will give you a node with 32 CPUs. To see all available node types, use freen [-n]

[biowulf2 ~]$ freen
                                      ...per-node Resources...
Partition   FreeNds   FreeCores   Cores  CPUs   Mem    Disk  Features
--------------------------------------------------------------------------------
norm*       305/310   9772/9920    16     32    125g   800g  cpu32,core16,g128,ssd800,x2650
unlimited    14/14     448/448     16     32    125g   800g  cpu32,core16,g128,ssd800,x2650
niddk        85/86    2720/2752    16     32    125g   800g  cpu32,core16,g128,niddk,ssd800,x2650
ibfdr       177/192   5664/6144    16     32     62g   800g  cpu32,core16,ibfdr,g64,ssd800,x2650
gpu          24/24     768/768     16     32    125g   800g  cpu32,core16,g128,gpuk20x,ssd800,x2650
largemem      4/4      256/256     32     64   1009g   800g  cpu64,core32,g1024,ssd800,x4620

Interactive jobs

[biowulf2 ~]$ sinteractive
salloc.exe: Granted job allocation 22374
[cn0004 ~]$ exit
exit
salloc.exe: Relinquishing job allocation 22374

[biowulf2 somedir]$ sinteractive --cpus-per-task=4 --mem=8g
salloc.exe: Granted job allocation 22375
[cn0004 somedir]$

Note: your environment will be exported to the job by default.

Allocating GPUs

sbatch --partition=gpu --gres=gpu:k20x:1 jobscript
sbatch --partition=gpu --gres=gpu:k20x:2 jobscript

--gres format: resource name (gpu) : resource type (k20x) : count (required even if 1; max of 2)

Infiniband Jobs

sbatch --partition=ibfdr [--constraint=x2650] --ntasks=64 --ntasks-per-core=1 --exclusive jobscript

Large Memory Jobs (>128 GB)

sbatch --mem=768g

Four large memory nodes
Use large memory nodes for their memory, not for their (64) cores

Swarm of single-threaded processes

Required (per process)       Command                          Allocated (per process)
1 CPU, < 1.5 GB              swarm -f swarmfile               2 CPUs, 1.5 GB
1 CPU, G GB (G > 1.5)        swarm -g G -f swarmfile          2 CPUs, G GB
1 CPU, G GB (G > 1.5)        swarm -g G -p P -f swarmfile     1 CPU, G GB
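For reference, a swarm command file is simply one command per line (a sketch; the program and file names are placeholders):

myprog -i input01 -o output01
myprog -i input02 -o output02
myprog -i input03 -o output03

Submitted with, e.g., swarm -g 4 -f swarmfile (4 GB per process; the value is illustrative).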

Swarm of multi-threaded processes

Required (per process)           Command                          Allocated (per process)
T threads on T CPUs, G GB        swarm -t T -g G -f swarmfile     T CPUs, G GB

Default memory allocation: T*1.5 GB per process

Swarm of multi-threaded processes
Use $SLURM_CPUS_PER_TASK inside the batch script

mapsplice.py -1 seq1a.fq -2 seq1b.fq -p $SLURM_CPUS_PER_TASK
mapsplice.py -1 seq2a.fq -2 seq2b.fq -p $SLURM_CPUS_PER_TASK
mapsplice.py -1 seq3a.fq -2 seq3b.fq -p $SLURM_CPUS_PER_TASK
etc.
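The matching submission (thread and memory counts are illustrative) would be something like:

swarm -t 8 -g 16 -f mapsplice.swarm

so that $SLURM_CPUS_PER_TASK expands to 8 in each command line above.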

Swarm of auto-threading processes

swarm -t auto  ->  implies --exclusive

Differences from B1 swarm
All swarms are job arrays
Processes grouped by job-array subjob, rather than by node
-p <#> : processes per subjob
autobundle is built-in by default
Stderr/stdout filenames: swarm_23163_0.e, swarm_23163_0.o
No swarm prologue/epilogue
Central script directory, no more .swarm directories
--resource-list replaced by --licenses and --gres

Differences from B1 swarm, cont'd
Some sbatch options are understood by swarm directly:
--job-name --dependency --time --partition --exclusive --licenses
Other sbatch options are passed to swarm with the --sbatch switch:
--sbatch --workdir=somedir

Walltimes

sbatch --time=12:00:00 jobscript
swarm --time=12:00:00 -f swarmfile
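These options can also be embedded in the job script as #SBATCH directives (standard Slurm, not shown on the slides; the resource values are illustrative):

#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g
# ... commands follow ...

and the script is then submitted with a plain "sbatch jobscript".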

Walltimes
Default and max walltimes for each partition can be seen with batchlim

$ batchlim
Partition     MaxCPUsPerUser  DefWalltime  MaxWalltime
---------------------------------------------------------------
norm               1024       04:00:00     10-00:00:00
interactive          64       08:00:00      1-12:00:00
quick               256       01:00:00     01:00:00
largemem            128       04:00:00     10-00:00:00
ibfdr               768       10-00:00:00  10-00:00:00
gpu                 128       10-00:00:00  10-00:00:00
unlimited           128       UNLIMITED    UNLIMITED
niddk               512       04:00:00     10-00:00:00

sbatch --time=168:00:00 jobscript

Do NOT overestimate walltimes

Unlimited Partition

sbatch --partition=unlimited

No walltime limit
Only 14 nodes (448 CPUs) available

No requeue
Equivalent of PBS -r y/n:
sbatch --requeue
or
sbatch --no-requeue

Deleting Batch Jobs

scancel <jobid> <jobid>

Options:
--user
--name
--state
--nodelist

Using PBS batch scripts
Slurm will try to utilize PBS directives in scripts by default
--ignore-pbs will ignore all PBS directives in scripts
pbs2slurm will convert an existing PBS script to Slurm syntax (as best it can)
For parallel jobs, it's best to rewrite scripts in Slurm syntax
(see http://hpc.nih.gov/docs/pbs2slurm.html for more info)
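As a rough illustration of the kind of translation involved (a hand-written sketch, not actual pbs2slurm output; the values are placeholders):

# PBS original                      # Slurm equivalent
#PBS -N myjob                       #SBATCH --job-name=myjob
#PBS -l walltime=12:00:00           #SBATCH --time=12:00:00
cd $PBS_O_WORKDIR                   cd $SLURM_SUBMIT_DIR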

Output/error files
Slurm combines stdout and stderr into a single file called slurm-######.out
Default location: submission directory
Can be reassigned with:
--error /path/to/dir/filename
--output /path/to/dir/filename
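For example (a sketch; the path is a placeholder, and %j, the standard Slurm substitution for the job ID, is not shown on the slide):

sbatch --output=/data/$USER/logs/myjob_%j.out --error=/data/$USER/logs/myjob_%j.err jobscript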

Licensed Software

[biowulf2 ~]$ licenses
License        Total  Free
------------------------------------
matlab-stat      1     1
matlab-pde       1     1
matlab           4     4

Request a license with:
sbatch --license=matlab,matlab-stat
swarm -f swarmfile --license=matlab

Dependencies

sbatch --dependency=afterany:212323 jobscript
[afterok, afternotok, after, afterany]

sbatch --dependency=afterany:13201,13202 Job3.bat

[user@biowulf2]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
JOBID  NAME      ST  DEPENDENCY
13201  Job1.bat  R
13202  Job2.bat  R
13203  Job3.bat  PD  afterany:13201,afterany:13202
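In a driver script, job IDs can be captured and chained automatically (a sketch; the script names are placeholders, and sbatch --parsable, which prints just the job ID, is standard Slurm not shown on the slide):

jid1=$(sbatch --parsable Job1.bat)        # submit first job, capture its ID
jid2=$(sbatch --parsable Job2.bat)        # submit second job, capture its ID
sbatch --dependency=afterany:${jid1}:${jid2} Job3.bat   # run after both finish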

Applications
Web-based documentation rewritten to reflect changes in the batch system
http://hpc.nih.gov/apps
Over 270 applications rebuilt/tested for CentOS 6/Biowulf2

Modules
To see default versions of all apps, use: module -d avail
For a case-insensitive module search, use: module spider
Example:
[biowulf2 ~]$ module spider xplor-nih
------------------------------
  Xplor-NIH: Xplor-NIH/2.39
------------------------------

Local disk on node
Local disk is now /lscratch (/scratch is central shared space)
freen will report disk size as a Feature (e.g., ssd800 = 800 GB SSD disk)
Jobs which need to use /lscratch must explicitly allocate space (in GB):
sbatch --gres=lscratch:500
Read/write to /lscratch/$SLURM_JOBID (the /lscratch directory itself is not writable)
/lscratch/$SLURM_JOBID is deleted at the end of the job
clearscratch command no longer available
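A sketch of a job script that stages data through local scratch (the application, module, and file names are placeholders; submit with sbatch --gres=lscratch:100 jobscript, where 100 is an illustrative size in GB):

#!/bin/bash
cd /lscratch/$SLURM_JOBID            # per-job directory, removed when the job ends
cp /data/$USER/input.dat .           # stage input onto the fast local SSD
module load myprog                   # hypothetical module
myprog input.dat > result.out
cp result.out /data/$USER/           # copy results back before the job exits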

Scheduling Priority
Priority = FairShare + Age + JobSize + Partition + QOS
Fairshare half-life of 7 days
sprio command reports priorities of pending (queued) jobs
Backfill job scheduling for parallel jobs

Monitoring: show all jobs

[susanc@biowulf2]$ squeue
JOBID  PARTITION  NAME      USER    ST  TIME  NODES  NODELIST(REASON)
22340  ibfdr      stmv-204  susanc  PD  0:00  128    (Dependency)
22353  ibfdr      meme_sho  susanc  PD  0:00  128    (Dependency)
22339  ibfdr      stmv-102  susanc  PD  0:00   64    (Dependency)
22352  ibfdr      meme_sho  susanc  PD  0:00   64    (Dependency)

[susanc@biowulf2]$ sjobs
                                           ...Requested...
User    JobId  JobName     Part   St  Runtime  Nodes  CPUs  Mem        Dependency      Features  Nodelist
susanc  22344  meme_short  ibfdr  R   22:30    1      32    1.0GB/cpu                  (null)    cn0414
susanc  22335  stmv-64     ibfdr  R   0:54     4      128   1.0GB/cpu                  (null)    cn[0413,0415-0417]
susanc  22336  stmv-128    ibfdr  PD  0:00     8      128   1.0GB/cpu  afterany:22335  (null)    (Dependency)
susanc  22337  stmv-256    ibfdr  PD  0:00     16     256   1.0GB/cpu  afterany:22336  (null)    (Dependency)
susanc  22338  stmv-512    ibfdr  PD  0:00     32     512   1.0GB/cpu  afterany:22337  (null)    (Dependency)

Monitoring: jobload

$ jobload -u muser
JOBID       TIME        NODES   CPUS    THREADS   LOAD   MEMORY
                                Alloc   Running          Used/Alloc
170581_276  3-14:22:51  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_277  3-14:20:51  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_282  3-14:07:52  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_283  3-14:07:52  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_284  3-14:06:52  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_285  3-14:06:52  cn0343  2       2         100%   294.6 MB/1.5 GB
170581_286  3-14:01:52  cn0343  2       2         100%   294.6 MB/1.5 GB
...
USER SUMMARY
Jobs: 256  Nodes: 256  CPUs: 512  Load Avg: 100%

$ jobload -u suser
JOBID   TIME     NODES   CPUS    THREADS   LOAD   MEMORY
                         Alloc   Running          Used/Alloc
211880  2:08:46  cn0499  32      16        50%    2.6 GB/31.2 GB
        2:08:46  cn0500  32      16        50%    2.4 GB/31.2 GB
        2:08:46  cn0501  32      16        50%    2.4 GB/31.2 GB
...
JOB SUMMARY: {Nodes: 48, CPUs: 1536, Load Avg: 50.0}
USER SUMMARY
Jobs: 1  Nodes: 48  CPUs: 1536  Load Avg: 50%

jobhist

[biowulf2 ~]$ jobhist 22343
JobId              : 22343
User               : susanc
Submitted          : 20150522 08:59:54
Submission Dir     : /spin1/users/susanc/meme
Submission Command : sbatch --depend=afterany:22342 --partition=ibfdr --ntasks=2 --ntasks-per-core=1 meme_short.slurm

Partition  State      AllocNodes  AllocCPUs  Walltime  MemReq     MemUsed  Nodelist
ibfdr      COMPLETED  1           32         01:43:58  1.0GB/cpu  0.7GB    cn0414

Note: RUNNING jobs will always show MemUsed = 0.0GB

Job summary
Show all jobs in any state since midnight:  sacct
Show all jobs that failed since midnight:   sacct --state f
Show all jobs that failed this month:       sacct --state f --starttime 2015-07-01
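sacct output columns can also be selected explicitly (a sketch; --format is standard sacct and the field list is illustrative):

sacct --state f --starttime 2015-07-01 --format=JobID,JobName,Partition,State,Elapsed,ExitCode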

Development Tools
GCC 4.9
Intel Compiler Suite 2015
PGI 15.4
OpenMPI 1.8
MVAPICH2 2.1
CUDA 7 Toolkit
Anaconda Python 2.6 through 3.4
Details and more info at http://hpc.nih.gov/docs/development.html

Documentation
User Guide: http://hpc.nih.gov/docs/userguide.html
2-page cheat sheet: http://hpc.cit.nih.gov/docs/biowulf2-handout.pdf
Slurm documentation:
http://slurm.schedmd.com/documentation.html
http://slurm.schedmd.com/pdfs/summary.pdf

Questions? Email staff@biowulf.nih.gov