Biowulf2 Training Session
- Bridget Hodge
1 Biowulf2 Training Session 9 July 2015 Slides at: http://hpc.nih.gov/docs/b2training.pdf HPC@NIH website: http://hpc.nih.gov
2 System hardware overview What's new/different The batch system & submitting jobs Monitoring and development tools Q&A Shorthand for this presentation: B1 = the current Biowulf cluster (Biowulf1) B2 = the new expansion part of the cluster (Biowulf2)
3 Biowulf2 12 racks 576 nodes 46 network switches Full containment cooling
4 Compute Node Summary
5 NIHnet core Biowulf2 10G/40G/100G Ethernet Network 100G 16x10G 5 NFS Helixnet VLAN 24x10G Clusternet VLAN GPFS 2x40G 10G 10G 1G Compute nodes 56G FDR Fabric Service nodes (Login, DTNs, etc) Compute nodes QDR Fabric 32G
6 Final B1 B2 Transition June 27 Network cutover (all nodes and storage on the same network segment) July 9 (TODAY) All Biowulf users have B2 accounts New accounts B2 only login to biowulf2.nih.gov July 10 B1 c2 & c4 nodes retired July/August Transition remaining B1 nodes (ib, c16 & c24) to B2 (i.e. SLURM) Buy-in nodes (NCI/CCR, NIMH, NIDDK/LCP) will be coordinated separately
7 Changes for Biowulf2 NODES RedHat/CentOS 6.6 /scratch is in shared space (same on all nodes) /lscratch is local to compute node (SSD, 800 GB) /tmp is RAM (6 GB) GPFS filesystems accessible by all nodes Infiniband compilation on login node BATCH SLURM batch system Core-based allocation (nodes can be shared by multiple jobs) Walltime limit required for most jobs DATA TRANSFER Data Transfer Nodes/Globus (nihhpc#globus) wget from compute nodes via web proxy (not yet available) New web site http://hpc.nih.gov
8 Unchanged on Biowulf2 No cpu or memory intensive programs on login node Same /home, /data, and group directories Login node for smaller file transfers (Globus for larger ones) Environment modules
9 SLURM Hardware Terminology (diagram: cores and CPUs)
10 Submitting Jobs Before, under PBS (B1) qsub swarm Now, under SLURM (B2) sbatch swarm
11 Partitions (Queues) [biowulf2 ~]$ batchlim Partition MaxCPUsPerUser DefWalltime MaxWalltime norm :00: :00:00 interactive 64 08:00: :00:00 quick :00:00 01:00:00 largemem :00: :00:00 ibfdr :00: :00:00 gpu :00: :00:00 unlimited 128 UNLIMITED UNLIMITED niddk :00: :00:00 [biowulf2 ~]$ sbatch --partition=ibfdr... Coming in August $ sbatch --partition niddk,norm... [for buy-in users] $ sbatch --partition quick,norm... [for quick users]
12 Show available resources ~]$ freen...per-node Resources... Partition FreeNds FreeCores Cores CPUs Mem Disk Features norm* 306/ / g 800g cpu32,core16,g128,ssd800,x2650 unlimited 14/14 448/ g 800g cpu32,core16,g128,ssd800,x2650 niddk 86/ / g 800g cpu32,core16,g128,niddk,ssd800,x2650 ibfdr 173/ / g 800g cpu32,core16,ibfdr,g64,ssd800,x2650 gpu 24/24 768/ g 800g cpu32,core16,g128,gpuk20x,ssd800,x2650 largemem 4/4 256/ g 800g cpu64,core32,g1024,ssd800,x4620 quick 86/ / g 800g cpu32,core16,g128,niddk,ssd800,x2650
13 Basics Default compute allocation = 1 physical core = 2 CPUs in Slurm notation (2 hypercores in old notation) DefMemPerCPU = 2 GB Therefore, default memory alloc = 4 GB
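A quick sanity check of the allocation arithmetic above, as a sketch: 1 physical core = 2 Slurm CPUs, DefMemPerCPU = 2 GB, so the default one-core job gets 4 GB. The default_alloc helper is a made-up name for illustration, not a Biowulf or Slurm command.

```shell
#!/bin/bash
# Sketch of the default-allocation rules described on this slide.
# 'default_alloc' is a hypothetical helper, not a real command.
default_alloc() {
    local cores=${1:-1}
    local cpus=$(( cores * 2 ))    # 2 hyperthreaded CPUs per physical core
    local mem_gb=$(( cpus * 2 ))   # DefMemPerCPU = 2 GB
    echo "${cpus} CPUs, ${mem_gb} GB"
}
default_alloc 1    # the default single-core allocation: 2 CPUs, 4 GB
```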
14 Single-threaded app
Required 1 CPU, < 4 GB memory: sbatch jobscript (allocates 2 CPUs, 4 GB)
Required 1 CPU, M GB memory (M > 4): sbatch --mem=Mg jobscript (allocates 2 CPUs, M GB)
15 Multi-threaded app
Required C threads on C CPUs, M GB memory: sbatch --cpus-per-task=C --mem=Mg jobscript (allocates C CPUs, rounded up to nearest 2, and M GB)
Default memory allocation for C CPUs would be (C*2) GB
16 Multi-threaded app Use $SLURM_CPUS_PER_TASK inside the script
#!/bin/bash
cd /data/mydir
module load hmmer
hmmsearch --cpu $SLURM_CPUS_PER_TASK globins4.hmm /fdb/fastadb/nr.aa.fas
#!/bin/bash
cd /data/mydir
module load mapsplice
mapsplice.py -1 Sp_ds.10k.left.fq -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK
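The scripts above all follow one pattern: read the thread count Slurm exports and pass it to the application. A hedged sketch, with a fallback of 1 so the same script can be dry-run outside a job (the nthreads helper name is invented for illustration):

```shell
#!/bin/bash
# Read the CPU count exported for --cpus-per-task jobs, defaulting to 1
# when the script runs outside a Slurm allocation. 'nthreads' is a
# hypothetical helper, not part of Slurm.
nthreads() { echo "${SLURM_CPUS_PER_TASK:-1}"; }

SLURM_CPUS_PER_TASK=4            # what sbatch --cpus-per-task=4 would export
echo "using $(nthreads) threads"
# e.g.: hmmsearch --cpu "$(nthreads)" globins4.hmm /fdb/fastadb/nr.aa.fas
```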
17 Parallel (MPI) process Many parallel apps do better with --ntasks-per-core=1 (ignore hyperthreading)
Required C MPI processes, < 4 GB per process: sbatch --partition=ibfdr --ntasks=C [--constraint=nodetype] --exclusive --ntasks-per-core=1 jobscript (allocates 2*C CPUs, 1 per MPI process, rounded up to nearest 2 if necessary; 4 GB per MPI process)
Required C MPI processes, G GB per process: sbatch --partition=ibfdr --ntasks=C [--constraint=nodetype] --exclusive --mem-per-cpu=Gg --ntasks-per-core=1 jobscript (allocates 2*C CPUs, 1 per MPI process, rounded up to nearest 2 if necessary; 2*G GB per MPI process)
18 Parallel (MPI) process Use $SLURM_NTASKS within the script
#!/bin/bash
module load meep/1.2.1/mpi/gige
cd /data/username/mydir
meme mini-drosoph.s -oc meme_out -maxsize -p $SLURM_NTASKS
OpenMPI has been built with Slurm support, so -np need not be specified [or the equivalent of $PBS_NODEFILE].
#!/bin/bash
# this file is called meep.par.bat
# module for ethernet runs
module load meep/1.2.1/mpi/gige
cd /data/$USER/mydir
mpirun `which meep-mpi` bent_waveguide.ctl
19 Auto-threading apps sbatch --exclusive --cpus-per-task=32 jobscript will give you a node with 32 CPUs. To see all available node types, use freen [-n] [biowulf2 ~]$ freen...per-node Resources... Partition FreeNds FreeCores Cores CPUs Mem Disk Features norm* 305/ / g 800g cpu32,core16,g128,ssd800,x2650 unlimited 14/14 448/ g 800g cpu32,core16,g128,ssd800,x2650 niddk 85/ / g 800g cpu32,core16,g128,niddk,ssd800,x265 ibfdr 177/ / g 800g cpu32,core16,ibfdr,g64,ssd800,x2650 gpu 24/24 768/ g 800g cpu32,core16,g128,gpuk20x,ssd800,x2 largemem 4/4 256/ g 800g cpu64,core32,g1024,ssd800,x4620
20 Interactive jobs [biowulf2 ~]$ sinteractive salloc.exe: Granted job allocation [cn0004 ~]$ exit exit salloc.exe: Relinquishing job allocation [biowulf2 somedir]$ sinteractive --cpus-per-task=4 --mem=8g salloc.exe: Granted job allocation [cn0004 somedir]$ Note: your environment will be exported to the job by default.
21 Allocating GPUs sbatch --partition=gpu --gres=gpu:k20x:1 jobscript sbatch --partition=gpu --gres=gpu:k20x:2 jobscript In --gres=gpu:k20x:1, gpu is the resource name, k20x the resource type, and 1 the count (required even if 1; max of 2)
22 Infiniband Jobs sbatch --partition=ibfdr [--constraint=x2650] --ntasks=64 --ntasks-per-core=1 --exclusive jobscript
23 Large Memory Jobs (>128 GB) sbatch --mem=768g Four large memory nodes Use large memory nodes for their memory, not for their (64) cores
24 Swarm of single-threaded processes
Required 1 CPU, < 1.5 GB per process: swarm -f swarmfile (allocates 2 CPUs, 1.5 GB per process)
Required 1 CPU, G GB per process (G > 1.5): swarm -g G -f swarmfile (allocates 2 CPUs, G GB per process)
Required 1 CPU, G GB per process (G > 1.5): swarm -g G -p P -f swarmfile (allocates 1 CPU, G GB per process)
25 Swarm of multi-threaded processes
Required T threads on T CPUs, G GB per process: swarm -t T -g G -f swarmfile (allocates T CPUs, G GB per process)
Default memory allocation: T*1.5 GB per process
26 Swarm of multi-threaded processes Use $SLURM_CPUS_PER_TASK inside the batch script
mapsplice.py -1 seq1a.fq -2 seq1b.fq -p $SLURM_CPUS_PER_TASK
mapsplice.py -1 seq2a.fq -2 seq2b.fq -p $SLURM_CPUS_PER_TASK
mapsplice.py -1 seq3a.fq -2 seq3b.fq -p $SLURM_CPUS_PER_TASK
etc.
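Swarm files like the one above are usually generated rather than typed by hand. A minimal sketch, assuming one paired-end sample per command line (sample and file names are placeholders):

```shell
#!/bin/bash
# Generate one swarm command line per sample pair; swarm then runs each
# line as its own subjob. Sample names here are placeholders.
samples="seq1 seq2 seq3"
: > jobs.swarm
for s in $samples; do
    echo "mapsplice.py -1 ${s}a.fq -2 ${s}b.fq -p \$SLURM_CPUS_PER_TASK" >> jobs.swarm
done
wc -l < jobs.swarm    # one line per sample
```

The file would then be submitted with something like swarm -t 8 -f jobs.swarm.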
27 Swarm of auto-threading processes swarm -t auto implies --exclusive
28 Differences from B1 swarm All swarms are job arrays Processes grouped by job-array subjob, rather than by node -p <#> processes per subjob autobundle is built-in by default Stderr/stdout filenames: swarm_23163_0.e, swarm_23163_0.o No swarm prologue/epilogue Central script directory, no more .swarm directories --resource-list replaced by --licenses and --gres
29 Differences from B1 swarm, cont'd Some sbatch options are understood by swarm directly: --job-name --dependency --time --partition --exclusive --licenses Other sbatch options are passed to swarm with the --sbatch switch: --sbatch --workdir=somedir
30 Walltimes sbatch --time=12:00:00 jobscript swarm --time=12:00:00 jobscript
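When a pipeline computes its own walltime request, a small helper can format it into the HH:MM:SS string --time expects. A sketch; walltime_hours is a made-up name, not a Biowulf command:

```shell
#!/bin/bash
# Format a whole number of hours as the HH:MM:SS string for --time.
# 'walltime_hours' is a hypothetical helper for illustration only.
walltime_hours() { printf '%02d:00:00' "$1"; }
echo "sbatch --time=$(walltime_hours 12) jobscript"
```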
31 Walltimes Default and max walltimes for each partition can be seen with batchlim $ batchlim Partition MaxCPUsPerUser DefWalltime MaxWalltime norm :00: :00:00 interactive 64 08:00: :00:00 quick :00:00 01:00:00 largemem :00: :00:00 ibfdr :00: :00:00 gpu :00: :00:00 unlimited 128 UNLIMITED UNLIMITED niddk :00: :00:00 sbatch --time=168:00:00 jobscript Do NOT overestimate walltimes
32 Unlimited Partition sbatch --partition=unlimited No walltime limit Only 14 nodes (448 CPUs) available
33 No requeue Equivalent of PBS -r y|n: sbatch --requeue or sbatch --no-requeue
34 Deleting Batch Jobs scancel <jobid> <jobid> Options: --user --name --state --nodelist
35 Using PBS batch scripts Slurm will try to utilize PBS directives in scripts by default. --ignore-pbs will ignore all PBS directives in scripts pbs2slurm will convert an existing PBS script to Slurm syntax (as best it can) For parallel jobs, it's best to rewrite scripts in Slurm syntax. (see http://hpc.nih.gov/docs/pbs2slurm.html for more info)
36 Output/error files Slurm combines stdout and stderr into a single file called slurm-######.out Default location: submission directory Can be reassigned with --error /path/to/dir/filename --output /path/to/dir/filename
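One common way to use the reassignment above is to collect per-job logs in one directory, relying on %j (Slurm's filename pattern for the job ID, per the sbatch documentation). A sketch; logflags is an invented helper name:

```shell
#!/bin/bash
# Build --output/--error flags pointing per-job logs at one directory.
# %j is Slurm's filename placeholder for the job ID; 'logflags' is a
# hypothetical helper, not a real command.
logflags() { echo "--output=${1}/slurm-%j.out --error=${1}/slurm-%j.err"; }
echo "sbatch $(logflags /data/mydir/logs) jobscript"
```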
37 Licensed Software [biowulf2 ~]$ licenses License Total Free matlab-stat 1 1 matlab-pde 1 1 matlab 4 4 Request a license with sbatch --license=matlab,matlab-stat swarm -f swarmfile --license=matlab
38 Dependencies sbatch --dependency=afterany: jobscript [afterok, afternotok, after, afterany] sbatch --dependency=afterany:13201,13202 Job3.bat squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E" JOBID NAME ST DEPENDENCY Job1.bat R Job2.bat R Job3.bat PD afterany:13201,afterany:13202
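A dependency flag like the one on this slide can be assembled from a type and a list of job IDs. A sketch mirroring the slide's comma-joined form; dep_flag is a hypothetical helper, not a Slurm command:

```shell
#!/bin/bash
# Assemble --dependency=<type>:<id>,<id>,... from a type and job IDs.
# 'dep_flag' is an illustrative name only.
dep_flag() {
    local type=$1; shift
    local ids
    ids=$(IFS=,; echo "$*")    # comma-join the remaining arguments
    echo "--dependency=${type}:${ids}"
}
echo "sbatch $(dep_flag afterany 13201 13202) Job3.bat"
```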
39 Applications Web-based documentation rewritten to reflect changes in batch system http://hpc.nih.gov/apps Over 270 applications rebuilt/tested for CentOS 6/Biowulf2
40 Modules To see default versions of all apps, use: module -d avail For a case-insensitive module search, use: module spider Example: [biowulf2 ~]$ module spider xplor-nih Xplor-NIH: Xplor-NIH/
41 Local disk on node Local disk is now /lscratch (/scratch is central shared space) freen will report disk size as a Feature (eg, ssd800 = 800 GB SSD disk) Jobs which need to use /lscratch must explicitly allocate space (in GB): sbatch --gres=lscratch:500 Read/write to /lscratch/$SLURM_JOBID (the /lscratch directory itself is not writable) /lscratch/$SLURM_JOBID is deleted at end of job clearscratch command no longer available
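A sketch of the /lscratch workflow described above: a job submitted with --gres=lscratch:N reads and writes under /lscratch/$SLURM_JOBID, which is removed when the job ends. For a dry run outside a job, a temp directory stands in for that path here:

```shell
#!/bin/bash
# Dry-run sketch of per-job local scratch usage. Inside a real job the
# directory would be /lscratch/$SLURM_JOBID (after --gres=lscratch:N).
scratch=$(mktemp -d)                      # stand-in for /lscratch/$SLURM_JOBID
echo "intermediate data" > "${scratch}/work.tmp"
cat "${scratch}/work.tmp"                 # a compute step would read from here
rm -rf "${scratch}"                       # on Biowulf2, Slurm does this cleanup
```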
42 Scheduling Priority Priority = FairShare + Age + JobSize + Partition + QOS Fairshare half-life of 7 days sprio command reports priorities of pending (queued) jobs Backfill job scheduling for parallel jobs
43 Monitoring Show all jobs: squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) ibfdr stmv-204 susanc PD 0: (Dependency) ibfdr meme_sho susanc PD 0: (Dependency) ibfdr stmv-102 susanc PD 0:00 64 (Dependency) ibfdr meme_sho susanc PD 0:00 64 (Dependency) sjobs...requested... User JobId JobName Part St Runtime Nodes CPUs Mem Dependency Features Nodelist susanc meme_short ibfdr R 22: GB/cpu (null) cn0414 susanc stmv-64 ibfdr R 0: GB/cpu (null) cn[0413, ] susanc stmv-128 ibfdr PD 0: GB/cpu afterany:22335 (null) (Dependency) susanc stmv-256 ibfdr PD 0: GB/cpu afterany:22336 (null) (Dependency) susanc stmv-512 ibfdr PD 0: GB/cpu afterany:22337 (null) (Dependency)
44 $ jobload -u muser $ jobload -u suser Monitoring jobload JOBID TIME NODES CPUS THREADS LOAD MEMORY Alloc Running Used/Alloc _ :22:51 cn % MB/1.5 GB _ :20:51 cn % MB/1.5 GB _ :07:52 cn % MB/1.5 GB _ :07:52 cn % MB/1.5 GB _ :06:52 cn % MB/1.5 GB _ :06:52 cn % MB/1.5 GB _ :01:52 cn % MB/1.5 GB... USER SUMMARY Jobs: 256 Nodes: 256 CPUs: 512 Load Avg: 100% JOBID TIME NODES CPUS THREADS LOAD MEMORY Alloc Running Used/Alloc :08:46 cn % 2.6 GB/31.2 GB 2:08:46 cn % 2.4 GB/31.2 GB 2:08:46 cn % 2.4 GB/31.2 GB... JOB SUMMARY: {Nodes: 48, CPUs: 1536, Load Avg: 50.0} USER SUMMARY Jobs: 1 Nodes: 48 CPUs: 1536 Load Avg: 50%
45 jobhist [biowulf2 ~]$ jobhist JobId : User : susanc Submitted : :59:54 Submission Dir : /spin1/users/susanc/meme Submission Command : sbatch --depend=afterany: --partition=ibfdr --ntasks=2 --ntasks-per-core=1 meme_short.slurm Partition State AllocNodes AllocCPUs Walltime MemReq MemUsed Nodelist ibfdr COMPLETED :43:58 1.0GB/cpu 0.7GB cn0414 Note: RUNNING jobs will always show MemUsed = 0.0GB
46 Job summary Show all jobs in any state since midnight: sacct Show all jobs that failed since midnight: sacct --state f Show all jobs that failed this month: sacct --state f --starttime
47 Development Tools GCC 4.9 Intel Compiler Suite 2015 PGI 15.4 OpenMPI 1.8 MVAPICH2 2.1 CUDA 7 Toolkit Anaconda Python 2.6 thru 3.4 Details and more info at http://hpc.nih.gov/docs/development.html
48 Documentation User Guide http://hpc.nih.gov/docs/userguide.html 2-page cheat sheet http://hpc.cit.nih.gov/docs/biowulf2-handout.pdf Slurm documentation http://slurm.schedmd.com/documentation.html http://slurm.schedmd.com/pdfs/summary.pdf
49 ???
OLCF Best Practices. Bill Renaud OLCF User Assistance Group
OLCF Best Practices Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may differ from
Linux Cluster Computing An Administrator s Perspective
Linux Cluster Computing An Administrator s Perspective Robert Whitinger Traques LLC and High Performance Computing Center East Tennessee State University : http://lxer.com/pub/self2015_clusters.pdf 2015-Jun-14
SGE Roll: Users Guide. Version @VERSION@ Edition
SGE Roll: Users Guide Version @VERSION@ Edition SGE Roll: Users Guide : Version @VERSION@ Edition Published Aug 2006 Copyright 2006 UC Regents, Scalable Systems Table of Contents Preface...i 1. Requirements...1
New High-performance computing cluster: PAULI. Sascha Frick Institute for Physical Chemistry
New High-performance computing cluster: PAULI Sascha Frick Institute for Physical Chemistry 02/05/2012 Sascha Frick (PHC) HPC cluster pauli 02/05/2012 1 / 24 Outline 1 About this seminar 2 New Hardware
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing
MapReduce Evaluator: User Guide
University of A Coruña Computer Architecture Group MapReduce Evaluator: User Guide Authors: Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada and Juan Touriño December 9, 2014 Contents 1 Overview
HPCC USER S GUIDE. Version 1.2 July 2012. IITS (Research Support) Singapore Management University. IITS, Singapore Management University Page 1 of 35
HPCC USER S GUIDE Version 1.2 July 2012 IITS (Research Support) Singapore Management University IITS, Singapore Management University Page 1 of 35 Revision History Version 1.0 (27 June 2012): - Modified
1 DCSC/AU: HUGE. DeIC Sekretariat 2013-03-12/RB. Bilag 1. DeIC (DCSC) Scientific Computing Installations
Bilag 1 2013-03-12/RB DeIC (DCSC) Scientific Computing Installations DeIC, previously DCSC, currently has a number of scientific computing installations, distributed at five regional operating centres.
PBSPro scheduling. PBS overview Qsub command: resource requests. Queues a7ribu8on. Fairshare. Backfill Jobs submission.
PBSPro scheduling PBS overview Qsub command: resource requests Queues a7ribu8on Fairshare Backfill Jobs submission 9 mai 03 PBS PBS overview 9 mai 03 PBS PBS organiza5on: daemons frontend compute nodes
JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert
Mitglied der Helmholtz-Gemeinschaft JUROPA Linux Cluster An Overview 19 May 2014 Ulrich Detert JuRoPA JuRoPA Jülich Research on Petaflop Architectures Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ JUROPA
Overview of HPC Resources at Vanderbilt
Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources
icer Bioinformatics Support Fall 2011
icer Bioinformatics Support Fall 2011 John B. Johnston HPC Programmer Institute for Cyber Enabled Research 2011 Michigan State University Board of Trustees. Institute for Cyber Enabled Research (icer)
NorduGrid ARC Tutorial
NorduGrid ARC Tutorial / Arto Teräs and Olli Tourunen 2006-03-23 Slide 1(34) NorduGrid ARC Tutorial Arto Teräs and Olli Tourunen CSC, Espoo, Finland March 23
Agenda. Using HPC Wales 2
Using HPC Wales Agenda Infrastructure : An Overview of our Infrastructure Logging in : Command Line Interface and File Transfer Linux Basics : Commands and Text Editors Using Modules : Managing Software
Cluster performance, how to get the most out of Abel. Ole W. Saastad, Dr.Scient USIT / UAV / FI April 18 th 2013
Cluster performance, how to get the most out of Abel Ole W. Saastad, Dr.Scient USIT / UAV / FI April 18 th 2013 Introduction Architecture x86-64 and NVIDIA Compilers MPI Interconnect Storage Batch queue
High Performance Computing Infrastructure at DESY
High Performance Computing Infrastructure at DESY Sven Sternberger & Frank Schlünzen High Performance Computing Infrastructures at DESY DV-Seminar / 04 Feb 2013 Compute Infrastructures at DESY - Outline
CHEOPS Cologne High Efficient Operating Platform for Science Brief Instructions
CHEOPS Cologne High Efficient Operating Platform for Science Brief Instructions (Version: 07.10.2013) Foto: Thomas Josek/JosekDesign Viktor Achter Dr. Stefan Borowski Lech Nieroda Dr. Lars Packschies Volker
