Using the general purpose computing clusters at EPFL

scitas.epfl.ch
March 5, 2015

Bellatrix
- Frontend at bellatrix.epfl.ch
- 16 x 2.2 GHz cores per node
- 424 nodes with 32GB
- Infiniband QDR network
- The batch system is SLURM
- RedHat 6.5

Castor
- Frontend at castor.epfl.ch
- 16 x 2.6 GHz cores per node
- 50 nodes with 64GB
- 2 nodes with 256GB
- For sequential jobs (Matlab etc.)
- The batch system is SLURM
- RedHat 6.5

Deneb
- Frontend at deneb.epfl.ch
- 16 x 2.6 GHz cores per node
- 376 nodes with 64GB
- 8 nodes with 256GB
- 2 nodes with 512GB and 32 cores
- 16 nodes with 4 Nvidia K40 GPUs
- Infiniband QDR network
- The batch system is SLURM
- RedHat 6.5

Storage
/home filesystem
- has per-user quotas
- will be backed up - for important things (source code, results and theses)
- shared by all clusters
/scratch
- high performance temporary space
- is not backed up
- is organised by laboratory
- local to each cluster

Connection
- Start the X server (automatic on a Mac)
- Open a terminal
- ssh -Y username@castor.epfl.ch
Try the following commands:
id
pwd
quota
ls /scratch/<group>/<username>

The batch system
Goal: to take a list of jobs and execute them when appropriate resources become available.
SCITAS uses SLURM on its clusters. The configuration depends on the purpose of the cluster (serial vs parallel).

sbatch
The fundamental command is sbatch, which submits jobs to the batch system.
Suggested workflow:
- create a short job-script
- submit it to the batch system

sbatch - exercise
Copy the first two examples to your home directory:
cp /scratch/examples/ex1.run .
cp /scratch/examples/ex2.run .
Open the file ex1.run with your editor of choice.

ex1.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 1024

sleep 10
echo "hello from $(hostname)"
sleep 10

ex1.run
#SBATCH is a directive to the batch system.

--nodes 1           the number of nodes to use - on Castor this is limited to 1
--ntasks 1          the number of tasks (in an MPI sense) to run per job
--cpus-per-task 1   the number of cores per aforementioned task
--mem 4096          the memory required per node in MB
--time 12:00:00     the time required (12 hours)
--time 2-6          the time required (two days and six hours)
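Putting these directives together, a header for a hypothetical job needing four tasks and twelve hours might look like the sketch below (the executable is a placeholder, not part of the exercises):

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1             # a single node (the only option on Castor)
#SBATCH --ntasks 4            # four tasks in the MPI sense
#SBATCH --cpus-per-task 1     # one core per task
#SBATCH --mem 4096            # 4 GB of memory for the node
#SBATCH --time 12:00:00       # 12 hours of walltime

srun ./my_program             # placeholder executable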

Running ex1.run
The job is assigned a default runtime of 15 minutes.

$ sbatch ex1.run
Submitted batch job 439
$ cat /scratch/<group>/<username>/slurm-439.out
hello from c03
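Most #SBATCH directives can also be given, or overridden, on the sbatch command line; for example, to request one hour of walltime instead of the default:

$ sbatch --time 01:00:00 ex1.run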

What went on?

sacct -j <JOB_ID>
sacct -l -j <JOB_ID>

Or, more usefully:

Sjob <JOB_ID>
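sacct can also report just the fields you are interested in; the field names below are standard sacct columns chosen for illustration:

$ sacct -j <JOB_ID> --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode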

Cancelling jobs
To cancel a specific job:
scancel <JOB_ID>
To cancel all your jobs:
scancel -u <username>
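scancel can also select jobs by state; for instance, to remove only your pending jobs and leave the running ones alone:

$ scancel -u <username> --state=PENDING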

ex2.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH --mem
#SBATCH --time 00:30:00
#SBATCH --mail-type=all
#SBATCH --mail-user=<email_address>

/scratch/examples/linpack/runme_1_45k

What's going on?

squeue
squeue -u <username>
Squeue
Sjob <JOB_ID>
scontrol -d show job <JOB_ID>
sinfo

squeue and Squeue
squeue
Job states: Pending, Resources, Priority, Running

squeue | grep <JOB_ID>
squeue -j <JOB_ID>
Squeue <JOB_ID>
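squeue also accepts filters and a custom output format if you want a more compact view; the format string below is just one possible choice:

$ squeue -u <username> -t PENDING,RUNNING -o "%.10i %.9P %.20j %.8T %.10M %R"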

Sjob

$ Sjob <JOB_ID>

(The output is a per-step accounting table with the columns JobID, JobName, Cluster, Account, Partition, Timelimit, User, Group, Submit, Eligible, Start, End, Elapsed, ExitCode, State, NCPUS, NTasks, NodeList, UserCPU, SystemCPU, AveCPU and MaxVMSize. In this example the job ex1.run and its batch step ran on Castor, account scitas-ge, partition serial, timelimit 00:15:00, user jmenu; both COMPLETED on node c04 with an elapsed time of 00:00:20 and exit code 0:0.)

scontrol

$ scontrol -d show job <JOB_ID>

$ scontrol -d show job 400
JobId=400 Name=s1.job
UserId=user(123456) GroupId=group(654321)
Priority=111 Account=scitas-ge QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:03:39 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=T09:45:27 EligibleTime=T09:45:27
StartTime=T09:45:27 EndTime=T10:00:27
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=serial AllocNode:Sid=castor:
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c03 BatchHost=c03
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
Nodes=c03 CPU_IDs=0 Mem=1024
MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/<user>/jobs/s1.job
WorkDir=/scratch/<group>/<user>

Modules
Modules make your life easier.

module avail
module show <take your pick>
module load <take your pick>
module list
module purge
module list
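As a concrete sequence (using the Intel MPI module that appears in the later exercises), this is how one might inspect and load a module:

module purge                    # start from a clean environment
module avail                    # see what is available
module show intelmpi/4.1.3      # inspect what the module would change
module load intelmpi/4.1.3
module list                     # confirm what is now loaded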

ex3.run - Mathematica
Copy the following files to your chosen directory:
cp /scratch/examples/ex3.run .
cp /scratch/examples/mathematica.in .
Submit ex3.run to the batch system and see what happens...

ex3.run

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --nodes 1
#SBATCH --mem 4096
#SBATCH --time 00:05:00

echo STARTING AT $(date)

module purge
module load mathematica/9.0.1

math < mathematica.in

echo FINISHED at $(date)

Compiling the ex4.* source files
Copy the following files to your chosen directory:
/scratch/examples/ex4_readme.txt
/scratch/examples/ex4.c
/scratch/examples/ex4.cxx
/scratch/examples/ex4.f90
/scratch/examples/ex4.run
Then compile them with:
module load intelmpi/4.1.3
mpiicc -o ex4_c ex4.c
mpiicpc -o ex4_cxx ex4.cxx
mpiifort -o ex4_f90 ex4.f90
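The provided ex4.run (shown later in the course) drives these binaries; purely as an illustrative sketch, a minimal batch script for the Fortran binary, assuming it sits in your scratch directory, could look like this:

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 4           # four MPI ranks, chosen for illustration
#SBATCH --time 00:10:00

module purge
module load intelmpi/4.1.3

srun ./ex4_f90               # srun starts the MPI tasks under SLURM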

The 3 methods to get interactive access - 1/3
In order to schedule an allocation, use salloc with exactly the same options for resources as sbatch. You will then arrive in a new prompt which is still on the submission node, but by using srun you can get access to the allocated resources:

eroche@castor:hello > salloc -N 1 -n 2
salloc: Granted job allocation 1234
bash-4.1$ hostname
castor
bash-4.1$ srun hostname
c03
c03

The 3 methods to get interactive access - 2/3
To get a prompt on the machine one needs to use the --pty option with srun and then bash -i (or tcsh -i) to get the shell:

eroche@castor > salloc -N 1 -n 1
salloc: Granted job allocation 1235
eroche@castor > srun --pty bash -i
bash-4.1$ hostname
c03

The 3 methods to get interactive access - 3/3
This is the least elegant method, but it is the one by which one can run X11 applications:

eroche@bellatrix > salloc -n 1 -c 16 -N 1
salloc: Granted job allocation 1236
bash-4.1$ srun hostname
c04
bash-4.1$ ssh -Y c04
eroche@c04 >

The SLURM jobs environment
SLURM defines a number of variables, among them:

jmenu@castor:~ > srun env | grep SLURM
SLURM_SUBMIT_DIR=/home/jmenu
SLURM_SUBMIT_HOST=castor
SLURM_JOB_ID=90771
SLURM_JOB_NUM_NODES=1
SLURM_JOB_NODELIST=c05
SLURM_JOB_CPUS_PER_NODE=1
SLURM_MEM_PER_CPU=1024
SLURM_JOBID=90771
SLURM_NODELIST=c05
SLURM_TASKS_PER_NODE=1
SLURM_NTASKS=1
SLURM_NPROCS=1
...
SLURM_CPUS_ON_NODE=1
SLURM_TASK_PID=
SLURMD_NODENAME=c05
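These variables can be used directly inside a job script, for example to label the output; the script below is a small sketch rather than one of the course examples:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 2
#SBATCH --time 00:05:00

echo "Job ${SLURM_JOB_ID} allocated node(s) ${SLURM_JOB_NODELIST}"
echo "Running ${SLURM_NTASKS} task(s), ${SLURM_CPUS_ON_NODE} CPU(s) on this node"
srun hostname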

Dynamic libs used in an application
ldd displays the libraries an executable file depends on:

jmenu@castor:~/COURS > ldd ex4_f90
    linux-vdso.so.1 => (0x00007fff4b905000)
    libmpigf.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpigf.so.4
    libmpi.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpi.so.4
    libdl.so.2 => /lib64/libdl.so.2
    librt.so.1 => /lib64/librt.so.1
    libpthread.so.0 => /lib64/libpthread.so.0
    libm.so.6 => /lib64/libm.so.6
    libc.so.6 => /lib64/libc.so.6
    libgcc_s.so.1 => /lib64/libgcc_s.so.1
    /lib64/ld-linux-x86-64.so.2

ex4.run

#!/bin/bash
...
module purge
module load intelmpi/4.1.3
module list
echo

LAUNCH_DIR=/scratch/scitas-ge/jmenu
EXECUTABLE="./ex4_f90"

echo "--> LAUNCH_DIR = ${LAUNCH_DIR}"
echo "--> EXECUTABLE = ${EXECUTABLE}"
echo
echo "--> ${EXECUTABLE} depends on the following dynamic libraries:"
ldd ${EXECUTABLE}
echo

cd ${LAUNCH_DIR}
srun ${EXECUTABLE}
...

The debug partition
In order to have priority access for debugging:
sbatch --partition=debug ex1.run
Limits on Castor:
- 1 hour walltime
- 16 cores between all users
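The partition can also be requested from inside the job script rather than on the command line; a minimal sketch (the executable name is a placeholder):

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --time 00:30:00      # stay well within the 1 hour debug limit
#SBATCH --ntasks 1

srun ./my_program            # placeholder executable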

General: cgroups (Castor)
cgroups ("control groups") is a Linux kernel feature to limit, account for, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.
SLURM:
- Linux cgroups apply constraints to the CPUs and memory that can be used by a job
- they are automatically generated using the resource requests given to SLURM
- they are automatically destroyed at the end of the job, thus releasing all resources used
- even if there is physical memory available, a task will be killed if it tries to exceed the limits of the cgroup!

System process view
Two tasks running on the same node, seen with ps auxf:

root  slurmstepd: [1072]
user   \_ /bin/bash /var/spool/slurmd/job01072/slurm_script
user       \_ sleep 10
root  slurmstepd: [1073]
user   \_ /bin/bash /var/spool/slurmd/job01073/slurm_script
user       \_ sleep 10

Check memory, thread and core usage with htop.

Fair share - 1/3
The scheduler is configured to give all groups a share of the computing power. Within each group the members have an equal share by default:

jmenu@castor:~ > sacctmgr show association where account=lacal format=account,cluster,user,grpnodes,QOS,DefaultQOS,Share tree

Account   Cluster   User       GrpNodes   QOS            Def QOS   Share
lacal     castor                          normal                   1
lacal     castor    aabecker              debug,normal   normal    1
lacal     castor    kleinjun              debug,normal   normal    1
lacal     castor    knikitin              debug,normal   normal    1

Priority is based on recent usage:
- this is forgotten about with time (half-life)
- fair share comes into play when the resources are heavily used

Fair share - 2/3
Job priority is a weighted sum of various factors:

jmenu@castor:~ > sprio -w
(shows the columns JOBID, PRIORITY, AGE, FAIRSHARE and QOS, plus a Weights line giving the weight of each factor on Castor)

jmenu@bellatrix:~ > sprio -w
(as above, with an additional JOBSIZE factor on Bellatrix)

To compare the priorities of jobs:

jmenu@castor:~ > sprio -j80833,77613
(lists PRIORITY, AGE, FAIRSHARE and QOS for each of the two jobs)

Fair share - 3/3
FairShare values range from 0.0 to 1.0:

Value   Meaning
0.0     you used much more resources than you were granted
0.5     you got what you paid for
1.0     you used nearly no resources

jmenu@bellatrix:~ > sshare -a -A lacal
Accounts requested: lacal
(lists the Raw Shares, Norm Shares, Raw Usage, Effectv Usage and resulting FairShare for the lacal account and its users boissaye, janson, jetchev, kleinjun, pbottine and saltini)

More information at:
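To inspect your own values rather than a whole account's, sshare can be restricted by account and user; for example (the names are placeholders):

$ sshare -A <account> -u <username>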

Helping yourself
man pages are your friend!

man sbatch
man sacct
man gcc

module load intel/
man ifort

Getting help
If you still have problems then send a message to: 1234@epfl.ch
Please start the subject with HPC for automatic routing to the HPC team.
Please give as much information as possible, including:
- the job ID
- the directory location and name of the submission script
- where the slurm-*.out file is to be found
- how the sbatch command was used to submit it
- the output from the env and module list commands
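One way to collect the requested information before sending the message (a sketch; with environment modules, module list typically prints to standard error):

env > my_env.txt                # capture the environment
module list 2> my_modules.txt   # module list usually writes to stderr

Attach both files to your message, together with the job ID and the submission script.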

Appendix
Change your shell at:
SCITAS web site:
