Ra - Batch Scripts. Timothy H. Kaiser, Ph.D. tkaiser@mines.edu


Jobs on Ra are Run via a Batch System

Ra is a shared resource. The purpose of the batch system:
- Give fair access to all users
- Have control over where jobs are run
- Set limits on times and processor counts
- Allow exclusive access to nodes while you own them

Batch Systems

Create a script that describes:
- What you want to do
- What resources (# nodes) you want
- How long you want the resources

Submit your script to the queuing system. Wait for some time for your job to start. Your job runs and you get your output.
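The cycle above can be sketched as a minimal batch script. This is a hypothetical example (the file name hello.sh and the echo command are stand-ins, not from the slides); the #PBS directives mirror the ones explained later in this document:

```shell
#!/bin/bash
# Hypothetical minimal batch script (hello.sh).
# The #PBS lines are comments to the shell but are read by the
# batch system as resource requests.
#PBS -l nodes=1:ppn=8        # one node with eight processors
#PBS -l walltime=00:10:00    # release the resources after 10 minutes
#PBS -N hello                # job name shown by qstat

cd $PBS_O_WORKDIR            # start where the job was submitted from
echo "running on $(hostname)"
```

Submitted with msub hello.sh, the echoed line lands in the job's standard-output file once the job completes.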

Moab - Top Level Batch System

Product of Cluster Resources: http://www.clusterresources.com/pages/products/moab-cluster-suite/workloadmanager.php

Marketing pitch from the web page: "Moab Workload Manager is a policy-based job scheduler and event engine that enables utility-based computing for clusters. Moab Workload Manager combines intelligent scheduling of resources with advanced reservations to process jobs on the right resources at the right time. It also provides flexible policy and event engines that process workloads faster and in line with set business requirements and priorities."

Moab:
- Sets up queues and policy
- Hands actual job submission to Torque

Torque (a.k.a. PBS) is a descendant of the open source version of PBS. Moab has the commands of PBS but adds some of its own; the most important is msub.

A Simple Script for an MPI Job

#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -N testio
#PBS -o stdout
#PBS -e stderr
#PBS -V
#PBS -m abe
#PBS -M tkaiser@mines.edu
#----------------------
cd $PBS_O_WORKDIR
mpiexec -n 8 c_ex00

Scripts contain normal shell commands, plus comments designated with a # that are interpreted by PBS.

Running and Monitoring Jobs

Use msub to submit the job, qstat to see its status (Q, R, C), and qdel to kill the job if needed.

[tkaiser@ra ~/mydir]$ msub mpmd
8452
[tkaiser@ra ~/mydir]$ qstat 8452
Job id             Name           User          Time Use S Queue
------------------ -------------- ------------- -------- - -----
8452.ra            testio         tkaiser       0        R SHORT
[tkaiser@ra ~/mydir]$ qstat -u tkaiser
Job id             Name           User          Time Use S Queue
------------------ -------------- ------------- -------- - -----
8452.ra            testio         tkaiser       0        R SHORT
[tkaiser@ra ~/mydir]$ qdel 8452

A Simple Script for an MPI Job, Line by Line

Command                      Comment
#!/bin/bash                  We will run this job using the "bash" shell
#PBS -l nodes=1:ppn=8        We want 1 node with 8 processors, for a total of 8 processors
#PBS -l walltime=02:00:00    We will run for up to 2 hours
#PBS -N testio               The name of our job is testio
#PBS -o stdout               Standard output from our program will go to a file stdout
#PBS -e stderr               Error output from our program will go to a file stderr
#PBS -V                      Very important! Exports all environment variables from the submitting shell into the batch shell
#PBS -m abe                  Send mail on abort, begin, end; use n to suppress
#PBS -M tkaiser@mines.edu    Where to send email (cell phone?)
#----------------------      Not important; just a separator line
cd $PBS_O_WORKDIR            Very important! Go to $PBS_O_WORKDIR, the directory from which the job was submitted
mpiexec -n 8 c_ex00          Run the MPI program c_ex00 on 8 computing cores

More Script Notes

#PBS -l is used to specify resource requests. You can have multiple -l lines per script.

#PBS -l nodes=2:ppn=8:fat
Ra has 16 Gbyte and 32 Gbyte nodes; the fat property specifies the 32 Gbyte nodes.

#PBS -l naccesspolicy=singlejob
Grants exclusive access to a node. Not normally needed.
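Putting these together, a request for two fat nodes with exclusive access might look like the following directive fragment (a sketch, not a full script; the walltime value is an arbitrary example):

```shell
#PBS -l nodes=2:ppn=8:fat          # two 32 Gbyte ("fat") nodes, 8 processors each
#PBS -l walltime=01:00:00          # multiple -l lines per script are allowed
#PBS -l naccesspolicy=singlejob    # exclusive access; not normally needed
```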

PBS Variables

Variable          Meaning
PBS_JOBID         unique PBS job ID
PBS_O_WORKDIR     job submission directory
PBS_NNODES        number of nodes requested
PBS_NODEFILE      file with list of allocated nodes
PBS_O_HOME        home directory of submitting user
PBS_JOBNAME       user-specified job name
PBS_QUEUE         job queue
PBS_MOMPORT       active port for mom daemon
PBS_O_HOST        host of currently running job
PBS_O_LANG        language variable for job
PBS_O_LOGNAME     name of submitting user
PBS_O_PATH        path to executables used in job script
PBS_O_SHELL       script shell
PBS_JOBCOOKIE     job cookie
PBS_TASKNUM       number of tasks requested (see pbsdsh)
PBS_NODENUM       node offset number (see pbsdsh)
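A job script can log a few of these variables at startup, which helps when debugging where and how a job actually ran. A minimal sketch (the ${VAR:-default} fallbacks are only so the snippet also runs outside a real PBS job, where the variables are unset):

```shell
#!/bin/bash
# Sketch: record job context using the PBS variables from the table
# above. Outside a PBS job they are unset, so shell defaults apply.
echo "job id:     ${PBS_JOBID:-none (not under PBS)}"
echo "job name:   ${PBS_JOBNAME:-none (not under PBS)}"
echo "submit dir: ${PBS_O_WORKDIR:-$PWD}"
echo "queue:      ${PBS_QUEUE:-none (not under PBS)}"
echo "node count: ${PBS_NNODES:-1}"
```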

Where's my Output?

Standard error and standard out don't show up until after your job is completed. Output is temporarily stored in /tmp on your nodes; you can ssh to your nodes and see it there. We are looking into changing this behavior, or at least making it optional.

You can also pipe output to a file:
mpiexec myjob > myjob.$PBS_JOBID
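One way to apply that redirection idea inside a script: tag the output file with the job ID so repeated runs don't clobber each other. A sketch (echo stands in for the real mpiexec command, and the shell PID $$ is used as a fallback tag when not running under PBS):

```shell
#!/bin/bash
# Sketch: per-job output files tagged with the PBS job ID.
# Falls back to the shell PID when PBS_JOBID is unset.
tag="${PBS_JOBID:-$$}"
outfile="myjob.${tag}"
# 'echo' stands in for the real program, e.g.:
#   mpiexec -n 8 myjob > "$outfile"
echo "output from run ${tag}" > "$outfile"
cat "$outfile"     # output is visible immediately, while the job runs
rm "$outfile"      # clean up the demonstration file
```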

Useful Moab Commands

Command       Description
canceljob     cancel job
checkjob      provide detailed status report for specified job
mdiag         provide diagnostic reports for resources, workload, and scheduling
mjobctl       control and modify job
mrsvctl       create, control and modify reservations
mshow         displays various diagnostic messages about the system and job queues
msub          submit a job (Don't use qsub)
releasehold   release job defers and holds
releaseres    release reservations
sethold       set job holds
showq         show queued jobs
showres       show existing reservations
showstart     show estimates of when job can/will start
showstate     show current state of resources

http://www.clusterresources.com/products/mwm/docs/a.gcommandoverview.shtml

Useful Torque Commands

Command    Description
tracejob   trace job actions and states recorded in TORQUE logs
pbsnodes   view/modify batch status of compute nodes
qalter     modify queued batch jobs
qdel       delete/cancel batch jobs
qhold      hold batch jobs
qrls       release batch job holds
qrun       start a batch job
qsig       send a signal to a batch job
qstat      view queues and jobs
pbsdsh     launch tasks within a parallel job
qsub       submit a job (Use msub instead)

http://www.clusterresources.com/torquedocs21/a.acommands.shtml

Running Interactive

Interactive Runs

It is possible to grab a collection of nodes and run parallel jobs interactively. The nodes are yours until you quit. This might be useful:
- During development
- For running a number of small parallel jobs
It is also required (almost) for running parallel debuggers.

Multistep Process

1. Ask for the nodes using msub:
   msub -q INTERACTIVE -I -V -l nodes=1
   You are automatically logged on to the nodes you are given.
2. Go to your working directory.
3. Run your job(s).
4. Log out to release your nodes.

Ignore this message if you see it:
cp: cannot stat `/opt/torque/mom_priv/jobs/73156.ra.local.sc': No such file or directory

Don't abuse it, lest you be shot.

Example

[tkaiser@ra ~/guide]$ msub -q INTERACTIVE -I -V -l nodes=1 -l naccesspolicy=singlejob
msub: waiting for job 1280.ra.mines.edu to start
msub: job 1280.ra.mines.edu ready
[tkaiser@compute-9-8 ~]$ cd guide
[tkaiser@compute-9-8 ~/guide]$ mpiexec -n 4 c_ex00
Hello from 0 of 4 on compute-9-8.local
Hello from 1 of 4 on compute-9-8.local
Hello from 2 of 4 on compute-9-8.local
Hello from 3 of 4 on compute-9-8.local
[tkaiser@compute-9-8 ~/guide]$ mpiexec -n 16 c_ex00
Hello from 1 of 16 on compute-9-8.local
Hello from 0 of 16 on compute-9-8.local
Hello from 3 of 16 on compute-9-8.local
Hello from 4 of 16 on compute-9-8.local
......
Hello from 15 of 16 on compute-9-8.local
Hello from 2 of 16 on compute-9-8.local
Hello from 6 of 16 on compute-9-8.local
[tkaiser@compute-9-8 ~/guide]$ exit
logout
msub: job 1280.ra.mines.edu completed
[tkaiser@ra ~/guide]$

The -l naccesspolicy=singlejob option ensures exclusive access to the node.