Introduction to the SGE/OGS batch-queuing system




Riccardo Murri
Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich
Oct. 6, 2011

The basic problem

Process a large set of data. Assumptions:

1. It cannot be done on a single computer, for space or time constraints.
2. The data can be subdivided into files, each of which can be processed independently.
3. (Processing each file can comprise several steps.)
4. (Accessing the files over a network has acceptable overhead.)

Today's lab session

Two approaches:

- local execution of programs (e.g., on your laptop);
- batched execution of programs (on a cluster).

The goal of these initial lab sessions is to show what the difference is, in practice, and what tools are available in each case.

These slides are available for download from:
http://www.gc3.uzh.ch/teaching/lsci2011/lab02/lab02.pdf

Login to the cluster ocikbpra.uzh.ch

Log in to the cluster:

    ssh username@ocikbpra.uzh.ch

You should be greeted by this shell prompt:

    [username@ocikbpra ~]$

Gather the sample application and test files into a directory lab2:

    mkdir lab2
    cp -av ~murri/lsci/rank-int.i386 lab2/
    cp -av ~murri/lsci/M0,6*.sms lab2/
    cd lab2

The cluster ocikbpra.uzh.ch

[Diagram: the frontend ocikbpra.uzh.ch is reachable from the internet via
ssh username@ocikbpra.uzh.ch. The /home and /share/apps filesystems are
exported over a local 1 Gb/s Ethernet network to the compute nodes
compute-0-0.local through compute-0-27.local, each of which also has a
local scratch filesystem mounted on /state/partition1.]

Recap from Lab Session 1

Process control features offered by the GNU/Linux shell:

- background processes with the & operator;
- monitor process status with the ps command;
- send signals to running processes with the kill command.

Lab Session 1 slides are available for download from:
http://www.gc3.uzh.ch/teaching/lsci2011/lab01/lab01.pdf
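The three features above can be refreshed with a short POSIX shell sketch; the sleep command merely stands in for a long-running job:

```shell
# Start a long-running command in the background with '&';
# the shell records its process ID in the special variable $!.
sleep 60 &
pid=$!

# Inspect the process-table entry for that PID (output format varies).
ps -p "$pid" || true

# Send SIGTERM (the default signal) to the process...
kill "$pid"
# ...and reap it; 'wait' returns once the process has exited.
wait "$pid" 2>/dev/null || true

# 'kill -0' sends no signal; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "still running"
else
    echo "terminated"
fi
```

After the kill, the final check prints "terminated", confirming the background process is gone.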

Timing command execution, I

The command /usr/bin/time reports about the time spent by the system executing a command. Typical reports include:

- user time: CPU time spent processing user-level code;
- system time: CPU time spent processing kernel-level code;
- real/elapsed time: time from the start to the end of the program (as would have been measured by an external clock).

Quiz: can the CPU time be greater than the real/elapsed time?
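The distinction between elapsed time and CPU time can also be seen without /usr/bin/time; the sketch below (plain POSIX shell) reads the clock around a sleep, which consumes elapsed time but almost no CPU time. It also hints at the quiz: CPU time can exceed elapsed time only when a program keeps several CPUs busy at once.

```shell
# Read the wall clock before and after the command...
start=$(date +%s)
sleep 1
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"

# ...while the POSIX 'times' builtin prints the CPU time consumed so
# far by the shell and by its children: close to zero here, since
# sleep only waits and does no computation.
times
```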

Timing command execution, II

Exercises:

1. Using man time, figure out how to determine the CPU and real time spent running the command rank-int.i386 M0,6-D5.sms.
2. Can time also report on memory usage? If yes, how much memory does the above command take?

Timing command execution, III

$ /usr/bin/time ./rank-int.i386 M0,6-D5.sms
./rank-int.i386 file:M0,6-D5.sms rows:3024 cols:49800 ...
0.10user 0.04system 0:00.18elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1971minor)pagefaults 0swaps

Reading the report: the first line is the command line to run; the second line is the command's own output; "0.10user 0.04system 0:00.18elapsed 80%CPU" is the timing information; "0maxresident)k" is the memory information; the last line ("0inputs+0outputs ... pagefaults ... swaps") is the I/O and paging information.

Resource limits, I

Why impose limits on the utilization of system resources?

What system resources would you want to limit in our case?

Resource limits, II

The command ulimit allows setting resource usage limits:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
[...]
file size               (blocks, -f) unlimited
[...]
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
[...]
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32767
virtual memory          (kbytes, -v) unlimited
[...]

Resource limits, III

Warning: the ulimit command is a shell built-in. It takes immediate effect on all the following commands. To restrict the scope to one command only, enclose it and ulimit in parentheses:

$ (ulimit -t 15; ./rank-int.i386 M0,6-D8.sms)

(Parentheses force the enclosed commands to be executed in a sub-shell.)
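A self-contained illustration of the sub-shell trick; the busy loop below stands in for rank-int.i386, which is only available on the cluster. The CPU-time limit kills the loop after about one second of CPU time, while the parent shell is unaffected:

```shell
# Limit CPU time to 1 second inside the sub-shell only; the busy
# loop is terminated by SIGXCPU once its allowance is used up.
(ulimit -t 1; while :; do :; done)
status=$?

# A process killed by signal N exits with status 128+N, so the
# sub-shell's exit status is well above 128 here.
echo "sub-shell exit status: $status"

# The parent shell still has its original CPU-time limit:
ulimit -t
```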

Resource limits, IV

Exercises:

1. What does the following command do?

   $ (ulimit -t 15; ./rank-int.i386 M0,6-D8.sms)

   What happens if you leave out the ulimit part?

2. What are the options given by ulimit for limiting memory?

3. What should happen if you run the following command? What really happens?

   $ (ulimit -m 102400; ./rank-int.i386 M0,6-D11.sms)

4. What should happen if you run the following command? What really happens?

   $ (ulimit -v 102400; ./rank-int.i386 M0,6-D11.sms)

SGE/OGS

Sun Grid Engine (SGE) is a batch-queuing system produced by Sun Microsystems; it was made open-source in 2001. After the acquisition by Oracle, the product forked:

- Open Grid Scheduler (OGS) is the open-source version;
- Univa Grid Engine is a commercial-only version, developed by the core SGE engineer team from Sun.

SGE is used on UZH's main HPC cluster, Schroedinger.

SGE architecture, I

sge_qmaster
- Runs on the master node ocikbpra.uzh.ch
- Accepts client requests (job submission, job/host state inspection)
- Schedules jobs on compute nodes (formerly a separate sge_schedd process)

Client programs: qhost, qsub, qstat
- Run by users on a submit node
- Clients for sge_qmaster
- The master daemon has a list of authorized submit nodes

SGE architecture, II

sge_execd
- Runs on every compute node
- Accepts job start requests from sge_qmaster
- Monitors node status (load average, free memory, etc.) and reports back to sge_qmaster

sge_shepherd
- Spawned by sge_execd when starting a job
- Monitors the execution of a single job

Job submission, I

The qsub command is used to submit a job to the batch system. The job consists of a shell script and its (optional) arguments. Example:

qsub myscript.sh

If any arguments are given after the script name, they will be available to the script as $1, $2, etc.:

# in myscript.sh, $1="hello" and $2="world"
qsub myscript.sh hello world
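The argument passing can be tried out with a toy script; myscript.sh below is a hypothetical example, and since no cluster is needed for this, the sketch runs it directly instead of through qsub (the batch system would execute it the same way):

```shell
# Create a minimal job script; the positional parameters $1, $2, ...
# receive whatever arguments followed the script name at submission.
cat > myscript.sh <<'EOF'
#!/bin/sh
echo "first argument: $1"
echo "second argument: $2"
EOF
chmod +x myscript.sh

# Run it directly with the same arguments qsub would have passed on:
./myscript.sh hello world
```

This prints "first argument: hello" and "second argument: world", exactly the values the script would see under qsub myscript.sh hello world.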

Job submission, II

Upon successful submission, qsub prints a job ID to standard output:

$ qsub -cwd myscript.sh
Your job 76104 ("myscript.sh") has been submitted

This job ID must be used with all SGE commands that operate on jobs.

As soon as the job starts, two files will be created, containing the script's standard output (<scriptname>.o<jobid>) and standard error (<scriptname>.e<jobid>):

$ ls -l myscript.sh*
-rwxrwxr-x 1 murri murri 30 Oct  6 14:23 myscript.sh
-rw-r--r-- 1 murri murri  0 Oct  6 14:24 myscript.sh.e76104
-rw-r--r-- 1 murri murri 14 Oct  6 14:24 myscript.sh.o76104

Commonly used options for qsub

-cwd  Execute the job in the current directory; if not given, the job script is run in the home directory.
-o    Path name of the file where standard output will be stored.
-e    Path name of the file where standard error will be stored.
-j    If -j y is given, merge standard error into standard output (as if they were both sent to the screen).
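Instead of repeating options on every submission, qsub also reads options from lines starting with "#$" embedded in the job script itself. A minimal sketch (the file name and option values are illustrative):

```shell
# Embedded qsub options: to the shell these '#$' lines are ordinary
# comments, but qsub parses them as if given on the command line.
cat > myjob.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -o myjob.out
#$ -e myjob.err
echo "job running on $(hostname)"
EOF
chmod +x myjob.sh

# On the cluster one would now simply run:  qsub myjob.sh
# Locally the script still runs, since the directives are comments:
./myjob.sh
```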

Monitoring jobs

The qstat command is used to monitor jobs submitted to the SGE system. Example:

$ qstat
job-id prior   name       user      state submit/start at     queue
73344  0.60500 mod_run    danielyli dt    10/06/2011 14:38:45 all.q@compute-0-13.local
76105  0.50500 myscript.s murri     r     10/06/2011 14:40:35 all.q@compute-0-20.local

The state column is a combination of the following codes (see man qstat for a complete list):

r  Job is running
qw Job is waiting in the queue
qh Job is being held back in the queue
E  An error has occurred
d  Job has been deleted by the user
t  Job is being transferred to a compute node

Job submission, III

Exercises:

1. Write a script rank1.sh to run the command ./rank-int.i386 M0,6-D5.sms, then run it. Does this job appear in the qstat output? Compare the output with what you would get when running locally: is there any significant change?
2. Write a script rank2.sh to run the command ./rank-int.i386 M0,6-D11.sms, then run it. Does this job appear in the qstat output? When do the standard output and standard error files appear? What is their initial content?
3. How can you determine the amount of resources (CPU time, wall-clock time, etc.) used by a job?

Job resource utilization, I

The qstat -j command reports information on a job while it is running. Example:

$ qstat -j 76106
==============================================================
job_number:        76106
exec_file:         job_scripts/76106
submission_time:   Thu Oct  6 14:51:45 2011
owner:             murri
[...]
cwd:               /home/murri/lsci
[...]
script_file:       myscript.sh
usage    1:        cpu=00:01:30, mem=8.64453 GBs, io=0.02295, vmem=103.637m, maxvme
scheduling info:   queue instance "all.q@compute-0-3.local" dropped because it is t
[...]

The usage line contains the current resource utilization.

Job resource utilization, II

The qacct command reports all information on a job, but only after it has completed. Example:

$ qacct -j 76106
==============================================================
qname        all.q
hostname     compute-0-27.local
group        murri
[...]
jobname      myscript.sh
jobnumber    76106
taskid       undefined
[...]
qsub_time    Thu Oct  6 14:51:45 2011
start_time   Thu Oct  6 14:51:50 2011
end_time     Thu Oct  6 14:54:29 2011
[...]
exit_status  0
ru_wallclock 159
ru_utime     158.421
ru_stime     0.456
[...]
cpu          158.877
mem          15.183
[...]
maxvmem      103.637M
[...]

Resource utilization, I

The -l option to qsub allows specifying what resources will be needed by a job. The most common resource requirements are:

s_rt      Total job runtime (wall-clock time), in seconds
s_cpu     Total job CPU time, in seconds
mem_free  Request at least this much free RAM; use the m or g suffix for MB or GB
s_mem     Upper limit on RAM usage; use the m or g suffix
s_vmem    Upper limit on virtual memory usage; use the m or g suffix

Example:

# run job with a time limit of 20 seconds
$ qsub -l s_rt=20 myscript.sh

Resource utilization, II

Exercises:

1. Is the following job limited to 20 seconds runtime?

   $ qsub -l s_rt=20 rank2.sh

   What do you find in the job's stdout and stderr files? Compare with what happens in the ulimit case. What happens if you replace s_rt by s_cpu?

2. Run the same job, putting a 10MB limit on mem_free, then s_rss, s_mem, and finally s_vmem. Compare the actual resource utilization (via qacct) with the requirement. In which cases does the job terminate correctly? What is the resource utilization in these cases?

3. Compile a table with runtime, CPU time, and memory utilization for each of the matrices M*.sms. Is there a correlation with the matrix file size?

References

[1] setrlimit(2) manual page,
    http://manpages.ubuntu.com/manpages/oneiric/en/man2/getrlimit.2.html