Introduction to Sun Grid Engine (SGE)




What is SGE?
- Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions.
- Sponsored by Sun Microsystems.
- Features:
  - Automatic computing resource selection
  - Resource accounting
  - Support for parallel computing (MPI)
  - Support for grid computing

SGE Job Management

Job management in SGE
1. Each user submits jobs to the SGE scheduler. There is no need to wait for a job to finish.
2. SGE chooses the node(s) to run the job.
3. The job's output and errors are placed in an output file and an error file.

SGE Architecture & Components

SGE Components: Host types
- Master Host
  - Controls all jobs
  - Runs on the frontend node
- Execution Host
  - Host that computes the job(s)
  - Runs on a compute node
- Submit Host
  - Where users log in and submit their jobs
  - In ROCKS, the frontend is also the Submit Host
- Administrative Host
  - Where the admin logs in and performs administrative tasks on SGE
  - Also the frontend in ROCKS

SGE Components: Software components
- sge_commd: communication daemon
  - Centralizes all communication
  - Runs on all nodes
- sge_qmaster: entry point for all commands (qsub, qstat, etc.)
  - Runs on the Master Host (frontend)
- sge_execd: execution daemon
  - Runs only on remote computing resources
  - Runs on the Execution Host (compute node)
- SGE utilities (qsub, qdel, qstat, etc.): commands for user job submission and statistics
  - Installed on the Submit Host and Administrative Host only

SGE Components: Queue
- A queue is a container for a class of jobs allowed to execute on a host concurrently.
- A queue determines job types:
  - CPU (itanium.q, xeon.q)
  - Memory (himem.q)
  - Time (short.q, long.q)
  - Licences (Fluent.q)
- There is no need to submit a job to a particular queue!
  - You only need to specify your job requirements: OS, software, memory
  - SGE dispatches the job to a suitable queue on a lightly loaded host
- ROCKS automatically sets up queues for you!

Basic SGE Commands
- qsub: job submission
- qstat: view job statistics
- qdel: delete a job from the queue
- qhost: show hosts currently online
- qalter: alter job parameters

Basic Job Submission
- NOTE: You must submit jobs from an ordinary user account!
- Example: create a simple job script:

    #!/bin/sh
    date
    echo Hello world

- Save it to a file named simplejob.
- Then submit the job using:

    qsub simplejob

Basic job submission (cont.)
- The job id is shown after the job is submitted.
- After the job finishes, its output is placed in simplejob.o<job id> and its errors in simplejob.e<job id>.
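The submission steps can be sketched as one small script. This is a minimal sketch, not part of the original slides: it assumes that qsub supports the -terse option (which prints only the job id, as in Grid Engine 6.x), and it skips the submission when qsub is not on the PATH.

```shell
#!/bin/sh
# Write the job script from the slides to a file named simplejob.
cat > simplejob <<'EOF'
#!/bin/sh
date
echo Hello world
EOF

# Submit only when Grid Engine's qsub is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    # Assumption: -terse makes qsub print just the job id.
    job_id=$(qsub -terse simplejob)
    echo "submitted job $job_id"
    echo "output file: simplejob.o$job_id, error file: simplejob.e$job_id"
else
    echo "qsub not found; run this on a submit host"
fi
```

Capturing the job id in a variable makes it easy to build the output file names or to pass the id to qstat and qdel later.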

Job statistics
- Now create another job script called simplejob2 with the following content:

    #!/bin/sh
    date
    echo sleep 1000 seconds
    sleep 1000

- Submit the job:

    qsub simplejob2

Job statistics (cont.)
- Now let's see the status of our job with:

    qstat

- State qw means the job is waiting in the queue (SGE is allocating a node for the job).
- Now try qstat again:
  - State t means the job is starting.
  - State r means the job is running.
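As a quick reference, the three state codes described here can be wrapped in a small lookup function. The function name describe_state is hypothetical, and only the states named in these slides are covered; qstat reports other states as well (see man qstat).

```shell
#!/bin/sh
# Translate a qstat state code into a short description.
# Only the states mentioned in the slides are handled.
describe_state() {
    case "$1" in
        qw) echo "waiting in the queue" ;;
        t)  echo "starting" ;;
        r)  echo "running" ;;
        *)  echo "other state; see man qstat" ;;
    esac
}

describe_state qw
describe_state r
```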

Job statistics (cont.)
Important fields in the job statistics:
- Job ID: the id of the job
- Name: job script name
- User name: owner of the job
- State: job state
- Queue: queue name (in ROCKS, it is usually a node name)

Job deletion
- Use qstat to find the job id of simplejob2.
- Now delete the job with:

    qdel <job id>
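Looking up the job id by name can be scripted. The sketch below assumes the default qstat layout, with two header lines and the job name in the third column; the sample text is made up for illustration and is not real cluster output. The helper name job_ids_by_name is hypothetical.

```shell
#!/bin/sh
# Print the ids of jobs whose name column (assumed to be column 3) equals $1.
# Reads qstat-style text on stdin and skips the two header lines.
job_ids_by_name() {
    awk -v name="$1" 'NR > 2 && $3 == name { print $1 }'
}

# Hypothetical qstat output, used only to demonstrate the parsing.
sample='job-ID  prior   name       user   state submit/start at     queue
---------------------------------------------------------------------
     12 0.55500 simplejob2 alice  r     03/12/2024 15:00:01 compute-0-0.q
     13 0.55500 otherjob   alice  qw    03/12/2024 15:00:05'

printf '%s\n' "$sample" | job_ids_by_name simplejob2

# On a real cluster the same filter would read live data, for example:
#   qstat | job_ids_by_name simplejob2
# and the printed id can then be passed to qdel.
```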

Job deletion (cont.)
- Output produced before the job was killed is placed in simplejob2.o<job id>, and errors in simplejob2.e<job id>.

What is a Job Script?
A job script is a shell script that describes the job:
- The program command
- Some job parameters (i.e. qsub options)
- It may include the command to start a parallel job (such as mpirun)

More on job submission
- Let's see what else we can do at job submission.
- Create a directory named myproject, then cd to that directory:

    mkdir myproject
    cd myproject

- Then create the source file myprog.c for a program named myprog.
- Compile it into myprog:

    gcc myprog.c -o myprog

More on job submission (cont.)
- Now let's create a job script named advancejob that runs the program.
- Note the ./myprog line in it.

More on job submission (cont.)
- Now try submitting the job with the same command:

    qsub advancejob

- Now let's see the output.

More on job submission (cont.)
- By default, SGE runs the job in the user's home directory.
- The output and error files are also placed in the user's home directory.
- Supply -cwd, -o, and -e to change this:
  - -cwd: change to the current working directory before doing anything
  - -o, -e: specify the output and error file names (instead of xx.{o,e}<job id>)

More on job submission (cont.)
- Now submit the job again with the following command:

    qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3

- NOTE: you can pass arguments to the job script, such as arg1 arg2 arg3 in this example.

More job options

    qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3

- -N: specify the job name
- -a: specify the job start date and time ([[CC]YY]MMDDhhmm[.SS])
- -S: specify the shell interpreter for the job script
- -j y: merge standard error into the output file (advance.out in this case)
- Try submitting the job and see the result!
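The value for -a does not have to be typed by hand. A minimal sketch, assuming the short MMDDhhmm form is enough: the date command builds the timestamp, and the script only prints the resulting qsub command line instead of submitting anything.

```shell
#!/bin/sh
# Build a start time in MMDDhhmm form for qsub -a from the current clock.
start_at=$(date +%m%d%H%M)

# Print the command instead of running it, so this works off-cluster too.
echo "qsub -N theadvancejob -a $start_at -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3"
```

Using the current time means the job is eligible to start immediately; a later time would have to be computed or typed in the same eight-digit form.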

Placing job options in the script
- You can specify job options in the job script by prefixing the line with #$.
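A job script with embedded options might look like the sketch below. It is an illustration rather than the script shown on the original slide: the file name advancejob_opts is hypothetical, and the options are the ones introduced on the earlier slides.

```shell
#!/bin/sh
# Write a job script whose qsub options are embedded as "#$" lines.
cat > advancejob_opts <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -cwd
#$ -S /bin/sh
#$ -o advance.out
#$ -j y
./myprog "$@"
EOF

# Count the embedded option lines as a quick sanity check.
grep -c '^#\$' advancejob_opts
```

With the options in the file, the submission command shrinks to qsub advancejob_opts arg1 arg2 arg3, and the settings travel with the script.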

Altering the job
- You can alter job parameters after the job has been queued.
- Only some parameters can be altered after the job has been launched!
- Use the qalter command to alter a job; it takes the same arguments and options as qsub.

Altering the job parameter
- Consult the man page (man qalter) for the list of options that can be altered after the job has launched (in the t or r state).
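As one example of altering a job, the sketch below renames a queued job with qalter -N, mirroring the -N option of qsub. The helper names run and rename_job and the job id 42 are hypothetical. By default the script only prints the command; set DRY_RUN=0 to execute it on a cluster.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Rename a job that is still waiting in the queue.
# Usage: rename_job <job id> <new name>
rename_job() {
    run qalter -N "$2" "$1"
}

rename_job 42 renamedjob   # 42 is a hypothetical job id
```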

Job suspension
- You can suspend a job at any time.
- Suspending a queued job stops that job from being launched.
- When to suspend a job?
  - You need to run another, more important job, but the old job consumes all the resources.
  - An admin wants to suspend a job because it consumes too many resources on the system.

Job suspension (cont.)
- Use the qhold command to hold a job:

    qhold <job id>

- Use the qrls command to release a held job:

    qrls <job id>
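The hold and release steps can be combined into one sequence. This is a sketch under assumptions: the helper names and the job id 42 are hypothetical, and commands are only printed unless DRY_RUN=0 is set. A held job that is still queued is expected to show the state hqw in qstat.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hold a queued job, list the jobs to check its state, then release it.
hold_and_release() {
    run qhold "$1"
    run qstat
    run qrls "$1"
}

hold_and_release 42   # 42 is a hypothetical job id
```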

The qhost command
- Use the qhost command to see the nodes that are online in SGE:

    qhost

- Try supplying the -j option and see what happens (try it after submitting some jobs).

qmon: SGE in Graphics Mode
- The previous sections introduced SGE via the command line.
- SGE can also be used comfortably through a Graphical User Interface (GUI), qmon.
- The facilities provided by qmon include submitting jobs, managing jobs, managing hosts, and managing job queues.

Running qmon
- qmon requires the X Window System to provide its GUI.
- Start X with:

    startx

- Start qmon with:

    qmon

Submitting a Job via QMON
- Click the job submission button; the submit job window will be shown.

Job Control via QMON
- Click the job control button to view job status and control jobs.

Queue Control
- Each compute node usually has only one queue, but you can add more queues or remove existing queues.
- Slot management
  - A slot is the capacity of a queue to handle concurrent jobs.
  - A typical setting: number of slots of a queue = number of processors of the compute node.

Queue Control via SGE
- Click the queue control button to control queues.

Queue Control via SGE (cont.)
- This icon represents a queue named compute0 prepared for a host named comp-pvfs-0-0.
- This queue consists of only one slot.
- You can modify the properties of this queue by highlighting its icon and clicking the Modify button.
* Normal users cannot control queues.

Queue Control via SGE (cont.)
- Modify the properties of a queue.
- Try modifying the number of slots.
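The rule of thumb "slots = processors" can be checked from the command line on a compute node. This sketch assumes getconf _NPROCESSORS_ONLN is available (it is on Linux). The qconf command in the comment is an assumption about the administrator's command-line equivalent of the qmon dialog; verify it against man qconf before use.

```shell
#!/bin/sh
# Count the processors that are currently online on this node.
slots=$(getconf _NPROCESSORS_ONLN)
echo "suggested number of slots for this node's queue: $slots"

# An administrator could then apply the value to the queue from the slides
# (assumed syntax, not run here):
#   qconf -mattr queue slots "$slots" compute0
```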

Lab 1: Batch scheduler
- Write a small program that calculates a multiplication table. Save the file as multab.c.
- The program takes one argument: the number used to generate the multiplication table.
  - multab 2: generate the multiplication table for the number 2
- Print the multiplication table to standard output.
- Use SGE to submit the jobs. Calculate the multiplication tables of 2 to 12.
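One possible shape for the submission part of the lab is sketched below; writing multab.c itself is left as the exercise. The wrapper script name multabjob, the job names, and the output file names are hypothetical. The commands are only printed unless DRY_RUN=0 is set, so the loop can be checked before anything is submitted.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Wrapper job script that runs the compiled ./multab with its first argument.
cat > multabjob <<'EOF'
#!/bin/sh
./multab "$1"
EOF

# Submit one job per number from 2 to 12, each with its own output file.
submit_all() {
    n=2
    while [ "$n" -le 12 ]; do
        run qsub -cwd -N "multab$n" -o "multab$n.out" -j y multabjob "$n"
        n=$((n + 1))
    done
}

submit_all
```

Submitting one job per number lets SGE spread the eleven jobs across whichever nodes are free, and -cwd keeps the output files next to the program.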

The End