Introduction to Sun Grid Engine (SGE)
What is SGE?
  - Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions
  - Sponsored by Sun Microsystems
  - Features:
      - Automatic computing resource selection
      - Resource accounting
      - Support for parallel computing (MPI)
      - Support for grid computing

SGE Job Management

Job management in SGE
  1. Each user submits jobs to the SGE scheduler; there is no need to wait for a job to finish.
  2. SGE chooses the node(s) that run the job.
  3. The job's output and errors are written to an output file and an error file.

SGE Architecture & Components

SGE Components: Host types
  - Master Host
      - Controls all jobs
      - Runs on the frontend node
  - Execution Host
      - Host that computes the job(s)
      - Runs on a compute node
  - Submit Host
      - Where users log in and submit their jobs
      - In ROCKS, the frontend is also the Submit Host
  - Administrative Host
      - Where the admin logs in and performs SGE administrative tasks
      - Also the frontend in ROCKS

SGE Components: SGE software components
  - sge_commd: communication daemon
      - Centralizes all communication
      - Runs on all nodes
  - sge_qmaster: entry point for all commands (qsub, qstat, etc.)
      - Runs on the Master Host (frontend)
  - sge_execd: execution daemon
      - Runs only on remote computing resources
      - Runs on the Execution Hosts (compute nodes)
  - SGE utilities (qsub, qdel, qstat, etc.): commands for user job submission and statistics
      - Installed on the Submit Host and Administrative Host only

SGE Components: Queue
  - A container for a class of jobs allowed to execute concurrently on a host
  - A queue determines job types:
      - CPU (itanium.q, xeon.q)
      - Memory (himem.q)
      - Time (short.q, long.q)
      - Licenses (Fluent.q)
  - No need to submit a job to a particular queue!
      - Only specify your job requirements: OS, software, memory
      - SGE dispatches the job to a suitable queue on a lightly loaded host
  - ROCKS automatically sets up the queues for you!

Basic SGE Commands
  - qsub: submit a job
  - qstat: view job statistics
  - qdel: delete a job from the queue
  - qhost: show the hosts currently online
  - qalter: alter job parameters

Basic Job Submission
  - NOTE: You must submit jobs as an ordinary user!
  - Example: create a simple job script
        #!/bin/sh
        date
        echo Hello world
  - Save it to a file named simplejob
  - Then submit the job using
        qsub simplejob

Basic job submission (cont.)
  - The job id is shown after the job is submitted
  - After the job finishes, its output is placed in simplejob.o<job id> and its errors in simplejob.e<job id>

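The two slides above can be tried end to end with the sketch below. It writes the simplejob script, submits it only when qsub is available, and extracts the job id from qsub's confirmation message. The helper name job_id_from_message and the assumption that the first number in the message is the job id are illustrative, not part of SGE; check the message format of your installation.

```shell
#!/bin/sh
# Write the job script from the slide to a file named simplejob.
cat > simplejob <<'EOF'
#!/bin/sh
date
echo Hello world
EOF

# Hypothetical helper: take the first number found in qsub's confirmation
# message as the job id. The message format is an assumption.
job_id_from_message() {
    echo "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Submit only where qsub exists, so the sketch is harmless elsewhere.
if command -v qsub >/dev/null 2>&1; then
    msg=$(qsub simplejob)
    id=$(job_id_from_message "$msg")
    echo "submitted job $id; expect simplejob.o$id and simplejob.e$id"
else
    echo "qsub not found; created simplejob only"
fi
```

Once the job has finished, the two files named in the last message should appear in the directory where SGE ran the job.
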
Job statistics
  - Now create another job script called simplejob2 with the following content
        #!/bin/sh
        date
        echo sleep 1000 seconds
        sleep 1000
  - Submit the job
        qsub simplejob2

Job statistics (cont.)
  - Now let's see the status of our job with
        qstat
  - State qw means the job is waiting in the queue (SGE is allocating a node for the job)
  - Now try qstat again
  - State t means the job is starting (being transferred to a node); r means the job is running

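As a quick reference for the state codes above, here is a small sketch of a lookup function. The name describe_state is hypothetical, and only the three states from the slide are covered; man qstat lists the rest.

```shell
#!/bin/sh
# Hypothetical helper: translate a qstat state code into plain words.
# Only the states mentioned on the slide are handled.
describe_state() {
    case "$1" in
        qw) echo "waiting in the queue" ;;
        t)  echo "starting (being transferred to a node)" ;;
        r)  echo "running" ;;
        *)  echo "other state; see man qstat" ;;
    esac
}

# Example: explain each of the three codes.
for code in qw t r; do
    echo "$code: $(describe_state "$code")"
done
```
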
Job statistics (cont.)
  - Important fields in the job statistics
      - Job ID: the job id
      - Name: job script name
      - User name: owner of the job
      - State: job state
      - Queue: queue name (in ROCKS, it is usually a node name)

Job deletion
  - Use qstat to find the job id of simplejob2
  - Now let's delete the job with
        qdel <job id>

Job deletion (cont.)
  - The output and errors produced until the job was killed are placed in simplejob2.o<job id> and simplejob2.e<job id>

What is a Job Script?
  - A job script is a shell script that describes the job
      - The program command
      - Some job parameters (a.k.a. qsub options)
      - May include the command to start a parallel job (such as mpirun)

More on job submission
  - Let's see what else we can do at job submission
  - Create a directory named myproject, then cd to that directory
        mkdir myproject
        cd myproject
  - Then create a program source file myprog.c
        [source listing of myprog.c]
  - Compile this program into myprog
        gcc myprog.c -o myprog

More on job submission (cont.)
  - Now let's create a job script named advancejob
        [listing of the advancejob job script]
  - Note the ./myprog line

More on job submission (cont.)
  - Now try submitting the job with the same command
        qsub advancejob
  - Now let's see the output

More on job submission (cont.)
  - By default, SGE runs the job in the user's home directory
  - The output and error files are also placed in the user's home directory
  - Supply -cwd, -o, and -e to change this
      - -cwd: run the job from the current working directory
      - -o, -e: specify the output and error file names (instead of <job name>.o<job id> and <job name>.e<job id>)

More on job submission (cont.)
  - Now let's submit the job again with the following command
        qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3
  - NOTE: arguments for the job script follow the script name (arg1 arg2 arg3 in this example)

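To see the effect of -cwd and of the trailing arguments, a stand-in job script can simply report where it runs and what it received. The script name showargs and its output file names are hypothetical; it is not the advancejob script from the slides.

```shell
#!/bin/sh
# Write a stand-in job script that reports its working directory and arguments.
cat > showargs <<'EOF'
#!/bin/sh
echo "working directory: $(pwd)"
echo "argument count: $#"
for arg in "$@"; do
    echo "argument: $arg"
done
EOF

# Submit with the options from the slide when qsub is available;
# otherwise run the script locally to preview its output.
if command -v qsub >/dev/null 2>&1; then
    qsub -cwd -o ./showargs.out -e ./showargs.err showargs arg1 arg2 arg3
else
    sh showargs arg1 arg2 arg3
fi
```

Submitting the same script once with -cwd and once without it, and comparing the first line of the two output files, shows the difference in working directory.
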
More job options
        qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3
  - -N: specify the job name
  - -a: specify the job start date and time ([[CC]YY]MMDDhhmm[.SS])
  - -S: specify the shell interpreter for the job script
  - -j y: merge standard error into the output file (advance.out in this case)
  - Try submitting the job and see the result!

Placing job options in the script
  - You can specify job options inside the job script by prefixing each option line with #$

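A minimal sketch of such a script follows. It embeds the options from the "More job options" slide (except -a) as #$ lines. The file name embeddedjob is hypothetical, and the script body is a placeholder rather than the original advancejob.

```shell
#!/bin/sh
# Write a job script whose qsub options are embedded as "#$" lines.
cat > embeddedjob <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -cwd
#$ -S /bin/sh
#$ -o advance.out
#$ -j y
date
echo "arguments: $*"
EOF

# With the options embedded, the command line only needs the script
# name and its arguments.
if command -v qsub >/dev/null 2>&1; then
    qsub embeddedjob arg1 arg2 arg3
else
    echo "qsub not found; created embeddedjob only"
fi
```

To the shell the #$ lines are ordinary comments, so the same file still runs unchanged with sh embeddedjob.
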
Altering the job
  - You can alter job parameters after the job has been queued
  - Only some parameters can be altered after the job has been launched!
  - Use the qalter command to alter a job; it takes the same arguments and options as qsub

Altering the job parameter
  - Please consult the man page (man qalter) for the list of options that can be altered after the job has launched (in t or r state)

Job suspension
  - You can put a job on hold at any time
  - Holding a queued job stops that job from being launched
  - When to hold a job?
      - You need to run another, more important job, but the old job would consume all resources
      - The admin wants to hold some jobs because they consume too many resources on the system

Job suspension (cont.)
  - Use the qhold command to hold a job
        qhold <job id>
  - Use the qrls command to release a held job
        qrls <job id>

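The hold and release cycle can be rehearsed with the sketch below. The helpers run and hold_and_release and the DRY_RUN switch are hypothetical conveniences; only qhold, qstat, and qrls are SGE commands. With DRY_RUN=1 the commands are printed rather than executed, so the sequence can be previewed on a machine without SGE.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hold a job, show the job list (a held queued job is listed as hqw),
# then release the hold again.
hold_and_release() {
    run qhold "$1" || return 1
    run qstat
    run qrls "$1"
}

# Preview the sequence for a made-up job id.
( DRY_RUN=1; hold_and_release 42 )
```

On a real cluster, call hold_and_release <job id> without DRY_RUN, using an id taken from qstat.
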
The qhost command
  - Use the qhost command to see the nodes that are online in SGE
        qhost
  - Try supplying the -j option and see what happens (try it after submitting some jobs)

qmon: SGE in Graphics Mode
  - The previous section introduced SGE via the command line
  - SGE can also be used comfortably through a Graphical User Interface (GUI), qmon
  - The facilities provided by qmon include submitting jobs, managing jobs, managing hosts, and managing job queues

Running qmon
  - qmon requires X-Windows for its GUI
  - Start X-Windows with
        startx
  - Start qmon with
        qmon

Submitting a Job via QMON
  - Click the job submission button; the submit job window will appear

Job Control via QMON
  - Click the job control button to view job status and control jobs

Queue Control
  - Each compute node usually has exactly one queue, but you can add more queues or remove existing ones
  - Slot management
      - A slot is a unit of queue capacity: the number of slots is the number of jobs the queue can run concurrently
      - A typical setting: number of slots of a queue = number of processors of the compute node

Queue Control via SGE
  - Click the queue control button to control queues

Queue Control via SGE (cont.)
  - This icon represents a queue named compute0, prepared for a host named comp-pvfs-0-0
  - This queue has only one slot
  - You can modify the properties of this queue by highlighting its icon and clicking the Modify button
  * Normal users cannot control queues

Queue Control via SGE (cont.)
  - Modify the properties of a queue
  - Try modifying the number of slots

Lab 1: Batch scheduler
  - Write a small program that calculates a multiplication table
      - Save the file as multab.c
      - The program takes one argument: the number used to generate the multiplication table
            multab 2 (generates the multiplication table for the number 2)
      - Print the multiplication table to standard output
  - Use SGE to submit the job
      - Calculate the multiplication tables of 2 to 12

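One possible way to drive the submission part of the lab is sketched below, assuming multab has already been compiled in the current directory. The job script name multabjob, the helper submit_tables, the output file names, and the DRY_RUN switch are all hypothetical; the program itself is left to the reader.

```shell
#!/bin/sh
# Job script: runs the lab program with the number passed as first argument.
cat > multabjob <<'EOF'
#!/bin/sh
#$ -cwd
./multab "$1"
EOF

# Hypothetical helper: submit one job per number from $1 to $2.
# With DRY_RUN=1 the qsub commands are printed instead of executed.
submit_tables() {
    n=$1
    while [ "$n" -le "$2" ]; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "qsub -o multab.$n.out -j y multabjob $n"
        else
            qsub -o "multab.$n.out" -j y multabjob "$n"
        fi
        n=$((n + 1))
    done
}

# Preview the eleven submissions for the tables of 2 to 12.
( DRY_RUN=1; submit_tables 2 12 )
```

Running submit_tables 2 12 without DRY_RUN on a submit host queues the jobs; each table should then land in its own multab.<n>.out file.
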
The End