The SUN ONE Grid Engine Batch System
Juan Luis Chaves Sanabria
Centro Nacional de Cálculo Científico (CeCalCULA)
Latin American School in HPC on Linux Cluster
October 27 - November 7, 2003

What is SGE?
- Cluster resource management software.
- Accepts jobs submitted by users and schedules them for execution on the cluster according to resource management policies (who gets how much of which resource, and when).
- Distributes jobs so that the workload is spread evenly across the cluster.
Who develops SGE?
- SGE is developed by Sun Microsystems.
    http://www.sun.com/gridware
    http://gridengine.sunsource.net
- Sun acquired Gridware, a developer of Distributed Resource Management (DRM) software, in July 2000.
- Sun releases SGE as a free downloadable binary for Solaris and Linux to facilitate the deployment of compute farms.
- Source code is available: SGE is an open source project intended to enable the grid computing model.

SGE 5.3 supported platforms
- Compaq Tru64 Unix 5.0, 5.1
- Hewlett Packard HP-UX 10.20, 11.00
- IBM AIX 4.3.x
- Linux x86, kernel 2.4, glibc 2.2
- Linux Alpha/AXP, kernel 2.2, glibc 2.2
- SGI IRIX 6.2 - 6.5
- Sun Solaris (SPARC) 2.6, 7, 8, 9, 32-bit
- Sun Solaris (SPARC) 2.6, 7, 8, 9, 64-bit
- Sun Solaris (x86) 8
How does the system operate?
- SGE accepts job requests for computing resources (each job carries its own requirement profile).
- Job requests are kept in a holding area until they can be executed.
- When a job is ready to run, the request is forwarded to the appropriate execution device(s).
- SGE manages the execution of the request.
- SGE logs a record of the execution once the job has finished.

SGE Components
Hosts:
- Master (sge_qmaster and sge_schedd): controls all SGE components and the overall cluster activity.
- Execution (sge_execd): authorized to execute jobs through SGE.
- Administration: designated to carry out any kind of administrative task for the SGE system.
- Submit: used for submitting (qsub) and controlling (qstat, qdel, qhold, qrls, ...) batch jobs.
SGE Components (2)
Queues:
- A queue is a container for a class of jobs (batch, parallel, interactive, checkpointing) allowed to execute concurrently on a particular host.
- Commands applied to a queue affect all jobs associated with it.

SGE Components (3)
Queues (2): Properties
- name: the queue's name
- hostname: the host machine of the queue
- processors: on a multiprocessor system, the processors to which the queue has access
- qtype: the type of jobs permitted to run in this queue (Interactive, Batch, Parallel, Checkpointing)
- slots: the number of jobs that can run concurrently in the queue
SGE Components (4)
Queues (3): Properties (2)
- owner_list: the queue's owners
- user_lists: IDs of the users or groups who may access the queue
- xuser_lists: IDs of the users or groups who may not access the queue
- complex_list: the complexes associated with the queue
- complex_values: the capacities this queue provides for certain complex attributes

SGE Components (5)
- Complex: a set of features (resources) associated with a queue, a host, or the entire cluster that are known to SGE.
- Cell: each loosely separated SGE cluster, with its own configuration and master machine. The SGE_CELL environment variable makes it possible to distinguish among clusters.
SGE functionality
SGE is controlled by four daemons, plus sge_shadowd where a shadow master host is configured:
- sge_qmaster: controls all of the cluster's management and scheduling activities.
    - Receives scheduling decisions from sge_schedd.
    - Requests actions from sge_execd on the execution hosts.
    - Maintains tables about the cluster status.
- sge_shadowd: used only if a backup host (shadow master host) exists to take over the functionality of sge_qmaster.

SGE functionality (2)
- sge_schedd: maintains an up-to-date view of the cluster's status from the data provided by sge_qmaster. It:
    - Decides which jobs are forwarded to which queues.
    - Communicates these decisions to sge_qmaster, which initiates the appropriate actions.
SGE functionality (3)
- sge_execd: is responsible for the queues on its host and for the execution of the jobs in these queues. It sends information about job status and the load on its host to the master host (sge_qmaster).
- sge_commd: all daemons communicate with each other through the communication daemons (one per host).

SGE functionality (4)
[Figure: a master host (sge_qmaster, sge_schedd, sge_commd) and execution hosts (sge_execd, sge_commd) holding queues q1 to q5, connected through a switch.]
Using SGE
What can be done with SGE depends on the type of user executing the command. SGE defines four types of users:
- Managers: have full capabilities to manipulate SGE.
- Operators: can execute the same commands as managers, except those that change the SGE configuration.
- Owners: are defined per queue and can manipulate the queues they own and the jobs within them.
- Users: can manage only their own jobs and can use only the queues or parallel environments for which they are authorized.

Using SGE (2)
Restrictions by command and user type (an empty cell means no restriction is listed):
    Command   Manager   Operator                  Owner       User
    qacct
    qalter
    qconf               No system setup changes   Show only   Show only
    qdel
    qhold
    qhost
    qlogin
Using SGE (3)
    Command   Manager   Operator                  Owner                            User
    qmod                                          Own jobs and owned queues only
    qmon                No system setup changes   No configuration changes         No configuration changes
    qrls
    qselect
    qsh
    qstat

Submitting Jobs
Prerequisites
- Ensure that your .[t]cshrc or .bashrc executes no commands that need a terminal (tty) when no terminal is present:
  bash, sh or ksh:
      tty -s
      if [ $? = 0 ]; then
          stty erase ^H
      fi
  csh or tcsh:
      tty -s
      if ( $status == 0 ) then
          stty erase ^H
      endif
Submitting Jobs (2)
Prerequisites (2)
- Ensure that your .[t]cshrc or .bashrc sets the executable search path and the other SGE environment settings:
  csh or tcsh:
      source <sge_root_dir>/default/common/settings.csh
  bash, sh or ksh:
      . <sge_root_dir>/default/common/settings.sh

Submitting Jobs (3)
- Specify which script should be executed:
      qsub -cwd job_script
  -cwd: run the job from the current working directory (default: $HOME).
- In the simplest case the job script contains one line, the name of the executable.
- Various examples are available in <sge_root_dir>/examples/jobs/
- Many options are available for qsub; see:
      man qsub
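As a minimal sketch of the simplest case described above, the following creates a one-line job script and submits it from the current directory. The script name simple_job.sh and the use of hostname as the executable are placeholders chosen for illustration; on a machine where qsub is not on the search path, the command is only printed.

```shell
#!/bin/sh
# Sketch only: simple_job.sh and the hostname command are illustrative placeholders.
job_script=simple_job.sh

# The simplest job script: a single line naming the executable.
echo "hostname" > "$job_script"

# Submit from the current working directory if SGE is set up in this shell.
if command -v qsub >/dev/null 2>&1; then
    qsub -cwd "$job_script"
else
    echo "qsub not found; would run: qsub -cwd $job_script"
fi
```

With -cwd, the default output and error files of the job should appear next to the script rather than in $HOME.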
Submitting Jobs (4)
Example of a script file:
    #!/bin/csh
    # Stage the input into a per-user scratch directory
    set WORKDIR = /tmp/scratch/$USER
    set DATADIR = $HOME/data
    mkdir -p $WORKDIR
    cp $DATADIR/input_data $WORKDIR
    cd $WORKDIR
    # Run the program, then copy the result back and clean up
    executable < input_data > out_executable
    cp out_executable $DATADIR
    rm -rf $WORKDIR

Submitting Jobs (5)
Output and error redirection:
- Default standard output filename: <Job_name>.o<Job_id>
  Can be changed with the -o option.
- Default standard error filename: <Job_name>.e<Job_id>
  Can be changed with the -e option.
Active SGE comments in script files:
- By default they are identified by the prefix #$
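Active comments let a script carry its own qsub options. The sketch below assumes the default #$ prefix; the job name demo_job and the output file names are made up for illustration. Because the directives are ordinary comments to the shell, the script also runs unchanged outside SGE.

```shell
#!/bin/sh
# Sketch only: the job name and file names below are illustrative.
#$ -N demo_job
#$ -cwd
#$ -o demo_job.out
#$ -e demo_job.err

# JOB_ID is set by SGE inside a running job; fall back to a label when run by hand.
job_label=${JOB_ID:-interactive}
echo "demo_job started as job $job_label"
```

Options given on the qsub command line can be combined with the ones embedded in the script, so the file can hold the settings that rarely change.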
Submitting Jobs (6)
Array jobs:
- Are parametrized executions of the same script.
- SGE views them as an array of independent tasks joined into a single job.
- task_id is the array job task index number.
- Each task can read the environment variable $SGE_TASK_ID to retrieve its own task index number and use it to access the input data set arranged for that task_id.

Submitting Jobs (7)
Array jobs (2):
- Example:
      qsub -l h_cpu=0:30:0 -t 2-10:2 script.sh input.data
- Default standard output filename: <Job_name>.o<Job_id>.<Task_id>
- Default standard error filename: <Job_name>.e<Job_id>.<Task_id>
- Can be monitored and controlled as a whole, by individual task, or by subset of tasks.
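To make the task range in the example concrete, here is a small sketch that lists the task indices selected by a range such as 2-10:2 and shows how a task might pick its input from $SGE_TASK_ID. The helper expand_tasks and the input.<n> naming scheme are hypothetical and not part of SGE.

```shell
#!/bin/sh
# Sketch only: expand_tasks and the input.<n> file names are illustrative, not part of SGE.

# Print the task indices selected by a "first-last:step" range, as passed to qsub -t.
expand_tasks() {
    range=$1
    first=${range%%-*}
    rest=${range#*-}
    last=${rest%%:*}
    case $rest in
        *:*) step=${rest#*:} ;;
        *)   step=1 ;;
    esac
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s\n' "$i"
        i=$((i + step))
    done
}

# Inside a task SGE sets SGE_TASK_ID; default to 2 so the script can be tried by hand.
task_id=${SGE_TASK_ID:-2}
input_file="input.$task_id"
echo "task $task_id would read $input_file"

# List the tasks created by the range used in the example above.
echo $(expand_tasks 2-10:2)    # prints: 2 4 6 8 10
```

The range 2-10:2 therefore yields five tasks, each running the same script with a different value of $SGE_TASK_ID.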
Submitting Jobs (8)
Interactive jobs:
- Are executed on interactive queues.
- Three ways are available:
    - qlogin: starts a telnet-like session on a host chosen by SGE.
    - qrsh: works like the UNIX commands rsh or rlogin.
    - qsh: brings up an xterm whose display is taken from the DISPLAY environment variable. If this variable is not set, the xterm is directed to screen 0.0 of the X server on the host from which the interactive job was submitted. The display can also be set with the -display option.

Monitoring and Controlling Jobs
- qstat: shows job/queue status
    - Without arguments: shows running/pending jobs
    - -j: shows detailed information on running/pending jobs
    - -f: shows submitted jobs and a full listing of all queues
- qhost: shows job/host status
    - Without arguments: shows all execution hosts and their configuration
    - -q: shows detailed information on the queues at each host
Monitoring and Controlling Jobs (2)
- qdel: cancels jobs submitted through SGE
      qdel <job_id>
- qmod: suspends/unsuspends running jobs
      qmod -s <job_id>     (suspend)
      qmod -us <job_id>    (unsuspend)
- qhold: holds back pending jobs from execution
- qrls: releases jobs from holds previously assigned to them

Parallel Jobs
- Are submitted to run in parallel environments.
- A parallel environment is a set of procedures that fulfils the requirements for running a specific parallel application.
- One parallel environment is configured for each class or type of parallel application on the cluster.
Parallel Jobs (2)
- qconf -ap <parallel environment name>
  creates a new parallel environment
- qconf -spl
  lists all defined parallel environments
- qconf -sp <parallel environment name>
  shows detailed information on the specified parallel environment

Parallel Jobs (3)
Parallel environment example:
    $ qconf -sp mpich
    pe_name            mpich
    queue_list         all
    slots              8
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /usr/local/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
    stop_proc_args     /usr/local/sge/mpi/stopmpi.sh
    allocation_rule    $round_robin
    control_slaves     TRUE
    job_is_first_task  FALSE
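The start_proc_args entry hands $pe_hostfile to the start-up script, which is expected to produce the machine file that mpirun later reads. As a rough sketch of that conversion, assuming each line of the PE host file begins with a host name followed by the number of slots granted on it (further fields are ignored), a host can be repeated once per slot. The function name, host names and sample file are invented here, and the startmpi.sh shipped with SGE may differ in detail.

```shell
#!/bin/sh
# Sketch only: hostfile_to_machines and the sample hosts are illustrative.

# Repeat each host name once per slot granted on it.
hostfile_to_machines() {
    while read -r host slots rest; do
        [ -n "$host" ] || continue
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Sample PE host file (assumed layout: host, slots, then fields this sketch ignores).
sample=$(mktemp)
cat > "$sample" <<'EOF'
node01 2 node01.q UNDEFINED
node02 1 node02.q UNDEFINED
EOF

hostfile_to_machines "$sample"
rm -f "$sample"
```

For the sample above, the resulting machine file would list node01 twice and node02 once, matching three granted slots.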
Parallel Jobs (4)
Script example:
    #!/bin/csh
    #
    # (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
    #
    # our name
    #$ -N MPI_calc_PI_Job
    #
    # pe request
    #$ -pe mpich 2-6
    #
    #$ -v MPIR_HOME=/usr/local/mpich
    #
    # needs in
    #   $NSLOTS
    #       the number of tasks to be used
    #   $TMPDIR/machines
    #       a valid machine file to be passed to mpirun
    #
    echo "Got $NSLOTS slots."
    #
    $MPIR_HOME/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines $HOME/MPI/cpi

Checkpointing
- SGE supports two classes of checkpointing:
    - User-level checkpointing
    - Operating-system-level checkpointing
- A checkpointing environment must be defined for each type of application that supports checkpointing.
- When a checkpointing job is submitted, this must be indicated with the -ckpt option of the qsub command.
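User-level checkpointing leaves it to the application to save and restore its own state. The following sketch illustrates the idea with a loop that records its progress in a state file and resumes from it when restarted. The function name, the state file, the step count, and the environment name my_ckpt in the comment are all hypothetical.

```shell
#!/bin/sh
# Sketch only: the state file, step count and my_ckpt environment name are illustrative.
# A job like this might be submitted as: qsub -ckpt my_ckpt job.sh

# Run steps 1..total, resuming after the last step recorded in the state file.
run_with_checkpoint() {
    state_file=$1
    total=$2
    step=0
    # Restore progress left behind by an earlier, interrupted run.
    if [ -f "$state_file" ]; then
        step=$(cat "$state_file")
        step=${step:-0}
    fi
    while [ "$step" -lt "$total" ]; do
        step=$((step + 1))
        echo "working on step $step"
        # Record the completed step so that a restart can skip it.
        echo "$step" > "$state_file"
    done
    rm -f "$state_file"
}

state=$(mktemp)
echo 3 > "$state"              # pretend a previous run stopped after step 3
run_with_checkpoint "$state" 5
```

Run as shown, the loop skips the three recorded steps and only works on steps 4 and 5.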
Checkpointing (2)
Checkpointing environments are defined in configuration files, which:
- Define the operations for:
    - initiating the generation of a checkpoint
    - migrating a checkpointing job to another host
    - restarting a checkpointed application
- List the queues that are eligible for a checkpointing method.

Checkpointing (3)
Checkpoint environment file format:
    ckpt_name        <name>
    interface        user defined or OS provided
    ckpt_command     command to initiate the checkpoint
    migr_command     command used during the migration of a checkpointing job from one host to another
    restart_command  command used to restart a previously checkpointed application
    clean_command    command used to clean up after a checkpointed application has finished
    ckpt_dir         where checkpoint files should be stored
    queue_list       all, or a comma-separated list of queues
    signal           Unix signal to be sent to a job to initiate a checkpoint generation
    when             when to generate the checkpoints:
                       s (when the node is shut down)
                       m (periodically, at the min_cpu_interval defined by the queue)
                       x (when the job gets suspended)
                       r (the job will be rescheduled, not checkpointed)
SGE Administration
All administration activities on SGE can be carried out through the qmon command. On the command line, the basic forms are:
    qconf -a<h|q|s> <associated arguments>           (add)
    qconf -d<h|q|e|conf|s> <associated arguments>    (delete)
    qconf -m<q|conf> <associated arguments>          (modify)
    qconf -s<h|s|sel|conf> <associated arguments>    (show)

QMON: the SGE GUI