Grid Engine Basics, Version 1 (Formerly: Sun Grid Engine)

Information Technology Services
High Performance Computing Group

Table of Contents

    Document Text Style Associations
    Prerequisites
    Terminology
    What is the Grid Engine (SGE)?
    Loading the SGE Module on Turing
    Basic Job Submission
        Test Script
        qsub Serial Job Submission
            Example with the test script (script.sh):
            Output:
        qsub Parallel Job Submission
            List available Parallel Environments (pe):
            Example:
        Scripted Job Submission
            Shell Script Submission Example:
            OpenMPI Submission Example:
    Job Submission Monitoring
        Alternate
    Additional SGE Commands
    Additional Pre-Written ODU Turing Submission Scripts
        Sample Script Functionality

Document Text Style Associations

    commands
        Text that is blue and set in the Courier New font is a command that can be run from the command line.
    input/output text
        Text that only uses the Courier New font is either input to or output from the command line.
    [keyboard button]
        Words or symbols found within square brackets are buttons located on a standard keyboard.
    {Note: information}
        Text found within curly brackets is additional information that should be further examined.

Prerequisites

    Active MIDAS account (midas.odu.edu)
        The username (MIDASID) is the part of your ODU email address before the @ symbol (for example, the username for sample123@odu.edu would be sample123). The password is the same as the one for your ODU email. MIDAS stands for Monarch IDentification and Authorization System.

    Active LIONS (L2CP) account in MIDAS
        Log into MIDAS (midas.odu.edu) and select the Inactive Services tab. Click LIONS and then click "Click to Activate Service".

    Active HPC Service

        Send an email to (address not shown) with the subject "Activate HPC Service". In the body of the email, include your Full Name, MIDASID (Username), and ODU Number (UIN).
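The username rule from the MIDAS prerequisite can be expressed as a one-line shell expansion. The sketch below is only an illustration: midas_id_from_email is a hypothetical helper name, and the address is the sample one used above.

```shell
#!/bin/bash
# Hypothetical helper (not part of any ODU tooling): print the part of an
# email address that comes before the first "@" symbol.
midas_id_from_email() {
    local email="$1"
    # ${email%%@*} removes the longest suffix that starts with "@".
    printf '%s\n' "${email%%@*}"
}

# Sample address taken from the prerequisite text.
midas_id_from_email "sample123@odu.edu"
```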

Terminology

    SGE
        SGE is an acronym for Sun Grid Engine, the predecessor of Oracle Grid Engine and the open-source parent of Open Grid Scheduler / Grid Engine. In short, SGE is the popular name by which Grid Engine is known.
    job
        a set of instructions/applications that is distributed to a cluster for processing
    script
        a set of commands and/or options written into a file for execution on the cluster
    submission
        requesting a job to run commands, or a script with commands, on the cluster
    source code
        human-readable text in a programming language that in most cases cannot be directly executed
    compiling
        the process of taking human-readable source code and turning it into computer-executable binary code through the use of a compiler
    binary
        an executable application
    distributed jobs
        a set of jobs that can run independently of other jobs' results
    parallel job
        a job that can be distributed, but whose results are dependent upon other jobs

What is the Grid Engine (SGE)?

SGE is an application that both distributes jobs and manages the job submission queues on the Turing cluster. The application is an open-source fork of the Oracle (historically Sun) Grid Engine. The software allows both serial and parallel jobs (the latter with the addition of message passing software) to run over heterogeneous hardware architectures. The individual machines, termed nodes, are further classified by their central/graphical processing unit (CPU/GPU) core capabilities. SGE administrators can then separate those capabilities into queues and parallel environments based on researcher needs. The queues are given various priorities on the cluster to balance resource usage.

Loading the SGE Module on Turing

To have access to the SGE job submission queues and the associated monitoring tools, the paths to the various application scripts must be added to the current user session.

{Note: The Turing cluster adds the SGE paths to user profiles automatically at account creation.}

    module add SGE
        adds SGE paths to the current user session
    module initadd SGE
        permanently adds SGE to all user sessions

Further information on module usage can be found on the ODU High Performance Computing website.
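One way to confirm that the SGE paths are present in the current session is to check whether qsub can be found on the PATH. The following is a minimal sketch; have_command is a hypothetical helper name, and the script only prints a suggestion instead of loading the module itself.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named command is found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command qsub; then
    echo "SGE tools are available in this session."
else
    # The module command is described in the section above.
    echo "qsub not found; try: module add SGE"
fi
```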

Basic Job Submission

There are two (2) primary types of SGE jobs: serial and parallel. A job can either be defined in a submission script or entered directly on the command line. This section gives examples of both.

Test Script

The following shell script prints the current working directory (pwd) and then the name of the computer on which the script is being run (echo $HOSTNAME).

script.sh text:

    #!/bin/bash
    echo " Starting Script "
    echo "The working directory is:"
    pwd
    echo "The hostname of the node is:"
    echo $HOSTNAME
    echo " Done "

[Figure: output of script.sh when run on the login node]

qsub Serial Job Submission

The queue submission (qsub) command is used to submit a batch job to the Turing cluster.

    qsub [options] script_name

Example with the test script (script.sh):

    qsub -cwd script.sh

{Note: the -cwd option stands for current working directory; without this option, any action taken or results returned will use the user's home directory (~).}

Output:

When a script is submitted and run on the cluster through qsub, two (2) files are created by default. The first is a standard output file (script_name.o(job_number)) and the second contains any errors (script_name.e(job_number)).
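The output file names can be worked out from the job number that qsub reports at submission time. The sketch below assumes the usual Grid Engine confirmation message of the form: Your job 12345 ("script.sh") has been submitted. The message string in the code is sample text, not captured output, and both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the job number from a qsub confirmation message.
job_id_from_message() {
    # Print the digits that follow "Your job " (nothing if the line differs).
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: print the default output and error file names.
output_file_names() {
    local script_name="$1" job_id="$2"
    printf '%s.o%s\n' "$script_name" "$job_id"
    printf '%s.e%s\n' "$script_name" "$job_id"
}

# Sample message (assumed format); on the cluster this would come from:
#   message="$(qsub -cwd script.sh)"
message='Your job 12345 ("script.sh") has been submitted'
job_id="$(job_id_from_message "$message")"
output_file_names "script.sh" "$job_id"
```

With the sample message above, the two names printed are script.sh.o12345 and script.sh.e12345.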

qsub Parallel Job Submission

Parallel jobs use the parallel environments component of SGE.

List available Parallel Environments (pe):

    qconf -spl

    qsub [options] -pe pe_name slot_range script_name

Example:

{Note: What is not shown in the example is that source code was compiled to the binary a.out using the mpicc compiler from the OpenMPI module.}

mpi_script text:

    module add openmpi
    mpirun -np $NSLOTS ./a.out

    qsub -cwd -pe openmpi 4-8 mpi_script

Scripted Job Submission

When submitting jobs, it is recommended to use a script that contains all the settings needed. A job submission script becomes particularly helpful when running many jobs, or jobs that need many switches and/or input variables. Below is an example of a submission script.

Shell Script Submission Example:

    #!/bin/bash

    # Add any required modules

    # The batch system should use the current directory as working directory.
    #$ -cwd

    # Redirect output stream to this file.
    #$ -o mpi_output.dat

    # Join the error stream to the output stream.
    #$ -j yes

    # Send status information to this email address.
    #$ -M username@odu.edu

    # Send me an e-mail when the job has finished.
    #$ -m e

    shell_commands

OpenMPI Submission Example:

    #!/bin/bash

    # Add any required modules
    module add openmpi

    # The batch system should use the current directory as working directory.
    #$ -cwd

    # Redirect output stream to this file.
    #$ -o mpi_output.dat

    # Join the error stream to the output stream.
    #$ -j yes

    # Send status information to this email address.
    #$ -M username@odu.edu

    # Send me an e-mail when the job has finished.
    #$ -m e

    # Use the parallel environment "openmpi", which assigns as many processes
    # as available on each host. Request between 4 and 8 slots: the batch system
    # tries to grant 8 and, if that many are not available, runs the job
    # on fewer slots, but not fewer than 4.
    #$ -pe openmpi 4-8

    mpirun -np $NSLOTS ./binary_name
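Because a slot range such as 4-8 means the number of granted slots is only known once the job starts, the scripts above pass $NSLOTS to mpirun instead of a fixed number. The sketch below shows that substitution in isolation; mpirun_command is a hypothetical helper, and the fallback to 1 slot outside of SGE is an assumption made so the snippet can also be tried on a login node.

```shell
#!/bin/bash
# Hypothetical helper: print the mpirun command line for a given binary.
# NSLOTS is set by SGE inside a parallel job; outside a job it is normally
# unset, so this sketch falls back to a single process (an assumption).
mpirun_command() {
    local binary="$1"
    local slots="${NSLOTS:-1}"
    printf 'mpirun -np %s %s\n' "$slots" "$binary"
}

# Simulate a job that was granted 6 slots from the 4-8 range.
( NSLOTS=6; mpirun_command ./a.out )

# Outside of a job, with NSLOTS unset, the fallback applies.
( unset NSLOTS; mpirun_command ./a.out )
```

The helper only prints the command line; inside a real job script the mpirun line would be executed directly, as in the examples above.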

Job Submission Monitoring

When large jobs are submitted to the cluster, they often require processing time greater than a few seconds. In all cases a job can be monitored with the command qstat (or graphically with qmon).

    qstat

The state column provides the current action being taken regarding the submitted job. The states are:

    qw  queued, waiting
    t   transferring
    r   running
    R   restarted
    s   suspended
    S   suspended
    T   threshold

Alternate

    qstat [options] job_name
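The state codes listed above can be turned into readable text with a small lookup. This is a minimal sketch: describe_state is a hypothetical helper, it covers only the codes listed in this section, and any other code (including combined codes that qstat may print) is reported as unknown.

```shell
#!/bin/bash
# Hypothetical helper: translate a state code from this section's list
# into its description. The match is case-sensitive (r and R differ).
describe_state() {
    case "$1" in
        qw)  echo "queued, waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        R)   echo "restarted" ;;
        s|S) echo "suspended" ;;
        T)   echo "threshold" ;;
        *)   echo "unknown" ;;
    esac
}

# Print a short legend for a few codes.
for code in qw r R T; do
    printf '%-3s %s\n' "$code" "$(describe_state "$code")"
done
```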

Additional SGE Commands

    qhost
        shows the current status of the available Grid Engine hosts, queues, and the jobs associated with the queues
    qdel job_number
        removes a job from the queue
    qrsh
        starts an interactive shell as a job {Note: requires X11}

Additional Pre-Written ODU Turing Submission Scripts

The Turing cluster has a set of scripts that have been written to aid users when submitting to various parallel environments. Scripts in this location are continually modified and added.

Pre-written script path: /cm/shared/scripts/

    Sample Script    Functionality
    g09q             used to submit jobs using the Gaussian parallel environment
    matlabx          used to submit jobs using the Matlab parallel environment
    comsol 43b       used to submit jobs using the Comsol3 parallel environment
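Since the scripts in the pre-written script path change over time, listing the directory is the simplest way to see what is currently offered. The sketch below assumes only the path given above; list_submission_scripts is a hypothetical helper, and it reports an error when the directory does not exist (for example, when run away from the cluster).

```shell
#!/bin/bash
# Hypothetical helper: list the files in the pre-written script directory.
# The default path is the one given in this section; another directory can
# be passed as the first argument.
list_submission_scripts() {
    local dir="${1:-/cm/shared/scripts/}"
    if [ -d "$dir" ]; then
        ls -1 "$dir"
    else
        echo "Directory not found: $dir" >&2
        return 1
    fi
}

# On machines without the directory this prints the error and continues.
list_submission_scripts || true
```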