Efficient cluster computing
- Antony Lindsey
1 Efficient cluster computing
Introduction to the Sun Grid Engine (SGE) queuing system
Markus Rampp (RZG, MIGenAS)
MPI for Evolutionary Anthropology, Leipzig, Feb. 16, 2007
2 Outline
- Introduction
- Basic concepts: queues, jobs, scripts
- Essential SGE commands and options
- Advanced topics
  - Job chains
  - Array jobs
  - DRMAA API
- Tips & Tricks, References
Not covered: SGE configuration & administration, policies, accounting, grid computing, MPI, ...
3 Introduction
Sun Grid Engine (SGE): a popular batch-queuing system
"Software like SGE is typically used on a computer farm or computer cluster and is responsible for accepting, scheduling, dispatching, and managing the remote execution of large numbers of standalone, parallel or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses." (taken from Wikipedia)
Popular batch systems (DRMs):
- Sun Grid Engine (open source)
- LoadLeveler (IBM)
- NQS (Cray, NEC)
- DQS (open source)
- ...
4 Introduction (2)
Why should one use a DRM? To increase efficiency.
Operator's perspective:
- transparent resource management
- clustering of compute resources
- load balancing, optimization of resource usage
- fair (policy-based) distribution of resources
- accounting
User's perspective:
- shared usage of system resources
- optimized throughput
- organize/simplify handling of "large" computational tasks
- enhanced stability (survive system crashes, maintenance, ...)
- well-defined resource allocation (benchmarking)
- facilitates (non-interactive) work
5 Basic concepts
Queues:
[Figure: queues (Queue 1, Queue 2, Queue 3) attached to Resource A and Resource B]

    queuename                        qtype used/tot. load_avg arch        states
    all.q@e01.bc.rzg.mpg.de          BIP   0/                 lx26-x86    d
    all.q@e02.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    all.q@e03.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    all.q@e04.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    all.q@e05.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    all.q@e06.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    all.q@e07.bc.rzg.mpg.de          BIP   0/                 lx26-x86
    normal@eva001.opt.rzg.mpg.de     B     0/                 lx26-amd64
    normal@eva002.opt.rzg.mpg.de     B     0/                 lx26-amd64
    normal@eva003.opt.rzg.mpg.de     B     0/                 lx26-amd64
    normal@eva004.opt.rzg.mpg.de     B     0/                 lx26-amd64
    normal@eva005.opt.rzg.mpg.de     B     0/                 lx26-amd64
6 Basic concepts (2)
Jobs & scripts:
1. prepare a script of executable commands
2. specify resources and meta information
3. submit to the batch system (returns a job ID)
4. use the job ID for job control (query status, cancel, ...)

    #$ -S /bin/sh
    #$ -cwd
    #$ -M mjr@rzg.mpg.de
    #$ -m e
    #$ -N example
    #begin executable commands (shell specified by #$ -S)
    # note: starting here, a leading # starts
    # a comment, whereas in the above
    # SGE header it does NOT
    echo "starting job..."
    blastall -p blastp -d nr -i query_1.fa -o blastout_1.txt
    blastall -p blastp -d nr -i query_2.fa -o blastout_2.txt
    echo "...done"

    > qsub example_1.sge
    Your job ("example") has been submitted.
    > qstat
    job-id prior name    user state submit/start at queue                   slots ja-task-id
                 example mjr  qw    02/12/ :43:42                           1
    > qstat
    job-id prior name    user state submit/start at queue                   slots ja-task-id
                 example mjr  r     02/12/ :19:26   all.q@e13.bc.rzg.mpg.de 1
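Steps 3 and 4 above can be scripted by capturing the job ID from the message that qsub prints. The following is a minimal sketch, not part of the original slides: the helper name `parse_job_id` is hypothetical, and the two message formats in the comments (including the job-array form and the job ID 10422) are assumptions about typical SGE 6 output. The qsub/qstat/qdel calls are commented out so the sketch also runs on a machine without SGE.

```shell
#!/bin/sh
# Extract the numeric job ID from the message qsub prints on submission.
# Assumed message formats (illustrative IDs):
#   Your job 10404 ("example") has been submitted.
#   Your job-array 10422.1-500:1 ("blastarray") has been submitted.
parse_job_id() {
    printf '%s\n' "$1" |
        sed -n 's/^Your job\(-array\)\{0,1\} \([0-9][0-9]*\).*/\2/p'
}

# Typical use on a cluster (commented out so the sketch runs anywhere):
#   msg=$(qsub example_1.sge)
#   jobid=$(parse_job_id "$msg")
#   qstat -j "$jobid"     # query status
#   qdel "$jobid"         # cancel

parse_job_id 'Your job 10404 ("example") has been submitted.'
```

Parsing command output like this is fragile, which is one motivation for the DRMAA API discussed later in the talk.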
7 SGE commands & options
Interacting with the queuing system: SGE's q-commands (cf. the ll-commands of LoadLeveler)

    SGE command    Purpose                                                   LoadLeveler
    qsub           submit job                                                llsubmit
    qstat          query queue/job status                                    llq, llstatus
    qdel           delete job                                                llcancel
    qhold          hold ("suspend") job (note: user/operator/system holds)   llhold
    qrls           release holds                                             llhold -r
    qalter, qmod   modify job                                                llmodify
    qhost          provide concise system overview
    qmon           graphical user interface (X)
8 SGE commands & options (2)
Specify qsub options in the script header and/or on the command line (the command line overrides the script).
Essential options for qsub:
- -S: path to shell
- -m b|e|a|s|n ...: send mail at beginning, end, ... of job
- -M: address for notification
- -N: name of job
- -j y: join stdout and stderr
Additional options for qsub:
- -q: queue
- -p: priority (default 0; users may only decrease)
- -P: name of project
- -a: earliest date/time at which a job is eligible for execution
- ...: cf. man qsub
9 SGE commands & options (3)
Commonly used options for qstat:
- qstat: displays list of jobs only
- qstat -u <user>, qstat -j <job ID>: displays list of jobs for the specified user/job
- qstat -f: full format display
- qstat -r: extended display (incl. resource requirements, scheduling info)

    > qstat -f
    queuename                  qtype used/tot. load_avg arch      states
                               BIP   0/                 lx26-x86  d
    all.q@e02.bc.rzg.mpg.de    BIP   0/                 lx26-x86
    ...
    all.q@e07.bc.rzg.mpg.de    BIP   0/                 lx26-x86
    all.q@e08.bc.rzg.mpg.de    BIP   2/                 lx26-x86
        megablast hfz r 02/13/ :34:
        megablast hfz r 02/13/ :34:
    all.q@e09.bc.rzg.mpg.de    BIP   2/                 lx26-x86
        megablast hfz r 02/13/ :28:
        megablast hfz r 02/13/ :31:
    ############################################################################
     - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
    ############################################################################
        megablast hfz qw 02/13/ :34: :1
10 SGE commands & options (4)

    > qstat -r -j
    ==============================================================
    job_number:
    exec_file:        job_scripts/10422
    submission_time:  Tue Feb 13 16:34:
    owner:            hfz
    uid:              1553
    group:            rzb
    gid:              4131
    sge_o_home:       /afs/ipp/home/h/hfz
    sge_o_log_name:   hfz
    sge_o_path:       /opt/sge6/bin/lx26-x86:/usr/local/bin:/opt/gnome/bin:/usr/games:/usr/bin/x11:/usr/bin:/bin
    sge_o_shell:      /bin/tcsh
    sge_o_workdir:    /bio/tmp/hfz/sargossa
    sge_o_host:       e01
    account:          sge
    cwd:              /bio/tmp/hfz/sargossa
    path_aliases:     /tmp_mnt/ * * /
    mail_list:        hfz@e01.bc.rzg.mpg.de
    notify:           FALSE
    job_name:         megablast
    jobshare:         0
    shell_list:       /bin/sh
    env_list:
    script_file:      /afs/ipp/home/h/hfz/mysql/sequenzen/e01/submit_megablast_test.sge
    project:          gendb
    job-array tasks:  :1
    usage 808:        cpu=00:08:46, mem= GBs, io= , vmem=1.377g, maxvmem=1.566g
    ..
    usage 875:        cpu=00:00:24, mem= GBs, io= , vmem=1.251g, maxvmem=1.251g
    scheduling info:  queue instance "all.q@e01.bc.rzg.mpg.de" dropped because it is disabled
                      queue instance "all.q@f12.bc.rzg.mpg.de" dropped because it is
                      queue instance "all.q@e13.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@f08.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@f01.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@e14.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@f03.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@f05.bc.rzg.mpg.de" dropped because it is full
                      queue instance "all.q@e09.bc.rzg.mpg.de" dropped because it is full
                      (project gendb) is not allowed to run in host "e07.bc.rzg.mpg.de" based on the excluded project list
                      not all array task may be started due to max_aj_instances
11 Input/Output
Output:
- stdout: <job name>.o<job ID>
- stderr: <job name>.e<job ID>
- path: can be specified by qsub -o <stdout path> -e <stderr path>
- paths are relative to
  - the current working directory at submission (with the qsub -cwd option)
  - the user's home directory (if the -cwd option is not specified)

    > ls
    example.e10404 example.o10404 example_1.sge

Input arguments: qsub [ options ] [ command | -- [ command_args ]]

    > qsub -p -10 example_1.sge arg1
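The naming convention for output files can be expressed as a small helper, which is handy when a post-processing script has to locate the files of a finished job. This is only a sketch: the function name `sge_output_names` is hypothetical, and the optional third argument follows the per-task naming (<job name>.o<job ID>.<task ID>) that the talk gives for array jobs.

```shell
#!/bin/sh
# Print the default stdout and stderr file names of a job.
# Usage: sge_output_names <job name> <job ID> [<task ID>]
sge_output_names() {
    name=$1
    id=$2
    task=${3:-}
    suffix=${task:+.$task}          # empty for non-array jobs
    printf '%s.o%s%s\n' "$name" "$id" "$suffix"
    printf '%s.e%s%s\n' "$name" "$id" "$suffix"
}

sge_output_names example 10404
```

For the job shown on this slide, the call prints example.o10404 and example.e10404, matching the `ls` listing.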
12 Advanced topics
- Job chains: sets of consecutive interdependent jobs
- Job arrays: sets of similar and independent (parallel) jobs
- DRMAA: API specification
13 Job chains: sets of consecutive jobs
Solution 1 (trivial)

    > cat allinone.sge
    #$ -S /bin/sh
    #$ -N allinone
    ./doformatdb
    ./doblastall
    ./dopostprocessing
    > qsub allinone.sge
    Your job ("allinone") has been submitted.

Solution 2 (modular, nested qsub)

    > cat formatdb.sge
    #$ -S /bin/sh
    #$ -N FormatDB
    ./doFormatDB
    qsub blastall.sge
    > cat blastall.sge
    #$ -S /bin/sh
    #$ -N BlastAll
    ./doBlastAll
    qsub postprocessing.sge
    ...
    > qsub formatdb.sge
    Your job ("formatdb") has been submitted.
14 Job chains: sets of consecutive jobs (2)
Solution 3 (optimized, uses -hold_jid <job id | job name>)

    > cat formatdb.sge
    #$ -S /bin/sh
    #$ -N FormatDB
    ./doFormatDB
    > cat blastall.sge
    #$ -S /bin/sh
    #$ -N BlastAll
    #$ -hold_jid FormatDB
    ./doBlastAll
    ...
    > qsub formatdb.sge
    Your job ("formatdb") has been submitted.
    > qsub blastall.sge
    Your job ("blastall") has been submitted.
    > qsub postprocessing.sge
    Your job ("postprocessing") has been submitted.

Advantage: all jobs of the chain are queued at once and accumulate waiting time from the moment of submission.
Note: -hold_jid <job_name> can only be used to reference jobs of the same user (-hold_jid <job_id> can be used to reference any job).
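The -hold_jid approach can also be driven from the command line, without editing the script headers. The sketch below only prints the qsub commands for a linear chain (a dry run), so it runs without SGE; the helper name `chain_commands` is hypothetical, and naming each job after its script via -N is an assumption made here so that -hold_jid has a predictable name to refer to.

```shell
#!/bin/sh
# Print (not run) the qsub commands for a linear chain of job scripts.
# Each job is named after its script (without the .sge suffix) and is
# held until the previous job in the chain has finished.
chain_commands() {
    prev=
    for script in "$@"; do
        name=$(basename "$script" .sge)
        if [ -z "$prev" ]; then
            echo "qsub -N $name $script"
        else
            echo "qsub -N $name -hold_jid $prev $script"
        fi
        prev=$name
    done
}

chain_commands formatdb.sge blastall.sge postprocessing.sge
```

On a cluster, the printed commands could be reviewed and then executed, for example by piping the output into `sh`. Because command-line options override the script header, the -N given here takes precedence over any `#$ -N` line in the scripts.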
15 Array jobs
Submit sets of similar and independent "tasks":

    qsub -t 1-500:1 example_3.sge

- submits 500 instances of the same script
- each instance ("task") is executed independently
- all instances are subsumed under a single job ID
- the variable $SGE_TASK_ID discriminates between instances
- task numbering scheme: -t <first>-<last>:<stepsize>
- related: $SGE_TASK_FIRST, $SGE_TASK_LAST, $SGE_TASK_STEPSIZE
Example:

    #$ -S /bin/sh
    #$ -cwd
    #$ -N blastarray
    #$ -t 1-500:1
    QUERY=query_${SGE_TASK_ID}.fa
    OUTPUT=blastout_${SGE_TASK_ID}.txt
    echo "processing query $QUERY..."
    blastall -p blastn -d nt -i $QUERY -o $OUTPUT
    echo "...done"
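With a step size larger than 1, each task can be made responsible for a whole block of indices. The sketch below illustrates one way to do this; the helper name `task_range` is hypothetical, and the convention that a task handles the indices from its own ID up to the next task's ID (capped at the last index) is an assumption about how one would choose to use the step size, not something SGE enforces.

```shell
#!/bin/sh
# Compute the inclusive index range that one array task should process
# when the job was submitted with -t <first>-<last>:<stepsize>.
# Arguments: <task ID> <stepsize> <last index of the whole array>
task_range() {
    start=$1
    end=$(( $1 + $2 - 1 ))
    if [ "$end" -gt "$3" ]; then
        end=$3                      # the final block may be shorter
    fi
    echo "$start $end"
}

# Inside a job script one would call (variables set by SGE):
#   task_range "$SGE_TASK_ID" "$SGE_TASK_STEPSIZE" "$SGE_TASK_LAST"
task_range 21 10 25
```

For a job submitted with `-t 1-25:10`, the task IDs are 1, 11 and 21, and the last task covers indices 21 to 25 only.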
16 Array jobs (2)

    > qsub example_3.sge
    Your job :1 ("blastarray") has been submitted.
    > qstat
    job-id prior name       user state submit/start at queue                   slots ja-task-id
                 blastarray mjr  r     02/13/ :05:56   all.q@e08.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:56   all.q@e08.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :07:11   all.q@e09.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :07:11   all.q@e09.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:41   all.q@e10.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:41   all.q@e10.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:41   all.q@e11.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:41   all.q@e11.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:11   all.q@e12.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:11   all.q@e12.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :02:11   all.q@e13.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :02:11   all.q@e13.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:26   all.q@e14.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:26   all.q@e14.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :07:11   all.q@f01.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :07:11   all.q@f01.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:11   all.q@f02.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:11   all.q@f02.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :04:41   all.q@f03.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :04:41   all.q@f03.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:41   all.q@f04.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:41   all.q@f04.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:11   all.q@f05.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :08:11   all.q@f05.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:11   all.q@f06.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :05:26   all.q@f06.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :04:26   all.q@f07.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :04:26   all.q@f07.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:56   all.q@f08.bc.rzg.mpg.de
                 blastarray mjr  r     02/13/ :03:56   all.q@f08.bc.rzg.mpg.de
                 blastarray mjr  qw    02/13/ :28:                                   :1
17 Array jobs (3)
Benefits:
- simple organization
- simple interaction with the job (single job ID)
- optimized throughput (see, e.g., qconf -sconf for jobs-per-user limits, etc.)
- powerful tool for (trivially) parallel applications
Notes:
- one stdout/stderr file per task
  - stdout: <job name>.o<job ID>.<task ID>
  - stderr: <job name>.e<job ID>.<task ID>
- task-specific $TMPDIR
- $SGE_TASK_ID (and its relatives) are undefined for non-array jobs
- allocate reasonable chunks of work to tasks
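Because $SGE_TASK_ID is undefined outside array jobs, a script that should also work as a plain job or in an interactive test needs a fallback. The sketch below shows one defensive pattern; the helper name `effective_task_id` is hypothetical. It treats an unset variable, the literal string "undefined" (which, to my understanding, Grid Engine sets for non-array jobs), and any other non-numeric value the same way.

```shell
#!/bin/sh
# Return a usable task index: the array task ID inside an array job, or a
# fallback (default 1) when the script runs as a plain job or interactively.
# Usage: effective_task_id [<fallback>]
effective_task_id() {
    id=${SGE_TASK_ID:-undefined}
    case $id in
        ''|*[!0-9]*) echo "${1:-1}" ;;   # unset, "undefined", or non-numeric
        *)           echo "$id" ;;
    esac
}

TASK=$(effective_task_id)
echo "processing query_${TASK}.fa"
```

Run outside the batch system, the script falls back to task 1, so the same file can be tested interactively before it is submitted with -t.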
18 Excursus: load balancing
[Figure: the total work is split into chunks 1 to 5, which are distributed over PE 1 to PE 3 along a time axis; the diagram marks per-chunk overhead and idle time. Quantities shown: number of PEs, number of chunks, t_tot, t_overhead.]
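The trade-off sketched in the figure can be explored with a toy calculation. The model below is an assumption made for illustration and is not taken from the slides: all chunks are equally large, each chunk adds a fixed overhead, and the PEs work through the chunks in synchronized rounds. The function name `estimated_time` and the numbers are hypothetical.

```shell
#!/bin/sh
# Estimated total wall-clock time under a simple (assumed) model:
#   each chunk costs  ceil(total_work / chunks) + overhead  time units,
#   chunks are processed in  ceil(chunks / PEs)  rounds.
# Arguments: <total work> <number of chunks> <number of PEs> <overhead per chunk>
estimated_time() {
    per_chunk=$(( ($1 + $2 - 1) / $2 + $4 ))
    rounds=$(( ($2 + $3 - 1) / $3 ))
    echo $(( per_chunk * rounds ))
}

# 600 units of work on 3 PEs with an overhead of 2 units per chunk
for chunks in 3 5 6 60; do
    echo "chunks=$chunks t_tot=$(estimated_time 600 "$chunks" 3 2)"
done
```

In this model, 5 chunks on 3 PEs are slower than 3 or 6 chunks because one PE sits idle in the second round, while 60 chunks are slower again because the per-chunk overhead adds up. The model ignores unequal chunk run times, which in practice are the main reason to use more chunks than PEs.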
19 DRMAA
Distributed Resource Management Application API: an API specification for the submission and control of jobs to one or more DRM systems
Purpose: integration with applications
Advantages:
- Portability, vendor independence
- Reliability: avoids error-prone parsing of output from qsub, qstat, ...
- Efficiency: avoids expensive (and intricate: e.g. Perl) system calls
Implementations:
- SGE
- Bindings for Java, C/C++
- Modules for Perl, Python
- ...
20 DRMAA (2)
Java example (fragment)

    package de.mpg.rzg.drmaa.queue;

    import java.util.List;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.ggf.drmaa.DrmaaException;
    import org.ggf.drmaa.JobTemplate;
    import org.ggf.drmaa.Session;
    import org.ggf.drmaa.SessionFactory;

    public class DrmaaQueueScheduler {
        ...
        public String submitJob() {
            String jobId = null;
            try {
                /* create DRMAA session */
                Session session = SessionFactory.getFactory().getSession();
                session.init(null);
                /* set up job template */
                JobTemplate jt = session.createJobTemplate();
                jt.setRemoteCommand("blastall");
                jt.setArgs(new String[]{"-p", "blastp", "-d", "nr"});
                jt.setJobName("blast");
                /* submit an array job; the job ID is the part before the first "." */
                List<String> taskIds = session.runBulkJobs(jt, 1, numJobTasks, chunkSize);
                jobId = taskIds.isEmpty() ? null : taskIds.get(0).split("[.]")[0];
            } catch (DrmaaException e) {
                logger.error("submitting DRMAA job failed: " + e.getMessage());
            }
            return jobId;
        }
    }
21 Tips & Tricks
Submit scripts:
- do not wire SGE logic into your application
- instead, use SGE scripts only as simple wrappers
- example:

    #$ -S /bin/sh
    #$ -t :10
    perl ${HOME}/doMegablastChunk.pl $SGE_TASK_ID $SGE_TASK_STEPSIZE $TMPDIR

- this facilitates:
  - (interactive) testing
  - code maintenance
  - portability across different DRMs
22 Tips & Tricks (2)
Misc:
- do not rely on checkpointing: implement restart capability instead
- do not rely on the (interactive) environment (e.g. $PATH)
- choose an appropriate location for stdout and stderr
- redirect (wanted) stdout to a separate file
- use a reasonable partitioning of the total computational work:
  - avoid very short jobs/tasks (less than about 1 minute): scheduling overhead
  - avoid very long jobs/large arrays (several days; very large task counts): manageability
RZG-specific issues:
- run save-password (AFS/Kerberos) before submitting your first job or after a change of your RZG password
- monitor for SGE error messages
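Restart capability can be as simple as recording which work items are finished. The following sketch is one possible pattern, not the author's method: the function name `run_restartable` and the marker-file layout are hypothetical, and the `echo` stands in for the real computation.

```shell
#!/bin/sh
# Process work items 1..N, recording each finished item with a marker file
# so that a resubmitted job skips work that was already completed.
# Arguments: <state dir> <number of items>; prints the items actually processed.
run_restartable() {
    state=$1
    n=$2
    i=1
    mkdir -p "$state"
    while [ "$i" -le "$n" ]; do
        if [ ! -e "$state/item_$i.done" ]; then
            echo "item $i"                  # placeholder for the real computation
            : > "$state/item_$i.done"       # mark the item only after it finished
        fi
        i=$(( i + 1 ))
    done
}

dir=$(mktemp -d)
run_restartable "$dir" 3      # first run: processes items 1, 2 and 3
rm "$dir/item_2.done"         # simulate a crash before item 2 was completed
run_restartable "$dir" 3      # restart: processes item 2 only
rm -r "$dir"
```

In a real job the state directory should live on a file system that survives the job (not in $TMPDIR), so that a resubmitted job can find the markers.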
23 Tips & Tricks (3)
References and further reading:
- Wikipedia: Grid Engine
- SGE homepage
- SGE documentation
- SGE man pages
- SGE documentation on the RZG homepage (section "Computing")
- SGE configuration on the SUN Linux Cluster of the MPI-EVAn
PRBB / Ferran Mateo Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows Summary of testing by the Centre for Genomic Regulation (CRG) utilizing new virtualization
More informationSun Grid Engine, a new scheduler for EGEE
Sun Grid Engine, a new scheduler for EGEE G. Borges, M. David, J. Gomes, J. Lopez, P. Rey, A. Simon, C. Fernandez, D. Kant, K. M. Sephton IBERGRID Conference Santiago de Compostela, Spain 14, 15, 16 May
More informationSUN GRID ENGINE & SGE/EE: A CLOSER LOOK
SUN GRID ENGINE & SGE/EE: A CLOSER LOOK Carlo Nardone HPC Consultant Sun Microsystems, GSO SUN GRID ENGINE & SGE/EE: A CLOSER LOOK Agenda Sun and Grid Computing Sun Grid Engine: Architecture Campus Grid
More informationIntroduction to HPC Workshop. Center for e-research (eresearch@nesi.org.nz)
Center for e-research (eresearch@nesi.org.nz) Outline 1 About Us About CER and NeSI The CS Team Our Facilities 2 Key Concepts What is a Cluster Parallel Programming Shared Memory Distributed Memory 3 Using
More informationResource Management and Job Scheduling
Resource Management and Job Scheduling Jenett Tillotson Senior Cluster System Administrator Indiana University May 18 18-22 May 2015 1 Resource Managers Keep track of resources Nodes: CPUs, disk, memory,
More informationRocoto. HWRF Python Scripts Training Miami, FL November 19, 2015
Rocoto HWRF Python Scripts Training Miami, FL November 19, 2015 Outline Introduction to Rocoto How it works Overview and description of XML Effectively using Rocoto (run, boot, stat, check, rewind, logs)
More informationKISTI Supercomputer TACHYON Scheduling scheme & Sun Grid Engine
KISTI Supercomputer TACHYON Scheduling scheme & Sun Grid Engine 슈퍼컴퓨팅인프라지원실 윤 준 원 (jwyoon@kisti.re.kr) 2014.07.15 Scheduling (batch job processing) Distributed resource management Features of job schedulers
More informationlocuz.com HPC App Portal V2.0 DATASHEET
locuz.com HPC App Portal V2.0 DATASHEET Ganana HPC App Portal makes it easier for users to run HPC applications without programming and for administrators to better manage their clusters. The web-based
More informationHPCC USER S GUIDE. Version 1.2 July 2012. IITS (Research Support) Singapore Management University. IITS, Singapore Management University Page 1 of 35
HPCC USER S GUIDE Version 1.2 July 2012 IITS (Research Support) Singapore Management University IITS, Singapore Management University Page 1 of 35 Revision History Version 1.0 (27 June 2012): - Modified
More informationBEGINNER'S GUIDE TO SUN GRID ENGINE 6.2
BEGINNER'S GUIDE TO SUN GRID ENGINE 6.2 Installation and Configuration White Paper September 2008 Abstract This white paper will walk through basic installation and configuration of Sun Grid Engine 6.2,
More informationAdvanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011
Advanced Techniques with Newton Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Workshop Goals Gain independence Executing your work Finding Information Fixing Problems Optimizing Effectiveness
More informationHodor and Bran - Job Scheduling and PBS Scripts
Hodor and Bran - Job Scheduling and PBS Scripts UND Computational Research Center Now that you have your program compiled and your input file ready for processing, it s time to run your job on the cluster.
More informationBatch Scheduling and Resource Management
Batch Scheduling and Resource Management Luke Tierney Department of Statistics & Actuarial Science University of Iowa October 18, 2007 Luke Tierney (U. of Iowa) Batch Scheduling and Resource Management
More informationMartinos Center Compute Clusters
Intro What are the compute clusters How to gain access Housekeeping Usage Log In Submitting Jobs Queues Request CPUs/vmem Email Status I/O Interactive Dependencies Daisy Chain Wrapper Script In Progress
More informationGrid Engine 6. Monitoring, Accounting & Reporting. BioTeam Inc. info@bioteam.net
Grid Engine 6 Monitoring, Accounting & Reporting BioTeam Inc. info@bioteam.net This module covers System Monitoring Accounting & Reporting tools SGE Accounting File ARCo & sgeinspect SGE Reporting 3rd
More informationGeneral Overview. Slurm Training15. Alfred Gil & Jordi Blasco (HPCNow!)
Slurm Training15 Agenda 1 2 3 About Slurm Key Features of Slurm Extending Slurm Resource Management Daemons Job/step allocation 4 5 SMP MPI Parametric Job monitoring Accounting Scheduling Administration
More informationOperating Systems OBJECTIVES 7.1 DEFINITION. Chapter 7. Note:
Chapter 7 OBJECTIVES Operating Systems Define the purpose and functions of an operating system. Understand the components of an operating system. Understand the concept of virtual memory. Understand the
More informationDebugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu
Debugging and Profiling Lab Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu Setup Login to Ranger: - ssh -X username@ranger.tacc.utexas.edu Make sure you can export graphics
More informationKiko> A personal job scheduler
Kiko> A personal job scheduler V1.2 Carlos allende prieto october 2009 kiko> is a light-weight tool to manage non-interactive tasks on personal computers. It can improve your system s throughput significantly
More informationConfiguration of High Performance Computing for Medical Imaging and Processing. SunGridEngine 6.2u5
Configuration of High Performance Computing for Medical Imaging and Processing SunGridEngine 6.2u5 A manual guide for installing, configuring and using the cluster. Mohammad Naquiddin Abd Razak Summer
More informationJuropa. Batch Usage Introduction. May 2014 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de
Juropa Batch Usage Introduction May 2014 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de Batch System Usage Model A Batch System: monitors and controls the resources on the system manages and schedules
More informationRelease Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11
Release Notes for Open Grid Scheduler/Grid Engine Version: Grid Engine 2011.11 New Features Berkeley DB Spooling Directory Can Be Located on NFS The Berkeley DB spooling framework has been enhanced such
More informationThis document presents the new features available in ngklast release 4.4 and KServer 4.2.
This document presents the new features available in ngklast release 4.4 and KServer 4.2. 1) KLAST search engine optimization ngklast comes with an updated release of the KLAST sequence comparison tool.
More informationParallels Plesk Panel
Parallels Plesk Panel Copyright Notice ISBN: N/A Parallels 660 SW 39th Street Suite 205 Renton, Washington 98057 USA Phone: +1 (425) 282 6400 Fax: +1 (425) 282 6444 Copyright 1999-2009, Parallels, Inc.
More informationJobScheduler Web Services Executing JobScheduler commands
JobScheduler - Job Execution and Scheduling System JobScheduler Web Services Executing JobScheduler commands Technical Reference March 2015 March 2015 JobScheduler Web Services page: 1 JobScheduler Web
More informationFigure 12: Fully distributed deployment of the Job Scheduler toolkit
A.2 Job Scheduler Role(s): Service Provider Component(s): Job Scheduler License: Apache 2.0 A.2.1 Installation A.2.1.1. Installation Requirements These are the prerequisites of every component in the toolkit:
More informationSTAR-Scheduler: A Batch Job Scheduler for Distributed I/O Intensive Applications V. Mandapaka (a), C. Pruneau (b), J. Lauret (c), S.
1 STAR-Scheduler: A Batch Job Scheduler for Distributed I/O Intensive Applications V. Mandapaka (a), C. Pruneau (b), J. Lauret (c), S. Zeadally (a) (a) Department of Computer Science, Wayne State University
More informationUsing the UCI biocluster s queuing system with Grid Engine (GE) Kevin Thornton krthornt@uci.edu June 22, 2012
Using the UCI biocluster s queuing system with Grid Engine (GE) Kevin Thornton krthornt@uci.edu June 22, 2012 Intro! 4 The storage! 4 Practical considerations! 4 Other considerations! 5 The queues! 5 Using
More informationManual for using Super Computing Resources
Manual for using Super Computing Resources Super Computing Research and Education Centre at Research Centre for Modeling and Simulation National University of Science and Technology H-12 Campus, Islamabad
More information