Grid Engine Application Integration




Getting Stuff Done: batch jobs, interactive jobs (terminal), interactive jobs (X11/GUI), licensed applications, parallel jobs, and DRMAA.

Batch Jobs. The most common job type. What is run: shell scripts, binaries, or scripts that call binaries. Method: qsub

Batch Jobs Example usage: qsub -cwd $SGE_ROOT/examples/jobs/sleeper.sh

Batch Jobs Gotchas. qsub assumes your submission is a script; use -b y when directly calling an executable. The qmaster parameter shell_start_mode can override your choice of shell/interpreter. Are you really getting the shell you requested?
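The -b y gotcha above can be sketched as a small wrapper; the function name and the added -cwd flag are illustrative, not from the slide:

```shell
#!/bin/sh
# Sketch: submit a compiled binary directly. Without "-b y", qsub would
# try to parse the binary as a script. (Wrapper name is illustrative.)
submit_binary() {
    qsub -b y -cwd "$@"
}
# On a submit host: submit_binary /bin/hostname
```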

Batch Jobs Shortcut. Remember #$ for embedding args into scripts. Example:
#!/bin/sh
## SGE Arguments
#$ -q all.q
#$ -P MyProject
#$ -hard -l matlablicense=2
/path/to/my/application.exe

Interactive Jobs - Terminal based. Less common. What is run: short executables, remote shells, and scripts/apps requiring human input. Methods: qrsh, qlogin

Interactive Jobs - Terminal based. Example usage: qlogin
[hedeby@vcentos-a ~]$ qlogin
Your job 6 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled...
Your interactive job 6 has been successfully scheduled.
Establishing builtin session to host vcentos-a...
[hedeby@vcentos-a ~]$ exit
logout

Interactive Jobs - Terminal based. Example usage: qrsh
$ qrsh "uname -a; /bin/hostname"
Linux vcentos-a 2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:10:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
vcentos-a

Interactive Jobs - Terminal based. Note: qrsh is a great way to run quick commands on the least loaded node in the system. But there is no guarantee of success: if the cluster is full, qrsh will return an error. In workflows you need to trap for this, or use qrsh -now no so the job waits to be scheduled instead of failing immediately.
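The trapping advice above can be sketched as a wrapper function; the name run_on_cluster and the error message are illustrative:

```shell
#!/bin/sh
# Sketch: wrap qrsh so a workflow can detect a failed submission.
# "-now no" makes qrsh wait in the queue rather than erroring out
# immediately when no slots are free.
run_on_cluster() {
    qrsh -now no "$@" || {
        status=$?
        echo "qrsh failed with exit status $status" >&2
        return $status
    }
}
# On a submit host: run_on_cluster uname -a
```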

Interactive Jobs - Terminal based. Quickly cluster-enable a binary: a trivial NCBI blastall wrapper. Give this to your users; they use it the same way they always have, while behind the scenes we invoke it via qrsh:
#!/bin/sh
# Cluster-enable NCBI blastall
qrsh -cwd -now no /opt/bin/blastall "$@"

Interactive Jobs - Terminal based Gotchas. Remember: no guarantee of success. By default this only works if job slots are free, subject to sysadmin modification. SGE prior to 6.2 uses custom rsh* methods; many admins replace these with calls to SSH. SGE 6.2 and later has a better builtin method with direct communication with the SGE execd. Other problems seen at scale: running out of TCP ports or filehandles.

Interactive Jobs - Graphical. Less common. What is run: xterms and X11 applications (MATLAB etc.). Methods: qrsh, qsh. Requires SSH & X11 forwarding to be working; SGE spawns the xterm for you.

Example: Grid xterm via qsh

Interactive Jobs - Graphical Gotchas. The qrsh and qlogin methods won't work in the default config. Requires: replace the rlogin_* methods with calls to SSH, and configure/test X11 forwarding over SSH. In SGE 6.2 and later, the brand-new builtin methods don't support X11 forwarding yet; you must replace the builtin methods with SSH-based techniques.

Licensed Applications

Licensed Applications. May or may not be a big deal, depending on your license type.

Licensed Applications. You don't have to worry when: you have a site license, there is no scarcity of entitlements, or you have more FLEXlm tokens than job slots. Method: don't do anything; it should just work as usual.

Licensed Applications. Moderate effort required when: there are more nodes/slots than license entitlements, and either no license server or a dedicated license server on the cluster. Method: SGE admins will create a user-requestable consumable resource to match the license count; all you need to do is request this resource when submitting.
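Requesting such a consumable might look like this; the resource name matlablicense is borrowed from the earlier embedded-arguments slide, and the wrapper function is illustrative:

```shell
#!/bin/sh
# Sketch: request one token of an admin-defined license consumable as a
# hard resource requirement. "matlablicense" is an assumed resource name;
# ask your admins what they actually configured.
submit_licensed_job() {
    qsub -hard -l matlablicense=1 "$1"
}
# On a submit host: submit_licensed_job ./run-matlab.sh
```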

Licensed Applications. Somewhat significant effort required when: there are more nodes/slots than license entitlements and an external shared/central license server must be queried. Method: integrate the external license data with the Grid Engine job scheduler. Old way: load sensor scripts. Best-practice way: the Olesen FLEXlm method, in which an external Perl daemon queries the license server and rapidly modifies SGE consumable resource values to reflect real-world license availability. Much faster than the load sensor method, with less chance of a race condition.

Parallel Jobs

Parallel Jobs. A parallel job runs simultaneously across multiple servers. Biggest SGE job I've heard of: a single application running across 63,000 CPU cores on the TACC Ranger cluster in Texas.

Parallel Jobs. Setting up the PE is often the hardest part. Application-specific PEs are normal, to account for: the specific MPI implementation required, the desired communication fabric, special network topologies, application-specific MPI starter or stop methods, and a preference for a specific CPU allocation_rule.

Parallel Jobs. No magic involved; this requires work, and your application must support parallel methods.

Parallel Jobs. Many different software implementations are used to support parallel tasks: MPICH, LAM-MPI, OpenMPI, PVM, LINDA.

Parallel Jobs. Loose Integration. Grid Engine is used for: picking when the job runs, picking where the job runs, and generating the custom machine file. Grid Engine does not: launch or control the parallel job itself, or track resource consumption or child processes.
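The machine-file generation step can be sketched like this; a real start_proc_args script (such as the startmpi.sh shown later) does more, so treat this as a minimal illustration of the $pe_hostfile format:

```shell
#!/bin/sh
# Sketch: turn SGE's $pe_hostfile (one "host slots queue processors"
# line per node) into a flat machinefile for mpirun, repeating each
# hostname once per granted slot.
make_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}
# Typically invoked from start_proc_args along the lines of:
#   make_machinefile "$pe_hostfile" > "$TMPDIR/machines"
```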

Parallel Jobs. Advantages of loose integration: easy to set up, and can trivially support almost any parallel application technology.

Parallel Jobs. Disadvantages of loose integration: Grid Engine can't track resource consumption, must trust the parallel app to honor the custom hostfile, and cannot kill runaway jobs.

Parallel Jobs. Tight Integration. Grid Engine handles all aspects of parallel job operation from start to finish, including spawning and controlling all parallel processes.

Parallel Jobs. Tight integration advantages: Grid Engine remains in control, resource usage is accurately tracked, standard commands like qdel will work, and child tasks will not be forgotten about or left untouched.

Parallel Jobs. Tight integration disadvantages: can be really hard to implement, makes job debugging and troubleshooting harder, and may be application-specific.

Parallel Environment Config
pe_name            lam-loose
slots              4
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/class/loose-lammpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/class/loose-lammpi/bin/lamhalt
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
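A PE definition like the one above is typically loaded and attached to a queue with qconf; the file name lam-loose.conf and queue all.q below are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: register a PE from a definition file and attach it to a queue.
# File name "lam-loose.conf" and queue "all.q" are assumed.
register_pe() {
    qconf -Ap lam-loose.conf                      # add PE from file
    qconf -aattr queue pe_list lam-loose all.q    # append PE to the queue's pe_list
}
# Run on an admin host: register_pe
```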

Parallel Environment Usage
qsub -pe lam-loose 4 ./my-mpich-job.sh
qsub -pe lam-loose 8-10 ./my-mpich-job.sh
qsub -pe lam-loose 3 ./my-lam-job.sh

Behind the scenes: MPICH. Very simple. The startmpi.sh script is run before the job launches and creates the custom machine file. The user job script gets the data required by mpirun from environment variables: $NSLOTS, $TMPDIR/machines, etc. The stopmpi.sh script is just a placeholder; it does not really do anything (no need yet).

Behind the scenes: MPICH. A trivial MPICH wrapper for Grid Engine:
#!/bin/sh
MPIRUN=/usr/local/mpich-1.2.6/ch_p4/bin/mpirun
## ---- EMBEDDED SGE ARGUMENTS ----
#$ -N MPI_Job
#$ -pe mpich 3-5
#$ -cwd
## ------------------------------------
echo "I got $NSLOTS slots to run on!"
$MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines ./mpi-program

Behind the scenes: LAM-MPI. Very simple, just like MPICH, but with 2 additions: the lamstart.sh script launches lamboot, and the lamstop.sh script executes lamhalt at job termination. In an example configuration, lamboot is started this way:
lamboot -v -ssi boot rsh -ssi rsh_agent "ssh -x -q" $TMPDIR/machines

Behind the scenes: LAM-MPI. A trivial LAM-MPI wrapper for Grid Engine:
#!/bin/sh
MPIRUN=/usr/local/bin/mpirun
## ---- EMBEDDED SGE ARGUMENTS ----
#$ -N MPI_Job
#$ -pe lammpi 3-5
#$ -cwd
## ------------------------------------
echo "I have $NSLOTS slots to run on!"
$MPIRUN C ./mpi-program

OpenMPI. In the absence of specific requirements, a great choice. Works well over Gigabit Ethernet, and it is trivial to achieve tight SGE integration. Recent personal experience: out of the box, cpi.c on 1024 CPUs; out of the box, a heavyweight genome analysis pipeline on 650 Nehalem cores.

Behind the scenes: OpenMPI. Incredibly easy/simple. OpenMPI 1.2.x natively supports automatic tight SGE integration. Build from source with --enable-sge. Usage: mpirun -np $NSLOTS /path-to-my-parallel-app
OpenMPI PE config:
pe_name            openmpi
slots              4
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min

OpenMPI Job Script
#!/bin/sh
## ---- EMBEDDED SGE ARGUMENTS ----
#$ -N MPI_Job
#$ -pe openmpi 3-5
#$ -cwd
## ------------------------------------
echo "I got $NSLOTS slots to run on!"
mpirun -np $NSLOTS ./my-mpi-program

Workflows

Workflows. Getting a bit into SGE user territory here. SGE admins are often asked to assist with pipelines & workflows. You should say yes; otherwise, don't be surprised when a user writes a shell loop that qsubs a few million jobs.

Workflows. A few simple SGE features can provide significant efficiency gains, benefiting both users & cluster operators. Teach your users effective use of: job dependency syntax, array jobs, and proper use of qsub -sync y.

Job Dependencies. Easy to understand, trivial to implement: job names plus -hold_jid. A very effective building block for workflows and pipelines.
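A dependency chain built from -N and -hold_jid might look like this; the job names and scripts are made up for illustration:

```shell
#!/bin/sh
# Sketch: a three-stage pipeline where each stage holds on the previous
# stage's job name. All names and scripts here are illustrative.
submit_pipeline() {
    qsub -N prep_data ./prepare.sh
    qsub -N analyze -hold_jid prep_data ./analyze.sh
    qsub -N merge -hold_jid analyze ./merge.sh
}
# On a submit host: submit_pipeline
```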

Array Jobs. The perfect solution for use cases that involve running the same application many times with only minor changes in input or output arguments. Both user & cluster will benefit; your users will thank you.
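An array-job task script might look like this; the file-naming pattern and payload are illustrative. SGE sets $SGE_TASK_ID per task when the job is submitted with something like qsub -t 1-100:

```shell
#!/bin/sh
# Sketch of an array-task script, submitted as: qsub -t 1-100 run-task.sh
# Each task derives its own input/output names from $SGE_TASK_ID.
task_files() {
    # prints "input.N.fasta result.N.out" for task N (pattern is made up)
    echo "input.$1.fasta result.$1.out"
}
FILES=$(task_files "${SGE_TASK_ID:-1}")
echo "Task ${SGE_TASK_ID:-1} uses: $FILES"
# /opt/bin/blastall -i input.$SGE_TASK_ID.fasta ...   # hypothetical payload
```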

Array Jobs continued. Note: task interdependency within an array job is a new SGE feature (and an open source success story). There have been some issues with array job tasks and specific resource allocation implementations.

Workflow cont. Consider prolog & epilog scripts for special cases. Best pipeline integration I've seen: DNA Productions (film rendering). The prolog does a JDBC insert into a SQL database to capture all job details, and the epilog updates the DB after job exit. End result: 100% capture of the workflow in the DB, so any job can be repeated exactly at any time.

Workflow cont. JSVs may be of help as well; I've got a project now that needs one. And finally, for serious cluster-aware code...

DRMAA

DRMAA. Grid Engine's only public API. Actually a standard (www.drmaa.org): the Distributed Resource Management Application API.

DRMAA Source: www.drmaa.org

DRMAA. The best-practice way of writing cluster-aware applications. The API concentrates solely on job submission and control; this is not a Grid Engine management API. Multiple language bindings: pick what you like among C, C++, Java, Perl, Python, and Ruby. C and Java seem to be most active.

Questions?