A Tiny Queuing System for Blast Servers



Similar documents
CycleServer Grid Engine Support Install Guide. version 1.25

Syntax: cd <Path> Or cd $<Custom/Standard Top Name>_TOP (In CAPS)

PharmaSUG Paper AD11

Do it Yourself System Administration

Extending Remote Desktop for Large Installations. Distributed Package Installs

Installation Guide for AmiRNA and WMD3 Release 3.1

Monitoring a Linux Mail Server

INASP: Effective Network Management Workshops

Load Balancing/High Availability Configuration for neoninsight Server

Bitrix Site Manager ASP.NET. Installation Guide

LSKA 2010 Survey Report Job Scheduler

dotdefender v5.12 for Apache Installation Guide Applicure Web Application Firewall Applicure Technologies Ltd. 1 of 11 support@applicure.

Secure Shell Demon setup under Windows XP / Windows Server 2003

Monitoring Clearswift Gateways with SCOM

User s Manual

Lesson 7 - Website Administration

Sitemap. Component for Joomla! This manual documents version 3.15.x of the Joomla! extension.

Unix Scripts and Job Scheduling

DiskPulse DISK CHANGE MONITOR

Basic C Shell. helpdesk@stat.rice.edu. 11th August 2003

Beyond Windows: Using the Linux Servers and the Grid

Backing Up TestTrack Native Project Databases

Installation Guide for WebSphere Application Server (WAS) and its Fix Packs on AIX V5.3L

INSTALLING KAAZING WEBSOCKET GATEWAY - HTML5 EDITION ON AN AMAZON EC2 CLOUD SERVER

Installing and running COMSOL on a Linux cluster

Cisco Networking Academy Program Curriculum Scope & Sequence. Fundamentals of UNIX version 2.0 (July, 2002)

WEB2CS INSTALLATION GUIDE

CLC Server Command Line Tools USER MANUAL

Lab 3 Assignment (Web Security)

Wolfr am Lightweight Grid M TM anager USER GUIDE

Monitoring Oracle Enterprise Performance Management System Release Deployments from Oracle Enterprise Manager 12c

How To Script Administrative Tasks In Marklogic Server

Introduction to Sun Grid Engine (SGE)

PHP Debugging. Draft: March 19, Christopher Vickery

Job Scheduling with Moab Cluster Suite

IBM Redistribute Big SQL v4.x Storage Paths IBM. Redistribute Big SQL v4.x Storage Paths

Navigating the Rescue Mode for Linux

How Strong Is Your Fu?

WebSphere Application Server security auditing

Introduction to Shell Programming

How to use the UNIX commands for incident handling. June 12, 2013 Koichiro (Sparky) Komiyama Sam Sasaki JPCERT Coordination Center, Japan

Fermilab Central Web Service Site Owner User Manual. DocDB: CS-doc-5372

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

1 How to Monitor Performance

Using the AVR microcontroller based web server

1 How to Monitor Performance

How to Push CDR Files from Asterisk to SDReporter. September 27, 2013

Automating admin tasks using shell scripts and cron Vijay Kumar Adhikari.

Integrity Checking and Monitoring of Files on the CASTOR Disk Servers

Installing Platform Product Suite for SAS (Windows)

WebIOPi. Installation Walk-through Macros

depl Documentation Release depl contributors

STEP 4 : GETTING LIGHTTPD TO WORK ON YOUR SEAGATE GOFLEX SATELLITE

Training Day : Linux

EZcast technical documentation

MySQL Backups: From strategy to Implementation

This appendix describes the following procedures: Cisco ANA Registry Backup and Restore Oracle Database Backup and Restore

Linux Syslog Messages in IBM Director

IT Support Tracking with Request Tracker (RT)

ArcGIS for Server: Administrative Scripting and Automation

1. What is this? Why would I want it?

Legal and Copyright Notice

Configuring High Availability for VMware vcenter in RMS Distributed Setup

Automated Offsite Backup with rdiff-backup

SYSTEM BACKUP AND RESTORE (AlienVault USM 4.8+)

Unix Sampler. PEOPLE whoami id who

Application Notes for MUG Enterprise Call Guard with Avaya Proactive Contact with CTI Issue 1.0

DoD Public Key Enablement (PKE) Quick Reference Guide. Securing Apache HTTP with mod_ssl for Linux

Collaborate.ets.org Password Setup & Recovery Guide. Table of Contents

Command Line Crash Course For Unix

TCH Forecaster Installation Instructions

AUTOMATED BACKUPS. There are few things more satisfying than a good backup

Camilyo APS package by Techno Mango Service Provide Deployment Guide Version 1.0

Cloud Homework instructions for AWS default instance (Red Hat based)

Installing and Using the Zimbra Reporting Tool

EventTracker: Configuring DLA Extension for AWStats Report AWStats Reports

TP1: Getting Started with Hadoop

Distance-Learning Remote Laboratories using LabVIEW

Hands-On UNIX Exercise:

Installing buzztouch Self Hosted

Integrating VoltDB with Hadoop

System i Access for Web Configuring an Integrated Web Application Server Instance

Reflection DBR USER GUIDE. Reflection DBR User Guide. 995 Old Eagle School Road Suite 315 Wayne, PA USA

USEFUL UNIX COMMANDS

Load testing with. WAPT Cloud. Quick Start Guide

Secure File Transfer Installation. Sender Recipient Attached FIles Pages Date. Development Internal/External None 11 6/23/08

ClickCartPro Software Installation README

Avira Update Manager User Manual

EventTracker: Configuring DLA Extension for AWStats report AWStats Reports

Installing IBM Websphere Application Server 7 and 8 on OS4 Enterprise Linux

Setting Up a CLucene and PostgreSQL Federation

CASHNet Secure File Transfer Instructions

Transcription:

A Tiny Queuing System for Blast Servers Colas Schretter and Laurent Gatto December 9, 2005 Introduction When multiple Blast [4] similarity searches are run simultaneously against large databases and no parallel computing facility is at hand, the IO-bound process requires to limit the number of concurrent runs. Therefore we setup a queuing system to ensure that, at any time, the executed jobs do not require more than the available IO resources. Existing queuing systems like OpenPBS [1], Maui [2] or Sun Grid Engine [3] could be installed to solve these requirements. However, we developed a set of shell scripts that work together to add lightweight queuing abilities to any UNIX-like systems. The features of our solution are: it only relies on shell scripts and the crontab service it is easy to install, use and extend it provides e-mail notifications The set of scripts of a real-case application are given and commented. The example presented has been limited (i.e. in the number of Blast parameters) for didactical purposes. This document should help anyone to adapt those scripts and extend the system for further requirements. Framework In the following sections, we describe each functionality by providing the scripts and their dependencies. First, we describe how users submit jobs to the system. Then, we explain how we implemented the queuing mechanism. We setup an active system loop that watches for the presence of new job requests and a locking mechanism to limit resources usages. A general deployment diagram describing the queuing system is given in figure 1. The requirements for the queuing system are a UNIX-like operating system, a running webserver and the necessary Blast programs and databases. cschrett@ulb.ac.be lgatto@ulb.ac.be 1

Figure 1: Deployment diagram of the queuing system for Blast searches. Job Submission The queueing system manages two sets of files: 1. Blast searches ready to be run as soon as resources are freed 2. result files of recently finished searches The search results are saved in a web accessible directory (i.e. /var/www/localhost/htdocs/). Each file is uniquely identified by its exact generation time and can be consulted by the user. The ready-to-be-run searches (consisting of a blastall command and the input queries) are stored in the queue directory. The queuing blast file are identified by their exact generation time and the users contact email adress. Note that the blast/queue and blast/results directories have to be respectively writable and readable by the apache user. We use a modified WWWBlast web page as entry to the queuing system. The only modification consists in the addition of an HTML text field for the users email adress. When posted, the form invokes a shell cgi script queue_blast_jobs.sh located in a directory containing server scripts, typically /var/www/localhost/cgi-bin/. This script will parse the user s blast parameters and add the appropriate file to the blast jobs queue. To avoid a possible delay if resources are free, the process_blast_jobs.sh script (see below) is immediately launched (independently of cron). Listing 1: queue blast jobs.sh #! / bin / sh # Authors : Colas Schretter and Laurent Gatto { cschrett, lgatto } @ulb. ac. be, 2004-2005 # change to the results and queue directories cd / var / www / localhost / htdocs / blast # unique temporary file identified by the ID of the running process TMPFILE =/ tmp / queue_blast_post. $$ # saving the cgi post cat > $TMPFILE # parsing the cgi post file to extract blast parameters 2

DATALIB =$( cat $TMPFILE grep -Z -A 2 " DATALIB " tail -n 1 tr -d \r ) PROGRAM =$( cat $TMPFILE grep -Z -A 2 " PROGRAM " tail -n 1 tr -d \r ) EMAIL =$( cat $TMPFILE grep -Z -A 2 " EMAIL " tail -n 1 sed s/@/[ at ]/ tr -d \r ) DATE =$( date +%Y%m%d%H%M%N ) # creating a new file in the queue # the size of the query input is limited to 1000 lines echo " echo \" $( cat $TMPFILE grep -Z -A 1000 " SEQUENCE " sed -e 1,2d -e /---/, $ d )\" blastall -p $PROGRAM -d $DATALIB -o../ results / $DATE. html -T T" > queue / $DATE. $EMAIL. blast # remove cgi post rm $TMPFILE # output an HTML page echo " Content - type : text / html " echo echo " <h2 > Your BLAST job has been enqueued </ h2 >" echo " <p> The status of your request will be notified by email when resources are available </p>" echo " <p> You results will be availables at http :// yourblastserver. org / blast / results / $DATE. html as soon as possible. </p>" # try to run the BLAST immediately./ process_blast_jobs. sh Job Watcher and Launcher A crontab daemon periodically runs the process_blast_jobs.sh script that checks if files are stored in the queue directory and eventually runs the Blast searches. MAILTO="" * * * * * /var/www/localhost/htdocs/blast/process_blast_jobs.sh In this example, Blast queries will be detected in less than a minute. The script first checks if the necessary ressources are available. When an application is running behind the queuing system, a lock file is created as a flag and the script exits. If no lock file is present, process_blast_jobs.sh first locks the system. For all the jobs in the queues, it then chooses the oldest job file, recovers the users email adress and the future result file url and launches the similarity search. When a Blast search finishes, the script mails the user the result url. When all the jobs in the queue are done, the system is unlocked to allow the next searches to be performed. Listing 2: process blast jobs.sh #!/ bin / sh # Authors : Laurent Gatto and Colas Schretter { lgatto, cschrett } @ulb. ac. be, 2004-2005 # change to the working directory cd / var / www / localhost / htdocs / blast / queue / if [ -e./ locked ] then # Resources are not available exit 0 3

fi # lock de system touch./ locked # process all enqueued files reverse sorted by timestamp for JOB in $( ls -1tr * blast ) do # get the results url and the user s email EMAIL =$( basename $JOB. blast sed -e s /^[0-9]*\.// -e s /\\[ at \\]/ @/ ) HTML =${ JOB %%.*}. html sh rm $JOB $JOB # mail user that the results are ready echo " Dear user, your BLAST results are available at http :// yourblastserver. org / blast / results / $HTML Enjoy " mail -s " BLAST results " $EMAIL done # job ( s) is / are done ; resources are freed rm./ locked Possible extensions It is further possible to distribute the computation charge among several independent and dedicated servers. A possibility would be to have a web server as the entry-point for job submissions, that manages the queue and result directories. In order to ensures interactive performances to any visitor, the resources of this server are exclusively reserved for dynamic web page generation and mailing capabilities. Resource-consuming tasks are not allowed to run on the web server. Therefore, executables are launched on a third party application server. The shell cgi scripts queue_blast_jobs.sh and process_blast_jobs.sh would, in such case, still be located on the webserver, but the Blast search would be run over ssh on the application server and the results would be copied back to the webserver 1 : ssh wwwrun@application_server "$JOB" scp -q wwwrun@application_server:~/results/$output $QUEUEDIRS/results/. It is further possible to define two or more simultaneous jobs by allowing two or more locked files to be created. Different locked files may be associated with different application hosts. process_blast_jobs.sh would check for the existence of locked_1 to locked_n before starting a new search. References [1] http://www.openpbs.org. 1 ssh and scp need to be automated with ssh keys to avoid password requests 4

[2] http://www.clusterresources.com. [3] http://gridengine.sunsource.net. [4] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J Mol Biol, 215(3):403 410, October 1990. 5