1 High Performance Computing with Sun Grid Engine on the HPSCC cluster Fernando J. Pineda

2 HPSCC
High Performance Scientific Computing Center (HPSCC)
The Johns Hopkins service center in the Dept. of Biostatistics that operates a computing cluster on a fee-for-service basis. Two server rooms: E3607 and the CDSR (basement).
Mission: to provide a shared high-performance computing environment for research in biostatistics, genetics, computational biology and bioinformatics.
Condo model of shared computing.
Administrative team (1.7 FTE): Fernando Pineda (Director), Marvin Newhouse (Computing Systems Manager), Jiong Yang (Systems Engineer), Cindy Hockett (Financial Admin)
BIT committee (advisory committee)

3 CDSR racks, UPS & Tape library

4 Marvin & the Qualstar tape library

5 Computing facilities (mostly in E3605): 1074 cores, 8.5 TB RAM

6 Storage facilities (in the CDSR): 1000 TB (formatted)

7 User view of the HPSCC cluster (the reality is more complex)
[Diagram, 10/2009] User machines reach the login server enigma2.jhsph.edu over the 100 Mb/s LAN; the login server connects through 1000 Mb/s switches on a private network to:
the compute farm: 400 cores, with direct-attached local disks (/tmp, /scratch) on each node
the storage farm, exported over NFS as /nexsan2, /nexsan, /thumper2, /thumper and /home: Nexsan SATABoy (10.5 TB raw), Sunfire X4500 servers (2 x 24 TB raw), Sun 7210 (22.5 TB raw)

8 System Software
Red Hat Enterprise Linux
The login server (and each compute node) has its own instance of RHEL; the login server may not be at the same OS revision level as the compute nodes. See The Linux Documentation Project.
Solaris with ZFS/NFS (Network File System)
Sun J4500 (Amber1): NFS server with multi-gigabit network connectivity; exports the /home file systems to the login server and compute nodes.
DIY DCS (670 TB): NFS server with multi-gigabit network connectivity.
Other NFS exports: NFS on Stan and on the Thumper(s) exports static project data to the login server and cluster.
Rocks
Cluster build and maintenance tool.
Sun Grid Engine (SGE 6.0u8, upgrading to 6.2u2)
Used for job submission to the cluster; provides cluster resource and load management, and cluster accounting.
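As a quick, generic check (not part of the slides), the standard commands below could be run on enigma2 or on a compute node to see which RHEL revision that machine is running and which NFS file systems it mounts; nothing here is HPSCC-specific.
# minimal sketch using standard Linux commands
cat /etc/redhat-release    # RHEL revision of this machine
uname -r                   # running kernel version
df -h /home                # shows which NFS server exports your home directory
mount | grep nfs           # lists the NFS-mounted file systems on this node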

9 All computing on the cluster is done by logging into enigma2 and then starting interactive sessions or submitting batch jobs via Sun Grid Engine (SGE).
[Diagram, 10/2009] The user connects to the login server enigma2.jhsph.edu via ssh; from there, qrsh and qsub dispatch work through the switches to the compute farm.

10 Sun Grid Engine (SGE)
The key that unlocks resources on the cluster.
Job submission: qrsh (start an interactive session), qsub (submit a batch job)
Job management: qdel (delete a job from a queue)
Cluster information display: qstat (job status listings; qu to see your jobs), qhost (information about execution hosts), qconf (information about cluster and queue configuration)
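For illustration only (not from the slides), the information and management commands might be used as follows; the job ID 12345 is a placeholder, and the assumption that the local qu shortcut behaves roughly like qstat -u $USER is inferred from the slide.
qstat              # status listing of all jobs known to SGE
qstat -u $USER     # only your own jobs
qhost              # load and memory on each execution host
qconf -sql         # list the configured queues
qdel 12345         # delete the job with ID 12345 (use the ID reported by qsub/qstat)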

11 Interactive jobs
qrsh establishes a remote shell connection (ssh) on an unspecified node. Examples:
qrsh          # open remote shell
qrsh R        # open R in remote shell
qrsh vi       # open vi in remote shell
Establish an ssh connection and run an executable with given resource constraints:
qrsh -l mem_free=3.0g R
qrsh -l mem_free=5g R CMD BATCH my.r

12 bash> ssh enigma2
enigma2> qrsh -l iact,mem_free=6g,h_vmem=10g
Last login: Thu Oct 15 23:45: from enigma2.local
Rocks Compute Node
Rocks (Cydonia) Profile built 16:08 08-Jul-2008
Kickstarted 07:27 08-Jul-2008
compute-0-46> cd teaching
compute-0-46> BLASTDB=/home/mmi/fernando/myblast/db
compute-0-46> blastall -p blastn \
    -d cel_hairpins \
    -i test_queries.fasta \
    -o workspace/output.txt

13 #!/bin/bash
##
# blast_demo.sh
# demonstrates batch job submission
##
#$ -N blast_job
#$ -l h_rt=0:5:0    # kill runaway job at 5 minutes
#$ -cwd
#$ -o /home/mmi/fernando/teaching/workspace/blast_job.out
#$ -e /home/mmi/fernando/teaching/workspace/blast_job.err
#
##
# main script
##
OUTPUTDIR=/home/mmi/fernando/teaching/workspace
BLASTDB=/home/mmi/fernando/myblast/db
blastall -p blastn \
    -d cel_hairpins \
    -i test_queries.fasta \
    -o $OUTPUTDIR/blast_job.txt

enigma2> qsub blast_demo.sh

14 Running I/O intensive jobs: understanding bandwidth constraints
[Diagram] Each machine has its own temporary storage (DAS). From slow to fast: NFS (network file system, where /home and your home directory live), DAS (direct attached storage), RAM (local memory). The user reaches the compute nodes from the login server enigma2.jhsph.edu via ssh, qrsh and qsub through the switches.
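To get a feel for the slow/fast hierarchy, one could time a large sequential write to node-local storage versus the NFS-mounted home directory. This is a rough, illustrative sketch (not from the slides); it assumes it runs inside an SGE job, where $TMPDIR points at local disk, and the 1 GB test size is arbitrary.
# rough, illustrative comparison only (not a benchmark)
time dd if=/dev/zero of=$TMPDIR/io_test bs=1M count=1024    # local DAS via $TMPDIR
time dd if=/dev/zero of=$HOME/io_test bs=1M count=1024      # NFS-mounted home directory
rm -f $TMPDIR/io_test $HOME/io_test                         # clean up the test files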

15 Use file staging for I/O intensive jobs
Performing sequential I/O to your home directory over NFS can be very, very slow. Break the bottleneck with the staging technique.
For each batch job executed, SGE creates a local temporary directory under /tmp and exports its path name in the $TMPDIR environment variable.
Start your processing by copying input files into $TMPDIR.
Perform all your I/O from/to files in $TMPDIR.
End processing by copying output files out of $TMPDIR to a permanent location (under your home directory).
When the job exits, $TMPDIR, and anything in it, is deleted by SGE so that the temporary storage is available for the next job.

16 staging.sh
#!/bin/bash
# Author: Fernando J. Pineda
# A file that demonstrates staging of a file to
# a local scratch partition
#
# SGE options & parameters
# (1) the name of the job
#$ -N SGE_DEMO
# (2) resource requirements
#$ -l h_rt=0:30:0
#$ -l mem_free=1.0g
# (3) output files
#$ -cwd
#$ -o demo.out
#$ -e demo.err

# stage the input data
# (1) copy big file of compressed data
mkdir $TMPDIR/data
cp ~/teaching/c_elegans.tar.gz $TMPDIR/data/.
cd $TMPDIR/data
ls -lh c_elegans.tar.gz
# (2) decompress and untar the data
tar -xzvf c_elegans.tar.gz
du -ch $TMPDIR/data | grep total

# Do your processing
# (1) in this case just move the data
#     to a temporary output directory
#     and tar it again
mkdir $TMPDIR/output
mv $TMPDIR/data $TMPDIR/output
cd $TMPDIR/output
tar -czf results.tar.gz data

# save the results of this computation
# in my home directory and quit
cp results.tar.gz $HOME/.
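Although the submission command is not shown on this slide, staging.sh would be submitted from the login server in the same way as the earlier blast example, e.g. enigma2> qsub staging.sh; because of the -cwd option, the demo.out and demo.err files land in the directory from which the job was submitted.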

17 Getting help
On-line support for system and application-specific questions.
Web site with tutorials and instructions, primarily with a Biostat bent, but it also has tutorials on using the cluster.
bitsupport@jhsph.edu
System support for the cluster. Monitored by: Benilton Carvalho, Rafael Irizarry, Harris Jaffee, Konstantin Milman, Marvin Newhouse, Roger Peng, Fernando Pineda, Jiong Yang. Please use bitsupport rather than contacting Jiong and Marvin directly.
bithelp@jhsph.edu
Application support for R, perl, emacs, blast, etc. Monitored by: Ming-Wen An, Martin Aryee, Christopher Barr, Brian Caffo, Benilton Carvalho, Hector Corrada, Haley Hedlin, Rafael Irizarry, Harris Jaffee, Elizabeth Johnson, Qing Li, Aidan McDermott, Marvin Newhouse, Roger Peng, Nicholas Reich, Ingo Ruczinski, Bruce Swihart, Jiong Yang
The JHPCE web site, still under development.