Vital-IT Users Training: HPC in Life Sciences
1 Vital-IT Users Training: HPC in Life Sciences Vital-IT Group, Lausanne Contact: Status: 29 January 2015
2 Objectives of this course Obtain basic knowledge on high throughput and high performance computing Learn how to use computer clusters at Vital-IT Tips and tricks for effective and efficient cluster usage 2, HPC in Life Sciences
3 Outline 1. Motivation 2. Background on Scientific Computing 3. Practical usage of life science HPC infrastructure 4. Vital-IT: Different storage systems and guidelines 3, HPC in Life Sciences
4 Motivation Set up a desktop machine (PC) with bioinformatics software Run e.g. tophat, BLAST, HMMER, velvet etc. on a small data set Possible? How long does it take? Run it against a larger data set, multiple genomes etc. Possible? How long does it take? Are we happy with the software/ hardware environment? Performance? Storage space? Maintenance? etc. 4, HPC in Life Sciences
5 Bioinformatics Competence Center 5, HPC in Life Sciences
6 Missions To support research groups in biology and medicine To maintain and develop algorithms in life science To teach and liaise with embedded bioinformaticians (education in algorithm development and use of high performance computing environment) To keep active research inside Vital-IT, mainly through collaborations with partners To operate high productivity computational and storage resources 6, HPC in Life Sciences
7 Vital-IT Team 7, HPC in Life Sciences
8 Vital-IT and Partner Resources in Switzerland Arc lémanique (3 clusters managed by the Vital-IT team): UNIL, EPFL, UNIGE (>3500 processing cores) One login for all clusters; LSF for job submission Production usage is based on project approval! Linux CentOS 6.4, parallel file system, large storage archive University of Berne: operated by Uni Berne Informatikdienste UBELIX ( ~1200 cores; SGE for job submission Vital-IT software + databases are available in collaboration with Vital-IT 8, HPC in Life Sciences
9 Vital-IT and Partner Resources in Switzerland EPFL UNIL UNIGE 9, HPC in Life Sciences
10 Resources on the Vital-IT Web Site CPU cores 10, HPC in Life Sciences
11 Outline 1. Motivation 2. Background on Scientific Computing Compute Cluster Parallelism 3. Practical usage of life science HPC infrastructure 4. Vital-IT: Different storage systems and guidelines 11, HPC in Life Sciences
12 A typical compute cluster Usually, a cluster consists of Front-end machine where users log in and submit jobs Compute nodes (also called worker nodes or hosts) where the actual computation takes place Typically, one is not allowed to log into compute nodes Specialized software for submitting jobs to compute nodes, also referred to as Local Resource Management System (LRMS) or batch system LSF (Load Sharing Facility) Vital-IT (UNIL/EPFL/UNIGE) SGE (Sun/Oracle Grid Engine) Uni Berne (UBELIX) etc. 12, HPC in Life Sciences
13 A typical compute cluster (cont.) [Diagram: the login node (front-end) passes jobs to the Local Resource Management System (LSF at UNIL/EPFL/UNIGE, SGE at Uni Berne), which dispatches them to the compute nodes (hosts).] Several processing cores (CPUs) per host. A host is identified by a single host name. 13, HPC in Life Sciences
14 When should I use a cluster? An application runs too long on a single PC and takes too much CPU Just run the single job on a cluster The problem (input data) can be split into pieces which can be executed in parallel Data splitting and final merging Parameter sweep: only some input parameters are changed but the same application is executed several times 14, HPC in Life Sciences
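A minimal sketch of the parameter-sweep pattern described above, using the LSF bsub command introduced later in this course (mysim, the seed values and the file names are illustrative placeholders):

#!/bin/bash
# submit-sweep.sh: one independent job per parameter value
for seed in 1 2 3 4 5; do
    # each bsub call creates a separate job; each job writes its own result file
    bsub -o sweep-${seed}.log "mysim --seed=${seed} > result-${seed}.txt"
done

Because the jobs are independent, this is an embarrassingly parallel workload; LSF job arrays (covered later) express the same idea more compactly.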
15 Bioinformatics examples NGS, genome analysis (data intensive) tophat, bowtie, velvet, abyss, Solexa GA-Pipeline etc. Sequence search and alignment BLAST, HMMER etc. Molecular dynamics NAMD, GROMACS, CHARMM etc. Computational phylogeny Phylip, PAML, BEAST, RAxML etc. Protein identification X!Tandem, Mascot etc. 15, HPC in Life Sciences
16 Outline 1. Motivation 2. Background on Scientific Computing Compute Cluster Parallelism 3. Practical usage of life science HPC infrastructure 4. Vital-IT: Different storage systems and guidelines 16, HPC in Life Sciences
17 High Performance/Throughput Computing: run things in parallel. Embarrassingly parallel: jobs run independently; multiple results are produced. Massively parallel: requires communication and synchronisation; classic HPC, uses MPI on supercomputers. 17, HPC in Life Sciences
18 Outline 1. Motivation 2. Background on Scientific Computing 3. Practical usage of life science HPC infrastructure - Vital-IT UNIL/EPFL/UNIGE infrastructure overview - Concrete cluster usage 4. Vital-IT: Different storage systems and guidelines 18, HPC in Life Sciences
19 Using the Vital-IT Cluster at UNIL In fact, there are two clusters! Production cluster (prd.vital-it.ch front-end / login node) More than 2000 CPUs on more than 140 compute nodes This login node is visible on the Internet (register your IP address to see this machine) Development cluster (dev.vital-it.ch front-end / login node) 2 compute nodes This cluster is not visible on the Internet Hardware environment is rather homogeneous Based on Intel's Xeon architecture No computation/extensive CPU usage on prd/dev frontends! If you need to compile and test code, please do it on dev. 19, HPC in Life Sciences
20 How to log in to Vital-IT's clusters UNIL (Uni Lausanne) ssh prd.vital-it.ch ssh dev.vital-it.ch EPFL ssh frt.el.vital-it.ch UNIGE (Uni Geneva) ssh frt.ug.vital-it.ch 20, HPC in Life Sciences
21 Vital-IT Production Cluster Status Example 21, HPC in Life Sciences
22 How do we get >2000 CPUs on the UNIL/Prod Cluster (~150 hosts)?
Host name | Arch. | CPUs | Speed | RAM | Total cores
cpt037-cpt100 | Intel | 8 | 3 GHz | 16 GB | 64 * 8 = 512
cpt133-cpt148 | Intel | 4 | 3 GHz | 16 GB | 16 * 4 = 64
cpt | Intel | 4 | 3 GHz | 8 GB | 16 * 4 = 64
(165,166, ,186)* | AMD | 64 | GHz | 512 GB | 7 * 64 = 448
cpt187, 188 | AMD | 48 | GHz | 256 GB | 2 * 48 = 96
cpt | Intel | 8 | GHz | 16 GB | 32 * 8 = 256
dee-cpt01-08 | Intel | 16 | GHz | 24 GB | 8 * 16 = 128
Not all CPUs are available all the time! Need clever software for selecting a suitable compute node (the Local Resource Management System). *Priority/reserved for some UNIL groups. 22, HPC in Life Sciences
23 Large memory machines rserv.vital-it.ch AMD, 8 cores, 64 GB RAM rserv01.vital-it.ch AMD, 64 cores, 256 GB RAM Usage policies: Dedicated to the memory-intensive jobs LSF cannot be used to submit jobs to this machine Use direct ssh connections to the machine (from prd) Other machines with 256 GB and 512 GB RAM restricted access Intel and AMD Opteron (incl. GPUs for specific projects) EPFL/UNIGE clusters: min. 128 GB RAM 23, HPC in Life Sciences
24 How to use the cluster Linux terminal interface UNIX knowledge is essential ssh access to front-end only No need to select a compute node yourself We have LSF (Load Sharing Facility) that Knows about all compute nodes Accepts user jobs Submits jobs to a compute node Knows about status of jobs 24, HPC in Life Sciences
25 LSF (Load Sharing Facility) [Diagram: the login node (front-end) prd.vital-it.ch runs the LSF client and submits jobs to the LSF server, which dispatches them to compute nodes such as cpt037, cpt141, cpt191.] 25, HPC in Life Sciences
26 Jobs and Queues Every program to be executed on a cluster needs to be defined as a job with a job description: Executable, input data, output data, etc. Job characteristics (CPU, memory, architecture etc.) Priorities and run time limitations Queues are used to classify jobs Different queues for different priorities/running times normal, long, priority, etc. Job states: pending, running, suspended, done, exit (usually refers to an error) 26, HPC in Life Sciences
27 Vital-IT UNIL's main LSF queues in detail
queue name | time limit | max. jobs simultan. | max. CPUs per job | default RAM in GB | priority
normal (default queue) | 24 hours | 150 | 64 | 2 | medium
long | 10 days | 64 | 16 | 2 | low
priority | 60 min (interact. jobs) | keep it "low" | moderate | 2 | high
Beyond the time limit, jobs might be killed. Max. jobs simultan. is the maximum number of jobs a single user can actively run at any given time. Max. CPUs per job means parallel executions per job. Each CPU is shared (potentially with other users); one cannot reserve a single CPU for a single job. Jobs with higher priority are treated first (preemptive jobs). 27, HPC in Life Sciences
28 LSF queues at EPFL and UNIGE Similar queues/setup but machines have more cores and more memory.
EPFL: 16 hosts, 48 CPUs (cores) each
Queue | RAM default | RAM max
normal | 2 GB | 128 GB
long | 2 GB | 128 GB
priority | 2 GB | 64 GB
UNIGE: 19 hosts, 48 CPUs (cores) each
Queue | RAM default | RAM max
normal | 2 GB | 238 GB
long | 2 GB | 57 GB
priority | 2 GB | 57 GB
interactive | 2 GB | 238 GB
28, HPC in Life Sciences
29 Usage hints basic overview Which queue should I use? Try to estimate overall processing time Potentially, test with priority queue for short and interactive jobs Where to write data/results? /scratch/cluster/[daily|weekly|monthly]/userid OR /scratch/local/[daily|weekly]/userid Please don't write to /home/ if the job runs on the cluster! Directories are shared between all compute nodes Shared file systems (except /scratch/local!) Where to archive (long term storage)? There is an archive system More details on storage will be given later 29, HPC in Life Sciences
30 Which version of the software + database? You always need to know which software version and database version you use Version might change over time! Might make all your experiments invalid! Always check the version before you start experiments Don't rely on default installations blindly Verify again and again! $ bowtie2 --version version , HPC in Life Sciences
31 Applications (bioinformatics software tools) Many applications are installed Specific version needs to be selected explicitly e.g. bwa-0.7.5a 31, HPC in Life Sciences
32 Accessing Vital-IT Software General convention: /software/<category>/<package>/<version>/ <category> scientific/technical category such as UHTS, Phylogeny, LC- MS-MS, Development, etc. <package> is the general name of the software package <version> version of the package incl. different compiler/architecture names if needed. /software/uhts/aligner/bowtie2/2.2.1 /software/uhts/quality_control/fastqc/ /software/phylogeny/paml/4.5 /software/development/java_jre/1.8.0_05 /software/r/3.0.2 /software/bin/gcc 32, HPC in Life Sciences
33 Activate package: module avail add rm Check what is available (not necessarily activated!) module avail Activate (add) a specific software tool to PATH module add UHTS/Quality_control/fastqc/ Check what is activated module list Deactivate (remove) a software tool module rm UHTS/Quality_control/fastqc/ 33, HPC in Life Sciences
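In practice the module command is usually placed inside the job script itself, so the selected software version is activated on whichever compute node runs the job, as in this sketch (the fastqc version string and the input file name are placeholders, not confirmed installations):

#!/bin/bash
#BSUB -L /bin/bash
#BSUB -o fastqc-output.txt
#BSUB -e fastqc-error.txt
# activate the desired tool version before calling it
module add UHTS/Quality_control/fastqc/0.10.1
fastqc myreads.fastq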
34 Accessing Vital-IT Software
$ vit_soft
Missing argument
Tool to retrieve information about Vital-IT software (version: 0.0.5)
Usage: vit_soft -search=<uvw>   Text search for packages (c.i. = case-insensitive search)
       vit_soft -package=<xyz>  Get package information
       vit_soft -module=<xyz>   Get module command for a package
       vit_soft -binary=<abc>   Search package providing this binary
       vit_soft -latest         Get latest installed packages
       vit_soft -help           More details
34, HPC in Life Sciences
35 Accessing Vital-IT Software
vit_soft -s tie   (same as: vit_soft -search tie)
VitalIT-Utilities x86_64
perl-tie-toobject x86_64
perl-tie-ixhash el5.centos.noarch
perl-html-entities-numbered x86_64
bowtie x86_64
bowtie x86_64
bowtie beta6-2.x86_64
bowtie x86_64
vit_soft -m bowtie2   (same as: vit_soft -module bowtie2)
module add UHTS/Aligner/bowtie2/2.2.1
module add UHTS/Aligner/bowtie2/2.1.0
module add UHTS/Aligner/bowtie2/2.0.0beta6
35, HPC in Life Sciences
36 Outline 1. Motivation 2. Background on Scientific Computing 3. Practical usage of life science HPC infrastructure - Vital-IT UNIL/EPFL/UNIGE infrastructure overview - Concrete cluster usage - Using UBELIX + Vital-IT software 4. Vital-IT: Different storage systems and guidelines 36, HPC in Life Sciences
37 LSF command line client Interact with LSF queue using the following basic commands: bsub submit job bjobs obtain job status bkill kill job Other useful commands bswitch move job to another queue bhist historical information on job bpeek display std. output/error of unfinished job bqueues information about available queues 37, HPC in Life Sciences
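A short example session combining these commands (the script name, job ID and queue are illustrative):

bsub < ./myjob.sh        # submit the job script
bjobs                    # list your unfinished jobs
bpeek 870965             # peek at stdout/stderr of the running job
bswitch long 870965      # move it to the long queue if it needs more time
bkill 870965             # kill it if something went wrong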
38 Basic LSF example (1) Example command we want to execute: blastall -p blastp -d "swiss" -i p seq
Write a shell script (job description) blast.sh:
#!/bin/bash
#BSUB -L /bin/bash
#BSUB -o blast-output.txt    (standard output is redirected to a file indicated by -o)
#BSUB -e blast-error.txt     (standard error indicated by -e)
blastall -p blastp -d "swiss" -i p seq
38, HPC in Life Sciences
39 Basic LSF example (2)
Submit job (note the '<' sign!):
bsub < ./blast.sh
Job <870965> is submitted to queue <normal>.
Check status (the job ID uniquely identifies a job):
bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
user007 RUN normal frt cpt204 *23456.seq Nov 16 16:40
bjobs
No unfinished job found
ls
blast-error.txt blast-output.txt
The result is ready if the job is no longer displayed. Note: bjobs without a job ID displays information on all of your active jobs.
39, HPC in Life Sciences
40 Basic LSF example (3)
bjobs   (if the job ID is used explicitly, also the DONE status is shown)
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
user007 DONE normal frt cpt204 *23456.seq Nov 16 16:40
bjobs -a   (info about all jobs, running and finished; only available for a few hours!)
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
user007 DONE normal frt cpt129 *;hostname Nov 17 11:
user007 DONE normal frt cpt129 *;hostname Nov 17 11:
user007 DONE normal frt cpt014 *simple.sh Nov 17 11:
user007 DONE normal frt cpt194 *23456.seq Nov 17 11:
user007 EXIT normal frt cpt014 *stderr.sh Nov 17 11:
user007 DONE normal frt cpt005 *stderr.sh Nov 17 11:37
40, HPC in Life Sciences
41 Where to write results, store files Location for input files and results (output files): /scratch/ Location for scripts /home/<userid> or /scratch/ 41, HPC in Life Sciences
42 How do we apply that to our basic example? Script to launch blastall: /home/user007/blast.sh Create directory in /scratch: mkdir /scratch/cluster/weekly/user007 Change working directory to /scratch: cd /scratch/cluster/weekly/user007 Launch job with reference to blast.sh in home directory: bsub < /home/user007/blast.sh Please do that for (all) your jobs! Other options to write into /scratch? 42, HPC in Life Sciences
43 LSF example define a job name Give a job a meaningful name #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt #BSUB -e blast-error.txt #BSUB -J blastp blastall -p blastp -d "swiss" -i p seq blast-with-jobname.sh bjobs JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME user007 DONE normal frt cpt129 blastp Nov 17 14:00 43, HPC in Life Sciences
44 LSF example use job ID in output file name Distinguish jobs and files #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-%J-output.txt #BSUB -e blast-%J-error.txt blastall -p blastp -d "swiss" -i p seq blast-with-jobid.sh bsub < ./blast-with-jobid.sh Job <885626> is submitted to default queue <normal>. $ ls *.txt blast-885626-error.txt blast-885626-output.txt 44, HPC in Life Sciences
45 LSF example e-mail notification Receive an e-mail when the job has finished Includes CPU usage information (timing) etc. #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt #BSUB -e blast-error.txt #BSUB -u #BSUB -N blastall -p blastp -d "swiss" -i p seq blast-with-email.sh See example on next slide 45, HPC in Life Sciences
46 (E-mail is sent by the LSF server.) Your job looked like: # LSBATCH: User input #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt #BSUB -e blast-error.txt #BSUB -u My @example.org #BSUB -N blastall -p blastp -d "swiss" -i p seq (job description) Successfully completed. (job status) Resource usage summary: CPU time : 2.44 sec. Max Memory : 1 MB Max Swap : 15 MB Max Processes : 1 Max Threads : 1 (resource usage / performance information) Read file <blast-output.txt> for stdout output of this job. Read file <blast-error.txt> for stderr output of this job. (information on output and error files) 46, HPC in Life Sciences
47 Interactive Job Job output is displayed on the terminal rather than written into a file Gives impression that job is executed on the local machine Waits until job has finished echo "The hostname is... " hostname bsub -I <./simple.sh Job <886075> is submitted to default queue <normal>. <<Waiting for dispatch...>> <<Starting on cpt194>> The hostname is... cpt194 simple.sh 47, HPC in Life Sciences
48 Interactive Job with X11-forwarding Jobs with a graphical display can be used in interactive mode graphical interface can be displayed (forwarded) on the log-in node (ssh -X) #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -J interact #BSUB -XF xterm (terminal type) beast X11-forwarding.sh 48, HPC in Life Sciences
49 Processes, threads, multitasking A process is an executing - or running - program identified by a unique process identifier (PID). Looking closer, a process is some kind of entity - sometimes referred to as an "allocation unit" - that comprises several things, such as the running program, a reserved space in memory, temporary data, etc. A thread is a 'light weight' process - or "unit of execution" - that is contained in a process. It uses the same environment as the process it belongs to - PID, memory, data, etc. Each process has one or more threads but each thread belongs to one process only! 49, HPC in Life Sciences
50 Processes, threads, multitasking Long ago in the computer age, a computer contained one central processing unit (CPU). One machine 1 CPU / 1 core On such a computer, only ONE process is active at a time. And for processes with several threads, only ONE thread is active at a time even if we get the impression that several threads run in parallel. Indeed, the CPU switches very fast between the threads - some CPU time is given to process one, then some CPU time is given to process two, etc. until all the processes have completed - which gives the impression that they "kind of run in parallel". 1 process with 4 threads: all the threads run sequentially. Thus, the total execution time is the sum of the time needed to complete all four threads individually. 50, HPC in Life Sciences
51 Processes, threads, multitasking Based on the fact that '1 thread -> 1 CPU', engineers have multiplied the CPU - or cores (*) to allow multiple threads to run simultaneously. One machine 4 CPUS / 4 cores This means also, that the more cores there are, the more threads can be executed at the same time - for real this time! (*) Cores and CPU are not exact synonyms. Actually, the execution parts of the CPU were duplicated and reorganized into 'cores'. Thus, a CPU can contain one or more cores - bi-core, quadri-core, etc. 1 process With 4 threads All the threads run at the same time - in parallel. Thus the total execution time is the time needed to complete one thread - assuming they take the same time to complete. 51, HPC in Life Sciences
52 Multi-threaded jobs (1 thread per core) By default, LSF reserves 1 core (CPU) per job Multi-threaded jobs need to request more cores on the same host (machine)! #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -n 4 (4 CPU cores (processors) requested) #BSUB -R "span[ptile=4]" (4 cores on the same host!) myprogram --threads=4 52, HPC in Life Sciences
53 Resource/memory requirements Job can explicitly require certain hardware resources such as RAM (main memory), swap space, etc. 2 GB RAM max per default! #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -R "rusage[mem=4096]" (required memory in MB, at scheduling time) #BSUB -M (required memory in KB, at job run time) hostname mem-requirement.sh 53, HPC in Life Sciences
54 Cluster resources (lsload) temp. space /scratch/local RAM (main memory) 54, HPC in Life Sciences
55 Memory requirement and multi-core Request a multi-core job that needs lots of memory (e.g. 40 GB on one machine) #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -J mem-mult #BSUB -n 4 #BSUB -R "span[ptile=4]" #BSUB -R "rusage[mem=40000]" #BSUB -M /myprogram1 mem-multicore.sh 55, HPC in Life Sciences
56 Request/use local disk on compute node 10 GB at scheduling time (select) 6 GB at run time (rusage) Clean up the directory yourself! #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -J localdisk #BSUB -R "select[tmp>$((10*1024))]" #BSUB -R "rusage[tmp>$((6*1024))]" mkdir /scratch/local/daily/mydir [ ] rm -rf /scratch/local/daily/mydir 56, HPC in Life Sciences
57 Check the availability of compute nodes
lshosts -R "select[maxmem=400000]"
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
cpt165 X86_64 Opteron M 31999M Yes ()
cpt166 X86_64 Opteron M 31999M Yes ()
cpt181 X86_64 Opteron M 31999M Yes ()
cpt182 X86_64 Opteron M 31999M Yes ()
cpt183 X86_64 Opteron M 31999M Yes ()
cpt184 X86_64 Opteron M 31999M Yes ()
cpt186 X86_64 Opteron M 31999M Yes ()
(maxmem = memory size installed in the host)
57, HPC in Life Sciences
58 Check the availability of compute nodes (big memory requirements will rapidly decrease the list of potential execution hosts)
lsload -R "select[maxmem=400000]"
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
cpt165 ok % G 31G 500G
cpt182 ok % G 31G 336G
cpt186 ok % G 31G 305G
cpt183 ok % G 31G 245G
cpt184 ok % G 31G 318G
cpt181 ok % G 31G 310G
cpt166 ok % G 30G 402G
lsload -R "select[mem=400000]"   (mem = currently available memory)
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
cpt165 ok % G 31G 500G
cpt166 ok % G 30G 401G
58, HPC in Life Sciences
59 Check the availability of compute nodes lshosts gives information about a host's hardware specs lsload gives information about a host's currently available resources (maxmem, mem, tmp, status, swp, etc.) 59, HPC in Life Sciences
60 Status pending for resource-hungry jobs If a job is pending for a long time, use option -p or -l to get more info on why it is pending:
bjobs -p 164498
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
164498 userid PEND normal frt testjob May 4 09:59
Closed by LSF administrator: 15 hosts;
Not specified in job submission: 52 hosts;
Job requirements for reserving resource (mem) not satisfied: 95 hosts;
Load information unavailable: 3 hosts;
Just started a job recently: 2 hosts;
60, HPC in Life Sciences
61 Job runs too long: move it to another queue By default, jobs in the normal queue can run for 24 hours If you see that it might need to run longer, move the job to the long queue bjobs JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 85613 testuse RUN normal frt.el.vita cpt16.el.vi lsftest Dec 7 12:10 bswitch long 85613 Job <85613> is switched to queue <long> 61, HPC in Life Sciences
62 Array Job: submit the same job multiple times Motivation/example: My simulation software mysim needs to be executed 20 times Submit a single job rather than 20 individual ones #!/bin/bash #BSUB -L /bin/bash #BSUB -J array[1-20] (index in job array) #BSUB -o array-output-%J-%I.txt (%J = job ID, %I = index) #BSUB -e array-error-%J-%I.txt run-mysim --seed=$LSB_JOBINDEX array-job.sh 62, HPC in Life Sciences
63 Job arrays (cont.) bjobs JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME user007 RUN normal frt cpt130 array[16] Mar 8 11: user007 RUN normal frt cpt101 array[9] Mar 8 11: user007 RUN normal frt cpt119 array[2] Mar 8 11: user007 RUN normal frt cpt127 array[4] Mar 8 11: user007 RUN normal frt cpt106 array[3] Mar 8 11: user007 RUN normal frt cpt131 array[1] Mar 8 11: user007 RUN normal frt cpt140 array[10] Mar 8 11: user007 RUN normal frt cpt138 array[12] Mar 8 11: user007 RUN normal frt cpt139 array[15] Mar 8 11:55 Nota bene: The same program will be executed multiple times Be careful when you write output files: Need to be sub-job specific $LSB_JOBINDEX env. var can be used in script $LSB_JOBID for entire job array! 63, HPC in Life Sciences
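A minimal sketch of making the output of each sub-job unique with $LSB_JOBINDEX, as recommended above (run-mysim and the result file name pattern are illustrative):

#!/bin/bash
#BSUB -L /bin/bash
#BSUB -J array[1-20]
#BSUB -o array-output-%J-%I.txt
#BSUB -e array-error-%J-%I.txt
# write each sub-job's result into its own file to avoid collisions
run-mysim --seed=$LSB_JOBINDEX > result-${LSB_JOBID}-${LSB_JOBINDEX}.txt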
64 Pitfalls with LSF (1) bsub ./blast.sh If you forget to use '<', the job output is sent via e-mail! Correct command: bsub < ./blast.sh Job <886355> is submitted to default queue <normal>. bjobs JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME hstocki DONE normal frt cpt187 ./blast.sh Nov 17 16:34 ls blast.sh p seq Job output is not available even if the job seems to have finished correctly. 64, HPC in Life Sciences
65 Pitfalls with LSF (2) #!/bin/bash #BSUB -L /bin/bash #BSUB -e blast-error.txt blastall -p blastp -d "swiss" -i p seq blast-woo.sh bsub < ./blast-woo.sh If standard output is not mentioned, it will not be created in the file system but sent via e-mail (with delay)! Add: #BSUB -o blast-output.txt 65, HPC in Life Sciences
66 Pitfalls with LSF (3) By default, e-mail notifications are sent to the mailbox connected to the UNIX user id: need to explicitly mention a different address. -N is missing to explicitly invoke notification! #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt #BSUB -e blast-error.txt #BSUB -u blastall -p blastp -d "swiss" -i p seq blast-with-email2.sh 66, HPC in Life Sciences
67 Further information LSF commands man pages (man bsub, man bjobs etc.) All commands Infrastructure, cluster status, software, support etc. 67, HPC in Life Sciences
68 Vital-IT Helpdesk If you have project-related questions on how to use Vital-IT software etc., please send an e-mail to: Try to be as specific as possible when asking a question/reporting a problem so it can be reproduced. Please specify: Machine/cluster Software package (command you used) + version Input data and directory where you worked Output/error you have received Copy/paste the command used + the full error message 68, HPC in Life Sciences
69 Summary: what can you expect from Vital-IT and the team? High performance cluster, i.e. hardware and software infrastructure Wide range of bioinformatics applications New applications can be added on request Expertise in several domains of biology, bioinformatics and computer science (IT) The Vital-IT team can help you prepare and run your project on the cluster but cannot solve all questions related to all software applications! 69, HPC in Life Sciences
70 "New applications can be added on request" The Vital-IT team is making every effort to answer all requests as soon as possible. However: creation of accounts may take several days installation of new software takes 1-2 weeks (but can take more if the software is complex) Vital-IT expects you to help testing the application that you requested to install 70, HPC in Life Sciences
71 Outline 1. Motivation 2. Background on Scientific Computing 3. Practical usage of life science HPC infrastructure - Vital-IT UNIL/EPFL/UNIGE infrastructure overview - Concrete cluster usage - Using UBELIX + Vital-IT software 4. Vital-IT: Different storage systems and guidelines 71, HPC in Life Sciences
72 SGE (Sun/Oracle Grid Engine) [Diagram: the login node (front-end) submit.unibe.ch runs the SGE client and submits jobs to the SGE server, which dispatches them to compute nodes such as cnode01, dnode03, fnode02.] 72, HPC in Life Sciences
73 LSF vs SGE (UBELIX) main difference
Task | LSF | SGE
Submit job | bsub | qsub
Check status | bjobs | qstat
Cancel/kill job | bkill | qdel
73, HPC in Life Sciences
74 LSF vs SGE (UBELIX) main difference
LSF: bsub < blast.sh
#!/bin/bash
#BSUB -o blast-output.txt
#BSUB -e blast-error.txt
module add Blast/ncbi-blast/
blastall -p blastp -d "swiss" -i p seq
SGE: qsub sge-blast.sh
#!/bin/bash
#$ -l h_cpu=01:00:00
#$ -l h_vmem=1g
#$ -cwd
#$ -o blast-output.txt
#$ -e blast-error.txt
module load vital-it
module add Blast/blast/
blastall -p blastp -d "swiss" -i p seq
Starting 12 February 2015: 'module load' is now mandatory!
74, HPC in Life Sciences
75 Compute nodes on UBELIX Note the difference of architecture (Intel/AMD), number of cores, RAM etc. Total: 2288 processing cores ~8 TB RAM ~300 TB disk space (shared file system) disks mentioned above are local disks! 75, HPC in Life Sciences
76 Storage on UBELIX Not available from compute nodes! *2 TB per user, 15 TB per group may be extended by contacting ** quota can be requested when submitting job 76, HPC in Life Sciences
77 UBELIX SGE queues in detail
queue name | time limit | max. jobs simultaneously | max. CPUs per job | priority
all.q (default queue) | 360 hours = 15 days | 150 | 8 | medium
short.q | 1 hour | 30 | 8 | high
mpi.q | 360 hours = 15 days | 200 | 128 | medium
Beyond the time limit, jobs might be killed. Max. jobs simultaneously is the maximum number of jobs a single user can actively run at any given time (limited also by CPUs/job). Max. CPUs per job means parallel executions per job (MPI: if 128 CPUs/job, only 1 job is possible; if 64 CPUs/job, max 3 jobs). Each CPU is shared (potentially with other users); one cannot reserve a single CPU for a single job. Jobs with higher priority are treated first (preemptive jobs). 77, HPC in Life Sciences
78 In summary: SGE command line client Interact with the SGE queue using the following basic commands: qsub submit job qstat obtain job status (qacct) qdel delete/kill job UBELIX web site: http:// 78, HPC in Life Sciences
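A short example SGE session combining these commands (the script name and job ID are illustrative):

qsub sge-myjob.sh        # submit the job script
qstat                    # list your pending/running jobs
qdel 1234567             # delete/kill the job if needed
qacct -j 1234567         # accounting information once the job has finished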
79 Before we start: let's set the environment The environment on cluster nodes is different from the login node Need to explicitly set paths That is particularly required for certain Vital-IT applications and databases (/mnt/local/bin for applications) # Set PATH export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib" for i in /mnt/common/etc/profile.d/*.sh; do if [ -x $i ]; then . $i; fi; done 79, HPC in Life Sciences
80 Basic SGE example (1) Example command we want to execute: blastall -p blastp -d "swiss" -i p seq
Write a shell script (job description) sge-blast.sh:
#!/bin/bash
# Set PATH [ ] (see previous slide)
#$ -l h_cpu=01:00:00   (CPU time requirement, REQUIRED)
#$ -l h_vmem=1g        (memory requirement, REQUIRED)
#$ -cwd                (current working directory)
blastall -p blastp -d "swiss" -i p seq
80, HPC in Life Sciences
81 Basic SGE example (2)
Submit job:
qsub sge-blast.sh
Your job ("sge-blast.sh") has been submitted
Check status (the job ID uniquely identifies a job):
qstat
job-ID prior name user state submit/start at queue
sge-blast.sh userid qw 02/01/ :56:05
qstat
ls
err out
The result is ready if the job is no longer displayed.
81, HPC in Life Sciences
82 Basic SGE example (3)
qacct -j
qname short.q
hostname cnode02.ubelix.unibe.ch
group groupid
owner userid
[ ]
qsub_time Wed Feb 1 11:56:
start_time Wed Feb 1 11:56:
end_time Wed Feb 1 11:56:
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 2
[ ]
cpu
mem
io
iow
maxvmem M
arid undefined
82, HPC in Life Sciences
83 SGE example define a job name Give a job a meaningful name #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -cwd #$ -N blastp blastall -p blastp -d "swiss" -i p seq sge-blast-with-jobname.sh qstat job-id prior name user state submit/start at queue blastp userid qw 02/01/ :13:22 83, HPC in Life Sciences
84 SGE example redirect output/error Write stdout and stderr to specific files #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -cwd #$ -o blast-output.txt (standard output is redirected to a file indicated by -o) #$ -e blast-error.txt (standard error indicated by -e) blastall -p blastp -d "swiss" -i p seq sge-blast-redirect.sh ls blast-output.txt blast-error.txt 84, HPC in Life Sciences
85 SGE example use job ID in output file name Distinguish jobs and files #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -cwd #$ -o $JOB_ID-output.txt #$ -e $JOB_ID-error.txt blastall -p blastp -d "swiss" -i p seq echo "My JobID was $JOB_ID" sge-blast-with-jobid.sh cat output.txt My JobID was , HPC in Life Sciences
86 SGE example e-mail notification Receive an e-mail when the job has finished Includes CPU usage information (timing) etc. #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -cwd #$ -M #$ -m e (b = beginning of job, e = end, a = aborted, s = suspended) blastall -p blastp -d "swiss" -i p seq sge-blast-with-email.sh See example on next slide 86, HPC in Life Sciences
87 (E-mail is sent by the SGE server.) Job (blast-with-email.sh) Complete User = userid Queue = [email protected] Host = cnode03.ubelix.unibe.ch Start Time = 02/01/ :22:33 End Time = 02/01/ :22:35 User Time = 00:00:01 System Time = 00:00:00 Wallclock Time = 00:00:02 CPU = 00:00:02 Max vmem = M Exit Status = 0 87, HPC in Life Sciences
88 SGE Array Job: submit the same job multiple times Motivation/example: My simulation software mysim needs to be executed 10 times Submit a single job rather than 10 individual ones #!/bin/bash #$ -t 1-10 (index/task in job array) #$ -l h_cpu=24:00:00 #$ -l h_vmem=100m #$ -cwd #$ -N array-job echo "Task $SGE_TASK_ID out of 10" mysim --seed $SGE_TASK_ID sge-array-job.sh 88, HPC in Life Sciences
89 Job arrays (cont.) qstat job-id prior name user state submit/start at queue slots ja-task-id array-job userid r 02/01/ :59:18 all.q@cnode array-job userid t 02/01/ :59:18 all.q@cnode array-job userid r 02/01/ :59:18 all.q@dnode array-job userid qw 02/01/ :58: :1 Same job ID for all tasks! 89, HPC in Life Sciences
90 Resource requirements Job can explicitly require certain hardware resources such as RAM (main memory), swap space, etc. #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g (units: xk (KB), xm (MB), xg (GB)) #$ -cwd echo "hostname" hostname 90, HPC in Life Sciences
91 Resource requirements: local disk space Usage of scratch space $TMP #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -l scratch=1 #$ -l scratch_size=1g (units: xk (KB), xm (MB), xg (GB)) #$ -l scratch_files=1 mkdir $TMP/hallo cd $TMP/hallo/ pwd ls -l $TMP/hallo sge-scratch-space.sh cat scratch-space.sh.o /scratch/local/ all.q/hallo total 0 91, HPC in Life Sciences
92 Multi-threaded jobs (1 thread per core) By default, SGE reserves 1 core (CPU) per job Multi-threaded jobs need to request more cores (slots) #!/bin/bash #$ -l h_cpu=01:00:00 #$ -l h_vmem=1g #$ -cwd #$ -pe smp 4 (4 CPU cores (processors) requested; pe = parallel environment, smp = symmetric multi-processing) ./pthread-4 sge-4-threads.sh job-ID prior name user state submit/start at queue slots sge- userid r 02/03/ :06:33 [email protected] 4 92, HPC in Life Sciences
93 Fribourg Bern: additional cluster in Bern Managed by Interfaculty Bioinformatics unit Contact: 8 machines with 240 cores (SGE) GB RAM per machine ssh 93, HPC in Life Sciences
94 Bern: additional cluster in Bern (2) /home/username For scripts and other small files (quota 20 GB) Will be backed up soon (daily?) /data3/users/<username> Location to store your data Quota 5TB; not backed up! /scratch (local to node, local disk) (4.8TB on 80 core nodes, 4TB on 16 core nodes) 94, HPC in Life Sciences
95 Bern: additional cluster in Bern (3) SGE example #!/bin/sh #$ -S /bin/sh #$ -e scriptname-$JOB_ID.err #$ -o scriptname-$JOB_ID.out #$ -M #$ -l h_vmem=50g #$ -l h_rt=540:00:00 #$ -pe smp 8 #$ -q all.q # here starts the command that is executed on the cluster blastall -p blastn -a 8 -i seqs.fasta -d blastdatabase -o blastout 95, HPC in Life Sciences
96 Outline 1. Motivation 2. Background on Scientific Computing 3. Practical usage of life science HPC infrastructure - Vital-IT UNIL/EPFL/UNIGE infrastructure overview - Concrete cluster usage - Using UBELIX + Vital-IT software 4. Vital-IT: Different storage systems and guidelines 96, HPC in Life Sciences
97 Basic overview /home/<userid> private scripts, small data /scratch/ programs should write results there /archive/... long term storage /db/ public bioinformatics databases 97, HPC in Life Sciences
98 Home directories /home/<userid> Backed up on tape 5 GB quota Only the most important files should be stored there Your files: we don't support software that you install there! Please do not store large (temporary) results in /home but do so in the /scratch directory (next slide) Check quota with: quota -s Filesystem blocks quota limit grace files quota limit grace :/exports/home 828M 5120M 5632M (space used, 5 GB quota, number of files created) 98, HPC in Life Sciences
99 Space Limitation for LSF: #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt By default, output/error redirection is written into the following directory (note the quota on $HOME): $HOME/.lsbatch/ If the output is large (>5 GB), that will exceed the quota in your home directory, so better write to a file in /scratch without using -o: myprogram > /scratch/cluster/daily/userid/output.txt (-o is only written once the job has finished) 99, HPC in Life Sciences
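A minimal job-script sketch of this advice (userid and myprogram are placeholders): the potentially large stdout is redirected to /scratch inside the script instead of going through -o and $HOME/.lsbatch:

#!/bin/bash
#BSUB -L /bin/bash
#BSUB -e blast-error.txt
# no #BSUB -o here: the large program output goes straight to /scratch
myprogram > /scratch/cluster/daily/userid/output.txt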
100 Scratch directories /scratch No backup! Automatically cleaned! 1 TB quota per user (/scratch/cluster) currently 56 TB /scratch/cluster/[daily|weekly|monthly] Create a subdirectory according to your userid You can work like on a single machine but make sure that different processes don't write into the same file! Once you have finished your computation, please download results from /scratch to your local machine and back up your results /scratch/local/[daily|weekly] only GB Local scratch on compute node (not shared!) Very fast access! 100, HPC in Life Sciences
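A sketch of copying finished results from /scratch back to a local machine before the automatic cleanup, as suggested above (userid and the project directory are placeholders):

# run on your local machine
rsync -av userid@prd.vital-it.ch:/scratch/cluster/weekly/userid/myproject/ ./myproject-results/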
101 LSF Job to use /scratch/ Explicitly cd into /scratch or write files directly there #!/bin/bash #BSUB -L /bin/bash #BSUB -o blast-output.txt (could also be redirected to /scratch; by default, the local directory from where you submitted the job!) #BSUB -e blast-error.txt #BSUB -J myjob cd /scratch/cluster/daily/userid (change working directory to write results directly into /scratch) blastall -p blastp -d "swiss" -i p seq 101, HPC in Life Sciences
102 Attention: automatic cleanup in /scratch/cluster/[daily|weekly|monthly]/ Often, when you retrieve external data in compressed format (e.g. tar.gz) the files are "older" than 1 week, 1 month etc. tar -zxvf mydata.tar.gz (e.g. from 2012) will preserve the file creation time; you will lose the data the next day! Solution: tar zxvmf mydata.tar.gz Use the 'm' option to extract files setting the current time (Note: here, the order of the options is important) 102, HPC in Life Sciences
103 Archive /archive Large storage for scientific data (more than ½ PB) Automatic backup and versioning Hierarchical storage system (HSM) with disk and tape No quota but if the disk cache is full, files are truncated from disk and must be retrieved from tape Don't run programs that write directly to the archive! Copy data explicitly from/to the archive (zip/tarball data!) Avoid copying/archiving many small files, i.e. create larger files! /archive/<entity>/<group>/<userid> $HOME/archive (symbolic link in your home directory) 103, HPC in Life Sciences
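A sketch of the recommended archive workflow (userid, entity/group and file names are placeholders): bundle many small files into one tarball on /scratch, then copy it explicitly to /archive:

# create one compressed tarball instead of archiving many small files
tar -czf myproject-2015-01.tar.gz /scratch/cluster/monthly/userid/myproject
# copy the tarball explicitly; never let programs write directly to /archive
cp myproject-2015-01.tar.gz /archive/<entity>/<group>/userid/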
104 Hierarchical Storage System (HSM) Quantum StorNext basics: [Diagram: clients, HSM, HSM MDC (MetaData Controller), SMB/CIFS, NFS, SAN, network, disk storage and tape storage.] Slide by Volker Flegel 104, HPC in Life Sciences
105 Is your file in the disk cache of the HSM? ondisk checks if a file is in the disk cache of the HSM or truncated (max 16 KB on disk)
ondisk -h
ondisk ..
./authpaper.pdf: on_disk
./bigresults.pdf: truncated
./ieeetrans.bst: truncated
./ocg.sty: on_disk
./literatur.bib: on_disk
./bigresults.tex: truncated
105, HPC in Life Sciences
106 Archive: ask LSF to transfer/stage files Before the job starts, LSF can copy files to a scratch directory #!/bin/bash #BSUB -L /bin/bash #BSUB -o output.txt #BSUB -e error.txt #BSUB -J stage3 #BSUB -f "/archive/ /file1.seq > /scratch/cluster/ /file1.seq" (source file on /archive > destination file on /scratch) #BSUB -f "/archive/ /file2.seq > /scratch/cluster/ /file2.seq" #BSUB -f "/archive/ /file3.seq > /scratch/cluster/ /file3.seq" cd /scratch/cluster/ / ls -l tail file1.seq 106, HPC in Life Sciences
107 Data access latency: important change! Hierarchical storage system: Provides access to large amounts of data (/archive) Data might not always be available on disk but may need to be fetched from a tape drive [Diagram: storage hierarchy RAM > disk cache > tape (archive); access latency and storage capacity increase towards tape, storage cost increases towards RAM.] Keep that hierarchy in mind when you plan to use data in the archive directory! 107, HPC in Life Sciences
108 Summary Vital-IT (UNIL/EPFL/UNIGE) and UBELIX
Task | LSF | SGE
Submit job | bsub | qsub
Check status | bjobs | qstat
Cancel/kill job | bkill | qdel
Vital-IT software on all clusters.
Vital-IT (LSF): 3 clusters at 3 sites; medium and large size memory; disk storage + long term tape archive.
UBELIX (SGE): single cluster; medium size memory; disk space only.
108, HPC in Life Sciences
109 Number and size of files (per directory) Note that file systems are not good at dealing with many thousands of small files Particularly true for cluster/parallel file systems! Advice: try to organise your data to keep larger files (bigger than 4 MB!) If you store files, create tar balls for smaller files to have less than ~ files per directory 109, HPC in Life Sciences
110 Conclusion Vital-IT software at 4 sites: UNIL, EPFL, UNIGE and UNIBE Infrastructure summary Submit jobs from front-end nodes Be aware of quotas and different access latencies /scratch for writing data (UNIL/EPFL/UNIGE) Contact us for help etc. Vital-IT is much more than an HPC infrastructure We work closely with biologists and platforms Common research projects, embedded bioinformaticians 110, HPC in Life Sciences
111 Thank you 111, HPC in Life Sciences
Manual for using Super Computing Resources
Manual for using Super Computing Resources Super Computing Research and Education Centre at Research Centre for Modeling and Simulation National University of Science and Technology H-12 Campus, Islamabad
Submitting batch jobs Slurm on ecgate. Xavi Abellan [email protected] User Support Section
Submitting batch jobs Slurm on ecgate Xavi Abellan [email protected] User Support Section Slide 1 Outline Interactive mode versus Batch mode Overview of the Slurm batch system on ecgate Batch basic
Hodor and Bran - Job Scheduling and PBS Scripts
Hodor and Bran - Job Scheduling and PBS Scripts UND Computational Research Center Now that you have your program compiled and your input file ready for processing, it s time to run your job on the cluster.
Grid Engine. Application Integration
Grid Engine Application Integration Getting Stuff Done. Batch Interactive - Terminal Interactive - X11/GUI Licensed Applications Parallel Jobs DRMAA Batch Jobs Most common What is run: Shell Scripts Binaries
High-Performance Reservoir Risk Assessment (Jacta Cluster)
High-Performance Reservoir Risk Assessment (Jacta Cluster) SKUA-GOCAD 2013.1 Paradigm 2011.3 With Epos 4.1 Data Management Configuration Guide 2008 2013 Paradigm Ltd. or its affiliates and subsidiaries.
Installing IBM Websphere Application Server 7 and 8 on OS4 Enterprise Linux
Installing IBM Websphere Application Server 7 and 8 on OS4 Enterprise Linux By the OS4 Documentation Team Prepared by Roberto J Dohnert Copyright 2013, PC/OpenSystems LLC This whitepaper describes how
Parallels Cloud Server 6.0
Parallels Cloud Server 6.0 Parallels Cloud Storage I/O Benchmarking Guide September 05, 2014 Copyright 1999-2014 Parallels IP Holdings GmbH and its affiliates. All rights reserved. Parallels IP Holdings
icer Bioinformatics Support Fall 2011
icer Bioinformatics Support Fall 2011 John B. Johnston HPC Programmer Institute for Cyber Enabled Research 2011 Michigan State University Board of Trustees. Institute for Cyber Enabled Research (icer)
SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
Introduction to Programming and Computing for Scientists
Oxana Smirnova (Lund University) Programming for Scientists Tutorial 7b 1 / 48 Introduction to Programming and Computing for Scientists Oxana Smirnova Lund University Tutorial 7b: Grid certificates and
Module 6: Parallel processing large data
Module 6: Parallel processing large data Thanks to all contributors: Alison Pamment, Sam Pepler, Ag Stephens, Stephen Pascoe, Kevin Marsh, Anabelle Guillory, Graham Parton, Esther Conway, Eduardo Damasio
DiskPulse DISK CHANGE MONITOR
DiskPulse DISK CHANGE MONITOR User Manual Version 7.9 Oct 2015 www.diskpulse.com [email protected] 1 1 DiskPulse Overview...3 2 DiskPulse Product Versions...5 3 Using Desktop Product Version...6 3.1 Product
NYUAD HPC Center Running Jobs
NYUAD HPC Center Running Jobs 1 Overview... Error! Bookmark not defined. 1.1 General List... Error! Bookmark not defined. 1.2 Compilers... Error! Bookmark not defined. 2 Loading Software... Error! Bookmark
LoadLeveler Overview. January 30-31, 2012. IBM Storage & Technology Group. IBM HPC Developer Education @ TIFR, Mumbai
IBM HPC Developer Education @ TIFR, Mumbai IBM Storage & Technology Group LoadLeveler Overview January 30-31, 2012 Pidad D'Souza ([email protected]) IBM, System & Technology Group 2009 IBM Corporation
High-Performance Computing
High-Performance Computing Windows, Matlab and the HPC Dr. Leigh Brookshaw Dept. of Maths and Computing, USQ 1 The HPC Architecture 30 Sun boxes or nodes Each node has 2 x 2.4GHz AMD CPUs with 4 Cores
