icer Bioinformatics Support, Fall 2011
John B. Johnston, HPC Programmer, Institute for Cyber Enabled Research
© 2011 Michigan State University Board of Trustees.
Institute for Cyber Enabled Research (icer)
- Hardware (HPCC)
- Software and Support
- Education
- Consulting
- Collaboration
icer: What is it?
The Institute for Cyber Enabled Research (icer) at Michigan State University (MSU) was established to coordinate and support multidisciplinary resources for computation and computational sciences. Its goal is to enhance MSU's national and international presence and competitive edge in disciplines and research thrusts that rely on advanced computing.
HPCC: What is it?
The HPCC provides computational hardware and support to MSU faculty, students and researchers. The HPCC is contained within icer, effectively representing the hardware, systems and software arm of icer's research support mission.
Bioinformatics Outreach
- HPCC hardware
- Software resources
- Help Desk
- Seminars
- One-on-one Consulting
- Limited on-site systems setup and configuration
- Programming and scripting assistance
- FREE!
wiki.hpcc.msu.edu/display/bioinfo/bioinformatics+support+at+msu
HPCC Cluster Overview
- Linux operating system
- Primary interface is text based, through Secure Shell (ssh)
- All machines in the main cluster are binary compatible (compile once, run anywhere)
- Each user has 50 GB of personal hard drive space: /mnt/home/username/
- Users have access to 33 TB of scratch space: /mnt/scratch/username/
- A scheduler is used to manage jobs running on the cluster
- A submission script is used to tell the scheduler the resources required and how to run a job
- A module system is used to manage the loading and unloading of software configurations
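The slides do not name the scheduler or show a submission script, so the following is only a minimal sketch of what such a script might contain. It assumes a PBS/Torque-style scheduler (the `#PBS` directives and `$PBS_O_WORKDIR`) and an Environment Modules `module load` command; the job name, program, and module name are hypothetical placeholders.

```python
def make_submission_script(job_name, command, cores=1,
                           walltime="01:00:00", mem="2gb", modules=()):
    """Return the text of a submission script.

    Assumption: PBS/Torque-style directives. Check the HPCC wiki for the
    syntax the cluster's scheduler actually expects.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",            # job name shown in the queue
        f"#PBS -l nodes=1:ppn={cores}",   # one node, `cores` processors on it
        f"#PBS -l walltime={walltime}",   # maximum run time, HH:MM:SS
        f"#PBS -l mem={mem}",             # total memory requested
        "",
        "cd $PBS_O_WORKDIR",              # start in the directory the job was submitted from
    ]
    # Load each requested software configuration before running the job.
    lines += [f"module load {name}" for name in modules]
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 cores, 2 hours, one module, one program.
script = make_submission_script(
    "my_test_job", "./my_program input.dat",
    cores=4, walltime="02:00:00", modules=["my_module"],
)
print(script)
```

Generating the script from a function keeps the resource request (cores, walltime, memory) in one place, which helps when many similar jobs are submitted.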
Gateway to the System
Access to HPCC is primarily through the gateway machine:
    ssh username@hpc.msu.edu
    ssh username@gateway.hpcc.msu.edu
Access to all HPCC services uses your MSU username and password. Once in, you can go to the user-oriented destination of choice.
HPCC System Diagram
Why the HPCC Cluster?
- Large data sets
- Lots of number crunching
- A need to run many simultaneous jobs with different data sets and/or configuration settings
- You need software you don't have, or don't want to (or can't) set up
- Comprehensive, ready-made development environment that is actively administered
Linux? OH NOES!
If you are a Linux pro, go ahead and take a short nap (you've got ~60 seconds). If you're not, don't worry! That's why I get the (not so) big bucks. The Bioinformatics Help Desk is here to get you up and running.
Linux Support
- Client application selection
- Bring in your laptop (if you have one)
- Cookbook tutorials and cheat sheets (more on the way)
- One-on-one consultation
- Limited on-site support and training
- We also provide Samba support for Windows and Mac boxes, so you can map your HPCC account directory to your workstation
HPCC Online Resources
- www.hpcc.msu.edu: HPCC home
- wiki.hpcc.msu.edu: Public/Private Wiki
- forums.hpcc.msu.edu: User forums
- rt.hpcc.msu.edu: Help desk request tracking
- mon.hpcc.msu.edu: System Monitors
Available Software
- Center Supported Development Software: Intel compilers, openmp, openmpi, mvapich, totalview, mkl, pathscale, gnu...
- Center Supported Research Software: Matlab, R, amber, blast, charmm, emboss...
- Customer Software (module use.cus): Clustalw, QuEST, MEME, Velvet, mpiblast, bowtie, AMOS, ABySS, MUMmer, HMMER, phylip, SAMTools
For a more up-to-date list, see the documentation wiki: http://wiki.hpcc.msu.edu/
Don't see it here? Ask for it, and we'll try to help.
User Software
- 50 GB of initial user space provided
- Install your own in user space
- HPCC offers a rich build environment
- Quota increases can be made available
- Code installation and (modest) modification support is available through moi
Virtual Machines Virtual Servers expressed in software Available for research labs/working groups Flavors currently available: Galaxy BLAST (web browser based) UCSC Genome Browser more on the way...
Database Offerings db-01: Internal MySQL database node attached to the cluster. Host user datasets of modest size. BLAST database repository VM-based UCSC for example Up to 1TB total user space for free, $250/yr. per TB thereafter
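As a rough illustration of the pricing above (1 TB free, $250 per year per TB thereafter), the yearly cost can be sketched as follows. Whether partial terabytes are rounded up or billed pro rata is not stated in the slides, so both behaviors are shown and the default is an assumption.

```python
import math

FREE_TB = 1.0               # free allowance stated in the slides
PRICE_PER_TB_YEAR = 250.0   # dollars per TB per year beyond the allowance


def yearly_storage_cost(total_tb, bill_whole_tb=True):
    """Estimate the yearly cost in dollars for `total_tb` of database space.

    Assumption: with bill_whole_tb=True, any partial TB above the free
    allowance is rounded up to a whole TB; otherwise it is billed pro rata.
    """
    extra_tb = max(0.0, total_tb - FREE_TB)
    if bill_whole_tb:
        extra_tb = math.ceil(extra_tb)
    return extra_tb * PRICE_PER_TB_YEAR


for size in (0.5, 1.0, 2.5, 3.0):
    print(size, "TB:", yearly_storage_cost(size), "dollars/year")
```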
Multiprocessor Apps Many bioinformatics applications are beginning to appear in multiprocessor-capable versions. Workload can be divided to allow each processor to complete part of the job in parallel, decreasing run time. HPC provides accessibility to a large number of processing cores, memory, and disk space.
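One simple way to divide a workload is to split the input sequences into roughly equal parts and hand each part to a separate processor or job. The sketch below is not how any particular tool does it internally; the function names and the tiny FASTA example are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_parts):
    """Deal records into n_parts lists so the part sizes differ by at most one."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


example = """>seq1
ACGT
>seq2
GGCC
TTAA
>seq3
AAAA
>seq4
CCCC
>seq5
GATTACA
"""

records = read_fasta(example)
parts = split_round_robin(records, 2)
for number, part in enumerate(parts):
    print("part", number, [header for header, _ in part])
```

Each part could then be written to its own file and processed by a separate core or job, with the results merged afterwards.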
Some Examples
- Multithreaded BLAST: shared memory
- mpiblast: distributed memory
- Velvet Assembler: multithreaded, shared memory
- MAKER2: MPI, distributed memory
- OpenMP, OpenMPI, MVAPICH
Cluster Developer Nodes
Developer nodes are accessible from the gateway and are used for testing.
- ssh dev-amd05: same hardware as amd05
- ssh dev-intel07: same hardware as intel07
- ssh dev-intel10: same hardware as intel10
- ssh dev-amd09: same hardware as amd09
- ssh dev-gfx10: same hardware as gfx10
We periodically have some test boxes:
- ssh dev-gfx08: Nvidia Graphics Processing Node
- ssh dev-cell08: Playstation 3 Cell processor
- ssh dev-intel09: 8 core Intel Xeon with 24GB of memory
Jobs running on the developer nodes should be limited to two hours of walltime. Developer nodes are shared by everyone.
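A small helper can check a planned test run against the two-hour developer-node guideline before it is started. This is only a sketch; it assumes the walltime is written as HH:MM:SS, and the function names are hypothetical.

```python
DEV_NODE_LIMIT_SECONDS = 2 * 60 * 60  # two-hour guideline from the slides


def walltime_to_seconds(walltime):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_on_dev_node(walltime):
    """Return True if the requested walltime is within the developer-node limit."""
    return walltime_to_seconds(walltime) <= DEV_NODE_LIMIT_SECONDS


for requested in ("00:30:00", "02:00:00", "06:00:00"):
    where = "developer node" if fits_on_dev_node(requested) else "submit to the scheduler"
    print(requested, "->", where)
```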
Steps in Using the HPCC
1. Connect to HPCC
2. Determine required software
3. Transfer required input files and source code
4. Compile programs (if needed)
5. Test software/programs on a developer node
6. Write a submission script
7. Submit the job
8. Get your results and write a paper!!
A couple of examples
- Biological model: long running, many similar but not identical runs
- Multiprocessor BLAST searches
- Multiprocessor Velvet assembly
Using the HPCC cluster produced more results in less time, with little or no active user management.
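The "many similar but not identical runs" pattern is often handled as a parameter sweep: list the values to try for each setting and generate one run per combination. The sketch below illustrates the idea only; the parameter names and values are invented and do not come from the model mentioned above.

```python
from itertools import product


def sweep(parameters):
    """Yield one dict per combination of the given parameter values."""
    names = sorted(parameters)  # fixed order so runs are reproducible
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical settings for a model: 2 x 2 x 3 = 12 runs.
runs = list(sweep({
    "mutation_rate": [0.01, 0.1],
    "population": [100, 1000],
    "seed": [1, 2, 3],
}))

print(len(runs), "runs")
for run_id, settings in enumerate(runs[:3]):
    print(run_id, settings)
```

Each dictionary can then be turned into the command line of one job, so the whole sweep runs on the cluster without manual bookkeeping.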
But I don't need a cluster
- Tool selection, setup
- Scripting assistance
- Data browsing, sharing, group analysis
- Lab help or training
Scripting
- Customized, standardized, modify
- Python, Perl, or?
- We have a growing collection of scripts available as a Git repository.
- Perhaps you don't know anything about scripting, or maybe you do but could use some help?
Tutorials
- Titus Brown's ANGUS-NGS tutorials, converted to use examples on HPC instead of Amazon
- Using UCSC for certain tasks
- mpiblast
- Velvet and Oases
- Others being developed...
Seminars and Education
- NextGen Bioinformatics Seminars: wiki.hpcc.msu.edu/display/bioinfo/nextgen+bioinformatics+seminars
- HPCC Mid-Morning Break: wiki.hpcc.msu.edu/display/announce/hpcc+mid-morning+break+series
Setting up an account
All account requests must come via a PI. Have your PI fill in the form at: www.hpcc.msu.edu/request
Once we receive it, we will process your request and notify you when your account is ready.
Bioinformatics Contact
John Johnston, HPC Programmer
- M-W: 1449 BPS, 884-2572
- Th-F: 505 BMB, 432-7177
- johnj@msu.edu
- Ticket requests: https://rt.hpcc.msu.edu/index.html
Please include "Bioinformatics Help" in the subject line so your request is routed more quickly.