Getting Started with HPC An Introduction to the Minerva High Performance Computing Resource 17 Sep 2013
Outline of Topics
Introduction
HPC Accounts
Logging onto the HPC Clusters
Common Linux Commands
Storage on Minerva
Using Minerva
Monitoring Jobs
Additional Help
Why use HPC?
Computation Speed: HPC systems can complete computations significantly faster than your personal PC.
Computation Ability: HPC systems are able to conduct computations that your personal PC is not capable of: large memory, massive parallelization.
Storage: HPC systems have large-scale storage for your data.
Accessibility: HPC systems can be accessed from anywhere.
Hardware Nominal Specifications
70 Teraflops peak speed
64 million CPU hours available per year
7,680 Advanced Micro Devices (AMD) 2.3 GHz Interlagos cores
120 Dell C6145 2-blade chassis nodes
64 compute cores in four sockets and 256 Gigabytes (GB) of memory per node
30 Terabytes (TB) of RAM
Interconnected with Mellanox Quad Data Rate (QDR, 40 Gbps) Fat-Tree InfiniBand
1.5 Petabytes (PB) of Data Direct Networks (DDN) SFA10K high-speed storage
Dual 10 Gigabit Ethernet links to the Sinai campus network
Software
Over 550 packages/versions installed (see module avail)
Compilers and tools: C, C++, Fortran, OpenMPI, BLAS, LAPACK, ...
Scripting systems: Python, Perl, R
Many genomics packages: Bioconductor, Tuxedo suite (bowtie, etc.), GATK, Pindel, ...
Chemistry packages: NAMD, Gromacs, Amber, Gaussian
Imaging: FSL
HPC Accounts
HPC accounts are available for all MSSM faculty, staff and students
Collaborators on an as-needed basis
Visiting scholars with sponsorship
To request an account, please fill out the form found on the Minerva home page: http://hpc.mssm.edu
HPC Accounting
Minerva has an allocation process in place.
Users must apply for an allocation to have priority access to Minerva.
Allocation applications are considered periodically throughout the year; applications and the schedule can be found on the Minerva web site.
There is a default account named scavenger that can be used by anyone and does not count against any allocation. The scavenger account has a wall-time limit of 24 hours and must use the *_24hr queues.
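For example, a job run under the default scavenger account simply targets one of the *_24hr queues (the script name here is illustrative):

qsub -q small_24hr myjob.pbs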
Logging onto Minerva
First connect to the Minerva login host, minerva.hpc.mssm.edu
Windows: download and install PuTTY (google it), then connect via PuTTY.
Mac/Linux: open a terminal and run: ssh <user-id>@minerva.hpc.mssm.edu
To enable X11 forwarding (GUI):
Windows: download and install Xming and enable X11 forwarding within PuTTY.
Mac/Linux: uses X11, so just include the -X flag: ssh -X <user-id>@minerva.hpc.mssm.edu
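As a convenience, Mac/Linux users can store the connection details in ~/.ssh/config so that plain "ssh minerva" works; the host alias and placeholder user id below are illustrative, not official names:

# ~/.ssh/config -- illustrative fragment
Host minerva
    HostName minerva.hpc.mssm.edu
    User your-user-id        # replace with your HPC user id
    ForwardX11 yes           # equivalent to the -X flag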
Logging In
minerva.hpc.mssm.edu works from inside or outside the campus network; external connections can only access Minerva, not the rest of the intranet.
Minerva uses 2-factor authentication at all times (see hpc.mssm.edu > Access > Logging In).
Password: Sinai password or Minerva password
Token: Symantec VIP token (physical or software) or YubiKey push button
Get physical tokens from SciComp, Icahn L3-3.
The Linux Shell and Commands
What is a shell? A command-line interpreter that communicates with the operating system.
Commonly used commands:
Navigating/Making Directories
  pwd    prints your current working directory
  cd     changes your current working directory
  ls     lists the contents of a directory
  mkdir  makes a directory
  rmdir  removes a directory
Moving/Editing Files
  cp     copies a file
  rm     removes a file
  mv     moves (or renames) a file
  more, less or cat    view a file
  vi, emacs or nano    file editors
  man    manual/help for commands
And of course Google
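The commands above can be tried safely in a scratch directory; a minimal session (the file and directory names are just examples):

```shell
#!/bin/bash
# A short session exercising the common commands; it runs in a throwaway
# temporary directory so nothing important is touched.
set -e
cd "$(mktemp -d)"              # move somewhere safe to experiment
pwd                            # pwd: print where we are now
mkdir results                  # mkdir: make a directory
echo "hello" > notes.txt       # create a small file to play with
cp notes.txt results/          # cp: copy the file into results/
mv notes.txt notes.bak         # mv: rename (move) the file
ls results                     # ls: list directory contents
cat results/notes.txt          # cat: view the file's contents
rm notes.bak results/notes.txt # rm: remove files
rmdir results                  # rmdir: remove the (now empty) directory
```

Note that rmdir only removes empty directories, which is why the files are removed first.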
Available Storage on HPC
home: /home/<user-id> -- 10 GB quota, backed up. Program development space for dot files, source code, scripts, libraries, etc.
scratch: /scratch/<user-id> and /projects -- 5 TB and 1 million inode quotas. NOT BACKED UP.
All are shared file systems accessible from the login and compute nodes.
Archive: long-term offline storage (7 years), snapshot-like in functionality, duplicated off-site.
Using HPC Software
Software is installed in the /packages folder and can be accessed from the login, interactive and compute nodes.
The software environment is managed using the module command.
To view the available software: module avail
To load a module: module load <module>, e.g. module load amber/12
To unload a module: module unload <module>, e.g. module unload cufflinks/2.0.0
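A typical session with the Environment Modules system looks like the sketch below; the package names and versions are examples taken from the lists above, so check module avail for what is actually installed:

module avail          # list all installed software under /packages
module load R         # make R available in this shell
module load amber/12  # load a specific version of a package
module list           # show which modules are currently loaded
module unload amber/12
module purge          # standard Modules command: unload everything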
Using the Clusters
Login nodes -- for editing text, compiling, transferring files, submitting PBS jobs, etc.: minerva.hpc.mssm.edu
Compute nodes -- for running jobs.
Interactive nodes -- interactive free-for-all: interactive1, interactive2
Job scheduler: Moab/TORQUE (PBS). Jobs are submitted using the qsub command.
Note: never run heavy jobs on the login nodes! CPU time is limited and your session will be cancelled.
Priority Minerva Queues

Queue    Description                                                Max Walltime  Defaults (nodes/ppn/hrs)
small    1-64 core jobs on a single node                            144 hrs       1 / 1 / 5
medium   16-2048 core node-exclusive jobs                           144 hrs       1 / 64 / 5
himem    1 high-memory node (1 TB), 8-64 core node-exclusive jobs   144 hrs       1 / 64 / 5
gpu      4 GPU nodes (1 GPU card per node), 1-64 cores              144 hrs       1 / 64 / 5
matlab   for Matlab jobs using MDCS, 1-32 cores                     144 hrs       1 / 8 / 5
express  for debugging and quick testing, 1-256 cores               2 hrs         1 / 8 / 1
Scavenger Minerva Queues

Queue        Description                                                   Max Walltime  Defaults (nodes/ppn/hrs)
small_24hr   1-64 core jobs on a single node                               24 hrs        1 / 1 / 5
medium_24hr  16-2048 core node-exclusive jobs                              24 hrs        1 / 64 / 5
himem_24hr   1 high-memory node (1 TB), 8-64 core node-exclusive jobs      24 hrs        1 / 64 / 5
gpu_24hr     4 GPU nodes (1 GPU card per node), 1-64 cores                 24 hrs        1 / 64 / 5
matlab_24hr  for Matlab jobs using MDCS, 1-32 cores                        24 hrs        1 / 8 / 5
intel_24hr   dedicated to node-exclusive MPI jobs (8 Xeon cores per node)  24 hrs        1 / 8 / 5
express      for debugging and quick testing, 1-256 cores                  2 hrs         1 / 8 / 1
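To target a specific queue, match its resource shape in the job script directives; for example, a node-exclusive medium-queue job might start like this (the account name is illustrative):

#PBS -q medium
#PBS -A act_555
#PBS -l nodes=2:ppn=64,walltime=24:00:00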
PBS Example: Serial Job

#!/bin/bash
#PBS -V
#PBS -N R-Test
#PBS -q express
#PBS -l nodes=1:ppn=1,walltime=01:00:00
#PBS -j oe
cd $PBS_O_WORKDIR
module load R
R --no-save -q -f my_marvelous_script > output.file

To submit the script: qsub <script name> (ex: test.pbs) <will use scavenger account>
PBS Example: Parallel Job

#!/bin/bash
#PBS -V
#PBS -N mpitest
#PBS -l nodes=2:ppn=8,walltime=04:00:00
cd $PBS_O_WORKDIR
module load openmpi
module load R
mpirun RMPISNOW --no-save -q -f input > output

To submit the script:
qsub -q small test.pbs <will not work; no account field>
qsub -q small -A act_555 test.pbs
Monitoring Jobs
showq -u <userid>   displays scheduler information about jobs for user userid
qstat -u <userid>   shows the status of PBS jobs
checkjob <job-id>   shows whether a job is being denied dispatch, and for what reason
Killing Jobs
qdel <job-id>       cancels a job
canceljob <job-id>  cancels a job
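Put together, a typical check on a running job looks like the sketch below; the job id 123456 is illustrative:

qstat -u $USER     # status of all my PBS jobs
showq -u $USER     # the scheduler's view of the same jobs
checkjob 123456    # why is job 123456 not being dispatched?
qdel 123456        # give up and cancel it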
Additional Help
Detailed information can be found on the Minerva web site.
Intermediate or advanced questions can be sent directly to the HPC team: hpchelp@mssm.edu
Bugs and software requests should go this route. Questions, too, but you can also email us individually or catch us in the hall.
Questions and scripting requests can be sent to Gene, Hyung Min, Anthony, Sveta, Jonathon or Zach by email, carrier pigeon, semaphore, smoke signal, ...