Overview of HPC Resources at Vanderbilt

Will French
Senior Application Developer and Research Computing Liaison
Advanced Computing Center for Research and Education
June 10, 2015
Computing Resources for University Researchers

- Lab resources: laptop, desktop, in-house servers, etc. Suitable for development, testing, and prototyping.
- University-centralized resources: a shared cluster environment (e.g., ACCRE). Meant for scaling up and accelerating computational research via parallel processing.
- Government/federal resources: supercomputers at national labs, XSEDE resources. Enable larger, more complicated problems to be solved, perhaps pushing the limits of what has been done previously.

(Moving from lab resources toward federal resources, the environment becomes less specialized but more scalable.)
ACCRE: Advanced Computing Center for Research and Education

- Provides a centralized computing infrastructure and environment for Vanderbilt researchers.
- Started in 2002 through a collaboration between a particle physicist and a geneticist.
- Users from VUSE, VUMC, A&S, Peabody, and Owen.
- Operates as a co-op in which researchers are allowed to burst onto one another's hardware.
- Staff of ten (system administrators, software/research specialists, center administrators).
- Relieves researchers of the administrative burden so they can focus on their research.
- Provides advanced training and support; support costs of using ACCRE are centrally subsidized.
- Located in The Commons: Hill Center, Suite 201.
ACCRE Services

- Computing: a Linux environment for submitting and running jobs. Many popular software packages are installed, including Matlab, Python, R, C/C++/Fortran/Java/CUDA compilers, and multi-thread/multi-process libraries. Resource limits are based on use and/or support fees paid by researchers.
- Storage: 25 GB of home directory space and 50 GB of scratch space for new users; additional space available for purchase per TB.
- Backups: home directories are backed up nightly to tape, going back 3 months. Backups are also provided for off-site servers.
- Customer gateways: researchers can purchase their own server that is connected to the cluster but has a customized environment (administered by ACCRE).
ACCRE Resources

- ~600 standard compute nodes: Intel Xeon processors, Nehalem generation through Haswell; 8-12 CPU cores per node (>6,000 CPU cores total in the cluster); memory per node ranges from 24 to 256 GB.
- ~40 GPU nodes: each equipped with four NVIDIA GeForce GTX 480 GPUs, which are well suited for vector and matrix operations. Completely free to use; current usage is light.
- Also testing new Intel Xeon Phi nodes (currently five available).
- ~4 PB total storage: ~20 TB for home directories, ~570 TB for scratch space; the rest is for analysis of high-energy physics data from CERN.
- Uses IBM's General Parallel File System (GPFS) for mounting user home and scratch directories.
ACCRE Cluster Layout

In a nutshell: the ACCRE cluster is a bunch of computers networked together, which enables users to burst onto idle computers. The main components:

- Gateways (vmps11, vmps12, vmps13, ...): used for logging in, managing/editing files, writing code/scripts, submitting jobs, and running short tests. Users ssh to a gateway from their laptop/desktop.
- Compute nodes (~600 standard nodes and ~40 GPU nodes; vmp201, vmp202, ..., vmp804, ...): jobs are run here by the job scheduler. At any given time, ~1,500-5,000 jobs are running on compute nodes, and users often have multiple jobs running at once. Users do not need to log in to a compute node.
- Job Scheduler Server: runs the software (called SLURM) for managing cluster resources and scheduling jobs. SLURM commands are available across the cluster (e.g., from a gateway).
- File Servers: mount/map users' files across all gateways and compute nodes; close to 4 petabytes of storage.
- Authentication server (auth): change your password by logging into auth (type rsh auth from a gateway) and typing passwd.

Key concepts: you submit jobs from a gateway, the scheduler runs them for you on compute node(s), and your files are visible everywhere. A sketch of this workflow follows.
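As a rough sketch of the login-and-submit workflow (the gateway name comes from the diagram above, but the fully qualified hostname, the username, and the script name are assumptions):

    # From your laptop/desktop, log in to a gateway
    ssh vunetid@vmps11.accre.vanderbilt.edu

    # On the gateway, hand a batch script to the scheduler...
    sbatch myjob.slurm

    # ...and check on your jobs; SLURM runs them on compute nodes for you
    squeue -u vunetid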
ACCRE User Support

- Free training classes offered twice a month: Intro to Linux (optional), Intro to the Cluster, Intro to SLURM, Compiling Programs (optional), and GPU Computing (optional). Many users come in with no Linux background, while others are comfortable on the command line and are only required to take two training courses.
- Advanced classes offered by request only.
- Online help desk with support from ACCRE staff; staff are also available for appointments.
- Web tools: the website (www.accre.vanderbilt.edu) includes Getting Started pages, Frequently Asked Questions, SLURM documentation, software pages, and suggested grant text. GitHub repositories (www.github.com/accre) let users see examples and contribute their own.
SLURM: Simple Linux Utility for Resource Management

ACCRE switched from Torque/Moab to SLURM in January 2015. Features:

- Excellent performance: able to process tens of thousands of jobs per hour (scalability). As of June 2014, six of the top ten supercomputers were using SLURM.
- Multi-threaded, with high throughput for smaller jobs (accepts up to 1,000 jobs per second).
- Fault tolerant: a backup server can take over transparently.
- Supports control groups (cgroups), which allows memory and CPU requests to be enforced on compute nodes.
- Uses a database to store job statistics and account info.

A minimal batch script is sketched below.
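For orientation, here is a minimal SLURM batch script of the kind handed to sbatch; the resource values, output file, and program name are placeholders rather than ACCRE-specific settings:

    #!/bin/bash
    #SBATCH --nodes=1              # request one compute node
    #SBATCH --ntasks=1             # a single task (process)
    #SBATCH --time=02:00:00        # wall-clock limit of two hours
    #SBATCH --mem=4G               # memory request (enforced via cgroups)
    #SBATCH --output=myjob.out     # where stdout/stderr go

    # Everything below runs on the compute node SLURM assigns
    echo "Running on $(hostname)"
    ./my_program                   # placeholder for your executable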
What's on the Horizon?

- ACCRE will evolve as dictated by researcher demand. An example of this occurred in 2010, when an interdisciplinary grant proposal funded a group of GPU nodes for the cluster. Similar opportunities are being pursued for a group of nodes equipped with Intel Xeon Phi coprocessors.
- Massively multi-core processors are becoming ubiquitous in HPC: GPUs are composed of thousands of cores, and Intel Xeon Phi coprocessors are composed of ~60 CPU cores each. Understanding what types of problems translate well to these environments is essential. The programming burden can be large, but it's diminishing as libraries and high-level packages mature.
- Environments optimized for Big Data: Hadoop, Spark, etc.
The New Moore's Law

- The number of cores doubles every 18-24 months, while frequency (clock speed) remains fairly constant: dual-core, quad-core, ...
- A doubling in code speed with each new generation of processor is no longer guaranteed; codes must be written to exploit parallelism.
- On-chip parallelism is different from the distributed memory parallelism used on large supercomputers like ORNL's Crays.
- This is a new computing paradigm: even business and consumer codes will need to be parallelized to take advantage of new hardware, with ramifications for closed-source vendors (e.g., Microsoft).
- Parallelism has been explored in the open-source community, and parallel libraries have been developed, e.g., OpenMP and MPI (see the sketch below).
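On a cluster, on-chip parallelism shows up in how you request resources. A hedged sketch of a batch script for an OpenMP-style multi-threaded job (the core count, time limit, and program name are placeholders):

    #!/bin/bash
    #SBATCH --nodes=1              # on-chip parallelism: all threads share one node
    #SBATCH --cpus-per-task=8      # request eight cores for a single task
    #SBATCH --time=01:00:00

    # Match the thread count to the cores SLURM actually granted
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./my_openmp_program            # placeholder for a multi-threaded executable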
Massively Multi-Core Era

[Timeline figure, roughly 1970-2030: the Vector Era (USA, Japan), followed by the Massively Parallel Era (USA, Japan, Europe), followed by the current Multi-core Era, described as a new paradigm in computing.]
How Are GPUs Different from CPUs?

"CPUs are designed to handle complexity well, while GPUs are designed to handle concurrency well." - Axel Kohlmeyer

GPUs follow the Single Instruction, Multiple Thread (SIMT) execution model.

[Diagram: the GPU hardware hierarchy of multiprocessors, cores, and threads.]
GPU-CPU Performance Comparison

[Figures comparing GPU and CPU performance, taken from the CUDA Programming Guide.]
Becoming an Advanced ACCRE User

- Spend time thinking about performance. Investigate and test tools that enable faster execution times; examples include Intel-compiled software, GPU-enabled software, and multi-thread or multi-process software. Don't reinvent the wheel: look for libraries/packages that will let you maximize performance without spending 3-4 months programming. Keep in mind that not all problems are well-suited for parallelism.
- Automate your workflows. Explore job arrays for single-core, embarrassingly parallel jobs; SLURM makes this easy to do, and it also puts less stress on the scheduler (see the sketch after the figure below). Avoid the it-works-so-don't-touch-it mentality: while your jobs are running, spend time improving your workflow to make it more efficient and to avoid any manual input/processing. Write scripts that pass input and output between different jobs.
- Collaborate! Contribute examples to the ACCRE GitHub repositories.
SLURM Job Arrays

[Diagram: five scripts (script1 through script5) running side by side over time as elements of a single job array.]
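A minimal sketch of such an array (the script names follow the figure; --array, %A, %a, and SLURM_ARRAY_TASK_ID are standard SLURM features):

    #!/bin/bash
    #SBATCH --array=1-5                # five independent array elements
    #SBATCH --ntasks=1                 # each element is a single-core job
    #SBATCH --time=00:30:00
    #SBATCH --output=array_%A_%a.out   # %A = job ID, %a = array index

    # Each element sees its own SLURM_ARRAY_TASK_ID (1 through 5)
    ./script${SLURM_ARRAY_TASK_ID}     # runs script1, script2, ..., script5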
Running IPython Notebooks on the Cluster
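One common pattern for this (a sketch only: the port number, gateway hostname, and username are assumptions, and 2015-era IPython used the "ipython notebook" command):

    # On the cluster: start a notebook server with no browser attached
    ipython notebook --no-browser --port=8888

    # On your laptop: forward a local port to the machine running the server
    ssh -N -L 8888:localhost:8888 vunetid@vmps11.accre.vanderbilt.edu

    # Then point a local browser at http://localhost:8888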
Vanderbilt Course in Parallel Programming and High-Performance Computing

- Offered every spring as a part of Vanderbilt's Scientific Computing minor program.
- Covers the following topics: the Linux command line; C programming; compiling/building HPC software; shared memory, multi-thread programming; distributed memory, multi-process programming; programming for NVIDIA GPUs with CUDA; programming for Intel Xeon Phi coprocessors; and performance benchmarking.
- Students gain valuable experience in an HPC environment, with an emphasis on applying these tools to a research problem from their own domain. Students present results from their capstone projects at the end of the semester.
Concluding Remarks

- Continue to educate yourself about the resources that are available to you as a university researcher.
- As someone performing computational research, always be thinking about ways you can improve performance and efficiency. Your research stands to benefit; your career stands to benefit.
- Feel free to contact me with any questions you might have: will@accre.vanderbilt.edu, @W_R_French.

Thank you for your attention! Questions?