Overview of HPC Resources at Vanderbilt




Overview of HPC Resources at Vanderbilt
Will French
Senior Application Developer and Research Computing Liaison
Advanced Computing Center for Research and Education
June 10, 2015

2 Computing Resources for University Researchers
- Lab resources
  - Laptop, desktop, in-house servers, etc.
  - Suitable for development, testing, prototyping
- University-centralized resources
  - Shared cluster environment (e.g., ACCRE)
  - Meant for scaling up and accelerating computational research via parallel processing
- Government/federal resources
  - Supercomputers at national labs, XSEDE resources
  - Enable larger, more complicated problems to be solved, perhaps pushing the limits of what has been done previously
- Moving down this list, the environment becomes less specialized and more scalable

3 ACCRE: Advanced Computing Center for Research and Education
- Provides a centralized computing infrastructure and environment for Vanderbilt researchers
- Started in 2002 through a collaboration between a particle physicist and a geneticist
- Users from VUSE, VUMC, A&S, Peabody, and Owen
- Operates as a co-op in which researchers are allowed to burst onto one another's hardware
- Staff of ten (system administrators, software/research specialists, center administrators)
  - Relieves researchers of the administrative burden so they can focus on their research
  - Provides advanced training and support
- Support costs of using ACCRE are centrally subsidized
- Located in The Commons: Hill Center, Suite 201

4 ACCRE Services
- Computing
  - Linux environment for submitting and running jobs
  - Many popular software packages installed, including Matlab, Python, R, C/C++/Fortran/Java/CUDA compilers, and multi-thread/multi-process libraries
  - Resource limits based on use and/or support fees paid by researchers
- Storage
  - 25 GB of home directory space, 50 GB of scratch space for new users
  - Additional space available for purchase per TB
- Backups
  - Home directories backed up nightly to tape, going back 3 months
  - Backups also provided for off-site servers
- Customer gateways
  - Researchers can purchase their own server that is connected to the cluster but has a customized environment (administered by ACCRE)

5 ACCRE Resources
- ~600 standard compute nodes
  - Intel Xeon processors, Nehalem generation through Haswell
  - 8-12 CPU cores per node (>6,000 CPU cores total in the cluster)
  - Memory per node ranges from 24 to 256 GB
- ~40 GPU nodes
  - Each equipped with four NVIDIA GeForce GTX 480 GPUs
  - Well suited for vector and matrix operations
  - Completely free to use; current usage is light
- Also testing new Intel Xeon Phi nodes (currently five available)
- ~4 PB total storage
  - ~20 TB for home directories, ~570 TB for scratch space; the rest is for analysis of high-energy physics data from CERN
  - IBM's General Parallel File System (GPFS) is used for mounting user home and scratch directories
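Since the GPU nodes are requested through the scheduler like any other resource, a minimal sketch of a GPU job script may help; it assumes SLURM's generic-resource (--gres) syntax, and the program name is hypothetical. Site-specific flags (e.g., a partition or account) may also be required:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1            # request one GPU on a GPU node
    #SBATCH --time=01:00:00         # wall-clock limit

    ./my_cuda_program               # hypothetical CUDA-enabled binary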

6 ACCRE Cluster Layout

[Diagram: a user's laptop/desktop connects via ssh to gateway machines, which sit alongside an authentication server (auth), a job scheduler server, file servers, ~600 compute nodes, and ~40 GPU nodes; individual machines carry names such as vmp201 ... vmp804 and vmps11 ... vmps13.]

- In a nutshell: a bunch of computers networked together! Enables users to burst onto idle computers.
- Key concepts:
  - Submit jobs from a gateway (ssh)
  - The scheduler runs jobs for you on compute node(s)
  - Files are visible everywhere
  - Change your password by logging into auth (type rsh auth from a gateway) and typing passwd
- Gateways are used for: logging in, managing/editing files, writing code/scripts, submitting jobs, and running short tests
- Compute nodes:
  - Jobs are run on compute nodes by the job scheduler
  - At any given time, ~1,500-5,000 jobs are running on compute nodes
  - Users often have multiple jobs running at once
  - Users do not need to log in to a compute node
- Job scheduler server: runs the software (called SLURM) for managing cluster resources and scheduling jobs; SLURM commands are available across the cluster (e.g., from a gateway)
- File servers: mount/map users' files across all gateways and compute nodes; close to 4 petabytes of storage
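To make the layout concrete, a typical session might look like the following sketch; the gateway hostname, username (vunetid), and script name are placeholders, not ACCRE's actual values:

    ssh vunetid@gateway.accre.vanderbilt.edu   # log in to a gateway (placeholder hostname)
    sbatch myjob.slurm                         # hand a job script to the scheduler
    squeue -u vunetid                          # check the job's place in the queue
    rsh auth                                   # hop to the auth server...
    passwd                                     # ...and change your password (per the slide)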

7 ACCRE User Support
- Free training classes offered twice a month: Intro to Linux (optional), Intro to the Cluster, Intro to SLURM, Compiling Programs (optional), GPU Computing (optional)
  - Many users come in with no Linux background, while others are comfortable on the command line and are only required to take two training courses
- Advanced classes offered by request only
- Online help desk with support from ACCRE staff; staff are also available for appointments
- Web tools
  - Website (www.accre.vanderbilt.edu) includes Getting Started pages, Frequently Asked Questions, SLURM documentation, software pages, and suggested grant text
  - GitHub repositories (www.github.com/accre) where users can see examples and contribute their own

8 SLURM: Simple Linux Utility for Resource Management
- ACCRE switched to SLURM from Torque/Moab in January 2015
- Features:
  - Excellent performance: able to process tens of thousands of jobs per hour (scalability); as of June 2014, six of the top ten supercomputers were using SLURM
  - Multi-threaded, with high throughput for smaller jobs (accepts up to 1,000 jobs per second)
  - Fault tolerant (a backup server can take over transparently)
  - Supports control groups (cgroups), allowing memory and CPU requests to be enforced on compute nodes
  - Uses a database to store job statistics and account info
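For readers new to SLURM, a minimal sketch of a batch script follows; the resource values, filenames, and program are illustrative assumptions, not ACCRE defaults:

    #!/bin/bash
    #SBATCH --nodes=1               # one compute node
    #SBATCH --ntasks=1              # one task (process)
    #SBATCH --mem=1G                # memory request (enforced via cgroups)
    #SBATCH --time=02:00:00         # wall-clock limit, hh:mm:ss
    #SBATCH --output=myjob.out      # file for the job's output

    ./my_analysis input.dat         # hypothetical program and input file

Submitting it is a single command from a gateway: sbatch myjob.slurm.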

9 What's on the Horizon?
- ACCRE will evolve as dictated by researcher demand
  - An example occurred in 2010, when an interdisciplinary grant proposal funded a group of GPU nodes for the cluster
  - Similar opportunities are being pursued for a group of nodes equipped with Intel Xeon Phi coprocessors
- Massively multi-core processors are becoming ubiquitous in HPC
  - GPUs are composed of thousands of cores
  - Intel Xeon Phi coprocessors are composed of ~60 CPU cores each
  - Understanding what types of problems translate well to these environments is essential
  - The programming burden can be large, but it's diminishing as libraries and high-level packages mature
- Environments optimized for Big Data: Hadoop, Spark, etc.

10 The New Moore's Law
- The number of cores doubles every 18-24 months, while frequency (clock speed) remains fairly constant: dual-core, quad-core, ...
- A doubling in code speed with each generation of processor is no longer guaranteed; codes must be written to exploit parallelism
  - On-chip parallelism is different from the distributed-memory parallelism used on large supercomputers like ORNL's Crays
- A new computing paradigm: even business and consumer codes will need to be parallelized to take advantage of new hardware
  - Parallelism has been explored in the open-source community, and parallel libraries have been developed, e.g., OpenMP and MPI
  - This has ramifications for closed-source vendors (e.g., Microsoft)
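On a cluster like ACCRE, exploiting on-chip parallelism typically means requesting several cores on one node and running a multi-threaded (e.g., OpenMP) program on them. A minimal sketch follows; the binary name, core count, and time limit are illustrative assumptions:

    #!/bin/bash
    #SBATCH --nodes=1                  # all threads share one node's memory
    #SBATCH --ntasks=1                 # a single process...
    #SBATCH --cpus-per-task=8          # ...with eight cores for its threads
    #SBATCH --time=01:00:00

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match the thread count to the allocation
    ./my_openmp_program                            # hypothetical multi-threaded binary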

11 Massively Multi-Core Era
[Timeline chart, 1970-2030: the Vector Era (USA, Japan) gives way around 1985 to the Massively Parallel Era (USA, Japan, Europe), which gives way around 2000 to the Multi-core Era: a new paradigm in computing.]

12 How are GPUs Different from CPUs?
- "CPUs are designed to handle complexity well, while GPUs are designed to handle concurrency well." - Axel Kohlmeyer
- GPUs follow a Single Instruction, Multiple Thread (SIMT) model
[Diagram: a GPU is drawn as a set of multiprocessors, each containing many cores, each running many threads.]

13 GPU-CPU Performance Comparison
[Figures comparing GPU and CPU performance, taken from the CUDA Programming Guide.]

14 Becoming an Advanced ACCRE User
- Spend time thinking about performance
  - Investigate/test tools that enable faster execution times; examples: Intel-compiled software, GPU-enabled software, multi-threaded or multi-process software
  - Don't reinvent the wheel; look for libraries/packages that will let you maximize performance without spending 3-4 months programming
  - Not all problems are well suited for parallelism
- Automate your workflows
  - Explore job arrays for single-core, embarrassingly parallel jobs; SLURM makes this easy to do, and it also puts less stress on the scheduler (see the sketch after the next slide)
  - Avoid the it-works-so-don't-touch-it mentality: while your jobs are running, spend time improving your workflow to make it more efficient and to avoid manual input/processing
  - Write scripts that pass input and output between different jobs
- Collaborate! Contribute examples to the ACCRE GitHub repositories

15 SLURM Job Arrays
[Diagram: one submission fans out into script1 through script5, all running concurrently over time.]
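A minimal sketch of the job array pattern from the diagram; the input file names and program are hypothetical:

    #!/bin/bash
    #SBATCH --array=1-5             # five independent tasks from one submission
    #SBATCH --ntasks=1              # each task is a single process
    #SBATCH --time=00:30:00
    #SBATCH --output=task_%a.out    # %a expands to the array index

    # SLURM sets SLURM_ARRAY_TASK_ID to 1..5, one value per task,
    # standing in for script1 ... script5 in the diagram.
    ./process_chunk input_${SLURM_ARRAY_TASK_ID}.dat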

16 Running IPython Notebooks on the Cluster
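The slide itself carries no text (it likely accompanied a live demo). One common pattern at the time, offered as a sketch under the assumption that IPython is installed on the cluster; the port, hostnames, and time limit are illustrative, and ACCRE's recommended procedure may differ:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=04:00:00

    # Start a notebook server on the compute node, listening on an assumed port
    ipython notebook --no-browser --ip=$(hostname) --port=8888

From a laptop, an SSH tunnel through a gateway (e.g., ssh -L 8888:<nodename>:8888 vunetid@<gateway>) then makes the notebook reachable at localhost:8888 in a browser.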

17 Vanderbilt Course in Parallel Programming and High-Performance Computing
- Offered every spring as part of Vanderbilt's Scientific Computing Minor program
- Covers the following topics:
  - Linux command line
  - C programming
  - Compiling/building HPC software
  - Shared-memory, multi-threaded programming
  - Distributed-memory, multi-process programming
  - Programming NVIDIA GPUs with CUDA
  - Programming Intel Xeon Phi coprocessors
  - Performance benchmarking
- Students gain valuable experience in an HPC environment, with an emphasis on applying these tools to a research problem from their own domain
- Students present results from their capstone projects at the end of the semester

18 Concluding Remarks
- Continue to educate yourself about the resources that are available to you as a university researcher
- As someone performing computational research, always be thinking about ways you can improve performance and efficiency
  - Your research stands to benefit
  - Your career stands to benefit
- Feel free to contact me with any questions you might have
  - will@accre.vanderbilt.edu
  - @W_R_French
Thank you for your attention! Questions?