Introduction to High Performance Computing
Advanced Research Computing
September 9, 2015
Outline
What constitutes high performance computing (HPC)?
When to consider HPC resources
What kind of problems are typically solved?
What are the components of HPC?
What resources are available?
Overview of HPC resources at Virginia Tech
Should I Pursue HPC?
Are local resources insufficient to meet your needs?
  Very large jobs
  Very many jobs
  Large data
Do you have national collaborators?
  Share projects between different entities
  Convenient mechanisms for data sharing
Who Uses HPC?
[Chart: allocations by research domain]
  Physics (91): 19%
  Molecular Biosciences (271): 17%
  Astronomical Sciences (115): 13%
  Atmospheric Sciences (72): 11%
  Materials Research (131): 9%
  Chemical, Thermal Systems (89): 8%
  Chemistry (161): 7%
  Scientific Computing (60): 2%
  Earth Sciences (29): 2%
  Training (51): 2%
More than 2 billion core-hours allocated
1,400 allocations
350 institutions
32 research domains
Learning Curve
Linux: command-line interface
Scheduler: shares resources among multiple users
Parallel computing: need to parallelize code to take advantage of a supercomputer's resources
  Third-party programs or libraries make this easier
Popular Software Packages
Molecular dynamics: Gromacs, LAMMPS
CFD: OpenFOAM, Ansys
Finite elements: deal.II, Abaqus
Chemistry: VASP, Gaussian
Climate: CESM
Bioinformatics: Mothur, QIIME, mpiBLAST
Numerical computing/statistics: R, MATLAB
Visualization: ParaView, VisIt, EnSight
What is Parallel Computing?
Parallel Computing 101
Parallel computing: the use of multiple processors or computers working together on a common task
  Each processor works on its section of the problem
  Processors can exchange information
[Figure: a 2-D problem grid divided among four CPUs, each working on its own area and exchanging boundary data with its neighbors]
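To make the picture concrete, here is a minimal sketch (not from the slides) of the same idea in C with MPI: each process works on its own section of an index range, then the partial results are combined. The problem size N and the simple summation workload are placeholders chosen only for illustration.

```c
/* Minimal sketch: split work across MPI processes, then combine results.
 * Compile with an MPI wrapper compiler, e.g. mpicc. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000   /* total number of points (arbitrary example size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns a contiguous section of the index range [0, N). */
    int chunk = (N + nprocs - 1) / nprocs;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local = 0.0;
    for (int i = lo; i < hi; i++)       /* work only on this rank's section */
        local += 1.0 / (i + 1.0);

    /* Exchange information: combine every rank's partial sum on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d points with %d processes = %f\n", N, nprocs, total);

    MPI_Finalize();
    return 0;
}
```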
Why Do Parallel Computing?
Limits of single-CPU computing:
  Performance
  Available memory
  I/O rates
Parallel computing allows one to:
  Solve problems that don't fit on a single CPU
  Solve problems that can't be solved in a reasonable time
We can solve larger problems, solve them faster, and run more cases
Parallelism is the New Moore's Law
Power and energy efficiency impose a key constraint on the design of micro-architectures
Clock speeds have plateaued
Hardware parallelism is increasing rapidly to make up the difference
What does a modern supercomputer look like?
Essential Components of HPC
Supercomputing resources
Storage
Visualization
Data management
Network infrastructure
Support
Terminology
Core: a computational unit
Socket: a single CPU ("processor"); includes roughly 4-15 cores
Node: a single computer; includes roughly 2-8 sockets
Cluster: a single supercomputer consisting of many nodes
GPU: graphics processing unit, attached to some nodes; general-purpose GPUs (GPGPUs) can be used to speed up certain kinds of codes
Xeon Phi: Intel's product name for its GPU competitor; also called MIC
Shared vs. Distributed Memory
Shared memory: all processors have access to a pool of shared memory
  Access times vary from CPU to CPU in NUMA systems
  Example: SGI UV, CPUs on the same node
Distributed memory: memory is local to each processor; data is exchanged by message passing over a network (see the sketch below)
  Example: clusters with single-socket blades
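A minimal sketch (assumed, not taken from the slides) of distributed-memory message passing in C with MPI: each rank has its own private memory, so a value computed on rank 0 is visible to rank 1 only after an explicit send and receive over the network. Run with at least two processes, e.g. mpirun -np 2 ./a.out.

```c
/* Minimal sketch of explicit message passing between distributed memories. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double value = 0.0;
    if (rank == 0) {
        value = 3.14159;   /* exists only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Rank 1 cannot read rank 0's memory; it must receive a message. */
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %f from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```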
Multi-core Systems
Current processors place multiple processor cores on a die
Communication details are increasingly complex:
  Cache access
  Main memory access
  QuickPath / HyperTransport socket connections
  Node-to-node connection via network
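For shared memory within one multi-core node, here is a minimal OpenMP sketch (assumed, not from the slides): the loop iterations are split across a node's cores, and all threads operate on the same arrays in shared memory. Compile with an OpenMP flag such as gcc -fopenmp.

```c
/* Minimal sketch: shared-memory parallelism across the cores of one node. */
#include <omp.h>
#include <stdio.h>

#define N 1000000   /* arbitrary example size */

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Iterations are divided among threads; every thread reads the same
     * arrays in the node's shared memory, and the partial sums are reduced. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %g (max threads = %d)\n", sum, omp_get_max_threads());
    return 0;
}
```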
Accelerator-based Systems
Calculations are made in both CPUs and GPUs
No longer limited to single-precision calculations
Load balancing is critical for performance
Requires specific libraries and compilers (CUDA, OpenCL)
Co-processor from Intel: MIC (Many Integrated Core)
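As one illustration (a sketch, not from the slides), the directive-based OpenACC model mentioned on the next slide keeps the source in plain C while offloading a loop to the GPU; CUDA and OpenCL express the same idea with explicitly written kernels. The array size and the pgcc -acc compile command are assumptions.

```c
/* Minimal sketch: offload a loop to an accelerator with OpenACC directives.
 * Compile with an OpenACC-capable compiler, e.g. pgcc -acc. */
#include <stdio.h>

#define N 1000000   /* arbitrary example size */

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* x is copied to the accelerator's memory, the loop runs on the GPU,
     * and y is copied back; these host/device transfers are a big part of
     * why load balancing between CPU and GPU matters. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```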
HPC Trends
Architecture | Single core | Multicore | GPU | Cluster
Code | Serial | OpenMP, Pthreads | CUDA, OpenACC | MPI
How are accelerators different?
                 | Intel Xeon E5-2670 (CPU) | Intel Xeon Phi 5110P (MIC) | Nvidia Tesla K20X (GPU)
Cores            | 8                        | 60                         | 14 SMX
Logical cores    | 16                       | 240                        | 2,688 CUDA cores
Frequency        | 2.60 GHz                 | 1.05 GHz                   | 0.74 GHz
GFLOPs (double)  | 333                      | 1,010                      | 1,317
Memory           | 64 GB                    | 8 GB                       | 6 GB
Memory bandwidth | 51.2 GB/s                | 320 GB/s                   | 250 GB/s
Batch Submission Process
ssh to a login node, then submit with qsub job
The scheduler places the job in a queue; the master node assigns it to compute nodes (C1, C2, C3, ...)
The job then runs on the compute nodes, e.g. mpirun -np # ./a.out
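Below is a hypothetical qsub job script illustrating this flow; the queue name, module name, and resource requests are placeholders, and the right values depend on the ARC system you are using (check its documentation). Submit it with qsub job.sh and check its status with qstat.

```bash
#!/bin/bash
# Hypothetical example job script; queue, module, and resource values are
# placeholders, not ARC-specific settings.
#PBS -l nodes=2:ppn=16        # request 2 nodes, 16 cores per node
#PBS -l walltime=01:00:00     # 1 hour wall-clock limit
#PBS -q normal_q              # queue name (system-specific)
#PBS -N my_mpi_job            # job name

cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
module load mpi               # load an MPI environment (module name varies)

mpirun -np 32 ./a.out         # run 32 MPI processes across the allocated nodes
```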
ARC Overview
Advanced Research Computing
Unit within the Office of the Vice President for Information Technology
Provides centralized resources for:
  Research computing
  Visualization
Staff to assist users
Website: http://www.arc.vt.edu
Goals
Advance the use of computing and visualization in VT research
Centralize resource acquisition, maintenance, and support for the research community
Provide support to facilitate usage of resources and minimize barriers to entry
Enable and participate in research collaborations between departments
Personnel
Associate VP for Research Computing: Terry Herdman
Director, HPC: Vijay Agarwala
Director, Visualization: Nicholas Polys
Computational Scientists:
  Justin Krometis
  James McClure
  Brian Marshall
  Srinivas Yarlanki
  Srijith Rajamohan
Personnel (Continued)
System Administrators:
  Tim Rhodes
  Chris Snapp
  Brandon Sawyers
Business Manager: Alana Romanella
User Support GRAs: Umar Kalim, Saeed Izadi, Sangeetha Srinivasa
Compute Resources
System     | Usage                       | Nodes | Node Description                       | Special Features
Ithaca     | Beginners, MATLAB           | 79    | 8 cores, 24 GB (2 Intel Nehalem)       | 10 double-memory nodes
HokieOne   | Shared, Large Memory        | 82    | 6 cores, 32 GB (Intel Westmere)        | 2.6 TB shared memory
HokieSpeed | GPGPU                       | 201   | 12 cores, 24 GB (2 Intel Westmere)     | 402 Tesla C2050 GPU
BlueRidge  | Large-scale CPU, MIC        | 408   | 16 cores, 64 GB (2 Intel Sandy Bridge) | 260 Intel Xeon Phi; 4 K40 GPU; 18 128 GB nodes
NewRiver   | Large-scale, Data Intensive | 134   | 24 cores, 128 GB (2 Intel Haswell)     | 8 K80 GPGPU; 16 big data nodes; 24 512 GB nodes; 2 3 TB nodes
Computational Resources
Name                        | NewRiver                     | BlueRidge                            | HokieSpeed           | HokieOne      | Ithaca
Key Features, Uses          | Scalable CPU, Data Intensive | Scalable CPU or MIC                  | GPU                  | Shared Memory | Beginners, MATLAB
Available                   | August 2015                  | March 2013                           | Sept 2012            | Apr 2012      | Fall 2009
Theoretical Peak (TFlops/s) | 152.6                        | 398.7                                | 238.2                | 5.4           | 6.1
Nodes                       | 134                          | 408                                  | 201                  | N/A           | 79
Cores                       | 3,288                        | 6,528                                | 2,412                | 492           | 632
Cores/Node                  | 24                           | 16                                   | 12                   | N/A*          | 8
Accelerators/Coprocessors   | 8 Nvidia K80 GPU             | 260 Intel Xeon Phi, 8 Nvidia K40 GPU | 408 Nvidia C2050 GPU | N/A           | N/A
Memory Size                 | 34.4 TB                      | 27.3 TB                              | 5.0 TB               | 2.62 TB       | 2 TB
Memory/Core                 | 5.3 GB*                      | 4 GB*                                | 2 GB                 | 5.3 GB        | 3 GB*
Memory/Node                 | 128 GB*                      | 64 GB*                               | 24 GB                | N/A*          | 24 GB*
Visualization Resources
VisCube: 3D immersion environment with three 10′ by 10′ walls and a floor of 1920 × 1920 stereo projection screens
DeepSix: six tiled monitors with a combined resolution of 7680 × 3200
ROVR Stereo Wall
AISB Stereo Wall
Getting Started with ARC
Review ARC's system specifications and choose the right system(s) for you
  Specialty software
Apply for an account online at the Advanced Research Computing website
When your account is ready, you will receive confirmation from ARC's system administrators
Resources
ARC website: http://www.arc.vt.edu
ARC compute resources & documentation: http://www.arc.vt.edu/hpc
New Users Guide: http://www.arc.vt.edu/newusers
Frequently Asked Questions: http://www.arc.vt.edu/faq
Linux introduction: http://www.arc.vt.edu/unix
Thank you. Questions?