Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Bertina Short
10 years ago

1 Introduction to High Performance Cluster Computing: Cluster Training for UCL, Part 1

2 What is HPC? HPC = High Performance Computing (includes supercomputing). HPCC = High Performance Cluster Computing (note: these are NOT High Availability clusters). HPTC = High Performance Technical Computing. The ultimate aim of HPC users is to max out the CPUs!

3 Agenda: Parallel Computing Concepts; Clusters; Cluster Usage

4 Concurrency and Parallel Computing. A central concept in computer science is concurrency: computing in which multiple tasks are active at the same time. There are many ways to use concurrency. Concurrency is key to all modern operating systems as a way to hide latencies. Concurrency can be used together with redundancy to provide high availability. Parallel computing uses concurrency to decrease program runtimes. HPC systems are based on parallel computing.
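As a rough illustration of using concurrency to hide latencies, the sketch below overlaps several waiting tasks with a thread pool. The `fetch` task and its 50 ms delay are hypothetical stand-ins for a slow device or network call; nothing here comes from the course material itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(task_id, delay=0.05):
    """Hypothetical task that mostly waits (the wait is simulated with sleep)."""
    time.sleep(delay)
    return task_id * task_id


def run_serial(n):
    """Run the tasks one after another: the waits add up."""
    return [fetch(i) for i in range(n)]


def run_concurrent(n, workers=4):
    """Keep several tasks active at once so their waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, range(n)))


if __name__ == "__main__":
    for runner in (run_serial, run_concurrent):
        start = time.perf_counter()
        results = runner(4)
        elapsed = time.perf_counter() - start
        print(runner.__name__, results, f"{elapsed:.2f} s")
```

Threads overlap the waiting here; in standard CPython they do not speed up CPU-bound work, which is where the separate processes and separate nodes of parallel computing come in.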

5 Hardware for Parallel Computing. Parallel computers are classified in terms of streams of data and streams of instructions. MIMD computers: multiple streams of instructions acting on multiple streams of data. SIMD computers: a single stream of instructions acting on multiple streams of data. Parallel hardware comes in many forms. On chip: instruction level parallelism (e.g. IPF). Multicore: multiple execution cores inside a single CPU. Multiprocessor: multiple processors inside a single computer. Multicomputer: networks of computers working together.

6 Hardware for Parallel Computing
Parallel Computers
  Single Instruction Multiple Data (SIMD)*
  Multiple Instruction Multiple Data (MIMD)
    Shared Address Space
      Symmetric Multiprocessor (SMP)
      Non-uniform Memory Architecture (NUMA)
    Disjoint Address Space
      Massively Parallel Processor (MPP)
      Cluster
      Distributed Computing

7 What is an HPC Cluster? A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. A typical cluster uses: commodity off-the-shelf (COTS) parts; low-latency communication protocols between the disjoint address spaces (memory).

8 What is HPCC? Master node; file server / gateway; compute nodes; cluster management tools.

9 Cluster Architecture View. Application: parallel benchmarks (Perf, Ring, HINT, NAS), real applications. Middleware: shmem, MPI, PVM. OS: Linux, other OSes. Protocol: TCP/IP, VIA, proprietary. Interconnect: Ethernet, Quadrics, InfiniBand, Myrinet. Hardware: desktop, workstation, server 1P/2P, server 4U+.

10 Cluster Hardware: The Node. A single element within the cluster. Compute node: just computes, little else; private IP address, no user access. Master/head/front-end node: user login; job scheduler; public IP address, connects to external network. Management/administrator node: systems/cluster management functions; secure administrator address. I/O node: access to data; generally internal to cluster or to data centre.

11 Interconnect
Interconnect          Typical latency (usec)   Typical bandwidth (MB/s)
100 Mbps Ethernet     75                       80
1 Gbit/s Ethernet     60-90                    90
10 Gb/s Ethernet      12-20
Myricom Myrinet*
InfiniBand*
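To see why both latency and bandwidth matter, a common first-order estimate of message time is latency plus size divided by bandwidth. The sketch below applies that model to the 1 Gbit/s Ethernet figures quoted above (60-90 usec, 90 MB/s); the 75 usec midpoint, the message sizes, and the function name are assumptions made for illustration, and the model ignores contention and protocol overhead.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way message time in microseconds.

    Simple model: time = latency + size / bandwidth.
    With 1 MB taken as 10**6 bytes, 1 MB/s equals 1 byte per microsecond,
    so bytes divided by MB/s gives microseconds directly.
    """
    return latency_us + message_bytes / bandwidth_mb_per_s


if __name__ == "__main__":
    latency_us = 75.0   # assumed midpoint of the quoted 60-90 usec range
    bandwidth = 90.0    # MB/s, as quoted for 1 Gbit/s Ethernet
    for size in (8, 1_000, 1_000_000):
        total = transfer_time_us(size, latency_us, bandwidth)
        share = latency_us / total
        print(f"{size:>9} bytes: {total:10.1f} usec, latency share {share:.0%}")
```

Under this model small messages are dominated by latency and large ones by bandwidth, which is one reason applications that exchange many small messages tend to favour low-latency interconnects.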

12 Agenda: Parallel Computing Concepts; Clusters; Cluster Usage

13 Cluster Usage: Performance Measurements; Usage Model; Application Classification; Application Behaviour

14 The Mysterious FLOPS. 1 GFlops = 1 billion floating point operations per second. Theoretical v real GFlops on a Xeon processor: theoretical peak = 4 x clock speed. Xeons have 128-bit SSE registers, which allow the processor to carry out 2 double precision floating point add and 2 multiply operations per clock cycle. 2 computational cores per processor; 2 processors per node (4 cores per node). Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent). You'll NEVER hit peak!
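The arithmetic on this slide can be worked through in a few lines. This is a sketch under stated assumptions: the 3.0 GHz clock is an invented example value, and the 4 operations per cycle are read as a per-core figure, which the slide does not spell out.

```python
def peak_gflops(clock_ghz, flops_per_cycle=4, cores_per_cpu=2, cpus_per_node=2):
    """Theoretical peak of one node in GFlops.

    flops_per_cycle=4 reflects 2 adds + 2 multiplies per clock cycle
    (assumed here to be per core); core and CPU counts follow the slide.
    """
    return clock_ghz * flops_per_cycle * cores_per_cpu * cpus_per_node


def sustained_range(peak, low=0.35, high=0.80):
    """Expected sustained (Rmax) range, using the ~35-80% rule of thumb."""
    return peak * low, peak * high


if __name__ == "__main__":
    peak = peak_gflops(3.0)  # 3.0 GHz is an assumed example clock speed
    low, high = sustained_range(peak)
    print(f"peak per node: {peak:.1f} GFlops")
    print(f"sustained estimate: {low:.1f} to {high:.1f} GFlops")
```

For the assumed 3.0 GHz node this gives a peak of 48 GFlops, with a sustained estimate of roughly 17 to 38 GFlops.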

15 Other measures of CPU performance. SPEC: SPEC CPU2000/2006 Base (single core performance indicator); SPEC CPU2000/2006 Rate (node performance indicator); SPECfp (floating point performance); SPECint (integer performance). Many other performance metrics may be required: STREAM (memory bandwidth); HPL (High Performance Linpack); NPB (suite of performance tests); Pallas Parallel Benchmark (another suite); IOzone (file system throughput).

16 Technology Advancements in 5 Years. Table columns: Codename; Release date; GHz; Number of cores; Peak FLOP per CPU cycle; Peak GFLOPS per CPU; Linpack on 256 Processors. Rows: Westmere (Nov); Woodcrest (June). * From November 2001 top500 supercomputer list (cluster of Dell Precision 530). ** Intel internal cluster built in 2006.

17 Usage Model. Many Serial Jobs (Capacity): Electronic Design, Monte Carlo, Design Optimisation, Parallel Search. Many Users: mixed size parallel/serial jobs; ability to partition and allocate jobs to nodes for best performance. One Big Parallel Job (Capability): Meteorology, Seismic Analysis, Fluid Dynamics, Molecular Chemistry. Usage spectrum: Batch Usage, Normal Mixed Usage, Appliance Usage. Load balancing more important; job scheduling very important; interconnect more important.

18 Application and Usage Model. HPC clusters run parallel applications, and applications in parallel! One single application that takes advantage of multiple computing platforms. Fine-grained application: uses many systems to run one application; shares data heavily across systems; example: PDVR3D (eigenvalues and eigenstates of a matrix). Coarse-grained application: uses many systems to run one application; infrequent data sharing among systems; example: Casino (Monte Carlo stochastic methods). Pleasurably parallel/HTC application: an instance of the entire application runs on each node; little or no data sharing among compute nodes; example: BLAST (pattern matching). A shared memory machine will run all sorts of applications.
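The pleasurably parallel pattern can be sketched with independent Monte Carlo runs that share nothing until their results are combined at the end. The pi estimate, the seeds, and the function names below are illustrative assumptions and are not taken from Casino or BLAST.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def count_hits(seed, samples):
    """One independent run: count random points that fall inside the unit circle."""
    rng = random.Random(seed)  # each run owns its own generator, no shared state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seeds, samples, map_fn=map):
    """Combine the independent runs; map_fn decides where the runs execute."""
    run = partial(count_hits, samples=samples)
    total_hits = sum(map_fn(run, seeds))
    return 4.0 * total_hits / (len(seeds) * samples)


if __name__ == "__main__":
    seeds = list(range(8))
    # Same runs, spread over local processes; on a cluster each run
    # could instead be submitted as its own batch job.
    with ProcessPoolExecutor() as pool:
        print(estimate_pi(seeds, 100_000, map_fn=pool.map))
```

Because the runs never communicate, the only data crossing between workers is one integer per run, so the interconnect hardly matters for this kind of workload.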

19 Types of Applications: Forward Modelling; Inversion; Signal Processing; Searching/Comparing

20 Forward Modelling. Solving linear equations. Grid based. Parallelization by domain decomposition (split and distribute the data). Finite element/finite difference.
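Domain decomposition can be sketched on a 1D finite-difference grid: split the grid into chunks, give each chunk a copy of its neighbours' edge cells (halo cells), update every chunk independently, and keep only the cells each chunk owns. The diffusion update, the function names, and the serial loop over chunks are assumptions for illustration; on a real cluster each chunk would live on a different node and the halos would be exchanged over the interconnect.

```python
def diffuse_step(u, alpha=0.25):
    """One explicit finite-difference step; the two end points are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def split_with_halo(u, parts):
    """Split u into contiguous chunks, each padded with one halo cell per interior side.

    Assumes parts <= len(u). Returns (lo, hi, halo_lo, local_data) per chunk,
    where [lo, hi) is the range of cells the chunk owns.
    """
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    chunks = []
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        halo_lo = max(lo - 1, 0)
        halo_hi = min(hi + 1, n)
        chunks.append((lo, hi, halo_lo, u[halo_lo:halo_hi]))
    return chunks


def decomposed_step(u, parts, alpha=0.25):
    """Same update as diffuse_step, computed chunk by chunk."""
    result = list(u)
    for lo, hi, halo_lo, local in split_with_halo(u, parts):
        updated = diffuse_step(local, alpha)
        # Discard the halo cells and keep only the cells this chunk owns.
        result[lo:hi] = updated[lo - halo_lo:hi - halo_lo]
    return result


if __name__ == "__main__":
    grid = [0.0] * 4 + [1.0] + [0.0] * 5
    print(decomposed_step(grid, 3) == diffuse_step(grid))
```

The halo cells are the data that has to be communicated at every step, which is why grid-based codes of this kind are sensitive to interconnect latency and bandwidth.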

21 Inversion. From measurements (F) compute models (M) representing properties (d) of the measured object(s). Deterministic: matrix inversions, conjugate gradient. Stochastic: Monte Carlo, Markov chain, genetic algorithms. Generally large amounts of shared memory. Parallelism through multiple runs with different models.
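As an example of the deterministic route, the following is a minimal conjugate gradient sketch for solving A x = b. It assumes A is symmetric positive definite and small enough to hold as a list of lists; the function names and the 2 x 2 test system are illustrative only, and production codes would rely on an optimised numerical library instead.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)              # residual b - A x, which equals b when x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(A, p)
        step = rs_old / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: current residual plus a multiple of the old direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

Each iteration is dominated by one matrix-vector product, which is the part that is typically distributed when the matrix is too large for a single node.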

22 Signal Processing/Quantum Mechanics. Convolution model (stencil). Matrix computations (eigenvalues). Conjugate gradient methods (matrix methods). Normally not very demanding on latency and bandwidth. Some algorithms are embarrassingly parallel. Examples: seismic migration/processing, medical imaging, SETI@Home.

23 Searching/Comparing. Integer operations are more dominant than floating point. IO intensive. Pattern matching. Embarrassingly parallel, very suitable for grid computing. Example uses: encryption/decryption, message interception, bioinformatics, data mining. Example codes: BLAST, HMMER.

24 Application Classes. FEA (Finite Element Analysis): the simulation of hard physical materials, e.g. metal, plastic; crash test, product design, suitability for purpose; examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAMCrash, Radioss. CFD (Computational Fluid Dynamics): the simulation of soft physical materials, gases and fluids; engine design, airflow, oil reservoir modelling; examples: Fluent, Star-CD, CFX. Geophysical Sciences: seismic imaging (taking echo traces and building a picture of the sub-earth geology); reservoir simulation (CFD specific to oil asset management); examples: Omega, Landmark VIP and Pro/Max, Geoquest Eclipse.

25 Application Classes. Life Sciences: understanding the living world; genome matching, protein folding, drug design, bio-informatics, organic chemistry; examples: BLAST, Gaussian, others. High Energy Physics: understanding the atomic and sub-atomic world; software from Fermilab or CERN, or home-grown. Financial Modelling: meeting internal and external financial targets, particularly regarding investment positions; VaR (Value at Risk): assessing the impact of economic and political factors on the bank's investment portfolio; Trader Risk Analysis: what is the risk on a trader's position, or on a group of traders.
