Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1





What is HPC?
- HPC = High Performance Computing (includes supercomputing)
- HPCC = High Performance Cluster Computing (note: these are NOT High Availability clusters)
- HPTC = High Performance Technical Computing
The ultimate aim of HPC users is to max out the CPUs!

Agenda
- Parallel Computing Concepts
- Clusters
- Cluster Usage

Concurrency and Parallel Computing
A central concept in computer science is concurrency: computing in which multiple tasks are active at the same time. There are many ways to use concurrency:
- Concurrency is key to all modern operating systems as a way to hide latencies.
- Concurrency can be used together with redundancy to provide high availability.
- Parallel computing uses concurrency to decrease program runtimes.
HPC systems are based on parallel computing.
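The latency-hiding use of concurrency mentioned above can be sketched in a few lines of Python (an illustrative example, not from the slides): several high-latency waits are overlapped with threads, so the total elapsed time is close to one wait rather than their sum.

```python
# Sketch of concurrency hiding latency: four 0.1 s waits run overlapped
# in threads, so total elapsed time is ~0.1 s rather than ~0.4 s.
import threading
import time

results = []
lock = threading.Lock()

def slow_task(i):
    time.sleep(0.1)              # stand-in for a high-latency operation (I/O, network)
    with lock:                   # serialise access to the shared list
        results.append(i * i)

start = time.perf_counter()
threads = [threading.Thread(target=slow_task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start   # close to 0.1 s, not 0.4 s
```

Parallel computing applies the same idea to computation rather than waiting: independent work is spread over cores or nodes to shorten the runtime.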

Hardware for Parallel Computing
Parallel computers are classified in terms of streams of instructions and streams of data:
- MIMD computers: multiple streams of instructions acting on multiple streams of data.
- SIMD computers: a single stream of instructions acting on multiple streams of data.
Parallel hardware comes in many forms:
- On chip: instruction-level parallelism (e.g. IPF)
- Multicore: multiple execution cores inside a single CPU
- Multiprocessor: multiple processors inside a single computer
- Multicomputer: networks of computers working together

Hardware for Parallel Computing
Parallel Computers
- Single Instruction Multiple Data (SIMD)
- Multiple Instruction Multiple Data (MIMD)
  - Shared Address Space
    - Symmetric Multiprocessor (SMP)
    - Non-uniform Memory Architecture (NUMA)
  - Disjoint Address Space
    - Massively Parallel Processor (MPP)
    - Cluster
    - Distributed Computing

What is an HPC Cluster?
A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. A typical cluster uses:
- Commodity off-the-shelf (COTS) parts
- Low-latency communication protocols between the disjoint address spaces (memory)

What is HPCC?
A typical cluster comprises:
- Master node
- File server / gateway
- Compute nodes
- Cluster management tools

Cluster Architecture View (layers, top to bottom)
- Application: parallel benchmarks (Perf, Ring, HINT, NAS); real applications
- Middleware: shmem, MPI, PVM
- OS: Linux, other OSes
- Protocol: TCP/IP, VIA, proprietary
- Interconnect: Ethernet, Quadrics, InfiniBand, Myrinet
- Hardware: desktop, workstation, 1P/2P server, 4U+ server

Cluster Hardware: The Node
A node is a single element within the cluster.
- Compute node: just computes, little else; private IP address, no user access
- Master/head/front-end node: user login, job scheduler; public IP address, connects to the external network
- Management/administrator node: systems/cluster management functions; secure administrator address
- I/O node: access to data; generally internal to the cluster or to the data centre

Interconnect

Interconnect          Typical Latency (usec)   Typical Bandwidth (MB/s)
100 Mbps Ethernet     75                       80
1 Gbit/s Ethernet     60-90                    90
10 Gb/s Ethernet      12-20                    800
Myricom Myrinet*      2.2-3                    2500
InfiniBand*           2-4                      1400-2500
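The table above can be read through a simple linear cost model (a common first-order approximation, not from the slides): transfer time = latency + message size / bandwidth. It shows why latency dominates for small messages and bandwidth for large ones.

```python
# First-order interconnect cost model: time = latency + size / bandwidth.
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_s):
    """Estimated transfer time in microseconds for one message."""
    return latency_us + (message_bytes / (bandwidth_mb_s * 1e6)) * 1e6

# Compare an 8 KB message using figures from the table above
# (75 us is a mid-range guess for 1 Gbit/s Ethernet's 60-90 us).
gige = transfer_time_us(8192, latency_us=75.0, bandwidth_mb_s=90.0)
ib = transfer_time_us(8192, latency_us=3.0, bandwidth_mb_s=1400.0)
# InfiniBand wins on both terms: lower latency and higher bandwidth.
```

For tightly coupled parallel jobs that exchange many small messages, the latency term dominates, which is why low-latency interconnects matter so much in the usage models discussed later.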

Agenda
- Parallel Computing Concepts
- Clusters
- Cluster Usage

Cluster Usage
- Performance Measurements
- Usage Model
- Application Classification
- Application Behaviour

The Mysterious FLOPS
1 GFLOPS = 1 billion floating point operations per second.
Theoretical vs real GFLOPS, for a Xeon processor:
- Theoretical peak = 4 x clock speed: Xeons have 128-bit SSE registers, which allow the processor to carry out 2 double-precision floating-point add and 2 multiply operations per clock cycle
- 2 computational cores per processor, 2 processors per node (4 cores per node)
- Sustained (Rmax) = ~35-80% of theoretical peak (interconnect dependent)
You'll NEVER hit peak!
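The peak arithmetic above can be written out explicitly (a sketch using the slide's figures: 4 FLOPs/cycle from the SSE add+multiply rate, 2 cores per CPU, 2 CPUs per node):

```python
# Theoretical peak per node = clock x FLOPs/cycle x cores/CPU x CPUs/node,
# using the Xeon figures quoted in the slide above.
def peak_gflops(clock_ghz, flops_per_cycle=4, cores_per_cpu=2, cpus_per_node=2):
    """Theoretical peak GFLOPS for one node."""
    return clock_ghz * flops_per_cycle * cores_per_cpu * cpus_per_node

node_peak = peak_gflops(3.0)     # 3 GHz, dual-core, dual-socket -> 48 GFLOPS
# Sustained (Rmax) band from the slide: ~35-80% of peak, interconnect dependent
sustained_range = (0.35 * node_peak, 0.80 * node_peak)
```

Linpack's measured Rmax will fall somewhere in that sustained band; real applications often achieve far less.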

Other Measures of CPU Performance
SPEC:
- SPEC CPU2000/2006 Base: single-core performance indicator
- SPEC CPU2000/2006 Rate: node performance indicator
- SPECfp: floating-point performance
- SPECint: integer performance
Many other performance metrics may be required:
- STREAM: memory bandwidth
- HPL: High Performance Linpack
- NPB: suite of performance tests
- Pallas Parallel Benchmark: another suite
- IOzone: file system throughput

Technology Advancements in 5 Years

Codename    Release date   GHz   Cores   Peak FLOPs per cycle   Peak GFLOPS per CPU   Linpack on 256 processors
Westmere    Nov 2009       3.0   6       4                      72                    14500
Woodcrest   June 2006      3.0   2       4                      24                    4781

* From November 2001 top500 supercomputer list (cluster of Dell Precision 530)
** Intel internal cluster built in 2006

Usage Model
Capacity computing (many serial jobs): electronic design, Monte Carlo, design optimisation, parallel search
- Many users; mixed-size parallel/serial jobs
- Ability to partition and allocate jobs to nodes for best performance
- Job scheduling very important
Capability computing (one big parallel job): meteorology, seismic analysis, fluid dynamics, molecular chemistry
- Batch usage
- Load balancing more important
- Interconnect more important
In practice, usage ranges from normal mixed usage to appliance usage.

Application and Usage Model
HPC clusters run parallel applications, and applications in parallel! A parallel application is one single application that takes advantage of multiple computing platforms.
- Fine-grained application: uses many systems to run one application; shares data heavily across systems. Example: PDVR3D (eigenvalues and eigenstates of a matrix)
- Coarse-grained application: uses many systems to run one application; infrequent data sharing among systems. Example: Casino (Monte Carlo stochastic methods)
- Pleasingly parallel/HTC application: an instance of the entire application runs on each node; little or no data sharing among compute nodes. Example: BLAST (pattern matching)
A shared-memory machine will run all sorts of applications.

Types of Applications
- Forward modelling
- Inversion
- Signal processing
- Searching/comparing

Forward Modelling
- Solving linear equations
- Grid based; finite element/finite difference
- Parallelization by domain decomposition (split and distribute the data)
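Domain decomposition can be sketched in a few lines (illustrative only; a real code would exchange halo cells between neighbouring ranks via MPI): the grid is partitioned into contiguous chunks, and each process applies the finite-difference stencil to its own chunk.

```python
# Sketch of domain decomposition for a 1-D finite-difference grid.
def split_domain(n, nprocs):
    """Partition n grid points into nprocs near-equal contiguous chunks,
    returned as (start, end) index pairs; this is the 'split and
    distribute the data' step from the slide."""
    base, extra = divmod(n, nprocs)
    bounds, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def laplacian(u):
    """Second-difference stencil with fixed (Dirichlet) boundaries; each
    interior point needs only its two neighbours, so chunks only need a
    one-cell halo from each neighbouring chunk."""
    return [0.0] + [u[i-1] - 2*u[i] + u[i+1] for i in range(1, len(u)-1)] + [0.0]
```

Because each point depends only on immediate neighbours, communication is limited to chunk boundaries, which is why these codes scale well on clusters.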

Inversion
From measurements (F), compute models (M) representing properties (d) of the measured object(s).
- Deterministic: matrix inversions, conjugate gradient
- Stochastic: Monte Carlo, Markov chain, genetic algorithms
- Generally large amounts of shared memory
- Parallelism through multiple runs with different models
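A toy stochastic inversion illustrates the "multiple runs with different models" parallelism (everything here, including the forward operator, is a made-up illustration): draw candidate models at random, run the forward model on each, and keep the one whose prediction best fits the measurement. Each trial is independent, so trials can be spread across nodes.

```python
# Toy Monte Carlo inversion: random search over models, keeping the one
# whose forward-modelled data best matches the measurement.
import random

def forward(model):
    """Hypothetical forward operator: predicts a datum from one model
    parameter (a real code would run a full simulation here)."""
    return 3.0 * model + 1.0

def monte_carlo_invert(measured, trials=10000, seed=42):
    rng = random.Random(seed)
    best_model, best_misfit = None, float("inf")
    for _ in range(trials):
        m = rng.uniform(-10.0, 10.0)          # each trial is independent,
        misfit = abs(forward(m) - measured)   # so trials parallelise trivially
        if misfit < best_misfit:
            best_model, best_misfit = m, misfit
    return best_model

best = monte_carlo_invert(7.0)   # true answer: forward(2.0) == 7.0
```

The deterministic methods on the slide (matrix inversion, conjugate gradient) instead parallelise inside a single run, typically over the matrix operations.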

Signal Processing / Quantum Mechanics
- Convolution model (stencil)
- Matrix computations (eigenvalues, ...)
- Conjugate gradient methods (matrix methods)
- Normally not very demanding on latency and bandwidth
- Some algorithms are embarrassingly parallel
Examples: seismic migration/processing, medical imaging, SETI@Home
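The convolution model mentioned above is easy to show directly (a minimal textbook version, not from the slides): each output sample depends only on a local window of the input, which is why such workloads split cleanly across nodes with little communication.

```python
# Direct 1-D discrete convolution ("full" mode): each output element is a
# sum over a local window, so the signal can be chunked across nodes.
def convolve(signal, kernel):
    n, k = len(signal), len(kernel)
    out = [0.0] * (n + k - 1)
    for i in range(n):
        for j in range(k):
            out[i + j] += signal[i] * kernel[j]
    return out

result = convolve([1.0, 2.0, 3.0], [1.0, 1.0])   # sliding two-point sum
```

Production signal-processing codes use FFT-based convolution for long kernels, but the data-locality argument, and hence the low latency/bandwidth demand noted on the slide, is the same.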

Searching/Comparing
- Integer operations more dominant than floating point
- I/O intensive
- Pattern matching
- Embarrassingly parallel: very suitable for grid computing
Example uses: encryption/decryption, message interception, bioinformatics, data mining
Example codes: BLAST, HMMER

Application Classes
- FEA (Finite Element Analysis): the simulation of hard physical materials, e.g. metal, plastic. Crash test, product design, suitability for purpose. Examples: MSC Nastran, Ansys, LS-DYNA, Abaqus, ESI PAM-CRASH, Radioss
- CFD (Computational Fluid Dynamics): the simulation of soft physical materials, gases and fluids. Engine design, airflow, oil reservoir modelling. Examples: Fluent, STAR-CD, CFX
- Geophysical sciences: seismic imaging (taking echo traces and building a picture of the sub-earth geology); reservoir simulation (CFD specific to oil asset management). Examples: Omega, Landmark VIP and Pro/Max, GeoQuest Eclipse

Application Classes (continued)
- Life sciences: understanding the living world: genome matching, protein folding, drug design, bio-informatics, organic chemistry. Examples: BLAST, Gaussian, others
- High energy physics: understanding the atomic and sub-atomic world. Software from Fermilab or CERN, or home-grown
- Financial modelling: meeting internal and external financial targets, particularly regarding investment positions. VaR (Value at Risk): assessing the impact of economic and political factors on the bank's investment portfolio. Trader risk analysis: what is the risk on a trader's position, or a group of traders?