High Performance Computing. Course Notes 2007-2008. HPC Fundamentals




Introduction

What is High Performance Computing (HPC)? It is difficult to define precisely: it is a moving target. In the late 1980s, a supercomputer performed about 100 megaFLOPS; today, a 2 GHz desktop or laptop performs a few gigaFLOPS, while a supercomputer performs tens of teraFLOPS (see the Top500 list). "High performance" thus means roughly O(1000) times more powerful than the latest desktops. Most supercomputers are obsolete in performance terms before the end of their physical life.

Applications of HPC

HPC is driven by the demands of computation-intensive applications from many areas:
- Medicine, biology and neuroscience (e.g. simulation of brains)
- Finance (e.g. modelling the world economy)
- Military and defence (e.g. modelling the explosion of nuclear weapons)
- Engineering (e.g. simulating a car crash or a new airplane design)

An Example of Demands on Computing Capability

Project: Blue Brain. Aim: construct a simulated brain. The building blocks of the brain are neocortical columns; a column consists of about 60,000 neurons, and the human brain contains millions of such columns. The first stage is to simulate a single column (each processor acting as one or two neurons); the next step is to simulate a small network of columns; the ultimate goal is to simulate the whole human brain. IBM contributes the Blue Gene supercomputer.

Related Technologies

HPC covers a wide range of technologies:
- Computer architecture: CPU, memory, VLSI
- Compilers: identify inefficient implementations, make use of the characteristics of the computer architecture, and choose a suitable compiler for a given architecture
- Algorithms (for parallel and distributed systems): how to program parallel and distributed systems
- Middleware (from Grid computing technology): application -> middleware -> operating system; resource discovery and sharing

History of High Performance Computing

- 1960s: scalar processors, which process one data item at a time.
- 1970s: vector processors, which can process an array of data items in one go. (Consider the architecture, the overheads, and the difference between a vector processor and a scalar processor.)
- Late 1980s: Massively Parallel Processing (MPP): up to thousands of processors, each with its own memory and OS, with a problem broken down across them. (Difference between MPP and a vector processor.)
- Late 1990s: clusters. Not a new term in itself, but one attracting renewed interest: stand-alone computers connected by a high-speed network. (Difference between a cluster and MPP.)
- Late 1990s: Grid: tackles collaboration among geographically distributed organisations, drawing an analogy with the power grid. (Difference between Grid and a cluster.)

Parallel computing vs. distributed computing

Parallel computing breaks the problem into parts that can be run simultaneously on different processors; an example is an MPI program performing matrix multiplication. It is used to solve tightly coupled problems. In distributed computing, parts of the work are computed in different places (note: this does not necessarily imply simultaneous processing); an example is the client/server model. Distributed computing is used to solve loosely coupled problems, which involve little communication.
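The matrix-multiplication example above can be sketched in plain Python (a toy stand-in for the MPI version the notes mention, not the course's actual code): each worker process independently computes one row of the product, so the rows are the parts that run simultaneously.

```python
from multiprocessing import Pool

def row_times_matrix(args):
    """Compute one row of the product A x B (pure Python, no NumPy)."""
    row, B = args
    cols = len(B[0])
    return [sum(row[k] * B[k][j] for k in range(len(B))) for j in range(cols)]

def parallel_matmul(A, B, workers=4):
    """Split the independent row computations of A x B across worker processes."""
    with Pool(workers) as pool:
        return pool.map(row_times_matrix, [(row, B) for row in A])

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul(A, B, workers=2))  # [[19, 22], [43, 50]]
```

Because no row depends on another row's result, this decomposition needs no communication between workers during the computation, which is what makes matrix multiplication a natural first parallel-programming example.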

Architecture Types

- SMP (Symmetric Multi-Processing): multiple CPUs, a single memory, shared I/O. All resources in an SMP machine are equally available to each CPU. SMP does not scale well to a large number of processors (typically fewer than 8). (Scalability is the measure of how closely the system's performance improves linearly with the number of processing elements.)
- NUMA (Non-Uniform Memory Access): multiple CPUs; each CPU has fast access to its local area of memory but slower access to other areas. NUMA scales well to a large number of processors, at the cost of a more complicated memory access pattern and system bus.
- MPP (Massively Parallel Processing)
- Cluster
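The scalability definition above is usually made quantitative through speedup and parallel efficiency. A minimal sketch (the timing numbers below are invented for illustration, not measurements from the course):

```python
def speedup(t_serial, t_parallel):
    """Strong-scaling speedup: serial run time over parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, num_procs):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(t_serial, t_parallel) / num_procs

# Hypothetical job: 100 s serially, 16 s on 8 processors.
print(speedup(100, 16))         # 6.25x faster
print(efficiency(100, 16, 8))   # 0.78125, i.e. ~78% of linear scaling
```

A system "scales well" in the sense used above when efficiency stays close to 1 as the number of processing elements grows.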

Illustration for Architecture Types: Shared Memory (Uniform Memory Access - SMP)

Processors share access to a common memory space, implemented over a shared memory bus or a communication network. Support for critical sections is required. A local cache is critical: without it, bus contention (or network traffic) reduces the system's efficiency. For this reason, pure shared memory systems do not scale naturally. Caches in turn introduce coherency problems (ensuring that stale cache lines are invalidated when other processors alter shared memory).

[Figure: processing elements PE_0 ... PE_n attached to a shared memory via an interconnect]
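The need for critical sections mentioned above can be illustrated with threads, which share one address space just as SMP processors share one memory (a small sketch, not part of the original notes): a lock ensures that concurrent updates to a shared counter are not lost.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    """Increment the shared counter n times, each update in a critical section."""
    global counter
    for _ in range(n):
        with lock:          # only one thread at a time may execute this block
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000; without the lock, interleaved updates could be lost
```

The lock plays the role of the hardware and OS support for mutual exclusion that a real shared-memory machine must provide.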

Illustration for Architecture Types: Shared Memory (Non-Uniform Memory Access - NUMA)

A PE may be fetching from local or remote memory, hence the non-uniform access times. In cc-NUMA (cache-coherent Non-Uniform Memory Access), groups of processors are connected together by a fast interconnect (forming SMP nodes), and these nodes are then connected together by a high-speed NUMA interconnect. There is a single global address space.

[Figure: m SMP nodes, each pairing a shared memory with PEs (node 1 holds PE_1 ... PE_n, node m holds PE_(m-1)n+1 ... PE_mn), joined by a NUMA interconnect]

Illustration for Architecture Types: Distributed Memory (MPP, cluster)

Each processor has its own local memory. When processors need to exchange (or share) data, they must do so through explicit communication, i.e. message passing (e.g. with MPI). Latencies between PEs are typically larger (especially if they communicate over network interconnections). Scalability is good if the problem can be sufficiently contained within the PEs.

[Figure: PE_0 with local memory M_0, ..., PE_n with local memory M_n, connected by an interconnect]
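The explicit communication described above can be mimicked with OS processes, which, like distributed-memory PEs, share nothing and must exchange data through messages (a sketch using a pipe in place of MPI send/receive; the worker and its task are made up for illustration):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    """A 'processing element' with its own private memory."""
    data = conn.recv()                    # explicit receive, like MPI_Recv
    conn.send(sum(x * x for x in data))   # explicit send of the result, like MPI_Send
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # ship the data to the remote PE
    print(parent_end.recv())        # 30
    p.join()
```

Nothing is shared implicitly: every byte the worker sees had to be sent over the pipe, which is exactly the programming model of an MPP machine or a cluster.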

Goals of HPC

- Minimise the execution time of a given set of applications (strong scaling)
- Maximise the number of applications completed in a given amount of time (weak scaling)
- Identify the compromise between performance and cost