Parallel Computing. Introduction




Parallel Computing Introduction Thorsten Grahs, 14. April 2014

Administration
- Lecturer: Dr. Thorsten Grahs (that's me), t.grahs@tu-bs.de
  Institute of Scientific Computing, Room RZ 120
- Lecture: Monday 11:30-13:00, Room RZ 65.4
- Exercises: Thursday 9:45-11:15, Room RZ 65.4, Matthias Huy
14. April 2014 Thorsten Grahs Parallel Computing I SS 2014

Administration II
- Begin: today (obviously!)
- Next lecture: 28.04.2014 (because of Easter)
- Exercises: 17.04.2014
- Consulting hours: Monday 13:00-14:00 (after the lecture) or via email
- Web: http://www.wire.tu-bs.de/lehre/ss14/e_parallel.html
- Requirements: knowledge of Unix/Linux, programming experience in C/C++

Administration III
- Criteria
  - Active participation in the exercises (i.e. at least 50% of the homework)
  - Exam (end of the semester)
- Target audience: students in computer science, mathematics, natural science, and engineering/CSE

Literature
- Thomas Rauber, Gudula Rünger: Parallel Programming for Multicore & Cluster Systems. Springer Verlag (2010)
- Grama, Karypis, Kumar & Gupta: Introduction to Parallel Computing. Pearson (2003)
- Peter Pacheco: An Introduction to Parallel Programming. Morgan Kaufmann (2011)

Parallel Computing
What the heck is parallel computing?
- Using different machines?
- Running as many cores as possible?
- Using a cluster?
- Or a supercomputer? What is this?
and... why parallel computing?

Parallel Computing
Parallel computing is...
- a form of computation in which many calculations are carried out simultaneously,
- operating on the principle that large problems can often be divided into smaller ones (concurrent computing).
Often mentioned in the context of
- Supercomputing
- High Performance Computing (HPC)
- Scientific Computing
(the opposite of serial computing)

Why Parallel Computing
Parallel computing deals with size and speed.
- Size: problems that are interesting to scientists and engineers can't fit on a PC.
- Speed: large problems that run for months on a single PC run for only hours on a cluster.

Disciplines involved
- Computer Science: algorithms, programming models, communication/distribution
- Mathematics: modeling, discretization (PDEs), algorithms/numerical linear algebra
- Engineering/Natural Science: hardware (electronics); applications in physics/chemistry/biology, manufacturing

Serial computing
Traditionally, software has been written for serial computation:
- It runs on a single computer with a single Central Processing Unit (CPU).
- The problem is broken into a discrete series of instructions.
- Instructions are executed one after another.
- Only one instruction may execute at any moment in time.

Parallel computing
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
- It runs on multiple CPUs.
- The problem is broken into discrete parts that can be solved concurrently.
- Each part is further broken down into a series of instructions.
- Instructions from each part execute simultaneously on different CPUs.
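The decomposition described on this slide can be sketched in a few lines of Python. This is only an illustration, not course material: the function names (`split`, `sum_of_squares`, `parallel_sum_of_squares`) and the sum-of-squares workload are hypothetical. Threads are used here to keep the sketch short. In CPython they show the structure (split, work on the parts, combine) but do not run pure Python code on several cores at once, so real speed-ups would need processes or MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Break the problem into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work on one part: an ordinary serial series of instructions."""
    total = 0
    for x in chunk:
        total += x * x
    return total


def parallel_sum_of_squares(data, workers=4):
    """Hand each chunk to its own worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, split(data, workers)))
    return sum(partial_results)


data = list(range(1000))
serial_result = sum_of_squares(data)
parallel_result = parallel_sum_of_squares(data, workers=4)
print(serial_result == parallel_result)
```

The final combination step (`sum(partial_results)`) is the only point where the workers' results meet. Keeping that communication small relative to the work per part is what makes such a decomposition pay off.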

Paradigm change in HPC I
The age of the dinosaurs: big and specialized
- Vector machines: specialized computers with array processors
- Early 1970s to mid-1990s
- Cray
- Thinking Machines CM-1 & CM-2
- Control Data Corp. STAR-100 & ETA-10
- Texas Instruments Advanced Scientific Computer (ASC)

Paradigm change in HPC II
The chicken shack: small and flexible
- Cluster computing
- Mid/end of the 1990s to end of 2010
- Beowulf project (Becker & Sterling, 1994), a NASA project
- Distributed-memory machines based on standard hardware, connected via Ethernet
- Programming model: MPI

Paradigm change in HPC III
The next thing is already out...
- GPGPU computing (General-Purpose Graphics Processing Units)
- 2005 to now
- Computing on graphics hardware specially designed for computation
- Throughput-oriented
- Programming model: CUDA/OpenCL
GPGPU computing will be covered next semester.

Development of computer resources

Cluster computing
Distributed systems
- Parallel computing on systems with distributed memory
- For years regarded as just a theoretical application
Paradigm change
- The problems to be solved became bigger
- The gap between vector computers and PCs became smaller
- Standard components became much cheaper
- Free operating systems (Linux)

Cluster computing
Supercomputer from standard components
- Beowulf project: Donald Becker & Thomas Sterling, 1994, NASA
- Difference from a COW (Cluster of Workstations): accessible as one computer
- Original configuration
  - 16 motherboards with 486DX4 processors
  - 16 MB RAM per board
  - One 500 MB hard disk per board
- Open-source software: Unix/Linux, PVM/MPI

GPGPUs
General-purpose computing on graphics processing units (number crunching on graphics devices)
- Started in the early 2000s (as a research field)
- 2006: graphics vendors took up the task
- NVIDIA with CUDA (programming model)
- Many powerful arithmetic logic units (ALUs) on the GPU

Hybrid cluster
The largest and fastest computers in the world today employ both shared and distributed memory architectures. The shared memory component can be a cache-coherent SMP machine and/or graphics processing units (GPUs).

Top 500
Oak Ridge National Laboratory, http://www.top500.org
- Statistics on high-performance computers
- Ranked by their performance on the LINPACK benchmark
- LINPACK (LINear algebra PACKage) by J. Dongarra
- Increasing number of variables (matrix size)
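To make the idea of a LINPACK-style measurement concrete, here is a toy sketch in pure Python. It is not the actual benchmark code. It solves a dense random linear system by Gaussian elimination with partial pivoting and converts the run time into a floating-point rate, using the leading-order operation count of roughly (2/3) n^3 as an approximation. The function names, the matrix sizes, and the random test matrices are assumptions made for illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies, leave the inputs untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from all rows below the pivot row.
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def benchmark(n, seed=0):
    """Time one solve of size n; return (approximate FLOPS, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    flops = (2.0 / 3.0) * n ** 3 / elapsed  # leading-order count only
    return flops, residual


# Increasing number of variables (matrix size), as on the slide.
for n in (20, 40, 80):
    rate, residual = benchmark(n)
    print(f"n={n:3d}  ~{rate / 1e6:8.2f} MFLOPS  residual={residual:.1e}")
```

The absolute numbers from interpreted Python are far below what the hardware can deliver. The sketch only shows what is being measured: the time to solve a dense system of growing size, checked by a residual.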

The TOP 500

Linpack benchmark: Top500 vs PCs

Top #1
#1 on the Top 500 supercomputer list (Nov. 2013): Tianhe-2 (MilkyWay-2)

Tianhe-2 Specifications
National Super Computer Center in Guangzhou (China)
- Manufacturer: NUDT
- Cores: 3,120,000 (Xeon, 2.2 GHz)
- Linpack performance (Rmax): 33,862.7 TFlop/s
- Theoretical peak (Rpeak): 54,902.4 TFlop/s
- Power: 17,808.00 kW
- Memory: 1,024,000 GB
- Interconnect: TH Express-2
- Operating system: Linux
- Compiler: icc
- MPI: MPICH2

The Titan
The old number 1, now #2 on the Top 500 supercomputer list (Nov. 2013)

The Titan Specifications
Oak Ridge National Laboratory
- Manufacturer: Cray Inc.
- Cores: 560,640 (Opteron 6274 16C, 2.2 GHz)
- Linpack performance (Rmax): 17,590.0 TFlop/s
- Theoretical peak (Rpeak): 27,112.5 TFlop/s
- Power: 8,209.00 kW
- Memory: 710,144 GB
- Interconnect: Gemini interconnect
- Operating system: Linux
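The Rmax, Rpeak, and power figures quoted for Tianhe-2 and Titan allow two simple derived metrics: the fraction of theoretical peak reached on Linpack, and the Linpack performance per watt. The short sketch below does the arithmetic. It reads the Titan power figure as 8,209 kW, and the dictionary layout and function names are made up for illustration.

```python
# Figures as quoted on the two specification slides (TFlop/s and kW).
systems = {
    "Tianhe-2": {"rmax": 33862.7, "rpeak": 54902.4, "power_kw": 17808.0},
    "Titan": {"rmax": 17590.0, "rpeak": 27112.5, "power_kw": 8209.0},
}


def linpack_efficiency(spec):
    """Fraction of the theoretical peak actually reached on Linpack."""
    return spec["rmax"] / spec["rpeak"]


def gflops_per_watt(spec):
    """TFlop/s per kW is numerically the same as GFlop/s per W."""
    return spec["rmax"] / spec["power_kw"]


for name, spec in systems.items():
    print(
        f"{name}: {100 * linpack_efficiency(spec):.1f}% of peak, "
        f"{gflops_per_watt(spec):.2f} GFlop/s per W"
    )
```

Both machines reach somewhat more than 60% of their theoretical peak on this benchmark and deliver roughly 2 GFlop/s per watt.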

Top500 systems in Germany

SC500 systems: Accelerator

SC500 systems: CoProcessor

Nvidia K20/K20X
- Release: November 2012
- 2,688 ALUs
- 14 streaming multiprocessors
- Memory bandwidth: 250 GB/s
- 6 GiB GDDR5 RAM
- 1.31 TFLOPS double-precision floating point (DPFP)
- 3.95 TFLOPS single-precision floating point (SPFP)
- K20X only for servers; K20 also for workstations

Clusters with CUDA accelerators in the top 20
- #2 Titan, DOE/SC/Oak Ridge National Laboratory, USA: 18,688 Tesla K20X GPUs
- #6 Piz Daint, Swiss National Supercomputing Centre (CSCS): 5,272 Tesla K20X GPUs
- #11 Tsubame 2.5, GSIC Center, Tokyo Institute of Technology, Japan: 7,168 Tesla K20X GPUs
- #12 Tianhe-1A, National Supercomputing Center in Tianjin, China: 7,168 Tesla K20X GPUs

The world is parallel
Application areas
- Historically, parallel computing has been considered to be "the high end of computing".
- It has been used to model difficult problems in many areas of science and engineering.

Science & engineering: Application I
- Atmosphere, Earth, environment
- Physics: applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
- Electrical engineering, circuit design, microelectronics
- Computer science, mathematics
- Chemistry, molecular sciences

Science & engineering: Application II
- Mechanical engineering: from prosthetics to spacecraft
- Bioscience, biotechnology, genetics
- Geology, seismology
- Climate modeling, ocean

Industrial & Commercial: Application III
- Databases, data mining
- Oil exploration
- Web search engines, web-based business services
- Medical imaging and diagnosis
- Pharmaceutical design

Industrial & Commercial: Application IV
- Financial and economic modeling
- Management of national and multi-national corporations
- Advanced graphics and virtual reality, particularly in the entertainment industry
- Networked video and multimedia technologies
- Collaborative work environments

Example: Weather prediction I
Numerical simulation of the atmosphere
- Discretization of the atmosphere, represented by a 3-dimensional grid
- Computation of physical values at each grid point
- Navier-Stokes equations (5 equations in 3 dimensions) for
  - temperature
  - air pressure
  - (wind) velocity

Example: Weather prediction II
Nonlinearities
- Local weather (e.g. in Germany) depends on
  - the anticyclone over the Azores
  - the cyclone over Iceland
- The model has to handle different scales:
  - large scales, to incorporate the relevant areas (e.g. Azores, Iceland, Gulf Stream)
  - and also local/small scales

Example: Weather prediction III
Global weather model
- Horizontal grid spacing: 1 km
- Vertical extent (height): 20 km
- This gives about $10^{10}$ grid points
- Temporal resolution depends on the spatial resolution (CFL criterion), i.e. $\Delta t \approx 10$ seconds
- Computing 3 days in advance needs about 26,000 time steps
- Computation of all relevant physical properties, i.e. 5 partial differential equations (PDEs)
- Assumption: 100 operations per grid point and time step
- This gives $10^{10} \cdot 2.6 \cdot 10^{4} \cdot 100 = 2.6 \cdot 10^{16}$ operations for the forecast
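The operation count on this slide can be checked with a few lines of arithmetic. The sketch below assumes the slide's figures are to be read as 10^10 grid points, a time step of 10 seconds, a 3-day forecast, and 100 operations per grid point per time step. The variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

grid_points = 10**10            # 1 km horizontal spacing, 20 km height
time_step_seconds = 10          # limited by the CFL criterion
forecast_days = 3
ops_per_point_per_step = 100    # assumed cost of updating one grid point

# Number of time steps needed to cover the forecast period.
time_steps = forecast_days * SECONDS_PER_DAY // time_step_seconds

# Every grid point is updated once per time step.
total_operations = grid_points * time_steps * ops_per_point_per_step

print(f"time steps:       {time_steps}")
print(f"total operations: {total_operations:.2e}")
```

The exact values are 25,920 time steps and about 2.59e16 operations, which the slide rounds to 26,000 and 2.6e16.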

Example: Weather prediction IV
FLOPS
- Floating-point operations per second (FLOPS) is a measure of the performance of a (super)computer.
- Consider the $2.6 \cdot 10^{16}$ operations for the forecast:
  - Personal computer (PC): $10 \cdot 10^{9}$ FLOPS, i.e. 10 GigaFLOPS. Simulation time: 30 days
  - Cluster computer: $10^{12}$ FLOPS, i.e. 1 TeraFLOPS. Simulation time: 8 hours
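The simulation times follow from dividing the operation count by the sustained floating-point rate. The sketch below assumes rates of 10 GigaFLOPS for the PC and 1 TeraFLOPS for the cluster, the reading of the slide's figures that best matches its stated simulation times. It also assumes, optimistically, that the machines sustain these rates for the whole run. The function and variable names are made up for illustration.

```python
def simulation_time_seconds(total_operations, flops):
    """Run time if the machine sustains `flops` operations per second."""
    return total_operations / flops


total_operations = 2.6e16   # operations for the 3-day forecast

pc_flops = 10e9             # assumed: 10 GigaFLOPS
cluster_flops = 1e12        # assumed: 1 TeraFLOPS

pc_days = simulation_time_seconds(total_operations, pc_flops) / 86400
cluster_hours = simulation_time_seconds(total_operations, cluster_flops) / 3600

print(f"PC:      {pc_days:.1f} days")
print(f"Cluster: {cluster_hours:.1f} hours")
```

Under these assumptions the PC needs about 30 days, while the cluster needs a bit over 7 hours, the same order of magnitude as the 8 hours quoted on the slide. A 3-day forecast that takes 30 days to compute is useless, which is the point of the example.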