Parallel Computing I: Introduction
Thorsten Grahs, 14. April 2014
Administration
- Lecturer: Dr. Thorsten Grahs (that's me), t.grahs@tu-bs.de, Institute of Scientific Computing, Room RZ 120
- Lecture: Monday 11:30-13:00, Room RZ 65.4
- Exercises: Thursday 9:45-11:15, Room RZ 65.4, Matthias Huy
14. April 2014 Thorsten Grahs Parallel Computing I SS 2014 Seite 2
Administration II
- Begin: today (obviously!)
- Next lecture: 28.04.2014 (due to Easter)
- Exercises start: 17.04.2014
- Consulting hours: Monday 13:00-14:00 (after the lecture) or via email
- Web: http://www.wire.tu-bs.de/lehre/ss14/e_parallel.html
- Requirements: knowledge of Unix/Linux, programming experience in C/C++
Administration III
- Criteria: active participation in the exercises (i.e. at least 50% of the homework) and the exam (end of the semester)
- Target audience: students in computer science, mathematics, natural science and engineering/CSE
Literature
- Parallel Programming for Multicore and Cluster Systems, Thomas Rauber & Gudula Rünger, Springer (2010)
- Introduction to Parallel Computing, Grama, Karypis, Kumar & Gupta, Pearson (2003)
- An Introduction to Parallel Programming, Peter Pacheco, Morgan Kaufmann (2011)
Parallel Computing
What the heck is parallel computing?
- Using different machines?
- Running as many cores as possible?
- Using a cluster? Or a supercomputer?
And, more importantly: why parallel computing?
Parallel Computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones that are solved concurrently.
Often mentioned in the context of:
- supercomputing
- High Performance Computing (HPC)
- Scientific Computing
It is the opposite of serial computing.
Why Parallel Computing
Parallel computing deals with size and speed:
- Size: problems that are interesting to scientists and engineers often do not fit on a single PC.
- Speed: large problems that would run for months on a single PC run on a cluster in only hours.
Disciplines involved
- Computer science: algorithms, programming models, communication/distribution
- Mathematics: modeling, discretization (PDEs), algorithms/numerical linear algebra
- Engineering/natural science: hardware (electronics), applications in physics/chemistry/biology, manufacturing
Serial computing
Traditionally, software has been written for serial computation:
- run on a single computer with a single Central Processing Unit (CPU)
- the problem is broken into a discrete series of instructions
- instructions are executed one after another
- only one instruction may execute at any moment in time
Parallel computing
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
- run using multiple CPUs
- the problem is broken into discrete parts that can be solved concurrently
- each part is further broken down into a series of instructions
- instructions from each part run simultaneously on different CPUs
Paradigm change in HPC I: the age of the dinosaurs
Big and specialized: vector machines and specialized computers with array processors (early 1970s to mid-1990s)
- Cray
- Thinking Machines CM-1 & CM-2
- Control Data Corp. STAR-100 & ETA-10
- Texas Instruments Advanced Scientific Computer (ASC)
Paradigm change in HPC II: the chicken shack
Small and flexible: cluster computing (mid-1990s to about 2010)
- Beowulf project (Becker & Sterling, 1994), a NASA project
- distributed-memory machines based on standard hardware connected via Ethernet
- programming model: MPI
Paradigm change in HPC III: the next thing is already out...
GPGPU computing: General Purpose computing on Graphics Processing Units (2005 to now)
- computing on graphics hardware specially designed for calculation
- throughput oriented
- programming model: CUDA/OpenCL
GPGPU computing will be handled next semester.
Development of computer resources (figure)
Cluster computing
Distributed systems: parallel computing on systems with distributed memory, for years regarded as a merely theoretical option.
Paradigm change:
- the problems to solve became bigger
- the gap between vector computers and PCs became smaller
- standard components became much cheaper
- free operating systems (Linux)
Cluster computing
Supercomputers from standard components: the Beowulf project (Donald Becker & Thomas Sterling, 1994, NASA).
Difference to a COW (Cluster of Workstations): accessible as one computer.
Original configuration:
- 16 motherboards with 486DX4 processors
- 16 MB RAM per board
- one 500 MB hard disk per board
- open-source software: Unix/Linux, PVM/MPI
GPGPUs
General Purpose computing on Graphics Processing Units: number crunching on graphics devices.
- started in the early 2000s as a research field
- 2006: graphics vendors took up the task, NVIDIA with CUDA (programming model)
- many powerful Arithmetic Logic Units (ALUs) on a GPU
Hybrid clusters
The largest and fastest computers in the world today employ both shared- and distributed-memory architectures. The shared-memory component can be a cache-coherent SMP machine and/or graphics processing units (GPUs).
Top 500
- http://www.top500.org (Oak Ridge National Laboratory)
- statistics on high-performance computers
- ranked by the LINPACK (LINear algebra PACKage) benchmark by J. Dongarra
- systems are ranked by their performance on this benchmark, with increasing numbers of variables (matrix size)
The TOP 500 (figure)
Linpack benchmark: Top500 vs. PCs (figure)
Top #1
#1 on the Top 500 supercomputer list (Nov. 2013): Tianhe-2 (MilkyWay-2)
Tianhe-2: specifications
- National Supercomputer Center in Guangzhou (China)
- Manufacturer: NUDT
- Cores: 3,120,000 (Xeon, 2.2 GHz)
- Linpack performance (Rmax): 33,862.7 TFlop/s
- Theoretical peak (Rpeak): 54,902.4 TFlop/s
- Power: 17,808.00 kW
- Memory: 1,024,000 GB
- Interconnect: TH Express-2
- Operating system: Linux
- Compiler: icc
- MPI: MPICH2
The Titan
The old number 1, now #2 on the Top 500 supercomputer list (Nov. 2013).
The Titan: specifications
- Oak Ridge National Laboratory
- Manufacturer: Cray Inc.
- Cores: 560,640 (Opteron 6274, 16C, 2.2 GHz)
- Linpack performance (Rmax): 17,590.0 TFlop/s
- Theoretical peak (Rpeak): 27,112.5 TFlop/s
- Power: 8,209.00 kW
- Memory: 710,144 GB
- Interconnect: Gemini
- Operating system: Linux
Top500 systems in Germany (figure)
SC500 systems: Accelerator (figure)
SC500 systems: Coprocessor (figure)
Nvidia K20/K20X
- Release: November 2012
- 2,688 ALUs, 14 streaming multiprocessors
- Memory bandwidth: 250 GB/s
- 6 GiB GDDR5 RAM
- 1.31 TFLOPS double precision (DPFP)
- 3.95 TFLOPS single precision (SPFP)
- K20X only for servers, K20 also for workstations
Clusters with CUDA accelerators in the top 20
- #2 Titan, DOE/SC/Oak Ridge National Laboratory, USA: 18,688 Tesla K20X GPUs
- #6 Piz Daint, Swiss National Supercomputing Centre (CSCS): 5,272 Tesla K20X GPUs
- #11 Tsubame 2.5, GSIC Center, Tokyo Institute of Technology, Japan: 7,168 Tesla K20X GPUs
- #12 Tianhe-1A, National Supercomputing Center in Tianjin, China: 7,168 Tesla K20X GPUs
The world is parallel: application areas
Historically, parallel computing has been considered to be "the high end of computing". It has been used to model difficult problems in many areas of science and engineering.
Science & engineering: applications I
- atmosphere, earth, environment
- physics: applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
- electrical engineering, circuit design, microelectronics
- computer science, mathematics
- chemistry, molecular sciences
Science & engineering: applications II
- mechanical engineering: from prosthetics to spacecraft
- bioscience, biotechnology, genetics
- geology, seismology
- climate modeling, oceanography
Industrial & commercial: applications III
- databases, data mining
- oil exploration
- web search engines, web-based business services
- medical imaging and diagnosis
- pharmaceutical design
Industrial & commercial: applications IV
- financial and economic modeling
- management of national and multinational corporations
- advanced graphics and virtual reality, particularly in the entertainment industry
- networked video and multimedia technologies
- collaborative work environments
Example: weather prediction I
Numerical simulation of the atmosphere:
- discretization of the atmosphere, represented by a 3-dimensional grid
- computation of physical values at each grid point
- Navier-Stokes equations (5 equations in 3 dimensions): temperature, air pressure, (wind) velocity
Example: weather prediction II
Nonlinearities: the local weather (e.g. in Germany) depends on
- the anticyclone over the Azores
- the cyclone over Iceland
The model has to handle different scales: big scales to incorporate the relevant areas (e.g. Azores, Iceland, Gulf Stream), and also local/small scales.
Example: weather prediction III
Global weather model:
- horizontal grid spacing: 1 km; vertical extent (height): 20 km; roughly 10^10 grid points
- temporal resolution depends on spatial resolution (CFL criterion), i.e. Δt ≈ 10 seconds
- computing 3 days in advance needs about 26,000 time steps
- computation of all relevant physical properties, i.e. 5 partial differential equations (PDEs)
- assumption: 100 operations per grid point and time step, giving about 2.6 × 10^16 operations for the forecast
Example: weather prediction IV
FLOPS (Floating Point Operations Per Second) is a measure of the performance of a (super)computer. Consider the 2.6 × 10^16 operations for the forecast:
- Personal computer (PC): 10 × 10^9 FLOPS, i.e. 10 GigaFLOPS; simulation time: about 30 days
- Cluster computer: 10^12 FLOPS, i.e. 1 TeraFLOPS; simulation time: about 8 hours