Parallel Computing. Parallel shared memory computing with OpenMP
Thorsten Grahs

Table of contents
- Introduction
- Directives
- Scope of data
- Synchronization
- OpenMP vs. MPI
- OpenMP & MPI
(Thorsten Grahs, Parallel Computing I, SS 2014)

OpenMP
- Easier parallelisation with threads
- MP stands for Multi Processing
- Standard for explicit shared memory parallelisation
- Extension to existing programming languages (C/C++/Fortran)
- Incremental parallelisation (parallelisation of an existing serial program)
- Homepage:

OpenMP idea
- Parallelise a program by special instructions (directives)
- Model: fork-join
- Focus: parallelisation of loops

OpenMP fork/join
Work-sharing constructs:
- for: distributes the iterations of the loop among the team of threads (data parallelism)
- sections: divides the work into sections (work packages), each of which is executed by one thread (functional parallelism)
- single: serial execution of a part of the program by one thread

OpenMP goals
- Standardization: establish a standard across the various competing shared memory platforms
- Lean and mean: create a simple and limited instruction set for programming shared memory computers
- Ease of use: enable incremental parallelisation of serial programs (unlike the all-or-nothing approach of MPI)
- Portability: support for all common programming languages
- Open forum for users and developers

OpenMP pros/cons
Pros
- Simple parallelisation
- Higher abstraction than threads
- Sequential version can still be used
- Standard for shared memory
Cons
- Only shared memory
- Limited use (mainly loop-type parallelism)

OpenMP Architecture Review Board (from the OpenMP homepage)
"The OpenMP Architecture Review Board, or the OpenMP ARB or just the ARB, is the non-profit corporation that owns the OpenMP brand, oversees the OpenMP specification and produces and approves new versions of the specification."
- Several companies are involved, making OpenMP a vendor-independent standard for parallel programming
- Permanent members of the ARB: AMD, Cray, Fujitsu, HP, IBM, Intel, Microsoft, NEC, The Portland Group, SGI, Sun
- Auxiliary members of the ARB: ASC/LLNL, cOMPunity, EPCC, NASA, RWTH Aachen

OpenMP history
Development:
- 1997/1998: Version 1.0
- 2000/2002: Version 2.0
- 2005: Version 2.5
- 2008: Version 3.0
- 2011: Version 3.1
- 2013: Version 4.0
Supported by:
- Commercial compilers: IBM, Portland, Intel
- Open source: GNU gcc (OpenMP 4.0 from gcc 4.9)

A first OpenMP program: hello.c
    #ifdef _OPENMP
    #include <omp.h>
    #endif
    #include <stdio.h>

    int main(void) {
        int i;
        #pragma omp parallel for
        for (i = 0; i < 8; ++i) {
            int id = omp_get_thread_num();
            printf("Hello World from thread %d\n", id);
            if (id == 0)
                printf("There are %d threads\n", omp_get_num_threads());
        }
        return 0;
    }

How it works
- When the program starts, a single process is started on the CPU; this process corresponds to the master thread.
- The master thread can then create and manage additional threads.
- The management of the threads (creation, management and termination) is done by OpenMP without user interaction.
- #pragma omp parallel for instructs that the following for loop is distributed among the available threads.
- omp_get_thread_num() returns the number of the calling thread.
- omp_get_num_threads() returns the total number of threads in the current team.

Compile and run
Compiling:
    gcc -fopenmp hello.c
Output:
    th@riemann:~$ ./a.out
    Hello World from thread 4
    Hello World from thread 6
    Hello World from thread 3
    Hello World from thread 1
    Hello World from thread 5
    Hello World from thread 2
    Hello World from thread 7
    Hello World from thread 0
    There are 24 threads
Since the loop has only 8 iterations, at most 8 of the 24 threads receive work and print a greeting; the order of the lines differs from run to run.

Compiler directives
- OpenMP is defined mainly through compiler directives.
- Directive format in C/C++: #pragma omp directive-name [clauses...]
- Fortran compilers without OpenMP support treat these directives as comments.
- C compilers without OpenMP support ignore unknown directives:
    $ gcc -Wall hello.c        # (gcc version < 4.2)
    hello.c: In function 'main':
    hello.c:12: warning: ignoring #pragma omp parallel
- The directives therefore do not prevent compilation by a compiler without OpenMP support; calls to OpenMP runtime functions such as omp_get_thread_num() must additionally be guarded (see conditional compilation below).

Compiler directives II
Conditional compilation
- C/C++: the macro _OPENMP is defined when compiling with OpenMP support:
    #ifdef _OPENMP
    /* OpenMP-specific code, e.g. */
    nummer = omp_get_thread_num();
    #endif
omp parallel
- Creates additional threads, i.e. the following block is executed by all threads.
- The original thread (master thread) has thread ID 0.
    #pragma omp parallel [clauses]
    /* structured block (no jumps into or out of it, e.g. via goto) */

Work-sharing between threads I
Loops
- The work is divided among the threads, e.g. for two threads (with a static schedule):
  - Thread_1: loop iterations 0, ..., N/2-1
  - Thread_2: loop iterations N/2, ..., N-1
    #pragma omp parallel [clauses...]
    #pragma omp for [clauses...]
    for (i = 0; i < N; i++)
        a[i] = i * i;
This can be combined into one directive (omp parallel for):
    #pragma omp parallel for [clauses...]
    for (i = 0; i < N; i++)
        a[i] = i * i;

Work-sharing between threads II
sections
- The work is split up; each thread processes one section:
    #pragma omp parallel
    #pragma omp sections [clauses...]
    {
        #pragma omp section
        [... section A runs in parallel to B ...]
        #pragma omp section
        [... section B runs in parallel to A ...]
    }
Again, the two directives can be combined:
    #pragma omp parallel sections [clauses...]

Data sharing attribute clauses
Scope of data: data clauses specified in OpenMP directives control how variables are shared between the threads.
- shared(): the data within a parallel region is shared, i.e. visible and accessible by all threads simultaneously.
- private(): the data within a parallel region is private to each thread, i.e. each thread has its own local copy and uses it as a temporary variable. A private variable is not initialized, and its value is not kept for use outside the parallel region.

Scope of data
Values of private variables are undefined on entry to and exit from the loop. The following clauses allow variables to be initialized/finalized:
- default(shared | none): sets the default sharing attribute (in the OpenMP 4.0 C/C++ interface only shared and none are allowed; Fortran also accepts private and firstprivate)
  - none: each variable has to be declared explicitly as shared() or private()
- firstprivate(): like private(), but all copies are initialized with the value the variable had before the parallel loop/region.
- lastprivate(): after leaving the construct, the variable keeps the value from the sequentially last loop iteration.

Example private: initialization
    #include <stdio.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int t = 2;
        int result = 0;
        int A[100], i = 0, j = 0;
        omp_set_num_threads(t);   // explicitly use 2 threads
        for (i = 0; i < 100; i++) {
            A[i] = i;
            result += A[i];
        }
        printf("array-sum BEFORE calculation: %d\n", result);

Example private: parallel section
        i = 0;
        int T[2];
        T[0] = 0;
        T[1] = 0;
        #pragma omp parallel
        {
            #pragma omp for
            for (i = 0; i < 100; i++) {
                for (j = 0; j < 10; j++) {
                    A[i] = A[i] * 2;
                    T[omp_get_thread_num()]++;
                }
            }
        }

Example private: output
        i = 0; result = 0;
        for (i = 0; i < 100; i++)
            result += A[i];
        printf("array-sum AFTER calculation: %d\n", result);
        printf("thread 1: %d calculations\n", T[0]);
        printf("thread 2: %d calculations\n", T[1]);
        return 0;
    }

Example: calculation without private(j)
Output without the private declaration:
    Array-Sum BEFORE calculation:
    Array-Sum AFTER calculation:
    Thread 1: 450 calculations
    Thread 2: 485 calculations
- j is declared outside the parallel region and is therefore shared by default.
- Both threads use the same variable j, so each thread disturbs the other's inner loop; this is the reason for the wrong result.

Example: modified parallel section
Parallel section with private(j):
    #pragma omp parallel
    {
        #pragma omp for private(j)
        for (i = 0; i < 100; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
            }
        }
    }

Example: calculation with private(j)
Output with the private declaration:
    Array-Sum BEFORE calculation:
    Array-Sum AFTER calculation:
    Thread 1: 500 calculations
    Thread 2: 500 calculations
- j is now managed individually for each thread, i.e. it is declared private.
- Each thread has its own copy of j, so the threads no longer disturb each other's loop counters.

Race conditions
Critical situation (race condition)
- A race condition is a situation in which the result of an operation depends on the relative timing of individual operations (e.g. which thread accesses a shared variable first).
- It can be handled in OpenMP with a critical section:
    #pragma omp critical [(name)]
- A critical section resolves the race condition by letting only one thread of the team at a time execute the enclosed block.

Example critical: parallel section
    #pragma omp parallel
    {
        #pragma omp for private(j)
        for (i = 0; i < 100; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
                NumOfIters++;   /* added */
            }
        }
    }

Example critical: without critical declaration
    Array-Sum BEFORE calculation:
    Array-Sum AFTER calculation:
    Thread 1: 500 calculations
    Thread 2: 500 calculations
    NumOfIters: 693
- NumOfIters gives a wrong result (100 * 10 = 1000 increments are expected).
- The threads hinder each other when incrementing the shared counter.
- Declaring it private is no solution, since the iterations of all threads should be counted.

Example critical: parallel section with critical declaration
    #pragma omp parallel
    {
        #pragma omp for private(j)
        for (i = 0; i < 100; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
                #pragma omp critical   /* added critical */
                NumOfIters++;
            }
        }
    }
In C/C++ the critical directive applies to the following statement or block; there is no "end critical" directive (that form exists only in Fortran).

Example critical: with critical declaration
    Array-Sum BEFORE calculation:
    Array-Sum AFTER calculation:
    Thread 1: 500 calculations
    Thread 2: 500 calculations
    NumOfIters: 1000
- The increment of NumOfIters is now executed serially.
- The threads no longer interfere with each other when incrementing.
- NumOfIters is incremented by one thread after another.

Reduction
Reduction of data
- Critical sections can become a bottleneck of a calculation.
- In our example there is another way to count the iterations safely: the reduction.
reduction
- The reduction clause marks variables into which multiple threads accumulate values.
- Each thread works on a private copy; at the end of the construct the contributions of all threads are combined with the given operator, like a reduction in MPI.

Example reduction: parallel section with reduction()
    #pragma omp parallel
    {
        #pragma omp for private(j) reduction(+: NumOfIters)
        for (i = 0; i < 100; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
                NumOfIters++;
            }
        }
    }

Example reduction: with reduction()
    Array-Sum BEFORE calculation:
    Array-Sum AFTER calculation:
    Thread 1: 500 calculations
    Thread 2: 500 calculations
    NumOfIters: 1000
- The reduction is requested with the reduction clause on the parallelised loop.
- It requires the operator and the reduction variable, separated by a colon: reduction(+ : NumOfIters)

Conditional parallelisation
if clause
- It may be desirable to parallelise a loop only when the effort pays off, e.g. only when running with multiple threads is actually faster than serial execution (for small loops the fork-join overhead can dominate).
Properties
- Decides at run time whether the region is executed in parallel (fork-join) or serially.

Example: if clause
Parallel section with if:
    #pragma omp parallel if(n > 500)
    {
        #pragma omp for private(j) reduction(+:NumOfIters)
        for (i = 0; i < n; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
                NumOfIters++;
            }
        }
    }
The if clause belongs to the parallel directive (it is not allowed on omp for); if the condition is false, the region is executed by a single thread.

Synchronization
- OpenMP provides several ways to coordinate the execution of threads; an important component is the synchronization of threads.
- We have already met one application: the race condition when counting NumOfIters, resolved with a critical section. There, synchronization made the threads execute the critical section one after another.
- The atomic construct shows the same behaviour.

barrier synchronisation
barrier: threads wait until all of them have reached a common point.
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i = 0; i < N; i++) a[i] = b[i] + c[i];
        #pragma omp barrier
        #pragma omp for
        for (i = 0; i < N; i++) d[i] = a[i] + b[i];
    }
nowait removes the implicit barrier at the end of the first loop; the explicit barrier ensures that all of a has been computed before the second loop reads it.
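
atomic synchronisation
atomic
- Similar to critical, but only permitted for specific update operations on a single memory location, e.g. x++, ++x, x--, --x (and updates of the form x op= expr)
    #pragma omp atomic
    NumOfIters++;
- Typically cheaper than a critical section, because it can be mapped to atomic hardware instructions.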

master/single synchronisation
Only one thread executes a region (e.g. for I/O).
master: the master thread executes the code section
    #pragma omp master
    {
        [code which should be executed once (by the master thread)]
    }
single: any one thread may execute the code section
    #pragma omp single
    {
        [code which should be executed once (not necessarily by the master thread)]
    }
single has an implicit barrier at its end (unless nowait is given); master has none.

flush/ordered synchronisation
flush
- Each thread may have its own temporary view M_T of the shared memory M, similar to a cache.
- flush makes the thread's temporary view of the listed variables consistent with memory:
    #pragma omp flush [(variables)]
ordered
- Specifies that the ordered block inside the loop is executed in the same order as in serial program execution.
- Only permitted within a loop whose directive carries the ordered clause:
    #pragma omp for ordered
    for (...) {
        ...
        #pragma omp ordered
        { ... }
    }

OpenMP vs. MPI I
Advantages of OpenMP
- Code is easier to parallelise and to understand
- Easy access to parallelisation
- Communication is implicit
- Program development is much simpler
- More suitable than MPI for shared memory (when used correctly)
- Run-time scheduling is available
- Both coarse- and fine-grained applications are possible
- Code can still be executed serially

OpenMP vs. MPI II
Disadvantages of OpenMP
- In the form described so far, only applicable to shared memory systems
- Classical synchronization errors (such as deadlocks and race conditions) can occur
- For coarse-grained parallelism, strategies similar to those of MPI are often necessary
- Synchronization has to be implemented explicitly
- Parallelisation is mainly loop parallelisation

OpenMP vs. MPI III
Advantages of MPI
- Properly applied, explicit parallelisation can achieve optimal performance
- Optimized communication routines are predefined
- Synchronization is implicitly associated with routine calls and therefore less error-prone
- Full control is given to the programmer
- Runs on both shared and distributed memory systems

OpenMP vs. MPI IV
Disadvantages of MPI
- Programming with MPI is complex
- Often large changes to the code are necessary
- Communication between different nodes is relatively slow
- Not very well suited for fine granularity
- Global operations (on all nodes) can be extremely expensive

OpenMP & MPI I
Hello World with MPI & OpenMP
    #include <stdio.h>
    #include "mpi.h"
    #include <omp.h>

    int main(int argc, char *argv[]) {
        int numprocs, rank, namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int iam = 0, np = 1;

OpenMP & MPI II
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(processor_name, &namelen);

        #pragma omp parallel default(shared) private(iam, np)
        {
            np = omp_get_num_threads();
            iam = omp_get_thread_num();
            printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
                   iam, np, rank, numprocs, processor_name);
        }
        MPI_Finalize();
        return 0;
    }
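
OpenMP & MPI III
Compiling:
    > mpicc -openmp hello.c -o hello
Executing:
    > export OMP_NUM_THREADS=4
    > mpirun -np 2 -machinefile machinefile.morab -x OMP_NUM_THREADS ./hello
-openmp is the flag of the (older) Intel compilers; with a GCC-based mpicc use -fopenmp instead. With Open MPI, -x exports the environment variable OMP_NUM_THREADS to all processes.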

OpenMP & MPI IV
Output:
    Hello from thread 0 out of 4 from process 0 out of 2 on m006
    Hello from thread 1 out of 4 from process 0 out of 2 on m006
    Hello from thread 2 out of 4 from process 0 out of 2 on m006
    Hello from thread 3 out of 4 from process 0 out of 2 on m006
    Hello from thread 0 out of 4 from process 1 out of 2 on m001
    Hello from thread 3 out of 4 from process 1 out of 2 on m001
    Hello from thread 1 out of 4 from process 1 out of 2 on m001
    Hello from thread 2 out of 4 from process 1 out of 2 on m

OpenMP & MPI outlook
Current trend: combined MPI/OpenMP
- OpenMP within multi-core processors
- MPI for communication between processors/nodes
Advantage
- Optimal utilization of resources
Disadvantages
- Even more complicated to program
- Not necessarily faster
MAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22
Parallel and Distributed Computing Programming Assignment 1
Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
Session 2: MUST. Correctness Checking
Center for Information Services and High Performance Computing (ZIH) Session 2: MUST Correctness Checking Dr. Matthias S. Müller (RWTH Aachen University) Tobias Hilbrich (Technische Universität Dresden)
OpenACC Parallelization and Optimization of NAS Parallel Benchmarks
OpenACC Parallelization and Optimization of NAS Parallel Benchmarks Presented by Rengan Xu GTC 2014, S4340 03/26/2014 Rengan Xu, Xiaonan Tian, Sunita Chandrasekaran, Yonghong Yan, Barbara Chapman HPC Tools
Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C
Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva [email protected] Introduction Intel
UTS: An Unbalanced Tree Search Benchmark
UTS: An Unbalanced Tree Search Benchmark LCPC 2006 1 Coauthors Stephen Olivier, UNC Jun Huan, UNC/Kansas Jinze Liu, UNC Jan Prins, UNC James Dinan, OSU P. Sadayappan, OSU Chau-Wen Tseng, UMD Also, thanks
Case Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke [email protected] ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
Using the Intel Inspector XE
Using the Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) Race Condition Data Race: the typical OpenMP programming error, when: two or more threads access the same memory
reduction critical_section
A comparison of OpenMP and MPI for the parallel CFD test case Michael Resch, Bjíorn Sander and Isabel Loebich High Performance Computing Center Stuttgart èhlrsè Allmandring 3, D-755 Stuttgart Germany [email protected]
A Case Study - Scaling Legacy Code on Next Generation Platforms
Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy
Application Performance Analysis Tools and Techniques
Mitglied der Helmholtz-Gemeinschaft Application Performance Analysis Tools and Techniques 2012-06-27 Christian Rössel Jülich Supercomputing Centre [email protected] EU-US HPC Summer School Dublin
Performance Analysis of a Hybrid MPI/OpenMP Application on Multi-core Clusters
Performance Analysis of a Hybrid MPI/OpenMP Application on Multi-core Clusters Martin J. Chorley a, David W. Walker a a School of Computer Science and Informatics, Cardiff University, Cardiff, UK Abstract
A Performance Monitoring Interface for OpenMP
A Performance Monitoring Interface for OpenMP Bernd Mohr, Allen D. Malony, Hans-Christian Hoppe, Frank Schlimbach, Grant Haab, Jay Hoeflinger, and Sanjiv Shah Research Centre Jülich, ZAM Jülich, Germany
Introduction. Reading. Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF. CMSC 818Z - S99 (lect 5)
Introduction Reading Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF 1 Programming Assignment Notes Assume that memory is limited don t replicate the board on all nodes Need to provide load
OpenACC Programming and Best Practices Guide
OpenACC Programming and Best Practices Guide June 2015 2015 openacc-standard.org. All Rights Reserved. Contents 1 Introduction 3 Writing Portable Code........................................... 3 What
P1 P2 P3. Home (p) 1. Diff (p) 2. Invalidation (p) 3. Page Request (p) 4. Page Response (p)
ËÓØÛÖ ØÖÙØ ËÖ ÅÑÓÖÝ ÓÚÖ ÎÖØÙÐ ÁÒØÖ ÖØØÙÖ ÁÑÔÐÑÒØØÓÒ Ò ÈÖÓÖÑÒ ÅÙÖÐÖÒ ÊÒÖÒ Ò ÄÚÙ ÁØÓ ÔÖØÑÒØ Ó ÓÑÔÙØÖ ËÒ ÊÙØÖ ÍÒÚÖ ØÝ È ØÛÝ ÆÂ ¼¹¼½ ÑÙÖÐÖ ØÓ ºÖÙØÖ ºÙ ØÖØ ÁÒ Ø ÔÔÖ Û Ö Ò ÑÔÐÑÒØØÓÒ Ó ÓØÛÖ ØÖÙØ ËÖ ÅÑÓÖÝ Ëŵ
LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.
LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability
Load Balancing. computing a file with grayscales. granularity considerations static work load assignment with MPI
Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling
Parallelization of video compressing with FFmpeg and OpenMP in supercomputing environment
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 231 237 doi: 10.14794/ICAI.9.2014.1.231 Parallelization of video compressing
SYCL for OpenCL. Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March 2014. Copyright Khronos Group 2014 - Page 1
SYCL for OpenCL Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March 2014 Copyright Khronos Group 2014 - Page 1 Where is OpenCL today? OpenCL: supported by a very wide range of platforms
Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
OpenCL for programming shared memory multicore CPUs
Akhtar Ali, Usman Dastgeer and Christoph Kessler. OpenCL on shared memory multicore CPUs. Proc. MULTIPROG-212 Workshop at HiPEAC-212, Paris, Jan. 212. OpenCL for programming shared memory multicore CPUs
Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber
Introduction to grid technologies, parallel and cloud computing Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber OUTLINES Grid Computing Parallel programming technologies (MPI- Open MP-Cuda )
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
How to Run Parallel Jobs Efficiently
How to Run Parallel Jobs Efficiently Shao-Ching Huang High Performance Computing Group UCLA Institute for Digital Research and Education May 9, 2013 1 The big picture: running parallel jobs on Hoffman2
