Parallel Computing. Shared memory parallel programming with OpenMP
1 Parallel Computing Shared memory parallel programming with OpenMP Thorsten Grahs,
2 Table of contents Introduction Directives Scope of data Synchronization Thorsten Grahs Parallel Computing I SS 2015 Seite 2
3 OpenMP MP stands for Multi Processing. De-facto standard Application Program Interface (API) for explicit shared memory parallelism. Extension to existing programming languages (C/C++/Fortran). Incremental parallelism (parallelization of an existing serial program). Approach: workers (threads) do the work in parallel and cooperate through shared memory; communication happens via memory accesses instead of explicit messages. Local model: the serial code is parallelized step by step, which allows incremental parallelization. Thorsten Grahs Parallel Computing I SS 2015 Seite 3
4 OpenMP Goals Standardization To establish a standard between various competing shared memory platforms Lean and Mean Create a simple and limited instruction set for programming shared memory computers. Ease of Use Enable an incremental parallelism of serial programs (Unlike the all-or-nothing approach of MPI) Portability Support of all common programming languages Open forum for users and developers Thorsten Grahs Parallel Computing I SS 2015 Seite 4
5 OpenMP History Open specifications for Multi Processing, maintained by the OpenMP Architecture Review Board. Supported compilers: commercial (IBM, Portland, Intel) and open source (GNU gcc, with OpenMP 4.0 support from gcc 4.9). OpenMP 4.0 was released in July 2013. (Figure: development of the specification over time.) Thorsten Grahs Parallel Computing I SS 2015 Seite 5
6 OpenMP Execution model Thread-based parallelism Compiler Directive Based Explicit Parallelism Fork-Join Model (Focus: parallelism of loops) Thorsten Grahs Parallel Computing I SS 2015 Seite 6
7 OpenMP fork/join for: distributes the iterations of a loop over the team of threads (data parallelism). sections: divides the work into sections/work packages, each of which is executed by one thread (functional parallelism). single: serial execution of a part of the program by one thread. Thorsten Grahs Parallel Computing I SS 2015 Seite 7
8 OpenMP Memory model All threads have access to the same globally shared memory Data in private memory is only accessible by the thread owning this memory No other thread sees the change(s) Data transfer is through shared memory and is completely transparent to the application Thorsten Grahs Parallel Computing I SS 2015 Seite 8
9 OpenMP Pros/Cons Pros simple parallelism Higher abstraction than threads Sequential version can still be used Standard for shared memory Cons Only shared memory Limited use (loop type) Thorsten Grahs Parallel Computing I SS 2015 Seite 9
10 OpenMP Main components Compiler directives and clauses: they are treated as comments unless the appropriate OpenMP compiler flag is specified. Parallel construct, work-sharing constructs, synchronization constructs, data attribute clauses. C/C++: #pragma omp directive-name [clause[clause]...] Fortran free form: !$omp directive-name [clause[clause]...] Thorsten Grahs Parallel Computing I SS 2015 Seite 10
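A minimal sketch of how a construct and its clauses combine into a directive (assuming a C compiler with OpenMP enabled; the variable n and the chosen clauses are purely illustrative):

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int n = 8;  /* shared among all threads */
      /* construct: parallel; clauses: num_threads, default, shared */
      #pragma omp parallel num_threads(4) default(none) shared(n)
      {
          printf("thread %d of %d sees n = %d\n",
                 omp_get_thread_num(), omp_get_num_threads(), n);
      }
      return 0;
  }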
11 Compiling OpenMP is enabled with a compiler-specific flag, e.g. gcc -fopenmp; see the compiler documentation for the full list of flags. Thorsten Grahs Parallel Computing I SS 2015 Seite 11
12 Environment Variables OMP_NUM_THREADS: sets the number of threads OMP_STACKSIZE size [B|K|M|G]: size of the stack for threads OMP_DYNAMIC TRUE|FALSE: dynamic thread adjustment OMP_SCHEDULE schedule[,chunk]: iteration scheduling scheme OMP_PROC_BIND TRUE|FALSE: bind threads to processors OMP_NESTED TRUE|FALSE: nested parallelism To set them: in csh/tcsh: setenv OMP_NUM_THREADS 4 in sh/bash: export OMP_NUM_THREADS=4 Thorsten Grahs Parallel Computing I SS 2015 Seite 12
13 Basic functions Query/specify some specific feature or setting omp_get_thread_num(): get thread ID (0 for master thread) omp_get_num_threads(): get number of threads in the team omp_set_num_threads(int n): set number of threads Allow you to manage fine-grained access (lock) omp_init_lock(lock_var): initializes the OpenMP lock variable lock_var of type omp_lock_t Timing functions omp_get_wtime(): returns elapsed wallclock time omp_get_wtick(): returns timer precision Functions interface: C/C++: #include <omp.h> Fortran: use omp_lib (or include omp_lib.h ) Thorsten Grahs Parallel Computing I SS 2015 Seite 13
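A minimal sketch that uses the timing functions around a parallel loop (the array size N is arbitrary and purely illustrative):

  #include <stdio.h>
  #include <omp.h>
  #define N 1000000

  int main(void) {
      static double a[N];
      double t0 = omp_get_wtime();           /* wall-clock time before the loop */
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
          a[i] = 0.5 * i;
      double t1 = omp_get_wtime();           /* wall-clock time after the loop  */
      printf("elapsed: %f s, timer resolution: %e s\n", t1 - t0, omp_get_wtick());
      return 0;
  }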
14 A first example (figure: execution with 4 threads) Thorsten Grahs Parallel Computing I SS 2015 Seite 14
15 Hello World hello.c
#ifdef _OPENMP
#include <omp.h>
#endif
#include <stdio.h>
int main(void) {
    int i;
    #pragma omp parallel for
    for (i = 0; i < 8; ++i) {
        int id = omp_get_thread_num();
        printf("Hello World from thread %d\n", id);
        if (id == 0)
            printf("there are %d threads\n", omp_get_num_threads());
    }
    return 0;
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 15
16 Proceeding When the program starts, a single process is started on the CPU; this process corresponds to the master thread. The master thread can now create and manage additional threads. The management of threads (creation, managing and termination) is done by OpenMP without user interaction. #pragma omp parallel for instructs that the following for loop is distributed to the available threads. omp_get_thread_num() returns the current thread number, omp_get_num_threads() the total number of threads. Thorsten Grahs Parallel Computing I SS 2015 Seite 16
17 Compile and Run Compiling: gcc -fopenmp hello.c Output:
th@riemann:~$ ./a.out
Hello World from thread 4
Hello World from thread 6
Hello World from thread 3
Hello World from thread 1
Hello World from thread 5
Hello World from thread 2
Hello World from thread 7
Hello World from thread 0
There are 24 threads
Thorsten Grahs Parallel Computing I SS 2015 Seite 17
18 OpenMP pthread translation A sample OpenMP program with its Pthreads translation that might be performed by an OpenMP compiler Thorsten Grahs Parallel Computing I SS 2015 Seite 18
19 Compiler directives OpenMP is defined mainly through compiler directives. Directive format in C/C++: #pragma omp construct [clauses...] constructs are functionalities of the language, clauses are parameters to those functionalities; construct + clauses = directive. A C compiler that is not enabled for OpenMP ignores unknown directives:
$ gcc -Wall hello.c   # (gcc version < 4.2)
hello.c: In function 'main':
hello.c:12: warning: ignoring #pragma omp parallel
So the program can be compiled by any compiler, even one that is not OpenMP enabled. Thorsten Grahs Parallel Computing I SS 2015 Seite 19
20 Compiler directives II Conditional compilation C/C++: the macro _OPENMP is defined:
#ifdef _OPENMP
/* OpenMP specific code, e.g. */
number = omp_get_thread_num();
#endif
omp parallel Creates additional threads, i.e. the work is executed by all threads. The original thread (master thread) has thread ID 0.
#pragma omp parallel [clauses]
/* structured block (no gotos...) */
Thorsten Grahs Parallel Computing I SS 2015 Seite 20
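A common pattern (sketched here with illustrative serial stubs, not part of the original slides) combines the _OPENMP macro with fallback definitions, so that the same source also compiles and runs without an OpenMP-enabled compiler:

  #include <stdio.h>
  #ifdef _OPENMP
  #include <omp.h>
  #else
  /* serial fallback: the pragma below is then simply ignored */
  static int omp_get_thread_num(void)  { return 0; }
  static int omp_get_num_threads(void) { return 1; }
  #endif

  int main(void) {
      #pragma omp parallel
      {
          printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
  }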
21 Work-sharing between threads I Loops The work is divided among the threads, e.g. for two threads: Thread_1 gets loop elements 0, ..., (N/2)-1 and Thread_2 gets loop elements N/2, ..., N-1.
#pragma omp parallel [clauses...]
#pragma omp for [clauses...]
for (i = 0; i < N; i++)
    a[i] = i * i;
This can be summarized (omp parallel for):
#pragma omp parallel for [clauses...]
for (i = 0; i < N; i++)
    a[i] = i * i;
Thorsten Grahs Parallel Computing I SS 2015 Seite 21
22 Work-sharing between threads II Sections The work is distributed over independent sections; each thread processes one section:
#pragma omp parallel
#pragma omp sections
{
    #pragma omp section [clauses...]
    [... section A runs parallel to B ...]
    #pragma omp section [clauses...]
    [... section B runs parallel to A ...]
}
Again one can combine both directives:
#pragma omp parallel sections [clauses...]
Thorsten Grahs Parallel Computing I SS 2015 Seite 22
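An illustrative sketch of functional parallelism with sections; the two initialization loops stand for any two independent pieces of work:

  #include <stdio.h>
  #include <omp.h>
  #define N 1000

  int main(void) {
      static double a[N], b[N];
      #pragma omp parallel sections
      {
          #pragma omp section          /* section A, executed by one thread */
          for (int i = 0; i < N; i++) a[i] = 1.0 * i;

          #pragma omp section          /* section B, executed by another thread */
          for (int i = 0; i < N; i++) b[i] = 2.0 * i;
      }
      printf("a[10] = %.1f, b[10] = %.1f\n", a[10], b[10]);
      return 0;
  }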
23 Data sharing attribute clauses Scope of data Data clauses specified within OpenMP directives control how variables are handled/shared between threads. shared(): the data within a parallel region is shared, i.e. visible and accessible by all threads simultaneously. private(): the data within a parallel region is private to each thread, i.e. each thread has a local copy and uses it as a temporary variable. A private variable is not initialized and its value is not maintained for use outside the parallel region. Thorsten Grahs Parallel Computing I SS 2015 Seite 23
24 Scope of data Values of private variables are undefined when entering and leaving a loop. The following keywords allow variables to be initialized/finalized: default(shared|private|none) specifies the default scope; with none, each variable has to be declared explicitly as shared() or private(). firstprivate(): like private(), but all copies are initialized with the value the variable has before the parallel loop/region. lastprivate(): the variable keeps the value of the last loop iteration after leaving the section. Thorsten Grahs Parallel Computing I SS 2015 Seite 24
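An illustrative sketch of firstprivate and lastprivate (the variables offset and last are made up for this example): offset is copied into every thread, and last receives the value of the sequentially last iteration:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int offset = 100;   /* firstprivate: each thread starts with this value    */
      int last   = -1;    /* lastprivate: gets the value of the last iteration   */
      #pragma omp parallel for firstprivate(offset) lastprivate(last)
      for (int i = 0; i < 8; i++)
          last = offset + i;
      printf("last = %d\n", last);    /* prints 107 (from iteration i = 7) */
      return 0;
  }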
25 Example private Initialization
#include <stdio.h>
#include <omp.h>
int main(int argc, char* argv[])
{
    int t = 2;
    int result = 0;
    int A[100], i = 0, j = 0;
    omp_set_num_threads(t); // Explicitly setting 2 threads
    for (i = 0; i < 100; i++) {
        A[i] = i;
        result += A[i];
    }
    printf("array-sum BEFORE calculation: %d\n", result);
Thorsten Grahs Parallel Computing I SS 2015 Seite 25
26 Example private Parallel section
    i = 0;
    int T[2];
    T[0] = 0;
    T[1] = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 100; i++) {
            for (j = 0; j < 10; j++) {
                A[i] = A[i] * 2;
                T[omp_get_thread_num()]++;
            }
        }
    }
Thorsten Grahs Parallel Computing I SS 2015 Seite 26
27 Example private Output/print
    i = 0; result = 0;
    for (i = 0; i < 100; i++)
        result += A[i];
    printf("array-sum AFTER calculation: %d\n", result);
    printf("thread 1: %d calculations\n", T[0]);
    printf("thread 2: %d calculations\n", T[1]);
    return 0;
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 27
28 Example Calculation without private(j) Output without private declaration:
Array-Sum BEFORE calculation:
Array-Sum AFTER calculation:
Thread 1: 450 calculations
Thread 2: 485 calculations
j is automatically treated as shared, i.e. the variable j is shared by both threads. This is the reason for the wrong result. Thorsten Grahs Parallel Computing I SS 2015 Seite 28
29 Example Modifying parallel section Parallel section with private(j)
#pragma omp parallel
{
    #pragma omp for private(j)
    for (i = 0; i < 100; i++) {
        for (j = 0; j < 10; j++) {
            A[i] = A[i] * 2;
            T[omp_get_thread_num()]++;
        }
    }
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 29
30 Example Calculation with private(j) Output with private declaration:
Array-Sum BEFORE calculation:
Array-Sum AFTER calculation:
Thread 1: 500 calculations
Thread 2: 500 calculations
j is now managed individually by each thread, i.e. declared private. Each thread has its own copy of j, so the threads no longer interfere with each other's counting. Thorsten Grahs Parallel Computing I SS 2015 Seite 30
31 Critical section Critical section Used (similar to the examples with pthreads) to avoid ill-conditioned runtime behaviour. In C/C++ it is written as
#pragma omp critical
    ... critical section (structured block) ...
(in Fortran the region is closed with !$OMP END CRITICAL). Can be used to resolve a race condition: only one thread from the team executes the critical section at a time. Thorsten Grahs Parallel Computing I SS 2015 Seite 31
32 Example critical Parallel section
#pragma omp parallel
{
    #pragma omp for private(j)
    for (i = 0; i < 100; i++) {
        for (j = 0; j < 10; j++) {
            A[i] = A[i] * 2;
            T[omp_get_thread_num()]++;
            NumOfIters++; /* added */
        }
    }
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 32
33 Example critical Output without critical declaration:
Array-Sum BEFORE calculation:
Array-Sum AFTER calculation:
Thread 1: 500 calculations
Thread 2: 500 calculations
NumOfIters: 693
NumOfIters has the wrong value: the threads interfere with each other while incrementing it. A private declaration is no solution here, since the increments of all threads should be counted. Thorsten Grahs Parallel Computing I SS 2015 Seite 33
34 Example critical Parallel section with critical declaration
#pragma omp parallel
{
    #pragma omp for private(j)
    for (i = 0; i < 100; i++) {
        for (j = 0; j < 10; j++) {
            A[i] = A[i] * 2;
            T[omp_get_thread_num()]++;
            #pragma omp critical /* added critical */
            NumOfIters++;
        }
    }
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 34
35 Example critical Output with critical declaration:
Array-Sum BEFORE calculation:
Array-Sum AFTER calculation:
Thread 1: 500 calculations
Thread 2: 500 calculations
NumOfIters: 1000
The increment of NumOfIters is now executed serially: the threads no longer interfere with each other, and NumOfIters is increased by one thread after another. Thorsten Grahs Parallel Computing I SS 2015 Seite 35
36 Reduction Reduction of data Critical sections can become a bottleneck of a calculation. In our example there is another way to count the number of iterations safely: the reduction. A reduction operator is a binary operation (such as addition or multiplication). A reduction is a computation that repeatedly applies the same reduction operator to a sequence of operands in order to get a single result. All of the intermediate results of the operation are stored in the same variable: the reduction variable. Thorsten Grahs Parallel Computing I SS 2015 Seite 36
37 Reduction Usage #pragma omp for private(j) reduction(op: var) The reduction clause identifies specific, commonly used variables. We have to give the reduction operator op and the reduction variable var. The operator op can be one of +, *, -, &, ^, |, && or ||. In the variable var multiple threads can accumulate values; it collects the contributions of the different threads, like a reduction in MPI. Thorsten Grahs Parallel Computing I SS 2015 Seite 37
38 Example reduction Parallel section with reduction()
#pragma omp parallel
{
    #pragma omp for private(j) reduction(+: NumOfIters)
    for (i = 0; i < 100; i++) {
        for (j = 0; j < 10; j++) {
            A[i] = A[i] * 2;
            T[omp_get_thread_num()]++;
            NumOfIters++;
        }
    }
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 38
39 Example reduction Output with reduction():
Array-Sum BEFORE calculation:
Array-Sum AFTER calculation:
Thread 1: 500 calculations
Thread 2: 500 calculations
NumOfIters: 1000
The reduction is requested with the reduction clause on the loop-parallelism directive. It requires the arithmetic operation and the reduction variable, separated by a colon: reduction(+: NumOfIters) Thorsten Grahs Parallel Computing I SS 2015 Seite 39
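Reductions are not limited to counting iterations. An illustrative sketch is the approximation of pi by the midpoint rule, where every thread accumulates into its private copy of sum and the partial results are combined at the end:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      const int n = 1000000;
      const double h = 1.0 / n;
      double sum = 0.0;
      #pragma omp parallel for reduction(+: sum)
      for (int i = 0; i < n; i++) {
          double x = (i + 0.5) * h;        /* midpoint of the i-th subinterval */
          sum += 4.0 / (1.0 + x * x);      /* integrand whose integral is pi   */
      }
      printf("pi is approximately %.10f\n", h * sum);
      return 0;
  }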
40 Conditional parallelism if clause It may be desirable to parallelise loops only when the effort is well justified, for instance to ensure that the run time with multiple threads is actually lower than with serial execution. Properties: the if clause allows deciding at run time whether a region is executed in parallel (fork-join) or serially. Thorsten Grahs Parallel Computing I SS 2015 Seite 40
41 Example if clause Parallel section with if
#pragma omp parallel if(N > 500)
{
    #pragma omp for private(j) reduction(+: NumOfIters)
    for (i = 0; i < N; i++) {
        for (j = 0; j < 10; j++) {
            A[i] = A[i] * 2;
            T[omp_get_thread_num()]++;
            NumOfIters++;
        }
    }
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 41
42 Synchronization In a parallel region threads proceed asynchronously; sometimes coordination is necessary. OpenMP provides several ways to coordinate thread execution, and an important component is the synchronization of threads. One application area we have already met: the race condition that was resolved with the critical section (and with reduction). There the synchronization made all threads execute the critical section sequentially; the same behaviour is found in the atomic synchronisation. Thorsten Grahs Parallel Computing I SS 2015 Seite 42
43 barrier construct At a barrier all threads wait and continue only when all threads have reached the barrier. The barrier guarantees that ALL the code above it has been executed. There are explicit barriers (#pragma omp barrier) and implicit barriers at the end of the worksharing constructs (for/do, sections, single); the implicit barrier can be removed with the clause nowait. Thorsten Grahs Parallel Computing I SS 2015 Seite 43
44 barrier-synchronisation barrier Threads wait until all have reached a common point.
#pragma omp parallel
{
    #pragma omp for nowait
    for (i = 0; i < N; i++) a[i] = b[i] + c[i];
    #pragma omp barrier
    #pragma omp for
    for (i = 0; i < N; i++) d[i] = a[i] + b[i];
}
Thorsten Grahs Parallel Computing I SS 2015 Seite 44
45 atomic construct atomic, a similar construct: #pragma omp atomic [clause] <statement> The atomic construct applies only to statements that update the value of a variable. It ensures that no other thread updates the variable between reading and writing. It is a special lightweight form of a critical section: only the read/write is serialized, and only if two or more threads access the same memory address. Thorsten Grahs Parallel Computing I SS 2015 Seite 45
46 atomic-synchronisation Behaviour is related to critical. Only permitted for specific operations, e.g. x++, ++x, x--, --x:
#pragma omp atomic
NumOfIters++;
Thorsten Grahs Parallel Computing I SS 2015 Seite 46
47 master/single-synchronisation #pragma omp master [clause] [Code which should be executed once by the master thread] Only the master thread executes the region (e.g. I/O). #pragma omp single [clause] [Code which should be executed once] Only one thread executes the region (e.g. I/O); this is not necessarily the master thread! Thorsten Grahs Parallel Computing I SS 2015 Seite 47
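A minimal sketch of the difference: the master block is executed only by thread 0, the single block by exactly one (arbitrary) thread, and only single ends with an implicit barrier:

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      #pragma omp parallel
      {
          #pragma omp master
          printf("master block: executed by thread %d (always 0)\n",
                 omp_get_thread_num());
          /* no implicit barrier after master */

          #pragma omp single
          printf("single block: executed by thread %d (any one thread)\n",
                 omp_get_thread_num());
          /* implicit barrier at the end of single */
      }
      return 0;
  }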
48 Summary OpenMP is the de-facto standard for shared memory parallelism. Easy to use; code can be parallelized quickly. Supported by all commonly used compilers. Drawbacks: if your problem becomes big enough, you have to use distributed memory approaches; the control over the execution order is less elaborate than in message passing. Thorsten Grahs Parallel Computing I SS 2015 Seite 48
49 Further readings OpenMP tutorials: Blaise Barney, Lawrence Livermore National Laboratory. Guide into OpenMP: Easy multithreading programming for C++, by Joel Yliluoma. Introduction to High Performance Computing for Scientists and Engineers, G. Hager & G. Wellein (Ch. 6 Shared memory parallel programming with OpenMP, Ch. 7 Efficient OpenMP programming), Chapman & Hall. Thorsten Grahs Parallel Computing I SS 2015 Seite 49