Practical Introduction to OpenMP


1 Practical Introduction to OpenMP. By: Bart Oldeman, Calcul Québec McGill HPC

2 Partners and Sponsors

3 Outline of the workshop
Theoretical / practical introduction:
- Parallelizing your serial code
- What is OpenMP? Why do we need it?
- How do we run OpenMP codes (on the Guillimin cluster)?
- How to program with OpenMP? Program structure, basic functions, examples

4 Outline of the workshop
Practical exercises on Guillimin:
- Login, set up the environment, launch OpenMP code
- Analyzing and running examples
- Modifying and tuning OpenMP codes

5 Exercise 1: Log in to Guillimin, setting up the environment
1) Log in to Guillimin: ssh
2) Check for loaded software modules: $ module list
3) See all available modules: $ module av
4) Load necessary modules: $ module add ifort icc
5) Check loaded modules again

6 Parallelizing your serial code
Models for parallel computing (as an ordinary user sees it...):
Implicit parallelization: minimum work for you
- Threaded libraries (MKL, ACML, GOTO, etc.)
- Compiler directives (OpenMP)
- Good for desktops and shared-memory machines
Explicit parallelization: work is required! You specify what should be done on which CPU
- Low-level option for shared-memory machines: POSIX Threads (pthreads)
- Solution for distributed clusters (MPI: shared nothing!)

7 OpenMP Shared Memory API
Open Multi-Processing: an Application Program Interface for multi-threaded programs in a shared-memory environment.
Consists of:
- Compiler directives
- Runtime library routines
- Environment variables
Allows for relatively simple incremental parallelization.
Not distributed, but can be combined with MPI (hybrid: see the Advanced MPI workshop).

8 Shared memory approach
[Diagram: Thread 0 and Thread 1, each with its own private memory, both connected to a common shared memory]
Most memory is shared by all threads. Each thread also has some private memory: variables explicitly declared private, and local variables in functions and subroutines.
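To make the shared/private distinction concrete, here is a minimal C sketch (my own, not part of the original slides): n has one copy visible to every thread, while tid, declared inside the parallel block, is private to each thread.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 100;                         /* shared: one copy seen by every thread */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  /* declared inside the region: private */
        printf("thread %d reads the shared n = %d\n", tid, n);
    }
    return 0;
}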

9 OpenMP: fork/join model
[Diagram: the master thread runs serial regions; at each parallel region it forks worker threads and joins them at the end: Serial, Fork, Parallel, Join, Serial, Fork, Parallel, Join, Serial]
Master + workers = team.
Implementations use thread pools, so worker threads sleep from join to fork.
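A small illustration of this fork/join behaviour (my own sketch, not from the slides): serial code is executed by the master thread alone, each parallel region is executed by the whole team, and consecutive parallel regions reuse the same thread pool.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("serial region: master thread only\n");

    #pragma omp parallel                 /* fork: a team of threads starts */
    printf("parallel region 1, thread %d\n", omp_get_thread_num());
                                         /* join: implicit barrier, workers sleep */
    printf("serial region again: master thread only\n");

    #pragma omp parallel                 /* fork again: same thread pool reused */
    printf("parallel region 2, thread %d\n", omp_get_thread_num());

    return 0;
}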

10 What is OpenMP for a user?
OpenMP is NOT a language!
OpenMP is NOT a compiler or a specific product.
OpenMP is a de-facto industry standard, a specification for an Application Program Interface (API).
You use its directives, routines, and environment variables. You compile and link your code with specific flags.
History: version 1.0 (1997), 2.5 (2005), 3.0 (2008), 3.1 (2011), 4.0 (2013).
Different implementations: GCC (4.2+), Intel, PGI, Visual C++, Solaris Studio, Clang (3.7+), ...

11 Basic features of an OpenMP program
Include basic definitions: #include <omp.h> (C), INCLUDE 'omp_lib.h' or USE omp_lib (Fortran).
A parallel region is declared by a directive of the form #pragma omp parallel (C) or !$OMP PARALLEL (Fortran), declaring which variables are private.
Optional: code only compiled for OpenMP: use the _OPENMP preprocessor symbol (C) or the !$ prefix (Fortran).

12 Example: Hello from N cores

Fortran:
PROGRAM hello
!$ USE omp_lib
  IMPLICIT NONE
  INTEGER rank, size
  rank = 0
  size = 1
!$OMP PARALLEL PRIVATE(rank, size)
!$ size = omp_get_num_threads()
!$ rank = omp_get_thread_num()
  WRITE(*,*) 'Hello from processor ', &
    rank, ' of ', size
!$OMP END PARALLEL
END PROGRAM hello

C:
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char *argv[])
{
  int rank = 0, size = 1;
#ifdef _OPENMP
#pragma omp parallel private(rank, size)
#endif
  {
#ifdef _OPENMP
    rank = omp_get_thread_num();
    size = omp_get_num_threads();
#endif
    printf("Hello from processor %d"
           " of %d\n", rank, size);
  }
  return 0;
}

13 POSIX Threads: Hello from N cores

// pthreads.c
#include <stdio.h>
#include <pthread.h>
#define SIZE 4

void *hello(void *arg)
{
  printf("Hello from processor %d of %d\n", *(int *)arg, SIZE);
  return NULL;
}

int main(int argc, char *argv[])
{
  int i, p[SIZE];
  pthread_t threads[SIZE];
  for (i = 1; i < SIZE; i++) {   /* Fork threads */
    p[i] = i;
    pthread_create(&threads[i], NULL, hello, &p[i]);
  }
  p[0] = 0;
  hello(&p[0]);                  /* thread 0 greets as well */
  for (i = 1; i < SIZE; i++)     /* Join threads. */
    pthread_join(threads[i], NULL);
  return 0;
}

14 Compiling your OpenMP code
NOT defined by the standard: a special compilation flag must be used. On the Guillimin cluster:

module add gcc
gcc -fopenmp hello.c -o hello
gfortran -fopenmp hello.f90 -o hello

module add ifort icc
icc -openmp hello.c -o hello
ifort -openmp hello.f90 -o hello

module add pgi
pgcc -mp hello.c -o hello
pgfortran -mp hello.f90 -o hello

15 Running your OpenMP code
Important: environment variable OMP_NUM_THREADS.

export OMP_NUM_THREADS=4
./hello
Hello from processor 2 of 4
Hello from processor 0 of 4
Hello from processor 3 of 4
Hello from processor 1 of 4

unset OMP_NUM_THREADS
pgcc -mp hello.c -o hello
./hello
Hello from processor 0 of 1

gcc -fopenmp hello.c -o hello
./hello
Hello from processor 3 of 8
(...)

16 Running your OpenMP code
On your laptop or desktop, just compile and run your code as above.
On the Guillimin cluster, use the batch system to submit non-trivial OpenMP jobs! Example: hello.pbs:

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:05:00
#PBS -V
#PBS -N hello
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=2
./hello > hello.out

Submit your job:
$ qsub hello.pbs

17 Exercise 2: Hello, compilation
1) Copy all files to your home directory:
$ cp -a /software/workshop/cq-formation-openmp/* ~/
2) Compile your code:
$ ifort -openmp hello.f90 -o hello
$ icc -openmp hello.c -o hello

18 Exercise 2: Hello, job submission
3) View the file hello.pbs:

#!/bin/bash
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:05:00
#PBS -V
#PBS -N hello
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=2
./hello > hello.out

19 Exercise 2: Hello, job submission
4) Submit your job:
$ qsub hello.pbs
5) Check the job status:
$ qstat -u $USER
$ showq -u $USER
6) Check the output (hello.out)

20 Exercise 2: Hello, compile and run
Alternatively, using interactive qsub, or on your own Mac/Linux/Cygwin/MSYS computer:
1) Interactive login:
$ qsub -I -l nodes=1:ppn=2,walltime=7:00:00
or create and then copy all files to a directory:
yourlaptop> git clone -b mcgill \
yourlaptop> cd cq-formation-intro-openmp
2) Compile your code:
> gfortran -fopenmp hello.f90 -o hello
> gcc -fopenmp hello.c -o hello
3) Run your code:
> # can use any value here; default: number of cores
> export OMP_NUM_THREADS=2
> ./hello

21 OpenMP directives
Format: sentinel directive [clause, ...], where the sentinel is #pragma omp (C) or !$OMP (Fortran).
Examples:
- #pragma omp parallel (C), !$OMP PARALLEL ... !$OMP END PARALLEL (Fortran): parallel region construct.
- #pragma omp for: a worksharing construct that makes a loop parallel (!$OMP DO in Fortran).
- #pragma omp parallel for: a combined construct: defines a parallel region that contains only the loop.
- #pragma omp barrier: a synchronization directive: all threads wait for each other here.
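As a rough illustration of how these directives combine (my own sketch, not taken from the workshop material): one parallel region containing two worksharing loops and an explicit barrier.

void scale_then_sum(double *a, double *b, int n)
{
    #pragma omp parallel              /* parallel region construct */
    {
        #pragma omp for               /* worksharing: iterations split over the team */
        for (int i = 0; i < n; i++)
            a[i] *= 2.0;
        /* omp for already ends with an implicit barrier; the explicit barrier
           below is redundant and shown only to illustrate the directive */
        #pragma omp barrier
        #pragma omp for               /* second worksharing loop, same team */
        for (int i = 0; i < n; i++)
            b[i] += a[i];
    }
}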

22 OpenMP clauses
Examples:
Data scope: private, shared, and default.
!$OMP PARALLEL PRIVATE(i) SHARED(x)
The variable i is private to each thread, but the variable x is shared with all other threads.
Default: all variables are shared except loop variables (C: outer loop only; Fortran: all) and variables declared inside the block.
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i)
All variables are shared except i.
!$OMP PARALLEL DEFAULT(NONE) PRIVATE(i)
Using default(none) requires listing every variable, or else the compiler complains (helps debugging!).
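A short C sketch (my own, not from the slides) showing default(none) in practice: the example variables n and x must be listed explicitly, while the loop variable is predetermined private.

#include <stdio.h>

int main(void)
{
    int n = 8;
    double x = 3.14;
    /* default(none): every variable used in the region must be listed
       explicitly, otherwise compilation fails -- helps catch accidental
       sharing. The loop variable i is predetermined private. */
    #pragma omp parallel for default(none) shared(n, x)
    for (int i = 0; i < n; i++)
        printf("iteration %d sees shared x = %f\n", i, x);
    return 0;
}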

23 OpenMP important library routines
int omp_get_max_threads(void): get the maximum number of threads available.
void omp_set_num_threads(int): set the number of threads for the next parallel region.
int omp_get_thread_num(void): get the current thread number in a parallel region.
int omp_get_num_threads(void): get the number of threads in a parallel region.
double omp_get_wtime(void): portable wall-clock timing routine.
More exist, for example for locks and nested regions.
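A minimal sketch (not from the slides) exercising the routines listed above:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("max threads available: %d\n", omp_get_max_threads());
    omp_set_num_threads(4);              /* request 4 threads for the next region */

    double t0 = omp_get_wtime();         /* portable wall-clock timer */
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num(); /* 0 .. nthr-1 inside the region */
        int nthr = omp_get_num_threads();
        printf("thread %d of %d\n", tid, nthr);
    }
    double t1 = omp_get_wtime();
    printf("parallel region took %f seconds\n", t1 - t0);
    return 0;
}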

24 OpenMP main environment variables
OMP_NUM_THREADS: sets the maximum number of threads used (default: compiler dependent, but often the number of available (hyper)threads).
OMP_SCHEDULE: used for run-time scheduling.
More exist, for example to control nested parallelism.
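Typical shell usage might look like the following sketch (assuming an executable named ./hello, as in the earlier examples, whose loops use schedule(runtime)):

$ export OMP_NUM_THREADS=8              # run with 8 threads
$ export OMP_SCHEDULE="dynamic,1000"    # picked up by loops using schedule(runtime)
$ ./hello
$ unset OMP_NUM_THREADS                 # back to the implementation default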

25 parallel for
Example:

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
#pragma omp parallel for
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

Here i is automatically made private because it is the loop variable. All other variables are shared.
The loop is split between threads; for example with two threads and n=10, thread 0 does indices 0 to 4 and thread 1 does indices 5 to 9.

26 parallel for
Eliminate overhead from fork and join (in practice: synchronization) by using just one parallel region for two vector additions:

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
#pragma omp for
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
...
#pragma omp parallel
{
  addvectors(a, b, c, n);
  addvectors(b, c, d, n);
}

27 parallel for, nowait and barrier

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
#pragma omp for nowait
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
...
#pragma omp parallel
{
  addvectors(a, b, c, n);
#pragma omp barrier
  addvectors(b, c, d, n/2);
  addvectors(e, f, g, n);
}

omp for by default implies a barrier where all threads wait at the end of the loop.
Eliminate synchronization overhead using the nowait clause.
But then you need to add an explicit barrier where there is a dependency!

28 parallel for: if clause
The if clause allows conditional parallel regions: if n is too small, the overhead is not worth it:

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
#pragma omp for
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
...
#pragma omp parallel if (n > 10000)
{
  addvectors(a, b, c, n);
}

29 parallel for: scheduling
schedule(static, 10000) allocates chunks of 10000 loop iterations to each thread:

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
#pragma omp for schedule(static, 10000)
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

Use dynamic instead of static to assign chunks dynamically: when a thread finishes a chunk, it is assigned the next one. Useful for unequal work within iterations.
guided instead of dynamic: chunk sizes decrease as less work is left to do.
runtime: use the OMP_SCHEDULE environment variable.
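As an illustration of when dynamic scheduling pays off, here is a sketch (my own, not from the workshop files) with a hypothetical heavy() function whose cost grows with the iteration index, so equal-sized static chunks would load-balance poorly:

#include <math.h>

/* Hypothetical function with iteration-dependent cost: later iterations
   do more work. */
double heavy(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++)
        s += sin((double)k);
    return s;
}

void compute(double *out, int n)
{
    /* dynamic: each thread grabs the next chunk of 100 iterations
       as soon as it finishes its current one */
    #pragma omp parallel for schedule(dynamic, 100)
    for (int i = 0; i < n; i++)
        out[i] = heavy(i);
}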

30 Nested loops
For perfectly nested rectangular loops we can parallelize multiple loops in the nest with the collapse clause:
- The argument is the number of loops to collapse.
- Forms a single loop of length N x M and then parallelizes and schedules that.
- Useful if N is close to the number of threads, so parallelizing the outer loop alone may give poor load balance.
- More efficient alternative to (advanced) nested parallelism.

#pragma omp parallel for collapse(2)
for (int i = 0; i < N; i++) {
  for (int j = 0; j < M; j++) {
    ...
  }
}

31 parallel: manual scheduling (SPMD)
SPMD = Single Program Multiple Data, as in MPI.

void addvectors(const int *a, const int *b, int *c, int n)
{
  int i;
  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
...
int tid, nthreads, low, high;
#pragma omp parallel default(none) private(tid, nthreads, \
    low, high) shared(a, b, c, n)
{
  tid = omp_get_thread_num();
  nthreads = omp_get_num_threads();
  low = (n * tid) / nthreads;
  high = (n * (tid + 1)) / nthreads;
  addvectors(&a[low], &b[low], &c[low], high - low);
}

Calculate which thread does which loop iterations. Note: no barrier.

32 SPMD vs. worksharing
Worksharing (omp for / omp do) is easiest to implement.
SPMD (do work based on the thread ID) may give better performance but is harder to implement.
SPMD as in MPI: instead of using large shared arrays, use smaller arrays private to threads: mark all (non-read-only) global and persistent (static/save) variables threadprivate, and communicate using buffers and barriers.
Fewer cache misses from using more private data may give better performance.
More advanced topic: see the Advanced OpenMP workshop.
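A minimal sketch of the threadprivate idea mentioned above (my own example, not from the workshop files): a global counter of which each thread keeps and updates its own persistent copy.

#include <stdio.h>
#include <omp.h>

/* A file-scope (global) variable marked threadprivate: each thread gets
   its own persistent copy instead of sharing a single one. */
static int call_count = 0;
#pragma omp threadprivate(call_count)

void do_work(void)
{
    call_count++;                       /* updates this thread's private copy */
}

int main(void)
{
    #pragma omp parallel
    {
        do_work();
        do_work();
        printf("thread %d: call_count = %d\n",
               omp_get_thread_num(), call_count);   /* prints 2 on every thread */
    }
    return 0;
}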

33 sections (SPMD construct)
Example:

#pragma omp parallel sections
{
#pragma omp section
  addvectors(a, b, c, n);
#pragma omp section
  printf("Hello world!\n");
#pragma omp section
  printf("I may or may not be the third thread\n");
}

The sections are individual code blocks that are distributed over the threads.
More flexible alternative (OpenMP 3.0): omp task, useful when traversing dynamic data structures (lists, trees, etc.).
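A small sketch of the omp task alternative mentioned above (my own example, not from the workshop files): one thread walks a linked list and creates a task per node, and the whole team executes the tasks.

#include <stdio.h>

typedef struct node {
    int value;
    struct node *next;
} node;

void process(node *p) { printf("processing %d\n", p->value); }

void traverse(node *head)
{
    #pragma omp parallel
    #pragma omp single          /* one thread walks the list and creates tasks */
    {
        for (node *p = head; p != NULL; p = p->next) {
            #pragma omp task firstprivate(p)   /* each node becomes a task */
            process(p);
        }
    }                           /* implicit barrier: all tasks finished here */
}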

34 workshare (Fortran)
Example:

integer a(10000), b(10000), c(10000), d(10000)
!$OMP PARALLEL
!$OMP WORKSHARE
c(:) = a(:) + b(:)
!$OMP END WORKSHARE NOWAIT
!$OMP WORKSHARE
d(:) = a(:)
!$OMP END WORKSHARE NOWAIT
!$OMP END PARALLEL

Array assignments in Fortran are distributed among threads like loops. Note that nowait in Fortran comes at the end.

35 Exercise 3: Modifying Hello
Ask each CPU to do its own computation by inserting code as follows:

IF (rank == 0) THEN
  a = sqrt(2.0)
  b = 0.0
  WRITE(*,*) 'a,b=', a, b, ' on proc ', rank
END IF
IF (rank == 1) THEN
  a = 0.0
  b = sqrt(3.0)
  WRITE(*,*) 'a,b=', a, b, ' on proc ', rank
END IF

Recompile as before and submit to the queue...

36 Exercise 4: Modifying Hello
Do (almost) the same thing, now using omp sections:

!$OMP SECTIONS
!$OMP SECTION
  a = sqrt(2.0)
  b = 0.0
  WRITE(*,*) 'a,b=', a, b, ' on proc ', rank
!$OMP SECTION
  a = 0.0
  b = sqrt(3.0)
  WRITE(*,*) 'a,b=', a, b, ' on proc ', rank
!$OMP END SECTIONS

Recompile as before and submit to the queue...

37 Race conditions (data races)
Example: innerprod.c, innerprod.f90

ip = 0;
#pragma omp parallel for private(i) shared(ip,a,b)
for (i = 0; i < N; i++)
  ip += a[i] * b[i];

Problem: could internally be run as:

for (i = 0; i < N; i++) {
  int register = ip;
  register += a[i] * b[i];
  ip = register;
}

Threads may sum into their private CPU registers at the same time and overwrite ip, losing the additions from the other threads!

38 Race conditions
Example: a={1,2}, b={3,4}, inner product 1*3+2*4=11, two threads.

                                   ip   register0   register1
tid=0: int register = ip;           0       0        unknown
tid=1: int register = ip;           0       0           0
tid=0: register += a[0] * b[0];     0       3           0
tid=1: register += a[1] * b[1];     0       3           8
tid=0: ip = register;               3       3           8
tid=1: ip = register;               8       3           8

Wrong result: ip=8. But it may just as well be ip=3: non-deterministic.

39 Solution: critical section
Example:

ip = 0;
#pragma omp parallel for private(i) shared(ip,a,b)
for (i = 0; i < N; i++)
#pragma omp critical
  ip += a[i] * b[i];

critical makes sure only one thread at a time can run the (compound) statement.
Problem: slow!

40 Solution: atomic section
Example:

ip = 0;
#pragma omp parallel for private(i) shared(ip,a,b)
for (i = 0; i < N; i++)
#pragma omp atomic
  ip += a[i] * b[i];

atomic is like critical but can only apply to an update of a specific memory location. Less general, but less overhead.

41 Solution: local summing
Example:

ip = 0;
#pragma omp parallel private(i,localip) shared(ip,a,b)
{
  localip = 0;
#pragma omp for nowait
  for (i = 0; i < N; i++)
    localip += a[i] * b[i];
#pragma omp atomic
  ip += localip;
}

Still needs atomic, but only as many times as there are threads rather than N times, greatly reducing overhead and improving performance.

42 Solution: local summing using an array
Example:

ip = 0;
int *localips = malloc(omp_get_max_threads() * sizeof(*localips));
#pragma omp parallel private(i,tid) shared(ip,a,b,localips)
{
  tid = omp_get_thread_num();
  localips[tid] = 0;
#pragma omp for
  for (i = 0; i < N; i++)
    localips[tid] += a[i] * b[i];
#pragma omp single
  for (i = 0; i < omp_get_num_threads(); i++)
    ip += localips[i];
}

One single thread does the final summing. Could also use omp master here, to force the master thread to do the final summing. omp master works like if (tid == 0).

43 Solution: local summing using an array
Example (same code as the previous slide):

ip = 0;
int *localips = malloc(omp_get_max_threads() * sizeof(*localips));
#pragma omp parallel private(i,tid) shared(ip,a,b,localips)
{
  tid = omp_get_thread_num();
  localips[tid] = 0;
#pragma omp for
  for (i = 0; i < N; i++)
    localips[tid] += a[i] * b[i];
#pragma omp single
  for (i = 0; i < omp_get_num_threads(); i++)
    ip += localips[i];
}

master and single are especially useful for I/O.
Problem here: false cache sharing of the array localips.

44 Solution: local summing using an array
Example:

ip = 0;
int *localips = malloc(omp_get_max_threads() * sizeof(*localips));
#pragma omp parallel private(i,tid,localip) shared(ip,a,b,localips)
{
  tid = omp_get_thread_num();
  localip = 0;
#pragma omp for nowait
  for (i = 0; i < N; i++)
    localip += a[i] * b[i];
  localips[tid] = localip;
#pragma omp barrier
#pragma omp single
  for (i = 0; i < omp_get_num_threads(); i++)
    ip += localips[i];
}

Minimizes false sharing by moving it outside the loop. Note the barrier!

45 Solution: local summing using an array
Could also use padding in localips, but you need to know the cache line size (e.g. 64 bytes):

#define PAD 16
ip = 0;
int (*localips)[PAD] = malloc(omp_get_max_threads() * sizeof(*localips));
#pragma omp parallel private(i,tid) shared(ip,a,b,localips)
{
  tid = omp_get_thread_num();
  localips[tid][0] = 0;
#pragma omp for
  for (i = 0; i < N; i++)
    localips[tid][0] += a[i] * b[i];
#pragma omp single
  for (i = 0; i < omp_get_num_threads(); i++)
    ip += localips[i][0];
}

46 Easiest solution: use reduction
Example:

ip = 0;
#pragma omp parallel for reduction(+:ip)
for (i = 0; i < N; i++)
  ip += a[i] * b[i];

Reduction is the most straightforward solution.
Caveat: it only works on scalars, and you cannot control the rounding errors caused by floating-point calculations. For vectors, use one of the other methods, Fortran, or OpenMP 4.0 user-defined reductions.

47 Exercise 5: Compilation error
See omp_bug.c or omp_bug.f90 (courtesy of Blaise Barney, Lawrence Livermore National Laboratory) and find the compilation error in:

#pragma omp parallel for \
  shared(a,b,c,chunk) \
  private(i,tid) \
  schedule(static,chunk)
  {
  tid = omp_get_thread_num();
  for (i=0; i < N; i++)
    {
    c[i] = a[i] + b[i];
    printf("tid= %d i= %d c[i]= %f\n", tid, i, c[i]);
    }
  }  /* end of parallel for construct */

48 Exercise 6: Computing π
Consider pi_collect.c (or pi_collect.f90):

π = 4 arctan(1) = 4 Σ_{i=0}^{∞} (-1)^i / (2i+1) = 3.14159265...

Let's add timings:

double t1, t2;
t1 = omp_get_wtime();
...
t2 = omp_get_wtime();
printf("time = %.16f\n", t2 - t1);

Try some of the other alternatives to a reduction (atomic, critical) and measure the performance.
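For orientation, here is a minimal sketch of the kind of loop such a code might time, combining omp_get_wtime with a reduction; the exact contents of pi_collect.c are an assumption here.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    const long N = 100000000L;   /* number of series terms (arbitrary choice) */
    double pi = 0.0;

    double t1 = omp_get_wtime();
    #pragma omp parallel for reduction(+:pi)
    for (long i = 0; i < N; i++)
        /* 4 * (-1)^i / (2i+1): alternating Leibniz series for pi */
        pi += (i % 2 == 0 ? 4.0 : -4.0) / (2.0 * i + 1.0);
    double t2 = omp_get_wtime();

    printf("pi   = %.16f\n", pi);
    printf("time = %.16f\n", t2 - t1);
    return 0;
}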

49 Exercise 7: Matrix multiplication
See the file mm.c or mm.f90. Make the initialization and multiplication parallel and measure the speedup.

50 Further information
The standard itself, news, development, links to tutorials:
Intel tutorial on YouTube (from Tim Mattson):
New: OpenMP 4.0: thread affinity, SIMD, accelerators (GPUs, coprocessors), in GCC 4.9+ and Intel compilers 14 and 15 (used in the November 12 Xeon Phi and 2016 Advanced OpenMP workshops).
Questions? Write to the Guillimin support team at
