Sieve of Eratosthenes: Finding Primes
1 Sieve of Eratosthenes: Finding Primes
2 Timing
How fast is your code? How long does it take to find the primes between 1 and 10,000,000? Between 1 and 1 billion?
Optimization: the C++ compilers have optimization switches. g++ filename.cpp -O3 will optimize the code and give significant speed-ups with no effort on your part.
3 Timing routines

#include <time.h>
#include <sys/time.h>
#include <iostream>
using namespace std;

// function for elapsed time in seconds
double gettimeelapsed(struct timeval end, struct timeval start)
{
    // tv_usec is in microseconds, so convert to seconds
    return (end.tv_sec - start.tv_sec)
         + (end.tv_usec - start.tv_usec) / 1000000.0;
}

int main()
{
    // timing info variables in main
    struct timeval t0, t1;
    double htime;
    gettimeofday(&t0, NULL);
    // computations to do here
    gettimeofday(&t1, NULL);
    // calculate elapsed time and print
    htime = gettimeelapsed(t1, t0);
    cout << "time for computation: " << htime << endl;
}
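If OpenMP is in use anyway (introduced below), omp_get_wtime() gives the same wall-clock measurement with less boilerplate. A minimal sketch, where the loop is just a stand-in for real work:

#include <omp.h>
#include <iostream>
using namespace std;

int main()
{
    double t0 = omp_get_wtime();           // wall-clock time in seconds
    volatile double sum = 0.0;
    for (int i = 0; i < 100000000; i++)    // placeholder computation
        sum = sum + i;
    double t1 = omp_get_wtime();
    cout << "time for computation: " << t1 - t0 << endl;
}

Compile with -fopenmp so the OpenMP run-time library is linked.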
4 Multi-threaded Applications
Multi-core computers emphasize multiple full-blown processor cores, each implementing the complete instruction set of the CPU. The cores run independently of one another, so at any moment they can be doing different tasks, and they may additionally support hyperthreading with two hardware threads per core. They are designed to maximize the execution speed of sequential programs.
Approaches: POSIX threads (pthreads; 1995) and OpenMP (1997).
5 Multi-threaded/Multiprocessor Approaches to Primes
A thread is an independent set of instructions that can be scheduled by the processor.
1. One thread or processor per number: half the threads are gone after one step.
2. Interleaved: thread 0 is responsible for 2, 2+p, 2+2p, ...; thread 1 for 3, 3+p, 3+2p, ...; etc., where p is the number of processors.
3. Block decomposition: divide the numbers among the processors, so each processor handles N/p numbers (a sketch of this idea follows below).
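To make the idea concrete, here is a minimal OpenMP sketch (not the course's reference solution): the outer loop over primes stays serial, and the numbers to be marked in each pass are divided among the threads; the limit N is an assumed example value.

#include <omp.h>
#include <iostream>
#include <vector>

int main()
{
    const int N = 10000000;                      // assumed upper limit
    std::vector<char> composite(N + 1, 0);
    for (int p = 2; (long long)p * p <= N; p++) {  // outer loop stays serial
        if (!composite[p]) {
            #pragma omp parallel for             // threads split the multiples of p
            for (int m = p * p; m <= N; m += p)
                composite[m] = 1;
        }
    }
    int count = 0;
    for (int i = 2; i <= N; i++)                 // serial count of survivors
        if (!composite[i]) count++;
    std::cout << "primes up to " << N << ": " << count << std::endl;
}

Compile with g++ sieve.cpp -fopenmp -O3; without -fopenmp the pragma is ignored and the code runs serially.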
6 OpenMP Fork and Join Model
[Diagram: a master thread forks a team of threads at a parallel region and joins them again at its end]
7 Try OpenMP: Hello, World
Type the following, save as hello.cpp, compile, and run*:

#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    int ID = 0;
    printf("Hello, World (%d)\n", ID);
}

* /usr/local/bin/g++ hello.cpp
8 Now with OpenMP
Add the OpenMP terms, compile, and run*:

#include <omp.h>
#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();   // each thread gets its own ID
        printf("Hello, World (%d)\n", ID);
    }
}

* /usr/local/bin/g++ hello.cpp -fopenmp
9 C++ vectoradd.cpp

#include <iostream>
#define SIZE 1024
using namespace std;

void vectoradd(int *a, int *b, int *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

int main()
{
    int *a;
    int *b;
    int *c;
    a = new int[SIZE];   // declare dynamic arrays
    b = new int[SIZE];
    c = new int[SIZE];
    for (int i = 0; i < SIZE; i++) {   // initialize vectors
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }
10 C++ vectoradd.cpp (continued)

    vectoradd(a, b, c, SIZE);   // call function

    // print out the first 10 values of c[i]
    for (int i = 0; i < 10; i++)
        cout << "c[" << i << "] = " << c[i] << endl;

    delete [] a;   // free array memory
    delete [] b;
    delete [] c;
}

To make it parallel: add OpenMP directives.
11 C++ vectoradd.cpp with OpenMP
1. Include the OpenMP header.
2. #pragma omp parallel creates a team of parallel threads.
3. #pragma omp for means each thread handles a different portion of the loop.

#include <iostream>
#include <omp.h>
#define SIZE 1024
using namespace std;

void vectoradd(int *a, int *b, int *c, int n)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}

int main()
{
    int *a;
    int *b;
    int *c;
    a = new int[SIZE];
    b = new int[SIZE];
    c = new int[SIZE];
    for (int i = 0; i < SIZE; i++) {   // initialize vectors
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }
    // (the rest of main is unchanged from slides 9 and 10)
12 Alternative Method: vectoraddinline

inline void vectoradd(int *a, int *b, int *c, int n)
{
    #pragma omp for   // orphaned work-sharing directive: splits the loop
    for (int i = 0; i < n; i++)   // among the threads of the enclosing parallel region
        c[i] = a[i] + b[i];
}

int main()
{
    int *a;
    int *b;
    int *c;
    a = new int[SIZE];
    b = new int[SIZE];
    c = new int[SIZE];
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < SIZE; i++) {   // initialize vectors in parallel
            a[i] = i;
            b[i] = i;
            c[i] = 0;
        }
        vectoradd(a, b, c, SIZE);
    }   // end parallel section
}
13 Parallel Computing
Embarrassingly parallel (the best problems): problems that can be divided up among workers readily. Example: find the owner of a random phone number, xxx-xxxx, in a Maryland phone book. One person starts at page 1 and scans the book to the end; 100 people each scan 1/100 of the book, which is 100 times faster (on average).
Data parallelism is a program property where many arithmetic operations can be performed on the data simultaneously. Examples: simulations with many data points and many time steps; matrix multiplication, where row x column is computed for all rows and columns, each product independent of the others.
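As a sketch of data parallelism in the OpenMP style introduced on the next slides (the matrix size and row-major layout are assumptions for illustration): each (row, column) dot product is independent, so one directive divides the row loop among the threads.

#include <omp.h>
#include <iostream>
#include <vector>

int main()
{
    const int n = 512;   // assumed matrix size
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    #pragma omp parallel for   // each thread takes a block of rows
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)   // independent dot product
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    std::cout << "C[0][0] = " << C[0] << std::endl;   // expect 2n = 1024
}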
14 OpenMP
A portable threading model to use all the processors on a multi-core chip (laptop: 4 cores; Mac Pro: up to 12 cores; Intel: 60- and 80-core parts; ClearSpeed: 96 cores). Available for many compilers and for FORTRAN, C, and C++. Uses #pragma commands (pre-processor commands) and is easy to use: a loop can be made parallel with one command.

    /usr/local/bin/g++ file.cc -fopenmp -O3 -o file

Reference for OpenMP 4.0:
15 A Test Suite for OpenMP: the Barcelona OpenMP Test Suite Project
16 Tests
17 Speed-up
If there are N threads or processors and the problem is 100% parallelizable, then each processor does 1/N of the problem and the total computational time is proportional to 1/N. The speed-up S is

    S = T(1) / T(N) = (time on 1 processor) / (time on N processors) = N

This is linear speed-up.
If the program consists of parts that can only be executed serially and parts that can run in parallel, divide the execution time into T = Ts + Tp. The speed-up for this type of program is then

    S = (Ts + Tp) / (Ts + Tp/N)    (Amdahl's Law)
18 Amdahl's Law
If a given fraction, F, of a program can be run in parallel, then the non-parallel part is 1 - F, and the speed-up on N processors is

    S = 1 / ((1 - F) + F/N)

[Figure: speed-up vs. number of processors for several values of F]
19 Hard to Get Speed-ups
With 20 processors but only 20% of the program in parallel form, Amdahl's Law gives S = 1/((1 - 0.2) + 0.2/20) = 1.23. Reaching S = N requires F = 1.0, i.e. 100% parallel code, and the curve is very steep as 1 - F approaches zero.
20 Gustafson/Barsis Law
Amdahl's Law assumes a fixed problem size, but as problems grow, efficiency grows: the problems we run grow with the equipment we have, while the serial part of the code does not grow much. If s and p are the fractions of time spent in serial and parallel computation on the parallel machine (s + p = 1), then a serial machine would take s + N·p to do the same task, so the scaled speed-up is

    S = s + N·p

Amdahl scales on time for a fixed problem size; Gustafson/Barsis scales on the number of processors N.
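A small sketch that evaluates both laws, reproducing the S = 1.23 figure from the previous slide (the values of F, s, and N are just the slides' example numbers):

#include <cstdio>

// Amdahl: fixed problem size, parallel fraction F
double amdahl(double F, int N)    { return 1.0 / ((1.0 - F) + F / N); }
// Gustafson/Barsis: scaled problem size, serial fraction s (s + p = 1)
double gustafson(double s, int N) { return s + N * (1.0 - s); }

int main()
{
    printf("Amdahl    (F = 0.2, N = 20): %.2f\n", amdahl(0.2, 20));      // 1.23
    printf("Gustafson (s = 0.8, N = 20): %.2f\n", gustafson(0.8, 20));   // 4.80
}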
21 Exploitable Concurrency
Most programs have parts of the code that can be decomposed into subproblems that can be computed independently. You have to identify these parts and recognize any dependencies that might lead to race conditions. Procedure: find the concurrency, then design parallel algorithms to exploit it. (Mattson, T., et al., Patterns for Parallel Programming)
22 OpenMP: Open Multi-Processing
Multi-threaded shared-memory parallelism for C, C++, and FORTRAN, since 1997. Schematic of the code: fork and join model. No jumping out of or into parallel sections. Uses environment variables, e.g. for the maximum number of threads:

    setenv OMP_NUM_THREADS 8   (csh)   or   export OMP_NUM_THREADS=8   (bash)

OMP_WAIT_POLICY may be set to ACTIVE or PASSIVE.
23 Procedure
When a thread reaches a parallel directive, it creates a team of threads and becomes the master thread, with thread number 0. The code in the parallel region is duplicated and all threads execute it. There is an implied barrier at the end of the parallel region; only the master thread continues, and all other threads terminate. The number of threads is dictated by the environment variable OMP_NUM_THREADS or by omp_set_num_threads(), as sketched below.
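A minimal sketch of setting and querying the team size at run time (the count of 4 threads is an arbitrary choice):

#include <omp.h>
#include <cstdio>

int main()
{
    omp_set_num_threads(4);              // overrides OMP_NUM_THREADS
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)   // the master thread reports
            printf("team of %d threads\n", omp_get_num_threads());
    }
}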
24 The for work-sharing construct
25 Generic OpenMP Structure

#include <omp.h>

int main()
{
    int var1, var2, var3;
    // Serial code here.
    // Beginning of parallel section: fork a team of threads,
    // specifying variable scoping.
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        // Creates a team of threads; each thread executes the same code.
        // Other OpenMP directives may appear here, as well as run-time
        // library calls such as omp_get_thread_num().
        // Variables declared here are local to the block.
        // private(var1, var2): declared earlier, but enter and leave the
        // region with no initial value.
        // shared(var3): var3's value is known to all threads.
    }
    // At the end, all threads join the master thread and disband.
    // Resume serial code.
}
26 OpenMP Pragmas

#pragma omp parallel
{
    // work done with the team of threads
}

Inside a parallel block, additional commands:

#pragma omp single
{
    // done by one thread only
}

#pragma omp for   // distribute the for loop among the existing team of threads

int main(int argc, char *argv[])
{
    const int N = 100000;   // value missing in the transcription; 100000 assumed
    int i, a[N];
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2 * i;
    return 0;
}
27 Parallel Programming Pitfalls
Threads can change shared variables and lead to race conditions. A race condition occurs when two threads access a shared variable at the same time: each reads the variable, each performs its operation on the value, and they race to see which thread writes its value last. The last write wins, overwriting whatever the other thread wrote. The solution is to control access to shared variables.
Compare two possible outcomes when two threads each increment a global variable x by 1. Both interleavings can happen, depending on the processor.

Start: x = 0 (correct outcome)
1. T1 reads x = 0
2. T1 calculates x = 0 + 1 = 1
3. T1 writes x = 1
4. T2 reads x = 1
5. T2 calculates x = 1 + 1 = 2
6. T2 writes x = 2
Result: x = 2

Start: x = 0 (lost update)
1. T1 reads x = 0
2. T2 reads x = 0
3. T1 calculates x = 0 + 1 = 1
4. T2 calculates x = 0 + 1 = 1
5. T1 writes x = 1
6. T2 writes x = 1
Result: x = 1
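A minimal sketch of controlling that access with OpenMP: #pragma omp atomic serializes just the read-modify-write (for larger updates, #pragma omp critical does the same for a whole block):

#include <omp.h>
#include <cstdio>

int main()
{
    int x = 0;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp atomic    // only one thread updates x at a time
        x = x + 1;
    }
    printf("x = %d\n", x);    // 4 with a team of four threads;
                              // without atomic, updates could be lost
}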
28 OpenMP and Functions
Loops containing function calls are usually not parallelized unless the compiler can guarantee there are no dependencies between iterations. A work-around is to inline the function called within the loop, which works if there are no dependencies; the compiler then writes the function code directly into the loop.

#include <iostream>
using namespace std;

inline void hello()
{
    cout << "hello";
}

int main()
{
    hello();    // call it like a normal function
    cin.get();
}
29 Barrier

#include <stdio.h>
#include <omp.h>

int main()
{
    int x;
    x = 2;
    #pragma omp parallel num_threads(4) shared(x)
    {
        if (omp_get_thread_num() == 0) {
            x = 5;
        } else {
            /* Print 1: the following read of x has a race */
            printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
        #pragma omp barrier
        if (omp_get_thread_num() == 0) {
            /* Print 2 */
            printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        } else {
            /* Print 3 */
            printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
    }
    return 0;
}
30 Single

#include <stdio.h>
#include <omp.h>

int main()
{
    int x;
    x = 2;
    #pragma omp parallel num_threads(4) shared(x)
    {
        #pragma omp single
        {
            x = 5;   /* executed by one thread; an implicit barrier follows */
        }
        printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        if (omp_get_thread_num() == 0) {
            /* Print 2 */
            printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        } else {
            /* Print 3 */
            printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
    }
    return 0;
}