Hybrid Programming with MPI and OpenMP
|
|
|
- Bonnie Stevens
- 9 years ago
- Views:
Transcription
1 Hybrid Programming with and OpenMP Ricardo Rocha and Fernando Silva Computer Science Department Faculty of Sciences University of Porto Parallel Computing 2015/2016 R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 1 / 14
2 on Clusters On a cluster of multiprocessors (or multicore processors), we can execute a parallel program by having a process executing in each processor (or core). It may be the case that multiple processes execute in the same multiprocessor (or multicore processor), but still the interactions among those processes are based on message passing. R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 2 / 14
3 and OpenMP on Clusters A different approach is to design an hybrid program in which only one process executes in each multiprocessor (or multicore processor), and then launch a set of threads equal to the number of processors (or cores) in each machine to execute the parallel regions of the program. R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 3 / 14
4 and OpenMP on Clusters As an alternative, we could combine both strategies and adapt the division between processes and threads that optimizes the use of the available resources without violating possible constraints or requirements of the problem being solved. R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 4 / 14
5 Hybrid Programming with and OpenMP Hybrid programming adapts perfectly to current architectures based on clusters of multiprocessors and/or multicore processors, as it induces less communication among different nodes and increases performance of each node without having to increase memory requirements. Applications with two levels of parallelism may use processes to exploit large grain parallelism, occasionally exchanging messages to synchronize information and/or share work, and use threads to exploit medium/small grain parallelism by resorting to a shared address space. Applications with constraints or requirements that may limit the number of processes that can be used (e.g., Fox s algorithm), may take advantage of OpenMP to exploit the remaining computational resources. Applications for which load balancing is hard to achieve with only processes, may benefit from OpenMP to balance work, by assigning a different number of threads to each process as a function of its load. R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 5 / 14
6 Hybrid Programming with and OpenMP The simplest and safe way to combine with OpenMP is to never use the calls inside the OpenMP parallel regions. When that happens, there is no problem with the calls, given that only the master thread is active during all communications. main(int argc, char **argv) { _Init(&argc, &argv); // master thread only --> calls here #pragma omp parallel { // team of threads --> no calls here // master thread only --> calls here _Finalize(); R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 6 / 14
7 Matrix-Vector Product (mpi-omp-matrixvector.c) The product of a matrix mat[rows,cols] and a column vector vec[cols] is a row vector prod[rows] such that each prod[i] is the scalar product of row i of the matrix by the column vector. If we have P processes and T threads per process, then each process can compute ROWS/P (P_ROWS) elements of the result and each thread can compute P_ROWS/T (T_ROWS) elements of the result. // distribute the matrix _Scatter(mat, P_ROWS * COLS, _INT, submat, P_ROWS * COLS, _INT, ROOT, _COMM_WORLD); // calculate the matrix-vector product #pragma omp parallel for num_threads(nthreads) for (i = 0; i < P_ROWS; i++) subprod[i] = scalar_product(&submat[i * COLS], vec, COLS); // gather the submatrices _Gather(subprod, P_ROWS, _INT, prod, P_ROWS, _INT, ROOT, _COMM_WORLD); R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 7 / 14
8 Matrix Multiplication with Fox s Algorithm Given two matrices A[N][N] and B[N][N] and P processes (P=Q*Q), Fox s algorithm divides both matrices in P submatrices of size (N/Q)*(N/Q) and assigns each submatrix to one of the P available processes. To compute the multiplication of its submatrix, each process only communicates with the processes in the same row and in the same column of its own submatrix. for (stage = 0; stage < Q; stage++) { bcast = // pick a submatrix A in each row of processes and _Bcast(); // send it to all processes in the same row // multiply the received submatrix A with current submatrix B #pragma omp parallel for private(i,j,k) for (i = 0; i < N/Q; i++) for (j = 0; j < N/Q; j++) for (k = 0; k < N/Q; k++) C[i][j] += A[i][k] * B[k][j]; _Send() // send submatrix B to process above and _Recv() // receive the submatrix B from process below R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 8 / 14
9 Safe If a program is parallelized so that it has calls inside OpenMP parallel regions, then multiple threads can call the same communications and at the same time. For this to be possible, it is necessary that the implementation be thread safe. #pragma omp parallel private(tid) { tid = omp_get_thread_num(); if (tid == id1) mpi_calls_f1(); // thread id1 makes calls here else if (tid == id2) mpi_calls_f2(); // thread id2 makes calls here else do_something(); -1 does not support multithreading. Such support was only considered with -2 with the introduction of the _Init_thread() call. R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 9 / 14
10 _Init_thread() initializes the execution environment (similarly to _Init()) and defines the support level for multithreading: required is the aimed support level provided is the support level provided by the implementation The support level for multithreading can be: _THREAD_SINGLE only one thread will execute (the same as initializing the environment with _Init()) _THREAD_FUNNELED only the master thread can make calls _THREAD_SERIALIZED all threads can make calls, but only one thread at a time can be in such state _THREAD_MULTIPLE all threads can make simultaneous calls without any constraints R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 10 / 14 Initializing with Support for Multithreading int _Init_thread(int *argc, char ***argv, int required, int *provided)
11 _THREAD_FUNNELED With support level _THREAD_FUNNELED only the master thread can make calls. One way to ensure this is to protect the calls with the omp master directive. However, the omp master directive does not define any implicit synchronization barrier among all threads in the parallel region (at entrance or exit of the omp master directive) in order to protect the call. #pragma omp parallel { #pragma omp barrier // explicit barrier at entrance #pragma omp master // only the master thread makes the call mpi_call(); #pragma omp barrier // explicit barrier at exit R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 11 / 14
12 _THREAD_SERIALIZED With support level _THREAD_SERIALIZED all threads can make calls, but only one thread at a time can be in such state. One way to ensure this is to protect the calls with the omp single directive that allows to define code blocks that should be executed only by one thread. However, the omp single directive does not define an implicit synchronization barrier at the entrance of the directive. In order to protect the call, it is necessary to set an explicit omp barrier directive at entrance of the omp single directive. #pragma omp parallel { #pragma omp barrier // explicit barrier at entrance #pragma omp single // only one thread makes the call mpi_call(); // explicit barrier at exit R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 12 / 14
13 _THREAD_MULTIPLE With support level _THREAD_MULTIPLE all threads can make simultaneous calls without any constraints. Given that the implementation is thread safe, there is no need for any additional synchronization mechanism among the threads in the parallel region in order to protect the calls. #pragma omp parallel private(tid) { tid = omp_get_thread_num(); if (tid == id1) mpi_calls_f1(); // thread id1 makes calls here else if (tid == id2) mpi_calls_f2(); // thread id2 makes calls here else do_something(); R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 13 / 14
14 _THREAD_MULTIPLE The communication among threads of different processes raises the problem of identifying the thread that is involved in the communication (as communications only include arguments to identify the ranks of the processes). A simple way to solve this problem is to use the tag argument to identify the thread involved in the communication. _Comm_rank(_COMM_WORLD, &my_rank); #pragma omp parallel num_threads(nthreads) private(tid) { tid = omp_get_thread_num(); if (my_rank == 0) // process 0 sends NTHREADS messages _Send(a, 1, _INT, 1, tid, _COMM_WORLD); else if (my_rank == 1) // process 1 receives NTHREADS messages _Recv(b, 1, _INT, 0, tid, _COMM_WORLD, &status); R. Rocha and F. Silva (DCC-FCUP) Programming with and OpenMP Parallel Computing 15/16 14 / 14
What is Multi Core Architecture?
What is Multi Core Architecture? When a processor has more than one core to execute all the necessary functions of a computer, it s processor is known to be a multi core architecture. In other words, a
Parallel Algorithm for Dense Matrix Multiplication
Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem
WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed
WinBioinfTools: Bioinformatics Tools for Windows Cluster Done By: Hisham Adel Mohamed Objective Implement and Modify Bioinformatics Tools To run under Windows Cluster Project : Research Project between
Parallel Computing. Parallel shared memory computing with OpenMP
Parallel Computing Parallel shared memory computing with OpenMP Thorsten Grahs, 14.07.2014 Table of contents Introduction Directives Scope of data Synchronization OpenMP vs. MPI OpenMP & MPI 14.07.2014
Debugging with TotalView
Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich
COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP
COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State
OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware
OpenMP & MPI CISC 879 Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction OpenMP MPI Model Language extension: directives-based
High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach
High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach Beniamino Di Martino, Antonio Esposito and Andrea Barbato Department of Industrial and Information Engineering Second University of Naples
Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1
Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor
A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
Introduction. Reading. Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF. CMSC 818Z - S99 (lect 5)
Introduction Reading Today MPI & OpenMP papers Tuesday Commutativity Analysis & HPF 1 Programming Assignment Notes Assume that memory is limited don t replicate the board on all nodes Need to provide load
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Parallel Computing. Shared memory parallel programming with OpenMP
Parallel Computing Shared memory parallel programming with OpenMP Thorsten Grahs, 27.04.2015 Table of contents Introduction Directives Scope of data Synchronization 27.04.2015 Thorsten Grahs Parallel Computing
Lecture 3. Optimising OpenCL performance
Lecture 3 Optimising OpenCL performance Based on material by Benedict Gaster and Lee Howes (AMD), Tim Mattson (Intel) and several others. - Page 1 Agenda Heterogeneous computing and the origins of OpenCL
MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp
MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source
Parallel and Distributed Computing Programming Assignment 1
Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong
CUDAMat: a CUDA-based matrix class for Python
Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based
Parallelization: Binary Tree Traversal
By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First
OpenMP and Performance
Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group [email protected] IT Center der RWTH Aachen University Tuning Cycle Performance Tuning aims to improve the runtime of an
LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.
LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability
Shared Memory Abstractions for Heterogeneous Multicore Processors
Shared Memory Abstractions for Heterogeneous Multicore Processors Scott Schneider Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
Spring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem
Introduction to Hybrid Programming
Introduction to Hybrid Programming Hristo Iliev Rechen- und Kommunikationszentrum aixcelerate 2012 / Aachen 10. Oktober 2012 Version: 1.1 Rechen- und Kommunikationszentrum (RZ) Motivation for hybrid programming
Matrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
High Performance Computing
High Performance Computing Oliver Rheinbach [email protected] http://www.mathe.tu-freiberg.de/nmo/ Vorlesung Introduction to High Performance Computing Hörergruppen Woche Tag Zeit Raum
Scheduling Task Parallelism" on Multi-Socket Multicore Systems"
Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction
Introduction to Programming (in C++) Multi-dimensional vectors. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC
Introduction to Programming (in C++) Multi-dimensional vectors Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC Matrices A matrix can be considered a two-dimensional vector,
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
Workshare Process of Thread Programming and MPI Model on Multicore Architecture
Vol., No. 7, 011 Workshare Process of Thread Programming and MPI Model on Multicore Architecture R. Refianti 1, A.B. Mutiara, D.T Hasta 3 Faculty of Computer Science and Information Technology, Gunadarma
GPI Global Address Space Programming Interface
GPI Global Address Space Programming Interface SEPARS Meeting Stuttgart, December 2nd 2010 Dr. Mirko Rahn Fraunhofer ITWM Competence Center for HPC and Visualization 1 GPI Global address space programming
OpenACC 2.0 and the PGI Accelerator Compilers
OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group [email protected] This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing
Load Balancing. computing a file with grayscales. granularity considerations static work load assignment with MPI
Load Balancing 1 the Mandelbrot set computing a file with grayscales 2 Static Work Load Assignment granularity considerations static work load assignment with MPI 3 Dynamic Work Load Balancing scheduling
Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp
Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since
OpenMP* 4.0 for HPC in a Nutshell
OpenMP* 4.0 for HPC in a Nutshell Dr.-Ing. Michael Klemm Senior Application Engineer Software and Services Group ([email protected]) *Other brands and names are the property of their respective owners.
Scalability evaluation of barrier algorithms for OpenMP
Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
MAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22
Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
An Introduction to Parallel Programming with OpenMP
An Introduction to Parallel Programming with OpenMP by Alina Kiessling E U N I V E R S I H T T Y O H F G R E D I N B U A Pedagogical Seminar April 2009 ii Contents 1 Parallel Programming with OpenMP 1
CPU Scheduling Outline
CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different
#pragma omp critical x = x + 1; !$OMP CRITICAL X = X + 1!$OMP END CRITICAL. (Very inefficiant) example using critical instead of reduction:
omp critical The code inside a CRITICAL region is executed by only one thread at a time. The order is not specified. This means that if a thread is currently executing inside a CRITICAL region and another
Towards OpenMP Support in LLVM
Towards OpenMP Support in LLVM Alexey Bataev, Andrey Bokhanko, James Cownie Intel 1 Agenda What is the OpenMP * language? Who Can Benefit from the OpenMP language? OpenMP Language Support Early / Late
Introduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Parallel Programming with MPI on the Odyssey Cluster
Parallel Programming with MPI on the Odyssey Cluster Plamen Krastev Office: Oxford 38, Room 204 Email: [email protected] FAS Research Computing Harvard University Objectives: To introduce you
PPD: Scheduling and Load Balancing 2
PPD: Scheduling and Load Balancing 2 Fernando Silva Computer Science Department Center for Research in Advanced Computing Systems (CRACS) University of Porto, FCUP http://www.dcc.fc.up.pt/~fds 2 (Some
Intel Many Integrated Core Architecture: An Overview and Programming Models
Intel Many Integrated Core Architecture: An Overview and Programming Models Jim Jeffers SW Product Application Engineer Technical Computing Group Agenda An Overview of Intel Many Integrated Core Architecture
Hadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
CUDA Programming. Week 4. Shared memory and register
CUDA Programming Week 4. Shared memory and register Outline Shared memory and bank confliction Memory padding Register allocation Example of matrix-matrix multiplication Homework SHARED MEMORY AND BANK
Package Rdsm. February 19, 2015
Version 2.1.1 Package Rdsm February 19, 2015 Author Maintainer Date 10/01/2014 Title Threads Environment for R Provides a threads-type programming environment
High performance computing systems. Lab 1
High performance computing systems Lab 1 Dept. of Computer Architecture Faculty of ETI Gdansk University of Technology Paweł Czarnul For this exercise, study basic MPI functions such as: 1. for MPI management:
Lecture 1: an introduction to CUDA
Lecture 1: an introduction to CUDA Mike Giles [email protected] Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming
A quick tutorial on Intel's Xeon Phi Coprocessor
A quick tutorial on Intel's Xeon Phi Coprocessor www.cism.ucl.ac.be [email protected] Architecture Setup Programming The beginning of wisdom is the definition of terms. * Name Is a... As opposed
Here are some examples of combining elements and the operations used:
MATRIX OPERATIONS Summary of article: What is an operation? Addition of two matrices. Multiplication of a Matrix by a scalar. Subtraction of two matrices: two ways to do it. Combinations of Addition, Subtraction,
Retour vers le futur des bibliothèques de squelettes algorithmiques et DSL
Retour vers le futur des bibliothèques de squelettes algorithmiques et DSL Sylvain Jubertie [email protected] Journée LaMHA - 26/11/2015 Squelettes algorithmiques 2 / 29 Squelettes algorithmiques
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University
Programming on Parallel Machines
Programming on Parallel Machines Norm Matloff University of California, Davis GPU, Multicore, Clusters and More See Creative Commons license at http://heather.cs.ucdavis.edu/ matloff/probstatbook.html
Introduction to Multicore Programming
Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 (Moreno Maza) Introduction to Multicore Programming CS 433 - CS 9624 1 /
Lightning Introduction to MPI Programming
Lightning Introduction to MPI Programming May, 2015 What is MPI? Message Passing Interface A standard, not a product First published 1994, MPI-2 published 1997 De facto standard for distributed-memory
OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
Getting OpenMP Up To Speed
1 Getting OpenMP Up To Speed Ruud van der Pas Senior Staff Engineer Oracle Solaris Studio Oracle Menlo Park, CA, USA IWOMP 2010 CCS, University of Tsukuba Tsukuba, Japan June 14-16, 2010 2 Outline The
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
An Introduction to Parallel Computing/ Programming
An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European
MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
Introduction to Cloud Computing
Introduction to Cloud Computing Distributed Systems 15-319, spring 2010 11 th Lecture, Feb 16 th Majd F. Sakr Lecture Motivation Understand Distributed Systems Concepts Understand the concepts / ideas
Parallel Computing with Mathematica UVACSE Short Course
UVACSE Short Course E Hall 1 1 University of Virginia Alliance for Computational Science and Engineering [email protected] October 8, 2014 (UVACSE) October 8, 2014 1 / 46 Outline 1 NX Client for Remote
CSE 6040 Computing for Data Analytics: Methods and Tools
CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 12 Computer Architecture Overview and Why it Matters DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS
HPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
Objectives. Overview of OpenMP. Structured blocks. Variable scope, work-sharing. Scheduling, synchronization
OpenMP Objectives Overview of OpenMP Structured blocks Variable scope, work-sharing Scheduling, synchronization 1 Overview of OpenMP OpenMP is a collection of compiler directives and library functions
Informatica e Sistemi in Tempo Reale
Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)
Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings
Operatin g Systems: Internals and Design Principle s Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Bear in mind,
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster Gabriele Jost and Haoqiang Jin NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000 {gjost,hjin}@nas.nasa.gov
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Parallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
Programming the Cell Multiprocessor: A Brief Introduction
Programming the Cell Multiprocessor: A Brief Introduction David McCaughan, HPC Analyst SHARCNET, University of Guelph [email protected] Overview Programming for the Cell is non-trivial many issues to be
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Detecting the Presence of Virtual Machines Using the Local Data Table
Detecting the Presence of Virtual Machines Using the Local Data Table Abstract Danny Quist {[email protected]} Val Smith {[email protected]} Offensive Computing http://www.offensivecomputing.net/
FPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
Deal with big data in R using bigmemory package
Deal with big data in R using bigmemory package Xiaojuan Hao Department of Statistics University of Nebraska-Lincoln April 28, 2015 Background What Is Big Data Size (We focus on) Complexity Rate of growth
Lecture 6: Introduction to MPI programming. Lecture 6: Introduction to MPI programming p. 1
Lecture 6: Introduction to MPI programming Lecture 6: Introduction to MPI programming p. 1 MPI (message passing interface) MPI is a library standard for programming distributed memory MPI implementation(s)
Cloud-based OpenMP Parallelization Using a MapReduce Runtime. Rodolfo Wottrich, Rodolfo Azevedo and Guido Araujo University of Campinas
Cloud-based OpenMP Parallelization Using a MapReduce Runtime Rodolfo Wottrich, Rodolfo Azevedo and Guido Araujo University of Campinas 1 MPI_Init(NULL, NULL); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); MPI_Comm_rank(MPI_COMM_WORLD,
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu [email protected] High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
Linear Programming. March 14, 2014
Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1
SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA
SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the
