Parallel Computing with MATLAB


1 Parallel Computing with MATLAB
  GPU/CPU Parallel Computing Day of the PEPI MACS
  Olivier de Mouzon, INRA, Gremaq, Toulouse School of Economics
  Monday 28 November 2011, Paris
  Presentation largely based on excerpts from a recent seminar: MathWorks, GREMAQ TSE, 15th of November 2011
    Mounir El Bedraoui, Sales Account Manager, Academia
    Stefan Duprey, Financial Application Engineer
  © 2011 The MathWorks, Inc.

2 Outline
  Functions already (silently) parallelized
  Parallel Computing Toolbox (PCT)
    PCT alone vs. MDCS (MATLAB Distributed Computing Server)
    Local vs. cluster
    (Explicit) use of parallelized functions
    Use of the basic constructs: parfor & spmd
    Use of jobs and tasks
  A note on MPI
  A note on GPU (NVIDIA CUDA)

3 Functions already (silently) parallelized
  Multithreaded computations, introduced in R2007a, are now on by default.
  Many MATLAB functions are now multithreaded:
    sort
    bsxfun
    mldivide for sparse rectangular matrix input
    qr for sparse matrix input
    filter for matrices and higher-dimensional arrays
    gamma, gammaln
    erf, erfc, erfcx, erfinv, erfcinv

4 Parallel Computing Toolbox (PCT)
    matlabpool open local
    % Code
    matlabpool close
    matlabpool open 9
  Default: 12 cores

5 Parallel Computing enables you to
  Larger compute pool: speed up computations (running independent tasks or iterations)
  Larger memory pool: work with large data

6 Parallel Computing with MATLAB
  User's desktop: Parallel Computing Toolbox
  Compute cluster: MATLAB Distributed Computing Server, MATLAB workers

7 PCT: (explicit) use of parallelized functions - example
    %% Parallel bootstrapped aggregated tree
    % crossval, jackknife, bootstrp
    % X and Y are the predictor and response data, defined elsewhere
    nTrees = 50;
    matlabpool open local;
    opt = statset('UseParallel','always');
    tic;
    b = TreeBagger(nTrees, X, Y, 'Options', opt);
    toc;
    matlabpool close;

8 Tools with Built-in Parallel Support
    Optimization Toolbox
    Global Optimization Toolbox
    Statistics Toolbox
    SystemTest
    Simulink Design Optimization
    Bioinformatics Toolbox
    Model-Based Calibration Toolbox
  Directly leverage functions in Parallel Computing Toolbox
  [Diagram: toolboxes and blocksets dispatching work to a set of workers]

9 PCT: basic constructs parfor and spmd

10 PCT: parfor
    N = 250;
    a = zeros(N, 1);
    matlabpool open local;
    tic;
    parfor i = 1:N % serial version: for i = 1:N
        a(i) = max(eig(rand(300)));
    end
    toc;
    matlabpool close;

11 Case 1: Speed up Computations
  Distributing similar problems to different processors (or task-parallelism)
  [Figure: processes plotted against time]

12 The Mechanics of parfor Loops
    a = zeros(10, 1)
    parfor i = 1:10
        a(i) = i;
    end
    a
  [Diagram: the iterations a(i) = i are distributed across a pool of MATLAB workers]

13 PCT: parfor
  Each iteration must be independent of the others
  Minimize data exchange with the different cores
    Passing input variables
    Retrieving output variables
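The independence rule above can be illustrated with a short sketch. It is not taken from the seminar, and the variable names squares and total are illustrative. Each iteration writes only its own element of a sliced output variable, and the running sum is a reduction variable, which parfor accepts because the order of accumulation does not change the result. In the releases discussed here, the loop should simply run serially on the client when no pool is open.

```matlab
% Minimal parfor sketch: independent iterations only.
N = 10;
squares = zeros(N, 1);   % sliced output: iteration i writes element i only
total = 0;               % reduction variable: combined across iterations

parfor i = 1:N
    squares(i) = i^2;    % no iteration reads another iteration's result
    total = total + i;   % allowed: addition is order-independent
end

% Not parallelizable as written (iteration i needs the result of i-1):
%   for i = 2:N, f(i) = f(i-1) + 1; end
```

Only squares and total travel back to the client, which is in line with the advice to keep data exchange small.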

14 PCT: spmd
    matlabpool open 2;
    n = 100;
    spmd % simple spmd block
        a = rand(n,n);
        display(size(a))
        display(a(1:2,1:2))
    end
    spmd % creating distributed arrays
        a = rand(n,n,codistributor);
        display(size(getLocalPart(a)));
    end
    spmd
        d = svd(a)
    end
    dgathered = gather(d);
    D = distributed.rand(1000); % Data is created and stored on the workers.
    b = distributed.rand(1000, 1); % Created on the workers
    x = D \ b;
    matlabpool close

15 Case 2: Work with large data
  Distributing arrays to different processors (or data-parallelism)

16 Examples of distributed and codistributed arrays: spmd blocks
    spmd
        % single program across workers
    end
  Run on a pool of MATLAB resources
  Single Program: runs simultaneously across workers
  Multiple Data: spread across multiple workers

17 A mental model for spmd ... end
    x = 1
    spmd
        y = x + 1
    end
    y
  [Diagram: each worker in the pool of MATLAB workers holds its own copy of x = 1 and evaluates y = x + 1]
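A small variation on the mental model above may make it more concrete. This is a sketch that assumes a local pool of two workers and uses the same release-era matlabpool syntax as the slides. Using labindex instead of the constant 1 makes each worker compute a different value, and on the client the result is a Composite, indexed with braces by worker number.

```matlab
% spmd sketch, assuming a local pool of 2 workers can be opened.
matlabpool open 2
x = 1;                  % client variable, copied to every worker

spmd
    y = x + labindex;   % labindex is 1 on the first worker, 2 on the second
end

% On the client, y is a Composite: one value per worker.
y1 = y{1};              % value computed on worker 1
y2 = y{2};              % value computed on worker 2
matlabpool close
```

The values are copied into y1 and y2 before the pool is closed, because a Composite refers to data that lives on the workers.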

18 Client-Side Distributed Arrays and spmd
  Client-side distributed arrays
    Class distributed
    Ability to create and manipulate directly from the client
    Simpler access to memory on labs
    Client-side visualization capabilities
  spmd
    Block of code executed on workers
    Worker-specific commands
    Explicit communication between workers
    Mixture of parallel and serial code

19 Enhanced MATLAB Functions That Operate on Distributed Arrays

20 PCT: jobs and tasks
  Used findResource to find the scheduler
  Used createJob and createTask to set up the problem
  Used submit to offload and run in parallel
  Used getAllOutputArguments to retrieve all task outputs
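The four steps listed above can be sketched as follows. This is an illustration, not the seminar's own example. It assumes the R2011-era job API named on the slide and the local scheduler, and the task function (sum over 1:k) is chosen only to keep the example small.

```matlab
% Jobs-and-tasks sketch using the R2011-era API (local scheduler assumed).
sched = findResource('scheduler', 'type', 'local');  % 1. find the scheduler

job = createJob(sched);                              % 2. set up the problem
for k = 1:4
    % One task per k: call sum with 1 output and the input argument 1:k
    createTask(job, @sum, 1, {1:k});
end

submit(job);                                         % 3. offload and run
waitForState(job, 'finished');                       %    block until done

results = getAllOutputArguments(job);                % 4. one row per task
totals = cell2mat(results);                          % column of task outputs

destroy(job);                                        % remove job data
```

Unlike parfor, the tasks here are created and collected explicitly, which gives more control at the cost of more bookkeeping.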

21 Scheduling Work
  [Diagram: the client (toolboxes, blocksets) sends work to the scheduler, which passes it to the workers; the result comes back through the scheduler]

22 Scheduling Task-parallel Applications
  [Diagram: on the client machine, Parallel Computing Toolbox (toolboxes, blocksets) submits a job made of tasks to the scheduler; on the compute cluster, MATLAB Distributed Computing Server workers, one per CPU, each run a task and return a result]


25 Scheduling Data-parallel Applications
  [Diagram: on the client machine, Parallel Computing Toolbox (toolboxes, blocksets) submits a job to the scheduler; on the compute cluster, MATLAB Distributed Computing Server labs, one per CPU, each run a task and return a result]


28 PCT: MPI-Based Functions
  Use when a high degree of control over the parallel algorithm is required
  High-level abstractions of MPI functions
    labSendReceive, labBroadcast, and others
    Send, receive, and broadcast any data type in MATLAB
  Automatic bookkeeping
    Setup: communication, ranks, etc.
    Error detection: deadlocks and miscommunications
  Pluggable
    Use any MPI implementation that is binary-compatible with MPICH2
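As an illustration of the functions named above, here is a minimal sketch, assuming a local pool of two workers and the lab* function names of that release. The variable names params, right, left and received are illustrative. Worker 1 broadcasts a parameter vector, then every worker passes its own index to its right-hand neighbour in a ring.

```matlab
% Message-passing sketch inside spmd (pool of 2 workers assumed).
matlabpool open 2

spmd
    % Broadcast: the source passes the data, the others only name the source.
    if labindex == 1
        params = labBroadcast(1, [0.5 2.0]);
    else
        params = labBroadcast(1);
    end

    % Ring exchange: send to the right neighbour, receive from the left one.
    right = mod(labindex, numlabs) + 1;
    left  = mod(labindex - 2, numlabs) + 1;
    received = labSendReceive(right, left, labindex);
end

p2 = params{2};     % copy of the broadcast data held by worker 2
r1 = received{1};   % index that worker 1 received from its left neighbour
matlabpool close
```

labSendReceive performs the send and the receive in one call, which avoids the deadlock that two blocking sends facing each other could cause.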

29 Using an InfiniBand network
  Parallel Computing Toolbox does not have built-in support for InfiniBand. However, the toolboxes provide all the necessary hooks to take advantage of it. Users need to provide their own custom build of MPI that supports InfiniBand. See "Using a Different MPI Build on UNIX Operating Systems" for more details.

30 Summary

31 Programming Parallel Applications
  Level of control    Parallel options
  Minimal             Support built into toolboxes
  Some                High-level programming constructs (e.g. parfor, batch, distributed, Jobs/Tasks)
  Extensive           Low-level programming constructs (e.g. MPI-based)

32 Parallel Computing with MATLAB
  Built-in parallel functionality within specific toolboxes (also requires Parallel Computing Toolbox)
    Optimization Toolbox
    Global Optimization Toolbox
    Statistics Toolbox
    SystemTest
    Simulink Design Optimization
    Bioinformatics Toolbox
    Model-Based Calibration Toolbox
  MATLAB and Parallel Computing Tools
    High-level parallel language: parfor, matlabpool, batch
    Low-level parallel functions: createJob, createTask
  Industry Libraries (built on industry-standard libraries)
    Message Passing Interface (MPI)
    ScaLAPACK

33 GPU support - R2010b

34 What is a GPU
  Originally for graphics acceleration, now also used for scientific calculations
  Massively parallel array of integer and floating-point processors
    Typically hundreds of processors per card
    GPU cores complement CPU cores
  Dedicated high-speed memory

35 GPU vs CPU

36 Performance Gain with More Hardware
  Using more cores (CPUs): Core 1, Core 2, Core 3, Core 4, cache
  Using GPUs: device memory

37 Technical language of GPU

38 NVIDIA solutions
  GPU        Positioning
  GeForce    Mass market
  Quadro     Graphics
  Tesla      Calculations, ECC memory, faster PCIe communication

39 Supported Cards and Operating Systems
  To use GPU functionality, the user should have:
    MATLAB + PCT R2010b
    32-bit or 64-bit Microsoft Windows or Linux operating system
    NVIDIA CUDA-enabled device with compute capability of 1.3 or greater
    NVIDIA CUDA device driver 3.0 or greater
    NVIDIA CUDA Toolkit 3.0 (recommended) for compiling PTX files

40 Using GPU with PCT R2010b
  3 main ways to access the GPU, ordered from ease of use to greater functionality:
    1. Use the GPU array interface and MATLAB built-in functions
    2. Execute custom functions on elements of the GPU array
    3. Create kernels from your CUDA code and PTX files

41 1. Using MATLAB Built-In Functions
  Feels like using distributed arrays
    A = gpuArray(rand(1000));
    B = gpuArray(rand(1000));
    C = transpose(A);
    D = C * log(B);
    E = gather(D);
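One way to gain confidence in this workflow is to compare the GPU result with the same computation on the CPU. The sketch below assumes a supported NVIDIA GPU is available. The names Acpu, Bcpu, Eref and maxErr are illustrative, and 1 is added to the second matrix only to keep the logarithm finite.

```matlab
% GPU round trip checked against the CPU (supported GPU assumed).
Acpu = rand(1000);
Bcpu = rand(1000) + 1;            % shifted so that log(Bcpu) stays finite

A = gpuArray(Acpu);               % copy the inputs to GPU memory
B = gpuArray(Bcpu);
D = transpose(A) * log(B);        % built-in functions run on the GPU
E = gather(D);                    % copy the result back to the client

Eref = transpose(Acpu) * log(Bcpu);       % same computation on the CPU
maxErr = max(abs(E(:) - Eref(:)));        % largest absolute difference
```

Small differences between E and Eref are expected, since the two devices may order the floating-point operations differently.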

42 Performance: A\b with Double Precision

43 Supported functions

44 2. Using a MATLAB function file with arrayfun()
  Create a MATLAB function (kernel):
    function c = myop(a,b)
    a1 = log(a);
    b1 = log(b);
    c = round(a1 .* b1);
  From MATLAB:
    a = gpuArray(1/2*rand(1000));
    b = gpuArray(3*rand(1000));
    c = arrayfun(@myop,a,b);
    d = gather(c);

45 Performance

46 Main Limitations
  The code can call only supported functions and cannot call scripts
  Indexing is not supported
  Persistent or global variables are not supported
  if, for, while, parfor, spmd, switch, try/catch, and return are not supported
  single, double, int32, uint32, and logical are the only supported data type conversions
  Functional forms of arithmetic operators are not supported, but symbol operators are (i.e. + is supported, plus() is not)

47 3. Invoking CUDA code
  Develop the CUDA code (kernel) for your computation
  Compile the CUDA code in MATLAB using the NVIDIA compiler:
    nvcc -ptx myfun.cu
  Create a MATLAB function myfun.m containing the commands:
    kernel = parallel.gpu.CUDAKernel('myfun.ptx', 'myfun.cu');  % create a kernel object
    Res = feval(kernel, input_arguments);  % evaluate the kernel on the GPU
  Execute the MATLAB function:
    Res = myfun(input_arguments);

48 Example of CUDA code

49 Performance

50 Performance Acceleration Options in the Parallel Computing Toolbox
  Technology               Example       MATLAB workers   Execution target
  matlabpool               parfor        Required         CPU cores
  User-defined tasks       createTask    Required         CPU cores
  GPU-based parallelism    GPUArray      No               NVIDIA GPU with compute capability 1.3 or greater

51 Questions? Thank you.


More information

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which

More information

Packet-based Network Traffic Monitoring and Analysis with GPUs

Packet-based Network Traffic Monitoring and Analysis with GPUs Packet-based Network Traffic Monitoring and Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2014 March 24-27, 2014 SAN JOSE, CALIFORNIA Background Main

More information

Cross-Platform GP with Organic Vectory BV Project Services Consultancy Services Expertise Markets 3D Visualization Architecture/Design Computing Embedded Software GIS Finance George van Venrooij Organic

More information

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate

More information

Operating Systems. Notice that, before you can run programs that you write in JavaScript, you need to jump through a few hoops first

Operating Systems. Notice that, before you can run programs that you write in JavaScript, you need to jump through a few hoops first Operating Systems Notice that, before you can run programs that you write in JavaScript, you need to jump through a few hoops first JavaScript interpreter Web browser menu / icon / dock??? login??? CPU,

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

More information

Introduction to MATLAB Gergely Somlay Application Engineer gergely.somlay@gamax.hu

Introduction to MATLAB Gergely Somlay Application Engineer gergely.somlay@gamax.hu Introduction to MATLAB Gergely Somlay Application Engineer gergely.somlay@gamax.hu 2012 The MathWorks, Inc. 1 What is MATLAB? High-level language Interactive development environment Used for: Numerical

More information

Solving Big Data Problems in Computer Vision with MATLAB Loren Shure

Solving Big Data Problems in Computer Vision with MATLAB Loren Shure Solving Big Data Problems in Computer Vision with MATLAB Loren Shure 2015 The MathWorks, Inc. 1 Why Are We Talking About Big Data? 100 hours of video uploaded to YouTube per minute 1 Explosive increase

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014 Using WestGrid Patrick Mann, Manager, Technical Operations Jan.15, 2014 Winter 2014 Seminar Series Date Speaker Topic 5 February Gino DiLabio Molecular Modelling Using HPC and Gaussian 26 February Jonathan

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

International Engineering Journal For Research & Development

International Engineering Journal For Research & Development Evolution Of Operating System And Open Source Android Application Nilesh T.Gole 1, Amit Manikrao 2, Niraj Kanot 3,Mohan Pande 4 1,M.tech(CSE)JNTU, 2 M.tech(CSE)SGBAU, 3 M.tech(CSE),JNTU, Hyderabad 1 sheyanilu@gmail.com,

More information

HIGH PERFORMANCE BIG DATA ANALYTICS

HIGH PERFORMANCE BIG DATA ANALYTICS HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning

More information

Parallel and Distributed Computing Programming Assignment 1

Parallel and Distributed Computing Programming Assignment 1 Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014

HPC Cluster Decisions and ANSYS Configuration Best Practices. Diana Collier Lead Systems Support Specialist Houston UGM May 2014 HPC Cluster Decisions and ANSYS Configuration Best Practices Diana Collier Lead Systems Support Specialist Houston UGM May 2014 1 Agenda Introduction Lead Systems Support Specialist Cluster Decisions Job

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University

More information

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014 Debugging in Heterogeneous Environments with TotalView ECMWF HPC Workshop 30 th October 2014 Agenda Introduction Challenges TotalView overview Advanced features Current work and future plans 2014 Rogue

More information

supercomputing. simplified.

supercomputing. simplified. supercomputing. simplified. INTRODUCING WINDOWS HPC SERVER 2008 R2 SUITE Windows HPC Server 2008 R2, Microsoft s third-generation HPC solution, provides a comprehensive and costeffective solution for harnessing

More information

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur 2015 The MathWorks, Inc. 1 Model-Based Design Continuous Verification and Validation Requirements

More information

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,

More information

CUDAMat: a CUDA-based matrix class for Python

CUDAMat: a CUDA-based matrix class for Python Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information