Adaptive Selection of Polyhedral Codes for GPU/CPU


Slide 1: Adaptive Selection of Polyhedral Codes for GPU/CPU
Jean-François DOLLINGER, Vincent LOECHNER, Philippe CLAUSS
INRIA, CAMUS team, Université de Strasbourg
Saint-Hippolyte, December 6

Slide 2: Outline
1 Introduction
2 The Pedro-CUDA framework: CUDA code generation, profiling, prediction
3 Experiments
4 Conclusion

Slide 3: Introduction
The Pedro framework [Pradelle et al. 2011]: multi-versioning of loop nests, targeting multi-core CPUs. Code is selected on the fly: static code generation, offline profiling, online selection.

Slide 4: Introduction
Ranking table:

  Number of cores   Version 1   Version 2   Version 3
  1                 40 ms       55 ms       32 ms
  2                 32 ms       28 ms       17 ms
  3                 22 ms       15 ms        9 ms
  4                 14 ms        7 ms        8 ms

Computing the execution time of version 1, with cnt = {2000, 600, 300, 300}:
$t_{pred} = ((2000 - 600) \times 40) + ((600 - 300) \times 32) + (0 \times 22) + (300 \times 14) = 69800$ ms

Slide 5: Introduction
Idea: the Pedro-CUDA framework, an extension to GPUs and to CPU + GPU.
Difficulties: transfer costs, and computation times that are hard to predict.
HMPP provides optimizations, but they are tightly coupled to the target architecture. Example HMPP directive: #pragma hmpp... cond=(N > 1000)

Slide 6: Introduction
Figure: overview of the framework.
Compilation phase: source code → compiler → CPU + GPU code versions.
Profiling phase: profiler → ranking table and bandwidth table.
Selection phase: execution context → predictor → selected version (among versions 1 to M), run inside the application binary.

Slide 7: Introduction: GPU architecture
Memory model: global memory, shared memory, registers, constant and texture memories.
Processor model: a grid of thread blocks. The GPU executes grids, Streaming Multiprocessors execute blocks (SIMD), and Streaming Processors execute threads.

Slide 9: Pedro-CUDA framework: CUDA code generation
PLuTo [Bondhugula et al. 2008, Baskaran et al. 2008]: its C-to-CUDA generator does not work (invalid assertions, coarse-grained execution configuration), and the validity of the generated code is questionable.
Solution: adapt a CPU code. Python scripts take a code version (a CLooG file, plus the parallel dimensions and block sizes) and generate the kernel code and the host code.

Slide 10: Pedro-CUDA framework: CUDA code generation
Host code: the outer sequential loops are copied; kernel calls are synchronous; data are transferred with cudaMemcpy().
Device code: the parallel loops are turned into guards and mapped onto the CUDA thread grid, each thread identifier being assigned to the original loop index; the inner loops are copied into the kernel.

Slide 11: Pedro-CUDA framework: CUDA code generation
Original code version:

    for (t1 ...)
      for (t2 ...)        // parallel
        for (t3 ...)      // parallel
          for (t4 ...)
            for (t5 ...)
              S;

Host code:

    cudaMemcpy(h2d)
    for (t1 ...) {
      cudaMemcpy(h2d)
      kernel<<<...>>>();
      cudaMemcpy(d2h)
    }

Kernel code:

    t2 = f(threadIdx.x);
    t3 = f(threadIdx.y);
    if (t2 ...)
      if (t3 ...)
        for (t4 ...)
          for (t5 ...)
            S;

Slide 13: Pedro-CUDA framework: profiling
Host-device memory transfers: measured for a set of sizes, interpolated at run time.
Figure: memory copies between host and device on an Nvidia Quadro: bandwidth (MB/s) vs. log10(message size in bytes), for cudaMemcpy(cudaMemcpyHostToDevice) and cudaMemcpy(cudaMemcpyDeviceToHost).

Slide 14: Pedro-CUDA framework: profiling
Offline evaluation of the code, run on the target machine. Computing the parameters of the parallel loops: the parameters are increased one step at a time until the iteration count of some parallel dimension (given by its Ehrhart polynomial) exceeds the grid size, then decreased by one step.

    nok = true;
    while (nok) {
      // parameter adjustment 1
      adjust_params(+1);
      foreach (parallel_dim) {
        it = ehrhart(parallel_dim);
        if (it > gridsz(parallel_dim))
          nok = false;
      }
    }
    adjust_params(-1);

Slide 15: Pedro-CUDA framework: profiling
Computing the time per iteration:

    for (;;) {
      // parameter adjustment 2
      adjust_params();
      // pointer initialization
      init_pointers();
      t_abs = gettime();
      kernel<<<gridsz, blocksz>>>();
      cudaDeviceSynchronize();   // launches are asynchronous: wait for the kernel
      t_abs = gettime() - t_abs;
      nb_iter = ehrhart();
      t_it = t_abs / nb_iter;
      if (measure_ok(t_abs))     // reliable measurement?
        return t_it;
    }

Slide 16: Pedro-CUDA framework: profiling
Execution time per iteration.
Figure: kernel launch (matmul, no optimization, 2D grid): log10 of the time per iteration (ns) vs. number of blocks, for block sizes 8x2, 8x4, 8x8, 16x4 and 16x16.

Slide 17: Pedro-CUDA framework: profiling
Ranking table (execution time per iteration): measured with the number of blocks incremented by 1, stopping at an arbitrary threshold; beyond the threshold, the time is interpolated as a constant.
Bandwidth table (host-device bandwidth): measured with the transferred data size doubled at each step, until the available memory is saturated.

Slide 19: Pedro-CUDA framework: prediction
Prediction nests: simplified loop nests that approximate the total execution and transfer times.

    foreach (version) {
      // estimate the host-to-device transfer time for the loop nest
      for (t1 ...) {
        // estimate the transfer time per kernel call,
        // host to device and device to host
        // estimate the kernel execution time
      }
    }

Slide 20: Pedro-CUDA framework: prediction
The version estimated to be the best is executed. Hopla Geiss! (an Alsatian exclamation)

Slide 22: Experiments
Prediction of execution times.
Figure: predicted time vs. measured execution time (in seconds) for mat-init1, mat-init2, covariance1 to covariance3, gemm1 to gemm4, and matmul1 to matmul4.

Slide 23: Experiments
Prediction overhead.
Figure: overhead of the prediction, in seconds (y axis up to 1.6e-05 s), for matinit1, matinit2, covariance1 to covariance4, gemm1 to gemm4, and matmul1 to matmul4.

Slide 25: Conclusion: work completed
Done: a prediction method for GPU and CPU + GPU; code generation with Python scripts; profiling and prediction codes.
Already working: profiling and prediction.

Slide 26: Conclusion: future work
CUDA generator: gain flexibility; use codes tailored to GPUs.
CPU vs. GPU integration: choose a CPU version when the GPU version performs poorly.
Further experiments and consolidation of the method.
Improved profiling.

Slide 28 (appendix): GPU device
Figure: streaming multiprocessors 1 to N, each with its own registers; constant memory, texture memory and global memory.

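Slide 29 (appendix): CUDA thread hierarchy
Figure: a 2D grid of gridDim.x × gridDim.y blocks, indexed by (blockIdx.x, blockIdx.y), from Block (0, 0) and Block (1, 0) to Block (0, 2) and Block (1, 2); each block holds blockDim.x × blockDim.y threads, indexed by (threadIdx.x, threadIdx.y).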


More information

A Strategy for Managing Performance

A Strategy for Managing Performance A Strategy for Managing Performance John Brady, Sun Professional Services Sun BluePrints OnLine December 2002 http://www.sun.com/blueprints Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95045

More information

AD511 Active Iridium Antenna User Manual Mar 12 V4.0

AD511 Active Iridium Antenna User Manual Mar 12 V4.0 AD511 Active Iridium Antenna User Manual Mar 12 V4.0 AD511 Active Iridium Transmitter/Receiver Antenna with up to 160 metres of coaxial down-lead and DC Power Break-In Box for Iridium Satellite Systems

More information

BILL C-665 PROJET DE LOI C-665 C-665 C-665 HOUSE OF COMMONS OF CANADA CHAMBRE DES COMMUNES DU CANADA

BILL C-665 PROJET DE LOI C-665 C-665 C-665 HOUSE OF COMMONS OF CANADA CHAMBRE DES COMMUNES DU CANADA C-665 C-665 Second Session, Forty-first Parliament, Deuxième session, quarante et unième législature, HOUSE OF COMMONS OF CANADA CHAMBRE DES COMMUNES DU CANADA BILL C-665 PROJET DE LOI C-665 An Act to

More information

GPU Accelerated Monte Carlo Simulations and Time Series Analysis

GPU Accelerated Monte Carlo Simulations and Time Series Analysis GPU Accelerated Monte Carlo Simulations and Time Series Analysis Institute of Physics, Johannes Gutenberg-University of Mainz Center for Polymer Studies, Department of Physics, Boston University Artemis

More information

Liste d'adresses URL

Liste d'adresses URL Liste de sites Internet concernés dans l' étude Le 25/02/2014 Information à propos de contrefacon.fr Le site Internet https://www.contrefacon.fr/ permet de vérifier dans une base de donnée de plus d' 1

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Sun Management Center 3.5 Update 1b Release Notes

Sun Management Center 3.5 Update 1b Release Notes Sun Management Center 3.5 Update 1b Release Notes Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 819 3054 10 June 2005 Copyright 2005 Sun Microsystems, Inc. 4150 Network

More information

Accelerating 3D Cellular automata computation with GP-GPU in the context of integrative biology

Accelerating 3D Cellular automata computation with GP-GPU in the context of integrative biology Accelerating 3D Cellular automata computation with GP-GPU in the context of integrative biology Jonathan Caux 1 David Hill 2 Pridi Siregar 3 Research Report LIMOS/RR-10-10 29 avril 2010 1 LIMOS-ISIMA,

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

SCATTERED DATA VISUALIZATION USING GPU. A Thesis. Presented to. The Graduate Faculty of The University of Akron. In Partial Fulfillment

SCATTERED DATA VISUALIZATION USING GPU. A Thesis. Presented to. The Graduate Faculty of The University of Akron. In Partial Fulfillment SCATTERED DATA VISUALIZATION USING GPU A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for the Degree Master of Science Bo Cai May, 2015

More information

Relevé des frais annuels de l entente de licence destinée à l enseignement primaire et secondaire MAI 2016

Relevé des frais annuels de l entente de licence destinée à l enseignement primaire et secondaire MAI 2016 Relevé des frais annuels de l entente de licence destinée à l enseignement primaire et secondaire MAI 2016 Les Informations de Programme scolaire Relient DIRECTIVES SUR LA SÉLECTION DE PRODUIT ET LE CALCUL

More information

Open call for tenders n SCIC C4 2014/01

Open call for tenders n SCIC C4 2014/01 EUROPEAN COMMISSION DIRECTORATE GENERAL FOR INTERPRETATION RESOURCES AND SUPPORT DIRECTORATE Open call for tenders n SCIC C4 2014/01 Accident and sickness insurance for Conference Interpreting Agents Questions

More information

ARE NEW OIL PIPELINES AND TANKER FACILITIES VIABLE IN CANADA?

ARE NEW OIL PIPELINES AND TANKER FACILITIES VIABLE IN CANADA? ARE NEW OIL PIPELINES AND TANKER FACILITIES VIABLE IN CANADA? Research Report June 11, 2013 Innovative Research Group, Inc. www.innovativeresearch.ca Toronto : Calgary : Vancouver Toronto 56 The Esplanade

More information

VETERINARY HEALTH CERTIFICATE EXPORT OF MAMMALS (except Rodents and Lagomorpha) TO JAPAN

VETERINARY HEALTH CERTIFICATE EXPORT OF MAMMALS (except Rodents and Lagomorpha) TO JAPAN NAME OF GOVERNMENT AUTHORITIES: I. ORIGIN OF ANIMALS II. III. IV. Name of consignor: DESTINATION OF ANIMALS Name of Consignee : VETERINARY HEALTH CERTIFICATE EXPORT OF MAMMALS (except Rodents and Lagomorpha)

More information

Lecture 1: an introduction to CUDA

Lecture 1: an introduction to CUDA Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming

More information

TP1 : Correction. Rappels : Stream, Thread et Socket TCP

TP1 : Correction. Rappels : Stream, Thread et Socket TCP Université Paris 7 M1 II Protocoles réseaux TP1 : Correction Rappels : Stream, Thread et Socket TCP Tous les programmes seront écrits en Java. 1. (a) Ecrire une application qui lit des chaines au clavier

More information

GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 2 Shared memory in detail

More information

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015 INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale

More information

GPGPU Parallel Merge Sort Algorithm

GPGPU Parallel Merge Sort Algorithm GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led

More information

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

More information

Dynamic Case-Based Reasoning Based on the Multi-Agent Systems: Individualized Follow-Up of Learners in Distance Learning

Dynamic Case-Based Reasoning Based on the Multi-Agent Systems: Individualized Follow-Up of Learners in Distance Learning Dynamic Case-Based Reasoning Based on the Multi-Agent Systems: Individualized Follow-Up of Learners in Distance Learning 1, 2 A. Zouhair, 1 E. M. En-Naimi, 1 B. Amami, 2 H. Boukachour, 2 P. Person, 2 C.

More information

Guidance on Extended Producer Responsibility (EPR) Analysis of EPR schemes in the EU and development of guiding principles for their functioning

Guidance on Extended Producer Responsibility (EPR) Analysis of EPR schemes in the EU and development of guiding principles for their functioning (EPR) Analysis of in the EU and development of guiding principles for their functioning In association with: ACR+ SITA LUNCH DEBATE 25 September 2014 Content 1. Objectives and 2. General overview of in

More information