Code generation under Control
Transcription
1 Code generation under Control
- Rencontres sur la compilation / Saint Hippolyte
- Henri-Pierre Charles, CEA Laboratoire LaSTRE / Grenoble
- 12 December 2011
2 Introduction: Presentation
Henri-Pierre Charles, two-line CV:
- CEA/DRT/DACLE/LIST/LaSTRE
- CRI PILSI context at Gières: assistant professor at the University of Versailles Saint-Quentin-en-Yvelines, PRiSM laboratory, IUT de Vélizy
Keywords:
- Architecture, HPC, compiler backend, parallelism (ILP, multimedia, caches)
- 6809, 68000, i860, TriMedia, Itanium, Power, CELL, ARM, MEPHISTO, other
- GCC, LLVM, FFTW, H264, Spiral, ATLAS, MESA3D, other
- 3D image reconstruction, Z-buffer, video compression, FFTW, QCD
Henri-Pierre Charles Code generation under Control 10 / 10011
3 Introduction: CEA / CRI PILSI
- CEA: Commissariat à l'énergie Atomique et aux Énergies Alternatives
- DAM: Direction des Applications Militaires
- DEN: Direction de l'énergie Nucléaire
- DRT: Direction de la Recherche Technologique
- DSM: Direction des Sciences de la Matière
- DSV: Direction des Sciences du Vivant
- LIST: Laboratoire Intégration des Systèmes et des Technologies (Saclay)
- LETI: Laboratoire Électronique et de Technologie de l'information (Grenoble)
- LITEN: Laboratoire Innovation pour les Technologies des Energies Nouvelles et les nanomatéria
- LaSTRE: Laboratoire Système Temps Réel (Saclay / Gières)
- LIALP: Laboratoire Infrastructure et Atelier Logiciel pour Puces (Gières)
4 Introduction: LaSTRE presentation
Laboratoire Systèmes Temps Réel. Head: Vincent DAVID
- OASIS: multi-scaled time-triggered architecture (the system is measured at its own rhythm); temporal consistency of exchanged data
- PharOS: same concepts, specialized for the automotive context: embedded systems, multiprocessors
- MPPA: high-productivity parallel programming model for embedded HPC (MPPA project)
- Low-level code optimization: dynamic generation, low-level optimization, multimedia applications
- Technologies from high-level sources to bare-metal machines
5 Motivation: Context
Objective? Be at home as fast as possible, with safety.
Constraints:
- Speed limitations
- Real speed limitations
- Fuel consumption
- Engine temperature
6 Motivation: Context
Classical compilation chain
[Diagram labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Binary, System, Runnable, User, Data]
Compilation objectives:
- Translate the source into a semantically equivalent binary
- Assume successive refinement
- Optimize for efficiency / parallelism: reduce the cycle count
- A performance defect is now a bug (not only in RT systems)
- Performance counter in the loop
7 Motivation: Context
Semantic bottleneck
8 Motivation: Context
Ask for program! What are the speed variations for this program?
    int i;
    for (i = 0; i < N; ++i) {
        int j;
        dest[i] = 0;
        for (j = 0; j < N; ++j)
            dest[i] += src[j] * m[i][j];
    }
Compiler, data size, target processor, instruction set, available parallelism, data type, memory location, operating system, ...
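To make the sources of variation on slide 8 concrete, here is a minimal sketch of the same loop nest wrapped in a C function. The function name `matvec`, the flat row-major layout of `m`, and the use of a local accumulator are assumptions made for illustration; the slide only shows the bare loops.

```c
#include <stddef.h>

/* Matrix-vector product from slide 8, wrapped in a function.
   Assumptions (not from the slides): m is a flat, row-major array
   of n*n ints, and n is passed at run time. */
void matvec(size_t n, const int *m, const int *src, int *dest)
{
    for (size_t i = 0; i < n; ++i) {
        int acc = 0;                      /* accumulate locally, store once */
        for (size_t j = 0; j < n; ++j)
            acc += src[j] * m[i * n + j];
        dest[i] = acc;
    }
}
```

With `n` as a run-time argument, a static compiler sees neither the trip count nor the alignment of the arrays, which is where the factors listed on the slide (data size, data type, memory location) come into play.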
9 Motivation: Context
Data size matters
Loop size (value of N):
- 10^1: multimedia kernel: full loop unroll, instruction scheduling, memory cache access, ...
- 10^3: scientific: loop unroll, loop conversion, data prefetching
- 10^6: multimedia stream: multithreading
- and more: high-level parallelism: MPI / Grid / Cloud, ...
N is generally a parameter known only at run time. Profiling and iterative compilation do not help. Compilation strategies are complex and application-domain specific.
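Since N is only known at run time, one way to picture a size-driven strategy is to choose among pre-built variants when N becomes available. The sketch below is not dynamic code generation and is not taken from the talk; it only illustrates the decision point. All names (`matvec_fn`, `matvec_generic`, `matvec_unrolled4`, `select_matvec`) and the choice of 4 as the specialized size are hypothetical.

```c
#include <stddef.h>

/* Common signature shared by every variant (hypothetical name). */
typedef void (*matvec_fn)(size_t n, const int *m, const int *src, int *dest);

/* Generic variant: any n, m stored row-major as n*n ints. */
void matvec_generic(size_t n, const int *m, const int *src, int *dest)
{
    for (size_t i = 0; i < n; ++i) {
        int acc = 0;
        for (size_t j = 0; j < n; ++j)
            acc += src[j] * m[i * n + j];
        dest[i] = acc;
    }
}

/* Variant specialized for n == 4: the inner loop is fully unrolled. */
void matvec_unrolled4(size_t n, const int *m, const int *src, int *dest)
{
    (void)n;                              /* size is fixed to 4 here */
    for (size_t i = 0; i < 4; ++i) {
        const int *row = m + i * 4;
        dest[i] = src[0] * row[0] + src[1] * row[1]
                + src[2] * row[2] + src[3] * row[3];
    }
}

/* Pick a variant once the run-time value of n is known. */
matvec_fn select_matvec(size_t n)
{
    return (n == 4) ? matvec_unrolled4 : matvec_generic;
}
```

A caller would write `select_matvec(n)(n, m, src, dest)`. The limitation is visible immediately: every interesting size needs its own pre-compiled variant, which is the gap that run-time generation aims to close.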
10 Architecture: GENEPY architecture
- CEA-LETI architecture
11 Architecture: Mephisto operator
- No instruction set (microprogram)
12 Architecture: Power consumption
13 Dynamic compilation: Compilette at work
[Diagram labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Algorithmic optimizer, Binary, System, Runnable, User, Data, Parameter, Code generation, Compilette]
- Data driven (size, alignment, values)
- Energy driven (ISA selection, vectorization)
- Speed driven (ISA selection, vectorization quality)
- Network topology driven
- User driven (experimentation)
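The "data driven (values)" case can be pictured with a portable analogy: specialize the work on the actual data values once, then reuse the specialized form. The sketch below builds a small plan for one matrix row that keeps only the non-zero coefficients. It does not emit machine code and is not how a compilette is implemented; the types and functions (`term`, `row_plan`, `row_plan_build`, `row_plan_run`, `row_plan_free`) are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* One multiply-accumulate step that survives specialization. */
typedef struct { size_t j; int coeff; } term;

/* Specialized form of one matrix row (hypothetical type). */
typedef struct { size_t count; term *terms; } row_plan;

/* Build the plan at run time: zero coefficients are dropped.
   Returns 0 on success, -1 if allocation fails. */
int row_plan_build(row_plan *p, const int *row, size_t n)
{
    p->count = 0;
    p->terms = malloc((n ? n : 1) * sizeof *p->terms);
    if (!p->terms)
        return -1;
    for (size_t j = 0; j < n; ++j) {
        if (row[j] != 0) {
            p->terms[p->count].j = j;
            p->terms[p->count].coeff = row[j];
            p->count++;
        }
    }
    return 0;
}

/* Execute the plan: only the retained terms are evaluated. */
int row_plan_run(const row_plan *p, const int *src)
{
    int acc = 0;
    for (size_t k = 0; k < p->count; ++k)
        acc += src[p->terms[k].j] * p->terms[k].coeff;
    return acc;
}

/* Release the plan's storage. */
void row_plan_free(row_plan *p)
{
    free(p->terms);
    p->terms = NULL;
    p->count = 0;
}
```

A real run-time generator would go one step further and emit instructions with the coefficients embedded as immediates, removing the interpretation loop in `row_plan_run`; the plan structure here only shows which information becomes available at run time.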
14 Dynamic compilation: deGoal, a tool for dynamic generation
deGoal: a tool for compilette generation
- Generate a generator
- Virtual portable instruction set (register based, data type)
- Optimization at compile time and run time
- Faster than any compiler generator
- No intermediate representation
- Algorithmic level
- Bottom-up approach
- Targets: ARM, GENEPY, XP70V3/4, GPU, K1, ...
- Memory footprint: a few Kb
- General context: telecommunication algorithms (3GPP LTE)
15 Dynamic compilation: FP7 H4H
FP7: H4H: High Performance for Heterogeneous Architecture, GPU JIT for Scilab
- Generate NVIDIA PTX assembly language dynamically
- Embed the generator in Scilab
- Optimized data movement
- Linear algebra context
- Dynamic generation driven by data size
16 Dynamic compilation: FP7 Touchmore
FP7: Touchmore: dynamic generation
- Dynamic generation for MPSoC
- GENEPY tile (DSP Mephisto + MIPS)
- Generate for MIPS or Mephisto
- Multimedia applications (MP3 / MP4)
- Dynamic generation driven by performance
17 Dynamic compilation: Smecy
FP7: Smecy
- Target: P2012 MPSoC / XP70 processor
- Matrix x matrix dynamic generation
- Perfect hash dynamic generator
- Dynamic generation driven by performance and power consumption
18 Dynamic compilation: Related work
- JIT compilation: Java, LLVM, CUDA: intermediate representation, heavyweight generators (footprint and time)
- Python, Perl, PHP: too high level, glue languages
- FFTW, Spiral: generator, dynamic configuration
- ATLAS: compile-time tuning
- VVM / CCG / HPBCG: previous versions
19 Dynamic compilation: Conclusion
- Dynamic generation is THE challenge (JIT, JavaScript, emulation, multicore simulation, ...)
- A lot of work remains to be done: power characterization
- MPSoC and HPC systems share some problems: multiple cores, power consumption control, ...
- The parameters that control generation are numerous and hard to manage
- Subscribe to DCE 2012: Workshop on Dynamic Compilation Everywhere (during HiPEAC 2012)
