Code generation under Control




Code generation under Control
Rencontres sur la compilation, Saint Hippolyte
Henri-Pierre Charles, CEA, LaSTRE laboratory, Grenoble
12 December 2011

Introduction: Presentation
Henri-Pierre Charles, a two-line CV:
- Since 2010: CEA/DRT/DACLE/LIST/LaSTRE, in the CRI PILSI context at Gières
- 1993-2010: assistant professor at Université de Versailles Saint-Quentin-en-Yvelines, PRiSM laboratory, IUT de Vélizy
Keywords:
- Architecture, HPC, compiler backend, parallelism (ILP, multimedia, caches)
- 6809, 68000, i860, TriMedia, Itanium, Power, CELL, ARM, MEPHISTO, others
- GCC, LLVM, FFTW, H264, Spiral, ATLAS, MESA3D, others
- 3D image reconstruction, Z-buffer, video compression, FFTW, QCD

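Introduction: CEA / CRI PILSI
- CEA: Commissariat à l'énergie Atomique et aux Énergies Alternatives (Alternative Energies and Atomic Energy Commission)
- DAM: Direction des Applications Militaires (Military Applications Division)
- DEN: Direction de l'énergie Nucléaire (Nuclear Energy Division)
- DRT: Direction de la Recherche Technologique (Technological Research Division)
- DSM: Direction des Sciences de la Matière (Physical Sciences Division)
- DSV: Direction des Sciences du Vivant (Life Sciences Division)
- LIST: Laboratoire Intégration des Systèmes et des Technologies (Systems and Technology Integration Laboratory), Saclay
- LETI: Laboratoire Électronique et de Technologie de l'information (Electronics and Information Technology Laboratory), Grenoble
- LITEN: Laboratoire Innovation pour les Technologies des Energies Nouvelles et les nanomatéria (Innovation Laboratory for New Energy Technologies and Nanomaterials)
- LaSTRE: Laboratoire Système Temps Réel (Real-Time Systems Laboratory), Saclay / Gières
- LIALP: Laboratoire Infrastructure et Atelier Logiciel pour Puces (Software Infrastructure and Tools for Chips Laboratory), Gières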

Introduction: Presentation of LaSTRE
LaSTRE, Laboratoire Systèmes Temps Réel (Real-Time Systems Laboratory). Head: Vincent DAVID
- OASIS: multi-scale time-triggered architecture (the system is measured at its own rhythm); temporal consistency of exchanged data
- PharOS: same concepts, specialized for the automotive context: embedded systems, multiprocessors
- MPPA: high-productivity parallel programming model for embedded HPC (MPPA project)
- Low-level code optimization: dynamic generation, low-level optimization, multimedia applications
Technologies from high-level sources to bare-metal machines

Motivation: Context
Objective? Be home as fast as possible, safely.
- Speed-limit constraints
- Real speed-limit constraints
- Fuel consumption constraints
- Engine temperature constraint

Motivation: Context
Classical compilation chain
[Figure: compilation chain. Labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Binary, System, Runnable, User, Data]
Compilation objectives:
- Translate the source into a semantically equivalent binary
- Assume successive refinement
- Optimize for efficiency / parallelism: reduce the cycle count
- A performance defect is now a bug (not only in real-time systems)
- Performance counters in the loop

Motivation: Context
Semantic bottleneck

Motivation: Context
Ask for program!
What are the speed variations for this program?

    int i;
    for (i = 0; i < N; ++i) {
        int j;
        dest[i] = 0;
        for (j = 0; j < N; ++j)
            dest[i] += src[j] * m[i][j];
    }

Factors: compiler, data size, target processor, instruction set, available parallelism, data type, memory location, operating system, ...
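One way to observe these variations is to time the kernel for several values of N. The sketch below is only an illustration: it assumes `int` data and a row-major flattened matrix (instead of the slide's `m[i][j]`), and the helper `time_matvec` is a hypothetical name, not something from the talk. It uses the standard `clock()` function, whose resolution is coarse, so small sizes need many repetitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* The slide's kernel as a function: dest = m * src.
 * Assumption: m is an n x n matrix stored row-major in a flat array. */
void matvec(int n, const int *src, const int *m, int *dest)
{
    for (int i = 0; i < n; ++i) {
        dest[i] = 0;
        for (int j = 0; j < n; ++j)
            dest[i] += src[j] * m[(size_t)i * n + j];
    }
}

/* Hypothetical helper: average CPU seconds per call of matvec for size n. */
double time_matvec(int n, int reps)
{
    assert(n > 0 && reps > 0);
    int *src = malloc((size_t)n * sizeof *src);
    int *dest = malloc((size_t)n * sizeof *dest);
    int *m = malloc((size_t)n * n * sizeof *m);
    assert(src && dest && m);

    /* Small deterministic values, so the sums stay far from overflow. */
    for (int i = 0; i < n; ++i)
        src[i] = i % 7;
    for (size_t k = 0; k < (size_t)n * n; ++k)
        m[k] = (int)(k % 5);

    volatile unsigned sink = 0; /* keeps the compiler from dropping the work */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        matvec(n, src, m, dest);
        sink += (unsigned)dest[n - 1];
    }
    clock_t stop = clock();

    free(src);
    free(dest);
    free(m);
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `time_matvec` for N = 10, 1000 and larger, with different compilers or optimization flags, gives a first feel for how many of the listed factors move the result.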

Motivation: Context
Data size matters
Loop size (value of N):
- 10^1: multimedia kernel: full loop unroll, instruction scheduling, memory cache accesses, ...
- 10^2 / 10^3: scientific: loop unroll, loop conversion, data prefetching
- 10^6: multimedia stream: multithreading
- 10^10 and more: high-level parallelism: MPI / Grid / Cloud, ...
N is generally a parameter known only at run time.
Profiling and iterative compilation do not help.
Compilation strategies are complex and specific to the application domain.
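Since N is usually known only at run time, a static alternative to dynamic generation is to ship a few specialized variants and choose between them when N arrives. The sketch below assumes a single special case (N = 4, fully unrolled inner loop) and a row-major flat matrix; the function names are illustrative only and do not come from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Generic variant: works for any n, nothing is known at compile time. */
static void matvec_generic(int n, const int *src, const int *m, int *dest)
{
    for (int i = 0; i < n; ++i) {
        dest[i] = 0;
        for (int j = 0; j < n; ++j)
            dest[i] += src[j] * m[(size_t)i * n + j];
    }
}

/* Variant specialized for n == 4: the inner loop is fully unrolled,
 * the kind of strategy the slide associates with small multimedia kernels. */
static void matvec_4(const int *src, const int *m, int *dest)
{
    for (int i = 0; i < 4; ++i) {
        const int *row = m + 4 * i;
        dest[i] = src[0] * row[0] + src[1] * row[1]
                + src[2] * row[2] + src[3] * row[3];
    }
}

/* Run-time dispatch: the strategy is picked once the value of n is known. */
void matvec_dispatch(int n, const int *src, const int *m, int *dest)
{
    assert(n > 0);
    if (n == 4)
        matvec_4(src, m, dest);
    else
        matvec_generic(n, src, m, dest);
}
```

This approach only covers the sizes anticipated at compile time, which is one reason to generate the specialized code at run time instead.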

Architecture: GENEPY architecture
CEA-LETI architecture

Architecture: Mephisto operator
- No instruction set (microprogram)

Architecture: Power consumption

Dynamic compilation: Compilette at work
[Figure: compilation chain extended with a compilette. Labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Algorithmic optimizer, Binary, System, Runnable, User, Data, Parameter, Code generation, Compilette]
Code generation can be:
- Data driven (size, alignment, values)
- Energy driven (ISA selection, vectorization)
- Speed driven (ISA selection, vectorization quality)
- Network-topology driven
- User driven (experimentation)
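To make "data driven (size, values)" concrete, here is a minimal sketch of a generator that specializes a dot-product kernel on its coefficients once they are known. It is a stand-in under a strong assumption: a real compilette emits machine code at run time, whereas this hypothetical `emit_dot_kernel` only emits C text into a buffer, which keeps the example portable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emits the C text of a kernel computing sum(coef[j] * src[j]),
 * specialized for the size n (fully unrolled) and for the coefficient
 * values (zero terms dropped, multiplications by 1 removed).
 * Returns the number of characters written, or -1 if out is too small. */
int emit_dot_kernel(char *out, size_t cap, const int *coef, int n)
{
    assert(out && coef && n >= 0);
    int w = snprintf(out, cap, "int kernel(const int *src) { return 0");
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t len = (size_t)w;

    for (int j = 0; j < n; ++j) {
        if (coef[j] == 0)
            continue; /* value driven: this term costs nothing at all */
        if (coef[j] == 1)
            w = snprintf(out + len, cap - len, " + src[%d]", j);
        else
            w = snprintf(out + len, cap - len, " + (%d) * src[%d]", coef[j], j);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
    }

    w = snprintf(out + len, cap - len, "; }");
    if (w < 0 || (size_t)w >= cap - len)
        return -1;
    len += (size_t)w;
    return (int)len;
}
```

With coefficients `{2, 0, 1}` the generated kernel contains two terms instead of a three-iteration loop; the same decision logic could drive instruction selection when the output is binary code rather than text.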

Dynamic compilation: deGoal, a tool for dynamic generation
deGoal: a tool for compilette generation
- Generate a generator
- Virtual portable instruction set (register based, data types)
- Optimization at compile time and run time
- Faster than any compiler generator
- No intermediate representation
- Algorithmic level
- Bottom-up approach
- Targets: ARM, GENEPY, XP70V3/4, GPU, K1, ...
- Memory footprint: a few Kb
- General context: telecommunication algorithms (3GPP LTE)

Dynamic compilation: FP7 H4H
FP7 H4H: High Performance for Heterogeneous Architecture, GPU JIT for Scilab
- Generate PTX, the NVIDIA assembly language, dynamically
- Embed the generator in Scilab
- Optimized data movement
- Linear algebra context
- Dynamic generation driven by data size

Dynamic compilation: FP7 Touchmore
FP7 Touchmore: dynamic generation
- Dynamic generation for MPSoC
- GENEPY tile (Mephisto DSP + MIPS)
- Generate for MIPS or Mephisto
- Multimedia applications (MP3 / MP4)
- Dynamic generation driven by performance

Dynamic compilation: Smecy
FP7 Smecy
- Target: P2012 MPSoC / XP70 processor
- Matrix x matrix dynamic generation
- Perfect hash dynamic generator
- Dynamic generation driven by performance and power consumption
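The slide gives no detail on the perfect hash generator, so the following is only a generic sketch of the idea, not the Smecy implementation: once the key set is known at run time, search for a multiplier that maps every key to a distinct slot. The table size, the multiplicative hash and all names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BITS 4                 /* assumed table of 16 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Multiplicative hash: the top TABLE_BITS bits of key * mul (mod 2^32). */
static unsigned slot(uint32_t key, uint32_t mul)
{
    return (uint32_t)(key * mul) >> (32 - TABLE_BITS);
}

/* Tries the odd multipliers 1, 3, 5, ... and returns the first one that is
 * collision-free for the given keys, or 0 if none is found in max_tries. */
uint32_t find_perfect_multiplier(const uint32_t *keys, int n, uint32_t max_tries)
{
    assert(keys && n >= 0 && n <= (int)TABLE_SIZE);
    for (uint32_t t = 0; t < max_tries; ++t) {
        uint32_t mul = 2u * t + 1u;
        uint32_t used = 0;           /* one bit per occupied slot */
        int ok = 1;
        for (int i = 0; i < n && ok; ++i) {
            uint32_t bit = 1u << slot(keys[i], mul);
            if (used & bit)
                ok = 0;              /* two keys share a slot: reject mul */
            used |= bit;
        }
        if (ok)
            return mul;
    }
    return 0;
}
```

Once a multiplier is found, a lookup costs one multiplication and one shift, and a dynamic generator could embed that constant directly in the emitted code. The linear search is deliberately naive and can take millions of tries for small keys.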

Dynamic compilation: Related work
- JIT compilation (Java, LLVM, CUDA): intermediate representation, heavyweight generators (footprint and time)
- Python, Perl, PHP: too high level, glue languages
- FFTW, Spiral: generators, dynamic configuration
- ATLAS: compile-time tuning
- VVM / CCG / HPBCG: previous versions

Dynamic compilation: Conclusion
- Dynamic generation is THE challenge (JIT, JavaScript, emulation, multicore simulation, ...)
- A lot of work remains: power characterization
- MPSoC and HPC systems share some problems: multiple cores, power consumption control, ...
- The parameters that control generation are numerous and hard to manage
- Subscribe to DCE 2012: Workshop on Dynamic Compilation Everywhere (during HiPEAC 2012)