Code generation under Control

1 Code generation under Control
Rencontres sur la compilation / Saint Hippolyte
Henri-Pierre Charles
CEA Laboratoire LaSTRE / Grenoble
12 December 2011

2 Introduction / Presentation
Henri-Pierre Charles, two-line CV:
  CEA/DRT/DACLE/LIST/LaSTRE, CRI PILSI context at Gières
  Assistant professor at the Université de Versailles Saint-Quentin-en-Yvelines, PRiSM laboratory, IUT de Vélizy
Keywords:
  Architecture, HPC, compiler backend, parallelism (ILP, multimedia, caches)
  6809, 68000, i860, TriMedia, Itanium, Power, CELL, ARM, MEPHISTO, others
  GCC, LLVM, FFTW, H264, Spiral, ATLAS, MESA3D, others
  3D image reconstruction, Z-buffer, video compression, FFTW, QCD
Henri-Pierre Charles Code generation under Control 10 / 10011

3 Introduction / CEA / CRI PILSI
CEA: Commissariat à l'énergie Atomique et aux Énergies Alternatives
DAM: Direction des Applications Militaires
DEN: Direction de l'énergie Nucléaire
DRT: Direction de la Recherche Technologique
DSM: Direction des Sciences de la Matière
DSV: Direction des Sciences du Vivant
LIST: Laboratoire Intégration des Systèmes et des Technologies (Saclay)
LETI: Laboratoire Électronique et de Technologie de l'information (Grenoble)
LITEN: Laboratoire Innovation pour les Technologies des Energies Nouvelles et les nanomatériaux
LaSTRE: Laboratoire Système Temps Réel (Saclay / Gières)
LIALP: Laboratoire Infrastructure et Atelier Logiciel pour Puces (Gières)

4 Introduction / LaSTRE presentation
Laboratoire Systèmes Temps Réel. Head: Vincent DAVID
OASIS: multi-scale time-triggered architecture (the system is measured at its own rhythm); temporal consistency of exchanged data
PharOS: same concepts, specialized for the automotive context
Embedded systems, multiprocessors
MPPA: high-productivity parallel programming model for embedded HPC (MPPA project)
Low-level code optimization: dynamic generation, low-level optimization, multimedia applications
Technologies from high-level sources to bare-metal machines

5 Motivation / Context
Objective? Be at home as fast as possible, safely
Speed limit constraints
Real speed limitation constraints
Gas consumption constraints
Engine temperature constraint

6 Motivation / Context
Classical compilation chain
Figure (classical compilation chain), labels in extraction order: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Binary, System, Runnable, User, Data
Compilation objectives:
  Translate the source into a semantically equivalent binary
  Assume successive refinement
  Optimize for efficiency / parallelism: reduce the cycle count
  A performance defect is now a bug (not only in real-time systems)
  Performance counters in the loop

7 Motivation / Context
Semantic bottleneck

8 Motivation / Context
Ask the program!
What makes the speed of this program vary?
    int i;
    for (i = 0; i < N; ++i) {
        int j;
        dest[i] = 0;
        for (j = 0; j < N; ++j)
            dest[i] += src[j] * m[i][j];
    }
Compiler, data size, target processor, instruction set, available parallelism, data type, memory location, operating system, ...
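The kernel on this slide is a fragment without declarations. A minimal self-contained sketch in C follows, so that the loop can be compiled and timed. It assumes `int` elements and a row-major flat array for `m`; the function name `matvec` and the run-time parameter `n` (standing in for `N`) are choices of this sketch, not part of the original slide.

```c
#include <assert.h>
#include <stddef.h>

/* dest[i] = sum over j of src[j] * m[i][j], for an n x n matrix stored
 * row-major in a flat array, so m[i][j] is m[i * n + j].
 * The flat layout and the name matvec are assumptions of this sketch. */
static void matvec(size_t n, const int *m, const int *src, int *dest)
{
    assert(m != NULL && src != NULL && dest != NULL);
    for (size_t i = 0; i < n; ++i) {
        int acc = 0;                       /* accumulate locally, store once */
        for (size_t j = 0; j < n; ++j)
            acc += src[j] * m[i * n + j];
        dest[i] = acc;
    }
}
```

Timing this function for several values of `n`, compilers, optimization levels, and targets is one way to observe the variations listed on the slide.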

9 Motivation / Context
Data size matters
Loop size (value of N):
  10^1: multimedia kernel: full loop unrolling, instruction scheduling, memory cache access, ...
  10^2 / 10^3: scientific: loop unrolling, loop conversion, data prefetching
  10^6: multimedia stream: multithreading
  10^10 and more: high-level parallelism: MPI / Grid / Cloud, ...
N is generally a parameter known only at run time. Profiling and iterative compilation do not help.
Compilation strategies are complex and specific to the application domain.
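Since the best strategy depends on a size known only at run time, one simple static approximation is to compile several variants ahead of time and pick one when N becomes known. The sketch below illustrates this with a dot product; the function names and the selection policy (a fully unrolled variant for `n == 4`, a generic loop otherwise) are hypothetical and are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Generic variant: works for any n, pays loop overhead on every iteration. */
static int dot_generic(const int *a, const int *b, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Specialized variant for n == 4: fully unrolled, no loop counter. */
static int dot_unrolled4(const int *a, const int *b, size_t n)
{
    assert(n == 4);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

typedef int (*dot_fn)(const int *, const int *, size_t);

/* Pick a variant once the run-time size is known (hypothetical policy). */
static dot_fn select_dot(size_t n)
{
    return n == 4 ? dot_unrolled4 : dot_generic;
}
```

A caller would write `select_dot(n)(a, b, n)`. The limitation is that every variant must be anticipated at compile time, which is the gap that run-time code generation aims to close.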

10 Architecture / GENEPY architecture
CEA-LETI architecture

11 Architecture / Mephisto operator
No instruction set (microprogram)

12 Architecture / Power consumption

13 Dynamic compilation / Compilette at work
Figure (compilation chain extended with a compilette), labels in extraction order: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Algorithmic optimizer, Binary, System, Runnable, User, Data, Parameter, Code generation, Compilette
Data driven (size, alignment, values)
Energy driven (ISA selection, vectorization)
Speed driven (ISA selection, vectorization quality)
Network topology driven
User driven (experimentation)
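To make "data driven (values)" concrete, here is a minimal sketch of a generator that specializes a dot product for a coefficient vector known only at run time. The slides do not show how a compilette is written, and a real one would presumably emit executable code. For portability, this sketch emits C source text instead. The function name `gen_dot` and the output format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Emit the C text of a dot product specialized for a coefficient vector
 * known only at run time: the loop is fully unrolled, the coefficients are
 * folded in as constants, and zero coefficients produce no code at all.
 * Returns the number of characters written, or -1 if buf is too small. */
static int gen_dot(char *buf, size_t cap, const int *coef, size_t n)
{
    assert(buf != NULL && coef != NULL && cap > 0);
    size_t len = 0;
    int w = snprintf(buf, cap, "int dot(const int *x) { return 0");
    if (w < 0 || (size_t)w >= cap) return -1;
    len = (size_t)w;
    for (size_t i = 0; i < n; ++i) {
        if (coef[i] == 0) continue;        /* value-driven: skip dead terms */
        w = snprintf(buf + len, cap - len, " + (%d) * x[%zu]", coef[i], i);
        if (w < 0 || (size_t)w >= cap - len) return -1;
        len += (size_t)w;
    }
    w = snprintf(buf + len, cap - len, "; }");
    if (w < 0 || (size_t)w >= cap - len) return -1;
    return (int)(len + (size_t)w);
}
```

For the coefficients `{3, 0, -2}` the generator writes `int dot(const int *x) { return 0 + (3) * x[0] + (-2) * x[2]; }`: the term for the zero coefficient disappears, which a generic loop cannot achieve without knowing the values.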

14 Dynamic compilation / deGoal, a tool for dynamic generation
deGoal: a tool for compilette generation
Generate a generator
Virtual portable instruction set (register-based data types)
Optimization at compile time and at run time
Faster than any compiler generator
No intermediate representation
Algorithmic level
Bottom-up approach
Targets: ARM, GENEPY, XP70V3/4, GPU, K1, ...
Memory footprint: a few KB
General context: telecommunication algorithms (3GPP LTE)

15 Dynamic compilation / FP7 H4H
FP7 H4H: High Performance for Heterogeneous Architecture
GPU JIT for Scilab
Generate NVIDIA PTX assembly language dynamically
Embed the generator in Scilab
Optimized data movement
Linear algebra context
Dynamic generation driven by data size

16 Dynamic compilation / FP7 Touchmore
FP7 Touchmore: dynamic generation for MPSoC
GENEPY tile (DSP Mephisto + MIPS)
Generate for MIPS or Mephisto
Multimedia applications (MP3 / MP4)
Dynamic generation driven by performance

17 Dynamic compilation / Smecy
FP7 Smecy
Target: P2012 MPSoC / XP70 processor
Matrix × matrix dynamic generation
Perfect hash dynamic generator
Dynamic generation driven by performance and power consumption
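The slide does not say how the perfect hash generator works. As an illustration of the idea only, the sketch below builds a collision-free hash for a key set known at run time by searching for the smallest modulus `d` under which all keys fall into distinct slots. The function name and the modulus-search scheme are assumptions of this sketch, not the Smecy implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Search for the smallest table size d in [n, max_d] such that key % d is
 * distinct for every key, which makes h(k) = k % d a perfect hash for this
 * key set. Returns 0 when no such d exists in the range (for example when
 * the key set contains duplicates). */
static uint32_t find_perfect_modulus(const uint32_t *keys, size_t n,
                                     uint32_t max_d)
{
    assert(keys != NULL && n > 0);
    for (uint32_t d = (uint32_t)n; d <= max_d; ++d) {
        int ok = 1;
        for (size_t i = 0; ok && i < n; ++i)
            for (size_t j = i + 1; j < n; ++j)
                if (keys[i] % d == keys[j] % d) {  /* collision under d */
                    ok = 0;
                    break;
                }
        if (ok)
            return d;
    }
    return 0;
}
```

For the keys 10, 20, 30, 45 the moduli 4 and 5 both produce collisions, while 6 maps the keys to slots 4, 2, 0, 3. Once `d` is known, a generator can fold it into the lookup code as a constant.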

18 Dynamic compilation / Related work
JIT compilation (Java, LLVM, CUDA): intermediate representation, heavyweight generators (footprint and time)
Python, Perl, PHP: too high level, glue languages
FFTW, Spiral: generators, dynamic configuration
ATLAS: compile-time tuning
VVM / CCG / HPBCG: previous versions

19 Dynamic compilation / Conclusion
Dynamic generation is THE challenge (JIT, JavaScript, emulation, multicore simulation, ...)
A lot of work remains: power characterization
MPSoC and HPC systems share some problems: multiple cores, power consumption control, ...
The parameters that control generation are numerous and hard to manage
Subscribe to DCE 2012: Workshop on Dynamic Compilation Everywhere (during HiPEAC 2012)
