Code generation under Control
Rencontres sur la compilation, Saint Hippolyte
Henri-Pierre Charles
CEA, Laboratoire LaSTRE, Grenoble
12 December 2011

Introduction / Presentation
Henri-Pierre Charles, two-line CV:
- Since 2010: CEA/DRT/DACLE/LIST/LaSTRE, CRI PILSI context at Gières
- 1993-2010: assistant professor at the Université de Versailles Saint-Quentin-en-Yvelines, PRiSM laboratory, IUT de Vélizy
Keywords:
- Architecture, HPC, compiler backend, parallelism (ILP, multimedia, caches)
- 6809, 68000, i860, TriMedia, Itanium, Power, CELL, ARM, MEPHISTO, others
- GCC, LLVM, FFTW, H264, Spiral, ATLAS, MESA3D, others
- 3D image reconstruction, Z-buffer, video compression, FFTW, QCD
Henri-Pierre Charles Code generation under Control 10 / 10011

Introduction / CEA / CRI PILSI
CEA: Commissariat à l'énergie Atomique et aux Énergies Alternatives
- DAM: Direction des Applications Militaires
- DEN: Direction de l'énergie Nucléaire
- DRT: Direction de la Recherche Technologique
- DSM: Direction des Sciences de la Matière
- DSV: Direction des Sciences du Vivant
Institutes:
- LIST: Laboratoire Intégration des Systèmes et des Technologies (Saclay)
- LETI: Laboratoire Électronique et de Technologie de l'information (Grenoble)
- LITEN: Laboratoire Innovation pour les Technologies des Energies Nouvelles et les nanomatéria
Laboratories:
- LaSTRE: Laboratoire Système Temps Réel (Saclay / Gières)
- LIALP: Laboratoire Infrastructure et Atelier Logiciel pour Puces (Gières)

Introduction / Presentation of LaSTRE
Laboratoire Systèmes Temps Réel. Head: Vincent David
- OASIS: multi-scale time-triggered architecture (the system is measured at its own rhythm); temporal consistency of exchanged data
- PharOS: the same concepts specialized for the automotive context (embedded systems, multiprocessors)
- MPPA: high-productivity parallel programming model for embedded HPC (MPPA project)
- Low-level code optimization: dynamic generation, low-level optimization, multimedia applications; technologies from high-level sources to bare-metal machines

Motivation / Context
Objective? Be at home as fast as possible, safely.
- Speed limit constraints
- Real speed limit constraints
- Fuel consumption constraints
- Engine temperature constraint

Motivation / Context
Classical Compilation Chain
[Figure: compilation chain. Labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Binary, System, Runnable, User, Data]
Compilation objectives:
- Translate the source into a semantically equivalent binary
- Assume successive refinement
- Optimize for efficiency / parallelism: reduce the cycle count
- A performance defect is now a bug (not only in real-time systems)
- Performance counters in the loop

Motivation / Context
Semantic Bottleneck

Motivation / Context
Ask for program!
How much does the speed of this program vary?

    int i;
    for (i = 0; i < N; ++i) {
        int j;
        dest[i] = 0;
        for (j = 0; j < N; ++j)
            dest[i] += src[j] * m[i][j];
    }

It depends on the compiler, data size, target processor, instruction set, available parallelism, data type, memory location, operating system, ...

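To observe such variations in practice, the kernel can be wrapped in a function and timed. The following is only a minimal sketch: the flat row-major matrix layout (so that `n` can be chosen at run time) and the helper name `time_matvec` are assumptions made for illustration, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Matrix-vector product: dest[i] = sum over j of src[j] * m[i][j].
 * Assumption: the matrix is stored row-major in a flat array so that
 * n can be a run-time value (the slide indexes it as m[i][j]). */
static void matvec(int n, int *dest, const int *src, const int *m)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        int acc = 0;
        for (int j = 0; j < n; ++j)
            acc += src[j] * m[(size_t)i * (size_t)n + (size_t)j];
        dest[i] = acc;
    }
}

/* Hypothetical helper: CPU time, in seconds, spent in `reps` calls. */
static double time_matvec(int n, int reps, int *dest,
                          const int *src, const int *m)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        matvec(n, dest, src, m);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running `time_matvec` for several values of `n`, with different compilers, optimization flags, and targets, gives a first picture of how strongly the listed factors influence the result.
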
Motivation / Context
Data Size Matters
Loop size (value of N):
- 10^1: multimedia kernel: full loop unrolling, instruction scheduling, memory cache accesses, ...
- 10^2 / 10^3: scientific code: loop unrolling, loop conversion, data prefetching
- 10^6: multimedia stream: multithreading
- 10^10 and more: high-level parallelism: MPI / grid / cloud, ...
N is generally a parameter known only at run time.
Profiling and iterative compilation do not help.
Compilation strategies are complex and specific to the application domain.

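Because N is only known at run time, the choice of strategy has to be made at run time too. The sketch below illustrates the idea with a dispatcher; the thresholds, the enum names, and the `dot*` helpers are all illustrative assumptions, not values taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Strategy classes following the size ranges listed on the slide. */
typedef enum {
    STRATEGY_FULL_UNROLL,   /* N around 10^1: multimedia kernel      */
    STRATEGY_LOOP_OPT,      /* N around 10^2 to 10^3: scientific     */
    STRATEGY_MULTITHREAD,   /* N around 10^6: multimedia stream      */
    STRATEGY_DISTRIBUTED    /* N around 10^10 and more: MPI, grid... */
} strategy_t;

/* Assumed thresholds: they only separate the orders of magnitude. */
static strategy_t choose_strategy(unsigned long long n)
{
    if (n <= 64ULL)         return STRATEGY_FULL_UNROLL;
    if (n <= 10000ULL)      return STRATEGY_LOOP_OPT;
    if (n <= 1000000000ULL) return STRATEGY_MULTITHREAD;
    return STRATEGY_DISTRIBUTED;
}

/* Fully unrolled variant, valid only for n == 4. */
static int dot_unrolled4(const int *a, const int *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Generic variant for any n. */
static int dot_generic(const int *a, const int *b, size_t n)
{
    int acc = 0;
    for (size_t j = 0; j < n; ++j)
        acc += a[j] * b[j];
    return acc;
}

/* Run-time dispatch: pick the specialized code when the size allows. */
static int dot(const int *a, const int *b, size_t n)
{
    assert(a && b);
    return (n == 4) ? dot_unrolled4(a, b) : dot_generic(a, b, n);
}
```

A static dispatcher like this only covers the sizes anticipated when the program was written, which is one motivation for generating the specialized variant at run time instead.
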
Architecture / Architecture
GENEPY: CEA-LETI architecture

Architecture / Mephisto operator
No instruction set (microprogram)

Architecture / Electrical power consumption

Dynamic compilation / Compilette at work
[Figure: compilation chain extended with a compilette. Labels: Idea, Algorithm, Programmer, Source, Compiler, Intermediate, Assembler, Assembly, Loader, Algorithmic optimizer, Binary, System, Runnable, User, Data, Parameter, Code generation, Compilette]
Code generation can be:
- Data driven (size, alignment, values)
- Energy driven (ISA selection, vectorization)
- Speed driven (ISA selection, vectorization quality)
- Network topology driven
- User driven (experimentation)

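The data-driven case can be illustrated with a toy generator. This is only a sketch under strong assumptions: a real compilette emits machine code at run time, whereas this hypothetical `emit_specialized_dot` emits C source text, which keeps the example portable. It specializes a dot product for coefficient values known at generation time, dropping zero terms and omitting multiplications by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy compilette: writes into `out` the C source of a
 * function `name` that computes sum(coef[j] * x[j]) for coefficient
 * values known at generation time.
 * Returns the number of characters written, or -1 if `cap` is too small. */
static int emit_specialized_dot(char *out, size_t cap, const char *name,
                                const int *coef, int n)
{
    assert(out && name && coef && n >= 0);
    size_t len;
    int terms = 0;
    int w = snprintf(out, cap, "int %s(const int *x) { return ", name);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    len = (size_t)w;
    for (int j = 0; j < n; ++j) {
        if (coef[j] == 0)
            continue;                    /* value driven: drop zero terms */
        const char *sep = terms ? " + " : "";
        if (coef[j] == 1)                /* value driven: no multiply by 1 */
            w = snprintf(out + len, cap - len, "%sx[%d]", sep, j);
        else
            w = snprintf(out + len, cap - len, "%s(%d) * x[%d]",
                         sep, coef[j], j);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
        ++terms;
    }
    /* An empty sum is emitted as the constant 0. */
    w = snprintf(out + len, cap - len, "%s; }\n", terms ? "" : "0");
    if (w < 0 || (size_t)w >= cap - len)
        return -1;
    return (int)(len + (size_t)w);
}
```

For coefficients `{2, 0, 1, -3}` the generated function contains three terms instead of four and no loop at all; the same principle applies when the generator targets machine instructions directly.
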
Dynamic compilation / deGoal: a tool for dynamic generation
deGoal: a tool for compilette generation
- Generate a generator
- Virtual portable instruction set (register-based data types)
- Optimization at compile time and at run time
- Faster than any compiler generator
- No intermediate representation
- Algorithmic level
- Bottom-up approach
- Targets: ARM, GENEPY, XP70V3/4, GPU, K1, ...
- Memory footprint: a few KB
- General context: telecommunication algorithms (3GPP LTE)

Dynamic compilation / FP7 H4H
FP7: H4H: High Performance for Heterogeneous Architecture
- GPU JIT for Scilab
- Generate NVIDIA PTX assembly language dynamically
- Embed the generator in Scilab
- Optimized data movement
- Linear algebra context
- Dynamic generation driven by data size

Dynamic compilation / FP7 Touchmore
FP7: Touchmore: dynamic generation
- Dynamic generation for MPSoC
- GENEPY tile (Mephisto DSP + MIPS)
- Generate for MIPS or Mephisto
- Multimedia applications (MP3 / MP4)
- Dynamic generation driven by performance

Dynamic compilation / Smecy
FP7: Smecy
- Target: P2012 MPSoC / XP70 processor
- Matrix x matrix dynamic generation
- Perfect hash dynamic generator
- Dynamic generation driven by performance and power consumption

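The slides do not describe how the perfect hash generator works. As one possible illustration of the idea, the sketch below searches at run time for a multiplier that maps a given key set to distinct table slots. The multiplicative hashing scheme, the table size, and every name here are assumptions, not the Smecy implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 3u                  /* assumed table of 2^3 = 8 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Multiplicative hash: the top TABLE_BITS bits of key * mult (mod 2^32). */
static uint32_t hash_slot(uint32_t key, uint32_t mult)
{
    return (uint32_t)(key * mult) >> (32u - TABLE_BITS);
}

/* Searches for an odd multiplier that gives every key its own slot.
 * Candidates are spread over 32 bits with a golden-ratio constant.
 * Returns the multiplier, or 0 if none is found in `max_tries` tries. */
static uint32_t find_perfect_multiplier(const uint32_t *keys, int n,
                                        uint32_t max_tries)
{
    assert(keys && n >= 0 && (uint32_t)n <= TABLE_SIZE);
    for (uint32_t t = 0; t < max_tries; ++t) {
        uint32_t mult = (t * 2654435761u) | 1u;   /* always odd, never 0 */
        unsigned char used[TABLE_SIZE];
        int ok = 1;
        memset(used, 0, sizeof used);
        for (int i = 0; i < n && ok; ++i) {
            uint32_t s = hash_slot(keys[i], mult);
            if (used[s])
                ok = 0;                           /* collision: next try */
            else
                used[s] = 1;
        }
        if (ok)
            return mult;
    }
    return 0;
}
```

Once a multiplier is found, a dynamic generator could emit lookup code with that constant and the shift amount baked into the instructions, so that a lookup costs one multiply, one shift, and one load.
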
Dynamic compilation / Related work
- JIT compilation (Java, LLVM, CUDA): intermediate representation, heavyweight generators (footprint and time)
- Python, Perl, PHP: too high level, glue languages
- FFTW, Spiral: generators, dynamic configuration
- ATLAS: compile-time tuning
- VVM / CCG / HPBCG: previous versions

Dynamic compilation / Conclusion
- Dynamic generation is THE challenge (JIT, JavaScript, emulation, multicore simulation, ...)
- A lot of work remains to be done: power characterization
- MPSoC and HPC systems share some problems: multiple cores, power consumption control, ...
- The parameters that control generation are numerous and hard to manage
- Subscribe to DCE 2012: Workshop on Dynamic Compilation Everywhere (during HiPEAC 2012)