Automatic Tuning of HPC Applications for Performance and Energy Efficiency. Michael Gerndt Technische Universität München


1 Automatic Tuning of HPC Applications for Performance and Energy Efficiency. Michael Gerndt Technische Universität München

2 SuperMUC: 3 Petaflops (3 × 10^15 = 3 quadrillion), 3 MW

3 TOP500 List [figure: performance of the list total, the #1 system and the #500 system]

4 TOP 5 Systems: Linear Extension for Exascale. Scaling factor and resulting power per system: ×19 = 340 MW, ×36 = 302 MW, ×50 = 390 MW, ×89 = 1115 MW, ×100 = 394 MW
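The numbers on this slide read as a linear scaling: each system's performance is multiplied by a factor to reach exascale, and its power draw is multiplied by the same factor. The sketch below is one hedged reading of that arithmetic. It uses only the factor and power pairs printed on the slide and inverts the scaling to recover the power each system would draw today; the function and variable names are illustrative assumptions.

```python
# Linear extrapolation to exascale: power_exa = factor * power_today.
# The (factor, extrapolated MW) pairs come from the slide; names and
# rounding are illustrative assumptions.
SLIDE_VALUES = [(19, 340), (36, 302), (50, 390), (89, 1115), (100, 394)]


def implied_current_power_mw(factor, extrapolated_mw):
    """Invert the linear scaling to get the power the system draws today."""
    return extrapolated_mw / factor


def extrapolate_power_mw(current_mw, factor):
    """Scale today's power draw linearly with the performance factor."""
    return current_mw * factor


implied = [round(implied_current_power_mw(f, p), 1) for f, p in SLIDE_VALUES]

if __name__ == "__main__":
    for rank, ((factor, exa_mw), now_mw) in enumerate(zip(SLIDE_VALUES, implied), 1):
        print(f"#{rank}: x{factor} -> {exa_mw} MW at exascale, ~{now_mw} MW today")
```

Applying the same rule to SuperMUC (3 Petaflops at 3 MW) gives a factor of about 333 to reach one exaflop, so `extrapolate_power_mw(3, 333)` yields roughly 1000 MW.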

5 Project overview: READEX (Runtime Exploitation of Application Dynamism for Energy-efficient exascale Computing). Starting date: 1 September 2015. Duration: 3 years. Funding: European Commission Horizon 2020 grant agreement


6 Project partners: Technische Universität Dresden (Coordinator), Germany; Norwegian University of Science and Technology, Norway; IT4Innovations National Supercomputing Center, Czech Republic; Technische Universität München, Germany; Intel Exascale Centre, France; GNS Braunschweig, Germany; National University of Ireland Galway, Ireland


7 Motivation. Challenges: energy consumption, extreme scale, dynamism. Problems: awareness, ability, effort. Solution: dynamism, automatic tuning, design-/run-time

8 General idea: HPC automatic tuning + embedded system scenarios

9 Systems Scenario-based Methodology

10 Outline: Static tuning with the Periscope Tuning Framework; dynamic tuning with the READEX Tool Suite and methodology

11 Periscope Tuning Framework
Automatic application analysis and tuning
Tunes performance and energy (statically)
Plug-in-based architecture
Evaluates alternatives online
Scalable and distributed framework
Supports a variety of parallel paradigms: MPI, OpenMP, OpenCL, parallel patterns
Developed in the AutoTune EU-FP7 project


12 Score-P: Scalable Performance Measurement Infrastructure for Parallel Codes. Common instrumentation and measurement infrastructure

13 ENOPT Library for Energy Measurements

14 Tuning Plugin Interface [diagram labels: search space exploration, tuning step, scenario execution, plugin, Periscope frontend, tuning actions, application with monitor, analysis strategies]
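The labels on this slide suggest a loop in which a plugin defines a search space, the search generates scenarios, and each scenario is executed and measured. The following is a minimal sketch of such a loop with an exhaustive search. It is not the PTF plugin API: the parameter names, the function names and the synthetic objective are all assumptions made for illustration.

```python
import itertools

# Hypothetical search space: parameter names and value ranges are made up
# for illustration and are not PTF identifiers.
SEARCH_SPACE = {
    "omp_threads": [4, 8, 16],
    "cpu_freq_ghz": [1.2, 1.8, 2.4],
}


def create_scenarios(search_space):
    """Enumerate every combination of tuning parameter values (exhaustive search)."""
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def run_tuning_step(search_space, execute_scenario):
    """Run one tuning step: execute each scenario and keep the best objective.

    `execute_scenario` stands in for applying the tuning actions and measuring
    the application through the monitor; lower objective values are better.
    """
    best_scenario, best_value = None, float("inf")
    for scenario in create_scenarios(search_space):
        value = execute_scenario(scenario)
        if value < best_value:
            best_scenario, best_value = scenario, value
    return best_scenario, best_value


def toy_objective(scenario):
    """Synthetic stand-in for a measurement, minimal at 8 threads and 1.8 GHz."""
    return (scenario["omp_threads"] - 8) ** 2 + (scenario["cpu_freq_ghz"] - 1.8) ** 2


best, value = run_tuning_step(SEARCH_SPACE, toy_objective)
```

With two parameters of three values each, the exhaustive search executes nine scenarios; the cost grows with the product of the value counts, which is why the later slides also mention model-based and individual searches.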

15 Tuning Plugins
MPI parameters: eager limit, buffer space, collective algorithms; applied via application restart or the MPI_T tools interface
DVFS: frequency tuning for the energy delay product; model-based prediction of frequency; region-level tuning
Parallelism capping: thread number tuning for the energy delay product; exhaustive and curve-fitting-based prediction
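The DVFS and parallelism capping plugins both optimize the energy delay product (EDP), the product of consumed energy and execution time. As a hedged illustration of why EDP is used as the objective, the sketch below picks the best frequency from a small table of measurements. The numbers are invented for this example and do not come from the talk.

```python
# Energy delay product (EDP) = energy * execution time; lower is better.
# The measurements below are invented for illustration only.
MEASUREMENTS = {
    # frequency in GHz: (execution time in s, energy in J)
    1.2: (20.0, 1500.0),
    1.6: (16.0, 1400.0),
    2.0: (14.0, 1500.0),
    2.4: (13.0, 1800.0),
}


def energy_delay_product(time_s, energy_j):
    """Combine energy and run time into a single objective."""
    return energy_j * time_s


def best_frequency(measurements):
    """Return the frequency whose measurement minimizes the EDP."""
    return min(measurements, key=lambda f: energy_delay_product(*measurements[f]))


if __name__ == "__main__":
    for freq, (time_s, energy_j) in sorted(MEASUREMENTS.items()):
        print(f"{freq} GHz: EDP = {energy_delay_product(time_s, energy_j):.0f} Js")
    print("best:", best_frequency(MEASUREMENTS), "GHz")
```

In this made-up table the fastest run is at 2.4 GHz and the lowest energy is at 1.6 GHz, yet the EDP selects 2.0 GHz, which balances the two.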


16 Tuning Plugins
Master/worker: partition factor and number of workers; prediction through a performance model based on data measured in pre-analysis
Parallel pattern: tuning replication and buffers between pipeline stages; based on component distribution via StarPU
OpenCL tuning: compiler flags for offline compilation; NDRange tuning


17 Tuning Plugins
MPI IO: tuning data sieving and number of aggregators; exhaustive and model-based
Compiler flag selection: automatic recompilation and execution; selective recompilation based on pre-analysis; exhaustive and individual search; scenario analysis for significant routines; combination with Pathway
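The difference between the exhaustive and individual searches named for compiler flag selection can be sketched as follows. This is an assumption about how such searches are commonly defined, not the plugin's code: the exhaustive search tries every flag combination, while the individual search tunes one flag group at a time and keeps the best choice found so far. The flag groups and the runtime model are hypothetical.

```python
import itertools

# Hypothetical flag groups; the GCC-style spellings are examples, not the
# flags used in the talk. An empty string means "flag not set".
FLAG_GROUPS = [
    ["-O1", "-O2", "-O3"],
    ["", "-funroll-loops"],
    ["", "-ftree-vectorize"],
]


def exhaustive_search(groups, measure):
    """Evaluate every combination; cost is the product of the group sizes."""
    candidates = list(itertools.product(*groups))
    return min(candidates, key=measure), len(candidates)


def individual_search(groups, measure):
    """Tune one group at a time; cost is the sum of the group sizes."""
    current = [group[0] for group in groups]
    evaluations = 0
    for i, group in enumerate(groups):
        best_choice, best_value = current[i], float("inf")
        for choice in group:
            trial = current[:i] + [choice] + current[i + 1:]
            value = measure(tuple(trial))
            evaluations += 1
            if value < best_value:
                best_choice, best_value = choice, value
        current[i] = best_choice
    return tuple(current), evaluations


def toy_runtime(flags):
    """Synthetic runtime in seconds; a real plugin would recompile and run."""
    runtime = {"-O1": 12.0, "-O2": 10.0, "-O3": 9.5}[flags[0]]
    if "-funroll-loops" in flags:
        runtime -= 0.5
    if "-ftree-vectorize" in flags:
        runtime -= 1.0
    return runtime


exhaustive_best, exhaustive_cost = exhaustive_search(FLAG_GROUPS, toy_runtime)
individual_best, individual_cost = individual_search(FLAG_GROUPS, toy_runtime)
```

Here both searches agree because the toy model treats the flags as independent, and the individual search needs 7 runs instead of 12. When flags interact, the individual search can miss the best combination, which is the trade-off for its lower cost.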


18 Plugin Evaluation

19 Variation of Energy Measurements

20 Predicted vs Measured Time for SeisSol

21 Tuning with the Periscope Tuning Framework

22 Application Dynamism: Beyond Static Tuning

23 Inter-phase Dynamism [figure: all-to-all performance over 2048 phases, PEPC benchmark of the DEISA Benchmark Suite]

24 Scenario-Based Tuning. Design time analysis with the Periscope Tuning Framework (PTF) produces the tuning model; runtime tuning with the READEX Runtime Library (RRL) applies it


25 Design Time Analysis
Tuning model: Scenarios: sets of runtime situations (rts); Classifiers: RTS → S; Selector: Context → CFG
Tuning cycle: captures intra-phase dynamism; creates the phase TM
Sequence of tuning cycles: captures inter-phase dynamism; creates the inter-phase TM
DTA for multiple inputs: captures input dynamism; creates the application TM
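One way to picture the tuning model described above is as two lookup tables: a classifier that maps each runtime situation (rts) to a scenario, and a selector that maps each scenario to a configuration. The sketch below builds both from per-rts results, assuming that rts's with identical best configurations fall into the same scenario. That grouping rule, the region names and the parameter names are assumptions for illustration, not taken from the slides.

```python
# Hypothetical design-time results: the best configuration found for each
# runtime situation (rts), keyed by the region that identifies its context.
BEST_CONFIG_PER_RTS = {
    "compute_forces": {"cpu_freq_ghz": 2.4, "omp_threads": 16},
    "exchange_halo": {"cpu_freq_ghz": 1.2, "omp_threads": 4},
    "update_positions": {"cpu_freq_ghz": 2.4, "omp_threads": 16},
}


def build_tuning_model(best_config_per_rts):
    """Group rts's with identical best configurations into scenarios.

    Returns (classifier, selector):
      classifier: rts context -> scenario id
      selector:   scenario id -> configuration
    """
    classifier, selector, scenario_of_config = {}, {}, {}
    for context, config in best_config_per_rts.items():
        key = tuple(sorted(config.items()))
        if key not in scenario_of_config:
            scenario_id = len(scenario_of_config)
            scenario_of_config[key] = scenario_id
            selector[scenario_id] = dict(config)
        classifier[context] = scenario_of_config[key]
    return classifier, selector


classifier, selector = build_tuning_model(BEST_CONFIG_PER_RTS)
```

Three rts's collapse into two scenarios here, so the model stored for runtime use stays smaller than the raw list of measurements.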

26 Runtime Tuning with the READEX Runtime Library
Enter phase: capture phase identifiers
Enter significant region: classify rts; apply selector; perform switching
Exit significant region: save objective value
Exit phase: perform calibration
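The four events on this slide can be sketched as handlers on a small class. This is a toy stand-in and not the RRL API: the class, its methods and the region names are hypothetical, and the calibration step is reduced to reporting rts's that the tuning model does not cover, which is an assumption about what calibration has to deal with.

```python
class RuntimeTuner:
    """Toy stand-in for a runtime tuning library; not the RRL API."""

    def __init__(self, classifier, selector, default_config):
        self.classifier = classifier      # rts (region name) -> scenario id
        self.selector = selector          # scenario id -> configuration
        self.current_config = dict(default_config)
        self.phase_id = None
        self.switches = 0
        self.objectives = []
        self.unseen = set()

    def enter_phase(self, phase_id):
        """Capture the phase identifier."""
        self.phase_id = phase_id

    def enter_region(self, region):
        """Classify the rts, apply the selector, switch if needed."""
        scenario = self.classifier.get(region)
        if scenario is None:
            self.unseen.add(region)       # not in the tuning model
            return
        config = self.selector[scenario]
        if config != self.current_config:
            self.current_config = dict(config)
            self.switches += 1

    def exit_region(self, region, objective_value):
        """Save the objective value measured for this rts."""
        self.objectives.append((self.phase_id, region, objective_value))

    def exit_phase(self):
        """Calibration placeholder: return the rts's that still need tuning."""
        pending = sorted(self.unseen)
        self.unseen.clear()
        return pending


tuner = RuntimeTuner(
    classifier={"compute_forces": 0, "exchange_halo": 1},
    selector={0: {"cpu_freq_ghz": 2.4}, 1: {"cpu_freq_ghz": 1.2}},
    default_config={"cpu_freq_ghz": 2.4},
)
tuner.enter_phase(0)
for region, energy_j in [("compute_forces", 50.0), ("exchange_halo", 8.0),
                         ("write_output", 3.0)]:
    tuner.enter_region(region)
    tuner.exit_region(region, energy_j)
pending = tuner.exit_phase()
```

In this run only `exchange_halo` triggers a switch, because `compute_forces` already matches the default configuration, and `write_output` is returned as pending since the model has no entry for it.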

27 RRL Architecture [diagram labels: Score-P (Online Access Interface, Substrate Plugin Interface, Metrics, Region Identifier, Input Identifier); RRL Substrate Plugin (Scenario Detection, Scenario Switching, Calibration, Parameter Control, Tuning Plugin Service); tuning plugins (MPI, OpenMP, Compiler); Application Tuning Model]

28 Validation and project goals
Goal: validate the effect of READEX using real-world applications
Co-design process: hand-tune selected applications; compare results with automatic static and dynamic tuning
Energy measurements using the HDEEM infrastructure

29 Conclusion
Energy efficiency at exascale: application developers and users will have to care
Lack of capabilities: awareness, expertise, resources
Proposed solution, READEX: exploit dynamism; detect at design time, exploit at run-time; tools-aided autotuning methodology

30 Thank you! Questions?
