Recent Advances in Periscope for Performance Analysis and Tuning
Isaias Compres, Michael Firbach, Michael Gerndt, Robert Mijakovic, Yury Oleynik, Ventsislav Petkov
Technische Universität München
Yury Oleynik, oleynik@in.tum.de
Outline
- Periscope overview
- Advances in Periscope Development
  I. PAThWay
  II. Performance Dynamics Analysis with Periscope
  III. Periscope Tuning Framework
30.08.2013 Yury Oleynik, oleynik@in.tum.de
Projects
- LMAC: Leistungsdynamik massiv-paralleler Codes (Performance Dynamics of Massively Parallel Codes), BMBF project
- AutoTune: Automatic Online Tuning, European Union FP7 project
Periscope overview
- Distributed architecture: analysis is performed by multiple distributed hierarchical agents
- Iterative online analysis: measurements are configured, obtained and evaluated on the fly
- Automatic analysis: based on formalized knowledge of performance optimization experts
- Eclipse integration: Eclipse-based integrated development and performance analysis environment
- Measurement and instrumentation: Score-P or MRIMonitor
Advances in Periscope Development
- Performance dynamics
  - Cross-experiment performance dynamics: provide a tool for automating and organizing performance experiments during the optimization process
  - Runtime performance dynamics: automatically search for runtime performance dynamics properties
- Performance tuning
  - Automatically search for the application configuration that delivers the best performance according to a given objective
I. Cross-experiment performance dynamics
PAThWay
Problem statement
Performance engineering:
- Performance engineering is an iterative cycle
- Requires in-depth knowledge of hardware and software
- Each step may involve many tools and different configurations
- Repetitive and manual
- Optimization spans months
- Hard to organize data and results
- No clear track of process evolution
Examples:
- Scalability analysis
- Cross-platform analysis
[Figure: performance engineering cycle with the steps: establish/update baseline, execute parallel application, monitor performance, analyze bottlenecks, optimize problematic code sections, verify]
PAThWay
Eclipse plug-in for structured and methodical performance engineering using workflows
Goals:
- Manage individual tasks as part of one workflow
- Automate performance engineering tasks where possible
- Keep track of and organize the process
- Abstract the complexity of the underlying software and hardware
Workflow Editor
[Screenshot: workflow editor and the available workflow components]
Experiment Browser
- The database also stores properties of the tools
[Screenshot labels: experiments view; standard output and environment configuration; experiment metadata]
Project Documentation
- Accessible documentation is important
  - Requirements
  - Work progress
  - Optimization ideas
- Commonly spread across multiple documents
- Wiki-based editor
  - Completed experiments
  - Links to other external resources
  - Other wiki pages
Supportive Modules
- Parallel Tools Platform module
  - Starting interactive/batch jobs
  - Monitoring execution and accessing data
- Code management
  - Keeps snapshots of the sources
  - Based on Git
- Environment detection
  - Detects loaded modules
  - Copies defined environment variables...
PAThWay
- Available as an Eclipse plug-in from the update site: http://periscope.in.tum.de/pathway/eclipse/
- Installation guide: http://periscope.in.tum.de/pathway/
II. Performance Dynamics: at runtime
Automatic Performance Dynamics Analysis with Periscope
Automatic Performance Dynamics Analysis with Periscope
Motivation for performance dynamics analysis:
- The location and severity of performance bottlenecks are time-dependent
- Performance changes manifest themselves at various time scales
- The dimensionality of performance measurements makes manual investigation by the user tedious
Analysis goals:
- Automatically detect changes in temporal performance behavior
- Quantify the negative impact of performance changes
- Reduce the complexity and size of time-dependent measurements
- Simplify comprehension (no graphical visualization)
- Group entities with similar temporal performance behavior
Automatic Performance Dynamics Analysis with Periscope
Helps answer the following typical questions:
- Does the performance degrade over time?
- When is the degradation observed?
- What is the impact of a particular change?
- Which process/location is impacted by the performance degradation?
- Are similar degradations found in other processes or functions?
Approach:
- Multi-scale analysis
- Qualitative abstraction of time series with quantitative information sufficient to characterize the impact
- Representation mimics the human mental model of temporal behavior
- Automatic search for performance dynamics properties
Automatic Performance Dynamics Analysis with Periscope: Analysis Steps
1. Measurement
   a) Collect dynamic profile time series using Score-P
2. Preprocessing
   a) Perform scale-space filtering by filtering with a Gaussian
   b) Extract extrema and inflection points
3. Qualitative abstraction
   a) Track extrema and inflection points from coarse to fine scales
   b) Label intervals between extrema and inflection points
   c) Extract the maximum lifetime level of the resulting tree of intervals
4. Search for performance dynamics properties
   a) Search the maximum lifetime level for predefined patterns, both qualitatively and quantitatively
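The preprocessing step (2a, 2b) can be illustrated with a small pure-Python sketch. It is not Periscope's implementation: the function names, the 3-sigma kernel truncation, the border clamping and the use of finite differences to locate extrema and inflection points are all assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Normalized discrete Gaussian kernel, truncated at 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(series, sigma):
    """Convolve the series with a Gaussian; indices are clamped at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(series)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * series[idx]
        out.append(acc)
    return out

def sign_changes(values):
    """Indices i where values[i-1] and values[i] have strictly opposite signs."""
    return [i for i in range(1, len(values)) if values[i - 1] * values[i] < 0]

def extrema_and_inflections(series):
    """Extrema: sign changes of the first difference.
    Inflection points: sign changes of the second difference."""
    d1 = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    d2 = [d1[i + 1] - d1[i] for i in range(len(d1) - 1)]
    return sign_changes(d1), sign_changes(d2)

def scale_space(series, sigmas):
    """Map each scale (sigma) to the extrema and inflections found at that scale."""
    return {s: extrema_and_inflections(smooth(series, s)) for s in sigmas}
```

Calling `scale_space(series, [1.0, 8.0])` on a noisy time series would typically show many extrema at the fine scale and only the dominant ones at the coarse scale, which is what makes coarse-to-fine tracking (step 3a) possible.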
Automatic Performance Dynamics Analysis with Periscope: Analysis Steps
[Figure: tree of labeled intervals across scales; label sequence: DABCBCDABCDABCDABCDABC]
Interval labels:
A - concave increase
B - concave decrease
C - convex decrease
D - convex increase
E - linear increase
F - linear decrease
G - constant
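The seven labels correspond to the signs of the slope and the curvature of the series. The sketch below classifies each interior sample by its central first and second differences and merges runs of equal labels. This pointwise scheme, the function names and the tolerance `eps` are assumptions for illustration; Periscope labels intervals between tracked extrema and inflection points instead.

```python
def label_point(d1, d2, eps=1e-9):
    """Map slope d1 and curvature d2 to one of the labels A..G."""
    if abs(d1) <= eps:
        return "G"                      # constant
    rising = d1 > 0
    if abs(d2) <= eps:
        return "E" if rising else "F"   # linear increase / linear decrease
    if d2 < 0:
        return "A" if rising else "B"   # concave increase / concave decrease
    return "D" if rising else "C"       # convex increase / convex decrease

def label_series(series, eps=1e-9):
    """Label every interior sample, then merge runs of identical labels."""
    labels = []
    for i in range(1, len(series) - 1):
        d1 = (series[i + 1] - series[i - 1]) / 2.0
        d2 = series[i + 1] - 2.0 * series[i] + series[i - 1]
        labels.append(label_point(d1, d2, eps))
    merged = []
    for lab in labels:
        if not merged or merged[-1] != lab:
            merged.append(lab)
    return "".join(merged)
```

For example, one full period of a sine wave sampled between its zero crossings and extrema yields `ABCD`: a peak (AB) followed by a valley (CD).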
Automatic Performance Dynamics Analysis with Periscope: Search for dynamics properties
Search for dynamic properties in the label sequence DABCBCDABCDABCDABCDABC:
- Find all peaks (AB)
- Find the most prominent valley (CD)
- Find the highest increase (DA)
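Searching the label sequence for such patterns amounts to substring matching, with the quantitative information deciding which match is the strongest. The following sketch assumes that each labeled interval carries a value change (`deltas`, hypothetical) and that the caller supplies the scoring rule; neither is taken from Periscope.

```python
def find_pattern(labels, pattern):
    """Start indices of every occurrence of pattern in the label string."""
    hits = []
    start = labels.find(pattern)
    while start != -1:
        hits.append(start)
        start = labels.find(pattern, start + 1)
    return hits

def strongest(labels, deltas, pattern, score):
    """Return (index, score) of the best-scoring occurrence, or None.

    deltas[i] is the value change over interval i (hypothetical input);
    score maps the deltas of one occurrence to a number, larger is stronger.
    """
    best = None
    for i in find_pattern(labels, pattern):
        s = score(deltas[i:i + len(pattern)])
        if best is None or s > best[1]:
            best = (i, s)
    return best

labels = "DABCBCDABCDABCDABCDABC"
peaks = find_pattern(labels, "AB")    # [1, 7, 11, 15, 19]
valleys = find_pattern(labels, "CD")  # [5, 9, 13, 17]
rises = find_pattern(labels, "DA")    # [0, 6, 10, 14, 18]
```

With deltas available, the highest increase could be selected as `strongest(labels, deltas, "DA", sum)`, and the most prominent valley with a score such as the smaller of the drop and the subsequent rise.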
III. Performance tuning
Periscope Tuning Framework
Periscope Tuning Framework
Goals:
- Tune codes to improve performance and energy efficiency
- Combine analysis and tuning to speed up the tuning process
- Support multicore and GPU-accelerated parallel systems
Idea:
- Automatically evaluate the optimization space
- Produce a tuning recommendation
- Use it to improve production runs
PTF: Approach
- Define tuning strategies combining the performance analysis infrastructure and tuning plugins
- Measured performance and energy properties are used in the plugins to navigate the search for the optimal configuration
Available tuning plugins focus on:
- Tuning of High-Level Patterns for GPGPU
- Tuning of HMPP Codelets
- Tuning of Energy Consumption via CPU Frequency
- Tuning of the Master-Worker Pattern in MPI
- Tuning of MPI Runtime
- Tuning of Compiler Flag Selection
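The general idea of evaluating a space of tuning points against a measured objective can be sketched as an exhaustive search. This is a generic illustration, not the PTF plugin interface: `exhaustive_search`, the `measure` callback and the parameter names in the usage note are hypothetical.

```python
import itertools

def exhaustive_search(tuning_space, measure):
    """Evaluate every configuration and return the one with the lowest objective.

    tuning_space: dict mapping a tuning-point name to its candidate values.
    measure: callable taking a configuration dict and returning the measured
             objective (for example time or energy); smaller is better.
    """
    names = sorted(tuning_space)
    best_cfg, best_val = None, None
    for values in itertools.product(*(tuning_space[n] for n in names)):
        cfg = dict(zip(names, values))
        val = measure(cfg)
        if best_val is None or val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val
```

A call such as `exhaustive_search({"threads": [1, 2, 4, 8], "freq_ghz": [1.2, 2.0, 2.7]}, measure)` would run `measure` twelve times. The cost grows with the product of the value counts, which is one reason to let measured properties guide the search rather than enumerate everything.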
Tuning of High-Level Patterns for GPGPU
Target applications:
- Applications implemented in the pipeline patterns framework (developed in the PEPPHER project)
Tuning objective:
- Optimize the throughput of the pipeline
Tuning points and tuning actions:
- Replication factors of individual stages
- Buffer sizes of input and output ports of individual stages
- Splitting and merging of the stages
Tuning of HMPP Codelets
Target applications:
- OpenHMPP-annotated applications
- To be run on heterogeneous many-core architectures
Tuning objective:
- Optimize the performance of HMPP codelets
Tuning points and tuning actions:
- Static codelet tuning points: operations, transformations and algorithms used to implement a codelet, e.g. the unrolling factor or the HMPP grid size
- Dynamic codelet tuning points: variables or callbacks available at runtime
Tuning of Energy Consumption via CPU Frequency
Target applications:
- Any application running on the thin-node islands of SuperMUC
Tuning objective:
- Minimize the energy consumption of an application
Tuning points and tuning actions:
- Available governors or direct frequency settings
Tuning of the Master-Worker Pattern in MPI
Target applications:
- Applications implemented with the master-worker pattern
Tuning objective:
- Improve load balancing
Tuning points and tuning actions:
- Partition factor
- Number of workers
Tuning of MPI Runtime
Target applications:
- Currently, parallel applications built with IBM MPI
Tuning objective:
- Optimize performance
Tuning points and tuning actions:
- MPI environment parameters
- MPI application mapping
  - Adapting tasks per node/core
  - Adapting the affinity of the processes
- MPI communication buffer/protocol
  - Adapting the sending/receiving buffer
  - Analyzing the size pattern of the messages
  - Adapting the communication protocol (eager/rendezvous)
- Code variants for MPI communication
Tuning of Compiler Flag Selection
Target applications:
- Any application
Tuning objective:
- Reduce the execution time of the application's phase region
Tuning points and tuning actions:
- Individual flags of the compiler
- Switching compiler switches ON or OFF during recompilation
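Treating each compiler switch as an ON/OFF tuning point gives a search space of 2^n combinations for n flags. The sketch below enumerates that space and assembles, without running, one compile command per combination. The helper names and the `time_phase` callback are hypothetical and do not reflect the plugin's actual interface; the GCC flags in the usage note are only examples.

```python
import itertools

def flag_combinations(flags):
    """Yield every ON/OFF assignment as the tuple of flags switched ON."""
    for mask in itertools.product((False, True), repeat=len(flags)):
        yield tuple(f for f, on in zip(flags, mask) if on)

def build_command(compiler, base_flags, enabled, source):
    """Assemble (but do not execute) the compile command for one combination."""
    return [compiler, *base_flags, *enabled, source]

def pick_best(flags, time_phase):
    """Return the combination with the smallest phase time.

    time_phase: callable taking the tuple of enabled flags and returning the
    measured execution time of the phase region after recompilation.
    """
    return min(flag_combinations(flags), key=time_phase)
```

For instance, `flag_combinations(["-funroll-loops", "-ftree-vectorize"])` yields four combinations, from no flag enabled to both enabled, and `build_command("gcc", ["-O2"], combo, "app.c")` gives the corresponding command line for each.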
Thank you! Questions?