Recent Advances in Periscope for Performance Analysis and Tuning


Isaias Compres, Michael Firbach, Michael Gerndt, Robert Mijakovic, Yury Oleynik, Ventsislav Petkov
Technische Universität München
Contact: Yury Oleynik, oleynik@in.tum.de

Outline
- Periscope overview
- Advances in Periscope development
  I. PAThWay
  II. Performance dynamics analysis with Periscope
  III. Periscope Tuning Framework

30.08.2013 Yury Oleynik, oleynik@in.tum.de

Projects
- LMAC: Leistungsdynamik massiv-paralleler Codes (Performance Dynamics of Massively Parallel Codes), BMBF project
- AutoTune: Automatic Online Tuning, European Union FP7 project

Periscope overview
- Distributed architecture: analysis is performed by multiple distributed, hierarchically organized agents
- Iterative online analysis: measurements are configured, obtained and evaluated on the fly
- Automatic analysis: based on the formalized knowledge of performance optimization experts
- Eclipse integration: Eclipse-based integrated development and performance analysis environment
- Measurement and instrumentation: Score-P or MRIMonitor
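
The idea of distributed hierarchical agents can be illustrated with a minimal sketch, assuming a simple tree in which leaf agents hold local findings and every higher-level agent merges the reports of its children. The `Agent` class, the property names and the "keep the worst severity" merge rule are hypothetical illustrations, not Periscope's actual agent protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical analysis agent in a tree; leaf agents hold local results."""
    name: str
    children: list[Agent] = field(default_factory=list)
    # Severity per performance property; only used by leaf agents.
    local: dict[str, float] = field(default_factory=dict)

    def report(self) -> dict[str, float]:
        """Merge child reports, keeping the worst severity seen per property."""
        if not self.children:
            return dict(self.local)
        merged: dict[str, float] = {}
        for child in self.children:
            for prop, severity in child.report().items():
                merged[prop] = max(severity, merged.get(prop, float("-inf")))
        return merged


# Usage: two leaf agents (hypothetical property names) under one master agent.
leaf_a = Agent("leaf-0", local={"mpi_wait": 0.2})
leaf_b = Agent("leaf-1", local={"mpi_wait": 0.5, "l2_misses": 0.1})
master = Agent("master", children=[leaf_a, leaf_b])
print(master.report())  # {'mpi_wait': 0.5, 'l2_misses': 0.1}
```

Merging upward keeps the amount of data each agent forwards small, which is the usual motivation for a hierarchical rather than a flat, centralized analysis.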

Advances in Periscope development
- Performance dynamics
  - Cross-experiment performance dynamics: a tool for automating and organizing performance experiments during the optimization process
  - Runtime performance dynamics: automatic search for runtime performance dynamics properties
- Performance tuning
  - Automatic search for the application configuration that delivers the best performance according to a given objective

I. Cross-experiment performance dynamics: PAThWay

Problem statement: performance engineering
- Performance engineering is an iterative cycle: establish/update a baseline, execute the parallel application, monitor performance, analyze bottlenecks, optimize problematic code sections, verify
- It requires in-depth knowledge of hardware and software
- Each step may involve many tools and different configurations
- The work is repetitive and manual
- Optimization can span months
- Data and results are hard to organize
- There is no clear record of how the process evolved
- Examples: scalability analysis, cross-platform analysis

PAThWay
Eclipse plug-in for structured and methodical performance engineering using workflows
Goals:
- Manage individual tasks as part of one workflow
- Automate performance engineering tasks where possible
- Keep track of and organize the process
- Abstract the complexity of the underlying software and hardware


Workflow Editor
(Screenshot: workflow editor and available workflow components)

Experiment Browser
- The database also stores the properties of the tools
(Screenshot: experiments view, standard output and environment configuration, experiment meta-data)

Project Documentation
- Accessible documentation is important: requirements, work progress, optimization ideas
- This information is commonly spread across multiple documents
- Wiki-based editor with links to completed experiments, other external resources and other wiki pages

Supportive Modules
- Parallel Tools Platform module: starting interactive/batch jobs, monitoring execution and accessing data
- Code management: keeps snapshots of the sources, based on Git
- Environment detection: detects loaded modules, copies defined environment variables...
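
Environment detection of this kind can be sketched as follows, assuming the system uses the Environment Modules tool, which records the currently loaded modules in the colon-separated `LOADEDMODULES` variable. The function name, the module names and the list of variables to copy are hypothetical; this is not PAThWay's implementation.

```python
import os
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def detect_environment(
    keep: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the loaded modules and a copy of the selected variables.

    Assumption: loaded modules are listed, colon-separated, in LOADEDMODULES.
    """
    env = os.environ if env is None else env
    modules = [m for m in env.get("LOADEDMODULES", "").split(":") if m]
    # Copy only the variables the user asked for and that are actually set.
    copied = {name: env[name] for name in keep if name in env}
    return modules, copied


# Usage with a fake environment (hypothetical module names).
fake_env = {"LOADEDMODULES": "mpi.ibm/1.3:papi/5.1", "OMP_NUM_THREADS": "8", "HOME": "/home/u"}
print(detect_environment(["OMP_NUM_THREADS", "MP_EAGER_LIMIT"], fake_env))
# (['mpi.ibm/1.3', 'papi/5.1'], {'OMP_NUM_THREADS': '8'})
```

Storing this snapshot next to each experiment is what makes runs comparable later, since a changed module version can explain a performance difference on its own.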

PAThWay availability
- Available as an Eclipse plug-in from the update site: http://periscope.in.tum.de/pathway/eclipse/
- Installation guide: http://periscope.in.tum.de/pathway/

II. Runtime performance dynamics: automatic performance dynamics analysis with Periscope

Automatic Performance Dynamics Analysis with Periscope
Motivation for performance dynamics analysis:
- The location and severity of performance bottlenecks are time-dependent
- Performance changes manifest themselves at various time scales
- The dimensionality of performance measurements makes manual investigation tedious for the user
Analysis goals:
- Automatically detect changes in temporal performance behavior
- Quantify the negative impact of performance changes
- Reduce the complexity and size of time-dependent measurements
- Simplify comprehension (no graphical visualization)
- Group entities with similar temporal performance behavior

Automatic Performance Dynamics Analysis with Periscope
Helps to answer the following typical questions:
- Does the performance degrade over time?
- When is the degradation observed?
- What is the impact of a particular change?
- Which process/location is affected by the performance degradation?
- Are similar degradations found in other processes or functions?
Approach:
- Multi-scale analysis
- Qualitative abstraction of the time series, keeping enough quantitative information to characterize the impact
- A representation that mimics the human mental model of temporal behavior
- Automatic search for performance dynamics properties

Automatic Performance Dynamics Analysis with Periscope: Analysis Steps
1. Measurement
   a) Collect dynamic profile time series using Score-P
2. Preprocessing
   a) Perform scale-space filtering by smoothing with Gaussian kernels
   b) Extract extrema and inflection points
3. Qualitative abstraction
   a) Track extrema and inflection points from coarse to fine scales
   b) Label the intervals between extrema and inflection points
   c) Extract the maximum-lifetime level of the resulting tree of intervals
4. Search for performance dynamics properties
   a) Search the maximum-lifetime level for predefined patterns, both qualitatively and quantitatively
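
The preprocessing step (2a, 2b) can be sketched in plain Python, assuming a uniformly sampled time series: the series is convolved with Gaussians of several widths, and at each scale extrema are taken where the first difference changes sign and inflection points where the second difference changes sign. The kernel truncation at 3 sigma, the border renormalization and all function names are choices made for this illustration; the tracking across scales (step 3a) is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_smooth(xs: Sequence[float], sigma: float) -> List[float]:
    """Convolve xs with a Gaussian of width sigma (kernel truncated at 3 sigma)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(xs)):
        acc = weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(xs):  # at the borders, renormalize the truncated kernel
                acc += w * xs[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def critical_points(ys: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Extrema (sign change of the 1st difference) and approximate
    inflection points (sign change of the 2nd difference)."""
    d1 = [b - a for a, b in zip(ys, ys[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    extrema = [i + 1 for i in range(len(d1) - 1) if d1[i] * d1[i + 1] < 0]
    inflections = [i + 1 for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]
    return extrema, inflections


def scale_space(xs: Sequence[float], sigmas: Sequence[float]) -> Dict[float, Tuple[List[int], List[int]]]:
    """Critical points of the smoothed series at each scale, coarse scales first."""
    return {s: critical_points(gaussian_smooth(xs, s)) for s in sorted(sigmas, reverse=True)}


# Usage: a sine wave with period 20 samples has extrema at 5, 15, 25, 35.
series = [math.sin(2 * math.pi * i / 20) for i in range(40)]
print(critical_points(series)[0])  # [5, 15, 25, 35]
```

Coarse scales suppress short-lived fluctuations, so features that survive many scales (a long "lifetime") are the ones worth reporting.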

Automatic Performance Dynamics Analysis with Periscope: Analysis Steps (continued)
Example label sequence: DABCBCDABCDABCDABCDABC
(Figure: tree of labeled intervals across scales)
Labels:
A - concave increase
B - concave decrease
C - convex decrease
D - convex increase
E - linear increase
F - linear decrease
G - constant
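
The label alphabet above follows from two signs: the slope (increase, decrease, constant) and the curvature (concave, convex, linear). A minimal sketch of step 3b, assuming the breakpoints (extrema and inflection points) are already known, labels each interval by the sign of its overall change and the sign of the summed discrete second derivative over its interior. The function names and the tolerance `eps` are illustrative assumptions.

```python
import math
from typing import Sequence


def label(slope: float, curvature: float, eps: float = 1e-12) -> str:
    """Map the signs of slope and curvature to the letters A-G."""
    if abs(slope) <= eps:
        return "G"                      # constant
    rising = slope > 0
    if abs(curvature) <= eps:
        return "E" if rising else "F"   # linear increase / decrease
    if curvature < 0:
        return "A" if rising else "B"   # concave increase / decrease
    return "D" if rising else "C"       # convex increase / decrease


def label_intervals(ys: Sequence[float], breakpoints: Sequence[int]) -> str:
    """Label each interval [a, b] between consecutive breakpoints."""
    labels = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        slope = ys[b] - ys[a]
        # Summed discrete second derivative over the interior points of [a, b].
        curvature = sum(ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(a + 1, b))
        labels.append(label(slope, curvature))
    return "".join(labels)


# Usage: two periods of a sine wave, split at its extrema and inflection points.
wave = [math.sin(2 * math.pi * i / 20) for i in range(41)]
print(label_intervals(wave, list(range(0, 41, 5))))  # ABCDABCD
```

A rising half-wave before its peak is concave (A), after the peak it falls concavely (B), then convexly (C), and it climbs out of the trough convexly (D), which is why periodic behavior shows up as a repeating letter pattern.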

Automatic Performance Dynamics Analysis with Periscope: Search for Dynamics Properties
Examples of searching the label sequence DABCBCDABCDABCDABCDABC:
- Find all peaks (AB)
- Find the most prominent valley (CD)
- Find the highest increase (DA)
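
Because the abstraction is a string of labels, the qualitative part of the search can be sketched as substring matching, and the quantitative part as ranking the matches by the value changes of the matched intervals. The per-interval changes (`deltas`) and the two scoring rules below (valley prominence as the smaller of descent and ascent, increase as the summed rise) are hypothetical choices for this illustration.

```python
import re
from typing import Callable, List, Optional, Sequence


def find_all(labels: str, pattern: str) -> List[int]:
    """Start indices of every occurrence of pattern (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", labels)]


def strongest(labels: str, deltas: Sequence[float], pattern: str,
              score: Callable[[Sequence[float]], float]) -> Optional[int]:
    """Occurrence of pattern whose interval value changes score highest."""
    hits = find_all(labels, pattern)
    if not hits:
        return None
    return max(hits, key=lambda s: score(deltas[s:s + len(pattern)]))


labels = "DABCBCDABCDABCDABCDABC"
# Hypothetical value change of each labeled interval (D/A rise, B/C fall).
deltas = [2, 1, -1, -2, -1, -1, 1, 1, -1, -2, 3, 4,
          -1, -1, 1, 1, -1, -1, 1, 1, -1, -1]

print(find_all(labels, "AB"))                                   # [1, 7, 11, 15, 19]
print(strongest(labels, deltas, "CD", lambda d: min(-d[0], d[1])))  # 9
print(strongest(labels, deltas, "DA", sum))                     # 10
```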

III. Performance tuning: Periscope Tuning Framework

Periscope Tuning Framework
Goals:
- Tune codes to improve performance and energy efficiency
- Combine analysis and tuning to speed up the tuning process
- Support multicore and GPU-accelerated parallel systems
Idea:
- Automatically evaluate the optimization space
- Produce a tuning recommendation
- Use it to improve production runs

PTF: Approach
- Define tuning strategies that combine the performance analysis infrastructure with tuning plugins
- Plugins use the measured performance and energy properties to navigate the search for the optimal configuration
- Available tuning plugins focus on:
  - Tuning of high-level patterns for GPGPU
  - Tuning of HMPP codelets
  - Tuning of energy consumption via CPU frequency
  - Tuning of the master-worker pattern in MPI
  - Tuning of the MPI runtime
  - Tuning of compiler flag selection
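
The baseline that such plugins improve on is exhaustive evaluation of the optimization space. A minimal sketch, assuming each tuning point has a finite list of values and that an objective function returns a measured cost (lower is better): every combination (scenario) is evaluated and the best one becomes the recommendation. The names and the toy objective are hypothetical; in PTF, analysis results are used to avoid evaluating the whole space.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Config = Dict[str, object]


def scenarios(tuning_points: Dict[str, List[object]]) -> Iterator[Config]:
    """Cartesian product of all values of all tuning points."""
    names = list(tuning_points)
    for values in product(*(tuning_points[n] for n in names)):
        yield dict(zip(names, values))


def exhaustive_search(tuning_points: Dict[str, List[object]],
                      objective: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Evaluate every scenario; return the one with the lowest objective value."""
    best: Optional[Config] = None
    best_value = float("inf")
    for cfg in scenarios(tuning_points):
        value = objective(cfg)  # in practice: an expensive measured experiment
        if value < best_value:
            best, best_value = cfg, value
    return best, best_value


# Usage with a made-up objective standing in for a measured runtime.
points = {"threads": [1, 2, 4], "unroll": [1, 2, 4]}
best, value = exhaustive_search(points, lambda c: 8 / c["threads"] + abs(c["unroll"] - 2))
print(best, value)  # {'threads': 4, 'unroll': 2} 2.0
```

The number of scenarios is the product of the value counts of all tuning points, which is why guiding the search with measured properties matters as soon as more than a few tuning points are involved.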


Tuning of High-Level Patterns for GPGPU
- Target applications: applications implemented in the pipeline patterns framework (developed in the PEPPHER project)
- Tuning objective: optimize the throughput of the pipeline
- Tuning points and tuning actions:
  - Replication factors of individual stages
  - Buffer sizes of the input and output ports of individual stages
  - Splitting and merging of stages
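
Why replication factors matter can be seen with a simplified throughput model, assuming that a stage with per-item service time t and r replicas processes r/t items per second and that the slowest stage bounds the pipeline. The greedy rule that gives each extra replica to the current bottleneck is a hypothetical heuristic for illustration; buffer sizes and stage splitting/merging are ignored, and this is not the plugin's actual strategy.

```python
from typing import List, Sequence


def throughput(service_times: Sequence[float], replication: Sequence[int]) -> float:
    """Items per second of the whole pipeline: the slowest stage limits it."""
    return min(r / t for t, r in zip(service_times, replication))


def tune_replication(service_times: Sequence[float], budget: int) -> List[int]:
    """Start with one replica per stage; give each extra replica to the bottleneck."""
    reps = [1] * len(service_times)
    for _ in range(budget - len(service_times)):
        bottleneck = min(range(len(reps)), key=lambda i: reps[i] / service_times[i])
        reps[bottleneck] += 1
    return reps


# Usage: three stages with made-up service times (seconds per item), 7 replicas total.
times = [1.0, 4.0, 2.0]
reps = tune_replication(times, 7)
print(reps, throughput(times, reps))  # [1, 4, 2] 1.0
```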

Tuning of HMPP Codelets
- Target applications: OpenHMPP-annotated applications to be run on heterogeneous many-core architectures
- Tuning objective: optimize the performance of HMPP codelets
- Tuning points and tuning actions:
  - Static codelet tuning points: operations, transformations and algorithms used to implement a codelet, e.g. the unrolling factor or the HMPP grid size
  - Dynamic codelet tuning points: variables or callbacks available at runtime

Tuning of Energy Consumption via CPU Frequency
- Target applications: any application running on the thin-node islands of SuperMUC
- Tuning objective: minimize the energy consumption of the application
- Tuning points and tuning actions: available governors or direct frequency settings
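
Minimizing energy is not the same as minimizing runtime: energy is average power times runtime, so a lower frequency can win even though the run takes longer. A minimal sketch, assuming one measured (runtime, average power) pair per frequency; the function name and the numbers in the usage example are made up for illustration.

```python
from typing import Dict, Tuple


def best_frequency(measurements: Dict[float, Tuple[float, float]]) -> Tuple[float, float]:
    """measurements: {frequency_GHz: (runtime_s, avg_power_W)} -> (frequency, energy_J)."""
    # Energy = average power x runtime for each candidate frequency.
    energies = {f: runtime * power for f, (runtime, power) in measurements.items()}
    f = min(energies, key=energies.get)
    return f, energies[f]


# Usage with made-up measurements: neither the slowest nor the fastest setting wins.
runs = {1.2: (100.0, 150.0), 1.8: (70.0, 190.0), 2.7: (55.0, 260.0)}
print(best_frequency(runs))  # (1.8, 13300.0)
```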

Tuning of the Master-Worker Pattern in MPI
- Target applications: applications implemented with the master-worker pattern
- Tuning objective: improve load balancing
- Tuning points and tuning actions: partition factor, number of workers
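
The effect of the partition factor on load balance can be sketched with a small simulation, assuming that the master splits an ordered list of task costs into a given number of contiguous chunks, hands each chunk to the first idle worker, and pays a fixed overhead per chunk. The cost model, the function name and the numbers are hypothetical. More chunks balance the load better, while the per-chunk overhead pulls in the other direction, which is what makes this a tuning problem.

```python
import heapq
from typing import Sequence


def makespan(task_costs: Sequence[float], workers: int, chunks: int,
             overhead: float = 0.0) -> float:
    """Finish time when `chunks` contiguous chunks go to the first idle worker."""
    size = -(-len(task_costs) // chunks)  # ceiling division
    chunk_costs = [sum(task_costs[i:i + size]) + overhead
                   for i in range(0, len(task_costs), size)]
    free_at = [0.0] * workers  # min-heap of times at which each worker becomes idle
    for cost in chunk_costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + cost)
    return max(free_at)


# Usage: two workers, imbalanced tasks; finer partitioning improves the balance.
tasks = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
for n_chunks in (2, 6):
    print(n_chunks, makespan(tasks, 2, n_chunks))
# 2 9.0
# 6 6.0
```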

Tuning of the MPI Runtime
- Target applications: currently, parallel applications built with IBM MPI
- Tuning objective: optimize performance
- Tuning points and tuning actions:
  - MPI environment parameters
  - MPI application mapping: adapting the number of tasks per node/core and the affinity of the processes
  - MPI communication buffers/protocol: adapting the send/receive buffers, analyzing the size pattern of the messages, adapting the communication protocol (eager/rendezvous)
  - Code variants for MPI communication
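
Analyzing the message size pattern can feed the eager/rendezvous choice: messages up to the eager limit are sent immediately and buffered at the receiver, while larger ones use a handshake first. A minimal sketch, assuming the rule "pick the smallest observed size that covers a given fraction of all messages, capped to bound buffer memory"; the 90% coverage, the 64 KiB cap and the function name are hypothetical, and the environment variable that sets the limit is specific to each MPI library.

```python
import math
from typing import Optional, Sequence


def suggest_eager_limit(message_sizes: Sequence[int], coverage: float = 0.9,
                        cap: int = 64 * 1024) -> Optional[int]:
    """Smallest observed size such that at least `coverage` of all messages
    are no larger than it, clamped to `cap` bytes."""
    if not message_sizes:
        return None
    sizes = sorted(message_sizes)
    k = max(0, math.ceil(coverage * len(sizes)) - 1)
    return min(sizes[k], cap)


# Usage: mostly small messages plus a few large ones (sizes in bytes, made up).
sizes = [64] * 8 + [1024, 1_000_000]
print(suggest_eager_limit(sizes))  # 1024
```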

Tuning of Compiler Flag Selection
- Target applications: any application
- Tuning objective: reduce the execution time of the application's phase region
- Tuning points and tuning actions: individual compiler flags, switched on or off during recompilation
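
Switching individual flags on or off spans a space of 2^n compiler command lines for n candidate flags, each of which needs a recompilation and a measured run. A minimal sketch of generating that space, assuming a fixed base optimization level; the GCC flags in the usage example are only examples of switches one might try, not PTF's flag list.

```python
from itertools import product
from typing import Iterator, List, Sequence


def flag_variants(base: Sequence[str], switches: Sequence[str]) -> Iterator[List[str]]:
    """Yield every ON/OFF combination of `switches` appended to the `base` flags."""
    for states in product((False, True), repeat=len(switches)):
        yield list(base) + [flag for flag, on in zip(switches, states) if on]


# Usage: two candidate switches on top of -O2 give four variants.
for flags in flag_variants(["-O2"], ["-funroll-loops", "-ftree-vectorize"]):
    print(" ".join(flags))
# -O2
# -O2 -ftree-vectorize
# -O2 -funroll-loops
# -O2 -funroll-loops -ftree-vectorize
```

Because the variant count doubles with every added flag, exhaustive evaluation quickly becomes infeasible, and restricting the measurement to the phase region keeps each evaluation as short as possible.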

Thank you! Questions?