PERFORMANCE TOOLS DEVELOPMENTS




Roberto A. Vitillo, presented by Paolo Calafiura & Wim Lavrijsen
Lawrence Berkeley National Laboratory
Future computing in particle physics, 16 June 2011

LINUX PERFORMANCE EVENTS SUBSYSTEM
- The perf events subsystem was merged into the Linux kernel in version 2.6.31 and introduced the sys_perf_event_open system call
- It uses special-purpose registers on the CPU to count events
- A HW event can be, for example, the number of cache misses or mispredicted branches
- SW events, such as page faults, are also supported
- Performance counters are accessed via file descriptors obtained from the above-mentioned system call

LINUX PERFORMANCE EVENTS SUBSYSTEM (2)
- perf is a user-space utility that is part of the kernel repository
- Available in Scientific Linux 6
- Basic usage: data is collected with the perf-record tool and displayed with perf-report
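The record-then-report workflow can be sketched as two command lines. The Python helpers below only build the argument lists and run nothing. The helper names and the workload ./my_app are hypothetical; the options used (-e, -o, -g for perf record, -i for perf report) are standard perf options.

```python
import shlex


def perf_record_cmd(workload, event="cycles", output="perf.data", callgraph=False):
    """Build the argv for `perf record` wrapped around a workload command line."""
    cmd = ["perf", "record", "-e", event, "-o", output]
    if callgraph:
        cmd.append("-g")  # also record call chains
    # "--" separates perf's own options from the profiled command.
    return cmd + ["--"] + shlex.split(workload)


def perf_report_cmd(data="perf.data"):
    """Build the argv for `perf report` reading a previously recorded file."""
    return ["perf", "report", "-i", data]


# Hypothetical workload; pass the lists to subprocess.run() to execute them.
print(" ".join(perf_record_cmd("./my_app --events 10", event="cache-misses")))
print(" ".join(perf_report_cmd()))
```

Keeping the commands as lists avoids shell quoting issues if they are later handed to subprocess.run.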

THE PERF TOOL: EXAMPLE USAGE
[Figure not captured in the transcript]

WHY DO WE CARE?
- The Linux Performance Events Subsystem provides a low-overhead way to measure the workload of a single application or of the full system
- It's at least an order of magnitude faster than an instrumenting profiler
- It provides far more information than a statistical profiler

WHAT IS MISSING
- Annotating the objdump output one event at a time is not enough to find bottlenecks efficiently
- A real GUI that can display multiple events and their relations is missing
- New CPUs have a buffer that records the last taken branches, but support for exploiting it is missing

PERF EVENTS CONVERTER
- As a first step, a converter tool for the perf-tools data format has been introduced
- The tool can convert a perf data file to a callgrind one that can be displayed with KCachegrind:
  - multiple events are supported
  - annotated source code, assembly and function list views
  - complete inline chain
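To illustrate what the target of such a conversion looks like, here is a minimal Python sketch that writes already-resolved samples into the callgrind text format with several event columns. It is not the converter described on the slide: the function name, the "creator" string and the input tuple layout are assumptions made for the example, and reading the perf data file itself is left out.

```python
from collections import defaultdict


def to_callgrind(samples, events):
    """Render samples as callgrind-format text.

    samples: iterable of (source_file, function, line, event_name) tuples,
             i.e. samples already resolved to source positions (assumed input).
    events:  ordered list of event names, one cost column per event.
    """
    column = {name: i for i, name in enumerate(events)}
    costs = defaultdict(lambda: [0] * len(events))
    for path, func, line, event in samples:
        costs[(path, func, line)][column[event]] += 1

    out = [
        "version: 1",
        "creator: perf-to-callgrind-sketch",  # hypothetical creator name
        "positions: line",
        "events: " + " ".join(events),
        "",
    ]
    current = None
    for path, func, line in sorted(costs):
        if (path, func) != current:
            # fl= and fn= open a new file/function context for the cost lines.
            out.append("fl=" + path)
            out.append("fn=" + func)
            current = (path, func)
        row = costs[(path, func, line)]
        out.append(str(line) + " " + " ".join(str(c) for c in row))
    return "\n".join(out) + "\n"
```

Each cost line holds the source line number followed by one count per event, which is how a single file can carry multiple events for KCachegrind to display.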

PERF EVENTS CONVERTER (2)
[Figure not captured in the transcript]

PERF EVENTS VISUALIZER
- KCachegrind cannot show an arbitrary number of events at the same time
- A new converter and a web-based GUI are under development
- The converter reads a raw perf data file and produces spreadsheets, cycle accounting trees and call graphs
- The GUI will be able to:
  - present the available data in spreadsheets, cycle accounting trees and call graphs
  - offer insights on the call graph, e.g. mark virtual methods with high call counts as hot
  - correlate different HW/SW events to gain a deeper understanding of the performance bottlenecks

LAST BRANCH RECORD SUPPORT
- New Intel processors have a cyclic buffer that can record taken branches
- Each recorded branch consists of a pair of registers holding its source and destination
- Last Branch Record (LBR) sampling can be used to, e.g.:
  - evaluate the frequency of function calls and guide inlining decisions
  - yield the partial path of an event, building a partial call graph
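A partial path can be recovered from one LBR snapshot because no branch is taken between the destination of one recorded branch and the source of the next, so the code in between ran straight through. The Python sketch below assumes a snapshot is a list of (source, destination) address pairs ordered oldest first; the function name and that ordering are assumptions of the example.

```python
def straight_line_ranges(lbr):
    """Return the address ranges executed between consecutive taken branches.

    lbr: list of (source, destination) addresses, oldest branch first (assumed).
    Each range (start, end) runs from one branch's destination to the source
    of the next recorded branch.
    """
    ranges = []
    for (_, dst), (src, _) in zip(lbr, lbr[1:]):
        if dst <= src:  # ignore inconsistent pairs, e.g. after a gap in recording
            ranges.append((dst, src))
    return ranges


# Illustrative addresses only.
snapshot = [(0x10, 0x40), (0x48, 0x80), (0x8C, 0x20)]
for start, end in straight_line_ranges(snapshot):
    print(hex(start), "->", hex(end))
```

Mapping these ranges onto basic blocks and functions is what would turn a snapshot into a partial path or call graph; that step needs symbol and disassembly information and is not shown.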

IMPORTANCE OF LBR
- ATLAS software issues:
  - low instructions retired / calls retired ratio
  - high calls retired / branches retired ratio
- Inlining functions called millions of times per event can indeed bring considerable benefits
- David Levinthal's proposal:
  - Use LBR and static analysis to evaluate the frequency and cost of function calls
  - Use social network analysis / network theory to identify clusters of active, costly function call activity
  - Order clusters by total cost and inline
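The cluster-and-rank step of the proposal can be sketched as follows. The slides do not say which network analysis method is meant, so this Python example substitutes the simplest possible one: connected components over the frequently taken call edges. The function name, the input layout and the threshold parameter are assumptions.

```python
from collections import defaultdict


def call_clusters(edges, min_calls):
    """Group functions linked by hot call edges and rank the groups by cost.

    edges: dict mapping (caller, callee) -> (call_count, cost), assumed input.
    Returns a list of (total_cost, sorted_member_names), most costly first.
    """
    # Keep only call edges taken at least min_calls times.
    hot = {edge: v for edge, v in edges.items() if v[0] >= min_calls}
    neighbours = defaultdict(set)
    for caller, callee in hot:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collects one connected component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        total = sum(cost for (caller, _), (_, cost) in hot.items() if caller in members)
        clusters.append((total, sorted(members)))
    return sorted(clusters, reverse=True)
```

The clusters at the top of the returned list are the first candidates to examine for inlining; a real analysis would likely use a more discriminating clustering than connected components.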

LBR DEVELOPMENTS
- The kernel patch for filtering and dumping the LBR is complete; after validation it will be integrated into the kernel trunk
- The perf report user-space utility has a new feature that displays statistics about the taken branches

EXPLOITING THE LBR IN PERF
- DSO-to-DSO and symbol-to-symbol statistics are supported
- Predicted and mispredicted branches can optionally be distinguished
- Filtering support
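The kind of statistics listed above can be illustrated with a short Python sketch that aggregates resolved branch records. It does not reproduce the perf report feature or its output; the record fields, the function name and its parameters are assumptions made for the example.

```python
from collections import Counter


def branch_stats(branches, level="dso", split_mispredicted=False, keep=None):
    """Count taken branches between DSOs or between symbols.

    branches: iterable of dicts with the keys from_dso, from_sym, to_dso,
              to_sym and mispredicted (assumed record layout).
    level:    "dso" for DSO-to-DSO counts, "sym" for symbol-to-symbol counts.
    keep:     optional predicate used to filter records before counting.
    """
    counts = Counter()
    for branch in branches:
        if keep is not None and not keep(branch):
            continue
        key = (branch["from_" + level], branch["to_" + level])
        if split_mispredicted:
            key += ("mispredicted" if branch["mispredicted"] else "predicted",)
        counts[key] += 1
    return counts


# Illustrative records only.
sample = [
    {"from_dso": "libA.so", "from_sym": "f", "to_dso": "libB.so", "to_sym": "g", "mispredicted": False},
    {"from_dso": "libA.so", "from_sym": "f", "to_dso": "libB.so", "to_sym": "g", "mispredicted": True},
    {"from_dso": "libA.so", "from_sym": "h", "to_dso": "libA.so", "to_sym": "f", "mispredicted": False},
]
for key, count in branch_stats(sample, split_mispredicted=True).most_common():
    print(key, count)
```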

TODO
- Use a recursive disassembler instead of a linear one?
- Disassemble a module/function on the fly?
- Improve basic block counts by:
  - using the LBR to generate a software "instructions retired" event
  - adhering to flow conservation rules while keeping changes to the sample counts to a minimum
- In general, with sampling, #B1 + #B2 != #B3
[Diagram: basic blocks B1, B2 and B3, labelled with the counts 3, 4 and 1]
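One way to read "flow conservation with minimal changes" is as a least-squares correction. The Python sketch below assumes that B1 and B2 both flow into B3, so that exact counts would satisfy #B1 + #B2 = #B3, and it picks the adjusted counts closest to the sampled ones in the squared-error sense. Both the topology and the least-squares criterion are assumptions; the slides do not state which method is intended.

```python
def conserve_flow(b1, b2, b3):
    """Adjust three sampled counts so that b1 + b2 == b3.

    Minimises the sum of squared changes subject to the constraint, which
    spreads the residual equally (with opposite sign on b3) over the blocks.
    """
    residual = b1 + b2 - b3
    shift = residual / 3
    return b1 - shift, b2 - shift, b3 + shift


# Illustrative counts; how the diagram's numbers map to blocks is an assumption.
adjusted = conserve_flow(3, 4, 1)
print(adjusted)
assert adjusted[0] + adjusted[1] == adjusted[2]
```

For larger control flow graphs the same idea becomes a constrained least-squares problem with one conservation equation per join or split.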

CONCLUSIONS
- The callgrind converter and the new GUI under development will offer non-experts an easy way to navigate and understand the profiled application
- The LBR support adds important profiling capabilities, vital for OO software, to the Linux Performance Events Subsystem