Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda

Similar documents
The Intel VTune Performance Analyzer

VTune Performance Analyzer Essentials

Hardware-based performance monitoring with VTune Performance Analyzer under Linux

This table lists the files/information you need and what VTune Performance Analyzer features they enable.

Also on the Performance tab, you will find a button labeled Resource Monitor. You can invoke Resource Monitor for additional analysis of the system.

IBM Tivoli Monitoring V6.2.3, how to debug issues with Windows performance objects issues - overview and tools.

FIGURE Selecting properties for the event log.

Getting Started with CodeXL

Viewing and Troubleshooting Perfmon Logs

XpoLog Center Suite Log Management & Analysis platform

Windows Server 2012 Server Manager

AMD CodeXL 1.7 GA Release Notes

MONITORING PERFORMANCE IN WINDOWS 7

Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure

Tutorial: Analyzing Energy Usage on an Android* Platform

About This Guide Signature Manager Outlook Edition Overview... 5

CA Nimsoft Monitor Snap

WINDOWS PROCESSES AND SERVICES

Zing Vision. Answering your toughest production Java performance questions

Revealing the performance aspects in your code. Intel VTune Amplifier XE Generics. Rev.: Sep 1, 2013

Chapter 2: Getting Started

PERFORMANCE TOOLS DEVELOPMENTS

Eliminate Memory Errors and Improve Program Stability

HPSA Agent Characterization

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad

Pcounter Web Report 3.x Installation Guide - v Pcounter Web Report Installation Guide Version 3.4

System Requirements - CommNet Server

Covene Cohesion Server Installation Guide A Modular Platform for Pexip Infinity Management November 11, 2014 Version 2.0 Revision 1.

Windows Server 2008 R2 Hyper V. Public FAQ

PATROL for Microsoft Windows Servers v Reviewer s Guide

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

How To Install An Aneka Cloud On A Windows 7 Computer (For Free)

System Requirements - Table of Contents

NetIQ Sentinel Quick Start Guide

User's Guide FairCom Performance Monitor

System Monitoring and Reporting

Improving Your App with Instruments

vtcommander Installing and Starting vtcommander

Automate Your BI Administration to Save Millions with Command Manager and System Manager

MCTS Guide to Microsoft Windows 7. Chapter 10 Performance Tuning

Backing up IMail Server using Altaro Backup FS

Gary Frost AMD Java Labs

Performance Monitoring of the Software Frameworks for LHC Experiments

LifeKeeper for Linux. Network Attached Storage Recovery Kit v5.0 Administration Guide

MAS 500 Intelligence Tips and Tricks Booklet Vol. 1

PATROL Console Server and RTserver Getting Started

GETTING STARTED Exclaimer Signature Manager Exchange Edition Overview Signature Content Signature Rules... 10

NetBeans Profiler is an

CA Nimsoft Monitor. Probe Guide for IIS Server Monitoring. iis v1.5 series

Kaseya 2. User Guide. Version 1.0


User Guide. version 1.2

THE BASICS OF PERFORMANCE- MONITORING HARDWARE

vcenter Operations Management Pack for SAP HANA Installation and Configuration Guide

Performance Navigator Installation

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

Content Management Implementation Guide 5.3 SP1

Spectrum Technology Platform. Version 9.0. Administration Guide

User Guide. version 1.0

Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

OPERATION MANUAL. MV-410RGB Layout Editor. Version 2.1- higher

Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment

Chapter 3 Application Monitors

Attix5 Pro Server Edition

Spectrum Technology Platform. Version 9.0. Spectrum Spatial Administration Guide

Deploying Microsoft Operations Manager with the BIG-IP system and icontrol

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

IQ MORE / IQ MORE Professional

Performance Optimization for the Intel Atom Architecture

RTI Quick Start Guide

Monitoring Replication

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version Fix Pack 2.

Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth

Windows 2003 Performance Monitor. System Monitor. Adding a counter

Server Management Tools (ASMT)

Instrumentation Software Profiling

Load Imbalance Analysis

User Manual. Version connmove GmbH Version: Seite 1 von 33

Monitoring.NET Framework with Verax NMS

Running a Workflow on a PowerCenter Grid

High Performance Computing in Aachen

HP Virtualization Performance Viewer

PTC System Monitor Solution Training

User's Guide - Beta 1 Draft

vmprof Documentation Release 0.1 Maciej Fijalkowski, Antonio Cuni, Sebastian Pawlus

Example of Standard API

Mobile App Monitoring. Release Notes. Release 7.0

Using Process Monitor

Building an Embedded Processor System on a Xilinx Zync FPGA (Profiling): A Tutorial

13 Managing Devices. Your computer is an assembly of many components from different manufacturers. LESSON OBJECTIVES

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

TS2Mascot. Introduction. System Requirements. Installation. Interactive Use

Installation and configuration of Real-Time Monitoring Tool (RTMT)

OPERATING SYSTEM SERVICES

MyOra 3.5. User Guide. SQL Tool for Oracle. Kris Murthy

Heroix Longitude Quick Start Guide V7.1

HP IMC User Behavior Auditor

Next Generation ETS Software A preview of the upcoming new ETS5 system & user enhancements. KNX Association, Brussels André Hänel

Finding Performance and Power Issues on Android Systems. By Eric W Moore

Transcription:

Objectives At the completion of this module, you will be able to: Understand the intended purpose and usage models supported by the VTune Performance Analyzer. Identify hotspots by drilling down through various sample views. Understand how sampling works Use callgraph profiling to find hotspots Intel Software College 2 Agenda VTune Performance Analyzer What is the VTune Performance Analyzer? Performance tuning concepts Using the sampling collector How sampling works Sampling Over Time Call Graph Helps you identify and characterize performance issues by: Collecting performance data from the system running your application. Organizing and displaying the data in a variety of interactive views, from system-wide down to source code or processor instruction perspective. Identifying potential performance issues and suggesting improvements. 3 4 1

Supported Environments Local Performance Analysis Local and remote data collection Profile applications that are running on the system that has the analyzer installed on it, or Run profiling experiments on other systems that are running VTune analyzer remote agents on them Intel IA-32 Processors Microsoft Windows* operating systems Red Hat Linux* SuSE Linux Itanium Family Processors Microsoft Windows operating systems Red Hat Linux SuSE Linux For specific operating systems versions, see the release notes 5 6 Host/Target Environment Feature Overview VTune Performance Analyzer supports remote data collection VTune Performance Analyzer installed on host system Remote agent installed on target system Sampling Call graph Host System Windows* operating system Controls target View results of data collection LAN Connection Target System IA-32 or Itanium processor family Windows or Linux* Intel PXA2xx processors running Windows CE* 7 8 2

VTune Analyzer Features and Usage Models VTune Analyzer Features and Usage Models 9 10 VTune Analyzer Features and Usage Models VTune Analyzer Features and Usage Models 11 12 3

What Is a Hotspot? Where in an application or system there is a significant amount of activity Where = address in memory => OS process => OS thread => executable file or module => user function (requires symbols) => line of source code (requires symbols with line numbers) or processor (assembly) instruction Significant = activity that occurs infrequently probably does not have much impact on system performance Activity = time spent or other internal processor event Examples of other events: Cache misses, branch mispredictions, floatingpoint instructions retired, partial register stalls, and so on. Sampling: The Statistical Method of Finding Hotspots The sampling collector Periodically interrupts the processor Time-based Event-based: Triggered by the occurrence of a certain number of microarchitectural events Collects the execution context Execution address in memory (CS:IP) Operating system process and thread ID Executable module loaded at that address If you have symbols for the module, post-processing can identify the function or method at the memory address. Line numbers from the symbol file can direct you to the relevant line of source code. 13 14 Sampling Collector VTune Analyzer Sampling Collector Periodically interrupt the processor to obtain the execution context Time-based sampling (TBS) is triggered by: Operating system timer services Every n processor clockticks Event-based sampling (EBS) is triggered by processor event counter overflow These events are processor-specific, like L2 cache misses, branch mispredictions, floating-point instructions retired, and so on Sampling Results by Operating System Process This operating system process that has the most clockticks samples Sorted by Clockticks 15 16 4

VTune Analyzer Sampling Collector VTune Analyzer Sampling Collector Click here for OS Process view. Display only Clockticks sample data. Click here for Over Time view. Click here to break down by CPU. This view was filtered by selecting only one item from the process view. 17 18 VTune Analyzer Sampling Collector VTune Analyzer Sampling Collector Table View: Selection Summary of line is highlighted in table. Hotspot view of one module for all OS processes and threads grouped by function (or method). 19 20 5

VTune Analyzer Sampling Collector VTune Analyzer Sampling Collector Totals for Source Line Click here for disassembly view. Activity at Instruction Locations 21 22 VTune Analyzer Sampling Collector Zoom In Zoom Out Totals for Source Line Activity 1: Find the Hotspot Learn how to identify hotspots with the VTune analyzer. Activity at Instruction Locations Select Event Red time intervals have more samples in them. 23 24 6

Three Key Benefits of Sampling You do not have to modify your code. But DO compile/link with symbols and line numbers. But DO make release builds with optimizations. Sampling is system-wide. Not just YOUR application. You can see activity in operating system code, including drivers. Sampling overhead is very low. Validity is highest when perturbation is low. Overhead can be reduced further by turning off progress meters in the user interface. How else can you reduce sampling overhead? How Event-based Sampling (EBS) Works Conceptual Diagram Select Event Signal Count Down Interrupt CPU to Take Sample Sample After Number Underflow to Zero Internal Interrupt Controller How do you choose a Sample After number? 25 26 How Many Samples Are Enough? Objective: 1,000 Samples Per Second One million samples for a five-second run? Do you have enough samples for it to be statistically significant? How much overhead are you causing? What if you only get 100 samples? Is your sample after number 1? Are you getting a good profile? About 1,000 samples per second is a good balance between significance and overhead What is the sample after value for clockticks? Dependent upon CPU clock speed ANSWER: CPU clock speed in KHz If CPU clock speed = 1,400,000,000 Hz Sample after 1,400,000 clockticks What is the sample after value for L2 cache read misses? It depends on how often you miss the L2 cache! Circular definition? Is not that what you are trying to determine? Make an intelligent guess! Estimate! More or less often than the clockticks? 10 times? 100 times? 1000 times? 27 28 7

Calibration Sampling Over Time Sets the sample after value to get a reasonable number of samples. ~1000 samples per second per logical CPU Requires the workload to be run twice Manual Calibration: Uncheck Calibrate Sample After value Found on Advanced Activity Configuration dialog Start with default value or an estimate Run a test Modify the sample after value and re-test Try to get about a 1000 samples per second per logical CPU Shows how sample distributions change over time by process, thread, or module Zoom in on time regions Useful for: Identifying time-variant performance characteristics Understanding thread behavior 29 30 Sampling Over Time Usage Model Activity 2: Sampling Over Time Collect sampling data Select items of interest from either the process, thread, or modules view Click Highlight region of interest Click Click region to see process/thread/address histogram for time Learn how to use the Sampling Over Time view 31 32 8

Call Graph Profiling What Can You Profile? Tracks the function entry and exit points of your code at run time Uses binary instrumentation Uses this data to determine program flow, critical functions and call sequences Not system-wide: Only profiles code in applications call path in Ring 3 Win32 applications Stand-alone Win32* DLLs Stand-alone COM+ DLLs Java applications.net* applications ASP.NET applications Linux32* applications 33 34 Call Graph View Call Graph Navigation Window The red lines show the critical path. The critical path is the most timeconsuming call path. It is based on self time. Filter view by self time Bright orange nodes indicate functions with the highest self time. Use the graph navigation window for an overview of the entire call graph. 35 36 9

Call Graph Call List View Call Graph Metrics Performance Metric Description Self Time Total Time Total time in a function, excluding time spent in its children (includes wait time) Time measured from a function entry to exit point Total Wait Time Time spent in a function and its children when the thread is blocked Switch between call list and call graph views here. Wait Time Calls Time spent in a function when the thread is blocked (excludes blocked time in its children) Number of times the function is called 37 38 Activity 3: Call Graph Sampling Versus Call Graph Find the hotspot in gzip using call graph. Sampling Low overhead Call graph Higher overhead System-wide Ring 3 only on your application call tree System-wide address histogram Show function level hierarchy with call counts, times, and the critical path For function level drill-down, must have debug information Must re-link with /fixed:no, automatically instruments Can sample based on time and other processor events Results are based on time 39 40 10

Java* and.net* Applications Provides performance data for both managed code and unmanaged code Gives insight into how managed code calls translate into Win32* calls Uses managed code profiling API and binary instrumentation What s Been Covered You can use the different profilers in the VTune analyzer to understand the different aspects of the performance of your application. 41 42 Extra Slides 43 11

VTune Analyzer Features and Usage Models VTune Analyzer Features and Usage Models 45 46 Intel Tuning Assistant Intel Tuning Assistant Identifies bottlenecks in: Pentium 4, Pentium M, Itanium 2, and Pentium III processors. Uses EBS and Counter Monitor data. Shows scaling differences between different runs. Code Coach is still available but is not enabled by default. 47 48 12

Intel Tuning Assistant Intel Tuning Assistant VTune analyzer automatically selects events in the Sampling wizard. For more detail, click hyperlink. 49 50 Lab Activity 3: Getting Tuning Advice Windows* Command Line Interface Learn how to get processor-specific tuning advice Collect sampling data from the command line. Useful for integrating performance data collection into your automated regression testing. View the data in the VTune Performance Analyzer or export as ASCII text. Invoke by typing vtl at the command line. 51 52 13

Windows* Command Line Interface Creates hidden project structure To create an activity: vtl create [activity name] + options To run an activity: vtl run [activity name] To view activities type: vtl show To view results of a particular activity type: vtl view [activityname::result] [options] To delete the entire project: vtl delete all To delete a specific activity: vtl delete <activity name> Windows* Command Line Interface Examples Sample on clockticks and instructions retired and launch app matrix.exe: vtl activity a1 c sampling app matrix.exe run See the clocktick hotspots in matrix.exe: vtl view a1::r1 hf mn matrix.exe See the number of samples in each module system wide: vtl a1::r1 view modules 53 54 Windows* Command Line Interface Help For general command line arguments: vtl help For sampling command line arguments and events: vtl help c sampling For in depth help and examples go to: Start->Programs- >Intel VTune Performance Analyzer->Help for the Command Line Lab Activity 4: Using the Windows* Command Line Interface Learn how to collect sampling data from the command line 55 56 14

Call Graph Advanced Configuration Call Graph Advanced Options Set instrumentation levels. Helps control overhead Select which functions are instrumented. Helps control overhead This is the instrumented module status grid. Click here to set module instrumentation levels. 57 58 Instrumentation Levels More Advanced Call Graph Options Instrumentation Level Description Debug Info Required? All Functions Every function in the module is instrumented. Yes Cache directory location Custom Export You can specify which functions are instrumented Every function in the module s export table is instrumented. Yes No This is useful for long runs and very large applications. If you do not set this, the machine might run low on memory. Minimal The module is instrumented but no data is collected for it. No Allow call graph to instrument COM interfaces. 59 60 15

Function Selection Use Sampling and Call Graph Together Click here to enable or disable instrumentation for a particular function. Use sampling to find which functions have hotspots. Use call graph to find out who is calling these functions. 61 62 Lab Activity 6: Using Sampling and Call Graph Together Optimize an application (linpack) using sampling and call graph Sampling and Call Graph Have Different Hotspots? Self time includes blocked time. Event-based sampling (EBS) and time-based sampling (TBS) do not include blocked time in functions (this usually appears in processor.sys). Hotspots should be the same for self time wait time (this is non-blocked self time). 63 64 16

What Counter Monitor Does Performance DLL SDK Collects hardware and software performance counter data Windows* Perfmon* counters Performance DLL SDK Correlate counter data with sampling data SDK for creating custom performance counters that can be used by counter monitor Available on the Intel web site Example: performance counter that measures the transactions per second for a server application 65 66 Performance DLL SDK Monitor Window SDK for creating custom performance counters that can be used by counter monitor Example: performance counter that measures the transactions per second for a server application Click to highlight different counter data in the graph. 67 68 17

To Correlate Sampling Data Lab Activity 7: Counter Monitor Use counter monitor to analyze gzip Click the highlight icon and highlight a time slice by dragging over the graph from left to right. Click on the drill icon. You should now see the sampling data for that time slice. 69 70 Trigger API Allows you to create your own mechanism to programmatically trigger performance counter data collection Example: collect counter monitor data every time a frame is rendered 71 18