Gary Frost AMD Java Labs gary.frost@amd.com




Analyzing Java Performance Using Hardware Performance Counters
Gary Frost, AMD Java Labs, gary.frost@amd.com
© 2008 by AMD; made available under the EPL v1.0

2 Agenda
- AMD Java Labs
- Hardware Performance Counters
- A Workload To Profile
- Profiling using hprof
- Profiling with CodeAnalyst
- Profiling with CodeSleuth
- Demo
- Demo closure
- Q&A

3 AMD Java Labs
An engineering team within AMD dedicated to improving Java performance on AMD processors
- Provide feedback and assistance to JVM vendors in their optimization work
- Create and evaluate Java performance tools
- Provide benchmark analysis (SPEC member)
- Route community feedback to silicon designers to improve future products
- Participate in Java and open source communities

4 What Are Hardware Performance Counters?
- Programmable counters inside the processor
- Accessed via device drivers (not from user mode)
- Can generate interrupts when a trigger value is reached
- System-wide

5 How Performance Counters Can Help Java Developers
- Understand what the processor is really doing with your code
- Help isolate the cause of bottlenecks
- Provide detail that JVM-based profilers cannot
  - hprof takes you to the method, but not into the method
- No need to inject instrumentation
- Help developers make informed code and algorithm choices
- Measure and analyze as you develop code

6 A Workload To Profile
Find a path between start and end, avoiding obstacles, using the Lee routing algorithm
[Figure: grid with obstacles and the cells labeled Start and End]

7
Mark starting cell with 1
for each cell with number n
    mark adjacent empty cells n+1
until end marked or no empty cells
[Figure: grid with the start cell numbered 1 and its neighbours numbered 2]
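The flood step on this slide can be sketched in Java roughly as follows. This is a minimal illustration and not the code from the lee.jar workload: the class name `LeeFlood`, the `EMPTY` and `OBSTACLE` markers, and the use of a queue for the frontier are all assumptions made for the example. In particular, the demo workload appears to scan a two-dimensional array, whereas this sketch keeps a queue of freshly numbered cells.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class LeeFlood {
    static final int EMPTY = 0;      // hypothetical marker for an unvisited cell
    static final int OBSTACLE = -1;  // hypothetical marker for a blocked cell

    /**
     * Numbers cells outward from the start: the start gets 1, its empty
     * neighbours get 2, and so on. Returns true if the end cell was reached,
     * false if the flood ran out of empty cells first.
     */
    static boolean flood(int[][] grid, int startRow, int startCol, int endRow, int endCol) {
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        Deque<int[]> frontier = new ArrayDeque<>();
        grid[startRow][startCol] = 1;
        frontier.add(new int[] {startRow, startCol});
        while (!frontier.isEmpty()) {
            int[] cell = frontier.poll();
            if (cell[0] == endRow && cell[1] == endCol) {
                return true;
            }
            int n = grid[cell[0]][cell[1]];
            for (int[] s : steps) {
                int r = cell[0] + s[0];
                int c = cell[1] + s[1];
                boolean inside = r >= 0 && r < grid.length && c >= 0 && c < grid[r].length;
                if (inside && grid[r][c] == EMPTY) {
                    grid[r][c] = n + 1;  // mark adjacent empty cell with n+1
                    frontier.add(new int[] {r, c});
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {0, 0, 0, 0},
            {0, OBSTACLE, OBSTACLE, 0},
            {0, 0, 0, 0},
        };
        boolean reached = flood(grid, 0, 0, 2, 3);
        System.out.println("reached=" + reached + ", end cell number=" + grid[2][3]);
    }
}
```

Because cells are numbered in first-in, first-out order, each cell receives its shortest distance from the start plus one, which is what the backtrace phase relies on.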

8 Eventually...
[Figure: grid fully numbered by the flood, cell values 1 through 11, shown before and after the path is marked]
Step onto end cell
while (n != 1)
    mark as part of path
    step on adjacent cell (n-1)
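The walk back from the end cell can be sketched the same way. Again this is only an illustration under stated assumptions: the class name `LeeBacktrace`, the representation of the path as a list of {row, col} pairs, and the sample grid in `main` are hypothetical and not taken from the original workload. The grid is assumed to have already been numbered by a flood from the start cell, with -1 standing for obstacles.

```java
import java.util.ArrayList;
import java.util.List;

final class LeeBacktrace {
    /**
     * Walks from the end cell back to the start (the cell numbered 1),
     * always stepping onto a neighbour numbered one less than the current
     * cell. Returns the path as {row, col} pairs, end cell first.
     */
    static List<int[]> backtrace(int[][] numbered, int endRow, int endCol) {
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        List<int[]> path = new ArrayList<>();
        int r = endRow;
        int c = endCol;
        int n = numbered[r][c];
        if (n < 1) {
            throw new IllegalArgumentException("end cell was never numbered");
        }
        path.add(new int[] {r, c});
        while (n != 1) {
            boolean moved = false;
            for (int[] s : steps) {
                int nr = r + s[0];
                int nc = c + s[1];
                boolean inside = nr >= 0 && nr < numbered.length
                        && nc >= 0 && nc < numbered[nr].length;
                if (inside && numbered[nr][nc] == n - 1) {
                    r = nr;
                    c = nc;
                    n--;
                    path.add(new int[] {r, c});  // mark as part of path
                    moved = true;
                    break;
                }
            }
            if (!moved) {
                throw new IllegalStateException("no neighbour numbered " + (n - 1));
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // Numbers as a flood from (0,0) would leave them; -1 marks obstacles.
        int[][] numbered = {
            {1, 2, 3, 4},
            {2, -1, -1, 5},
            {3, 4, 5, 6},
        };
        for (int[] cell : backtrace(numbered, 2, 3)) {
            System.out.println(cell[0] + "," + cell[1]);
        }
    }
}
```

Any neighbour numbered n-1 lies on some shortest path, so the fixed order in which the four directions are tried only decides which of the equally short routes is reported.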

9 Voila!
[Figure: grid showing the completed path from start to end, cells numbered 1 through 11]

10 Profiling with hprof
- Shipped with the JDK
- Easy-to-use command-line tool
    java -agentlib:hprof=cpu=samples -cp lee.jar lee.main
- Creates a report

    rank  self  accum  method
    1     93%   93%    lee.grid.flood
    2     2%    95%    java.lang.AbstractStringBuilder.append
    3     2%    97%    java.util.Collections.unmodifiableList
    4     2%    100%   sun.nio.cs.UTF_8.newDecoder

11 hprof
- Reports how much time was spent on the various call paths (stack traces) to a method
- But
  - Does not tell us where within the method the time was spent
  - Does not expose what happens in code emitted by the JIT (compiled methods)
- hprof is representative of many Java profilers in that it is limited by the information that can be extracted from the JVM

12 How AMD's CodeAnalyst Can Help
- CodeAnalyst is a profiling tool which can access hardware performance counters
- Timer- or event-based profiling
- Linux 32- and 64-bit and Microsoft Windows 32- and 64-bit OSs
- Free! (http://developer.amd.com/tools)
- Maps performance data to address and source (C++/C/Java)
- Includes code emitted by the JIT

13 CodeAnalyst From The Command Line
- Coordinate the launching of an executable
    caprofile /s /l java -cp my.jar -agentlib:cajvmtia32 MyApp
  - Profiler (caprofile) collects event data
  - JVMTI agent (CAJVMTIA32) collects JIT data
- Correlate and create reports
    cadataanalyze careport /i profile.tbp.dir/profile.tbp /m MyApp

14 CodeAnalyst
    caprofile /s /l java -cp my.jar -agentlib:cajvmtia32 MyApp
[Sequence diagram. Participants: caprofile, jvm, jvmtia agent, caprof.sys. Labels, in order: start collecting; java -agentlib:cajvmtia32 MyApp; process ends; compilation events; method compilation data; performance data; stop collecting]

15 CodeAnalyst
    cadataanalyze careport /i profile.tbp.dir/profile.tbp /m MyApp
[Diagram labels: performance data, cadataanalyze, method compilation data, careport]

16 CodeAnalyst
Event data, addresses, and methods

    vma        Timer   % total  Name
    0x9877c0   24269   56.00    lee/grid::flood
    0x98781b       1    0.00
    0x98784d    6132   14.28
    0x987853     474    1.10
    0x987857   13135   30.58
    0x987859     402    0.94

17 CodeAnalyst UI

18 CodeSleuth
- Open Source (EPL)
- Brings various CodeAnalyst views into Eclipse
- More natural for Java developers
  - Same IDE we are used to
  - SourceView is your Java editor
  - Markers indicate event data we can expand
- Configure CodeAnalyst(TM) parameters in Launch Configuration
- Cross reference architecture documentation from within IDE
- Makes Java analysis easier

19 CodeSleuth Run Configuration

20 CodeSleuth Navigation View

21 Linking views to Java source

22 CodeSleuth Disassembled Code View

23 Documentation Views

24 Demo CodeSleuth

25 Demo Closure
- The code uses a large two-dimensional array
- The order in which this array is traversed (x by y, or y by x) matters when arrays are large, because of its cache implications
- The original scan order does not take advantage of the cache; in fact, it pollutes its own cache
- By reordering the scan we can decrease cache pollution and thus improve performance
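The two scan orders can be compared with a small Java sketch. It is not the demo code: the class name `ScanOrder`, the array size, and the crude `System.nanoTime` timing are assumptions chosen for illustration. In Java an `int[][]` is an array of row arrays, so keeping the second index in the inner loop touches consecutive elements of one row, while keeping the first index in the inner loop jumps to a different row array on every access.

```java
final class ScanOrder {
    /** Inner loop walks along one row: consecutive elements of a single int[]. */
    static long sumRowByRow(int[][] grid) {
        long sum = 0;
        for (int y = 0; y < grid.length; y++) {
            for (int x = 0; x < grid[y].length; x++) {
                sum += grid[y][x];
            }
        }
        return sum;
    }

    /** Inner loop walks down one column: every access lands in a different int[]. */
    static long sumColumnByColumn(int[][] grid) {
        long sum = 0;
        int width = grid.length == 0 ? 0 : grid[0].length;  // assumes a rectangular grid
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < grid.length; y++) {
                sum += grid[y][x];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 2048;  // arbitrary; large enough that the array exceeds typical caches
        int[][] grid = new int[size][size];
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                grid[y][x] = x ^ y;
            }
        }
        // Repeat a few times so the JIT has a chance to compile both methods.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumRowByRow(grid);
            long t1 = System.nanoTime();
            long b = sumColumnByColumn(grid);
            long t2 = System.nanoTime();
            System.out.printf("row-by-row %d us, column-by-column %d us, sums equal: %b%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000, a == b);
        }
    }
}
```

Wall-clock numbers from a loop like this vary by machine and JVM, which is why the talk turns to hardware counters to attribute the difference to cache misses rather than inferring it from timing alone.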

26 Demo Closure
If we had 2 cache lines, each of which can hold 4 cells:
- Y by X (Column Major): cache miss, load, miss, load, miss, load (last load replaces first line!)
- X by Y (Row Major): cache miss, load, hit, hit
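The two-line, four-cell cache from this slide is small enough to simulate directly. The sketch below is a toy model under several assumptions that the slides do not state: cells are laid out contiguously row after row, the cache is fully associative, and the least recently used line is evicted. The class name `TinyCache` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyCache {
    static final int LINES = 2;           // cache lines available
    static final int CELLS_PER_LINE = 4;  // cells held by each line

    // Resident line numbers, least recently used first.
    private final Deque<Integer> resident = new ArrayDeque<>();
    private int misses;

    /** Touches one cell address and records a miss if its line is not resident. */
    void access(int address) {
        Integer line = address / CELLS_PER_LINE;
        if (resident.remove(line)) {
            resident.addLast(line);  // hit: refresh its position
            return;
        }
        misses++;
        if (resident.size() == LINES) {
            resident.removeFirst();  // evict the least recently used line
        }
        resident.addLast(line);
    }

    /** Counts misses for a full scan of a width x height grid in the given order. */
    static int countMisses(int width, int height, boolean rowByRow) {
        TinyCache cache = new TinyCache();
        int outer = rowByRow ? height : width;
        int inner = rowByRow ? width : height;
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < inner; j++) {
                int y = rowByRow ? i : j;
                int x = rowByRow ? j : i;
                cache.access(y * width + x);  // assumed row-after-row layout
            }
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        System.out.println("row by row:       " + countMisses(8, 8, true) + " misses");
        System.out.println("column by column: " + countMisses(8, 8, false) + " misses");
    }
}
```

For an 8 by 8 grid this model reports 16 misses when scanning row by row (one per line of four cells) and 64 misses when scanning column by column, because each step down a column needs a line that has already been evicted.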

27 Conclusions
- Hardware performance counters can help reveal performance issues
- CodeAnalyst enables you to analyze CPU-level performance of Java applications
- CodeSleuth helps bring the power of this tool into your favorite IDE
- Promote the inclusion of performance analysis as part of your development/test cycle

28 CodeSleuth Links
- Site: http://developer.amd.com/codesleuth
- Source: http://codesleuth.sourceforge.net
- Update Site: http://developer.amd.com/codesleuth_1_0_0

29 Links
- http://developer.amd.com
  - Software downloads
  - AMD Java Labs blog
  - Forum discussions
- Lee, C.Y., "An Algorithm for Path Connections and Its Applications", IRE Transactions on Electronic Computers, vol. EC-10, number 2, pp. 364-365, 1961