How to Use Open SpeedShop on BG/P and Cray XT/XE
ASC Booth Presentation @ SC 2010, New Orleans, LA
Why Open SpeedShop?
- Open source performance analysis tool framework
  - The most common performance analysis steps, all in one tool
  - Extensible through plugins for data collection and representation
- Flexible and easy to use
  - User access through a GUI, a command line, and Python scripting
- Several instrumentation options
  - All work on unmodified application binaries
  - Offline and online data collection; can attach to running codes
- Supports a wide range of systems
  - Extensively used and tested on a variety of Linux clusters
  - New: Cray XT and Blue Gene/P support
- Availability
  - Current version and source available via SourceForge
Project Team Members
- Jim Galarowicz, Krell
- Don Maghrak, Krell
- David Montoya, LANL
- Mahesh Rajan, SNLs
- Martin Schulz, LLNL
Larger team
- William Hachfeld and Dave Whitney, Krell
- Dane Gardner, LANL
- Scott Cranford and Joseph Kenny, SNLs
- Chris Chambreau and Matt Legendre, LLNL
- Dyninst group (Bart Miller, UW & Jeff Hollingsworth, UMD)
- Phil Roth, ORNL
- Ciera Jaspan, CMU
Outline
1. Welcome
2. Quick introduction to Open SpeedShop
   - How it works on clusters
   - Quick demonstration of how it works on clusters
3. How it works on BG/P at LLNL
4. Demonstration of how it works on BG/P
5. Questions & additional information
Section 2: Introduction to Open SpeedShop
Experiment Workflow
Open SpeedShop workflow (diagram): an Experiment, which consists of one or more data Collectors, is applied to the Application and run from the Process Management Panel; results are stored in an SQL database and can be displayed using several Views.
Basic Interface
- Step 1: Gather data from the command line
  - Example: osspcsamp <application>
  - Creates a database
- Step 2: Analyze the data in the GUI
  - Simple graphics
  - Relate data to source
Advanced Interfaces
- Scripting language, batch interface, Open SpeedShop command line (CLI), Python module
- Experiment commands: expattach, expcreate, expdetach, expgo, expview
- List commands: list -v exp, list -v hosts, list -v src
- Session commands: setbreak, opengui
- Python module example:

    import openss
    my_filename = openss.filelist("myprog.a.out")
    my_exptype = openss.exptypelist("pcsamp")
    my_id = openss.expcreate(my_filename, my_exptype)
    openss.expgo()
    my_metric_list = openss.metriclist("exclusive")
    my_viewtype = openss.viewtypelist("pcsamp")
    result = openss.expview(my_id, my_viewtype, my_metric_list)
Performance Experiments
- Concept of an experiment
  - What to measure and what to analyze?
  - The experiment is chosen by the user
  - Any experiment can be applied to any application
- Consists of collectors and views
  - Collectors define specific data sources, e.g. hardware counters or tracing of library routines
  - Views specify data aggregation and presentation
  - Multiple collectors per experiment are possible
Sampling Experiments
- PC sampling (pcsamp)
  - Records the program counter (PC) at user-defined time intervals
  - Low-overhead overview of the time distribution
- Call path profiling (usertime)
  - PC sampling plus a call stack for each sample
  - Provides inclusive and exclusive timing data
- Hardware counters (hwc, hwctime, hwcsamp)
  - Sample hardware counter (HWC) overflow events
  - Access to data such as cache and TLB misses
  - Default event is PAPI_TOT_CYC overflows (hwc, hwctime)
  - Sample up to six events at a time (hwcsamp)
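To make the difference between pcsamp and usertime concrete, here is a minimal Python sketch of how sampled data can be turned into per-function times. It is a conceptual illustration, not Open SpeedShop's implementation: the symbol table, the function names "main", "solve" and "relax", the addresses, and the 10 ms interval are all hypothetical. pcsamp only needs the PC, so each sample is charged to one function (exclusive time); usertime also keeps the call stack, so every function on the stack receives inclusive time.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
# The last function is treated as extending to the end of the address space.
SYMBOLS = [(0x1000, "main"), (0x1400, "solve"), (0x2000, "relax")]
STARTS = [start for start, _ in SYMBOLS]


def function_for_pc(pc):
    """Map a sampled program counter to the function whose range contains it."""
    i = bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"


def pcsamp_profile(pcs, interval_ms):
    """pcsamp-style: count samples per function and scale by the sampling interval."""
    counts = Counter(function_for_pc(pc) for pc in pcs)
    return {fn: n * interval_ms for fn, n in counts.items()}


def usertime_profile(stacks, interval_ms):
    """usertime-style: each sample is a call stack, outermost frame first.

    Exclusive time is charged to the leaf frame only; inclusive time is charged
    once per sample to every distinct function on the stack, so recursion is
    not double counted.
    """
    exclusive, inclusive = Counter(), Counter()
    for stack in stacks:
        exclusive[stack[-1]] += interval_ms
        for fn in set(stack):
            inclusive[fn] += interval_ms
    return dict(exclusive), dict(inclusive)


if __name__ == "__main__":
    print(pcsamp_profile([0x1010, 0x1500, 0x1600, 0x2100], interval_ms=10))
    stacks = [["main", "solve", "relax"], ["main", "solve"], ["main"]]
    print(usertime_profile(stacks, interval_ms=10))
```

In the usertime result, "main" is on every stack, so its inclusive time equals the whole run even though its exclusive time is a single sample.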
Tracing Experiments
- Input/output tracing (io, iot)
  - Records invocations of all POSIX I/O events
  - Provides aggregate and individual timings
- MPI tracing (mpi, mpit, mpiotf)
  - Records invocations of all MPI routines
  - Provides aggregate and individual timings
  - Creates Open Trace Format (OTF) output (mpiotf)
- Floating-point exception tracing (fpe)
  - Triggered by any FPE caused by the application
  - Helps pinpoint numerical problem areas
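The "aggregate and individual timings" that the tracing experiments provide can be illustrated with a small Python sketch. It assumes a hypothetical trace format of (function, start, end) records in microseconds; the real io/iot and mpi/mpit collectors store their own data, so treat this only as a picture of the two views.

```python
from collections import defaultdict


def summarize_trace(events):
    """Summarize traced calls.

    events: iterable of (function, start_us, end_us) tuples, one per call.
    Returns (individual, totals, counts):
      individual: list of (function, duration_us), one entry per call
      totals: {function: summed duration_us}
      counts: {function: number of calls}
    """
    individual = []
    totals = defaultdict(int)
    counts = defaultdict(int)
    for fn, start, end in events:
        duration = end - start
        individual.append((fn, duration))
        totals[fn] += duration
        counts[fn] += 1
    return individual, dict(totals), dict(counts)


if __name__ == "__main__":
    trace = [("read", 0, 50), ("write", 60, 160), ("read", 200, 230)]
    individual, totals, counts = summarize_trace(trace)
    print(individual)      # per-call durations (individual view)
    print(totals, counts)  # per-function totals and call counts (aggregate view)
```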
Parallel Experiments
- Open SpeedShop supports MPI and threaded codes
  - Tested with a variety of MPI implementations
  - Thread support based on POSIX threads
  - OpenMP supported through POSIX threads
- Any experiment can be applied to a parallel application
  - Automatically applied to all tasks/threads
  - Default views aggregate across all tasks/threads
  - Data from individual tasks/threads is available
- Specific parallel experiments (e.g., MPI)
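A minimal Python sketch of the two ways parallel data can be looked at: the default view sums each function's time over all tasks/threads, while the per-task view keeps one rank's data separate. The (rank, thread) keying and the millisecond values are assumptions made for illustration, not Open SpeedShop's database layout.

```python
from collections import Counter


def aggregate_across_tasks(per_task):
    """Default-view style: sum each function's time over all tasks/threads.

    per_task maps (rank, thread) -> {function: time_ms}.
    """
    total = Counter()
    for profile in per_task.values():
        total.update(profile)  # Counter.update adds values for matching keys
    return dict(total)


def task_profile(per_task, rank, thread=0):
    """Per-task view: the unaggregated data for one task/thread (empty if absent)."""
    return per_task.get((rank, thread), {})


if __name__ == "__main__":
    data = {
        (0, 0): {"solve": 120, "MPI_Wait": 30},
        (1, 0): {"solve": 100, "MPI_Wait": 50},
    }
    print(aggregate_across_tasks(data))
    print(task_profile(data, rank=1))
```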
Running a First Experiment
1. Picking the experiment
   - What do I want to measure?
   - We will start with pcsamp to get a first overview
2. Launching the application
   - How do I control my application under Open SpeedShop?
   - osspcsamp "mpirun -np 256 smg2000 -n 80 80 80"
3. Storing the results
   - Open SpeedShop will create a database
   - Name: smg2000-pcsamp.openss
4. Exploring the gathered data
   - Open SpeedShop will print a default report
   - Open the GUI to analyze the data in detail (run: openss)
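When such runs are scripted, the osspcsamp launch from step 2 can be assembled programmatically. The Python sketch below assumes that each convenience script is named "oss" plus the experiment name and takes the complete launch command (mpirun and its arguments) as one argument; the database name simply follows the smg2000-pcsamp.openss pattern from step 3. Both helper names are hypothetical.

```python
import shlex


def convenience_command(experiment, launch_argv):
    """Build the argv for an experiment convenience script such as osspcsamp.

    Assumption: the script is named "oss" + experiment and receives the whole
    launch command (mpirun and its arguments) as a single argument.
    """
    return ["oss" + experiment, shlex.join(launch_argv)]  # shlex.join: Python 3.8+


def expected_database(app, experiment):
    """Database name following the pattern in the example (smg2000-pcsamp.openss)."""
    return f"{app}-{experiment}.openss"


if __name__ == "__main__":
    argv = convenience_command(
        "pcsamp", ["mpirun", "-np", "256", "smg2000", "-n", "80", "80", "80"])
    print(shlex.join(argv))  # shell-quoted form of the full command
    print(expected_database("smg2000", "pcsamp"))
```

The list form can be passed directly to subprocess.run, which avoids a second round of shell quoting.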
Example Run with Output
osspcsamp "smg2000 -n 80 80 80"
Example Run with Output (2)
osspcsamp "smg2000 -n 80 80 80"
Default Output Report View
GUI screenshot callouts:
- Toolbar to switch views
- Performance data; default view: by function (data is the sum over all processes and threads)
- Graphical representation
Statement Report Output View
GUI screenshot callouts:
- Performance data
- S icon: statement view
- The statement in the program that took the most time
Associate Source & Performance Data
GUI screenshot callouts:
- Double-click to open the source window
- Use the window controls to split/arrange windows
- Selected performance data point
MPI (mpi) Tracing Results: Default View
How to Analyze the Performance of Parallel Codes? - A Tutorial at SC 2010.
Load Balance View for NPB: LU
Load balance view based on functions (pcsamp)
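A load balance view reduces each function's per-rank times to a few summary numbers. The Python sketch below computes maximum, rank of maximum, minimum, rank of minimum, and average for one function; the input layout and the choice to count a missing function as 0 on that rank are assumptions of this sketch, not necessarily how Open SpeedShop computes its view.

```python
def load_balance(per_rank, function):
    """Summarize one function's time across ranks, load-balance-view style.

    per_rank maps rank -> {function: time_ms}; ranks where the function never
    appears count as 0. Returns (max, rank_of_max, min, rank_of_min, average).
    """
    times = {rank: profile.get(function, 0) for rank, profile in per_rank.items()}
    max_rank = max(times, key=times.get)
    min_rank = min(times, key=times.get)
    average = sum(times.values()) / len(times)
    return times[max_rank], max_rank, times[min_rank], min_rank, average


if __name__ == "__main__":
    ranks = {0: {"solve": 120}, 1: {"solve": 80}, 2: {"solve": 100}, 3: {}}
    print(load_balance(ranks, "solve"))
```

A large gap between the maximum and the minimum relative to the average points to imbalance, and the rank numbers indicate which tasks to inspect individually.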
View Results: Show MPI Callstacks
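A call stack view for MPI answers the question of which code paths lead into the expensive MPI calls. A minimal Python sketch of that grouping, assuming each traced call comes with its call stack (outermost frame first) and a duration; the function names and times are hypothetical.

```python
from collections import Counter


def hottest_mpi_callpaths(samples, top=None):
    """Group identical call paths and rank them by total time.

    samples: iterable of (call_stack, time_ms), call_stack outermost-first and
    ending in an MPI routine. Returns [(path_string, total_ms), ...], largest first.
    """
    totals = Counter()
    for stack, t in samples:
        totals[" > ".join(stack)] += t
    return totals.most_common(top)  # top=None returns every path


if __name__ == "__main__":
    calls = [
        (["main", "exchange", "MPI_Waitall"], 30),
        (["main", "reduce", "MPI_Allreduce"], 10),
        (["main", "exchange", "MPI_Waitall"], 20),
    ]
    for path, total in hottest_mpi_callpaths(calls):
        print(f"{total:>6} ms  {path}")
```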
Section 3: Running on BG/P and Cray XT/XE
Open SpeedShop & Static Executables
- When shared library support is limited, the normal manner of running experiments doesn't work
  - Our collectors need to be linked into the static executable
- osslink: a script to help with linking in our collectors
  - Hides many of the link details
  - Calls to it are usually embedded in application makefiles
  - Can also be used to compile and link applications
  - Sorts out the experiment-specific library and collector specification
  - Sorts out some platform differences to do the correct link
- The user generally needs to find the target that creates the actual static executable and add a collector target that links in the selected collector, as shown in the example.
Open SpeedShop & Static Executables
- Use the correct environment on the front end (FE) versus the back end (BE)
  - Dotkit or module files are available as examples:
    - Build for FE tools
    - Execute on FE tools
    - Build for BG/P BE tools
    - Execute on BG/P BE tools
- The "Execute on BG/P BE tools" dotkit:
  - Sets up the path to the Open SpeedShop tools bin directory
  - Sets the OPENSS_MPI_IMPLEMENTATION environment variable, which the mpi and mpit experiments need in order to know the MPI implementation's data structure definitions
  - Sets up the library path to the Open SpeedShop runtimes and collectors specific to the BE node software environment
  - Use this dotkit when linking the Open SpeedShop collectors and runtimes into your application
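Because a missing OPENSS_MPI_IMPLEMENTATION only matters for the mpi and mpit experiments, a job script can check for it before submitting. This Python sketch is a hypothetical pre-flight helper: the mapping from experiments to required variables encodes only what this slide states, and the helper itself is not part of Open SpeedShop.

```python
import os

# From the slide: the mpi and mpit experiments need OPENSS_MPI_IMPLEMENTATION.
# This mapping is an illustrative assumption, not an Open SpeedShop data structure.
REQUIRED_ENV = {
    "mpi": ["OPENSS_MPI_IMPLEMENTATION"],
    "mpit": ["OPENSS_MPI_IMPLEMENTATION"],
}


def missing_env(experiment, env=None):
    """Return the required variables that are unset or empty for an experiment."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(experiment, []) if not env.get(name)]


if __name__ == "__main__":
    for exp in ("pcsamp", "mpi"):
        missing = missing_env(exp)
        print(exp, "OK" if not missing else f"missing {missing}")
```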
Re-linking the Application Using osslink
Example modification for the smg2000 application:

smg2000: smg2000.o
	@echo "Linking" $@ "... "
	${CC} -o smg2000 smg2000.o ${LFLAGS}

smg2000-pcsamp: smg2000.o
	@echo "Linking" $@ "... "
	osslink -v -c pcsamp ${CC} -o smg2000-pcsamp smg2000.o ${LFLAGS}

smg2000-usertime: smg2000.o
	@echo "Linking" $@ "... "
	osslink -v -c usertime ${CC} -o smg2000-usertime smg2000.o ${LFLAGS}

smg2000-hwcsamp: smg2000.o
	@echo "Linking" $@ "... "
	osslink -v -c hwcsamp ${CC} -o smg2000-hwcsamp smg2000.o ${LFLAGS}

smg2000-io: smg2000.o
	@echo "Linking" $@ "... "
	osslink -u open -v -c io ${CC} -o smg2000-io smg2000.o ${LFLAGS}

smg2000-iot: smg2000.o
	@echo "Linking" $@ "... "
	osslink -u open -v -c iot ${CC} -o smg2000-iot smg2000.o ${LFLAGS}

smg2000-mpi: smg2000.o
	@echo "Linking" $@ "... "
	osslink -v -c mpi ${CC} -o smg2000-mpi smg2000.o ${LFLAGS}
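The collector targets in the smg2000 makefile all follow one pattern, so for an application with many collectors they can be generated rather than written by hand. This Python sketch reproduces that pattern as text; the helper name is hypothetical, and the rule that only io and iot get "-u open" is copied from the smg2000 example rather than from osslink documentation.

```python
# In the example, only the io and iot targets pass "-u open" to osslink.
NEEDS_U_OPEN = {"io", "iot"}


def collector_target(app, collector):
    """Return one Makefile rule that relinks APP with the given collector."""
    extra = "-u open " if collector in NEEDS_U_OPEN else ""
    return (
        f"{app}-{collector}: {app}.o\n"
        '\t@echo "Linking" $@ "... "\n'  # recipe lines must start with a tab
        f"\tosslink {extra}-v -c {collector} ${{CC}} -o {app}-{collector} {app}.o ${{LFLAGS}}\n"
    )


if __name__ == "__main__":
    collectors = ["pcsamp", "usertime", "hwcsamp", "io", "iot", "mpi"]
    print("\n".join(collector_target("smg2000", c) for c in collectors))
```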
Running the Application on BG/P
Example execution of the relinked smg2000 application:

mxterm 32 32 30 -A dev -q pdebug
# In the mxterm window, do the following
setenv DK_NODE /usr/global/tools/openspeedshop/oss-dev/sles_10_ppc64/dotkit
use openss_execute_bgp
# pcsamp experiment example, set up to run on the BE nodes
make smg2000-pcsamp
rm -rf /p/lscratcha/jeg/raw
mkdir /p/lscratcha/jeg/raw
# Must pass the location for raw data to the BE node environment
mpirun -np 32 -env "OPENSS_RAWDATA_DIR=/p/lscratcha/jeg/raw" ./smg2000-pcsamp
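The preparation and launch steps above can also be driven from a Python job script. This is a sketch under the assumption that the mpirun on the system accepts the same -np and -env options as the example; the helper names are hypothetical. Recreating the raw-data directory first matters because leftover files from an earlier run would end up in the next conversion.

```python
import os
import shutil


def prepare_rawdata_dir(path):
    """Same effect as 'rm -rf DIR; mkdir DIR': start the run with an empty directory."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


def bgp_mpirun_argv(nprocs, rawdata_dir, executable):
    """argv for the mpirun line in the example.

    -env forwards OPENSS_RAWDATA_DIR to the back-end nodes; the double quotes in
    the shell version are only shell quoting and do not appear in argv.
    """
    return ["mpirun", "-np", str(nprocs),
            "-env", f"OPENSS_RAWDATA_DIR={rawdata_dir}", executable]


if __name__ == "__main__":
    print(bgp_mpirun_argv(32, "/p/lscratcha/jeg/raw", "./smg2000-pcsamp"))
```

In a real job the list would be handed to subprocess.run(..., check=True) after calling prepare_rawdata_dir on the same path.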
Run Utility to Convert Raw Data into a Database
- After running mpirun on the BE nodes, convert the raw data created in the OPENSS_RAWDATA_DIR location into an Open SpeedShop database file for viewing
- ossutil is the utility that creates the database file on the FE node

ossutil /p/lscratcha/jeg/raw
Processing raw data for sweep3d
Processing processes and threads...
Processing performance data...
Processing functions and statements...

# Creates a file with the suffix .openss; the first one is named X.0.openss and can be renamed by moving it
openss -f X.0.openss
openss -cli -f X.0.openss
# The database file can be viewed on other machines/laptops without the application present
mv X.0.openss smg2000-pcsamp-512pe.openss
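Since every ossutil output starts out as X.0.openss, renaming right away keeps databases from different runs apart. A minimal Python sketch, assuming an application-experiment-PE-count naming scheme inferred from the smg2000-pcsamp-512pe.openss example; both helpers are hypothetical.

```python
import os


def descriptive_db_name(app, experiment, npes):
    """Name following the pattern of the example, e.g. smg2000-pcsamp-512pe.openss."""
    return f"{app}-{experiment}-{npes}pe.openss"


def rename_database(src, app, experiment, npes):
    """Rename an ossutil output file (e.g. X.0.openss) in its directory, like 'mv'."""
    dst = os.path.join(os.path.dirname(src), descriptive_db_name(app, experiment, npes))
    os.replace(src, dst)  # like mv, silently overwrites an existing destination
    return dst


if __name__ == "__main__":
    print(descriptive_db_name("smg2000", "pcsamp", 512))
```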
Status on BG/P and Cray XT/XE
- Support for shared executables is coming
- On Cray XT/XE
  - hwcsamp not quite ready
  - fpe not tested
  - All other experiments working: pcsamp, usertime, hwc, hwctime, io, iot, mpi, mpit
- On BG/P
  - usertime not working at scale (hangs)
  - hwcsamp not quite ready
  - fpe not tested
  - pcsamp, io, iot, mpi, mpit are working
Demonstration on BG/P dawdev
Additional Information
Open SpeedShop Documentation
- Current version: 2.0.0
- Open SpeedShop User Guide documentation
  - http://www.openspeedshop.org/docs/user_guide/
  - /share/doc/packages/openspeedshop/users_guide
- Python scripting API documentation
  - http://www.openspeedshop.org/docs/pyscripting_doc/
  - /share/doc/packages/openspeedshop/pyscripting_doc
- Command line interface documentation
  - http://www.openspeedshop.org/docs/user_guide/
  - /share/doc/packages/openspeedshop/users_guide
Availability and Contact
- Open SpeedShop website: http://www.openspeedshop.org/
- Download options:
  - Package with install script
  - Source for tool and base libraries
- Feedback
  - Bug tracking available from the website
  - Contact information on the website
  - oss-questions@openspeedshop.org
  - Feel free to contact the presenters directly: jeg@krellinst.org, dpm@krellinst.org