How to Use Open SpeedShop on BGP and Cray XT/XE




ASC Booth Presentation @ SC 2010, New Orleans, LA

Why Open SpeedShop?
- Open source performance analysis tool framework
  - Most common performance analysis steps all in one tool
  - Extensible by plugins for data collection and representation
- Flexible and easy to use
  - User access through GUI, command line, and Python scripting
- Several instrumentation options
  - All work on unmodified application binaries
  - Offline and online data collection / attach to running codes
- Supports a wide range of systems
  - Extensively used and tested on a variety of Linux clusters
  - New: Cray XT and Blue Gene/P support
- Availability
  - Current version and source available via SourceForge

Project Team
- Members
  - Jim Galarowicz, Krell
  - Don Maghrak, Krell
  - David Montoya, LANL
  - Mahesh Rajan, SNLs
  - Martin Schulz, LLNL
- Larger team
  - William Hachfeld and Dave Whitney, Krell
  - Dane Gardner, LANL
  - Scott Cranford and Joseph Kenny, SNLs
  - Chris Chambreau and Matt Legendre, LLNL
  - Dyninst group (Bart Miller, UW & Jeff Hollingsworth, UMD)
  - Phil Roth, ORNL
  - Ciera Jaspan, CMU

Outline
- Welcome
- Quick introduction into Open SpeedShop
- How it works on clusters
- Quick demonstration of how it works on clusters
- How it works on BGP at LLNL
- Demonstration of how it works on BGP
- Questions & additional information

Section 2: Introduction into Open SpeedShop

Experiment Workflow
[Diagram: Open SpeedShop workflow. The application is run as an experiment from the process management panel.]
- An experiment consists of one or more data collectors
- Results are stored in an SQL database
- Results can be displayed using several views

Basic Interface
- Step 1: Gather data from the command line
  - Example: osspcsamp <application>
  - Creates a database
- Step 2: Analyze data in the GUI
  - Simple graphics
  - Relate data to source

Advanced Interfaces
- Scripting language
  - Batch interface
  - O|SS command line (CLI)
- Python module

CLI commands:
- Experiment commands: expattach, expcreate, expdetach, expgo, expview
- List commands: list -v exp, list -v hosts, list -v src
- Session commands: setbreak, opengui

Python module example:

    import openss

    # Describe the executable and the experiment type
    my_filename = openss.filelist("myprog.a.out")
    my_exptype = openss.exptypelist("pcsamp")

    # Create the experiment and run it
    my_id = openss.expcreate(my_filename, my_exptype)
    openss.expgo()

    # Select the metric and view, then fetch the results
    my_metric_list = openss.metriclist("exclusive")
    my_viewtype = openss.viewtypelist("pcsamp")
    result = openss.expview(my_id, my_viewtype, my_metric_list)

Performance Experiments
- Concept of an experiment
  - What to measure and what to analyze?
  - Experiment is chosen by the user
  - Any experiment can be applied to any application
- Consists of collectors and views
  - Collectors define specific data sources
    - Hardware counters
    - Tracing of library routines
  - Views specify data aggregation and presentation
  - Multiple collectors per experiment possible

Sampling Experiments
- PC Sampling (pcsamp)
  - Record the PC at user-defined time intervals
  - Low-overhead overview of time distribution
- Call Path Profiling (usertime)
  - PC sampling and call stacks for each sample
  - Provides inclusive and exclusive timing data
- Hardware Counters (hwc, hwctime, hwcsamp)
  - Sample HWC overflow events
  - Access to data like cache and TLB misses
  - Default event is PAPI_TOT_CYC overflows (hwc, hwctime)
  - Sample up to six events at a time (hwcsamp)
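The difference between exclusive and inclusive time in call path profiling can be illustrated with a small Python sketch. This is not Open SpeedShop code: the function name profile_from_samples, the sample data, and the 0.01 s interval are all assumptions made for illustration. Each sample is a call stack; the function on top of the stack gets exclusive credit, and every function on the stack gets inclusive credit.

```python
from collections import Counter


def profile_from_samples(stacks, interval_s):
    """Estimate exclusive and inclusive time per function from stack samples.

    stacks: list of call stacks, each ordered from the outermost caller to
            the function that was executing when the sample was taken.
    interval_s: sampling interval in seconds (hypothetical value below).
    """
    exclusive = Counter()
    inclusive = Counter()
    for stack in stacks:
        if not stack:
            continue
        # Exclusive: only the function on top of the stack was running.
        exclusive[stack[-1]] += 1
        # Inclusive: every function on the stack was active. Count each
        # function once per sample, even if it appears recursively.
        for func in set(stack):
            inclusive[func] += 1

    def to_seconds(counts):
        return {func: n * interval_s for func, n in counts.items()}

    return to_seconds(exclusive), to_seconds(inclusive)


# Hypothetical samples, not taken from a real run.
samples = [
    ["main", "solve", "relax"],
    ["main", "solve", "relax"],
    ["main", "solve"],
    ["main", "io_write"],
]
exclusive, inclusive = profile_from_samples(samples, interval_s=0.01)
for func in sorted(inclusive, key=inclusive.get, reverse=True):
    print(f"{func:10s} exclusive={exclusive.get(func, 0.0):.2f}s "
          f"inclusive={inclusive[func]:.2f}s")
```

In this toy data, main never appears on top of a stack, so it has no exclusive time even though it is active in every sample.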

Tracing Experiments
- Input/Output Tracing (io, iot)
  - Record invocation of all POSIX I/O events
  - Provides aggregate and individual timings
- MPI Tracing (mpi, mpit, mpiotf)
  - Record invocation of all MPI routines
  - Provides aggregate and individual timings
  - Create Open Trace Format (OTF) output (mpiotf)
- Floating Point Exception Tracing (fpe)
  - Triggered by any FPE caused by the application
  - Helps pinpoint numerical problem areas
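As a rough illustration of aggregate versus individual timings, the following Python sketch reduces a list of individual trace records to per-function totals. It does not use any Open SpeedShop interface; the record layout (name, start, end), the function summarize_trace, and the trace values are assumptions for illustration only.

```python
from collections import defaultdict


def summarize_trace(events):
    """Reduce individual call records to aggregate timings per function.

    events: iterable of (function_name, start_s, end_s) tuples, one per
            traced call (a hypothetical record layout).
    """
    summary = defaultdict(lambda: {"calls": 0, "total": 0.0, "max": 0.0})
    for name, start, end in events:
        duration = end - start
        entry = summary[name]
        entry["calls"] += 1
        entry["total"] += duration
        entry["max"] = max(entry["max"], duration)
    return dict(summary)


# Hypothetical individual events, as a per-call trace might record them.
trace = [
    ("MPI_Send", 0.0, 0.5),
    ("MPI_Recv", 0.5, 2.0),
    ("MPI_Send", 2.0, 2.25),
    ("MPI_Allreduce", 2.25, 3.25),
]
summary = summarize_trace(trace)
for name, entry in sorted(summary.items()):
    print(name, entry["calls"], entry["total"], entry["max"])
```

The individual records keep the timing of every call, while the summary keeps only counts and totals, which is much smaller but loses the ordering of events.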

Parallel Experiments
- O|SS supports MPI and threaded codes
  - Tested with a variety of MPI implementations
  - Thread support based on POSIX threads
  - OpenMP supported through POSIX threads
- Any experiment can be applied to a parallel application
  - Automatically applied to all tasks/threads
  - Default views aggregate across all tasks/threads
  - Data from individual tasks/threads available
- Specific parallel experiments (e.g., MPI)
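To make the idea of aggregating across tasks concrete, here is a minimal Python sketch that sums per-rank times per function and also reports the minimum, maximum, and mean, which is the kind of breakdown a load balance comparison relies on. The function aggregate_across_ranks, the data layout, and the numbers are hypothetical and are not part of Open SpeedShop.

```python
def aggregate_across_ranks(per_rank_times):
    """Combine per-rank timings into one report per function.

    per_rank_times: {rank: {function_name: exclusive_seconds}}
                    (a hypothetical layout chosen for this sketch).
    """
    functions = sorted({f for times in per_rank_times.values() for f in times})
    report = {}
    for func in functions:
        # A function missing on a rank contributes zero time for that rank.
        values = {rank: times.get(func, 0.0)
                  for rank, times in per_rank_times.items()}
        max_rank = max(values, key=values.get)
        min_rank = min(values, key=values.get)
        total = sum(values.values())
        report[func] = {
            "sum": total,  # the aggregate across all ranks
            "max": values[max_rank], "max_rank": max_rank,
            "min": values[min_rank], "min_rank": min_rank,
            "mean": total / len(values),
        }
    return report


# Hypothetical per-rank data for a 4-rank run.
per_rank = {
    0: {"relax": 2.0, "exchange": 0.5},
    1: {"relax": 2.0, "exchange": 0.5},
    2: {"relax": 4.0, "exchange": 0.25},
    3: {"relax": 2.0, "exchange": 0.75},
}
report = aggregate_across_ranks(per_rank)
for func, row in report.items():
    print(func, row)
```

In this made-up data, rank 2 spends twice as long in relax as the other ranks, which the summed value alone would hide.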

Running a First Experiment
1. Picking the experiment
   - What do I want to measure?
   - We will start with pcsamp to get a first overview
2. Launching the application
   - How do I control my application under O|SS?
   - osspcsamp "mpirun -np 256 smg2000 -n 80 80 80"
3. Storing the results
   - O|SS will create a database
   - Name: smg2000-pcsamp.openss
4. Exploring the gathered data
   - O|SS will print a default report
   - Open the GUI to analyze data in detail (run: openss)

Example Run with Output
    osspcsamp "smg2000 -n 80 80 80"
[Screenshots: terminal output of the run, shown over two slides]

Default Output Report View
[Screenshot: GUI default report view, with callouts]
- Toolbar to switch views
- Performance data; default view is by function (data is the sum from all processes and threads)
- Graphical representation

Statement Report Output View
[Screenshot: GUI statement view, with callouts]
- Performance data
- S-icon: statement view
- Statement in the program that took the most time

Associate Source & Performance Data
[Screenshot: GUI with source window, with callouts]
- Double click to open the source window
- Use window controls to split/arrange windows
- Selected performance data point

MPI (mpi) Tracing Results: Default View
[Screenshot: GUI default view for the mpi experiment]
How to Analyze the Performance of Parallel Codes? - A Tutorial at SC 2010.

Load Balance View for NPB: LU
[Screenshot: load balance view based on functions (pcsamp)]

View Results: Show MPI Callstacks
[Screenshot: GUI view of MPI call stacks]

Section 3: Running on BGP and Cray XT/XE

Open SpeedShop & Static Executables
- When shared library support is limited
  - The normal manner of running experiments doesn't work
  - Need to link our collectors into the static executable
- osslink: a script to help with linking in our collectors
  - osslink hides a lot of the link details
  - Calls to it are usually embedded inside application makefiles
  - Can also be used to compile and link applications
  - Sorts out the experiment-specific library and collector specification
  - Sorts out some platform differences to do the correct link
- The user generally needs to find the makefile target that creates the actual static executable and create a collector target that links in the selected collector, as in the smg2000 makefile example.

Open SpeedShop & Static Executables
- Using the correct environment on the front end (FE) versus the back end (BE)
- Dotkit or module files are available as examples
  - Build for FE tools
  - Execute on FE tools
  - Build for BGP BE tools
  - Execute on BGP BE tools
- The "Execute on BGP BE tools" dotkit
  - Sets up the path to the Open SpeedShop tools bin directory
  - Sets the OPENSS_MPI_IMPLEMENTATION environment variable, which the mpi and mpit experiments need in order to know the MPI implementation's data structure definitions
  - Sets up the library path to the Open SpeedShop runtimes and collectors specific to the BE node software environment
  - Use this dotkit when linking in the Open SpeedShop collectors and runtimes for your application.

Re-linking application using osslink
Example modification for the smg2000 application:

    # Original target: plain link of the static executable
    smg2000: smg2000.o
            @echo "Linking" $@ "... "
            ${CC} -o smg2000 smg2000.o ${LFLAGS}

    # One extra target per collector, each prefixing the link line with osslink
    smg2000-pcsamp: smg2000.o
            @echo "Linking" $@ "... "
            osslink -v -c pcsamp ${CC} -o smg2000-pcsamp smg2000.o ${LFLAGS}

    smg2000-usertime: smg2000.o
            @echo "Linking" $@ "... "
            osslink -v -c usertime ${CC} -o smg2000-usertime smg2000.o ${LFLAGS}

    smg2000-hwcsamp: smg2000.o
            @echo "Linking" $@ "... "
            osslink -v -c hwcsamp ${CC} -o smg2000-hwcsamp smg2000.o ${LFLAGS}

    smg2000-io: smg2000.o
            @echo "Linking" $@ "... "
            osslink -u open -v -c io ${CC} -o smg2000-io smg2000.o ${LFLAGS}

    smg2000-iot: smg2000.o
            @echo "Linking" $@ "... "
            osslink -u open -v -c iot ${CC} -o smg2000-iot smg2000.o ${LFLAGS}

    smg2000-mpi: smg2000.o
            @echo "Linking" $@ "... "
            osslink -v -c mpi ${CC} -o smg2000-mpi smg2000.o ${LFLAGS}

Running application on BG/P
Example execution of the relinked smg2000 application:

    mxterm 32 32 30 -A dev -q pdebug

    # In the mxterm window do the following
    setenv DK_NODE /usr/global/tools/openspeedshop/oss-dev/sles_10_ppc64/dotkit
    use openss_execute_bgp

    # pcsamp experiment example setup to run on BE nodes
    make smg2000-pcsamp

    # Start from an empty raw data directory
    rm -rf /p/lscratcha/jeg/raw
    mkdir /p/lscratcha/jeg/raw

    # Must pass the location for raw data to the BE node environment
    mpirun -np 32 -env "OPENSS_RAWDATA_DIR=/p/lscratcha/jeg/raw" ./smg2000-pcsamp

Run Utility to Convert Raw Data into DB
- After running mpirun on the BE nodes, convert the raw data created in the OPENSS_RAWDATA_DIR location into an Open SpeedShop database file for viewing
- ossutil is the utility that creates the database file on the FE node

    ossutil /p/lscratcha/jeg/raw
    Processing raw data for sweep3d
    Processing processes and threads...
    Processing performance data...
    Processing functions and statements...

    # Creates a file with suffix .openss; the first one is named X.0.openss
    openss -f X.0.openss
    openss -cli -f X.0.openss

    # The database file can be renamed by moving it, and can be viewed on
    # other machines/laptops without the application present
    mv X.0.openss smg2000-pcsamp-512pe.openss
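The two-stage flow (run on the BE nodes, then convert on the FE node) can be sketched as a small Python helper that only assembles the two command lines shown on the slides. The helper name bgp_run_commands and the /tmp/example/raw path are assumptions for illustration; the sketch does not execute anything and does not model the dotkit setup.

```python
import shlex


def bgp_run_commands(executable, raw_dir, ranks):
    """Build the BE run command and the FE conversion command as argument lists.

    Mirrors the slide commands: mpirun passes OPENSS_RAWDATA_DIR to the
    back-end environment, and ossutil later converts that directory.
    """
    run = ["mpirun", "-np", str(ranks),
           "-env", f"OPENSS_RAWDATA_DIR={raw_dir}", executable]
    convert = ["ossutil", raw_dir]
    return run, convert


# Hypothetical raw data directory; substitute a scratch path on your system.
run, convert = bgp_run_commands("./smg2000-pcsamp", "/tmp/example/raw", 32)
print(shlex.join(run))
print(shlex.join(convert))
```

Keeping the raw data directory in one variable avoids the easy mistake of running ossutil on a different path than the one passed to mpirun.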

Status on the BG/P and Cray XT/XE
- Support for shared executables coming
- On Cray XT/XE
  - hwcsamp not quite ready
  - fpe not tested
  - All other experiments working: pcsamp, usertime, hwc, hwctime, io, iot, mpi, mpit
- On BG/P
  - usertime not working at scale (hangs)
  - hwcsamp not quite ready
  - fpe not tested
  - pcsamp, io, iot, mpi, mpit are working

Demonstration on BG/P (dawdev)

Additional Information

Open SpeedShop Documentation
- Current version: 2.0.0
- Open SpeedShop User Guide Documentation
  - http://www.openspeedshop.org/docs/user_guide/
  - /share/doc/packages/openspeedshop/users_guide
- Python Scripting API Documentation
  - http://www.openspeedshop.org/docs/pyscripting_doc/
  - /share/doc/packages/openspeedshop/pyscripting_doc
- Command Line Interface Documentation
  - http://www.openspeedshop.org/docs/user_guide/
  - /share/doc/packages/openspeedshop/users_guide

Availability and Contact
- Open SpeedShop website
  - http://www.openspeedshop.org/
- Download options
  - Package with install script
  - Source for tool and base libraries
- Feedback
  - Bug tracking available from the website
  - Contact information on the website
  - oss-questions@openspeedshop.org
  - Feel free to contact the presenters directly: jeg@krellinst.org, dpm@krellinst.org