Experiences with Tools at NERSC

Experiences with Tools at NERSC
Richard Gerber, NERSC User Services
"Programming Weather, Climate, and Earth-System Models on Heterogeneous Multi-Core Platforms"
September 7, 2011, at the National Center for Atmospheric Research in Boulder, Colorado

Thanks for the Invitation
- My professional goal is to enable scientists to use HPC easily and effectively
  - Contribute to important discoveries about how our natural world works
  - Make a difference
- So it is an honor, and meaningful to me, to participate in this conference
- One of my primary roles is as deputy on our next procurement team, and we are extremely interested in learning about your experiences with hybrid systems and programming

Outline
- Recent experiences providing tools
- Observations about tool usage by NERSC users
- What I'd like to see in tools for the HPC community

Focus
I'll focus on performance tools. Who is the audience? Scientists? CS code engineers? Scientist/engineers?
- Development tools
  - Eclipse: we considered installing this, but concluded users are best served by installing it themselves in their own directories
  - Compilers: PGI, Cray, HMPP directives-based (your experiences?)
- Debuggers
  - Allinea DDT and Rogue Wave TotalView
  - Both are here and will present today
  - Both are installed and supported at NERSC
  - Generally good experiences, but not extensively used by users

Disclaimers
- These are my observations and experiences
- Not a review or critique of any given tool
- Not a comprehensive survey of tools

Genesis of This Talk
"Hi Richard: I started looking at the single-core performance on Hopper. The first problem that I encountered was that running the code with <tool name redacted> causes it to crash; so I gave up on that approach. What other profiling tools are you planning to support? I did look at gprof on the code but that also seems to have problems. -- Chris"

The Plan
- Provide Chris with tools to scale and test his code's (HiRAM?) performance on NERSC's Cray XE6 Hopper (#5 in the Top 500)
- At the same time, test and install some tools to benefit all NERSC users
- Report on my successes and survey the tools we would have available at NERSC (including our test GPU system)

Experience with Chris Kerr
- Chris and I went back and forth over a number of months, trying to install and use various tools. All failed:
  - Couldn't build a tool with some specific compiler
  - Tool crashed (we did find some bugs)
  - Tool appeared to run, but produced no output
- Chris found a solution for his problem, but it was a lot of work: gettimeofday(), print statements, somebody's private tool (see the sketch below)
- My result: no success identifying and installing new tools for our entire user base
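For readers who have not had to fall back on hand-rolled instrumentation, this is roughly what it looks like: bracket a region of interest with gettimeofday() and print the elapsed time. A minimal sketch; the region being timed is illustrative, not taken from Chris's code.

    #include <stdio.h>
    #include <sys/time.h>

    /* Wall-clock time in seconds since the epoch. */
    static double wtime(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1.0e-6 * tv.tv_usec;
    }

    int main(void)
    {
        double t0 = wtime();

        /* ... region of interest, e.g. one model time step ... */

        double t1 = wtime();
        printf("region took %.6f s\n", t1 - t0);
        return 0;
    }

It works everywhere and never crashes the code, which is exactly why users retreat to it, but it scales poorly to many regions, many ranks, and many runs.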

Experience with NERSC Users
- NERSC has about 4,000 users, at all levels of sophistication and experience
- We're committed to supporting both the cutting edge and production HPC computing for the masses
- Users often ask for advice on which tools to use, and we give them suggestions
- Our experience is that very few use programming/debugging/development tools
- A few users use a few tools a lot, but many try a tool only once

Why?
- Extremely effective? More likely: too confusing, too difficult, didn't work, don't know how to use it, don't know which one to use
- It's not that we don't have tools that address specific issues:
  - GPU/CUDA tools and compilers
  - TAU, PAPI, HPCToolkit
  - CrayPat, IBM HPC tools, Open|SpeedShop
  - Valgrind
  - VampirTrace
- But do most users have the resources to learn how to use these tools, especially when they don't know whether any given one will pay off? (See the sketch after this list.)
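Even the entry point of these tools illustrates the investment. Here is a minimal sketch of reading one hardware counter with PAPI's classic high-level API (the API as it existed in this era; the event is illustrative, and availability depends on the CPU):

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int events[1] = { PAPI_TOT_CYC };  /* total cycles; illustrative */
        long long values[1];

        if (PAPI_start_counters(events, 1) != PAPI_OK) {
            fprintf(stderr, "PAPI_start_counters failed\n");
            return 1;
        }

        /* ... region of interest ... */

        if (PAPI_stop_counters(values, 1) != PAPI_OK) {
            fprintf(stderr, "PAPI_stop_counters failed\n");
            return 1;
        }
        printf("cycles: %lld\n", values[0]);
        return 0;
    }

Even this simplest case means finding the right module, linking -lpapi, and knowing which events the hardware exposes, before any insight arrives.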

Vampir w/ CUDA
(Screenshot.)

Ease of Use
(Screenshot.)

Users Want (Need?) Tools
Users are asking for tools because HPC systems and programming models are changing. There are more and more components to worry about:
- CPU (caches, FPUs, pipelining, ...)
- Data movement to main memory, GPU memory, and levels of cache
- I/O
- Network (message passing)
- CPU threads (OpenMP)
- GPU performance

Questions to You
- What tools do you use?
- What tools do you want?
- What would you like centers to support?
- Can you get to exascale without tools?

What I Want in a Tool
- Let the users help themselves
- Work for everyone, all (most of?) the time
- Easy to use
- Useful
- Easy to interpret the results
- Affordable ($$ or manpower support costs)
- Simple, to supplement existing complex tools
- Point the way for a deeper dive in problem areas

IPM as a Model
NERSC's IPM: Integrated Performance Monitoring. How it works (user perspective):
- % module load IPM*
- Run your program as normal
- Look at the results on the web
It's that easy! And the overhead is extremely low, so IPM can examine your production code.
* (As long as your system supports dynamic load libs)

What IPM Measures
- IPM only gives a high-level, entire-program-centric view; still, it is very valuable guidance
- Shows whole-run info per MPI task and OpenMP thread (CUDA under development)
- Many pieces of data in one place
- Reveals what many users don't know about their code (a sketch of the underlying mechanism follows this list):
  - High-water memory usage (per task)
  - Load balance
  - Call imbalance
  - MPI time
  - I/O time
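IPM gathers its MPI numbers through the standard PMPI profiling interface, which lets a library intercept any MPI call without modifying the application. A minimal sketch of that mechanism, not IPM's actual source:

    #include <mpi.h>
    #include <sys/time.h>

    static double mpi_seconds = 0.0;  /* accumulated time inside MPI_Send */

    static double wtime(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1.0e-6 * tv.tv_usec;
    }

    /* The application's calls to MPI_Send resolve to this wrapper;
       PMPI_Send is the real implementation underneath. A full tool
       wraps every MPI routine and reports totals at MPI_Finalize. */
    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        mpi_seconds += wtime() - t0;
        return rc;
    }

Because the interposition happens at link time (or via preloading with dynamic libraries, hence the footnote on the previous slide), the user's source code and build stay untouched.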

IPM w/ CUDA: Accuracy
- Benchmarks from the CUDA SDK, run on one node of Dirac @ NERSC: 2x Nehalem quad-core, 1x NVIDIA Tesla C2050 ("Fermi") GPU
- Compared the results from the CUDA profiler with IPM
- Very good agreement between the results
- IPM timings are always larger than the CUDA profiler's (bracketing; see the sketch below)
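The "bracketing" effect falls out of where the two timers sit: a host-side wrapper starts its clock before the CUDA call and stops it after the call returns, so it includes launch and driver overhead that a device-side timer never sees. A sketch of the two measurements around a single memcpy (illustrative, not IPM's actual instrumentation):

    #include <stdio.h>
    #include <sys/time.h>
    #include <cuda_runtime.h>

    static double wtime(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1.0e-6 * tv.tv_usec;
    }

    int main(void)
    {
        const size_t n = 1 << 24;
        void *host, *dev;
        cudaMallocHost(&host, n);
        cudaMalloc(&dev, n);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        /* Host-side bracket: what an IPM-style wrapper measures. */
        double t0 = wtime();
        cudaEventRecord(start, 0);
        cudaMemcpy(dev, host, n, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);
        double host_ms = 1.0e3 * (wtime() - t0);

        /* Device-side interval: what the CUDA profiler reports. */
        float dev_ms;
        cudaEventElapsedTime(&dev_ms, start, stop);

        printf("host bracket %.3f ms, device interval %.3f ms\n",
               host_ms, dev_ms);
        return 0;
    }

The host bracket can never be smaller than the device interval, which matches the observation that IPM's timings were consistently the larger of the two.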

IPM Examples
Click on your job from a list on the NERSC web site. (Screenshot.)

IPM Examples
Click on the metric you want. (Screenshot.)

IPM Examples
(Four more screenshots of IPM results pages.)

Summary
- HPC tools are, in general, difficult to use, or require a large investment of time to learn to use and to interpret their results
- There is a need for tools to help users get to exascale (or even petascale)
- Simple, easy-to-use tools would be extremely useful, even if they did not give low-level details