RA MPI Compilers Debuggers Profiling March 25, 2009

Examples and Slides

To download examples on RA:
1. mkdir class
2. cd class
3. wget http://geco.mines.edu/workshop/class2/examples/examples.tgz
4. tar -xzf examples.tgz
5. cd stommel

Slides: http://geco.mines.edu/workshop/tools

Note: There is a summary of all scripts at the end of the slides for easy copy/paste.

Experimental MPI Versions

New MPI Compilers/Versions

MVAPICH2 1.2
MVAPICH 1.1
OpenMPI 1.3.1

Both Intel and Portland Group compilers
Support for debuggers
Support for profiling

Need to modify your environment

Change your .tcshrc or .bashrc file
Log out, then log back in
Changes override mpi_selector settings
You may need to change your PBS script

.tcshrc settings (pick one MPI_VERSION line; in csh the last setenv wins)

setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich-1.1
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich2-1.2
setenv MPI_VERSION /lustre/home/apps/mpi/db/openmpi1.3.1
setenv MPI_COMPILER intel
#setenv MPI_COMPILER pg
if ( $?MPI_COMPILER && $?MPI_VERSION ) then
  setenv MPI_BASE $MPI_VERSION/$MPI_COMPILER
  setenv LD_LIBRARY_PATH $MPI_BASE/lib:$LD_LIBRARY_PATH
  setenv LD_LIBRARY_PATH $MPI_BASE/lib/shared:$LD_LIBRARY_PATH
  setenv MANPATH $MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
  set path = ( $MPI_BASE/bin $path )
endif

.bashrc settings (pick one MPI_VERSION line; the last export wins)

export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich-1.1
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich2-1.2
export MPI_VERSION=/lustre/home/apps/mpi/db/openmpi1.3.1
export MPI_COMPILER=intel
#export MPI_COMPILER=pg
if [ -n "$MPI_COMPILER" ]; then
  if [ -n "$MPI_VERSION" ]; then
    export MPI_BASE=$MPI_VERSION/$MPI_COMPILER
    export LD_LIBRARY_PATH=$MPI_BASE/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=$MPI_BASE/lib/shared:$LD_LIBRARY_PATH
    export MANPATH=$MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
    export PATH=$MPI_BASE/bin:$PATH
  fi
fi

Base Script

#!/bin/csh
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:02:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
ADD YOUR MPI RUN COMMAND HERE

MPI Run Commands

Version        Command
openmpi1.3.1   mpiexec -np 16 stc_06
mvapich2-1.2   mpiexec -np 16 /lustre/home/tkaiser/examples/stommel/stc_06 < st.in
mvapich-1.1    mpirun_rsh -hostfile $PBS_NODEFILE -np 16 stc_06 < st.in
               mpirun -machinefile $PBS_NODEFILE -np 16 stc_06 < st.in

Debugging with ddt

Not a big fan of debuggers

You end up debugging the debugger
Steep learning curve
Can be misleading
Difficult for large processor counts, and the problem might only show up there
My favorite debuggers are printf and write (see the sketch below)
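
Since the slides mention printf/write-style debugging, here is a minimal sketch (not from the original slides; the file and variable names are illustrative) of rank-tagged print debugging in an MPI C code: prefix every message with the rank and flush immediately so the output survives a crash.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, nprocs, step;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    for (step = 0; step < 3; step++) {
        /* rank-tagged trace line; flush so it is not lost if the run dies */
        fprintf(stderr, "[rank %d of %d] starting step %d\n", rank, nprocs, step);
        fflush(stderr);
        /* ... real work would go here ... */
    }
    MPI_Finalize();
    return 0;
}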

However...

I recently used ddt to find a problem for which printf did not work; it might otherwise have taken me weeks.
Print statements can make the problem go away.
Debuggers are useful for learning a program that you have never seen.
ddt is working well on RA.

Allinea DDT debugger

X-Windows based: ssh -X ra
An initial setup is done the first time you run it
Works with both Portland Group and Intel Fortran
Good support for Fortran modules
Syntax highlighting

.tcshrc Environment for ddt

set path = ( /lustre/home/apps/ddt2.4.1/bin $path )
setenv DMALLOCPATH /lustre/home/apps/ddt2.4.1
setenv DMALLOC
setenv LD_LIBRARY_PATH $DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

.bashrc Environment for ddt

export PATH=/lustre/home/apps/ddt2.4.1/bin:$PATH
export DMALLOCPATH=/lustre/home/apps/ddt2.4.1
export DMALLOC=""
export LD_LIBRARY_PATH=$DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

Requires that you use an MPI that supports debugging, such as those listed above.

Debug Compile Line

mpicc -g \
  -L/lustre/home/apps/gdb-6.8/lib64 \
  -liberty \
  stc_06.c \
  -o stc_06.g

Debug Compile Line

mpicc -g -L/lustre/home/apps/gdb-6.8/lib64 -liberty \
  stc_06.c \
  /lustre/home/apps/ddt2.4.1/lib/64/libdmalloc.a \
  -o stc_06.g

Here we link to the debug memory library. This is required if you want to track memory usage in ddt. Note that the library must be last in the list.

stdin, stdout, stderr

stdin works for both Intel and Portland Group.
stdout works with the Intel compiler without modification.
The Portland Group compiler requires a special call (before MPI_Init) to be able to see stdout while the program is running. This is NOT a bug.

call setvbuf3f(6,2,0)     for Fortran
setbuf(stdout,NULL);      for C
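
As a rough illustration (not from the original slides; the program itself is made up), here is where the C call would go in a code built with the Portland Group compiler so that stdout is unbuffered while the program runs:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    /* disable stdout buffering; per the note above, this must come before MPI_Init */
    setbuf(stdout, NULL);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d is alive\n", rank);
    MPI_Finalize();
    return 0;
}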

Initial ddt setup

The first time you run ddt it creates a directory ~/.ddt
Type ddt
Choose an MPI version
Choose a list of nodes (default); note the location of this file
You need to change this list to connect to a running process
Wait a few seconds

Running ddt

Select "Run and Debug a Program"
Select the program that you will run
Set the number of processes
Most likely set threads to off
Click Run
Details to follow...

To show you...

Routine required for correct stdio with the Portland Group compiler
Setting stdin
Module support
Changing values
Locals / Current Line

Option: Let ddt submit a batch job

Your run script becomes a template into which ddt fills the arguments at submit time.
Tell ddt the particulars: program, input, # processors (<= 16).
ddt will watch the queue for your job to start and then connect.

Let ddt submit a batch job

Change your run line to run ddt with your program as an argument. For example,

mpiexec -n 8 stf_03.g < st.in

becomes

mpiexec -n NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

Add the following (not required, but useful for attaching to already running jobs):

sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes

A simple script (more later for specific versions of MPI)

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
#for openmpi (this line is commented out)
#mpiexec -n 8 stf_03.g < st.in
#for openmpi and ddt (this is the live line)
mpiexec -n NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger \
  DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

Under Session - Options

Finally select Session - New Session - Run

Let ddt submit the job for you

OpenMPI Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
DDTPATH_TAG/bin/ddt-client DDT_DEBUGGER_ARGUMENTS_TAG mpiexec -np \
  NUM_PROCS_TAG EXTRA_MPI_ARGUMENTS_TAG PROGRAM_TAG \
  PROGRAM_ARGUMENTS_TAG

MVAPICH2 Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
mpiexec -n NUM_PROCS_TAG \
  DDTPATH_TAG/bin/ddt-debugger \
  DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

MVAPICH-1.1 Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:15:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
mpirun_rsh -hostfile $PBS_NODEFILE -n \
  NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger \
  DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

Attaching to a batch job

The key here is that ddt needs to know where your job is running.
Add the following two lines to your script:

sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes

ddt will look in ~/.ddt/nodes for nodes to search.

Attaching to a batch job

To Attach to a Running Process

Session - New Session - Attach
A list should pop up
Nodes need to be in ~/.ddt/nodes

Attaching to an interactive job

The key here is that ddt needs to know where your job is running.
ddt will look in ~/.ddt/nodes for nodes to search.
You may need to manually edit this file.

Attaching to an interactive job

Things to show...

Changing MPI version
Basic setup
Setting break points
Seeing modules
Memory usage
Launching a parallel job
Seeing and changing variables

Profiling with IPM

Integrated Performance Monitoring (IPM)

Developed by Nick Wright of SDSC: http://www.sdsc.edu/us/tools/top/ipm/
Local (limited) documentation: http://geco.mines.edu/ipm/
Available on RA for the experimental versions of MVAPICH*
Normal compile - just add the IPM library
Normal MPI run
Summary of MPI stats at the end of your run to stdout
Can generate a Web page with nice pictures

Integrated Performance Monitoring (IPM) Integrated Performance Monitoring (IPM) is a tool that allows users to obtain a concise summary of the performance and communication characteristics of their codes. IPM is invoked by the user at the time a job is run. By default, a short, text-based summary of the code's performance is provided, and a more detailed Web page summary with graphs to help visualize the output can also be obtained.

Environment Additions for IPM

.tcshrc:
set path = ( $path /lustre/home/apps/pl/bin )
set path = ( $path /lustre/home/apps/ipm/bin )
setenv IPM_KEYFILE /lustre/home/apps/ipm/ipm_key

.bashrc:
export PATH=$PATH:/lustre/home/apps/pl/bin
export PATH=$PATH:/lustre/home/apps/ipm/bin
export IPM_KEYFILE=/lustre/home/apps/ipm/ipm_key

Compiling for IPM

mpif90 -g stf_03.f90 -L$MPI_BASE/ipm/lib -lipm -o stf_03.ipm

where $MPI_BASE = /lustre/home/apps/mpi/db/version

VERSION              Works?
mvapich-1.1/pg       yes
mvapich-1.1/intel    Stay tuned
mvapich2-1.2/pg      yes
mvapich2-1.2/intel   yes
openmpi/*            No - known problem

##IPMv0.923####################################################################
#
# command : unknown (completed)
# host    : compute-9-9/x86_64_linux    mpi_tasks : 8 on 1 nodes
# start   : 03/24/09/14:08:52           wallclock : 31.347469 sec
# stop    : 03/24/09/14:09:24           %comm     : 1.24
# gbytes  : 0.00000e+00 total           gflop/sec : 0.00000e+00 total
#
##############################################################################
# region : *   [ntasks] = 8
#
#                  [total]       <avg>        min          max
# entries          8             1            1            1
# wallclock        250.773       31.3467      31.3465      31.3475
# user             250.589       31.3236      31.1813      31.3532
# system           0.448929      0.0561161    0.043993     0.089986
# mpi              3.10778       0.388473     0.112456     0.610158
# %comm                          1.23925      0.35875      1.94643
# gflop/sec        0             0            0            0
# gbytes           0             0            0            0
#
#                  [time]        [calls]      <%mpi>       <%wall>
# MPI_Recv         2.60098       32032        83.69        1.04
# MPI_Reduce       0.272061      8000         8.75         0.11
# MPI_Send         0.232291      32032        7.47         0.09
# MPI_Bcast        0.00119273    96           0.04         0.00
# MPI_Comm_size    0.000790782   24           0.03         0.00
# MPI_Allreduce    0.000330307   32           0.01         0.00
# MPI_Allgather    0.000130918   16           0.00         0.00
# MPI_Comm_rank    6.7791e-06    46           0.00         0.00
###############################################################################

Generate a web page:

ipm_parse -html tkaiser.1237925332.870435.0

This produces an HTML report ("IPM profile for unknown") with sections for Load Balance, Communication Balance, Message Buffer Sizes, Communication Topology, Switch Traffic, Memory Usage, Executable Info, Host List, Environment, and Developer Info, plus per-call communication event statistics (time, call counts, and buffer-size distributions for MPI_Recv, MPI_Send, MPI_Reduce, etc.).

Can profile sections

The report will have a new page with the given label.

Fortran:
!turn on profiling
call mpi_pcontrol( 1,"proc_a"//char(0))
...
!turn off profiling
call mpi_pcontrol(-1,"proc_a"//char(0))

C:
/* turn on profiling */
MPI_Pcontrol( 1,"proc_a");
...
/* turn off profiling */
MPI_Pcontrol(-1,"proc_a");
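
For context, here is a minimal C sketch (illustrative only; the work loop and names are made up, not from the original slides) showing the MPI_Pcontrol calls wrapped around one region so that the region is reported under the label "proc_a":

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, i;
    double local = 0.0, total = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Pcontrol( 1, "proc_a");        /* turn on profiling for region "proc_a" */
    for (i = 0; i < 1000000; i++)      /* stand-in for real work */
        local += i * 0.5;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Pcontrol(-1, "proc_a");        /* turn off profiling for region "proc_a" */

    if (rank == 0) printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}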

What's Missing? What are we doing about it?

Timeline-style program tracing: time in MPI routines, communication patterns, time in other routines
Memory tracking
Performance numbers: flops, cache misses, ...

Tracing

Evaluated a commercial package and rejected it
Will be installing Tau: http://www.cs.uoregon.edu/research/tau/home.php
A large package which does preprocessing of source
Works with many analysis packages
Includes memory tracking if malloc/allocate can be seen

Performance Information

Some examples:
http://www.ncsa.uiuc.edu/userinfo/resources/Software/Tools/PAPI/
http://perfsuite.ncsa.uiuc.edu/publications/lj135/x50.html

How do we get it? PAPI

PAPI - Performance API

http://icl.cs.utk.edu/papi/
Specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors
Used by both Tau and IPM
Can show the effects of different optimizations
Problem: requires a kernel patch
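
As an illustration of the kind of counter access PAPI provides, here is a small sketch using PAPI's high-level counter interface. This is not from the original slides; it assumes the PAPI library is installed and that the PAPI_FP_OPS and PAPI_L2_TCM presets are available on the hardware (and, on RA, that the kernel patch mentioned above is in place).

#include <stdio.h>
#include <papi.h>

int main(void) {
    int events[2] = { PAPI_FP_OPS, PAPI_L2_TCM };   /* floating-point ops, L2 cache misses */
    long long values[2];
    double a = 0.0;
    int i;

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "PAPI_start_counters failed\n");
        return 1;
    }
    for (i = 0; i < 1000000; i++)    /* the work being measured */
        a += i * 0.5;
    if (PAPI_stop_counters(values, 2) != PAPI_OK) {
        fprintf(stderr, "PAPI_stop_counters failed\n");
        return 1;
    }
    printf("a=%f  FP ops: %lld  L2 misses: %lld\n", a, values[0], values[1]);
    return 0;
}

Compile by adding the PAPI include and library paths and linking with -lpapi.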

Tau and PAPI are part of POINT

http://nic.uoregon.edu/point
The Productivity from Open, INtegrated Tools (POINT) project is funded as part of the NSF's Software Development for Cyberinfrastructure (SDCI) program.
Goal: integrate, harden, and deploy an open, portable, robust performance tools environment.

Summary

The DDT debugger is available for parallel applications
DDT can also track memory usage
IPM is currently available for simple profiling
We will be installing additional performance analysis tools
A summary of scripts follows...

.tcshrc additions summary

### mpi settings ###
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich-1.1
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich2-1.2
setenv MPI_VERSION /lustre/home/apps/mpi/db/openmpi1.3.1
setenv MPI_COMPILER intel
#setenv MPI_COMPILER pg
if ( $?MPI_COMPILER && $?MPI_VERSION ) then
  setenv MPI_BASE $MPI_VERSION/$MPI_COMPILER
  setenv LD_LIBRARY_PATH $MPI_BASE/lib:$LD_LIBRARY_PATH
  setenv LD_LIBRARY_PATH $MPI_BASE/lib/shared:$LD_LIBRARY_PATH
  setenv MANPATH $MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
  set path = ( $MPI_BASE/bin $path )
endif

### ddt settings ###
set path = ( /lustre/home/apps/ddt2.4.1/bin $path )
setenv DMALLOCPATH /lustre/home/apps/ddt2.4.1
setenv DMALLOC
setenv LD_LIBRARY_PATH $DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

### ipm settings ###
set path = ( $path /lustre/home/apps/pl/bin )
set path = ( $path /lustre/home/apps/ipm/bin )
setenv IPM_KEYFILE /lustre/home/apps/ipm/ipm_key

.bashrc additions summary

### mpi settings ###
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich-1.1
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich2-1.2
export MPI_VERSION=/lustre/home/apps/mpi/db/openmpi1.3.1
export MPI_COMPILER=intel
#export MPI_COMPILER=pg
if [ -n "$MPI_COMPILER" ]; then
  if [ -n "$MPI_VERSION" ]; then
    export MPI_BASE=$MPI_VERSION/$MPI_COMPILER
    export LD_LIBRARY_PATH=$MPI_BASE/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=$MPI_BASE/lib/shared:$LD_LIBRARY_PATH
    export MANPATH=$MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
    export PATH=$MPI_BASE/bin:$PATH
  fi
fi

### ddt settings ###
export PATH=/lustre/home/apps/ddt2.4.1/bin:$PATH
export DMALLOCPATH=/lustre/home/apps/ddt2.4.1
export DMALLOC=""
export LD_LIBRARY_PATH=$DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

### ipm settings ###
export PATH=$PATH:/lustre/home/apps/pl/bin
export PATH=$PATH:/lustre/home/apps/ipm/bin
export IPM_KEYFILE=/lustre/home/apps/ipm/ipm_key

Compiling for IPM

mpif90 -g stf_03.f90 -L$MPI_BASE/ipm/lib -lipm -o stf_03.ipm

where $MPI_BASE = /lustre/home/apps/mpi/db/version

VERSION              Works?
mvapich-1.1/pg       yes
mvapich-1.1/intel    Stay tuned
mvapich2-1.2/pg      yes
mvapich2-1.2/intel   yes
openmpi/*            No - known problem

Debug Compile Line

mpicc -g \
  -L/lustre/home/apps/gdb-6.8/lib64 \
  -liberty \
  stc_06.c \
  -o stc_06.g

Debug Compile Line

mpicc -g -L/lustre/home/apps/gdb-6.8/lib64 -liberty \
  stc_06.c \
  /lustre/home/apps/ddt2.4.1/lib/64/libdmalloc.a \
  -o stc_06.g

Here we link to the debug memory library. This is required if you want to track memory usage in ddt. Note that the library must be last in the list.

OpenMPI Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
DDTPATH_TAG/bin/ddt-client DDT_DEBUGGER_ARGUMENTS_TAG mpiexec -np \
  NUM_PROCS_TAG EXTRA_MPI_ARGUMENTS_TAG PROGRAM_TAG \
  PROGRAM_ARGUMENTS_TAG

MVAPICH2 Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#-----------------------------------------------------
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
mpiexec -n NUM_PROCS_TAG \
  DDTPATH_TAG/bin/ddt-debugger \
  DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

MVAPICH-1.1 Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:15:00
#PBS -N testio
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
mpirun_rsh -hostfile $PBS_NODEFILE -n \
  NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger \
  DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG