MPI / ClusterTools Update and Plans
Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI
2nd HLRS Parallel Tools Workshop, July 7, 2008
HPC Technical Training Seminar, October 26, 2007
Len Wisniewski
Engineering Manager, ClusterTools team
Software Developer Tools and Services
Sun Microsystems
Overview
  - Sun HPC ClusterTools history
  - Open MPI
  - Sun support for Open MPI
  - Solaris
  - Sun Studio
  - Sun Grid Engine
  - MPI profiling
  - MPI Testing Tool (MTT)
  - Future releases
Sun HPC ClusterTools history
  - 1996: Sun acquired the GlobalWorks team from Thinking Machines Corporation
  - 1997-2006: Sun released six versions of Sun HPC ClusterTools based on GlobalWorks proprietary source
      - MPI libraries (including MPI I/O)
      - Cluster run-time environment (CRE)
      - Sun Scalable Scientific Library (S3L)
      - Sun Parallel File System
      - High Performance Fortran compiler
      - Prism parallel debugger
      - Installation software
  - ClusterTools 6 (in 2006) was the last proprietary release
      - Included just MPI, CRE, and installation software
      - Other components EOL'd or integrated into other products
Sun joins Open MPI
  - Currently 15 Members, 9 Contributors, 2 Partners
  - Plus individual contributors
Open MPI goals
  - Create a free, open source, peer-reviewed, production-quality complete MPI-2 implementation.
  - Provide extremely high, competitive performance (latency, bandwidth, ...pick your favorite metric).
  - Directly involve the HPC community with external development and feedback (vendors, 3rd party researchers, users, etc.).
  - Provide a stable platform for 3rd party research and commercial development.
  - Help prevent the "forking problem" common to other MPI projects.
  - Support a wide variety of HPC platforms and environments.
  - www.open-mpi.org
Open MPI architecture (layers, top to bottom)
  - MPI API Layer
  - PML = Pt2Pt Messaging Layer
  - BML = BTL Management Layer
  - BTL = Byte Transfer Layer
      - TCP BTL
      - Shared Memory BTL
      - Open IB (User Verbs) BTL
      - udapl BTL
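The BTL components in the stack above are chosen at run time through Open MPI's MCA parameters. The following is a minimal sketch, assuming the Open MPI 1.2/1.3-era component names (`self`, `sm`, `tcp`, `udapl`, `openib`). The helper `btl_cmd` and the program name `./mpijob` are hypothetical, and the function only prints the command line instead of running it.

```shell
#!/bin/sh
# btl_cmd is a hypothetical helper: it prints (does not run) an mpirun
# command line that restricts Open MPI to the listed BTL components.
# Usage: btl_cmd <nprocs> <program> <btl> [<btl> ...]
btl_cmd() {
    np=$1
    prog=$2
    shift 2
    # "self" (loopback) is always listed so a process can send to itself.
    list="self"
    for b in "$@"; do
        list="$list,$b"
    done
    echo "mpirun --mca btl $list -np $np $prog"
}

# Shared memory within a node, TCP between nodes.
btl_cmd 4 ./mpijob sm tcp
```

On a Solaris InfiniBand cluster the same helper could be called with `sm udapl` in place of `sm tcp`, if the udapl BTL is available in the installed build.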
Sun HPC ClusterTools 7+
  - Starting with Sun HPC ClusterTools 7 (CT 7), ClusterTools becomes a binary distribution of Open MPI
      - CT 7 based on Open MPI 1.2
      - CT 7.1 based on Open MPI 1.2.4
      - CT 8 to be based on Open MPI 1.3
  - www.sun.com/clustertools
      - CT 7.1 is available
  - Second early access version of CT 8 (CT8 EA2) available now!
      - First release to support both Solaris and Linux
      - http://www.sun.com/software/products/clustertools/early_access.xml
Open MPI / CT Release Timeline
Sun support for Open MPI
  - Resource Managers: Sun Grid Engine, PBS Pro / Torque / Open PBS, rsh / ssh, SLURM, LoadLeveler, Xgrid, Yod
  - Operating Systems: Solaris, Linux, Mac OS X, Windows
  - Compilers: Sun Studio, gcc (on Linux only), PGI, Intel, Pathscale
  - Interconnects: TCP / Ethernet, Shared memory, InfiniBand, Myrinet (GM and MX), Portals, Loopback
Installation / Sun packages
  - Installation tools
      - ctinstall: install on multiple nodes
      - ctact: switch default links between CT versions
  - Four Sun packages
      - SUNWompi: Open MPI files
      - SUNWompiat: CT installer utilities
      - SUNWompimn: Open MPI man pages
      - SUNWomsc: miscellaneous support files
Documentation
  - Contributed MPI man pages to Open MPI
  - Ship four user documents
      - Migration guide (from CT6 to CT7)
      - User's Guide
      - Install Guide
      - Release Notes
  - Future user documents in progress
      - MPI programming guide
      - Performance guide
  - Plan to collaborate and/or contribute versions of above docs in adaptable format to community
  - http://docs.sun.com/app/docs/coll/hpc-clustertools7.1
Solaris support
  - Unique Solaris 10 features
      - Zones
      - DTrace
      - udapl support (for Solaris IB stack)
  - OpenSolaris 05.2008 packages
  - Support for various tools on Solaris
      - Sun Studio
      - Sun Grid Engine
      - 3rd party tools
          - Parallel debuggers: TotalView (by TotalView Technologies), DDT (by Allinea)
          - Resource managers: PBS Pro (by Altair)
UDAPL Byte Transfer Layer (BTL)
  - Solaris 10 has its own IB stack
      - OFED IB Verbs not supported on Solaris
  - udapl BTL layered on udapl library
      - udapl is a standard library for accessing underlying interconnects
  - Solaris to support OFED IB Verbs in future
      - Sun will join the common effort to develop openib BTL
Sun Studio support
  - Support user codes compiled in Sun Studio 10, 11, and 12
      - Fortran, C, and C++
  - Provide compiler wrappers
      - Collaboration with community
      - Enable easy linking of user apps with appropriate libraries when using Studio
      - Configurable for other compilers
  - Address unique attributes of compiler, OS, or chip type
      - SPARC alignment
      - C++ and Fortran compilers
  - Sun Studio supports both Solaris and Linux!
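To see what a compiler wrapper adds on top of the underlying compiler, Open MPI wrappers accept a `--showme` option that prints the full command line without running it. A minimal sketch follows, assuming an Open MPI wrapper such as `mpicc` may or may not be on the PATH. The helper name `show_wrapper` is hypothetical.

```shell
#!/bin/sh
# show_wrapper is a hypothetical helper: if the named Open MPI wrapper
# compiler is on PATH, print the underlying compile/link command it
# would run; otherwise report that it was not found.
show_wrapper() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --showme
    else
        echo "$1: not found on PATH"
    fi
}

show_wrapper mpicc
```

The printed command line depends on how the installed build was configured, so the output differs between a Sun Studio build and a gcc build.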
Sun Grid Engine support
  - Sun Grid Engine (SGE)
      - Free and open-source resource manager
      - Allows integration with MPI implementations
  - Created SGE plug-in for Open MPI
  - Enables invocation of MPI job within SGE
      - Via batch script invoked via qsub command
      - Via interactive shell
            # Submit an SGE job and launch the Open MPI job with mpirun in one line
            shell$ qrsh -V -pe orte 4 mpirun -np 4 mpijob
  - SGE enables exclusive use of a set of nodes
  - SGE supports job accounting
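The batch-script route mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the script shipped with ClusterTools: the parallel environment name `orte` and the `-V` option are taken from the qrsh example, `-cwd` and `$NSLOTS` are standard SGE features, and the helper `write_sge_job` and the file name `mpijob.sh` are hypothetical.

```shell
#!/bin/sh
# write_sge_job is a hypothetical helper that writes an SGE batch script.
# Usage: write_sge_job <file> <slots> <program>
write_sge_job() {
    # "#$" lines are SGE directives; \$ keeps the dollar sign literal
    # in the generated file, while $2 and $3 are filled in now.
    cat > "$1" <<EOF
#!/bin/sh
#\$ -V
#\$ -cwd
#\$ -pe orte $2
mpirun -np \$NSLOTS $3
EOF
}

write_sge_job mpijob.sh 4 ./mpijob
cat mpijob.sh
# Submit with: qsub mpijob.sh
```

Using `$NSLOTS` in the generated script lets SGE supply the process count it actually granted, so the slot request appears in only one place.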
MPI Profiling
  - CT8 EA2 has support for the following profiling methods:
      - DTrace
      - Sun Studio Analyzer
      - MPI PERUSE
      - VampirTrace
DTrace providers
  - DTrace
      - Comprehensive dynamic tracing facility
      - Examine behavior of user and kernel codes
  - CT7 includes DTrace examples for MPI programs
  - CT8 EA2 includes DTrace providers for MPI
      - Piggybacks on the MPI PERUSE framework in Open MPI
      - Unobtrusive tracing programs
      - netstat-like example
DTrace providers: netstat-like example
  - Run an MPI program using the CT MPI libraries
  - Invoke the providers via a dtrace script as follows:
        /* Collect Removal of Send Requests */
        mpiperuse$target:::peruse_comm_req_notify
        /args[3]->mcs_op=="send"/
        {
            --sends_act;
        }
  - Run the dtrace script while the MPI program runs:
        % dtrace -q -p 13371 -s /opt/sunwhpc/hpc8.0/examples/dtrace/mpistat.d
  - The count decremented above is shown on stdout by the dtrace script as part of a netstat-like format
Integration with Sun Studio Analyzer
  - Sun Studio Performance (and Thread) Analyzer
      - Takes measurements
      - Computes metrics from them
      - Attributes metrics to a dynamic callgraph
      - Does not require special compilation
      - GUI support
  - MPI support
      - Show MPI Timeline
      - Define MPI Charts
      - MPI State profiling
      - Works on mixed MPI and OpenMP programs
  - Thread Analyzer helps detect and fix races
ClusterTools and 3rd party products
  - Parallel debugger support
      - TotalView (TotalView Technologies)
      - DDT (Allinea)
      - Including message queue viewing
  - Resource manager support
      - PBS Professional (Altair)
  - Interconnect support
      - InfiniBand (multiple vendors)
      - Myrinet MX (Myricom)
MPI Testing Tool (MTT)
  - MTT principles
      - Be freely available to minimize deployment cost
      - Incorporate existing tests
      - Simultaneous distributed testing across sites
      - Support specialized on-demand and e-mail reporting
      - Support parallel tests and various resource managers
  - https://svn.open-mpi.org/trac/mtt
MPI Testing Tool
  - "An Extensible Framework for Distributed Testing of MPI Implementations." Joshua Hursey (Indiana U), Ethan Mallove (Sun), Jeffrey M. Squyres (Cisco) and Andrew Lumsdaine (Indiana U). 14th PVM / MPI Conference, Paris, France, October 2007.
MTT Results Summary example http://www.open-
Upcoming release: Open MPI 1.3
  - Improved job startup scalability
  - Improved MPI_THREAD_MULTIPLE support
  - Checkpoint / restart support
  - Support for Platform LSF
  - OpenIB BTL improvements
      - iWARP support
      - XRC / ConnectX support
  - Processor affinity improvements
  - Message logging
  - VampirTrace support
  - Many more new features and improvements
  - https://svn.open-mpi.org/trac/ompi/wiki
      - See Release document for 1.3 series
Upcoming release: Sun HPC ClusterTools 8
  - Based on Open MPI 1.3
  - Features
      - Linux support
      - Mellanox ConnectX support
      - Increased scalability
          - Support for 1024 nodes / 4096 processes
          - And more... TACC-sized clusters
      - Profiling support
          - DTrace providers
          - Sun Studio Analyzer tight integration
          - MPI PERUSE
          - VampirTrace
      - InfiniBand multi-rail support on Solaris
Priorities for the future
  - Performance (of course!)
      - Job startup scalability
          - TACC-sized clusters
      - Collectives
          - Leverage best algorithms
              - From the community
              - From CT6
      - Latency / bandwidth on various interconnects
  - Ease of use
      - Open Run-Time Environment utilities to monitor jobs
      - Automation of parameter selection
      - Analysis of regressions across community (MTT)
  - Fault tolerance
Links for more information
  - Sun HPC ClusterTools web page, http://www.sun.com/clustertools
  - Open MPI papers, http://www.open-mpi.org/papers/
  - Open MPI web page, http://www.open-mpi.org
  - Sun Studio web page, http://developers.sun.com/sunstudio/
  - Sun Grid Engine web page, http://www.sun.com/software/gridware/
  - Grid Engine web site, http://gridengine.sunsource.net/
  - DTrace web page, http://www.sun.com/bigadmin/content/dtrace/
  - MTT web page, https://svn.open-mpi.org/trac/mtt
  - OpenSolaris web site, http://www.opensolaris.org
Len Wisniewski
leonard.wisniewski@sun.com