MPT and Cluster Software Update for IRIX(TM), UNICOS/mk(TM), and UNICOS(TM) Systems

Karl Feind
Silicon Graphics/Cray Research
655F Lone Oak Drive
Eagan, MN 55121
kaf@cray.com
http://reality.sgi.com/kaf_craypark

ABSTRACT: SGI/Cray Message Passing Toolkit (MPT) and cluster software continues to evolve, adding both high-availability and scalability functions. This paper briefly summarizes recent and planned changes in the MPT, IRIS FailSafe(TM), and IRISconsole(TM) products. The majority of the talk centers on MPI and SHMEM in MPT 1.2.1. In addition to a review of new features, the performance of MPI on Origin and of SHMEM on Origin and CRAY T3E(TM) is surveyed.

KEYWORDS: Message Passing Toolkit, IRIS FailSafe, IRISconsole

Introduction

The Message Passing Toolkit (MPT) and Cluster Products Group supports a variety of software products that provide parallel programming and application support for highly parallel and clustered Silicon Graphics and Cray computer systems.

The MPT product comprises the MPI, SHMEM, and PVM message passing software, used by highly parallel applications for application launch and inter-process communication.

IRIS FailSafe provides high-availability (HA) support for applications running on Silicon Graphics servers through automatic fail-over of applications from one server node to another. In combination with a RAID or mirrored disk configuration, an IRIS FailSafe cluster provides resilience from any single point of failure and acts as insurance against unplanned outages.

IRISconsole is the third major product supported by our group. The IRISconsole multi-server management system manages multi-server clusters from a single workstation with an easy-to-use interface. Servers are managed and monitored, and their activity logging is supported. IRISconsole performs intelligent actions based upon this information and also allows remote console access.

IRIS FailSafe Software Releases

The most recent IRIS FailSafe release was version 1.2, released in 1997. This release provided fail-over capability for paired server nodes.

The next planned release is IRIS FailSafe 2.0. A June 1998 beta release is planned, with general product release planned for later in the year. IRIS FailSafe 2.0 will support more generalized fail-over configurations, with the limit of two nodes increasing to eight nodes. This provides more flexibility for highly available applications, because any node in a group of up to eight machines can serve as the fail-over system for one or several of the other nodes and applications in the group.

For more information about IRIS FailSafe, see http://www.sgi.com/products/software/failsafe.html.

IRISconsole Software Releases

The current IRISconsole release is version 1.2, which was primarily a maintenance release without many new features. The upcoming 1.3 release will include the following features:

- Support for EtherLite 8 and 16 multiplexers
- Enhanced control over logging and alarms
- Support for partitioned arrays
- Year 2000 compliance

For more information about IRISconsole, see http://www.sgi.com/products/remanufactured/challenge/ti_irisconsole.html.

Message Passing Software Releases

We support the MPI, PVM, and SHMEM message passing models. All are released in the MPT software release, with the exception of SHMEM on CRAY T3E systems, which is delivered as part of the CrayLibs and Programming Environment software releases.

Software Releases Containing Message Passing Software

    Message Passing Model   Hardware Platform   Release Package
    MPI                     all                 MPT
    PVM                     all                 MPT
    SHMEM                   T3E                 Programming Environment
    SHMEM                   MIPS and PVP        MPT

Message passing software is supported on all Silicon Graphics and Cray computer systems. For more information about message passing software, see http://www.sgi.com/products/software/mpt.html.

Message Passing Features Released in 1997

The most significant message passing release in 1997 was the June MPT 1.1 release. This was the first time the MPT product was released on IRIX systems. The MPI and PVM message passing models had previously been released independently; bringing them together with SHMEM in the MPT release package on IRIX systems provided more product consistency across IRIX, CRAY T3E, and CRAY PVP systems.

MPT 1.1

Major features released with MPT 1.1 included:

- IRIX MPI support for Origin 2000 NUMA capabilities. IRIX release 6.4 for Origin 2000 contained NUMA-control capabilities which permit message passing applications to request that processes be distributed across the system. MPI was enhanced to use these capabilities.
- SHMEM introduced on IRIX systems. The first IRIX version of SHMEM message passing was included in MPT 1.1. This version contained an initial portion of the SHMEM API. (A brief sketch of the SHMEM programming model appears at the end of this section.)
- XMPI released on IRIX systems. XMPI is an MPI program debugger. This tool was originally developed by the LAM project at Ohio State University.

CrayLibs Releases

The T3E version of the SHMEM library was enhanced in late 1997 in the CrayLibs 3.0.1, 3.0.1.2, and 3.0.2 revision and update releases.

- SHMEM support for CRAY T3E-900 coherent streams. Program start-up code in libsma was enhanced to activate data streams in SHMEM programs when hardware enforced coherency of streams with SHMEM communication.
- CRAY T3E-1200 get bandwidth increased. The CRAY T3E-1200 system had enhanced router chips which provided higher peak bandwidth in SHMEM get operations. The SHMEM get algorithm was enhanced to obtain the peak bandwidth.
- SHMEM_GROUP_CREATE_STRIDED added. This function provides a method to declare a group of SHMEM processes (PEs) which will access a global file.
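As a rough illustration of the SHMEM data-passing model that MPT 1.1 brought to IRIX, the short sketch below writes a value into a symmetric variable on a neighboring PE and then synchronizes with a barrier. The example is not from the paper; the header and routine names (mpp/shmem.h, start_pes, _my_pe, _num_pes, shmem_long_put, shmem_barrier_all) follow the SGI/Cray SHMEM interfaces of that era and may vary between platforms and releases.

    /* Minimal SHMEM sketch (illustrative only): each PE writes its PE
     * number into a symmetric variable on the next PE, then all PEs
     * synchronize so the data can be read safely. */
    #include <stdio.h>
    #include <mpp/shmem.h>

    long value = -1;    /* symmetric: same address on every PE */

    int main(void)
    {
        int  me, npes;
        long mine;

        start_pes(0);           /* initialize the SHMEM library */
        me   = _my_pe();        /* this PE's number */
        npes = _num_pes();      /* total number of PEs */
        mine = (long)me;

        /* One-sided put: write 'mine' into 'value' on the next PE. */
        shmem_long_put(&value, &mine, 1, (me + 1) % npes);

        /* The barrier guarantees every put has completed before any
         * PE reads its copy of 'value'. */
        shmem_barrier_all();

        printf("PE %d received %ld\n", me, value);
        return 0;
    }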

Message Passing Features Released in 1998

There were two all-platform MPT software releases in early 1998, and a third release is planned for June 1998. Releases 1.2, 1.2.0.2, and 1.2.1 provide features chiefly for IRIX platforms, with a few features that enhance T3E and PVP message passing.

MPT 1.2

The following features were added in January 1998 in the MPT 1.2 release.

- IRIX and PVP MPI common-sourced. The MPT 1.2 (MPI 3.1) IRIX version of the MPI library was ported to CRAY PVP systems and merged with the prior PVP MPI implementation. This common sourcing improved maintainability and brought some new features to the PVP platforms while preserving shared memory message passing optimizations on PVP systems.
- Enhanced MPI support for multi-host applications. The shared memory communication method was enabled for same-host MPI processes when the application as a whole is running on multiple hosts. Previously, the shared memory communication mode was available on PVP systems only when the application ran entirely on one host.
- MPI_ENVIRONMENT environment variable. mpirun and array services set this environment variable for the benefit of .cshrc or .profile start-up files.
- IRIX MPI support for 64 CPUs per host.
- Stdin/stdout processing in MPI jobs. On PVP and IRIX systems, mpirun offers the -p option to add optional prefixes to output lines written by MPI processes.
- IRIX MPI support for third-party products. MPI was modified to allow these third-party products to be developed for use with MPI: LSF from Platform Computing, Dolphin TotalView from Dolphin Interconnect Solutions, Inc., and ROMIO from Argonne National Laboratory.
- MPI and PVM support for CRAY T3E-900 coherent streams. When coherent streams are detected, MPI and PVM enable a faster message passing algorithm.
- More SHMEM interfaces implemented on IRIX systems. Additions included shmalloc, shfree, and a set of SHMEM collective communication routines.
- SHMEM support for user-control environment variables on IRIX systems. Support was added for the following environment variables: SMA_DSM_OFF, SMA_DSM_VERBOSE, SMA_DSM_PPM, PAGESIZE_DATA, SMA_SYMMETRIC_SIZE, and SMA_VERSION.
- IRIX PVM shared arena enhancements. Shared arena support was stabilized, and the PVM_RSH environment variable was added.

MPT 1.2.0.2

The following features were added in March 1998 in the MPT 1.2.0.2 release.

- F90 SHMEM interface checking made available on IRIX systems. The "-auto_use shmem_interface" option with version 7.2.1 or higher of the f90 command activates compile-time argument checking of SHMEM function calls.
- shmem_barrier_all performance improved on Origin 2000 systems. A more effective fetch-op algorithm is used.

MPT 1.2.1

The following features are added in the MPT 1.2.1 release.

- Checkpoint/restart capability for IRIX MPI. MPI checkpoint/restart is available for MPI applications running within a single host and consisting of a single executable.
- MPI Miser support. Miser is a queuing tool which can dedicate a set of CPUs on an Origin 2000 system to a particular message passing job.
- Support for MPI on 16 x 128 Origin 2000 clusters.
- T3E MPI support for ROMIO. MPI_Type_get_contents, MPI_Type_get_envelope, and MPI-2 error codes are added.
- SHMEM iput/iget functions added on PVP systems.
- IRIX SHMEM implementation expanded. shmem_set_lock and shmem_clear_lock are added.
- MPI and SHMEM interoperability on IRIX. MPI programs can now use SHMEM message passing to communicate with MPI processes on the same host. SHMEM PE numbers are equal to MPI process ranks in MPI_COMM_WORLD. (A brief sketch follows this list.)
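As a minimal sketch of what this interoperability permits (not taken from the paper), the program below obtains its process identity with standard MPI calls and then performs a SHMEM put between the same processes, relying on the statement above that SHMEM PE numbers equal MPI ranks in MPI_COMM_WORLD. It assumes all processes run on a single IRIX host and that MPI start-up also prepares the SHMEM library; headers and initialization details are assumptions and may differ by release.

    /* Illustrative mixed MPI/SHMEM program (assumptions noted above).
     * MPI provides process identity; SHMEM provides a one-sided put
     * between the same processes, since SHMEM PE numbers equal MPI
     * ranks in MPI_COMM_WORLD on IRIX with MPT 1.2.1. */
    #include <stdio.h>
    #include <mpi.h>
    #include <mpp/shmem.h>

    long token = -1;    /* symmetric variable, writable from remote PEs */

    int main(int argc, char **argv)
    {
        int  rank, size;
        long mine;

        /* MPI start-up; assumed here to make SHMEM usable as well.
         * Some releases may require an explicit SHMEM start-up call. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* The MPI rank doubles as the target PE number of a SHMEM put. */
        mine = (long)rank;
        shmem_long_put(&token, &mine, 1, (rank + 1) % size);
        shmem_barrier_all();

        printf("MPI rank %d received token %ld via SHMEM\n", rank, token);

        MPI_Finalize();
        return 0;
    }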

Message Passing Library Performance

One of the requirements of the message passing libraries on Silicon Graphics and Cray systems is to provide inter-process communication with the lowest possible latency and the highest possible bandwidth. This fundamental requirement motivates the continued addition of performance enhancements to the message passing libraries.

The enhanced bandwidth for point-to-point communication on CRAY T3E-1200 systems was achieved by enhancing the SHMEM get algorithm. The following graph shows the effective bandwidth obtained over varying transfer sizes on the CRAY T3E and the CRAY T3E-1200.

Figure 1: SHMEM_GET effective bandwidth

A common practice in message passing programs that use SHMEM is to use barrier synchronization to separate computational and communication phases. The barrier synchronization must be as fast as possible for these programs to get the best performance (a brief sketch of this pattern follows Figure 2). The following graph shows how Origin SHMEM barrier synchronization time has been continually improved in the MPT releases over the past year.

Figure 2: shmem_barrier_all synchronization time
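The sketch below (again not from the paper, using the same assumed SHMEM interfaces as the earlier example) shows the barrier-separated structure described above: each iteration computes on local data, pushes results to a neighboring PE with a one-sided put, and calls shmem_barrier_all so that no PE reads the exchanged data until every put has completed.

    /* Sketch of the compute/communicate pattern described above; the
     * barrier separates the phase that writes remote data from the
     * phase that reads it.  Illustrative only. */
    #include <mpp/shmem.h>

    #define N 1024

    double local[N];    /* private work array (source of the puts) */
    double halo[N];     /* symmetric buffer written by the previous PE */

    int main(void)
    {
        int me, npes, i, iter;

        start_pes(0);
        me   = _my_pe();
        npes = _num_pes();

        for (iter = 0; iter < 10; iter++) {
            /* Computation phase: update local data. */
            for (i = 0; i < N; i++)
                local[i] = local[i] * 0.5 + 1.0;

            /* Communication phase: push results to the next PE's buffer. */
            shmem_double_put(halo, local, N, (me + 1) % npes);

            /* Barrier: no PE reads 'halo' until every put has completed. */
            shmem_barrier_all();

            /* The next computation phase may now safely use 'halo'. */
            for (i = 0; i < N; i++)
                local[i] += halo[i];
        }
        return 0;
    }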

The following graphic compares bandwidths and latencies for MPI and SHMEM message passing on Origin 2000 systems.

Figure 3: Message Passing Bandwidth on Origin 2000 systems

MPT Roadmap

The MPT and cluster group will be enhancing message passing software in the coming year to provide many customer-requested features. The list below outlines the planned features for the coming years. These are not commitments; they are target plans that might change.

4Q98 - MPT 1.3 (Nov 98 MR): MPI scaling, MPI latency reduction, MPI thread-safe phase 1, Dolphin TotalView message queue support, MPI statistics, T3E ROMIO support.

1Q99 - MPT 1.3.1 (Mar 99 MR): T3E MPI-2 one-sided, MPI cluster CPR phase 1, native MPI-2 I/O, MPI error handling, PMPIO support, MPI collective performance, F90 interface blocks, MPI multi-board messages.

3Q99 - MPT 1.4 (Sep 99 MR): MPI performance and scalability, MPI cluster CPR phase 2, MPI-2 bindings for MPI, Cluster SHMEM (GSN), IRIX MPI-2 one-sided, MPI over GSN, MPI for NT.

1Q00 - MPT 1.5 (Jan 2000 MR): MPI performance and scalability, MPI-2 dynamic, T3E MPI-2 bindings, additional ST/GSN support.

SGI Roadmap Support

Conclusion

In the past year, MPT and cluster software has been enhanced to better support all Silicon Graphics and Cray hardware platforms. The most significant recent effort has gone into the Origin 2000 system, to support increasing CPU counts and clustered Origin 2000 configurations. Future enhancements to message passing software will seek to balance the needs of large clustered and non-clustered Silicon Graphics and Cray systems.

Author Biography

Karl Feind is a Core Design Engineer in the MPT and Cluster Products group, where his primary responsibilities are support of Distributed Shared Memory (SHMEM) data passing and the Message Passing Interface (MPI). In the past, Karl's responsibilities have included the Cray Fortran I/O and Flexible File I/O (FFIO) libraries.

Home page: http://reality.sgi.com/kaf_craypark
E-mail address: kaf@cray.com