Considering Middleware Options
Considering Middleware Options in High-Performance Computing Clusters

Middleware is a critical component for the development and porting of parallel-processing applications in distributed high-performance computing (HPC) cluster infrastructures. This article describes the evolution of the Message Passing Interface (MPI) standard specification as well as the open source and commercial MPI implementations that can be used to enhance Dell HPC cluster environments.

BY RINKU GUPTA, MONICA KASHYAP, YUNG-CHIN FANG, AND SAEED IQBAL, PH.D.

High-performance computing (HPC) clusters, a popular platform for hosting distributed parallel-processing applications, comprise multiple standards-based servers connected to each other via network interconnects. A typical HPC cluster has a layered architecture, beginning at the hardware level and concluding with the application level, as shown in Figure 1. Servers reside at the lowest level of the architecture, and each server contributes computational power to the cluster. Servers are connected to each other by a network infrastructure, which may be based on standard Ethernet technologies (such as Fast Ethernet or Gigabit Ethernet1) or proprietary high-speed technologies (such as Myricom Myrinet or InfiniBand). On top of the hardware level are the operating system (OS) and the communication protocol libraries required by the specific interconnect (for example, TCP/IP for Ethernet or GM for Myrinet). This infrastructure helps provide the computational power of a supercomputer for parallel-processing applications.

To enhance this distributed infrastructure and ease the development and porting of parallel-processing applications, a layer of middleware is required. With the growth of parallel-processing application development, two programming models have evolved to provide middleware capabilities: the shared-memory programming model and the message-passing programming model.
The shared-memory programming model is based on the concept of a shared address space, in which data exchange is achieved by writing to the shared space. The message-passing programming model is based on the concept of a distributed address space, in which data exchange is achieved through explicit message passing. The Message Passing Interface (MPI) is the de facto message-passing standard today. This article focuses on the growth of the MPI standard as well as the open source and commercial implementations of MPI available for use on Dell HPC clusters.

Figure 1. HPC cluster layered architecture (layers, from top to bottom: Application, Middleware, Operating system, Communication protocol, Hardware)

Evolution of middleware libraries
As massively parallel processing (MPP) systems and clusters have gained popularity, organizations have developed middleware libraries for use with these powerful systems. Parallel Virtual Machine (PVM)2 was one of the first full-fledged middleware libraries. PVM is designed to allow a network of heterogeneous machines to appear logically to the user as a single, large parallel machine. PVM was initially developed in 1989 as a joint research effort between the University of Tennessee, Oak Ridge National Laboratory, and Emory University. In addition to providing primitives for sending and receiving messages, PVM implemented resource management, signal handling, and fault tolerance to help build a user environment for parallel processing. Because PVM was one of the first parallel-processing systems to provide portability across heterogeneous networks, the library was widely adopted by developers of parallel-processing applications. Both the popularity and the shortcomings of PVM provided great impetus for the development of the MPI specification.

Emergence of the MPI specification
The MPI standard3 specification was developed in 1993 by a diverse group of computer vendors, computer scientists, and software programmers who formed the MPI Forum. During the early 1990s, various vendors had been developing their own middleware. The MPI Forum set out to develop a practical, portable, efficient, and flexible standard for communication among nodes and for running parallel-processing applications on distributed memory architectures. MPI allows data to be moved between the nodes in a cluster by sending and receiving the data as messages, and this exchange of messages also allows the nodes in the cluster to be synchronized.

Note: The MPI specification is not a language. The specification comprises collections of subroutine application programming interfaces (APIs) that can be called by C and FORTRAN programs.

Wide acceptance of MPI has led to multiple implementations of the specification for a variety of distributed memory based clusters. For parallel-processing nodes with specialized networking hardware, native MPI implementations can enhance performance. These various implementations have allowed parallel MPI applications to be ported across a wide range of architectures.

1 This term does not connote an actual operating speed of 1 Gbps. For high-speed transmission, connection to a Gigabit Ethernet server and network infrastructure is required.
2 For more information about PVM, visit
Reprinted from Dell Power Solutions, February 2005. Copyright 2005 Dell Inc. All rights reserved.
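As the Note above states, MPI is a collection of subroutine APIs callable from C and FORTRAN. The following minimal C program is a hedged sketch of that message-passing style: rank 0 sends one integer that rank 1 receives. It assumes an installed MPI implementation (built with a compiler wrapper such as mpicc and launched with at least two processes via mpirun); the tag and payload values are illustrative only.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal sketch of MPI message passing: rank 0 sends one integer to
 * rank 1.  Build with an MPI compiler wrapper (e.g., mpicc) and launch
 * with at least two processes (e.g., mpirun -np 2 ./a.out). */
int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);                 /* enter the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?       */

    if (rank == 0) {
        value = 42;                         /* illustrative payload      */
        /* Blocking send of one int to rank 1, message tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive of one int from rank 0, message tag 0 */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();                         /* leave the MPI environment */
    return 0;
}
```

Because MPI_Send and MPI_Recv here are blocking, the send on rank 0 and the matching receive on rank 1 also act as a simple synchronization point between the two processes.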
MPI implementations can be fine-tuned for a specific architecture and the interconnects on which they run, helping to optimize efficiency and provide high performance.

MPI 1.1 and 1.2 standard specifications
The MPI 1.1 and MPI 1.2 specifications introduced many subroutine APIs, which made applications much easier to write. These APIs included primitives for point-to-point communications and collective operations, and for creating process topologies and process groups. Point-to-point communications comprise communications between two nodes (for example, synchronous or asynchronous sending and receiving of messages). Collective operations comprise global communications between groups of nodes (for example, barriers that synchronize groups of nodes; broadcasts that send messages from one node to many nodes; and reduce, scatter, and gather operations). Figure 2 shows some of the subroutine primitives defined within the MPI 1.2 specification. A vendor can implement a subroutine as long as the primitive provided in the implementation conforms to the specification both syntactically and semantically.

MPI 2.0 standard specification
Released after MPI 1.2 had been widely accepted, MPI 2.0 made major changes to the MPI 1.2 specification. Some of the most significant enhancements offered by MPI 2.0 include the following:

Dynamic process management: Process management allows processes to be dynamically added and deleted. MPI 2.0 supports dynamic process management because many emerging message-passing applications (such as applications that require runtime assessment of the number and type of processes needed) require process control. By contrast, MPI 1.2 based applications are static; that is, no processes can be added to or deleted from an application after the application has been started.

One-sided communication operations: MPI 2.0 provides support for one-sided communication operations such as put and get. The put operation transfers data directly from the sender node's memory to the receiver, or target, node's memory; the get operation transfers data from the target node's memory to the caller node's memory.

Figure 2. Examples of MPI subroutine primitives

    Action                                        MPI command
    Send data (blocking) to a node                MPI_Send
    Receive data (blocking) from a node           MPI_Recv
    Broadcast data from one node to many nodes    MPI_Bcast

3 For more information about the MPI standard, visit www-unix.mcs.anl.gov/mpi.
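A hedged sketch of the one-sided put operation described above, using the MPI 2.0 window calls (MPI_Win_create, MPI_Win_fence, MPI_Put). It assumes an MPI-2 capable implementation and a two-process launch; the buffer name and the value 42 are illustrative only.

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch of MPI 2.0 one-sided communication: rank 0 "puts" an integer
 * directly into a window of memory exposed by rank 1, with no matching
 * receive call on rank 1.  Build with mpicc; run with two processes. */
int main(int argc, char *argv[])
{
    int rank, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes its local `buf` as a one-sided target window. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open an access epoch          */
    if (rank == 0) {
        int value = 42;                 /* illustrative payload          */
        /* Write `value` into rank 1's window at displacement 0. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);              /* complete all pending puts     */

    if (rank == 1)
        printf("rank 1's buffer now holds %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note that rank 1 never calls a receive routine: the fences bracket the epoch in which the origin process moves the data, which is exactly the one-sided contrast with the MPI 1.2 send/receive model.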
Other enhancements relate to extending collective communication operations and defining new nonblocking operations.4

Open source MPI implementations
MPICH,5 which is currently maintained by Argonne National Laboratory and Mississippi State University, is a freely available, portable implementation of MPI. The development of MPICH began in parallel with the development of the MPI specification, enabling the specification to address problems that would be faced by its implementers. Thus, a complete, portable, and efficient MPICH implementation was available when the MPI specification was formally released, allowing developers of parallel-processing applications to experiment with MPI almost immediately.

MPICH functionality
MPICH was designed with the following goals:

Maximum portability and reuse of code: In any implementation, a large amount of code is system independent. MPICH was designed to allow complex communication operations to be specified portably in terms of low-level primitives. The developers' intention was to maximize the amount of code that can be shared without compromising performance.

Fast porting to new architectures: Another design goal was to create a structure whereby MPICH could be ported to a new platform quickly and then gradually tuned for that platform by replacing parts of the shared code with platform-specific code.

Figure 3. MPICH layered architecture (layers, from top to bottom: MPI collective operations; MPI point-to-point communications; ADI; Channel interface; with multiple implementations of the ADI and of the channel interface beneath)

To achieve these goals, the MPICH implementation follows a layered architecture, as shown in Figure 3. At the top level of the hierarchy are primitives for the MPI collective operations. An example of a collective operation is broadcast (MPI_Bcast), wherein one source node can send the same data to multiple nodes within a group of nodes. These collective operations are implemented in MPICH by calling MPI point-to-point primitives such as send (MPI_Send) and receive (MPI_Recv). These point-to-point primitives call various other functions specified at lower levels of the hierarchy to carry out the actual sending and receiving using the communication protocol.

One of the lowest layers in the architecture is the abstract device interface (ADI), a mechanism designed to help achieve the goals of portability and performance. The ADI contains the communication protocol dependent code. All the MPI functions are implemented using the functions and macros defined at the ADI layer; hence, functions defined at levels higher than the ADI layer are portable. Having multiple implementations of the ADI helps provide portability and ease of implementation. Below the ADI layer is an additional low-level layer called the channel interface, which is designed to provide a mechanism for quickly porting MPICH to new environments. The channel interface comprises functions that provide the basic capability of sending data from one process to another.

MPICH thus offers an incremental approach to trading portability for performance. A vendor can start the porting process by creating a channel interface implementation, then expand that implementation to include additional, specialized ADI functionality. Moving upward in the MPICH architecture hierarchy increases the performance benefits of the implementation but decreases the portability of the same code for future implementations. The current releases of MPICH are based on the MPI 1.2 standard. MPICH2,6 which is now under development, is an all-new implementation of MPI that is intended to support research into the implementation of both MPI 1.2 and MPI 2.0.
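To make that layering concrete, the following plain-C sketch (hypothetical helper names, not MPICH source code) computes the kind of binomial-tree schedule with which a library can realize a broadcast purely from point-to-point operations: each non-root rank receives the data from exactly one parent and then forwards it to a set of children.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch (not MPICH source): a binomial-tree broadcast
 * schedule of the kind an MPI library can build from point-to-point
 * sends and receives.  Rank 0 is the root; every other rank receives
 * the data exactly once, then forwards it down its own subtree. */

/* Rank from which `rank` would receive the broadcast (-1 for the root).
 * Clearing the lowest set bit of a rank yields its tree parent. */
int bcast_parent(int rank)
{
    return rank == 0 ? -1 : rank & (rank - 1);
}

/* Fills `peers` with the ranks this rank would send to, in round order,
 * and returns how many there are.  `size` is the number of ranks. */
int bcast_children(int rank, int size, int peers[])
{
    int n = 0;
    /* Children are rank + 2^k for every power of two below the rank's
     * lowest set bit (the root, rank 0, sends for every power of two). */
    int lsb = rank ? (rank & -rank) : size;
    for (int m = 1; m < lsb && rank + m < size; m <<= 1)
        peers[n++] = rank + m;
    return n;
}
```

For example, with eight ranks the root sends to ranks 1, 2, and 4, and rank 4 forwards to ranks 5 and 6; because every rank with the data keeps sending in each round, the whole broadcast completes in about log2(size) point-to-point steps, which is why a layered implementation loses little performance by building collectives on top of send and receive.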
MPICH variations
MPICH has been widely adapted by various vendors and has served as the basis for MPI-related research projects at many universities and research institutions. The following sections discuss popular MPICH variations adapted for commonly used high-speed interconnects.

4 A detailed discussion of the MPI specifications is beyond the scope of this article. For more information, refer to the MPI specifications at www-unix.mcs.anl.gov/mpi.
5 For MPICH papers and implementation details, visit www-unix.mcs.anl.gov/mpi/mpich.
6 For more information about MPICH2, visit www-unix.mcs.anl.gov/mpi/mpich2.
MPICH-GM (MPI on Myrinet). Myricom Myrinet7 is a high-speed, low-latency, high-bandwidth interconnect used in HPC clusters. The GM8 protocol is a low-level message-passing communication protocol designed for Myrinet networks. Myrinet is theoretically capable of providing unidirectional throughput of up to 2 Gbps with low latency. Low latency is critical for communication-intensive applications because less time is spent on communication overhead, leaving more time for computation. This high performance is achievable on Myrinet networks because GM is a user-level protocol, which bypasses the OS while sending and receiving messages once the initial connection has been established. MPICH-GM,9 a port of MPICH on top of GM, is the MPI implementation for the GM protocol. The port is accomplished by creating a new GM device at the ADI and channel interface levels of MPICH. In this way, MPICH-GM offers a portable, efficient implementation of MPI that applications can use to take advantage of the performance offered by the low-level Myrinet hardware and the GM protocol. MPICH-GM works on a variety of operating systems, including Linux, Solaris, FreeBSD, and Mac OS X, and on many architectures, including IA-32 and IA-64; it is fully supported by Myricom.

MVAPICH (MPI on InfiniBand). The InfiniBand architecture is a standard that defines a high-speed network for interprocess communication and storage I/O. Its low-latency, high-bandwidth capabilities and remote direct memory access (RDMA) features accelerate applications running in HPC and enterprise environments. MVAPICH is an open source MPI 1.2 implementation developed by The Ohio State University and is based on the Verbs API (VAPI) implementation by Mellanox Technologies. MVAPICH is likewise a port of MPICH, onto the VAPI layer, carried out by creating a VAPI device at the ADI level of MPICH.
Other open source MPI implementations
In addition to the MPICH implementations, other implementations of the MPI standard exist. Local Area Multicomputer (LAM)10 is an open source implementation of the MPI standard. LAM originated at the Ohio Supercomputer Center and is now maintained by the Open Systems Laboratory at Indiana University. Like other MPI implementations, LAM/MPI provides high performance on many platforms, even on heterogeneous clusters of workstations.

Commercial MPI implementations
Commercial MPI implementations are available for a wide range of hardware and are produced by many vendors. The following sections briefly discuss some of the popular commercial MPI implementations that enterprises can run on Dell HPC clusters.

Verari MPI/Pro. MPI/Pro11 is a proprietary, commercially supported MPI 1.2 implementation developed by Verari Systems. It is one of the most popular commercial implementations and is supported on both Microsoft Windows and Red Hat Linux operating systems. MPI/Pro features include low CPU overhead and thread safety. The implementation supports TCP, symmetric multiprocessing (SMP), and Myrinet and InfiniBand drivers for Windows and Linux.

Verari ChaMPIon/Pro. ChaMPIon/Pro12 is a full MPI 2.0 implementation available for Linux. ChaMPIon/Pro supports Myrinet, InfiniBand, and Quadrics network interconnects as well as TCP/IP protocols. This MPI implementation supports major MPI 2.0 enhancements, including extended collective operations, dynamic process management, and one-sided communication APIs.

Scali MPI Connect. Scali offers an MPI implementation called Scali MPI Connect.13 Scali's integrated architecture enables third-party applications to be compiled once to run on the various leading interconnect technologies.
The implementation is designed to allow binary programs that are linked with Scali MPI Connect to run, without recompilation or relinking, on any of the supported interconnects: Gigabit Ethernet, Myrinet, Dolphin Interconnect scalable coherent interface (SCI), or InfiniBand. Whether the cluster is built using one of these interconnects or a combination thereof, applications and users interact only with Scali MPI Connect.

Middleware: A key component for HPC cluster performance
Middleware implementations, both commercial and open source, are significant components in HPC cluster configurations. The widely accepted MPI standard has enabled a diverse set of implementations that are designed to enhance performance and ease the development and porting of parallel-processing applications in distributed computing infrastructures.

Rinku Gupta is a systems engineer and advisor in the Scalable Systems Group at Dell. Her current research interests are middleware libraries, parallel processing, performance, and interconnect benchmarking. Rinku has a B.E. in Computer Engineering from Mumbai University in India and an M.S. in Computer Information Science from The Ohio State University.

7, 8, 9 For more information about Myrinet, GM, and MPICH-GM, visit
10 For more information about LAM/MPI, visit
11, 12 For more information about Verari Systems MPI/Pro and ChaMPIon/Pro, visit
13 For more information about Scali MPI Connect, visit
Monica Kashyap is a senior systems engineer in the Scalable Systems Group at Dell. Her current interests and responsibilities include in-band and out-of-band cluster management, cluster computing packages, and product development. She has a B.S. in Applied Science and Computer Engineering from the University of North Carolina at Chapel Hill.

Yung-Chin Fang is a senior consultant in the Scalable Systems Group at Dell. He specializes in cyberinfrastructure resource management and high-performance computing. He also participates in open source groups and standards organizations as a Dell representative. Yung-Chin has a B.S. in Computer Science from Tamkang University and an M.S. in Computer Science from Utah State University.

Saeed Iqbal, Ph.D., is a systems engineer and advisor in the Scalable Systems Group at Dell. His current work involves evaluation of resource managers and job schedulers used for commodity clusters. Saeed is also involved in performance analysis and system design of clusters. He has a Ph.D. in Computer Engineering from The University of Texas at Austin, and an M.S. in Computer Engineering and a B.S. in Electrical Engineering from the University of Engineering and Technology in Lahore, Pakistan.
More informationMellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct
Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com
More informationA Tour of the Linux OpenFabrics Stack
A Tour of the OpenFabrics Stack Johann George, QLogic June 2006 1 Overview Beyond Sockets Provides a common interface that allows applications to take advantage of the RDMA (Remote Direct Memory Access),
More informationSolving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
More informationOpenMosix Presented by Dr. Moshe Bar and MAASK [01]
OpenMosix Presented by Dr. Moshe Bar and MAASK [01] openmosix is a kernel extension for single-system image clustering. openmosix [24] is a tool for a Unix-like kernel, such as Linux, consisting of adaptive
More informationRoCE vs. iwarp Competitive Analysis
WHITE PAPER August 21 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...4 Summary...
More informationWindows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability
White Paper Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability The new TCP Chimney Offload Architecture from Microsoft enables offload of the TCP protocol
More informationSupercomputing on Windows. Microsoft (Thailand) Limited
Supercomputing on Windows Microsoft (Thailand) Limited W hat D efines S upercom puting A lso called High Performance Computing (HPC) Technical Computing Cutting edge problems in science, engineering and
More informationNetwork Performance in High Performance Linux Clusters
Network Performance in High Performance Linux Clusters Ben Huang, Michael Bauer, Michael Katchabaw Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 (huang
More informationHigh Speed I/O Server Computing with InfiniBand
High Speed I/O Server Computing with InfiniBand José Luís Gonçalves Dep. Informática, Universidade do Minho 4710-057 Braga, Portugal zeluis@ipb.pt Abstract: High-speed server computing heavily relies on
More informationAchieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
More informationOptimizing the Virtual Data Center
Optimizing the Virtual Center The ideal virtual data center dynamically balances workloads across a computing cluster and redistributes hardware resources among clusters in response to changing needs.
More informationIntroduction to Virtual Machines
Introduction to Virtual Machines Introduction Abstraction and interfaces Virtualization Computer system architecture Process virtual machines System virtual machines 1 Abstraction Mechanism to manage complexity
More informationIntroduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber
Introduction to grid technologies, parallel and cloud computing Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber OUTLINES Grid Computing Parallel programming technologies (MPI- Open MP-Cuda )
More informationA Survey on Availability and Scalability Requirements in Middleware Service Platform
International Journal of Computer Sciences and Engineering Open Access Survey Paper Volume-4, Issue-4 E-ISSN: 2347-2693 A Survey on Availability and Scalability Requirements in Middleware Service Platform
More informationRLX Technologies Server Blades
Jane Wright Product Report 10 July 2003 RLX Technologies Server Blades Summary RLX Technologies has designed its product line to support parallel applications with high-performance compute clusters of
More informationQUADRICS IN LINUX CLUSTERS
QUADRICS IN LINUX CLUSTERS John Taylor Motivation QLC 21/11/00 Quadrics Cluster Products Performance Case Studies Development Activities Super-Cluster Performance Landscape CPLANT ~600 GF? 128 64 32 16
More informationCluster Implementation and Management; Scheduling
Cluster Implementation and Management; Scheduling CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Cluster Implementation and Management; Scheduling Spring 2013 1 /
More information1 Organization of Operating Systems
COMP 730 (242) Class Notes Section 10: Organization of Operating Systems 1 Organization of Operating Systems We have studied in detail the organization of Xinu. Naturally, this organization is far from
More informationAgenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC
HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical
More informationbenchmarking Amazon EC2 for high-performance scientific computing
Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received
More informationIntroduction to High Performance Cluster Computing. Cluster Training for UCL Part 1
Introduction to High Performance Cluster Computing Cluster Training for UCL Part 1 What is HPC HPC = High Performance Computing Includes Supercomputing HPCC = High Performance Cluster Computing Note: these
More informationArchitecting Low Latency Cloud Networks
Architecting Low Latency Cloud Networks Introduction: Application Response Time is Critical in Cloud Environments As data centers transition to next generation virtualized & elastic cloud architectures,
More informationWorkshare Process of Thread Programming and MPI Model on Multicore Architecture
Vol., No. 7, 011 Workshare Process of Thread Programming and MPI Model on Multicore Architecture R. Refianti 1, A.B. Mutiara, D.T Hasta 3 Faculty of Computer Science and Information Technology, Gunadarma
More informationInformatica Ultra Messaging SMX Shared-Memory Transport
White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade
More informationLaPIe: Collective Communications adapted to Grid Environments
LaPIe: Collective Communications adapted to Grid Environments Luiz Angelo Barchet-Estefanel Thesis Supervisor: M Denis TRYSTRAM Co-Supervisor: M Grégory MOUNIE ID-IMAG Laboratory Grenoble - France LaPIe:
More informationAdvancing Applications Performance With InfiniBand
Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and
More informationMaking Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
More informationLS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.
LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability
More informationThe EMSX Platform. A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks. A White Paper.
The EMSX Platform A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks A White Paper November 2002 Abstract: The EMSX Platform is a set of components that together provide
More informationWhere IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine
Where IT perceptions are reality Test Report OCe14000 Performance Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine Document # TEST2014001 v9, October 2014 Copyright 2014 IT Brand
More information10Gb Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Latency-Sensitive Applications
10Gb Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Latency-Sensitive Applications Testing conducted by Solarflare and Arista Networks reveals single-digit
More informationSolid State Storage in Massive Data Environments Erik Eyberg
Solid State Storage in Massive Data Environments Erik Eyberg Senior Analyst Texas Memory Systems, Inc. Agenda Taxonomy Performance Considerations Reliability Considerations Q&A Solid State Storage Taxonomy
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationInfiniBand -- Industry Standard Data Center Fabric is Ready for Prime Time
White Paper InfiniBand -- Industry Standard Data Center Fabric is Ready for Prime Time December 2005 Server and storage clusters benefit today from industry-standard InfiniBand s price, performance, stability,
More informationBig data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
More informationApplications of Passive Message Logging and TCP Stream Reconstruction to Provide Application-Level Fault Tolerance. Sunny Gleason COM S 717
Applications of Passive Message Logging and TCP Stream Reconstruction to Provide Application-Level Fault Tolerance Sunny Gleason COM S 717 December 17, 2001 0.1 Introduction The proliferation of large-scale
More informationSMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation
SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel
More information10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications
10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications Testing conducted by Solarflare Communications and Arista Networks shows that
More informationDistributed RAID Architectures for Cluster I/O Computing. Kai Hwang
Distributed RAID Architectures for Cluster I/O Computing Kai Hwang Internet and Cluster Computing Lab. University of Southern California 1 Presentation Outline : Scalable Cluster I/O The RAID-x Architecture
More informationPerformance Monitoring on an HPVM Cluster
Performance Monitoring on an HPVM Cluster Geetanjali Sampemane geta@csag.ucsd.edu Scott Pakin pakin@cs.uiuc.edu Department of Computer Science University of Illinois at Urbana-Champaign 1304 W Springfield
More informationIntroduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution
Arista 10 Gigabit Ethernet Switch Lab-Tested with Panasas ActiveStor Parallel Storage System Delivers Best Results for High-Performance and Low Latency for Scale-Out Cloud Storage Applications Introduction
More informationXgrid. The simple solution for distributed computing. Features
Xgrid The simple solution for distributed computing. Features Comes built into both Mac OS X and Mac OS X Server Lets you harness the underutilized power of your own workgroup s Mac computers or volunteered
More informationHigh Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand
High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand Hari Subramoni *, Ping Lai *, Raj Kettimuthu **, Dhabaleswar. K. (DK) Panda * * Computer Science and Engineering Department
More informationA Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department
More informationImpact of Latency on Applications Performance
Impact of Latency on Applications Performance Rossen Dimitrov and Anthony Skjellum {rossen, tony}@mpi-softtech.com MPI Software Technology, Inc. 11 S. Lafayette Str,. Suite 33 Starkville, MS 39759 Tel.:
More informationIntroduction to MPIO, MCS, Trunking, and LACP
Introduction to MPIO, MCS, Trunking, and LACP Sam Lee Version 1.0 (JAN, 2010) - 1 - QSAN Technology, Inc. http://www.qsantechnology.com White Paper# QWP201002-P210C lntroduction Many users confuse the
More informationTechnical Computing Suite Job Management Software
Technical Computing Suite Job Management Software Toshiaki Mikamo Fujitsu Limited Supercomputer PRIMEHPC FX10 PRIMERGY x86 cluster Outline System Configuration and Software Stack Features The major functions
More informationBlock based, file-based, combination. Component based, solution based
The Wide Spread Role of 10-Gigabit Ethernet in Storage This paper provides an overview of SAN and NAS storage solutions, highlights the ubiquitous role of 10 Gigabit Ethernet in these solutions, and illustrates
More information