HPC Software Requirements to Support an HPC Cluster Supercomputer
|
|
|
- Juniper Cox
- 10 years ago
- Views:
Transcription
1 HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software
2 Table of Contents Overview... 3 Cray HPC Cluster Software Stack... 4 Operating System... 6 Application Libraries... 6 Development Tools... 6 Performance Verification Software and Monitoring... 7 Cluster Management: System Management, Monitoring and Provisioning... 7 Network Fabric Management Resource Management and Scheduling Parallel File System Conclusion References WP-CCS-Software Page 2 of 12
3 Overview The dramatic growth of the high performance computing (HPC) market has been powered almost exclusively by the adoption of Linux clusters. The low cost and high performance of these systems has been the major driver for this growth. This advantage is always in danger of diminishing if system support software does not keep up with the rapid growth in hardware performance and complexity. A major cause of cluster complexity is proliferating hardware parallelism with systems now averaging thousands of processors. Each of the processors is a multicore chip with its own memory and inputoutput resources. The memory size, number of processing cores and overall performance of these nodes has been increasing for more than 20 years at a rate that doubles the performance approximately every 18 months. Add to this trend the additional issues of controlling power consumption and the difficulty of scaling many applications and cluster management tools quickly becomes a major problem. The global HPC server market in 2012 reached a record level of $11.1 billion, en route to more than $14 billion in 2016 (7.3%CAGR). Most market research firms emphasize the importance of alleviating cluster complexity through a coordinated strategy of investment and innovation to produce totally integrated HPC systems. They stress the software advances will be more important for HPC leadership than hardware progress. Countries and companies that underinvest in cluster software will lose ground. This makes it both important and urgent for companies building supercomputer-class HPC clusters to provide the right cluster software, management tools, system integration support and professional services to provide datacenter administrators and end users with the tools they need for problem solving. [1] This paper is intended for end users who are interested in implementing an HPC system and need to learn about the essential cluster software and management tools required to build and support cluster architecture. You will quickly understand why the software provided for an HPC cluster has become much more important than the differentiated cluster hardware. You will explore how Cray ( a leading provider of innovative supercomputing solutions, combines HPC open source software with key compatibility features of the Advanced Cluster Engine (ACE) management software to provide a complete software stack for its Cray CS300 cluster supercomputer product line. This paper will clarify why an HPC cluster software stack is required to build a powerful, flexible, reliable and highly available Linux supercomputing cluster architecture. This paper will also provide you with a close examination of Cray's HPC cluster software stack and management software features that have been addressing many customer pain points for medium to large HPC deployments. WP-CCS-Software Page 3 of 12
4 Cray HPC Cluster Software Stack The Cray HPC cluster software stack is a validated and compatible set of software components below the end user application layer and essential to support an entire supercomputer operation. This software stack includes programs that are unique to the architecture and are required to support the applications that will run on the system. Each node of the Cray CS300 cluster supercomputer is a shared memory multiprocessor with its own memory and independent operating system software. The nodes are interconnected by a high performance communications network that supports the rapid exchange of data between the individual computers. This network is normally based on InfiniBand technology and allows the individual computers to work together on problems that can be divided across multiple computers. The parts of these parallel programs execute independently and synchronize using the interconnect network fabric only when it is necessary for data sharing. The software stack for one of these machines can be divided into two groups of programs. The first is the software that runs on the individual nodes. The second is the software that aggregates the large number of individual computers into a single system. The software that makes up this second group very often runs on nodes that are specifically dedicated as management nodes. First let s discuss the software that runs on the individual nodes. Each node runs an independent operating system. The operating system used for most HPC systems is Linux. When Linux is installed on the nodes it is configured with a local file system and networking software that supports internode communications. The operating system, file system and network software are all part of the HPC software stack. In addition, this first software group includes libraries of routines optimized for the specific processors being used in the system. These libraries include math functions and other utilities that are repeatedly used by multiple applications. They also include Message Passing Interface (MPI) programs that support efficient message passing and synchronization between the nodes in a cluster. Software development is supported by codes that are also part of the software stack that runs on the individual nodes. These programs support the development of new codes and the maintenance of existing software. They include C Language and Fortran compilers plus debugging, test programs and performance verification programs that can be used to test system performance The second group of software includes three major codes. The first is the management system that allows all of the system hardware to be managed from a single point. The second is the system resource management and scheduling software that provides a single user interface for scheduling and executing parallel problems across all of the processors in the system. The third is the software that supports the routing of messages over high-speed networks such as the network fabric management. The following software components are part of the HPC cluster software stack for its Cray CS300 cluster supercomputer product line. Refer to Figure 1 for the Cray HPC Software Stack and Figure 2 for the entire Cray system architecture: Operating Systems: Linux Red Hat Enterprise Linux, CentOS and SUSE Application Libraries: MPI and Math Libraries Development Tools: Compilers, Debugging Tools Performance Verification Tools and Monitoring Cluster Management: System Management, Monitoring, Provisioning Network Fabric Management Resource Management and Scheduling Parallel File System WP-CCS-Software Page 4 of 12
5 Figure 1: Cray HPC Software Stack Figure 2: Cray CS300 Cluster Supercomputer Architecture WP-CCS-Software Page 5 of 12
6 Operating System Almost all HPC clusters use the Linux operating system. More than 10 versions of the Linux operating system are supported by specific vendors and by the open source community. The versions used for most commodity clusters are Red Hat Enterprise Linux, SUSE, CentOS and occasionally Scientific Linux. These versions are very similar and differ mostly in the installation programs supplied with the operating system software. A larger difference may be the level of support provided by the vendor. The same versions of these operating systems run on workstations, servers, and on the compute nodes of supercomputer clusters. The Cray CS300 cluster supercomputer supports Red Hat Enterprise Linux, SUSE and CentOS operating systems. The Cray Cluster Management System is delivered with Red Hat Enterprise Linux installed on the management nodes. It was selected because of the higher level of support available. Cray delivers preconfigured Red Hat, SUSE or CentOS compute node images. Red Hat and SUSE are priced on a per node basis. CentOS operating system is available as open source software that can be used at no cost. Application Libraries Computer applications make use of libraries to perform commonly used functions. These common functions such as math functions have been designed and optimized to make the most efficient use of a particular processor s capability. Libraries are available for message passing, mathematical functions, and the OpenFabrics Enterprise Distribution (OFED ). The Message Passing Interface (MPI) libraries are especially important to cluster operation since they support efficient messaging and synchronization between codes running on different processors in a cluster. These libraries simplify user programs since they can be included where required with little programming effort. Cray includes the open source MPI libraries available for the Linux operating system and offers the MPI libraries available with the Intel and PGI development suites. Cray also offers the Intel Math Kernel Library which provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices including sparse matrices and interval matrices. The library also includes discrete Fourier transform routines, as well as vector mathematical and vector statistical functions with Fortran and C interfaces. Development Tools Development tools include C Language, Fortran and other compilers that are used to create software for HPC applications. These compilers include features that support creating and debugging code for parallel computing. Cray s HPC cluster software stack provides open source development software that is available with the Linux operating system and offers compilers and development tools from Intel or PGI. Intel Cluster Studio provides the tools needed to develop, test and run programs on clusters based on Intel processors. It bundles Intel s C, C++ and Fortran compilers and includes Intel s version of the MPI message passing protocol that allows server nodes to share work. It also includes various math and multithreading libraries to improve applications performance. The Intel MPI stack can scale to more than 90,000 MPI cores. It also includes support for redundant networks and supports both load sharing WP-CCS-Software Page 6 of 12
7 and failover. The debugger tools not only provide a single node view, they also provide a cluster-level view of multiple data gathering and reporting mechanisms. This software can support finding threading errors and memory leaks across an entire cluster. These tools are essential for supporting parallel computing. The Intel math libraries are essential to getting maximum performance from Intel processors. The PGI development suite provides compilers, MPI libraries, and math libraries for non-intel based clusters and also for Intel-based clusters that use NVIDIA GPUs. These systems are supported by the PGI CDK Cluster Development Kit, which includes support for C++ and Fortran compilers plus parallel libraries and debugging tools. The PGI CDK includes the tools needed to support NVIDIA GPU devices and the NVIDIA CUDA language. [2], [3] Performance Verification Software and Monitoring Cray s HPC cluster software stack includes the standard benchmarking and performance tools used to measure and verify cluster performance. Programs included are HPCC, Perfctr, IOR, Bonnie, PAPI/IPM and netperf. The stack also includes the OFED library that measures the latency and bandwidth for InfiniBand networks. Cluster Management: System Management, Monitoring and Provisioning Cluster management software started as simple remote server management programs based on the Intelligent Platform Management Interface Protocol (IMPI) and Simple Network Management Protocol (SNMP). The programs allowed servers that were remotely located to be turned on, turned off, reset and rebooted. They collected status from the servers and provided a serial I/O connection to the servers for interacting with the operating systems or with programs running on the servers. When clusters began to appear, these remote management programs were adapted to allow all of the nodes in a cluster to be managed from a single point. As clusters grew larger and more complex so did the cluster management programs. [4] Early clusters booted the operating system for a node from a local disk connected directly to the node. This meant that a copy of the operating system had to be stored at each node. Functions were added to the cluster management software to support updating and verifying the software on the nodes from a master copy. As clusters grew to thousands of nodes it created configuration control problems. It was a very slow process making it impractical to quickly load a different OS or library to the nodes. With the continued growth of clusters and I/O performance improvements, diskless compute nodes were introduced. The cluster management software for a diskless system needed to be able to quickly load a new operating system image to the nodes when required. This evolution first led to operating systems using so-called lightweight kernels to simplify the loading process, but these lightweight kernels loaded only the basic operating system kernel capabilities in memory. Newer cluster management systems allow full OS support with the ability to load transient routines quickly from high performance external storage devices managed by the cluster management system. Modern cluster management programs like Cray s Advanced Cluster Engine are able to reboot any or all of the nodes in a cluster within a few minutes. Refer to Figure 3 for the feature diagram of Cray s Advanced Cluster Engine (ACE) management software. WP-CCS-Software Page 7 of 12
8 Figure 3: Advanced Cluster Engine Management Software This feature makes it very easy to install and maintain the software on a cluster. The administrator needs to install or update only one library copy of the OS and applications. All of the servers get the new version as soon as they boot an updated image. This allows a single configuration controlled copy of the software to be loaded to the nodes in a cluster. The node s unique data such as /etc files are created by the cluster management software as a node is booted. This minimizes the storage requirements for the operating system image. This capability also allows logical clusters to be supported. Logical cluster operation allows one or more of the nodes in a cluster to be loaded with a specific operating system configuration while other nodes run different operating system images enabling multiple OS versions to be supported simultaneously on the same cluster. This architecture makes the use of a large cluster much more flexible since nodes can be reconfigured in minutes to best support different users. For example, a large cluster with thousands of nodes can be divided into several logical clusters running the same or different versions of Linux to support different working groups. A set of nodes can be configured to support a specific application and then loaded only when required. Ultimately, the entire machine could be reconfigured in minutes into a single cluster to run large overnight jobs. [5] ACE can simultaneously support hundreds of logical cluster configurations. Some of these may be active while others may be loaded onto nodes only infrequently as they are required to support specific user requirements. The integrated configuration management functions in the ACE software support up to 10 revisions of a compute node image for each logical cluster along with a rollback capability, making it ideal for testing new software packages and providing customized environments. WP-CCS-Software Page 8 of 12
9 The cluster configuration and topology are predefined in a database during the ACE installation. ACE verifies that the servers are properly connected and configured. It supports the capability to turn each individual server on and off, reset the server, or reboot the server. ACE has three interfaces, a command line interface, a graphical user interface, and a file system interface into the database. These interfaces provide the system administrator with a great degree of flexibility to customize the operation and monitoring of the cluster. It also supports remote BIOS and firmware updates. ACE includes the capability to discover and check the interconnect configuration and to recognize switches or links that are down, not operating properly or incorrectly connected. It can support fat tree and torus configurations in single-rail or multi-rail configurations. ACE completely supports diskless operation thereby simplifying the administration of the cluster. It provides a multilevel cached root file system, which contributes to the scalability of the system. Local disks are supported for scratch or temporary space. As clusters have grown to thousands of nodes, availability has become a major consideration. The failure of a single server, data link or switch cannot be allowed to stop the operation of the other nodes in a cluster. ACE supports dual management servers with automatic failover in the event that one of the servers fails. The servers are configured as a replicated pair with the standby server always keeping its database in sync with the active management server. This allows the standby server to take over with no interruption. Since the compute nodes in a cluster are inherently redundant, jobs can always be restarted by the resource manager and scheduler from a previous checkpoint or from the beginning. ACE allows servers that are part of an active logical cluster that have failed to be replaced with no changes required to the configuration. Networks can be configured with redundancy to support continuous operation. It is also designed to work with hierarchical active/active sub-management server pairs to allow systems to scale without diminishing performance. ACE is designed to display the status of servers, networks and logical clusters in an intuitive format. This allows the state of the entire cluster to be determined in a glance and problems to be recognized and dealt with quickly and easily. Example 1: ACE Servers Display WP-CCS-Software Page 9 of 12
10 Example 2: ACE Clusters Display Example 3: ACE Network Status Display Network Fabric Management The OpenFabrics Enterprise Distribution (OFED) Software is also included in the Cray HPC cluster software stack. Cray works closely with the OpenFabrics Alliance, non-profit organization to provide architectures, software repositories, interoperability tests, bug databases, workshops, and BSD- and GPLlicensed code to facilitate development. The OFED stack includes software drivers, core kernel-code, middleware, and user-level interfaces. It offers a range of standard protocols, including IPoIB (IP over InfiniBand) and DAPL (the Direct Access Programming Library). It also supports many other protocols, including various MPI implementations, and many file systems, including Lustre, and NFS over WP-CCS-Software Page 10 of 12
11 RDMA. This collection of codes is designed to support parallel processing on HPC clusters that use InfiniBand networking technology for interconnecting the nodes. Codes are included for managing both fat tree and torus network architectures. This includes managing the routing tables on the InfiniBand switches such as initializing the routing tables and updating the tables to route around connection failures. The library also includes the device driver software for the InfiniBand Host Channel Adapters used to connect the servers to an InfiniBand network. Routines are included to collect status and performance data from InfiniBand networks and programs to test specific network functions such as latency and bandwidth. Resource Management and Scheduling The operating system on a single computer manages the processor, memory and I/O resources. It schedules programs that are submitted for execution only when the resources needed to run the program are available and then monitors the execution of the programs. An HPC cluster needs a similar resource management and scheduling capability for the entire cluster. This has led to the development of applications such as LSF, PBS and Grid Engine. These programs keep track of the processor and memory resources for all of the nodes in a cluster. They submit jobs to the operating systems on the individual nodes when all of the resources are available to run a parallel application. They provide users with a single interface for submitting and monitoring jobs. ACE is designed to provide the system information needed by the cluster resource manager to properly schedule jobs. It is built with a specific interface for the Grid Engine Enterprise Edition resource manager and scheduler, and SLURM (Simple Linux Utility for Resource Management) that tightly integrates the scheduler into the management system. It is also compatible with other resource management and scheduling software, such as Altair s Portable Batch System (PBS Pro), IBM Platform s LSF, Torque/Maui, etc. Parallel File System ACE supports scalable, cached root file systems for diskless nodes, multiple global storage configurations, and high bandwidth to secondary storage. The Cray HPC cluster software stack offers support for common file system options such as the Network File System (NFS), Panasas PanFS and Lustre. Lustre is the most widely used file system on the TOP500 fastest computers in the world with over 60 percent of the top 100, and over 50 percent of the top 500. Lustre enables high performance by allowing system architects to use any common storage technologies along with high-speed interconnects and can scale well as an organization s storage needs grow. By providing multiple paths to the physical storage, the Lustre file system can provide high availability for HPC clusters. ACE supports all of the standard file systems that are used with the Linux operating system including ext3, ext4 and XFS. [6] WP-CCS-Software Page 11 of 12
12 Conclusion In this paper you had a complete overview of the essential cluster software and management tools that are required to build a powerful, flexible, reliable and highly available Linux supercomputer. You learned that Cray combines all the required HPC software tools including operating systems, provisioning, remote console/power management, cluster monitoring, parallel file system, scheduling, development tools and performance monitoring tools with key compatibility and additional powerful features of the Advanced Cluster Engine (ACE) management software to deliver a best-of-breed HPC cluster software stack to support its Cray CS300 supercomputer product line. Cray has also taken a customer-centric, technology-agnostic approach that offers the customer a wide range of hardware and software configurations based on the latest open standards technologies, innovative cluster tools and management software packaged with HPC professional services and support expertise. References [1] IDC Maintains HPC Server Revenue Forecast of 7% Growth Despite Flat Second Quarter [2] Intel Xeon Processor E5 Family [3] HPC Accelerating Science with Tesla GPUs [4] Mesos, A Platform for Fine-Grained Resource Sharing in the Data Center UC Berkeley Tech Report, May, 2010 [5] Adaptive Control of Extreme-scale Stream Processing Systems Proceedings of the 26th IEEE International Conference on Distributed Computing Systems. [6] "Rock-Hard Lustre: Trends in Scalability and Quality". Nathan Rutman, Xyratex Cray Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the copyright owners. Cray is a registered trademark, and the Cray logo, Cray CS300 and Advanced Cluster Engine are trademarks of Cray Inc. Other product and service names mentioned herein are the trademarks of their respective owners. WP-CCS-Software Page 12 of 12
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007
Designed for Maximum Accelerator Performance
Designed for Maximum Accelerator Performance A dense, GPU-accelerated cluster supercomputer that delivers up to 329 double-precision GPU teraflops in one rack. This power- and spaceefficient system can
Smarter Cluster Supercomputing from the Supercomputer Experts
Smarter Cluster Supercomputing from the Supercomputer Experts Maximize Your Productivity with Flexible, High-Performance Cray CS400 Cluster Supercomputers In science and business, as soon as one question
Cluster Implementation and Management; Scheduling
Cluster Implementation and Management; Scheduling CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Cluster Implementation and Management; Scheduling Spring 2013 1 /
Smarter Cluster Supercomputing from the Supercomputer Experts
Smarter Cluster Supercomputing from the Supercomputer Experts Lowers energy costs; datacenter PUE of 1.1 or lower Capable of up to 80 percent heat capture Maximize Your Productivity with Flexible, High-Performance
MPI / ClusterTools Update and Plans
HPC Technical Training Seminar July 7, 2008 October 26, 2007 2 nd HLRS Parallel Tools Workshop Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI MPI / ClusterTools Update and Plans Len Wisniewski
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
Cray DVS: Data Virtualization Service
Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with
Sun in HPC. Update for IDC HPC User Forum Tucson, AZ, Sept 2008
Sun in HPC Update for IDC HPC User Forum Tucson, AZ, Sept 2008 Bjorn Andersson Director, HPC Marketing Makia Minich Lead Architect, Sun HPC Software, Linux Edition Sun Microsystems Core Focus Areas for
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
Scaling from Workstation to Cluster for Compute-Intensive Applications
Cluster Transition Guide: Scaling from Workstation to Cluster for Compute-Intensive Applications IN THIS GUIDE: The Why: Proven Performance Gains On Cluster Vs. Workstation The What: Recommended Reference
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK
ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK KEY FEATURES PROVISION FROM BARE- METAL TO PRODUCTION QUICKLY AND EFFICIENTLY Controlled discovery with active control of your hardware Automatically
Lustre Networking BY PETER J. BRAAM
Lustre Networking BY PETER J. BRAAM A WHITE PAPER FROM CLUSTER FILE SYSTEMS, INC. APRIL 2007 Audience Architects of HPC clusters Abstract This paper provides architects of HPC clusters with information
IBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand.
IBM Global Technology Services September 2007 NAS systems scale out to meet Page 2 Contents 2 Introduction 2 Understanding the traditional NAS role 3 Gaining NAS benefits 4 NAS shortcomings in enterprise
PRIMERGY server-based High Performance Computing solutions
PRIMERGY server-based High Performance Computing solutions PreSales - May 2010 - HPC Revenue OS & Processor Type Increasing standardization with shift in HPC to x86 with 70% in 2008.. HPC revenue by operating
How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
High Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
Introduction to Gluster. Versions 3.0.x
Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster
InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment
December 2007 InfiniBand Software and Protocols Enable Seamless Off-the-shelf Deployment 1.0 Introduction InfiniBand architecture defines a high-bandwidth, low-latency clustering interconnect that is used
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
Oracle Solaris: Aktueller Stand und Ausblick
Oracle Solaris: Aktueller Stand und Ausblick Detlef Drewanz Principal Sales Consultant, EMEA Server Presales The following is intended to outline our general product direction. It
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
Introduction to Infiniband. Hussein N. Harake, Performance U! Winter School
Introduction to Infiniband Hussein N. Harake, Performance U! Winter School Agenda Definition of Infiniband Features Hardware Facts Layers OFED Stack OpenSM Tools and Utilities Topologies Infiniband Roadmap
Technical Overview of Windows HPC Server 2008
Technical Overview of Windows HPC Server 2008 Published: June, 2008, Revised September 2008 Abstract Windows HPC Server 2008 brings the power, performance, and scale of high performance computing (HPC)
Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise
Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise Introducing Unisys All in One software based weather platform designed to reduce server space, streamline operations, consolidate
SUN HPC SOFTWARE CLUSTERING MADE EASY
SUN HPC SOFTWARE CLUSTERING MADE EASY New Software Access Visualization Workstation, Thin Clients, Remote Access Developer Management OS Compilers, Workload, Linux, Solaris Debuggers, Systems and Optimization
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing
PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN
1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
Computer Organization & Architecture Lecture #19
Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of
Mellanox Academy Online Training (E-learning)
Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),
New Storage System Solutions
New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems
Cray XT3 Supercomputer Scalable by Design CRAY XT3 DATASHEET
CRAY XT3 DATASHEET Cray XT3 Supercomputer Scalable by Design The Cray XT3 system offers a new level of scalable computing where: a single powerful computing system handles the most complex problems every
SAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
ALPS Supercomputing System A Scalable Supercomputer with Flexible Services
ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being
Brocade Solution for EMC VSPEX Server Virtualization
Reference Architecture Brocade Solution Blueprint Brocade Solution for EMC VSPEX Server Virtualization Microsoft Hyper-V for 50 & 100 Virtual Machines Enabled by Microsoft Hyper-V, Brocade ICX series switch,
Quantum StorNext. Product Brief: Distributed LAN Client
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000 Dot Hill Systems introduction 1 INTRODUCTION Dot Hill Systems offers high performance network storage products
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband
Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband A P P R O I N T E R N A T I O N A L I N C Steve Lyness Vice President, HPC Solutions Engineering [email protected] Company Overview
Benefits of multi-core, time-critical, high volume, real-time data analysis for trading and risk management
SOLUTION B L U EPRINT FINANCIAL SERVICES Benefits of multi-core, time-critical, high volume, real-time data analysis for trading and risk management Industry Financial Services Business Challenge Ultra
SRNWP Workshop. HP Solutions and Activities in Climate & Weather Research. Michael Riedmann European Performance Center
SRNWP Workshop HP Solutions and Activities in Climate & Weather Research Michael Riedmann European Performance Center Agenda A bit of marketing: HP Solutions for HPC A few words about recent Met deals
Skynax. Mobility Management System. System Manual
Skynax Mobility Management System System Manual Intermec by Honeywell 6001 36th Ave. W. Everett, WA 98203 U.S.A. www.intermec.com The information contained herein is provided solely for the purpose of
PARALLELS SERVER 4 BARE METAL README
PARALLELS SERVER 4 BARE METAL README This document provides the first-priority information on Parallels Server 4 Bare Metal and supplements the included documentation. TABLE OF CONTENTS 1 About Parallels
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer [email protected] Agenda Session Length:
Red Hat Global File System for scale-out web services
Red Hat Global File System for scale-out web services by Subbu Krishnamurthy (Based on the projects by ATIX, Munich, Germany) Red Hat leads the way in delivering open source storage management for Linux
A Tour of the Linux OpenFabrics Stack
A Tour of the OpenFabrics Stack Johann George, QLogic June 2006 1 Overview Beyond Sockets Provides a common interface that allows applications to take advantage of the RDMA (Remote Direct Memory Access),
Get More Scalability and Flexibility for Big Data
Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and
INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering
INDIAN INSTITUTE OF TECHNOLOGY KANPUR Department of Mechanical Engineering Enquiry No: Enq/IITK/ME/JB/02 Enquiry Date: 14/12/15 Last Date of Submission: 21/12/15 Formal quotations are invited for HPC cluster.
CHAPTER 15: Operating Systems: An Overview
CHAPTER 15: Operating Systems: An Overview The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint
Hadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
Sun Constellation System: The Open Petascale Computing Architecture
CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 [email protected] THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions
NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions 1 NEC Corporation Technology solutions leader for 100+ years Established 1899, headquartered in Tokyo First Japanese joint
Open-E Data Storage Software and Intel Modular Server a certified virtualization solution
Open-E Data Storage Software and Intel Modular Server a certified virtualization solution Contents 1. New challenges for SME IT environments 2. Open-E DSS V6 and Intel Modular Server: the ideal virtualization
- An Essential Building Block for Stable and Reliable Compute Clusters
Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative
How To Build A Cloud Computer
Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology
Enabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
Bright Cluster Manager
Bright Cluster Manager For HPC, Hadoop and OpenStack Craig Hunneyman Director of Business Development Bright Computing [email protected] Agenda Who is Bright Computing? What is Bright
Amazon EC2 Product Details Page 1 of 5
Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of
PARALLELS SERVER BARE METAL 5.0 README
PARALLELS SERVER BARE METAL 5.0 README 1999-2011 Parallels Holdings, Ltd. and its affiliates. All rights reserved. This document provides the first-priority information on the Parallels Server Bare Metal
Product Brief. DC-Protect. Content based backup and recovery solution. By DATACENTERTECHNOLOGIES
Product Brief DC-Protect Content based backup and recovery solution By DATACENTERTECHNOLOGIES 2002 DATACENTERTECHNOLOGIES N.V. All rights reserved. This document contains information proprietary and confidential
What s new in Hyper-V 2012 R2
What s new in Hyper-V 2012 R2 Carsten Rachfahl MVP Virtual Machine Rachfahl IT-Solutions GmbH & Co KG www.hyper-v-server.de Thomas Maurer Cloud Architect & MVP itnetx gmbh www.thomasmaurer.ch Before Windows
Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware
Introduction By leveraging the inherent benefits of a virtualization based platform, a Microsoft Exchange Server 2007 deployment on VMware Infrastructure 3 offers a variety of availability and recovery
Overview of HPC Resources at Vanderbilt
Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources
Configuration Maximums
Topic Configuration s VMware vsphere 5.1 When you select and configure your virtual and physical equipment, you must stay at or below the maximums supported by vsphere 5.1. The limits presented in the
Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms
EXECUTIVE SUMMARY Intel Cloud Builder Guide Intel Xeon Processor-based Servers Red Hat* Cloud Foundations Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms Red Hat* Cloud Foundations
Windows Server 2008 R2 Hyper V. Public FAQ
Windows Server 2008 R2 Hyper V Public FAQ Contents New Functionality in Windows Server 2008 R2 Hyper V...3 Windows Server 2008 R2 Hyper V Questions...4 Clustering and Live Migration...5 Supported Guests...6
WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression
WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
Enabling Technologies for Distributed and Cloud Computing
Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading
Operating System for the K computer
Operating System for the K computer Jun Moroo Masahiko Yamada Takeharu Kato For the K computer to achieve the world s highest performance, Fujitsu has worked on the following three performance improvements
Red Hat Storage Server
Red Hat Storage Server Marcel Hergaarden Solution Architect, Red Hat [email protected] May 23, 2013 Unstoppable, OpenSource Software-based Storage Solution The Foundation for the Modern Hybrid
FLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
High-Performance Computing Clusters
High-Performance Computing Clusters 7401 Round Pond Road North Syracuse, NY 13212 Ph: 800.227.3432 Fx: 315.433.0945 www.nexlink.com What Is a Cluster? There are several types of clusters and the only constant
WHITE PAPER. ClusterWorX 2.1 from Linux NetworX. Cluster Management Solution C ONTENTS INTRODUCTION
WHITE PAPER A PRIL 2002 C ONTENTS Introduction 1 Overview 2 Features 2 Architecture 3 Monitoring 4 ICE Box 4 Events 5 Plug-ins 6 Image Manager 7 Benchmarks 8 ClusterWorX Lite 8 Cluster Management Solution
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
Veritas Storage Foundation High Availability for Windows by Symantec
Veritas Storage Foundation High Availability for Windows by Symantec Simple-to-use solution for high availability and disaster recovery of businesscritical Windows applications Data Sheet: High Availability
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
SAS Business Analytics. Base SAS for SAS 9.2
Performance & Scalability of SAS Business Analytics on an NEC Express5800/A1080a (Intel Xeon 7500 series-based Platform) using Red Hat Enterprise Linux 5 SAS Business Analytics Base SAS for SAS 9.2 Red
IBM Deep Computing Visualization Offering
P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: [email protected] Summary Deep Computing Visualization in Oil & Gas
VMWARE WHITE PAPER 1
1 VMWARE WHITE PAPER Introduction This paper outlines the considerations that affect network throughput. The paper examines the applications deployed on top of a virtual infrastructure and discusses the
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
Top Ten Reasons for Deploying Oracle Virtual Networking in Your Data Center
Top Ten Reasons for Deploying Oracle Virtual Networking in Your Data Center Expect enhancements in performance, simplicity, and agility when deploying Oracle Virtual Networking in the data center. ORACLE
Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003
Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building
