Smarter Cluster Supercomputing from the Supercomputer Experts


Lowers energy costs; datacenter PUE of 1.1 or lower
Capable of up to 80 percent heat capture
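For context on the first highlight: power usage effectiveness (PUE) is the ratio of total facility power to the power delivered to the IT equipment itself, so a PUE of 1.1 means cooling and power distribution add only about 10 percent on top of the compute load. The 1 MW figure below is purely an illustrative assumption, not a CS400-LC specification:

\[ \mathrm{PUE} = \frac{P_{\text{total facility}}}{P_{\text{IT equipment}}} \approx \frac{1.1\,\text{MW}}{1.0\,\text{MW}} = 1.1 \]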

Maximize Your Productivity with Flexible, High-Performance Cray CS400 Liquid-Cooled Cluster Supercomputers

In science and business, as soon as one question is answered another is waiting. And with so much depending on fast, accurate answers to complex problems, you need reliable high performance computing (HPC) tools matched to your specific tasks.

Understanding that time is critical and that not all HPC problems are created equal, we developed the Cray CS400 cluster supercomputer series. These systems are industry standards-based, highly customizable, easy to manage, and purposefully designed to handle the broadest range of medium- to large-scale simulations and data-intensive workloads. All CS400 components have been carefully selected, optimized and integrated to create a powerful, reliable high-performance compute environment capable of scaling to over 27,000 compute nodes and 46 peak petaflops. Flexible node configurations featuring the latest processor and interconnect technologies mean you can get to the solution faster by tailoring a system to your specific HPC application needs. Innovations in packaging, power, cooling and density translate to superior energy efficiency and compelling price/performance. Expertly engineered system management software instantly boosts your productivity by simplifying system administration and maintenance, even for very large systems.

Cray has long been a leader in delivering tightly integrated supercomputer systems for large-scale deployments. With the CS400 system, you get that same Cray expertise and productivity in a flexible, standards-based and easy-to-manage cluster supercomputer.

CS400-LC Cluster Supercomputer: Liquid-Cooled and Designed for Your Workload

The CS400-LC system is our direct-to-chip, warm-water-cooled cluster supercomputer. Designed for significant energy savings, it features liquid-cooling technology that uses heat exchangers instead of chillers to cool system components. Compared to traditional air-cooled clusters, the CS400-LC system can deliver three times more energy efficiency, with typical payback cycles ranging from immediate to one year.

Along with lowering operational costs, the CS400-LC system offers the latest x86 processor technologies from Intel in a highly scalable package. Industry-standard server nodes and components have been optimized for HPC and paired with a comprehensive HPC software stack, creating a unified system that excels at capacity- and data-intensive workloads.
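As a rough back-of-the-envelope reading of the series-level scaling figures quoted above (the exact node count and peak rate depend on the processors and configuration chosen):

\[ \frac{46\ \text{peak petaflops}}{27{,}000\ \text{compute nodes}} \approx 1.7\ \text{peak teraflops per two-socket node} \]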

Innovative Liquid Cooling Keeps Your System Cool and Energy Costs Low

Designed to minimize power consumption without compromising performance, the CS400-LC cluster supercomputer uses an innovative heat exchange system to cool system processors and memory.

[Figure: cooling loop diagram showing the outdoor dry cooler, facility water loop, coolant distribution unit (CDU), low-pressure server loop and server coolers]

The heat exchange cooling process starts with a coolant distribution unit (CDU), connected to each rack, and two separate cooling loops. One loop delivers warm or cool facility water to the CDU, where the heat is exchanged and the now-hot facility water exits the other end of the loop. A second loop repeats the process at the server level. This double-sealed, low-pressure secondary loop, with dripless quick connects, cools the critical server components: it delivers cooled liquid to the servers, where pump/cold plate units atop the processors capture the heat, and the now-hot liquid circulates back to the CDU for heat exchange. Facility water and server loop liquid never mix; liquid-to-liquid heat exchangers within the CDU transfer heat between the loops.

This isolated dual-loop design safeguards the nodes. First, the server loop is low pressure and low flow, so server loop components are not subject to the high pressure of the facility loop. Second, the server loop is prefilled with nonconductive, deionized water containing additives to prevent corrosion.

Since it requires less powerful fans on the servers and fewer air conditioning units in the facility, the CS400-LC system reduces typical energy consumption by 50 percent, with a predicted power usage effectiveness (PUE) of 1.1 or lower. The system can also capture up to 80 percent of the heat from the server components for possible reuse. Additionally, leak detection and prevention features are tightly integrated with the system's remote monitoring and reporting capabilities.

Choice of Flexible, Scalable Configurations

Flexibility is at the heart of the Cray CS400-LC system design. At the system level, the CS400-LC cluster is built on the Cray GreenBlade platform. Comprising server blades and chassis, the platform is designed to provide mix-and-match building blocks for easy, flexible configurations at both the node and whole-system level. Among its advantages, the GreenBlade platform offers high density (up to 60 compute nodes per 42U rack), excellent memory capacity (up to 1,024 GB per node), many power and cooling efficiencies and a built-in management module for industry-leading reliability.

CS400-LC Hardware Configuration Options
Two-socket x86 Intel Xeon processors
Large memory capacity per node
Multiple interconnect options: 3D torus/fat tree, single/dual-rail FDR InfiniBand
Local hard drives in each server
Choice of network-attached file systems and Lustre-based parallel file storage systems

The CS400-LC system features the latest Intel Xeon processors. It offers multiple interconnect and network topology options, maximum bandwidth, local storage, many network-attached file system options and the ability to integrate with the Cray Sonexion scale-out Lustre system, providing fast, high-performance scratch and primary storage. Within this framework, the Cray CS400-LC system can be tailored to multiple purposes: from an all-purpose cluster, to one suited for shared-memory parallel tasks, to a system optimized for hybrid compute- and data-intensive workloads. Nodes are divided by function into compute and service nodes.
Compute nodes run parallel MPI and/or OpenMP tasks with maximum efficiency, while service nodes provide I/O and login functions. Compute nodes feature two Intel Xeon processors per node and up to 1,024 gigabytes of memory, and each node can host one local hard drive. With industry-standard components throughout, each system configuration can be replicated over and over to create a reliable and powerful large-scale system.
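To make the compute-node role concrete, below is a minimal hybrid MPI + OpenMP sketch of the kind of job these nodes are intended to run. It is illustrative only, not Cray-provided code; the file name hybrid.c and the mpicc build command are assumptions, and it should build against any of the MPI stacks listed in the software stack section that follows.

```c
/* hybrid.c - minimal MPI + OpenMP sketch (illustrative, not Cray-specific).
 * Build example (assumed): mpicc -fopenmp hybrid.c -o hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request a thread level compatible with OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank sums its share of the work with its OpenMP threads. */
    const long n_local = 1000000;
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
    for (long i = 0; i < n_local; i++)
        local_sum += 1.0;            /* stand-in for real work */

    /* Combine the per-rank partial sums on rank 0. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d global_sum=%.0f\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```

On two-socket nodes such as these, a common starting layout is one MPI rank per socket with OpenMP threads filling that socket's cores, though the best mapping depends on the application.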

Easy, Comprehensive Manageability

A flexible system is only as good as your ability to use it. The Cray CS400-LC cluster supercomputer offers two key productivity-boosting tools: a customizable HPC cluster software stack and the Cray Advanced Cluster Engine (ACE) system management software.

Cray HPC Cluster Software Stack

The HPC cluster software stack consists of a range of software tools compatible with most open source and commercial compilers, debuggers, schedulers and libraries. Also available as part of the software stack is the Cray Programming Environment, which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools.

HPC Programming Tools
Development & Performance Tools: Cray PE on CS, Intel Parallel Studio XE Cluster Edition, PGI Cluster Development, GNU Toolchain, NVIDIA CUDA Kit
Application Libraries: Cray LibSci, LibSci_ACC, Intel MPI, IBM Platform MPI, MVAPICH2, OpenMPI
Debuggers: Rogue Wave TotalView, Allinea DDT and MAP, Intel IDB, PGI PGDBG, GNU GDB

Schedulers, File Systems and Management
Resource Management / Job Scheduling: SLURM, Adaptive Computing Moab, Maui and Torque, Altair PBS Professional, IBM Platform LSF, Grid Engine
File Systems: Lustre, NFS, GPFS, Panasas PanFS, local (ext3, ext4, XFS)
Cluster Management: Cray Advanced Cluster Engine (ACE) management software

Operating Systems and Drivers
Drivers & Network Management: accelerator software stack and drivers, OFED
Operating Systems: Linux (Red Hat, CentOS)

Cray Advanced Cluster Engine (ACE): Hierarchical, Scalable Framework for Management, Monitoring and File Access

Management:
Hierarchical management infrastructure
Divides the cluster into multiple logical partitions, each with a unique personality
Revision system with rollback
Remote management and remote power control
GUI and CLI to view/change/control and monitor health; plug-in capability
Automatic server/network discovery
Scalable, fast, diskless booting
High availability, redundancy, failover

Monitoring:
Cluster event data available in real time without affecting job performance
Node and InfiniBand network status
BIOS, HCA information
Disk, memory, PCIe errors
Temperatures, fan speeds
Load averages
Memory and swap usage
Sub-rack and node power
I/O status

File Access:
RootFS: high-speed, cached access to the root file system, allowing for scalable booting
High-speed network access to external storage
ACE-managed, high-availability NFS storage

The Advanced Cluster Engine (ACE) management software simplifies cluster management for large scale-out environments with extremely scalable network, server, cluster and storage management capabilities. Command line interface (CLI) and graphical user interface (GUI) options provide flexibility for the cluster administrator. An easy-to-use ACE GUI connects directly to the ACE daemon on the management server and can be executed on a remote system. With ACE, a large system is almost as easy to understand and manage as a workstation.

ACE at a Glance
Simplifies compute, network and storage management
Supports multiple network topologies and diskless configurations with optional local storage
Provides network failover with high scalability
Integrates easily with standards-based HPC software stack components
Manages heterogeneous nodes with different software stacks
Monitors node and network health, power and component temperatures
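The stack above lists several parallel and network file systems (Lustre, NFS, GPFS, Panasas PanFS). A common way applications exercise such file systems at scale is collective MPI-IO, where every rank writes its own block of one shared file. The sketch below is illustrative only and assumes nothing Cray-specific; the file name checkpoint.dat is arbitrary.

```c
/* parallel_write.c - each MPI rank writes its block of one shared file
 * using collective MPI-IO (illustrative sketch, not Cray-specific).
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    enum { N = 1 << 20 };                 /* doubles per rank (8 MB) */
    static double buf[N];
    for (int i = 0; i < N; i++)
        buf[i] = rank;                    /* stand-in for real results */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank-ordered, non-overlapping offsets into one shared file. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    if (rank == 0)
        printf("wrote %d x %d doubles to checkpoint.dat\n", nranks, N);

    MPI_Finalize();
    return 0;
}
```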

Built-in Energy Efficiencies and Reliability Features Lower Your TCO

Energy efficiency features, combined with our long-standing expertise in meeting the reliability demands of very large, high-usage deployments, mean you get more work done for less. In addition to liquid cooling, the CS400-LC options for additional power and cost savings include high-efficiency, load-balancing power supplies and a 480V power distribution unit with a choice of 208V or 277V three-phase power supplies. This means you can use industry-standard 208V and 230V power as well as 277V (a single phase of a 480V three-phase input), and reduce the power loss caused by step-down transformers and resistive losses as power is delivered from the wall directly to the rack.

Reliability is built into the system design, starting with our careful selection of boards and components. Multiple levels of redundancy and fault tolerance then ensure the system meets your uptime needs. The CS400-LC cluster has redundant power, cooling and management servers and redundant networks, all with failover capabilities.

Intel Xeon Processor E5-2600 Product Family

The Intel Xeon processor is at the heart of the agile, efficient datacenter. Built on Intel's industry-leading microarchitecture and 14nm second-generation Tri-Gate transistor technology, the Intel Xeon processor supports high-speed DDR4 memory technology with increased bandwidth, larger density and lower voltage than previous generations. Intel's support for PCI Express (PCIe) 3.0 ports improves I/O bandwidth, offering extra capacity and flexibility for storage and networking connections. The processor delivers energy efficiency and performance that adapts to the most complex and demanding workloads.
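For readers checking the electrical arithmetic in the power paragraph above: 277V is the line-to-neutral voltage of a standard 480V three-phase feed, that is

\[ V_{\text{line-to-neutral}} = \frac{V_{\text{line-to-line}}}{\sqrt{3}} = \frac{480\,\text{V}}{1.732} \approx 277\,\text{V} \]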

Cray CS400-LC Specifications

Architecture: Liquid-cooled cluster architecture, up to 60 nodes per 42U rack
Processor, Coprocessor and Accelerators: Support for 12-core, 64-bit Intel Xeon processor E5-2600 v4 product family
Memory: Up to 1,024 GB registered ECC DDR4 RAM per compute node using 16 x 64 GB DDR4 DIMMs
Interconnect and Networks: External I/O interface: 10 GbE Ethernet; FDR InfiniBand with Connect-IB, QDR True Scale Host Channel Adapters or Intel Omni-Path Host Fabric Interface; options for single- or dual-rail fat tree or 3D torus
System Administration: Advanced Cluster Engine (ACE); complete remote management capability; graphical and command line system administration; system software version rollback capability; redundant management servers with automatic failover; automatic discovery and status reporting of interconnect, server and storage hardware; ability to detect hardware and interconnect topology configuration errors; cluster partitioning into multiple logical clusters, each capable of hosting a unique software stack; remote server control (power on/off, cycle) and remote server initialization (reset, reboot, shut down); scalable fast diskless booting for large node systems and root file systems for diskless nodes
Reliable, Available, Serviceable (RAS): Redundant power, cooling and management servers with failover capabilities; redundant networks (InfiniBand, GbE and 10 GbE) with failover; all critical components easily accessible and hot swappable
Resource Management and Job Scheduling: Options for SLURM, Altair PBS Professional, IBM Platform LSF, Adaptive Computing Torque, Maui and Moab, and Grid Engine
File System: Cray Sonexion, NFS, local FS (ext3, ext4, XFS); Lustre and Panasas PanFS available as global file systems
Disk Storage: Cray TAS, an open, capacity-optimized, tiered data system; full line of FC-attached disk arrays with support for FC and SATA disk drives and SSDs
Operating System: Red Hat, SUSE or CentOS
Performance Monitoring Tools: Open source packages such as HPCC, Perfctr, IOR, PAPI/IPM, netperf
Compilers, Libraries and Tools: Options for Open MPI, MVAPICH2 or Intel MPI libraries; Cray Compiler Environment (CCE), Cray LibSci, PGI, Intel Cluster Toolkit compilers; NVIDIA CUDA, CUDA C/C++, Fortran, OpenCL, DirectCompute toolkits; GNU, DDT, TotalView, OFED programming tools and many others
Power: Up to 38 kW per cabinet depending on configuration; 208V/230V/277V power; optional 480V power distribution with 277V power supplies
Liquid Cooling Features: Low-pressure secondary loop completely isolated from primary datacenter liquid loop; field-serviceable cooling kits with integrated pressure and leak detection and remote monitoring
Cabinet Dimensions (HxWxD): 82.40 in (2,093 mm) H x 23.62 in (600 mm) W x 59.06 in (1,500 mm) D, standard 42U/19-inch rack cabinet
Cabinet Weight: 1,739 lbs.

Cray Inc., 901 Fifth Avenue, Suite 1000, Seattle, WA 98164
Tel: 206.701.2000  Fax: 206.701.2500  www.cray.com

© 2014-2015 Cray Inc. All rights reserved. Specifications are subject to change without notice. Cray is a registered trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 20160322EMS