Designed for Maximum Accelerator Performance



A dense, GPU-accelerated cluster supercomputer that delivers up to 329 double-precision GPU teraflops in one rack. This power- and space-efficient system can be combined with Cray's optimized software stack to meet your demanding workload needs. Built by Cray, the world leader in supercomputing systems.

Cray CS-Storm Powers Application Performance

A perfect storm is brewing. Data volumes are increasing exponentially and mathematical models are growing ever more sophisticated and computationally intensive, but power and cooling limit the size of systems that can be deployed. These considerations are driving the use of GPUs as accelerators for computationally intense workloads.

Accelerators are commonly deployed in 1:1 or 2:1 configurations in blade systems with conventional CPUs. Applications typically run across the two types of processors, with roughly 95% of the code on the conventional CPU and the computationally intense 5% on the accelerator [1]. However, many applications in sectors such as financial services, oil and gas, defense and law enforcement are well suited to running almost entirely on GPUs. For this class of applications, the conventional processor's role is often restricted to housekeeping functions and serving as a conduit that carries data into the accelerator to be processed. The benefits are immense: a single, high-density rack dedicated to GPU computation can deliver double-precision performance of up to 1/3 petaflop with very low power consumption.

Cray's CS-Storm system is designed specifically for these demanding applications and delivers the compute power to quickly and efficiently convert large streams of raw data into actionable information. The CS-Storm system is able to drive the most powerful NVIDIA Tesla K80 GPUs at full speed, with no throttling for power or cooling, and with demonstrated reliability month after month.

Cray CS-Storm Accelerator-Optimized Cluster Solution

System Overview

The Cray CS series of cluster supercomputers offers a scalable architecture of high performance servers, network and software tools that can be fully integrated and managed as stand-alone systems. The CS-Storm cluster, an accelerator-optimized system built from multi-GPU server nodes, is designed for massively parallel computing workloads. Each Cray CS-Storm cluster rack can hold up to 22 2U rack-mounted CS-Storm server nodes. Each of these servers integrates eight accelerators and two Intel Xeon processors, delivering double-precision compute performance of up to 329 GPU teraflops in one 48U rack. The system supports both single- and double-precision floating-point applications.

The Cray CS-Storm system provides a high performance software and hardware infrastructure for running parallel MPI, OpenMP, Unified Parallel C (UPC) or OpenSHMEM tasks with maximum efficiency. The system provides optimized I/O connectivity and flexible user login capabilities. It is available with a comprehensive HPC software stack compatible with open-source and commercial compilers, debuggers, schedulers and libraries. The system is also available with the optional Cray Programming Environment on Cluster Systems (Cray PE on CS), which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools. The system can be easily managed with Cray's Advanced Cluster Engine (ACE) management software, which provides network, server and cluster management capabilities with easy system administration and maintenance for scale-out environments.

HPC Workloads

The CS-Storm system is well suited for HPC workloads in the defense, oil and gas, financial services, life sciences and business intelligence sectors. Typical applications include cybersecurity, geospatial intelligence, signal processing, portfolio and trading algorithm optimization, pattern recognition, machine learning and in-memory databases.

Global Leader in Supercomputing

Cray's ability to turn pioneering hardware and software technologies into world-renowned supercomputing solutions is the result of decades of dedication to high performance computing. From technical enterprise systems to petaflops-sized solutions, our systems enable tremendous scientific achievement by increasing productivity, reducing risk and decreasing time to solution.

For More Information

To learn more about the Cray CS-Storm cluster, visit www.cray.com/cs-storm or contact your Cray representative.

[1] Source: NVIDIA analysis of GPU-accelerated codes
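As a minimal sketch of the offload model described above, where the host CPU handles housekeeping and data movement while the accelerator performs the computationally intense portion, the following CUDA example copies data to a GPU, runs an illustrative kernel there, and copies the result back. The kernel, array size and scaling factor are assumptions made for illustration; they are not part of the CS-Storm product or any Cray software.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative compute-intensive kernel: the small fraction of the code
// that does most of the arithmetic runs on the accelerator.
__global__ void scale(const float *in, float *out, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * factor;
    }
}

int main(void) {
    const int n = 1 << 24;              // ~16M elements (assumed size)
    size_t bytes = n * sizeof(float);

    // Host-side "housekeeping": allocate and populate the input data.
    float *h_in  = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    // The host acts as a conduit, carrying data into the accelerator...
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // ...launching the computationally intense work on the GPU...
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_in, d_out, 2.0f, n);

    // ...and collecting the results for downstream processing.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[0]=%f out[n-1]=%f\n", h_out[0], h_out[n - 1]);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}
```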

Features and Benefits

Scalable System with Maximum Performance
- Up to eight NVIDIA Tesla K40 or K80 GPU accelerators per node (PCIe Gen3 cards, up to 300W each)
- With K40 accelerators, double-precision performance of up to 11.4 GPU teraflops per node
- With K80 accelerators, double-precision performance of up to 15 GPU teraflops per node
- Completely air-cooled platform without compromising system performance
- Available with liquid-cooled rear-door heat exchangers to maximize server density with minimal thermal impact to the datacenter
- Multiple interconnect topology options, including fat tree, single/dual rail and QDR/FDR/EDR InfiniBand
- Cray Programming Environment on Cluster Systems (Cray PE on CS)
- Compatible with Cray Tiered Adaptive Storage (TAS) and Cray Sonexion storage solutions
- Customizable HPC cluster software stack options, including multiple Linux OS distributions, message passing (MPI) libraries, compilers, debuggers and performance tools

Simplified Management Using Cray ACE Software
- Simplifies manageability for scale-out systems
- Manages network, server, cluster and storage
- Manages diverse node types with different OS environments
- Fine-grain system power and temperature monitoring
- Export and import of system configurations and images
- Detects hardware and fabric topology configuration errors
- Version control and the ability to roll back changes
- Supports most commonly deployed open-source and commercial resource managers and job schedulers

Intel Xeon Host Processors
The Cray CS-Storm cluster supercomputer is available with Intel Xeon E5-2600 v3 host processors (up to 145 watts). These flexible CPUs, designed for technical computing and datacenter deployment, can address a wide range of application, performance and energy efficiency requirements.

NVIDIA Tesla K40 and K80 GPU Computing Accelerators
GPU-accelerated computing offers unprecedented application performance by offloading the compute-intensive portions of the application to the GPU. The NVIDIA Tesla K40 GPU accelerator combines 2,880 cores and 12 GB of GDDR5 memory to deliver 1.43 teraflops of double-precision performance. The NVIDIA Tesla K80 GPU accelerator combines a total of 4,992 cores and 24 GB of GDDR5 memory to deliver up to 1.87 teraflops of double-precision performance. GPU Boost uses available power headroom to potentially deliver additional performance. The Cray CS-Storm system can deliver a full 300W to each GPU, enabling the accelerators to operate continuously at their highest level of performance.

Reliable, Efficient and Serviceable
- Multiple levels of redundancy and fault tolerance can be configured to meet uptime needs
- Worldwide Cray support and service options
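To make the per-node accelerator configuration concrete, the short sketch below enumerates the GPUs visible on a node and prints the properties most relevant to sizing work across them. Nothing here is CS-Storm specific; it uses only the standard CUDA runtime API, and the output format is an assumption for illustration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Enumerate the GPUs visible to this node (up to eight on a CS-Storm
// compute node) and report memory size and multiprocessor count.
int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA-capable devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s, %.1f GB memory, %d SMs, ECC %s\n",
               dev, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.multiProcessorCount,
               prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}
```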

Cray HPC Cluster Software Stack

HPC Programming Tools
Development & Performance Tools: Cray PE on CS, Intel Parallel Studio XE Cluster Edition, PGI Cluster Development Kit, GNU Toolchain, NVIDIA CUDA
Application Libraries: Cray LibSci, LibSci_ACC, Intel MPI, IBM Platform MPI, MVAPICH2, OpenMPI
Debuggers: Rogue Wave TotalView, Allinea DDT and MAP, Intel IDB, PGI PGDBG, GNU GDB

Schedulers, File Systems and Management
Resource Management / Job Scheduling: SLURM, Adaptive Computing Moab, Maui and Torque, Altair PBS Professional, IBM Platform LSF, Grid Engine
File Systems: Lustre, NFS, GPFS, Panasas PanFS, local (ext3, ext4, XFS)
Cluster Management: Cray Advanced Cluster Engine (ACE) management software

Operating Systems and Drivers
Operating Systems: Linux (Red Hat, CentOS)
Drivers & Network Management: OFED
Accelerator Software Stack & Drivers

Cray's HPC Cluster Software Stack
The CS-Storm accelerator-optimized cluster is available with a comprehensive HPC software stack, including tools that are customizable to work with most open-source and commercial compilers, schedulers and libraries, such as the optional Cray Programming Environment on Cluster Systems (Cray PE on CS), which includes the Cray Compiling Environment, Cray Scientific and Math Libraries, and Performance Measurement and Analysis Tools. Cray LibSci_ACC provides accelerated BLAS and LAPACK routines that generate and execute auto-tuned kernels on GPUs. Cray's Advanced Cluster Engine (ACE) management software is included in the HPC cluster software stack. ACE provides network, server, cluster and storage management capabilities for easy system administration and maintenance. The software stack works seamlessly with any open-source or commercial resource manager or job scheduler in an ACE-managed cluster under Linux.
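Because the stack pairs several MPI implementations with the CUDA toolkit, a common pattern on multi-GPU nodes is to run one MPI rank per GPU and have each rank bind to a local device. The sketch below illustrates that pattern using only standard MPI and CUDA runtime calls; it is not tied to any particular MPI library in the table above, and the round-robin binding is a simplifying assumption.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

// Minimal one-rank-per-GPU pattern: each MPI rank selects a local device
// by rank order. Launching one rank per GPU per node (e.g. eight ranks on
// an eight-GPU CS-Storm compute node) is assumed.
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);

    // Simple round-robin binding; production codes usually derive the
    // node-local rank from the launcher or MPI_Comm_split_type.
    int device = (ngpus > 0) ? rank % ngpus : 0;
    cudaSetDevice(device);

    printf("Rank %d of %d bound to GPU %d of %d\n", rank, size, device, ngpus);

    MPI_Finalize();
    return 0;
}
```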

Cray Advanced Cluster Engine (ACE) Management Software
Designed for all Cray CS Cluster Supercomputers

What is ACE?
Cray's Advanced Cluster Engine (ACE) management software is part of Cray's HPC cluster software stack. ACE eliminates the complexity of managing an HPC cluster while simplifying manageability for scale-out environments. ACE software includes command line (CLI) and graphical user interfaces (GUI), providing flexibility for the cluster administrator. The intuitive and easy-to-use ACE GUI connects directly to the ACE daemon on the management server and can be run from a remote system running Linux, Windows or Mac operating systems. The management modules include network, server, cluster and storage management.

ACE Software:
- Supports multiple network topologies and diskless configurations with optional local storage
- Provides network failover with high scalability when the network redundancy option is chosen
- Integrates easily with a customizable HPC development environment for industry-standard platforms and software configurations
- Manages heterogeneous nodes with different software stacks
- Monitors nodes, networks, power and temperature

Cray Advanced Cluster Engine (ACE): Hierarchical, Scalable Framework for Management, Monitoring and File Access

Management
- Hierarchical management infrastructure
- Divides the cluster into multiple logical partitions, each with a unique personality
- Revision system with rollback
- Remote management and remote power control
- GUI and CLI to view, change, control and monitor health; plug-in capability
- Automatic server/network discovery
- Scalable, fast, diskless booting
- High availability, redundancy, failover

Monitoring
- Cluster event data available in real time without affecting job performance
- Node and IB network status
- BIOS and HCA information
- Disk, memory and PCIe errors
- Temperatures, fan speeds
- Load averages
- Memory and swap usage
- Sub-rack and node power
- I/O status

File Access
- RootFS: high-speed, cached access to the root file system, allowing for scalable booting
- High-speed network access to external storage
- ACE-managed, high-availability NFS storage
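ACE's power and temperature monitoring is Cray-proprietary, so the sketch below does not show ACE itself. It only illustrates, with NVIDIA's NVML library, the kind of per-GPU power and temperature telemetry that management tools of this sort draw on; using NVML here is an assumption for illustration, and the program must be linked against the NVIDIA management library.

```cuda
#include <nvml.h>
#include <stdio.h>

// Poll per-GPU temperature and power draw via NVML. This is generic
// NVIDIA telemetry, not the ACE management interface.
int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML initialization failed\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        unsigned int tempC = 0, powerMw = 0;
        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &tempC);
        nvmlDeviceGetPowerUsage(dev, &powerMw);   // reported in milliwatts
        printf("GPU %u: %u C, %.1f W\n", i, tempC, powerMw / 1000.0);
    }

    nvmlShutdown();
    return 0;
}
```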

Cray CS-Storm Rack and System-Level Specifications

Architecture: Supported rack options and corresponding maximum number of server nodes: 24" rack, 42U and 48U options, 18 and 22 nodes, respectively; 19" rack, 42U and 48U options, 10 and 15 nodes, respectively
Processors and Accelerators (Per Node): Up to two Intel Xeon host processors (145W max per CPU); up to 8 NVIDIA Tesla K40 or K80 GPU computing accelerators
Memory: Up to 1,024 GB SDRAM per compute node; 12/24 GB onboard memory per K40/K80 GPU (96/192 GB onboard GPU memory per node)
Interconnect: QDR or FDR InfiniBand with Mellanox ConnectX-3/Connect-IB or Intel True Scale host channel adapters; options for single- or dual-rail fat tree
System Administration: Advanced Cluster Engine (ACE) with complete remote management capability; graphical and command line system administration; system software version rollback capability; redundant management servers with automatic failover; automatic discovery and status reporting of interconnect, server and storage hardware; cluster partitioning into multiple logical clusters, each capable of hosting a unique software stack; remote server control (power on/off, cycle) and remote server initialization (reset, reboot, shut down); scalable fast diskless booting for large-node-count systems and root file systems for diskless nodes; multiple global storage configurations
Resource Management and Job Scheduling: Options for SLURM, Altair PBS Professional, IBM Platform LSF, Adaptive Computing Torque, Maui and Moab, and Univa Grid Engine
Reliable, Available, Serviceable (RAS): Optional redundant networks for interconnect and management (InfiniBand, GbE and 10GbE) with failover
File System: Cray Sonexion, NFS, local FS (ext3, ext4, XFS), Lustre, GPFS and Panasas PanFS available as global file systems
Operating System: Red Hat or CentOS available on compute nodes; ACE management servers delivered with Red Hat Linux
Performance Monitoring Tools: Open-source packages such as HPCC, Perfctr, IOR, PAPI/IPM and netperf
Compilers, Libraries and Tools: Options for Open MPI, MVAPICH2, Intel MPI, MKL and IBM Platform MPI libraries; Cray Programming Environment on Cluster Systems (Cray PE on CS), PGI, Intel Parallel Studio, NVIDIA CUDA, OpenCL, DirectCompute toolkits, GNU, DDT, TotalView, OFED programming tools and many others
Rack Power: Up to 75 kW in a 48U rack, depending on configuration; optional 480 V power distribution with 200-277 VAC power supplies
Cooling Features: Air cooled; airflow: front intake, back exhaust; optional rear-door heat exchangers can cool up to 7,000 CFM per rack depending on configuration
Cabinet Dimensions: 19" or 24" racks in 42U and 48U heights
Cabinet Weight (Empty): Without optional rear-door heat exchanger: 24" 42U rack, 520 lbs. (hosts up to 18 compute nodes); 24" 48U rack, 550 lbs. (hosts up to 22 compute nodes); 19" 42U rack, 295 lbs. (hosts up to 10 compute nodes); 19" 48U rack, 329 lbs. (hosts up to 15 compute nodes)
Support and Services: Turnkey installation services with worldwide support and service options

Cray CS-Storm High-Density, Hybrid Rackmount Server

The Cray CS-Storm high-density rackmount server node is an industry-standard 2U, high-density, multi-GPU platform. Each server features up to eight GPU accelerators (up to 300W each), two Intel Xeon processors (up to 145W each), up to 1,024 GB of memory and up to six enterprise-grade SSD drives. Two principal variants of the server are offered: a compute node and a service node. Compute nodes maximize GPU density, with up to eight GPUs in a single node. Service nodes are designed with room for multiple expansion cards and are restricted to a maximum of two GPUs per node. Compute nodes and service nodes are available with Intel Xeon E5-2600 v3 host processors.

Cray CS-Storm 2826X8 Compute Node
This hybrid compute server offers up to eight NVIDIA Tesla accelerators and two Intel Xeon E5 family processors, delivering exceptional high-density computing, memory bandwidth and compute power efficiency in a small footprint for GPU-compute-intensive applications.

Model Number: 2626X8, 2826X8
Motherboard: 1 per chassis, 2 Xeon host processors in a 2-way SMP
Processor: Intel Xeon E5-2600 v3 family processors (up to 145W)
Memory Capacity: Up to 1,024 GB DDR4
Accelerators: Up to 8 NVIDIA Tesla GPU accelerators (K40 or K80); supports up to 300W parts
Drive Bays: Up to 6 enterprise-grade SSD drives
Expansion Slots: 1 riser card slot: x8 PCIe 3.0
Cooling: 6 fans (80 mm x 80 mm x 38 mm), providing ample airflow for optimal GPU operation
Power Supply: Dual 1,630W redundant power supplies; 80 PLUS Gold efficiency (optional N+1)
Power Input: 200-277 VAC, 10A max
Weight: Up to 93 lbs.
Dimensions: 3.38" H x 23.98" W x 32.33" D
Temperature: Operating: 10°C to 30°C; Storage: -40°C to 70°C
Server Management: Integrated BMC with IPMI 2.0 support

Cray CS-Storm 2826X2 Service Node
This hybrid service node offers up to two NVIDIA Tesla accelerators and two Intel Xeon E5 family processors per motherboard, along with flexible I/O options, making it well suited for management or login node functions in conjunction with 2826X8 hybrid compute servers.

Model Number: 2626X2, 2826X2
Motherboard: 1 per chassis, 2 Xeon host processors in a 2-way SMP
Processor: Intel Xeon E5-2600 v3 family processors (up to 145W)
Memory Capacity: Up to 1,024 GB DDR4
Accelerators: Up to 2 NVIDIA Tesla GPU accelerators (K40); supports up to 300W parts
Drive Bays: Up to 6 enterprise-grade SSD drives
Network Interface: Dual-port gigabit Ethernet (optional onboard single-port ConnectX-3 QDR/FDR InfiniBand with QSFP)
Input/Output: DB-15 VGA, 2 RJ-45 LAN ports, 1 stacked two-port USB 2.0 compliant connector
Expansion Slots: 1 riser card slot: x8 PCIe 3.0 (external); 1 riser card slot: x16 PCIe 3.0 (external); 1 riser card slot: x16 PCIe 3.0 (internal)
Cooling: 6 fans (80 mm x 80 mm x 38 mm), providing ample airflow for optimal GPU operation
Power Supply: Dual 1,630W redundant power supplies; 80 PLUS Gold efficiency (optional N+1)
Power Input: 200-277 VAC, 10A max
Weight: Up to 93 lbs.
Dimensions: 3.38" H x 23.98" W x 32.33" D
Temperature: Operating: 10°C to 30°C; Storage: -40°C to 70°C
Server Management: Integrated BMC with IPMI 2.0 support

Cray Inc., 901 Fifth Avenue, Suite 1000, Seattle, WA 98164 | Tel: 206.701.2000 | Fax: 206.701.2500 | www.cray.com
© 2014-2015 Cray Inc. All rights reserved. Specifications are subject to change without notice. Cray is a registered trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 20151029EMS