A-CLASS
The rack-level supercomputer platform with hot-water cooling
INTRODUCTORY PRESENTATION
JUNE 2014, Rev 1, ENG
COMPUTE PRODUCT SEGMENTATION

3rd-party board
T-MINI P (PRODUCTION): minicluster/WS systems
- 4-8 DP compute trays, 1 fixed head node
- INTEL/NVIDIA
- Integrated storage and networking
- Workgroup
T-Platforms V210 system board and common management intrinsics (shared)
E-CLASS (H2 2014): 2U general server / storage server
- 1 DP compute node, V210 board (INTEL)
- Hot-swap disks
- Shares V-Class management
V-CLASS (PRODUCTION): 5U air-cooled chassis
- 10 DP compute modules or 5 accelerated DP modules
- V2x0/V205/402 boards: INTEL/AMD/NVIDIA/ELBRUS
- Centralized management
- Workgroup, departmental and divisional
A-CLASS (2014): ~52U cabinet
- Hot-water cooling
- Integrated PSU blades and integrated blade switches
- 1P INTEL + NVIDIA GPU at launch
- Rack-level integration
- Supercomputer

FAMILY              ABBREVIATION   SEGMENT
A-CLASS             ADVANCED       High-end HPC
V-CLASS             VOLUME         Middle-end HPC / high-end cloud / SMB
E-CLASS             ENTERPRISE     Storage or general server
T-MINI P (P-CLASS)  PERSONAL       Scalable all-in-one HPC/WS systems
INTRODUCING A-CLASS
- Developed for multi-petaflops supercomputers
- Peak performance of 420 Tflops per system*
- Scales up to 128 systems (over 54 Pflops)
- Direct hot-water cooling technology
- Energy efficiency of 3,400 Mflops/W**
- Modular architecture supports various future compute nodes
- A new level of reliability and availability
- Thermally isolated system cabinet
- Early availability: Q2 2014
* Based on 1-way Xeon E5-2680 v2 with a single NVIDIA Tesla K40
** Peak value
POTENTIAL APPLICATION SEGMENTS
- Aerospace
- Machine building
- Shipbuilding
- Transportation and automotive
- Oil, gas and utilities
- Data security
- Pharmacology and drug design
- Finance
- Semiconductor design
- Human sciences
- Climate research
THE A-CLASS OVERVIEW
RACK ENCLOSURE
- Custom rack-level enclosure
- Integrated high-speed signal and power backplane
- Dimensions: 1500 mm wide, 800 mm deep, 2400 mm high (~52U)
- Hybrid cooling system:
  - Direct hot water for management, compute and switch modules
    - Inlet temperature up to 45°C
    - Outlet temperature over 50°C at a 45°C inlet
    - Water flow rate up to 10.5 litres/s
  - Air-cooled power supplies
    - Integrated heat exchanger
    - Closed air circulation inside the enclosure
    - Low operational noise
RACK ENCLOSURE (2)
- Provides up to 194 kW of total power
  - N+1 power redundancy for each compute section
  - 380 VAC 3-phase input
  - 48 VDC bus with metering function
  - 8 groups x 12 PSUs (3 kW PSU model with up to 97.2% efficiency)
- Main frontal sections:
  - 2 bays for independent management modules (with switches)
  - 8 bays for compute modules (with switches)
  - 8 bays for PSU groups (up to 12 PSUs each)
- All frontal sections support hot swap
- The enclosure comes with semitransparent French doors at the front and back for thermal isolation
3 TYPES OF SECTIONS
- MANAGEMENT SECTION (x2)
- COMPUTE SECTION WITH SWITCHES (x8)
- POWER SECTION (x8)
MANAGEMENT SECTION
- 2 identical, independent management sections
- Each hot-swap module includes:
  - 1P head node, Intel Xeon E5-2600 v2
    - Network port configuration differs from the compute nodes: 2 x 10GbE ports and 1 x QSFP InfiniBand port
    - 2 x 256 GB SSD
  - Top-layer Ethernet switch for the system management network
  - Top-layer FDR InfiniBand switch for the data network
- The 2 identical modules act as a failover cluster (see the diagram on the "Simplified networks topology" slide)
- Management modules are hot-water cooled
COMPUTE AND POWER SECTIONS
- 12 hot-swap modules per section
[Figure labels: module with four compute nodes; FDR IB module B; FDR IB + ETH module A; 3 kW PSUs]
HOT-SWAP MODULES
- 4 types of hot-swap modules:
  - Management (server + 2 switches)
  - Computational (CM): 4 compute nodes mounted in pairs on both sides of the water plate
  - Communication, type A: one IB data network switch and two ETH switches for the management networks
  - Communication, type B: IB system interconnect switch
- Single compute node features*:
  - 1 x Intel E5-2600 v2, TDP 135 W+
  - Up to 32 GB of DDR3-1866 Reg. ECC (4 modules)
  - Optional 256 GB SSD
  - 2 internal GbE ports
  - 2 internal InfiniBand FDR 56 Gb/s ports
  - 1 x NVIDIA Tesla K40 (SXM), TDP 235 W
* A Haswell-based node is planned for H2 2014
[Figure labels: extraction handles; spill-proof inlet and outlet water connectors; compute node system board; CAD model of compute module]
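As a rough, assumption-based sanity check (not a vendor-published A-Class figure), the ~420 Tflops headline can be reproduced from the double-precision peaks of the parts listed above and the 256 CPU+GPU pairs per system cited in the scalability slide's footnote:

```python
# Rough per-node and per-system peak estimate (assumption-based sanity check,
# not vendor-published A-Class figures).
CPU_PEAK_GFLOPS = 10 * 2.8 * 8   # E5-2680 v2: 10 cores x 2.8 GHz x 8 DP flops/cycle ~= 224 Gflops
GPU_PEAK_GFLOPS = 1430           # Tesla K40 double-precision peak at base clock (assumed)
NODES_PER_SYSTEM = 256           # CPU+GPU pairs per system, from the scalability slide footnote

node_peak_gflops = CPU_PEAK_GFLOPS + GPU_PEAK_GFLOPS            # ~1,654 Gflops per node
system_peak_tflops = node_peak_gflops * NODES_PER_SYSTEM / 1000
print(f"~{system_peak_tflops:.0f} Tflops per system")           # ~423 Tflops, close to the ~420 Tflops headline
```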
SIMPLIFIED NETWORKS TOPOLOGY
- 2 independent InfiniBand networks:
  - Two-layer data network
  - MPI network with support for various topologies
- 2 independent management networks:
  - 2 x two-layer Gigabit / 10-Gigabit Ethernet networks
MANAGEMENT NETWORKS
- 2 independent two-layer Gigabit / 10-Gigabit Ethernet networks
- Two Ethernet bottom-layer switches per compute section (1 switch per network)
- One top-layer Ethernet switch per management section
- Every top-layer switch is connected to the bottom-layer switches
- Every top-layer switch has two 10-Gigabit Ethernet uplink ports to connect to the external Ethernet network
- A total of 1 top-layer and 8 bottom-layer switches in each network
- A total of four 10-Gigabit Ethernet uplink ports
MANAGEMENT NETWORKS TOPOLOGY
DATA NETWORK (IB FABRIC 1)
- Two-layer topology
- Every compute section has one InfiniBand bottom-layer switch
- Each management section has one InfiniBand top-layer switch
- Every bottom-layer switch is connected to both top-layer switches
- Every top-layer switch has 18 FDR InfiniBand external ports to connect to the external network
- A total of 2 top-layer switches and 8 bottom-layer switches
- A total of 36 external FDR InfiniBand data network ports
MPI TRAFFIC NETWORK (IB FABRIC 2)
- A flexible-topology network with n-torus support
- Every compute section has 4 MPI-network InfiniBand switches
- Every FDR InfiniBand switch has 28 external ports to connect the system to a larger external InfiniBand network or to form a self-contained MPI network within one A-Class system
- A total of 32 MPI traffic switches per system
- A total of 896 FDR InfiniBand MPI traffic ports
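The switch and port totals quoted on the three network slides above follow directly from the per-section counts; a minimal sketch that simply re-derives them (all counts taken from the slides, nothing assumed beyond them):

```python
# Re-derive the switch and external-port totals from the per-section counts
# given on the management, data and MPI network slides.
COMPUTE_SECTIONS = 8
MGMT_SECTIONS = 2

# Management: 2 independent Ethernet networks, 1 bottom-layer switch per compute
# section per network, 1 top-layer switch per management section, 2 x 10GbE uplinks each.
mgmt_top = MGMT_SECTIONS                 # 2 top-layer switches (1 per network)
mgmt_bottom = COMPUTE_SECTIONS * 2       # 16 bottom-layer switches (8 per network)
mgmt_uplinks = mgmt_top * 2              # 4 external 10GbE ports

# Data network (IB fabric 1): 1 bottom-layer switch per compute section,
# 1 top-layer switch per management section, 18 external FDR ports per top switch.
data_top, data_bottom = MGMT_SECTIONS, COMPUTE_SECTIONS   # 2 and 8
data_external = data_top * 18                             # 36 external FDR ports

# MPI network (IB fabric 2): 4 switches per compute section, 28 external ports each.
mpi_switches = COMPUTE_SECTIONS * 4      # 32 switches
mpi_external = mpi_switches * 28         # 896 external FDR ports

print(mgmt_uplinks, data_external, mpi_switches, mpi_external)  # 4 36 32 896
```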
INTERCONNECT FABRICS TOPOLOGY
EXTERNAL IB FABRIC TOPOLOGIES
- External InfiniBand networks support various topologies, including 3D- and 4D-Torus, Dragonfly and Flattened Butterfly
- Ports are configured inside the rack chassis in accordance with the desired topology
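As a generic illustration of the torus topologies named above (not A-Class-specific code), each node in a k-dimensional torus links to two neighbours per dimension, with wrap-around links at the edges:

```python
# Generic illustration of kD-torus neighbours (not A-Class-specific):
# each node has 2 links per dimension, wrapping around at the edges.
def torus_neighbours(coord, dims):
    """coord: node coordinates, dims: torus size in each dimension."""
    result = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]   # wrap-around link
            result.append(tuple(n))
    return result

# Example: a node in a 4x4x4 3D-torus and its 6 neighbours.
print(torus_neighbours((0, 0, 3), (4, 4, 4)))
```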
SYSTEM SOFTWARE
- ClustrX HPC Pack software:
  - Cluster management module
  - User management module
  - Various resource managers and monitoring systems, depending on preference
  - Hardware management tools
  - ClustrX Safe: automated equipment shutdown in case of abnormalities (AESS)
- Various functionality:
  - Support for distributed service nodes
  - Virtual machine support
  - Multiple running OSes within one cluster
  - Local or diskless node boot via Ethernet, InfiniBand and iSCSI
  - Support for various file systems and databases
  - Customized GUI dashboard to present the system's operational status data
COOLING INFRASTRUCTURE
THE A-CLASS ADVANTAGE
A-CLASS ADVANTAGE
- ENERGY EFFICIENCY
- COMPUTE DENSITY
- SCALABILITY
- RELIABILITY
ENERGY EFFICIENCY
- Benefits of direct hot-water cooling:
  - High peak energy efficiency of 3,400 Mflops per Watt*
  - Year-round free cooling at outdoor temperatures up to +35°C
  - Savings from eliminating compressors and moving from air conditioners to dry air-cooled heat exchangers
  - Hot-water heat can be reused to heat buildings in winter
  - Lower operational noise (TBA)
- Thermally isolated rack cabinet: the air inside the A-Class does not mix with the ambient datacentre air
* Theoretical peak value. Real-life efficiency is to be measured at the plug
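A rough, assumption-based cross-check of the headline number (an inference, not a published figure): at the quoted 3,400 Mflops/W, the ~420 Tflops peak would correspond to roughly 120-125 kW of draw, well inside the rack's 194 kW power envelope.

```python
# Assumption-based cross-check of the efficiency headline (not a published figure).
PEAK_TFLOPS = 420
EFFICIENCY_MFLOPS_PER_W = 3400

implied_draw_kw = PEAK_TFLOPS * 1e6 / EFFICIENCY_MFLOPS_PER_W / 1e3
print(f"Implied peak draw: ~{implied_draw_kw:.0f} kW (rack envelope: 194 kW)")  # ~124 kW
```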
COMPUTATIONAL DENSITY
- The A-Class design packs 12.3 kW of power per square meter, 2.8 times higher than the industry average
- 420 Tflops per rack, including switches*
- High computational density factors:
  - Custom modular rack
  - High level of integration: management, compute, communication, cooling and power components
  - Unique water radiator design to cool up to 2500 W
  - Custom form-factor boards
* 350 Tflops/m²
[Figure: thermal simulation of an A-Class compute module]
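The footnoted 350 Tflops/m² follows from the enclosure footprint given on the rack enclosure slide (1500 mm x 800 mm), assuming the bare rack footprint is the reference area; a one-line check:

```python
# Areal density check from the enclosure footprint on the rack slide.
footprint_m2 = 1.5 * 0.8                              # 1500 mm x 800 mm = 1.2 m^2
print(f"{420 / footprint_m2:.0f} Tflops per m^2")     # 350 Tflops per m^2
```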
SCALABILITY
- Up to 128 systems with 32K nodes for 54 Pflops of peak performance*
- Balanced ratio of CPU and GPU performance to available IB interface throughput: 3.3 GB/s per Tflops
- The system is quickly expanded by adding new racks
- Networks:
  - Full bandwidth with low diameter for 32K nodes
  - Separation of data and MPI traffic
  - Dragonfly and Flattened Butterfly are supported, as well as classic kD-torus topologies
  - Integrated switches improve the cabling infrastructure
- Management and monitoring system with automation scripts to accelerate larger deployments
* Peak performance of 128 systems, each with 256 Intel Xeon E5-2680 v2 CPUs and 256 NVIDIA Tesla K40 GPUs
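The 32K-node and 54 Pflops figures follow directly from the per-system numbers quoted elsewhere in the deck; a minimal re-derivation:

```python
# Scaling check from the per-system figures quoted in this deck.
SYSTEMS = 128
NODES_PER_SYSTEM = 256        # 256 CPU+GPU pairs per system (footnote above)
PEAK_TFLOPS_PER_SYSTEM = 420

print(SYSTEMS * NODES_PER_SYSTEM)                       # 32768 nodes (~32K)
print(SYSTEMS * PEAK_TFLOPS_PER_SYSTEM / 1000)          # 53.76 Pflops (~54 Pflops peak)
```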
RELIABILITY
- Hardware features:
  - Two independent hot-swap management modules with dedicated management networks
  - Redundant input power and independent groups of power supplies with N+1 redundancy inside each group
  - Local scratch disks
  - Temperature, current, pressure and other sensors
  - Monitoring at the chassis, section and node levels
  - Leakage sensors automatically block the water supply
  - The network subsystem is routed away from the power and cooling subsystems
- Software features:
  - Specialized ClustrX.HPC Pack module for the A-Class hardware features
  - ClustrX.Safe software for Automated Equipment Shutdown system support
SUMMARY
- A-Class is the most advanced system ever designed by T-Platforms
  - Developed from scratch
  - Features high-level integration of custom components
- The system will be offered to the largest international HPC datacenters from June 23, 2014
- For access to a demo system, please direct your inquiries to sales@t-platforms.ru
- Check out www.t-platforms.com/a-class for additional system information
THANK YOU!
www.t-platforms.com/a-class
sales@t-platforms.ru
ADDITIONAL PHOTOS
[Photos: front view; rear view (w/o cables)]
ADDITIONAL PHOTOS