TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype Towards Exascale




TSUBAME-KFC: a Modern Liquid Submersion Cooling Prototype Towards Exascale
Toshio Endo, Akira Nukada, Satoshi Matsuoka
GSIC, Tokyo Institute of Technology (東京工業大学)

Performance/Watt is the Issue
The development of supercomputers is capped by the power budget: realistic supercomputers and data centers are limited to ~20MW. In order to achieve Exascale systems around 2020~2022, we will require technologies enabling 50GFlops/W.
[Chart: power efficiency of supercomputers over time, with the 1GFlops/W and 2GFlops/W levels marked; from Wu Feng's presentation at the Green500 SC13 BoF]

Achievement 4 Years Ago
The TSUBAME 2.0 supercomputer achieved ~1GFlops/W (1.2PFlops Linpack, 1.2MW): world's 3rd in the Nov 2010 Green500 ranking and the Greenest Production Supercomputer award. Towards TSUBAME3.0 (2016), we should be much more power efficient!!

How Do We Make IT Green?
Reducing computer power:
- Improvement of processors, process shrink
- Utilization of power-efficient many-core accelerators
- Software technologies that efficiently utilize accelerators
- Management technologies for power control
Reducing cooling power (today's focus): in TSUBAME2, the cooling system consumes >25% of the system's power.
- Liquid has higher heat capacity than air, so liquid cooling is preferable
- We should avoid making chilled water: warm/hot liquid cooling
- Designing water pipes/cold plates is expensive and controlling coolant speed is more difficult: fluid submersion cooling

TSUBAME-KFC: Ultra-Green Supercomputer Testbed
TSUBAME-KFC, or Kepler Fluid Cooling = (Hot Fluid Submersion Cooling + Outdoor Air Cooling + Highly Dense Accelerated Nodes) in a 20-foot container

TSUBAME-KFC: Ultra-Green Supercomputer Testbed
Compute nodes with the latest GPU accelerators: 40 x NEC/SMC 1U servers, each with 2 Ivy Bridge CPUs and 4 NVIDIA K20X GPUs; peak performance 217TFlops (DP), 645TFlops (SP).
Cooling loop: processors (40~80 deg. C) -> GRC oil-submersion rack (oil 35~45 deg. C) -> heat exchanger (water 25~35 deg. C) -> cooling tower, dissipating the heat to the outside air.
Container: 20-foot container (16m2).
Achievement: world's top power efficiency, 4.5GFlops/W; average PUE <1.1 (cooling power is ~10% of system power).
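
A quick sanity check of the quoted peak numbers, sketched in Python under the assumption of the usual per-chip peaks (1.31/3.93 TFlops DP/SP for a Tesla K20X, and AVX peaks for a 6-core 2.1GHz Xeon E5-2620 v2); these per-chip figures are datasheet values, not numbers taken from the slides:

# Back-of-the-envelope check of the TSUBAME-KFC system peak.
K20X_DP, K20X_SP = 1.31, 3.93            # TFlops per GPU (datasheet values)
CPU_DP = 2.1e9 * 6 * 8 / 1e12            # 2.1 GHz x 6 cores x 8 DP flops/cycle (AVX)
CPU_SP = CPU_DP * 2                      # twice as many SP flops per cycle

NODES, GPUS, CPUS = 40, 4, 2
dp = NODES * (GPUS * K20X_DP + CPUS * CPU_DP)
sp = NODES * (GPUS * K20X_SP + CPUS * CPU_SP)
print(f"Peak DP: {dp:.0f} TFlops, Peak SP: {sp:.0f} TFlops")   # ~218 and ~645 TFlops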

Installation Site of TSUBAME-KFC
In a space adjacent to GSIC on the O-okayama campus of Tokyo Institute of Technology; originally a bicycle parking lot.
[Site photo labels: KFC container & cooling tower, GSIC, chillers for TSUBAME2, many cherry trees nearby]

Coolant Oil Configuration
ExxonMobil SpectraSyn Polyalphaolefins (PAO):

  Grade                          4         6         8
  Kinematic viscosity @40C       19 cSt    31 cSt    48 cSt
  Specific gravity @15.6C        0.820     0.827     0.833
  Flash point (open cup)         220 C     246 C     260 C
  Pour point                     -66 C     -57 C     -48 C

We are using ~1,200 liters of oil. The flash point of the oil must be >250 C; otherwise it is a hazardous material under the Fire Defense Law in Japan (Fire Station at Den-en Chofu).
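
A trivial illustration of that selection rule in Python (the grade names and the dictionary exist only for this sketch; the flash points are the ones in the table above):

# Keep only grades whose open-cup flash point exceeds the 250 C threshold.
flash_points_c = {"SpectraSyn 4": 220, "SpectraSyn 6": 246, "SpectraSyn 8": 260}
usable = [grade for grade, fp in flash_points_c.items() if fp > 250]
print(usable)   # ['SpectraSyn 8'] -- the only grade above the hazardous-material threshold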

Installation
Installation was completed in Sep 2013.

40 KFC Compute Nodes
NEC LX 1U-4GPU Server 104Re-1G (Supermicro OEM):
- 2x Intel Xeon E5-2620 v2 processors (Ivy Bridge EP, 2.1GHz, 6 cores)
- 4x NVIDIA Tesla K20X GPUs
- 1x Mellanox FDR InfiniBand HCA
- 1.1TB SATA SSD (120+480+480)
- CentOS 6.4 64bit Linux, Intel Compiler, GCC, CUDA 5.5, OpenMPI 1.7.2

Modifications to Compute Nodes
(1) Replace the thermal grease with thermal sheets
(2) Remove the 12 cooling fans
(3) Update the firmware of the power unit so it operates with the cooling fans stopped

Power Measurement
In TSUBAME-KFC we record the power consumption of each compute node and each network switch at one sample per second. Measurement chain: AKW4801C sensors on the PDUs feeding the servers and switches -> Panasonic KW2G Eco-Power Meter -> RS485 -> Panasonic AKL1000 Data Logger Light.
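
A minimal sketch of such a 1Hz logging loop in Python; read_power_watts() is a hypothetical placeholder for whatever actually queries the meter (the real system reads the KW2G over RS485, whose protocol is not shown here), and the CSV layout is an assumption of this sketch:

import csv, time

def read_power_watts(channel):
    # Hypothetical placeholder: in the real setup this would query the
    # KW2G Eco-Power Meter over RS485 for the given measurement channel.
    raise NotImplementedError

def log_power(channels, path="power_log.csv"):
    # Append one row per second: timestamp followed by one reading per channel.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            row = [time.time()] + [read_power_watts(ch) for ch in channels]
            writer.writerow(row)
            f.flush()
            time.sleep(1.0)   # one sample per second, as on TSUBAME-KFC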

Node Temperature and Power
Chip temperatures are fetched via IPMI (a scripting sketch follows the table). Main values: running DGEMM on the GPUs; values in parentheses: idle.

  Coolant        Air 26 deg. C    Oil 28 deg. C    Oil 19 deg. C
  CPU0           50 (43)          40 (36)          31 (29)
  CPU1           46 (39)          42 (36)          33 (28)
  GPU0           52 (33)          47 (29)          42 (20)
  GPU1           59 (35)          46 (27)          43 (18)
  GPU2           57 (48)          40 (27)          33 (18)
  GPU3           48 (30)          49 (30)          42 (18)
  Node power     749W (228W)      693W (160W)      691W (160W)

Oil at 28 deg. C cools the chips better than air at 26 deg. C, giving a ~8% power reduction. Lower oil temperature results in lower chip temperatures, but no further power reduction is achieved.
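
The slide only says that IPMI is used to fetch the temperature data; a minimal sketch of how that could be scripted with the standard ipmitool CLI is shown below (the "degrees C" unit filter matches ipmitool's pipe-separated sensor listing, but the exact sensor names depend on the board's BMC):

import subprocess

def read_temperatures():
    # Query the local BMC's sensor repository; "ipmitool sensor" prints one
    # pipe-separated line per sensor (name | value | unit | status | ...).
    out = subprocess.run(["ipmitool", "sensor"], capture_output=True, text=True, check=True)
    temps = {}
    for line in out.stdout.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Keep only temperature sensors with a numeric reading; the sensor names
        # (e.g. "CPU1 Temp", "GPU1 Temp") vary between BMC vendors.
        if len(fields) >= 3 and fields[2] == "degrees C" and fields[1] not in ("na", ""):
            temps[fields[0]] = float(fields[1])
    return temps

if __name__ == "__main__":
    for name, value in read_temperatures().items():
        print(f"{name}: {value:.1f} C")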

PUE (Power Usage Effectiveness)
PUE = total power / power for the computer system. With air cooling, PUE = 1.3. In TSUBAME-KFC the cooling power is: oil pump 0.46 kW, water pump 0.75 kW, cooling tower fan 1.48 kW; cooling total 2.69 kW. Current PUE = 1.09!!
[Chart: power breakdown (kW, 0~40) of air cooling vs. TSUBAME-KFC: compute nodes, network, air conditioner, oil pump, water pump, cooling tower fan]
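
The 1.09 figure follows directly from the definition above; a tiny Python sketch (the ~30 kW IT load is an assumption consistent with the Linpack power profile shown later, not a number stated on this slide):

# PUE = total facility power / IT equipment power
it_power_kw = 30.0                   # assumed IT load (compute nodes + network)
cooling_kw = 0.46 + 0.75 + 1.48      # oil pump + water pump + cooling tower fan = 2.69 kW
pue = (it_power_kw + cooling_kw) / it_power_kw
print(f"PUE = {pue:.2f}")            # ~1.09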

Green500 Submission
The Green500 ranking is determined by Linpack performance (Flops) / power consumption (Watts). Linpack is the dense-matrix benchmark used in the Top500. Under the current rule, cooling power is NOT included.
We made many LINPACK runs with different parameters, including GPU clock (GHz) and voltage, and compared the fastest run against the greenest run.
[Chart: power efficiency (GFLOPS/Watt, 3.5~5) vs. performance (100~180 TFLOPS), marking the fastest run and the greenest run]
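
Choosing the submission among many runs is simply an argmax over the measured efficiency; a small Python sketch with made-up (TFlops, kW) pairs used purely for illustration:

# Each entry is one LINPACK run: (performance in TFlops, average power in kW).
# These numbers are illustrative, not the actual TSUBAME-KFC measurements.
runs = [(160.0, 40.0), (140.0, 33.0), (125.0, 27.8)]

fastest = max(runs, key=lambda r: r[0])
greenest = max(runs, key=lambda r: r[0] / r[1])              # TFlops/kW == GFlops/W
print("fastest :", fastest)
print("greenest:", greenest, f"-> {greenest[0] / greenest[1]:.2f} GFlops/W")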

Optimizations for Higher Flops/W
Lower speed leads to higher efficiency:
- Tuning of HPL parameters, especially the block size (NB) and the process grid (P x Q); see the illustrative HPL.dat excerpt after this list
- Adjusting GPU clock and voltage: the lowest voltage and the highest clock
Advantages of the hardware configuration:
- GPU:CPU ratio = 2:1
- Low-power Ivy Bridge CPUs (this also lowers the performance)
- Cooling system: no cooling fans, low temperature
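
For reference, NB and the process grid are set in HPL's input file, HPL.dat; the excerpt below is purely illustrative (N, NB, P and Q are placeholders, chosen only so that P x Q = 160 matches one MPI rank per GPU, and are not the parameters of the actual submission):

262144       Ns                                   (problem size N; placeholder)
1            # of NBs
1024         NBs                                  (block size, a key tuning knob; placeholder)
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
8            Ps                                   (placeholder grid: 8 x 20 = 160 ranks)
20           Qs

HPL only parses the leading numbers on each line, so the trailing description text is ignored.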

Power Profile during the Linpack Benchmark in Nov 13
[Power profile chart, power consumption (kW) vs. time: core phase avg. 31.18 kW; middle 80% avg. 32.10 kW; 1 min. avg. 27.78 kW]
Power efficiency as defined by the Green500 rule: 125.1 TFlops / 27.78 kW = 4.503 GFlops/W.
When cooling power is counted: 125.1 TFlops / (31.18 kW + 2.7 kW) = 3.69 GFlops/W.
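
A minimal Python sketch of how such figures can be reproduced from a 1 Hz power log like the one sketched earlier (the CSV layout and the averaging window are assumptions of this sketch; only the constants 125.1 TFlops, 27.78 kW, 31.18 kW and 2.7 kW come from the slide):

import csv

def average_kw(path, t_start, t_end):
    # Average the total power (sum of all channels) over [t_start, t_end)
    # from a CSV log whose rows are: timestamp, watts_ch0, watts_ch1, ...
    samples = []
    with open(path) as f:
        for row in csv.reader(f):
            t = float(row[0])
            if t_start <= t < t_end:
                samples.append(sum(float(x) for x in row[1:]) / 1000.0)   # W -> kW
    return sum(samples) / len(samples)

rmax_tflops = 125.1
avg_window_kw = 27.78          # from the slide; average_kw() would compute this from the log
core_kw, cooling_kw = 31.18, 2.7
print(f"Green500 metric: {rmax_tflops / avg_window_kw:.3f} GFlops/W")            # ~4.503
print(f"Incl. cooling  : {rmax_tflops / (core_kw + cooling_kw):.2f} GFlops/W")   # ~3.69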

The Green500 List Nov 2013

Green Graph500 List, Nov 2013
A ranking of power efficiency on the big-data (graph) benchmark; it measures power efficiency using the TEPS/W ratio (TEPS: Traversed Edges Per Second). http://green.graph500.org
KFC got the double crown in Nov 13!

KFC in the Green500
  Nov 13: No.1, 4.504 GF/W
  Jun 14: No.1, 4.389 GF/W
  Nov 14: No.3, 4.447 GF/W
Latest winners, announced at SC14: No.1 L-CSC (GSI Helmholtz Centre), No.2 Suiren (PEZY-SC, KEK).

How about Maintenance Cost?
After operation started in Sep 13, we added two SSDs to each node in Mar 14 to enhance big-data experiments. We needed to pull each node up out of the oil!

Details of SSD Installation Time
The procedures directly related to oil submersion take 4m40s (~28% of the total time).

Summary
TSUBAME-KFC, an ultra-green supercomputer testbed, has been installed, using fluid submersion cooling to improve power efficiency. Further development is required towards 50GF/W:
- TSUBAME1.0 (2006): ~0.05GF/W
- TSUBAME2.0 (2010): 1GF/W
- TSUBAME-KFC (2013): 4.5GF/W (3.7GF/W including cooling)