EnA-HPC Sept. 2013. Jean-Pierre Panziera Chief Technology Director

Similar documents
Defying the Laws of Physics in/with HPC. Rafa Grimán HPC Architect

Jean-Pierre Panziera Teratec 2011

Building an energy dashboard. Energy measurement and visualization in current HPC systems

HUAWEI Tecal E6000 Blade Server

Direct liquid cooling by bull. Laurent Cargemel Server Design & Development June 27th, 2012

PRIMERGY server-based High Performance Computing solutions

Data Sheet FUJITSU Server PRIMERGY CX400 M1 Multi-Node Server Enclosure

HUAWEI TECHNOLOGIES CO., LTD. HUAWEI FusionServer X6800 Data Center Server

System Management Software Suite

Fujitsu PRIMERGY Servers Portfolio

A-CLASS The rack-level supercomputer platform with hot-water cooling

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Extreme Computing: The Bull Way

Sun in HPC. Update for IDC HPC User Forum Tucson, AZ, Sept 2008

HPC Update: Engagement Model

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

FUJITSU Enterprise Product & Solution Facts

Direct Contact Liquid Cooling (DCLC ) PRODUCT GUIDE. Industry leading data center solutions for HPC, Cloud and Enterprise markets.

SRNWP Workshop. HP Solutions and Activities in Climate & Weather Research. Michael Riedmann European Performance Center

FLOW-3D Performance Benchmark and Profiling. September 2012

HPC Growing Pains. Lessons learned from building a Top500 supercomputer

Sun Constellation System: The Open Petascale Computing Architecture

SGI High Performance Computing

Holistic, Energy Efficient Datacentre Cardiff Going Green Can Save Money

HP Cloudline Overview

MEGWARE HPC Cluster am LRZ eine mehr als 12-jährige Zusammenarbeit. Prof. Dieter Kranzlmüller (LRZ)

Stovepipes to Clouds. Rick Reid Principal Engineer SGI Federal by SGI Federal. Published by The Aerospace Corporation with permission.

Minimization of Costs and Energy Consumption in a Data Center by a Workload-based Capacity Management

Supermicro Server Management Utilities

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

A Smart Investment for Flexible, Modular and Scalable Blade Architecture Designed for High-Performance Computing.

Data Sheet FUJITSU Server PRIMERGY CX420 S1 Out-of-the-box Dual Node Cluster Server

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009

NEC Micro Modular Server Introduction. NEC November 18,2014

Data Center 2020: Delivering high density in the Data Center; efficiently and reliably

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

The Future of Computing Cisco Unified Computing System. Markus Kunstmann Channels Systems Engineer

Data Sheet Fujitsu PRIMERGY CX122 S1 Cloud server unit for PRIMERGY CX1000

I don t want to move to the Arctic - a HPC Data Center Take on PUE and Energy Efficiency

Designed for Maximum Accelerator Performance

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

Pedraforca: ARM + GPU prototype

Intel Cloud Builders Guide for Power Management in Cloud Design and Deployment using Supermicro Platforms and NMView management software

Data Sheet FUJITSU Server PRIMERGY CX272 S1 Dual socket server node for PRIMERGY CX420 cluster server

IBM System x SAP HANA

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

Petascale Software Challenges. Piyush Chaudhary High Performance Computing

Fujitsu PRIMERGY Servers Portfolio

Tulevaisuuden datakeskukset Ympäristö- ja energianäkökulmia

Energy efficient computing on Embedded and Mobile devices. Nikola Rajovic, Nikola Puzovic, Lluis Vilanova, Carlos Villavieja, Alex Ramirez

DataCenter Data Center Management and Efficiency at Its Best. OpenFlow/SDN in Data Centers for Energy Conservation.

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U

LS DYNA Performance Benchmarks and Profiling. January 2009

Data Sheet FUJITSU Server PRIMERGY CX400 S2 Multi-Node Server Enclosure

Designed for Maximum Accelerator Performance

Emerging Technology for the Next Decade

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Scaling from Workstation to Cluster for Compute-Intensive Applications

Parallel Programming Survey

Introduction to IBM tools to manage energy consumption

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER

Intel Xeon +FPGA Platform for the Data Center

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Desktop Consolidation. Stéphane Verdy, CTO Devon IT

SAS Business Analytics. Base SAS for SAS 9.2

International Journal of Computer & Organization Trends Volume20 Number1 May 2015

Headline in Arial Bold 30pt. The Need For Speed. Rick Reid Principal Engineer SGI

Intel Cloud Builders Guide to Power Management in Cloud Design and Deployment using Supermicro* Platforms and NMView* Management Software

Oil Submersion Cooling for Todayʼs Data Centers. An analysis of the technology and its business implications

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

Green HPC - Dynamic Power Management in HPC

Smarter Cluster Supercomputing from the Supercomputer Experts

APACHE HADOOP PLATFORM HARDWARE INFRASTRUCTURE SOLUTIONS

IBM System x family brochure

præsentation oktober 2011

CREATING & MANAGING A DYNAMIC INFRASTRUCTURE

Server Platform Optimized for Data Centers

FOR IMMEDIATE RELEASE

Bytes and BTUs: Holistic Approaches to Data Center Energy Efficiency. Steve Hammond NREL

WHITE PAPER SGI ICE X. Ultimate Flexibility for the World s Fastest Supercomputer

Open: Broader and Deeper

Data Sheet FUJITSU Server PRIMERGY TX100 S3p Tower Server

THE WORLD S MOST EFFICIENT DATA CENTER

When Does Colocation Become Competitive With The Public Cloud? WHITE PAPER SEPTEMBER 2014

Resource Efficient Computing for Warehouse-scale Datacenters

Data Center Equipment Power Trends

When Does Colocation Become Competitive With The Public Cloud?

SeaMicro SM Server

SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V

Power and Energy aware job scheduling techniques

Microsoft Private Cloud Fast Track Reference Architecture

How High Temperature Data Centers & Intel Technologies save Energy, Money, Water and Greenhouse Gas Emissions

Data center modeling, and energy efficient server management

High Performance Computing in CST STUDIO SUITE

Exar. Optimizing Hadoop Is Bigger Better?? March Exar Corporation Kato Road Fremont, CA

Transcription:

EnA-HPC Sept. 2013 Jean-Pierre Panziera Chief Technology Director 1

Bull: from Supercomputers to Cloud Computing Expertise & services Software HPC Systems Architecture Applications & Performance Energy Efficiency Data Management HPC Cloud Open, scalable, reliable SW Development Environment Linux, OpenMPI, Lustre, Slurm Administration & monitoring bullx supercomputer suite Servers Full range development from ASICs to boards, blades, racks Support for accelerators Infrastructure Data Center design Mobile Data Center Water-Cooling 2

Leading HPC technology with Bull TERA100 2010 1 st European PetaFlop-scale System Rank #6 CURIE 2011 1 st PRACE PetaFlop-scale System Rank #9 BEAUFIX 2013 1 st Intel Xeon E5-2600 v2 System Direct Liquid Cooling Technology 3

Energy (Electricity): a significant part of HPC budget 4

1998 2000 2002 2004 2006 2008 2010 /kwh Industrial Electricity Prices in Europe Electricity prices highly variable across Europe Avg 0.11 /kwh 0,12 0,10 0,08 Electricity Industrial Prices in UE http://epp.eurostat.ec.europa.eu/portal/page/portal/energy/data/main_tables# 0,06 0,04 0,02 0,00 Allemagne Espagne France Italie Pays-Bas Pologne Royaume-Uni Norvège Electricity prices Rising steadily CAGR 12% Source Eurostat year 2010 5

Power to the datacenter MEGAWATTS 6

Cooling & Power Usage Effectiveness (PUE) 2.0 1.5 1.0 A / C Prototype tested not developped for chassis test Air-cooled Water-cooled door Direct-Liquid-cooling We designed a flexible cold plate to cool the two CPUs and the IOH 20 kw/rack 30 kw/rack 70 kw/rack We choose an adapted technology with manufacturer : - all copper - same fixations as for heatsink (so the tube avoids the IOH fixation) - only half tube exchange heat - inner diameter : 4 mm Easy to cool CPU with hot water : - 45 C water blade inlet => Tcase <70 C Co-generation 7

Direct Liquid Cooling Infrastructure With hot water cooled servers, water chillers are not required anymore hot air Water chiller Cooling Tower 42U rack bullx DLC x 90 CN bullx DLC cold air 45 C x N up to 35 C 8

Fresh Air for (almost) free-cooling When system density does not matter Plenty of cool air available 8760 hours/year 8000h/y is > 90% of time Let s have another look at air-cooling 9

Where do all these Watts go? DDR3 (x8) Connector to backplane HDD/SSD 2x Xeon-EP Fans 10

Node consumption varies with workload Wrt Linpack (max -> 100%) Memory streaming 75% Irregular memory access 55% Iddle 25% Using turbo is never energy efficient 11

turbo CPU frequency vs System energy consumption Watts Watts 3 WattHours = = $ 2 3 1 2 1 1 2 3 frequency Elapsed Time Elapsed Time The CPU frequency is the HPC system throttle The faster the CPU, the more power The slower the CPU, the less power Minimal energy consumption is achieved for intermediate (<nominal) frequency 12

TCO/Hour Total Cost of Ownership (TCO): the CFO view Extra (turbo run) power Turbo run power Greener (slower run) power Which costs less? Fixed costs Fixed costs (CAPEX, building, staff ) (CAPEX, building, staff ) Elapsed Time Energy (electricity cost) is only a portion (25-30%) of the TCO When taking into account fixed expenses slower runs are more expensive Greener might not mean Cheaper TCO 13

Global optimization for Parallel HPC Applications Adding many more parameters to the equation: HPC applications == highly parallel Interference with other jobs on system creates variability Job placement Interconnect Storage Good load balancing is hard to achieve Everyone waiting for the slower thread, let s speed it up. non uniform parameters for different tasks/nodes. Complex optimization requires detailed understand / precise measurements 14

Power Management Accounting Users billed separately for CPU, IO, and Energy Keep compute center electricity bill within budget Control power Avoid running over capacity Allow for priority jobs Adjust power consumption with electricity cost Energy consumption / cost optimization Fine & precise power monitoring Power data analysis Control all system resources power enter software Application Management Development Environment bullx DE Execution Environment bullx BM (Batch Management) bullx MPI (Message Passing Interface) Supercomputer Management Management Center bullx MC Software Manager 1 SIR Monitoring & Control Manager Infrastructure Manager Data Management Parallel File System bullx PFS (Lustre) Network File System NFS Local File System Operating System bullx Linux - Red Hat Entreprise Linux (RHEL) - SUSE Linux Entreprise Server (SLES) 15

Power Control scenario bullx MC Power capping 16

Bullx MC Power Manager Monitoring o All HW with available power sensors o Consolidation every 10 mn o Store info in database o Graphical web interface o Out-of-band queries Power capping o Automatic action to decrease power level o Automatic information for system monitoring o Open framework, based on SEC ( Simple Event Correlator) o Allow new rules creation o But slow reaction time (minutes) 17

What do consume your applications? bullx BM Pluggins: RAPL, IPMI (OS) and RRD Per job (global value & time slice) Per node Per user New srun parameter to allow CPU frequency scaling for job execution 18

Bull TU Dresden high frequency monitoring HARDWARE Regular B700 blades + innovative power mesurement tools SOFTWARE API (Opensource) PROJECT Project Management IP Management Contract Management MIDDLEWARE New modules in VAMPIR Scalable High Definition Power Monitoring API APPLICATION Development of new optimization methodologies Demonstration of energy efficiency improvement OPENSOURCE Energy efficient operation CPU states Turn off devices Interface with batch scheduler Mesurement environment Vampir integration Energy accounting FPGA integration Measuring system Accuracy Energy Efficiency research at application level 19

Bull TU Dresden high frequency monitoring Admin (2) Cluster Instrumented blades IB Fabric / GigE Fabric FASS Phase 1 SATA / SAS Login (2) Export (2) SMP (2) GPU Cluster 32 Nodes 48x Kepler VR CPU VR DIMM1 VR DIMM1 VR CPU VR DIMM2 VR DIMM2 VR DIMM3 VR DIMM3 CPU1 VR DIMM4 VR DIMM4 CPU2 FPGA Cluster Manager i2c - 400Kb / VR Power Measurement i2c - 400Kb Power Monitor for Node Global Power API Ethernet 100Mb BMC (Baseboard Management Controller) XBUS 50Mb FPGA Operating System (OS) NODE Freq: 500Hz - Error: 2% 20

Energy efficient HPC systems Green systems Interest driven by energy cost and green attitude Green systems start with Green components (CPUs, Memory, PSUs, Interconnect ) Free-Cooling either with Liquid or fresh-air (save on CAPEX & OPEX) Optimize runtime parameters for best overall system performance (incl. Power) Power Monitoring Non-intrusive power monitoring at low frequency (seconds, minutes) Accounting Energy billing separately from CPU time Fine grain monitoring (seconds) possible but slightly intrusive (RAPL and OS IMPI) For high rate power sampling, HW instrumentation required Complete power management framework is still under development 21

22

23