Extreme Scaling on Energy Efficient SuperMUC
Dieter Kranzlmüller
Munich Network Management Team
Ludwig-Maximilians-Universität München (LMU) & Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
- 156 employees + 38 extra staff
- Serving more than 90,000 students and more than 30,000 employees, including 8,500 scientists
- Buildings: Visualisation Centre, Institute Building, Lecture Halls
- Cuboid containing the computing systems: 72 x 36 x 36 meters
Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
- Computer Centre for all Munich Universities
- IT service provider: Munich Scientific Network (MWN), web servers, e-learning, e-mail, groupware
- Special equipment: Virtual Reality Laboratory, video conferencing, scanners for slides and large documents, large-scale plotters
- IT competence centre: hotline and support; consulting (security, networking, scientific computing, ...); courses (text editing, image processing, UNIX, Linux, HPC, ...)

The Munich Scientific Network (MWN)
Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
- Regional Computer Centre for all Bavarian Universities
- Computer Centre for all Munich Universities

Virtual Reality & Visualization Centre (V2C, LRZ)
Examples from the V2C

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
- National Supercomputing Centre
- Regional Computer Centre for all Bavarian Universities
- Computer Centre for all Munich Universities
Gauss Centre for Supercomputing (GCS)
- Combination of the 3 German national supercomputing centres:
  - John von Neumann Institute for Computing (NIC), Jülich
  - High Performance Computing Center Stuttgart (HLRS)
  - Leibniz Supercomputing Centre (LRZ), Garching near Munich
- Founded on 13 April 2007
- Hosting member of PRACE (Partnership for Advanced Computing in Europe)

PRACE Research Infrastructure Created
- Establishment of the legal framework: PRACE AISBL (Association Internationale Sans But Lucratif) created with seat in Brussels in April; 20 members representing 20 European countries; inauguration in Barcelona on June 9
- Funding secured for 2010-2015: 400 million from France, Germany, Italy, Spain, provided as Tier-0 services on a TCO basis; funding decision for 100 million in the Netherlands expected soon; 70+ million from EC FP7 for preparatory and implementation grants INFSO-RI-211528 and 261557; complemented by ~60 million from PRACE members
PRACE Tier-0 Systems
- Curie @ GENCI: Bull cluster, 1.7 PFlop/s
- FERMI @ CINECA: IBM BG/Q, 2.1 PFlop/s
- Hermit @ HLRS: Cray XE6, 1 PFlop/s
- JUQUEEN @ FZJ: IBM Blue Gene/Q, 5.9 PFlop/s
- MareNostrum @ BSC: IBM System x iDataPlex, 1 PFlop/s
- SuperMUC @ LRZ: IBM System x iDataPlex, 3.2 PFlop/s

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
- European Supercomputing Centre
- National Supercomputing Centre
- Regional Computer Centre for all Bavarian Universities
- Computer Centre for all Munich Universities
Systems: SuperMUC, SGI UV, SGI Altix, Linux clusters, Linux hosting and housing
SuperMUC @ LRZ
Video: SuperMUC rendered on SuperMUC by LRZ, http://youtu.be/olas6iiqwrq

Top 500 Supercomputer List (June 2012), www.top500.org
LRZ Supercomputers: SuperMUC Phase II

SuperMUC Phase 1 + 2
SuperMUC and its predecessors (figure: machine room dimensions, 10 m)

SuperMUC and its predecessors (figure: machine room dimensions, 10 m, 11 m, 22 m)
SuperMUC and its predecessors (figure: machine room dimensions, 10 m, 11 m, 22 m)

LRZ Building Extension
Pictures: Horst-Dieter Steinhöfer, Ernst A. Graf; figure: Herzog+Partner für StBAM2 (staatl. Hochbauamt München 2)
SuperMUC Architecture
- Internet; archive and backup ~30 PB; snapshots/replica 1.5 PB (separate fire section)
- $HOME: 1.5 PB / 10 GB/s NAS, 80 Gbit/s, disaster recovery site
- Interconnect: pruned tree (4:1) between islands, non-blocking within each island
- Compute nodes: 18 thin-node islands (SB-EP, 16 cores/node, 2 GB/core, each >8,000 cores)
- Compute nodes: 1 fat-node island (WM-EX, 40 cores/node, 6.4 GB/core, 8,200 cores) = SuperMIG
- GPFS for $WORK and $SCRATCH: 10 PB, 200 GB/s
- I/O nodes

Power Consumption at LRZ
(Figure: power consumption in MWh per year, chart axis 0 to 35,000)
Cooling SuperMUC

SuperMUC Phase 1 & 2 @ LRZ
LRZ Application Mix
- Computational Fluid Dynamics: optimisation of turbines and wings, noise reduction, air conditioning in trains
- Fusion: plasma in a future fusion reactor (ITER)
- Astrophysics: origin and evolution of stars and galaxies
- Solid State Physics: superconductivity, surface properties
- Geophysics: earthquake scenarios
- Material Science: semiconductors
- Chemistry: catalytic reactions
- Medicine and Medical Engineering: blood flow, aneurysms, air conditioning of operating theatres
- Biophysics: properties of viruses, genome analysis
- Climate Research: currents in oceans

Increasing numbers
Date  System              Flop/s              Cores
2000  HLRB-I              2 TFlop/s           1,512
2006  HLRB-II             62 TFlop/s          9,728
2012  SuperMUC            3,200 TFlop/s       155,656
2015  SuperMUC Phase II   3.2 + 3.2 PFlop/s   229,960
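A derived illustration of what these table entries imply for per-core performance (simple division of the numbers above, not stated on the slide):

\[
\frac{2\,\mathrm{TFlop/s}}{1512\ \text{cores}} \approx 1.3\,\frac{\mathrm{GFlop/s}}{\text{core}}\ \text{(HLRB-I, 2000)},
\qquad
\frac{3200\,\mathrm{TFlop/s}}{155656\ \text{cores}} \approx 20.6\,\frac{\mathrm{GFlop/s}}{\text{core}}\ \text{(SuperMUC, 2012)}.
\]

Most of the roughly 1,600x aggregate increase between 2000 and 2012 therefore comes from the core count (about 100x), with per-core peak contributing only about 16x, which is exactly why the extreme-scaling topic of this talk matters.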
1st LRZ Extreme Scale Workshop
- July 2013: 1st LRZ Extreme Scale Workshop
- Participants: 15 international projects
- Prerequisite: successful run on 4 islands (32,768 cores)
- Participating groups (software packages): LAMMPS, VERTEX, GADGET, waLBerla, BQCD, GROMACS, APES, SeisSol, CIAO
- Successful results (>64,000 cores): invited to participate in the ParCo Conference (Sept. 2013), including a publication of their approach

1st LRZ Extreme Scale Workshop
- Regular SuperMUC operation: 4 islands maximum, batch scheduling system
- Entire SuperMUC reserved for 2.5 days for the challenge: 0.5 days for testing, 2 days for executing, 16 (of 19) islands available
- Consumed computing time for all groups: 1 hour of runtime = 130,000 CPU hours (see the estimate below); 1 year in total
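A quick sanity check of the 130,000 CPU-hours figure, assuming the 16 reserved thin-node islands hold roughly 8,192 cores each (consistent with the ">8,000 cores" per island on the architecture slide; the exact island size is an assumption here):

\[
16 \times 8192 = 131072\ \text{cores}
\;\Rightarrow\;
1\ \text{hour of full-reservation runtime} \approx 1.3\times10^{5}\ \text{core-hours}.
\]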
Results (sustained TFlop/s on up to 128,000 cores)

Name      MPI         # cores   Description           TFlop/s/island   TFlop/s max
Linpack   IBM         128,000   TOP500                161              2,560
VERTEX    IBM         128,000   Plasma Physics        15               245
GROMACS   IBM, Intel   64,000   Molecular Modelling   40               110
SeisSol   IBM          64,000   Geophysics            31               95
waLBerla  IBM         128,000   Lattice Boltzmann     5.6              90
LAMMPS    IBM         128,000   Molecular Modelling   5.6              90
APES      IBM          64,000   CFD                   6                47
BQCD      Intel       128,000   Quantum Physics       10               27

Results
- 5 software packages ran on the maximum of 16 islands: LAMMPS, VERTEX, GADGET, waLBerla, BQCD
- VERTEX reached 245 TFlop/s on 16 islands (A. Marek)
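One way to read the last two table columns is as a per-island scaling efficiency. This is a derived quantity, not stated on the slide, and it assumes an island corresponds to roughly 8,192 cores (so 128,000 cores is about 16 islands and 64,000 cores about 8):

\[
E = \frac{\text{TFlop/s at max cores}}{n_{\text{islands}} \times \text{TFlop/s on one island}},
\qquad
E_{\mathrm{VERTEX}} \approx \frac{245}{16 \times 15} \approx 1.0,
\qquad
E_{\mathrm{GROMACS}} \approx \frac{110}{8 \times 40} \approx 0.34 .
\]

By this reading, VERTEX scales almost linearly out to 16 islands, while GROMACS delivers its best per-island rate on a single island, matching the lessons-learned remarks on the next slide.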
Lessons Learned: Technical Perspective
- Hybrid (MPI+OpenMP) on SuperMUC is still slower than pure MPI (e.g. GROMACS), but hybrid applications scale to larger core counts (e.g. VERTEX)
- Core pinning requires a lot of experience from the programmer
- Parallel I/O remains a challenge for many applications, with regard to both stability and speed
- Several stability issues with GPFS were observed for very large jobs due to writing thousands of files in a single directory; this will be improved in upcoming versions of the application codes (a shared-file alternative is sketched below)

Extreme Scaling - Continuation
- The LRZ Extreme Scale Benchmark Suite (LESS) will be available in two versions: public and internal
- All teams will have the opportunity to run performance benchmarks after upcoming SuperMUC maintenances
- 2nd LRZ Extreme Scaling Workshop, 2-5 June 2014:
  - Full-system production runs on 18 islands with sustained PFlop/s (4 h SeisSol, 7 h GADGET)
  - 4 existing + 6 additional full-system applications
  - High I/O bandwidth in user space possible (66 GB/s of 200 GB/s max)
  - Important goal: minimize energy*runtime (3-15 W/core)
- Extreme Scale-Out on SuperMUC Phase 2
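The file-per-process pattern behind the GPFS issue above can be avoided by writing one shared file with collective MPI-IO. The following is a minimal sketch, not taken from any of the workshop codes; the file name, block size, and data layout are made up for illustration:

```c
/* Minimal collective MPI-IO sketch: every rank writes its block of doubles
 * into ONE shared file instead of creating one file per process.
 * Illustrative only; file name and block size are arbitrary. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 20;                        /* doubles per rank */
    double *buf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) buf[i] = rank;    /* dummy payload */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes a disjoint region of the same file; the collective
     * call lets the MPI library aggregate many small requests. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

A single shared file keeps the metadata load on one directory entry and lets collective buffering turn thousands of small writes into a few large, aligned ones, which is the behaviour a parallel file system such as GPFS handles best.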
Extreme Scale-Out SuperMUC Phase 2
- 12 May - 12 June 2015 (30 days)
- Selected group of early users
- Night-time operation: general queue, max 3 islands
- Daytime operation: special queue, max 6 islands (full system)
- Total available: 63,432,000 core hours
- Total used: 43,758,430 core hours (utilisation: 68.98%)

Lessons learned (2015):
- Preparation is everything
- Finding Heisenbugs is difficult
- MPI is at its limits
- Hybrid (MPI+OpenMP) is the way to go (a minimal hybrid skeleton is sketched after this slide)
- I/O libraries are getting even more important

Partnership Initiative Computational Sciences (πCS)
- Individualized services for selected scientific groups with a flagship role:
  - Dedicated point of contact
  - Individual support and guidance, targeted training & education
  - Planning dependability for use-case-specific optimized IT infrastructures
  - Early access to the latest IT infrastructure developments (hardware and software) and specification of future requirements
  - Access to the IT competence network and expertise at CS and Math departments
- Partner contribution:
  - Embedding IT experts in user groups
  - Joint research projects (including funding)
  - Scientific partnership on an equal footing, joint publications
- LRZ benefits:
  - Understanding the (current and future) needs and requirements of the respective scientific domain
  - Developing future services for all user groups
- Thematic focusing: Environmental Computing
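For the "hybrid is the way to go" point, a minimal MPI+OpenMP skeleton looks roughly like the following. This is an illustrative sketch under generic assumptions (one MPI rank per node or socket, threads inside), not code from any of the workshop applications:

```c
/* Minimal hybrid MPI+OpenMP skeleton: MPI between nodes/sockets,
 * OpenMP threads within a rank. Illustrative sketch only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        /* Thread-parallel compute region; MPI communication stays outside. */
        #pragma omp master
        printf("rank %d/%d running %d OpenMP threads\n",
               rank, nranks, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

On the core-pinning lesson: with a standard OpenMP 4.0 runtime, setting OMP_PLACES=cores and OMP_PROC_BIND=close is one common starting point, but the exact pinning recipe for SuperMUC's MPI and batch environment is not given on these slides and has to be tuned per application.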
SeisSol - Numerical Simulation of Seismic Wave Phenomena
Dr. Christian Pelties, Department of Earth and Environmental Sciences (LMU)
Prof. Michael Bader, Department of Informatics (TUM)
1.42 PFlop/s on 147,456 cores of SuperMUC (44.5% of peak performance; a back-of-the-envelope check follows after the conclusions)
http://www.uni-muenchen.de/informationen_fuer/presse/presseinformationen/2014/pelties_seisol.html
Picture: Alex Breuer (TUM) / Christian Pelties (LMU)

Extreme Scaling - Conclusions
- The number of compute cores is steadily increasing
- Users need the possibility to execute (and optimize) their codes on full-size machines
- The Extreme Scaling Workshop series at LRZ offers a number of incentives for users
- The lessons learned from the Extreme Scaling Workshops are very valuable for the operation of the centre
- The LRZ Partnership Initiative Computational Sciences (πCS) tries to improve user support
http://www.sciencedirect.com/science/article/pii/S1877050914003433
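A back-of-the-envelope check of the quoted 44.5%, assuming a nominal Sandy Bridge-EP per-core peak of about 21.6 GFlop/s (2.7 GHz times 8 double-precision flops per cycle); the per-core peak is an assumption here, not a number from the slide:

\[
147456 \times 21.6\,\mathrm{GFlop/s} \approx 3.19\,\mathrm{PFlop/s},
\qquad
\frac{1.42\,\mathrm{PFlop/s}}{3.19\,\mathrm{PFlop/s}} \approx 0.445 \approx 44.5\%.
\]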
Extreme Scaling on Energy Efficient SuperMUC
Dieter Kranzlmüller
kranzlmueller@lrz.de