Technologies for the future HPC systems — Jean-Pierre Panziera, Teratec 2011

3 petaflop systems: TERA 100, CURIE & IFERC

- Tera100: 1.25 PetaFlops, 140,000+ Xeon cores (SMP nodes), 256 TB memory, 30 PB disk storage, 500 GB/s IO throughput, 580 m² footprint
- Curie: >1.6 PetaFlops, 90,000+ Xeon cores (SMP + GPU), 360 TB memory, 10 PB disk storage, 250 GB/s IO throughput, 200 m² footprint
- IFERC: >1.4 PetaFlops, 70,000+ Xeon cores, 280 TB memory, 15 PB disk storage, 120 GB/s IO throughput, 200 m² footprint

~0.5 PB/s total memory bandwidth. More than PetaFlops! (Bull, 2011)
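The specs above can be turned into the balance ratios that the rest of the talk cares about (memory per core, IO bytes per flop). A small sketch using only the numbers quoted on the slide; the helper name `balance` is mine, not from the talk:

```python
systems = {
    # name: (peak PFlops, memory TB, Xeon cores, IO GB/s) -- from the slide
    "Tera100": (1.25, 256, 140_000, 500),
    "Curie":   (1.6,  360,  90_000, 250),
    "IFERC":   (1.4,  280,  70_000, 120),
}

def balance(peak_pflops, mem_tb, cores, io_gbs):
    """Two common balance metrics: GB of memory per core, disk-IO bytes per flop."""
    mem_per_core_gb = mem_tb * 1000 / cores
    io_bytes_per_flop = io_gbs * 1e9 / (peak_pflops * 1e15)
    return mem_per_core_gb, io_bytes_per_flop

for name, spec in systems.items():
    mem, io = balance(*spec)
    print(f"{name}: {mem:.2f} GB/core, {io:.1e} disk-IO bytes/flop")
```

All three machines land near 2 GB/core but differ by ~4x in IO bytes per flop, which is the kind of imbalance the later slides return to.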

Tera, Peta, Exa, Zetta... Flops / Scale:
- higher resolution
- physics (elastic / anisotropic)
- accurate numerics (FWI)
(source: exascale.org)

HPC Applications — a wide range of domains and applications:
- Electro-Magnetics
- Computational Chemistry: Quantum Mechanics
- Computational Chemistry: Molecular Dynamics
- Computational Biology
- Structural Mechanics: Implicit
- Structural Mechanics: Explicit
- Seismic Processing
- Computational Fluid Dynamics
- Reservoir Simulation
- Rendering / Ray Tracing
- Climate / Weather
- Ocean Simulation
- Data Analytics

Exascale Technology Challenges
- Processing element: architecture and frequency; multi-/many-core, accelerators
- Memory capacity & bandwidth: MCM, 3D packaging? Feeding enough bytes to the FP engines, fast enough
- Network bandwidth, latency, topology and routing: optical connections/cables, fewer hops, compact packaging
- I/O scalability and flexibility: XXXL datasets + faster computations = data explosion
- System-level resiliency and reliability: month(s)-long jobs getting through HW failures
- Power and cooling: fewer and less power-hungry components, improved PUE
- Price?
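"Feeding enough bytes to the FP engines" can be quantified with the numbers from the systems slide: the three 2011 machines deliver ~0.5 PB/s of memory bandwidth for ~4.25 PFlops combined. A rough extrapolation of that balance to exascale (illustrative, not from the talk):

```python
# Keep the 2011 byte/flop ratio at exascale and see what bandwidth it implies.
peta = 1e15
combined_flops = (1.25 + 1.6 + 1.4) * peta   # the three petaflop systems
combined_mem_bw = 0.5 * peta                 # ~0.5 PB/s total memory BW (slide 2)
bytes_per_flop = combined_mem_bw / combined_flops

exa_flops = 1e18
required_bw_pbs = bytes_per_flop * exa_flops / peta
print(f"{bytes_per_flop:.3f} B/flop -> {required_bw_pbs:.0f} PB/s at exascale")
```

Roughly 0.12 B/flop implies well over 100 PB/s of aggregate memory bandwidth for an exaflop machine, which is why packaging (MCM, 3D stacking) appears on the challenge list.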

HPC requirements, constraints and technology evolution
[Chart: technology expected vs. required — "We won't make it with HPC as usual"; technology expected vs. cost constraints — "We won't be able to afford it."]

2011's CPU-GPU hybrid architecture
[Diagram: a multi-core processor (cores + shared cache, NIC) offloads computation to a co-processor with its own shared memory; application data crosses an IO bottleneck between them.]
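The offload trade-off behind the diagram can be sketched as a cost model: shipping the kernel to the co-processor only pays when the speedup outweighs the transfer time across the IO bottleneck. All numbers below are illustrative assumptions (PCIe-class bandwidth, a 7x kernel speedup), not measurements from the talk:

```python
def offload_time(cpu_time_s, speedup, data_bytes, link_gbs):
    """Kernel time on the accelerator plus round-trip transfer time over the link."""
    return cpu_time_s / speedup + 2 * data_bytes / (link_gbs * 1e9)

cpu_time = 1.0   # 1 s for the kernel on the host cores (assumed)
pcie_gbs = 6.0   # ~PCIe gen2 x16 effective bandwidth (assumed)

small = offload_time(cpu_time, speedup=7, data_bytes=100e6, link_gbs=pcie_gbs)
large = offload_time(cpu_time, speedup=7, data_bytes=10e9, link_gbs=pcie_gbs)
print(f"100 MB dataset: {small:.2f} s (offload wins); 10 GB: {large:.2f} s (offload loses)")
```

The same 7x kernel wins with a small working set and loses once transfers dominate, which is exactly the "limited datasets or very good locality" condition on the next slide.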

GPU Accelerators and Applications

bullx blade B505 — GPU/CPU ratio (2 x GPUs vs. 2 x CPUs):
- GFlops (DP): 7x
- Memory BW: 4.5x
- Power consumption: 2x
- Memory size: 1/8

Applications suited for GPUs:
- small source code (kernels)
- limited datasets or very good locality
- (single precision)
- little communication
- CUDA, OpenCL or HMPP

Graphics rendering, seismic imaging, molecular dynamics, astrophysics, financial simulations, structure analysis, electromagnetism, genomics, weather / climate / oceanography, and more.
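The B505 ratios above already tell the whole story if you combine them. A minimal sketch, using only the four ratios quoted on the slide:

```python
# GPU/CPU ratios quoted for the bullx B505 blade (2 GPUs vs. 2 CPUs).
flops_ratio, bw_ratio, power_ratio, mem_ratio = 7.0, 4.5, 2.0, 1 / 8

flops_per_watt_gain = flops_ratio / power_ratio  # energy-efficiency gain
bw_per_flop_ratio = bw_ratio / flops_ratio       # how the balance shifts

print(f"{flops_per_watt_gain:.2f}x flops/W on the GPU side")
print(f"{bw_per_flop_ratio:.2f}x bytes/flop: GPUs are more bandwidth-starved per flop")
```

A 3.5x flops/W gain, but only ~0.64x the bytes per flop and 1/8 the memory: hence the "limited datasets or very good locality" requirement.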

Hybrid CPU-GPU architecture evolutions
[Diagram: the co-processor becomes self-bootable with its own NIC and more memory levels; nodes get faster with many more, slower processing elements — much more parallelism!]

InfiniBand for Communications and IO
- Fat-tree fabric: 18 x 324-port core switches over 324 x 36-port leaf switches, 18 nodes per leaf — 5,832 nodes, fully non-blocking
- Carries MPI traffic plus IO to the parallel file system (IO nodes) and the management center
- Exascale systems: 50-100k nodes [10x today]
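The node count on the slide falls directly out of the two-level fat-tree arithmetic. A sketch (the helper `fat_tree` is mine, not from the talk):

```python
def fat_tree(leaf_ports, core_ports):
    """Max size of a fully non-blocking two-level fat-tree.

    Half the leaf ports face nodes, half are uplinks; each core switch
    connects once to every leaf, so the leaf count equals the core port count.
    """
    down = leaf_ports // 2           # node-facing ports per leaf switch
    leaves = core_ports              # one port per leaf on each core switch
    cores = leaf_ports - down        # one core switch per uplink
    return leaves * down, leaves, cores

nodes, leaves, cores = fat_tree(36, 324)
print(f"{nodes} nodes from {leaves} leaf + {cores} core switches")
# 36-port leaves under 324-port cores -> 5,832 nodes, matching the slide
```

An exascale machine at 50-100k nodes would need roughly 10x this fabric, hence the slide's emphasis on topology and routing.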

System reliability and application runtime
Increase system resiliency — but not cost!
[Chart: from Tera to Peta to Exa, the number of components in the system grows while MTBF/MTTI shrink below job runtimes.]
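The shape of that curve follows from a standard reliability model: with N components of individual MTBF m, the system MTBF is roughly m/N. A sketch with illustrative numbers (the 1M-hour component MTBF is an assumption, not from the talk):

```python
import math

def job_survival(job_hours, component_mtbf_hours, n_components):
    """P(no failure during the job), assuming independent exponential failures."""
    system_mtbf = component_mtbf_hours / n_components
    return math.exp(-job_hours / system_mtbf)

# A 24-hour job on ever larger machines, 1e6-hour component MTBF assumed:
for n in (10_000, 100_000, 1_000_000):
    p = job_survival(24, 1e6, n)
    print(f"{n:>9} components: P(job survives 24 h) = {p:.3f}")
```

The survival probability collapses with scale, so month-long jobs at exascale cannot rely on hardware alone; checkpoint/restart and other resiliency mechanisms have to absorb the failures — without exploding the cost.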

Cooling & Power Usage Effectiveness (PUE)
[Chart: as rack power grows from 20 to 30 to 70 kW/rack, cooling moves from A/C air cooling to water-cooled doors to direct liquid cooling and co-generation, bringing PUE down from ~2.0 toward ~1.0.]
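PUE is simply total facility power divided by IT power. A minimal sketch; the overhead fractions below are illustrative assumptions for the cooling technologies on the slide, not measured figures:

```python
def pue(it_kw, overhead_kw):
    """Power Usage Effectiveness: total facility power / IT power."""
    return (it_kw + overhead_kw) / it_kw

air    = pue(20, 20 * 1.0)   # 20 kW/rack, ~100% cooling overhead (assumed)
liquid = pue(70, 70 * 0.1)   # 70 kW/rack, ~10% overhead with DLC (assumed)
print(f"air-cooled PUE ~ {air:.1f}, direct-liquid-cooled PUE ~ {liquid:.1f}")
```

Moving from ~2.0 to ~1.1 means nearly halving the electricity bill for the same compute, which is why denser racks push toward liquid cooling.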

HPC systems and future technologies:
- HPC applications stress all aspects of a system (seldom the flops)
- Processing units (PUs): cross-over between CPUs & GPUs — CPUs' ease of use + GPUs' performance and power efficiency
- Balanced design required: memory BW & latency, interconnect, local IO
- Optimal data-center cooling
- Improved node energy efficiency: accelerators, better integration (NIC in PU)
- Resilient but affordable
- Software, Software, Software: administration, monitoring, development, runtime
More than Flops!
