Technologies for future HPC systems
Jean-Pierre Panziera, Teratec 2011
3 petaflop systems: TERA 100, CURIE & IFERC

Tera100:  1.25 PetaFlops, 140,000+ Xeon cores, SMP nodes, 256 TB memory, 30 PB disk storage, 500 GB/s IO throughput, 580 m² footprint
Curie:    >1.6 PetaFlops, 90,000+ Xeon cores, SMP + GPU, 360 TB memory, 10 PB disk storage, 250 GB/s IO throughput, 200 m² footprint
IFERC:    >1.4 PetaFlops, 70,000+ Xeon cores, 280 TB memory, 15 PB disk storage, 120 GB/s IO throughput, 200 m² footprint

~0.5 PB/s total memory bandwidth
Tera, Peta, Exa, Zetta Flops / Scale
- Higher-resolution physics (elastic / anisotropic)
- Accurate numerics (FWI, Full Waveform Inversion)
(source: exascale.org)
HPC Applications: a wide range of domains and applications
- Electro-Magnetics
- Computational Chemistry: Quantum Mechanics
- Computational Chemistry: Molecular Dynamics
- Computational Biology
- Structural Mechanics: Implicit
- Structural Mechanics: Explicit
- Seismic Processing
- Computational Fluid Dynamics
- Reservoir Simulation
- Rendering / Ray Tracing
- Climate / Weather
- Ocean Simulation
- Data Analytics
Exascale Technology Challenges
- Processing Element architecture and frequency: multi/many-core, accelerators
- Memory capacity & bandwidth: MCM, 3D packaging? Feeding enough bytes to the FP engines, fast enough (see the rough estimate below)
- Network bandwidth, latency, topology and routing: optical connections/cables, fewer hops, compact packaging
- I/O scalability and flexibility: XXXL datasets plus faster computations lead to a data explosion
- System-level resiliency and reliability: month(s)-long jobs must get through HW failures
- Power and cooling: fewer and less power-hungry components, improved PUE
- Price?
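As a rough illustration of the "feeding enough bytes" challenge (assuming the ~0.5 PB/s aggregate memory bandwidth quoted on the petaflop-systems slide belongs to a ~1.25 PetaFlops machine, an attribution the extracted slide leaves ambiguous), keeping the same byte-per-flop balance at exascale would require on the order of:

\[
\frac{0.5\ \text{PB/s}}{1.25\ \text{PF/s}} \approx 0.4\ \text{B/flop}
\qquad\Rightarrow\qquad
10^{18}\ \text{flop/s} \times 0.4\ \text{B/flop} = 400\ \text{PB/s aggregate memory bandwidth.}
\]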
HPC requirements, constraints and technology evolution
[Chart: required capability vs. expected technology evolution, and cost constraints vs. expected cost]
- We won't make it with HPC as usual
- We won't be able to afford it
2011's CPU-GPU Hybrid Architecture
[Diagram: multi-core processor with shared cache, NIC and IO, attached to a co-processor with its own shared data]
- The application runs on the CPU; computation is offloaded to the co-processor (a sketch of this offload model follows below)
- The CPU-to-co-processor link is an IO bottleneck
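A minimal sketch of the offload model above, written in CUDA (one of the programming models named on the next slide); the kernel, sizes and values are illustrative only, not taken from the presentation:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: the offloaded computation runs on the co-processor.
__global__ void scale_add(const float *x, float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);

    // Data must cross the CPU <-> co-processor link: this is the IO
    // bottleneck called out on the slide.
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    scale_add<<<(n + 255) / 256, 256>>>(dx, dy, 3.0f, n);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);   // expect 5.0

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}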
GPU Accelerators and Applications

bullx blade B505: 2 x CPUs + 2 x GPUs. GPU / CPU ratio:
- GFlops (DP): 7x
- Memory BW: 4.5x
- Power consumption: 2x
- Memory size: 1/8

Applications suited for GPUs:
- Small source code (kernels)
- Limited datasets or very good locality
- (Single Precision)
- Little communication
- CUDA, OpenCL or HMPP

Domains: Graphics Rendering, Seismic Imaging, Molecular Dynamics, Astrophysics, Financial Simulations, Structural Analysis, Electromagnetism, Genomics, Weather / Climate / Oceanography, and more.
(a quick reading of the ratios follows below)
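Taken together, these ratios give a back-of-the-envelope efficiency picture (an interpretation of the table above, not a measured figure): 7x the double-precision flops for 2x the power consumption is roughly

\[
\frac{7}{2} = 3.5\times\ \text{the flops per watt,}
\]

while the 1/8 memory size (despite 4.5x the bandwidth) is why limited datasets or very good locality are listed as prerequisites.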
Hybrid CPU-GPU architecture evolutions
[Diagram: processor and self-bootable co-processor, each with cores, shared data, NICs and IO]
- Faster, with many more (but slower) processing elements
- Much more parallelism!!!
- More memory levels
- Self-bootable co-processors
InfiniBand for Communications and IO
[Diagram: fat-tree fabric connecting compute nodes, IO nodes, the parallel file system (MPI and IO traffic) and the management center]
- Top level: 18 x 324-port switches
- Leaf level: 324 x 36-port switches, 18 nodes per leaf switch
- 5,832 nodes, fully non-blocking (the arithmetic is worked out below)
- Exascale systems: 50-100k nodes [~10x today]
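The node count follows from splitting each 36-port leaf switch into 18 ports down to nodes and 18 uplinks, one to each 324-port top-level switch (the standard non-blocking construction; the 18/18 split is inferred from the slide's figures rather than stated explicitly):

\[
324\ \text{leaf switches} \times 18\ \text{nodes per leaf} = 5{,}832\ \text{nodes},
\qquad
18\ \text{uplinks per leaf} = 18\ \text{top-level switches}.
\]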
System reliability and application runtime
[Chart: from Tera to Peta to Exa, the number of components in the system grows while MTBF/MTTI falls below typical job runtimes]
- Increase system resiliency, but not cost!!! (a first-order scaling model is sketched below)
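A standard first-order model (assuming independent component failures; the numbers are illustrative, not from the slide): with N components each of mean time between failures m, the system-level MTBF is roughly

\[
\text{MTBF}_{\text{system}} \approx \frac{m}{N},
\]

so a component MTBF of, say, 50 years (about 4.4 x 10^5 hours) spread over N = 10^5 parts leaves a system MTBF of only a few hours, which is why month(s)-long jobs must get through hardware failures rather than hope to avoid them.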
Cooling & Power Usage Effectiveness (PUE)
[Chart: PUE falling from about 2.0 towards 1.0, and rack density rising through 20, 30 and 70 kW/rack, as cooling moves from A/C air cooling to water-cooled doors, direct liquid cooling and co-generation]
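PUE is the ratio of total facility power to the power delivered to the IT equipment; as an illustration (the 1 MW load and the 1.1 figure are assumptions for the example, not values from the slide):

\[
\text{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}},\qquad
P_{\text{IT}} = 1\ \text{MW}:\ \text{PUE}=2.0 \Rightarrow 2.0\ \text{MW drawn},\quad
\text{PUE}=1.1 \Rightarrow 1.1\ \text{MW drawn}.
\]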
HPC systems and future technologies:
- HPC applications stress all aspects of a system (seldom the flops)
- Processing Units (PUs): a cross-over between CPUs & GPUs, combining CPU ease of use with GPU performance and power efficiency
- Balanced design required: memory BW & latency, interconnect, local IO
- Optimal data-center cooling
- Improved node energy efficiency: accelerators, better integration (NIC in the PU)
- Resilient but affordable
- Software, Software, Software: administration, monitoring, development, runtime
More than Flops!