Trinity Advanced Technology System Overview


Trinity Advanced Technology System Overview
Manuel Vigil, Trinity Project Director
Douglas Doerfler, Trinity Chief Architect

Outline
- ASC Computing Strategy
- Project Drivers and Procurement Process
- Platform Architecture Overview
- Schedule and Status
- Questions, and maybe some answers

ASC Computing Strategy
Approach: two classes of systems
- Advanced Technology: first-of-a-kind systems that identify and foster technical capabilities and features that are beneficial to ASC applications
- Commodity Technology: robust, cost-effective systems to meet the day-to-day simulation workload needs of the program
Investment principles
- Maintain continuity of production
- Ensure that the needs of the current and future stockpile are met
- Balance investments in system cost-performance types with computational requirements
- Partner with industry to introduce new high-end technology, constrained by life-cycle costs
- Acquire right-sized platforms to meet the mission needs

ASC Platform Timeline (FY12-FY21)
- Advanced Technology Systems (ATS): Cielo (LANL/SNL), Sequoia (LLNL), ATS 1 Trinity (LANL/SNL), ATS 2 (LLNL), ATS 3 (LANL/SNL)
- Commodity Technology Systems (CTS): Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2
- (Timeline chart: each system progresses through development and deployment, production use, and retirement, with system delivery milestones marked across fiscal years 12 through 21)

Advanced Technology Systems
- Leadership-class platforms
- Pursue promising new technology paths with industry partners
- These systems are to meet unique mission needs and to help prepare the program for future system designs
- Includes Non-Recurring Engineering (NRE) funding to enable delivery of leading-edge platforms
- Trinity (ATS-1) will be deployed by ACES (the New Mexico Alliance for Computing at Extreme Scale, i.e. Los Alamos & Sandia) and sited at Los Alamos
- ATS-2 will be led by LLNL, ATS-3 by ACES, etc.

Trinity Project Drivers
- Satisfy the mission need for more capable platforms
  - Trinity is designed to support the largest, most demanding ASC applications
  - Increases in geometric and physics fidelities while satisfying analysts' time-to-solution expectations
- Foster a competitive environment and influence next-generation architectures in the HPC industry
- Trinity is enabling new architecture features in a production computing environment (ATS components)
  - Tightly coupled solid-state storage serves as a burst buffer for checkpoint/restart file I/O and data analytics, enabling improved time-to-solution efficiencies
  - Advanced power management features enable measurement and control at the system, node, and component levels, allowing exploration of application performance/watt and reducing total cost of ownership
- Trinity's architecture will introduce new challenges for code teams: the transition from multi-core to many-core, a high-speed on-chip memory subsystem, and wider SIMD/vector units (see the sketch after this list)
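To make the many-core and wide-SIMD challenge concrete, here is a minimal, hypothetical sketch (not taken from any Trinity application code): a generic loop that exposes both thread-level and vector-level parallelism with OpenMP's parallel for simd construct. The array names, problem size, and build line are illustrative assumptions.

```c
/* Hypothetical example: exposing thread- and vector-level parallelism
 * for a many-core, wide-SIMD processor (e.g., Knights Landing).
 * Build with an OpenMP 4.x capable compiler, e.g.:
 *   cc -O3 -fopenmp saxpy_simd.c -o saxpy_simd
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)   /* problem size chosen arbitrarily for illustration */

int main(void)
{
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    if (!x || !y) return 1;

    /* Initialize in parallel so pages are first touched by the threads
     * that will use them; placement matters on many-core nodes. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    const double a = 0.5;
    double t0 = omp_get_wtime();

    /* Combine thread-level parallelism (many cores, multiple threads per
     * core) with vector-level parallelism (wide SIMD units). */
    #pragma omp parallel for simd
    for (long i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    double t1 = omp_get_wtime();
    printf("threads=%d  time=%.3f s\n", omp_get_max_threads(), t1 - t0);

    free(x);
    free(y);
    return 0;
}
```

On a many-core part, this kind of pattern would typically be combined with blocking and careful data placement for the on-chip memory, which is where most of the porting effort is expected to go.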

Trinity/NERSC-8 Procurement Process Timeline (ACES = LANL/SNL)
- Project started November 2011
- Market survey started January 2012
- Partnered with LBL/NERSC on the RFP (NERSC-8) March 2012
- CD-0; draft technical requirements and RFI issued December 2012
- Formal design review completed April 2013
- Independent Project Review (Lehman) completed May 2013
- CD-1; Trinity/NERSC-8 RFP issued August 2013
- Technical evaluation of the proposals completed September 2013
- Initial negotiations for both systems completed November 2013
- NNSA Independent Cost Review completed January 2014
- CD-2/3; NERSC-8 awarded April 2014
- CD-2/3; Trinity awarded July 2014 after Best and Final Offer (BAFO)

Trinity Platform Solution
- Cray was awarded the contract in July 2014
- Based on the mature Cray XC30 architecture, with Trinity introducing new architectural features:
  - Intel Knights Landing processor
  - Burst Buffer storage nodes
  - Advanced power management system software enhancements
- A single system that contains both Intel Haswell and Knights Landing (KNL) processors
  - The Haswell partition satisfies FY15 mission needs (well suited to existing codes) and fits the FY15 budget profile
  - The KNL partition, delivered in FY16, results in a system significantly more capable than current platforms, provides application developers with an attractive next-generation target, and fits the FY16 budget profile
- Managed risk
  - The Cray XC30 architecture minimizes system software risk and provides a mature high-speed interconnect
  - The Haswell partition is low risk, as the technology is available in fall CY14
  - KNL is higher risk due to new technology, but provides a good path for code teams to transition to a many-core architecture

Trinity High-Level Architecture (architecture diagram)

Trinity Architecture Details (values given as Trinity total / Haswell partition / KNL partition)
- Node architecture: KNL + Haswell
- Memory capacity: 2.11 PB / >1 PB / >1 PB
- Memory bandwidth: >6 PB/s / >1 PB/s / >1 PB/s + >4 PB/s
- Peak FLOPS: 42.2 PF / 11.5 PF / 30.7 PF
- Number of nodes: 19,000+ / >9,500 / >9,500
- Number of cores: >760,000 / >190,000 / >570,000
- Number of cabinets (incl. I/O & BB): 112
- Parallel file system capacity (usable): 82 PB, >8x Cielo
- Parallel file system bandwidth (sustained): 1.45 TB/s, >10x Cielo
- Burst buffer capacity (usable): 3.7 PB
- Burst buffer bandwidth (sustained): 3.3 TB/s

Compute Node Specifications (Haswell / Knights Landing)
- Memory capacity (DDR): 2x64 = 128 GB / comparable to Intel Xeon processor
- Memory bandwidth (DDR): 136.5 GB/s / comparable to Intel Xeon processor
- Sockets per node: 2 / N/A
- Cores: 2x16 = 32 / 60+ cores
- Core frequency (GHz): 2.3 / N/A
- Memory channels: 2x4 = 8 / N/A
- Memory technology: 2133 MHz DDR4 / MCDRAM & DDR4
- Threads per core: 2 / 4
- Vector units & width (per core): 1x256 AVX2 / AVX-512
- On-chip MCDRAM: N/A / up to 16 GB at launch, over 5x STREAM vs. DDR4
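As one illustration of how an application might place bandwidth-critical data in the on-package MCDRAM, the sketch below uses the open-source memkind library's hbwmalloc interface with a fallback to ordinary DDR. This is a hedged example: the library choice, array name, and size are assumptions for illustration, not a statement of how Trinity codes will be required to use MCDRAM.

```c
/* Hypothetical sketch: allocating a bandwidth-critical array in on-package
 * MCDRAM via the memkind library's hbwmalloc interface, falling back to
 * ordinary DDR if high-bandwidth memory is not available.
 * Build (assuming memkind is installed):  cc -O3 mcdram_alloc.c -lmemkind
 */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

#define N (1L << 27)   /* illustrative size: 1 GiB of doubles */

int main(void)
{
    double *field;
    int use_hbw = (hbw_check_available() == 0);  /* 0 means HBW memory present */

    if (use_hbw)
        field = hbw_malloc(N * sizeof(double));  /* place in MCDRAM */
    else
        field = malloc(N * sizeof(double));      /* fall back to DDR */

    if (!field) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... a bandwidth-bound kernel would operate on 'field' here ... */
    for (long i = 0; i < N; i++)
        field[i] = (double)i;

    printf("allocated %.1f GiB in %s memory\n",
           N * sizeof(double) / (1024.0 * 1024.0 * 1024.0),
           use_hbw ? "high-bandwidth (MCDRAM)" : "DDR");

    if (use_hbw)
        hbw_free(field);
    else
        free(field);
    return 0;
}
```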

Trinity Capabilities
- Each partition will accommodate 1 to 2 large mission problems (2 to 4 total)
- Capability relative to Cielo:
  - 8x to 12x improvement in fidelity, physics, and performance
  - >30x increase in peak FLOPS
  - >2x increase in node-level parallelism
  - >6x increase in cores
  - >20x increase in threads

The Trinity Center of Excellence & Application Transition Challenges
Center of Excellence
- Work with select NW application code teams to ensure the KNL partition is used effectively upon initial deployment
  - Nominally one application per laboratory (SNL, LANL, LLNL)
  - Chosen such that they impact the NW program in FY17
- Facilitate the transition to next-generation ATS platforms (code migration issues)
- This is NOT a benchmarking effort
Intel Knights Landing processor
- From multi-core to many-core: >10x increase in thread-level parallelism
- A reduction in per-core throughput (1/4 to 1/3 the performance of an x86-64 core)
- MCDRAM: fast but limited capacity (~5x the bandwidth, ~1/5 the capacity of DDR4 memory)
- Dual AVX-512 SIMD units
Burst Buffer
- Data analytics use cases need to be developed and/or deployed into production codes
- Checkpoint/restart should just work, although advanced features may require code changes (a minimal checkpoint sketch follows this list)
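For the checkpoint/restart use case, a minimal sketch of the intended pattern is shown below: defensive I/O is written to a fast burst-buffer tier rather than directly to the parallel file system. The mount point, file layout, and staging behavior here are assumptions for illustration only; the actual interface depends on how the system software exposes the burst buffer to a job.

```c
/* Minimal, hypothetical checkpoint sketch: write application state to a
 * burst-buffer tier first, keeping the defensive I/O off the parallel
 * file system during the write. The mount point below is assumed for
 * illustration; the real path and staging policy are system-defined.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BB_PATH "/mnt/burst_buffer"   /* assumed job-local burst-buffer mount */
#define N (1L << 20)                  /* illustrative state size */

static int write_checkpoint(const char *dir, int step, const double *state, long n)
{
    char path[512];
    snprintf(path, sizeof(path), "%s/ckpt_step%06d.dat", dir, step);

    FILE *f = fopen(path, "wb");
    if (!f) return -1;

    /* Write a tiny header plus the raw state; a real code would also record
     * decomposition metadata and verify the write before retiring old files. */
    size_t ok = fwrite(&n, sizeof(n), 1, f) == 1 &&
                fwrite(state, sizeof(double), n, f) == (size_t)n;
    fclose(f);
    return ok ? 0 : -1;
}

int main(void)
{
    double *state = malloc(N * sizeof(double));
    if (!state) return 1;
    memset(state, 0, N * sizeof(double));

    for (int step = 0; step < 3; step++) {
        /* ... advance the simulation here ... */
        if (write_checkpoint(BB_PATH, step, state, N) != 0)
            fprintf(stderr, "checkpoint %d failed\n", step);
        /* Draining the checkpoint from the burst buffer to the parallel
         * file system would typically happen asynchronously, outside the
         * application's critical path. */
    }

    free(state);
    return 0;
}
```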

Trinity Platform Schedule Highlights, 2014-2016 (schedule chart)

Trinity Project Team
- Trinity Executive Committee: ASC Execs (LANL), ASC Execs (SNL), ACES Co-Directors, Project Manager, System Architect
- ACES Co-Directors: Gary Grider (LANL), Bruce Hendrickson (SNL)
- Trinity Project Director: Manuel Vigil
- Chief Architect: Doug Doerfler
- NNSA OCIO, Advisors and Compliance, Federal Project Director, NNSA Acquisition: Darren Knox
- System Architecture: Doug Doerfler, Josip Loncaric
- Center of Excellence: Rob Hoekstra, Tim Kelley, Shawn Dawson
- Project Management, Security: Manuel Vigil, Jim Lujan, Alex Malin
- Facilities and Trinity Installation: Ron Velarde
- System Integration and Deployment: David Morton
- Operations Planning: Jeff Johnson, Bob Ballance
- Burst Buffer: Cornell Wright
- R&D Advanced Power Management: Jim Laros
- Hardware Architecture: Scott Hemmert
- Acceptance: Jim Lujan
- External Networks and Archiving: Parks Fields, Kyle Lamb
- System Software Stack: Daryl Grunau
- File System: Brett Kettering
- Application Readiness: Cornell Wright, Joel Stevenson
- Software Architecture: Kevin Pedretti
- Viz: Laura Monroe

Questions
Manuel Vigil, mbv@lanl.gov
Douglas Doerfler, dwdoerf@sandia.gov