Outline: Introduction, History, Design (Blue Gene/Q), Job Scheduler, Filesystem, Power Usage, Performance, Summary

Sequoia is a petascale Blue Gene/Q supercomputer being constructed by IBM for the National Nuclear Security Administration as part of the Advanced Simulation and Computing (ASC) Program. It is intended primarily for nuclear weapons simulation at Lawrence Livermore National Laboratory, with additional scientific uses such as astronomy, energy, the study of the human genome, and climate change.

Blue Gene is an IBM project aimed at designing supercomputers. IBM has created three generations: Blue Gene/L, Blue Gene/P, and Blue Gene/Q.
- Blue Gene/L (November 2004): a 16-rack system, each rack holding 1,024 compute nodes; first place in the TOP500 list with a performance of 70.72 TFLOPS.
- Blue Gene/P (November 2009): a 2-rack system, each rack holding 2,048 compute nodes; 8th place in the TOP500 list.

Node architecture: the IBM Blue Gene/Q design comprises 98,304 compute nodes, for a total of 1.6 million processor cores and 1.6 PB of memory, housed in 96 racks covering an area of about 3,000 square feet.
Job scheduler: the Simple Linux Utility for Resource Management (SLURM), also used by the Dawn prototype and China's Tianhe-1A.
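
As a quick check, the headline totals follow directly from the per-node figures quoted later in the talk (16 user cores and 16 GB of DDR3 per node):

\[ 98{,}304 \times 16 = 1{,}572{,}864 \approx 1.6 \times 10^{6} \ \text{cores}, \qquad 98{,}304 \times 16 \ \text{GB} \approx 1.57 \ \text{PB} \]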

Filesystem: the Lustre parallel file system, with the ZFS management system ported to it.
Power usage: low power consumption; estimated to beat the current (2011) TOP500 leaders by three times in power efficiency.

The Blue Gene/Q compute chip (BQC):
- 360 mm² in Cu-45 (45 nm SOI) technology, ~1.47 billion transistors
- 16 user + 1 service processors; all processors are symmetric, each 4-way multithreaded
- 64-bit Power ISA at 1.6 GHz
- L1 I/D caches of 16 kB/16 kB, plus L1 prefetch engines
- each processor has a quad FPU (4-wide double-precision SIMD)
- peak performance of 204.8 GFLOPS at 55 W
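
The peak figure is simply the product of core count, SIMD width, and clock, assuming one fused multiply-add (2 flops) per lane per cycle:

\[ 16 \ \text{cores} \times 4 \ \text{lanes} \times 2 \ \text{flops} \times 1.6 \ \text{GHz} = 204.8 \ \text{GFLOPS} \]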

Central shared L2 cache: 32 MB of eDRAM; the multi-versioned cache will support transactional memory and speculative execution, and supports atomic operations.
Dual memory controllers: 16 GB of external DDR3 memory running at 1.33 GT/s across a 2 x 16-byte-wide interface (+ECC).
Chip-to-chip networking: router logic integrated into the BQC chip.
External I/O: PCIe Gen2 interface.
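
Those interface numbers imply an aggregate memory bandwidth per node of roughly

\[ 2 \times 16 \ \text{B} \times 1.33 \ \text{GT/s} \approx 42.6 \ \text{GB/s} \]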

The 17th (service) processor acts as an assistant to the 16 user cores (see the MPI sketch below):
- offloads interrupt handling
- handles asynchronous I/O completion
- provides messaging assist, e.g. MPI pacing
- offloads RAS event handling
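
As an illustration of the kind of messaging such assistance targets, here is a minimal, generic MPI sketch (not from the talk, and nothing in it is BG/Q-specific) in which a nonblocking send is overlapped with computation while the runtime, and on BG/Q the service core, makes progress on it:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      double buf = (double)rank;
      MPI_Request req;
      if (rank == 0 && size > 1) {
          /* Post a nonblocking send; the runtime can progress it
             asynchronously while this core keeps computing. */
          MPI_Isend(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
          /* ... overlapped computation would go here ... */
          MPI_Wait(&req, MPI_STATUS_IGNORE);
      } else if (rank == 1) {
          MPI_Irecv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
          MPI_Wait(&req, MPI_STATUS_IGNORE);
          printf("rank 1 received %.1f\n", buf);
      }
      MPI_Finalize();
      return 0;
  }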

Simple Linux Utility for Resource Management (SLURM): an open-source job scheduler used by many supercomputers and computer clusters. It performs three major jobs (a launch sketch follows the list):
- allocates exclusive and/or non-exclusive access to resources
- provides a framework for starting, executing, and monitoring work, especially MPI jobs
- arbitrates contention for resources by managing a queue of pending jobs
SLURM is designed to handle thousands of nodes in a single cluster and can sustain a throughput of 120,000 jobs per hour; its design is very modular, with dozens of optional plugins.
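
For a concrete picture of what SLURM starts and monitors, here is a minimal MPI program; the srun flags in the comment are standard SLURM options, though the node and task counts are made up for the example:

  /* Launch under SLURM, e.g.:
   *   srun -N 2 -n 32 ./hello_mpi
   * (-N: number of nodes, -n: total number of tasks)
   */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("hello from task %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }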

Lustre is a parallel distributed file system, generally used for large-scale cluster computing; the name is a portmanteau of Linux and cluster. Lustre file systems are scalable: they can support tens of thousands of client systems, tens of petabytes (PB) of storage, and hundreds of gigabytes per second (GB/s) of aggregate I/O throughput.
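
That aggregate throughput comes from striping each file across many object storage targets (OSTs). As a hedged sketch, assuming the classic liblustreapi call llapi_file_create is available (the path and stripe parameters here are illustrative, not from the talk):

  #include <stdio.h>
  #include <lustre/lustreapi.h>

  int main(void) {
      /* Create a file striped across 8 OSTs with a 1 MiB stripe size,
         so large reads/writes hit 8 storage targets in parallel. */
      int rc = llapi_file_create("/mnt/lustre/output.dat",
                                 1 << 20, /* stripe size: 1 MiB */
                                 -1,      /* start OST: let Lustre choose */
                                 8,       /* stripe count */
                                 0);      /* stripe pattern: default */
      if (rc < 0)
          fprintf(stderr, "llapi_file_create failed: %d\n", rc);
      return rc < 0 ? 1 : 0;
  }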

ZFS is a combined file system and volume manager designed by Sun Microsystems. Features:
- verification against data corruption
- support for high storage capacities
- continuous integrity checking and automatic repair
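
The integrity checking rests on keeping a checksum alongside every block and re-verifying it on each read; ZFS's default data checksum is a Fletcher-style one. A toy C illustration of the detect-then-repair idea (this is not ZFS source code):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* Fletcher-4-style checksum: four cascaded running sums over
     32-bit words. Illustrative only. */
  static void fletcher4(const uint32_t *w, size_t n, uint64_t sum[4]) {
      uint64_t a = 0, b = 0, c = 0, d = 0;
      for (size_t i = 0; i < n; i++) {
          a += w[i]; b += a; c += b; d += c;
      }
      sum[0] = a; sum[1] = b; sum[2] = c; sum[3] = d;
  }

  int main(void) {
      uint32_t block[1024];
      memset(block, 0xAB, sizeof block);

      uint64_t stored[4], check[4];
      fletcher4(block, 1024, stored); /* kept with the block pointer */

      block[17] ^= 1;                 /* simulate silent corruption */
      fletcher4(block, 1024, check);
      if (memcmp(stored, check, sizeof stored) != 0)
          puts("corruption detected; repair from a redundant copy");
      return 0;
  }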

Sequoia will draw about 6 MW of power, yet is projected to have an unprecedented efficiency in performance per watt: about 3,000 MFLOPS/watt, seven times as efficient as the Blue Gene/P design, and an estimated three times the power efficiency of the 2011 TOP500 leaders.
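
The efficiency claim is consistent with the 20 PFLOPS peak target quoted at the end of the talk:

\[ \frac{20 \ \text{PFLOPS}}{6 \ \text{MW}} \approx 3{,}333 \ \text{MFLOPS/W} \]

which lands near the quoted ~3,000 MFLOPS/watt once sustained performance falls somewhat below peak.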

In November 2011, an initial 4-rack Blue Gene/Q system of 4,096 nodes (65,536 user processor cores) placed #17 in the TOP500 list and achieved the top position in the Graph500 list. Blue Gene/Q systems also topped the Green500 list of the most energy-efficient supercomputers, at about 2 GFLOPS/W.

Blue Gene/Q is deployed at Lawrence Livermore National Laboratory; Sequoia is expected to achieve 20 PFLOPS of peak performance, approximately twice that of the K computer, currently the fastest system.