Installation of LLNL's Sequoia File System

Installation of LLNL's Sequoia File System
Marc Stearman, Parallel File Systems Operations Lead
marc@llnl.gov
April 2012

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-551092

Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

Sequoia Compute Platform
[Diagram: the compute platform connects through 768 I/O nodes and an IB/Ethernet SAN to Lustre]
Sequoia stats:
- 20 PF compute platform
- 96 racks
- 1.6 million cores
- 1.6 PB memory
- 768 I/O nodes, QDR IB
- Liquid cooled

Sequoia I/O Infrastructure Requirements
- 50 PB file system
- 500 GB/s minimum, 1 TB/s stretch goal (per-node implications sketched below)
- QDR InfiniBand SAN connection to Sequoia
- Must integrate with existing Ethernet infrastructure
[Diagram: compute nodes (CN) and IONs on the Sequoia platform, plus login/DM nodes on the existing LC infrastructure, connect across the Sequoia SAN to the Lustre MDS and OSS nodes; callouts mark the OSS/MDS hardware and SFS disk hardware procurements]
- >$20M budget across five procurements, dominated by the RAID file system and IB SAN hardware procurements
Phased bandwidth delivery:
- Phase 1: 10%, Oct 2011
- Phase 2: 50%, Dec 2011
- Phase 3: 100%, Feb 2012
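As a rough back-of-the-envelope check (a sketch only, dividing the aggregate targets evenly across the 768 OSS nodes described on the rack-layout slide), the bandwidth goals translate into roughly 650 MB/s per OSS node for the minimum and about 1.3 GB/s per OSS node for the stretch goal:

    # Back-of-the-envelope check, not from the slides: per-OSS bandwidth needed
    # to reach the aggregate targets if all 768 OSS nodes contribute evenly.
    OSS_NODES = 768                                   # 48 racks x 16 OSS nodes
    TARGETS_GBPS = {"minimum": 500, "stretch": 1000}  # 500 GB/s min, 1 TB/s stretch

    for name, total in TARGETS_GBPS.items():
        per_oss_mbps = total * 1000 / OSS_NODES       # GB/s aggregate -> MB/s per node
        print(f"{name}: {per_oss_mbps:.0f} MB/s per OSS node")
    # minimum: 651 MB/s per OSS node
    # stretch: 1302 MB/s per OSS node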

Sequoia I/O Challenges Abound
ZFS-based Lustre:
- Has never been done before
- Is dependent on ongoing local development and D&E contract investments
- Is the pioneering implementation of a new backend file system for the Lustre community
- Will be buggy, and does not meet the 1 TB/s stretch goal without performance improvements
InfiniBand SAN:
- Lack of tools and experience with it as an HPC SAN
- Need to bridge to the existing Ethernet infrastructure
Sequoia IONs:
- The ION/CN ratio is a factor of two lower than on Dawn and BG/L

D&E Efforts
Lustre OSD work:
- Abstracts all backend storage access into a single portable API
- Enables Lustre to use any backend file system with a minimal amount of glue logic: ldiskfs, ZFS, btrfs (future work)
SMP checksum performance:
- Parallelize checksums across multiple threads (illustrated in the sketch below)
ZFS optimization
Quotas
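The SMP checksum work itself lives in Lustre kernel code; purely to illustrate the idea of spreading checksum work across cores, a minimal Python sketch (chunk size and worker count are arbitrary, and CRC32 stands in for whatever algorithm is actually configured) might look like this:

    # Illustration only: split a buffer into chunks and checksum the chunks in
    # parallel worker processes, returning the per-chunk checksums.
    import zlib
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 1 << 20  # 1 MiB chunks (arbitrary)

    def chunk_crc(data: bytes) -> int:
        return zlib.crc32(data)

    def parallel_checksums(buf: bytes, workers: int = 4) -> list[int]:
        chunks = [buf[i:i + CHUNK] for i in range(0, len(buf), CHUNK)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(chunk_crc, chunks))

    if __name__ == "__main__":
        crcs = parallel_checksums(b"x" * (8 * CHUNK))
        print(len(crcs), "chunk checksums computed")  # 8 chunk checksums computed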

Sequoia SAN Architecture
[Diagram: 32 login/DM nodes and the 768 IONs (24 per leaf) connect through login and ION leaf switches into a pair of 648-port QDR IB core switches; the core also links to the Lustre Ethernet core, service nodes, expansion, the MDS leaf switch, TLCC2/CRELIC, and the Lustre MDS node and OSS nodes on the Sequoia SAN]

Procurement Status
- RAID hardware: contract awarded to IAS/NetApp
- SAN InfiniBand hardware: contract awarded to Advanced HPC/Mellanox
- OSS: contract awarded to Appro
- MDS: Supermicro Westmere nodes, with RAID Inc. JBOD and OCZ Talos 2 SSDs
- Sequoia platform: all racks at LLNL, integration in progress

RAID Hardware Details
- Contract awarded to NetApp (LSI Engenio equipment)
- NetApp E5400: 60-bay, 4U enclosure with 2 RAID controllers
- 3 TB SAS drives: 180 TB raw capacity, 130 TB RAID6 capacity
- 6 Gb SAS lanes
- FC or IB host interfaces (we chose QDR IB)

Lustre Server Architecture
OSS (uses the TLCC2 design):
- Appro GreenBlade
- Intel Xeon E5-2670 @ 2.60 GHz, dual socket, 8 cores
- 64 GB RAM
- QDR Mellanox ConnectX-3 IB down (LNET)
- Dual-port QDR ConnectX-2 HCA (SRP to disk)
MDS:
- Supermicro X8DTH
- Intel Xeon X5690 @ 3.47 GHz, dual socket, 6 cores (24 CPUs with Hyper-Threading)
- 192 GB RAM
- JBODs with OCZ Talos 2 SSDs (40 drives, SAS-connected, using ZFS RAID10); see the layout sketch below
- Configured as a failover pair (active/passive) for reliability
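The slides give no more detail on the MDS pool layout than "40 drives, ZFS RAID10"; in ZFS terms that is a stripe across two-way mirror vdevs. A hypothetical sketch (the device names and the pool name "mdtpool" are invented) that generates the corresponding pool-creation command:

    # Hypothetical sketch: emit a "zpool create" command for a 40-drive ZFS
    # RAID10 layout, i.e. a stripe of 20 two-way mirrors. Device and pool
    # names are invented; the slides only state "40 drives, ZFS RAID10".
    drives = [f"/dev/disk/by-id/ssd{i:02d}" for i in range(40)]

    vdevs = [f"mirror {a} {b}" for a, b in zip(drives[0::2], drives[1::2])]
    print("zpool create mdtpool " + " ".join(vdevs))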

OSS Connectivity
- 2 OSS nodes per E5400, as a failover pair
- Using the RHEL6 multipath rdac drivers

Rack Layout
- 8 E5400s per RSSU (Rack Storage Scalable Unit)
- 16 OSS nodes at the top of the rack
- 48 racks total: 384 E5400s, 768 OSS nodes (roll-up sketched below)
- 55 PB capacity, aiming for 1 TB/s
[Rack diagram: 8U OSS chassis shown at the top of the rack above the E5400 enclosures]
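A quick check on how the enclosure counts roll up (a sketch only, reusing the per-enclosure figures quoted on the RAID hardware and OSS connectivity slides):

    # Roll up the rack layout from per-enclosure figures given earlier.
    RACKS = 48
    E5400_PER_RACK = 8        # 8 E5400s per RSSU; the totals imply one RSSU per rack
    OSS_PER_E5400 = 2         # each E5400 is fronted by a failover pair of OSS nodes
    RAW_TB_PER_E5400 = 180    # 60 bays x 3 TB SAS drives

    enclosures = RACKS * E5400_PER_RACK             # 384 E5400s
    oss_nodes = enclosures * OSS_PER_E5400          # 768 OSS nodes
    raw_pb = enclosures * RAW_TB_PER_E5400 / 1000   # ~69 PB raw
    print(enclosures, oss_nodes, f"{raw_pb:.1f} PB raw")  # 384 768 69.1 PB raw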

Performance: XDD
[Chart: XDD bandwidth, Seagate drives vs. Hitachi drives]
- Aggregate: 463 GB/s
- Average node: 615 MB/s

Performance: XDD, Post Drive Swap
[Chart]

Performance: XDD
[Chart: XDD bandwidth, Seagate drives vs. Hitachi drives]
- Aggregate: 1.2 TB/s (752 nodes)
- Average node: 1.5 GB/s

Performance: XDD, Post Drive Swap
[Chart]

Performance: ZPIOS
- Aggregate: write 693 GB/s, read 603 GB/s (616 nodes)
- Projected: write 864 GB/s, read 752 GB/s (768 nodes)
- Average node: write 1,125 MB/s, read 979 MB/s

Performance: ZPIOS, Post Drive Swap
- Aggregate: write 747 GB/s, read 689 GB/s (624 nodes)
- Projected: write 919 GB/s, read 848 GB/s (768 nodes; the linear scaling is sketched below)
- Average node: write 1,197 MB/s, read 1,104 MB/s
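The projected 768-node figures on the two ZPIOS slides follow from linearly scaling the measured aggregates up from the number of nodes actually tested; a small sketch reproducing them:

    # Reproduce the ZPIOS projections: scale each measured aggregate linearly
    # from the nodes tested up to the full 768 OSS nodes.
    FULL_NODES = 768

    runs = {
        "write (pre-swap)":  (693, 616),   # (GB/s measured, nodes measured)
        "read (pre-swap)":   (603, 616),
        "write (post-swap)": (747, 624),
        "read (post-swap)":  (689, 624),
    }

    for name, (gbps, nodes) in runs.items():
        print(f"{name}: {gbps * FULL_NODES / nodes:.0f} GB/s projected")
    # -> 864, 752, 919, 848 GB/s, matching the slides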

Time Lapse Build

Questions?