New Storage System Solutions



New Storage System Solutions
Craig Prescott, Research Computing
May 2, 2013

Outline
- Existing storage systems
- Requirements and Solutions
- Lustre
- /scratch/lfs
- Questions?

Existing Storage Systems
- /home: Source code, Makefiles, etc.
- /scratch/hpc: Job I/O
- /project: Longer-term projects, low/moderate job I/O
- /lts: Long-term storage
- /rlts: Replicated long-term storage

Requirements and Solutions
- Nodes must share data over the network
  - 1000+ compute nodes and 20,000+ processors
  - Large scalability, throughput, IOPS, and capacity requirements
- Resilience in the face of component failure
  - RAID
  - High availability
- POSIX semantics
  - All nodes must maintain consistent views of a file's data and metadata
  - Locking subsystem (see the sketch after this list)
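To make the POSIX-semantics requirement concrete, here is a minimal sketch, not from the slides, of the kind of shared-file access pattern it enables: every node writes its own region of one file on the shared mount, and the file system's consistency guarantees let any node read the result. The path and the RANK environment variable are illustrative assumptions.

```python
# Sketch: disjoint, offset-based writes to one shared file on a parallel
# file system.  Path and RANK variable are illustrative assumptions.
import os

SHARED_FILE = "/scratch/lfs/myjob/output.dat"   # hypothetical path
BLOCK = 1 << 20                                  # 1 MiB region per rank

rank = int(os.environ.get("RANK", "0"))          # e.g. set by the batch system

# Each node/rank writes its own non-overlapping region.  POSIX semantics
# (backed by the file system's locking subsystem) ensure all nodes then
# see the same file data and metadata.
fd = os.open(SHARED_FILE, os.O_CREAT | os.O_RDWR, 0o644)
payload = bytes([rank % 256]) * BLOCK
os.pwrite(fd, payload, rank * BLOCK)
os.close(fd)
```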

Requirements and Solutions (continued)
- Support for high-performance RDMA networking
- Only two real choices
  - A traditional network file system (e.g. NFS) or a parallel file system
  - Parallel file systems offer scalability
- Solutions
  - Good proprietary parallel solutions are available, but they are expensive, bring vendor lock-in, and reduce flexibility
  - Open source offers numerous possibilities; UF is a member of OpenSFS
  - Lustre: proven track record, in-house expertise

Lustre: What is it?
- Open-source parallel file system implementation
  - GPL
  - POSIX compliant (*)
- Horizontally scalable in capacity and throughput
  - Additional storage and server nodes can be added online
- Scales to very large numbers of clients
- Files and directories can be striped over multiple servers and storage devices (see the striping sketch below)
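As a minimal sketch of what striping means, the following maps a byte offset in a file to a stripe object on an OST under Lustre's round-robin, RAID-0-style layout. The stripe size and stripe count shown are assumed example values, not figures from the slides.

```python
# Sketch: map a file byte offset to (OST index within the layout, offset
# within that OST's stripe object).  Stripe parameters are assumptions.
def locate(offset, stripe_size=1 << 20, stripe_count=4):
    """Return (ost_index_within_layout, offset_within_stripe_object)."""
    stripe_no = offset // stripe_size          # which stripe-sized chunk
    ost_index = stripe_no % stripe_count       # round-robin over the OSTs
    round_no = stripe_no // stripe_count       # completed rounds so far
    obj_offset = round_no * stripe_size + offset % stripe_size
    return ost_index, obj_offset

# Example: with 1 MiB stripes over 4 OSTs, byte 5 MiB of the file lives
# at offset 1 MiB inside the object on the second OST in the layout.
print(locate(5 * (1 << 20)))   # -> (1, 1048576)
```

Because consecutive stripes land on different OSTs, large sequential I/O is spread over many servers and disks at once, which is where the throughput scaling comes from.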

Lustre: What is it? (continued)
- Supports high-performance RDMA networking devices
  - RDMA: Remote Direct Memory Access
  - The NIC transfers data directly to/from application memory
  - Implemented in the InfiniBand interconnect, widely used in HPC environments
- Built-in high-availability support
  - If a server goes down, another server takes over

Lustre: Who uses it?
- Largest market share in HPC systems
  - Primary file system of the petascale club
  - 50-70% of top-10, top-100, and top-500 entries at top500.org in recent years
  - Some very large deployments with many thousands or tens of thousands of compute nodes: Titan (ORNL), Blue Waters (NCSA), Sequoia (LLNL), and Stampede (TACC) are examples
- Top-tier HPC vendors use Lustre
  - Dell, DDN, Cray, Xyratex, Fujitsu, HP, Bull, ...
- Minimal requirements: commodity compute and I/O hardware with Linux (any block storage device)

Lustre Basic Architecture (diagram)
- Lustre clients (10s to 10,000s) reach the servers over InfiniBand, Ethernet, or other networks, with simultaneous support of multiple network types via routers (GigE, IB, etc.)
- A pool of metadata servers: MDS 1 (active), MDS 2 (standby)
- Lustre Object Storage Servers (100s) built from commodity storage servers and backed by enterprise-class storage arrays and SAN fabrics; shared storage enables failover

Lustre Components: Object Storage Servers
- Object Storage Server (OSS)
  - Server node that hosts storage device(s) and NIC(s)
  - Manages object data and attributes
  - Manages RDMA/zero-copy transfers from clients/routers
  - Can be configured in active-active mode for failover

Lustre Components: Object Storage Targets
- Object Storage Target (OST)
  - Formatted storage device hosted by an OSS
  - ldiskfs (ext4-based) file system, with a ZFS option coming soon; the on-disk format is hidden from clients
  - Each OSS can have multiple OSTs; RAID6 arrays are common
  - OSTs are independent, thus enabling linear scaling
  - Optimized for large I/O (1 MiB client RPCs); the client writeback cache allows I/O aggregation (see the sketch below)
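The Lustre client normally does this aggregation itself in its writeback cache, but the idea can be mirrored at the application level: buffer output until a full RPC-sized chunk is available. The following is only a sketch under assumed paths and data sizes, not a prescribed pattern from the slides.

```python
# Sketch: issue writes in full 1 MiB chunks so each client->OST RPC is
# full-sized.  Path and data volume are illustrative assumptions.
RPC_SIZE = 1 << 20  # 1 MiB, the Lustre client RPC size cited in the slides

def write_aligned(path, chunks):
    """Write an iterable of byte strings, flushing in RPC_SIZE pieces."""
    buf = bytearray()
    with open(path, "wb", buffering=0) as f:
        for chunk in chunks:
            buf.extend(chunk)
            while len(buf) >= RPC_SIZE:           # drain full RPCs only
                f.write(bytes(buf[:RPC_SIZE]))
                del buf[:RPC_SIZE]
        if buf:                                   # final partial chunk
            f.write(bytes(buf))

# Example: 8 MiB of small records aggregated into eight 1 MiB writes.
write_aligned("/scratch/lfs/demo.bin", (b"x" * 4096 for _ in range(2048)))
```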

Lustre Components: Metadata Server & Target
- Metadata Server (MDS)
  - Server node with storage and NIC
  - Manages the namespace only: filenames, attributes, and ACLs, no data
  - Determines a file's layout on the OSTs at creation time (see the sketch below)
  - Default policies balance I/O and space usage over OSSs/OSTs
- Metadata Target (MDT)
  - Storage device hosted by the MDS
  - ldiskfs backing file system
  - Generally a RAID1+0 device
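Because the layout is fixed at creation time, users and applications can request a specific striping up front with the standard `lfs setstripe` tool. A minimal sketch follows; the file path and striping parameters are assumed example values, not site policy from the slides.

```python
# Sketch: request a specific Lustre layout at creation time with the
# standard `lfs setstripe` command.  Path and parameters are assumptions.
import subprocess

def create_striped(path, stripe_count=8, stripe_size="4M"):
    """Create an empty file striped over `stripe_count` OSTs."""
    subprocess.run(
        ["lfs", "setstripe",
         "-c", str(stripe_count),   # number of OSTs to stripe over
         "-S", stripe_size,         # stripe size
         path],
        check=True,
    )

create_striped("/scratch/lfs/big_output.dat")
# `lfs getstripe /scratch/lfs/big_output.dat` shows the resulting layout.
```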

/scratch/lfs
- 24 OSS nodes, active-active
  - 128 GB RAM per OSS (read cache)
  - Six RAID6 (8+2) OSTs per OSS
  - 1440 2 TB drives total
  - 2.1 petabytes usable (see the arithmetic below)
- ~50 GB/s streaming reads/writes observed
- ~100k random IOPS for small I/Os
- Highly available
- FDR InfiniBand interconnect (56 Gb/s)
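The usable-capacity figure follows from the drive counts on the slide. A short worked check, with the assumption (mine, not the slide's) that the gap between raw data capacity and the quoted usable figure is file-system formatting and reserved space:

```python
# Worked check of the /scratch/lfs capacity figure from the slide.
oss_nodes = 24
osts_per_oss = 6
drives_per_ost = 10          # RAID6 8+2: 8 data + 2 parity drives
data_drives_per_ost = 8
drive_tb = 2                 # 2 TB drives

total_drives = oss_nodes * osts_per_oss * drives_per_ost
raw_data_tb = oss_nodes * osts_per_oss * data_drives_per_ost * drive_tb

print(total_drives)          # 1440 drives, matching the slide
print(raw_data_tb)           # 2304 TB of raw data capacity; the slide's
                             # "2.1 PB usable" is this minus formatting
                             # and reserved space (assumed)
```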

Historical Perspective
/scratch file systems through the years:

Name            Era        # Servers  # Disks    Interconnect      HA?
/scratch/ufhpc  2005-2011  8          144        SDR IB (8 Gbps)   N
/scratch/crn    2007-2011  2          144        SDR IB (8 Gbps)   N
/scratch/hpc    2011-      8          384 (192)  QDR IB (32 Gbps)  Y
/scratch/lfs    2013-      24         1440       FDR IB (54 Gbps)  Y

Questions?
Thank you!