EOFS Workshop Paris Sept, 2011. Lustre at exascale. Eric Barton. CTO Whamcloud, Inc. eeb@whamcloud.com. 2011 Whamcloud, Inc.

Similar documents
DAOS An Architecture for Extreme Scale Storage. Eric Barton High Performance Data Division Intel


ETERNUS CS High End Unified Data Protection

In Memory Accelerator for MongoDB

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

The Use of Flash in Large-Scale Storage Systems.

Lustre * Filesystem for Cloud and Hadoop *

Design and Evolution of the Apache Hadoop File System(HDFS)

Practical Applications of Lustre/ZFS Hybrid Systems LUG 2014 Miami FL

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

High Performance Computing OpenStack Options. September 22, 2015

ovirt and Gluster Hyperconvergence

Scala Storage Scale-Out Clustered Storage White Paper

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

PARALLELS CLOUD STORAGE

Sun Storage Perspective & Lustre Architecture. Dr. Peter Braam VP Sun Microsystems

Long term retention and archiving the challenges and the solution

Monitoring Tools for Large Scale Systems

GPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"

Trends in Enterprise Backup Deduplication

Building Storage as a Service with OpenStack. Greg Elkinbard Senior Technical Director

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

New Storage System Solutions

Architecting a High Performance Storage System

Traditional v/s CONVRGD

Open Source, Scale-out clustered NAS using nfs-ganesha and GlusterFS

Big Fast Data Hadoop acceleration with Flash. June 2013

Amazon EC2 Product Details Page 1 of 5

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Storage Design for High Capacity and Long Term Storage. DLF Spring Forum, Raleigh, NC May 6, Balancing Cost, Complexity, and Fault Tolerance

IBM ELASTIC STORAGE SEAN LEE

The Design and Implementation of the Zetta Storage Service. October 27, 2009

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Cloud Based Application Architectures using Smart Computing

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Hypertable Goes Realtime at Baidu. Yang Dong Sherlock Yang(

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

NetApp High-Performance Computing Solution for Lustre: Solution Guide

Dell Converged Infrastructure

Storage Architectures for Big Data in the Cloud

Netapp HPC Solution for Lustre. Rich Fenton UK Solutions Architect

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

High Availability with Windows Server 2012 Release Candidate

SoftLayer Fundamentals. Storage and Backup. August, 2014

Hitachi Content Platform. Andrej Gursky, Solutions Consultant May 2015

IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO)

Data Centric Systems (DCS)

Current Status of FEFS for the K computer

The BIG Data Era has. your storage! Bratislava, Slovakia, 21st March 2013

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated

Hadoop: Embracing future hardware

HPC data becomes Big Data. Peter Braam

Configuring Apache Derby for Performance and Durability Olav Sandstå

Understanding Microsoft Storage Spaces

GeoGrid Project and Experiences with Hadoop

HRG Assessment: Stratus everrun Enterprise

Hyper-V over SMB Remote File Storage support in Windows Server 8 Hyper-V. Jose Barreto Principal Program Manager Microsoft Corporation

ClearPath Storage Update Data Domain on ClearPath MCP

Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

The Data Placement Challenge

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

EMC IRODS RESOURCE DRIVERS

Weaving Stored Procedures into Java at Zalando

Getting the Most Out of VMware Mirage with Hitachi Unified Storage and Hitachi NAS Platform WHITE PAPER

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Practices on Lustre File-level RAID

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013

Certified Big Data and Apache Hadoop Developer VS-1221

Scale out NAS on the outside, Object storage on the inside

Application Performance for High Performance Computing Environments

Sep 23, OSBCONF 2014 Cloud backup with Bareos

Four Reasons To Start Working With NFSv4.1 Now

Windows Server 2008 Essentials. Installation, Deployment and Management

10th TF-Storage Meeting

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Enterprise Storage Solution for Hyper-V Private Cloud and VDI Deployments using Sanbolic s Melio Cloud Software Suite April 2011


A Virtual Filer for VMware s Virtual SAN A Maginatics and VMware Joint Partner Brief

(Scale Out NAS System)

Accelerating and Simplifying Apache

Building Storage Service in a Private Cloud

Hadoop on the Gordon Data Intensive Cluster

Distributed File Systems An Overview. Nürnberg, Dr. Christian Boehme, GWDG

The Methodology Behind the Dell SQL Server Advisor Tool

Violin: A Framework for Extensible Block-level Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

HIGHLY AVAILABLE MULTI-DATA CENTER WINDOWS SERVER SOLUTIONS USING EMC VPLEX METRO AND SANBOLIC MELIO 2010

Inside Lustre HSM. An introduction to the newly HSM-enabled Lustre 2.5.x parallel file system. Torben Kling Petersen, PhD.

Overview: X5 Generation Database Machines

Google File System. Web and scalability

Transcription:

EOFS Workshop Paris Sept, 2011 Lustre at exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com

Agenda Forces at work in exascale I/O Technology drivers I/O requirements Software engineering issues Proposed exascale I/O model Filesystem Application I/O Components 3 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O technology drivers Node Concurrency: O(1,000) Memory bandwidth: ~500GB/s Comms bandwidth: ~50GB/s System Compute nodes: O(100,000) Total concurrency: O(100,000,000) Total memory: ~10PB System MTTI: O(1 day) Storage nodes: O(1,000) Total storage: ~500PB Total I/O bandwidth: ~50TB/s 4 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O technology drivers Local storage Solid state e.g. NVRAM on all compute nodes High cross-sectional bandwidth to compute nodes Low capacity Global storage Predominantly disk Relatively low network bandwidth available site-wide Data High capacity Streaming I/O Metadata Low capacity Random I/O 5 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O technology drivers (Meta)data explosion Many billions of entities Mesh elements Graph nodes Timesteps Complex relationships Uncertainty quantification (UQ) 1,000s of simulation runs OODB Query / Search What did my simulation(s) discover? Schemas / Inheritance Data provenance / quality Storage Management Migration / Archive 6 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O requirements Concurrency 100,000s of concurrent I/O streams Dictated by # of compute nodes Object I/O Laminar data flow by definition I/O forwarders Limit server fan-out to O(1,000) Multiple complex schemas Multiple application domains Different schemas Not suitable for kernel development Much harder to debug Search Non-resident index traversal Random reads IOPS limited Move search closer to the storage 7 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O requirements Filesystem integrity Foundational component of system resilience End-to-end storage system data and metadata integrity Requirement not just a feature Balanced fault handling strategies Transactional models Filesystem always exists in a defined state Immediate cleanup on failure Scrubbing Repair minor damage that can persist for days 8 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O requirements Global v. local storage Global looks like a filesystem Local looks like Cache / storage tier How transparent? Something more application specific? Automated / policy driven migration Pre-staging from global F/S Post-writeback to global F/S Fault isolation Transactionality Idempotence Performance isolation Qos requirements for global F/S 9 EOFS Lustre Workshop - Paris, Sept 2011

Software engineering Stabilization effort required non-trivial Expensive/scarce scale development and test resources Build on existing components when possible LNET (network abstraction), OSD API (backend storage abstraction) Implement new subsystems when required Distributed Application Object Storage (DAOS) Layered model Core features Simple but powerful Middleware Puts complexity where it s easiest to debug Supports different application domains Tools Generic data management and administration Application domain specific browsers and query engines 10 EOFS Lustre Workshop - Paris, Sept 2011

Exascale filesystem Conventional namespace Works at human scale Administration, security, accounting Supports legacy data and applications DAOS Containers Work at exascale Embedded in conventional namespace Provide storage for application domain specific object schemas (data + metadata) Storage pools Administration and accounting Data management /projects /Legacy /HPC /BigData Posix striped file a b c a b c a b c a Simulation data OODB OODB OODB OODB OODB metadata data data data data data data data data data data data data data data data MapReduce data Blocksequence data data data data data data data data data 11 EOFS Lustre Workshop - Paris, Sept 2011

Kernel Userspace I/O stack Application Query Tools DAOS Containers Application data and metadata Object resilience N-way mirrors / RAID6 Data management Migration over pools / between containers 10s of billions of objects distributed over thousands of OSSs Share-nothing create/destroy, read/write Millions of application threads ACID transactions on objects and containers Defined state on any/all combinations of failures No scanning on recovery DAOS API Simple userspace export of DAOS functionality Application domain API DAOS API Storage 12 EOFS Lustre Workshop - Paris, Sept 2011

Kernel Userspace I/O stack Application Query Tools Application domain API Userspace Easier development and debug Application domain libraries Domain-specific API style Collective / independent Transaction model OODB, Hadoop, HDF5, Posix Implementation policy Object aggregation/replication/migration Applications and tools Backup and restore Query, search and analysis Data browsers, visualisers and editors General purpose or application specific according to target APIs DAOS API Storage 13 EOFS Lustre Workshop - Paris, Sept 2011

Core components Unified storage pools Combine MDT/OST functionality, administer by usage Health network Timely failure notification & server collectives Userspace DLM API Locking support for schema implementations DAOS Sharded object index Accounting stats aggregation Collective open Container merge/split/duplicate/migrate Userspace API 14 EOFS Lustre Workshop - Paris, Sept 2011

Libraries and tools Schemas OODB Generic base model Multiple application domains CFD / genome / Oil and gas Hadoop HDF5 / Exa HDF5 DAOS browser / editor / debugger Generic + schema-specific Server-side query runtime VM sandbox to contain scheduler + query engines 15 EOFS Lustre Workshop - Paris, Sept 2011

Exascale I/O stack App Query/ Tools UQ Tools Query/ Tools App App Query/ Tools Pscale App Map/Reduce Applications And Tools Application domain API Application domain API I/O Libraries OODB Exa HDF5 HDF5 HDFS Distributed Application Object Storage Storage API Global Storage Local Storage Storage Subsystems 16 EOFS Lustre Workshop - Paris, Sept 2011

Thank You Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com