Next Generation Tier 1 Storage




Next Generation Tier 1 Storage
Shaun de Witt (STFC)
With contributions from: James Adams, Rob Appleyard, Ian Collier, Brian Davies, Matthew Viljoen
HEPiX, Beijing, 16th October 2012

Introduction
- Why are we doing this?
- What were the evaluation criteria?
- Candidates: selections and omissions
- Tests
- Status
- Timeline: aim for a production service for the 2014 data run

Why are we doing this?
CASTOR is working well for us, but:
- CASTOR is optimised for many disk servers per pool; this is getting harder as the cost-optimal server size grows and we have many storage pools
- Scheduling overhead / hot-spotting; the TransferManager (LSF replacement) has improved this a lot
- Oracle costs!
- The nameserver is a single point of failure
- Not a mountable file system, which limits take-up outside of WLCG/HEP (this matters, as we also use CASTOR for storage for other STFC user groups)
- Requires (quite a lot of) specific expertise
- Disk-only CASTOR is not very widely deployed now: EOS (CERN ATLAS+CMS), DPM (ASGC)
  - Could cause delays in support resolution
  - Reduced community support
  - Greater risk in meeting future requirements for disk-only

Criteria (mandatory)
- Must make more effective use of existing hardware
- Must not restrict hardware purchasing: must be able to use ANY commodity disk
- Must support end-to-end checksumming, including ADLER32 checksums (a checksum sketch follows below)
- Must support xrootd and gridftp
- Must support X509 authentication
- Must support ACLs
- Must be scalable to 100s of petabytes and > 10^12 files
- Must at least match the I/O performance of CASTOR
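To illustrate the end-to-end checksumming requirement, a minimal sketch of computing a client-side ADLER32 checksum that could be compared against whatever value the storage system records; the file path is hypothetical and this is not tied to any particular candidate system:

```python
import zlib

def adler32_of_file(path, chunk_size=1024 * 1024):
    """Compute the ADLER32 checksum of a file, streaming in 1 MiB chunks."""
    checksum = 1  # ADLER32 is seeded with 1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")

# Hypothetical usage: compare against the checksum the storage system reports.
print("local adler32:", adler32_of_file("/mnt/test/2GB-source-file"))
```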

Criteria (desirable)
- Should provide an NFS-mountable file system
- Should be resilient to hardware failure (disk/memory/node)
- Should support checksum algorithms other than ADLER32
- Should be independent of licensed software; any required database should be supported by the STFC DB team
- Should provide an HTTP or WebDAV interface (see the sketch below)
- Should support an SRM interface
- Should provide a POSIX interface and the file protocol
- Should support username/password authentication
- Should support Kerberos authentication
- Should support hot-file or hot-block replication
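As an illustration of the HTTP/WebDAV criterion, a sketch of writing and re-reading a file over WebDAV with the Python requests library; the endpoint, credentials and paths are hypothetical, and a real deployment would more likely use X509 or Kerberos than basic auth:

```python
import requests

BASE = "https://storage.example.ac.uk/webdav"  # hypothetical WebDAV endpoint
AUTH = ("user", "password")                    # hypothetical basic-auth credentials

# Upload a file with an HTTP PUT.
with open("testfile.dat", "rb") as f:
    r = requests.put(f"{BASE}/tier1/testfile.dat", data=f, auth=AUTH)
    r.raise_for_status()

# Read it back with an HTTP GET, streaming to disk in 1 MiB chunks.
r = requests.get(f"{BASE}/tier1/testfile.dat", auth=AUTH, stream=True)
r.raise_for_status()
with open("testfile.copy", "wb") as out:
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        out.write(chunk)
```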

What else are we looking for?
- Draining: speeds up deployment/decommissioning of hardware
- Removal: great if you can get data in fast, but not if deletes are slow... (a simple delete-rate sketch follows below)
- Directory entries: we already know experiments put large numbers of files in each directory, so we need support for lots of entries
- Support: should have good support and wider usage
- IPv6 support: not on the roadmap for CASTOR; RAL and the Tier 1 are starting to look at IPv6
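A minimal sketch of the kind of check implied by the removal and directory-entry points: create many small files in one directory on the mounted file system under test, then time how quickly they can be listed and deleted. The mount point and file count are assumptions, and this is not the actual RAL test harness:

```python
import os
import time

TEST_DIR = "/mnt/candidate-fs/delete-test"  # hypothetical mount of the file system under test
N_FILES = 10_000                            # hypothetical file count

os.makedirs(TEST_DIR, exist_ok=True)

# Create many small files in a single directory.
for i in range(N_FILES):
    with open(os.path.join(TEST_DIR, f"file-{i:06d}"), "wb") as f:
        f.write(b"x" * 1024)

# Time a full directory listing (large directories stress metadata handling).
start = time.time()
entries = list(os.scandir(TEST_DIR))
print(f"listed {len(entries)} entries in {time.time() - start:.2f}s")

# Time bulk deletion and report the delete rate.
start = time.time()
for entry in entries:
    os.unlink(entry.path)
elapsed = time.time() - start
print(f"deleted {N_FILES} files in {elapsed:.2f}s ({N_FILES / elapsed:.0f} files/s)")
```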

Candidates
- HEP(-ish) solutions: dCache, DPM, StoRM+Lustre, AFS
- Parallel and scale-out solutions: HDFS, OpenStack, OrangeFS, MooseFS, Ceph, Tahoe-LAFS, XtreemFS, GfarmFS, GlusterFS, GPFS
- Integrated solutions: IBM SONAS, Isilon, Panasas, DDN, NetApp
A paper (twiki) based review was carried out, plus anecdotal/reported experience where we know of sites running things in production (e.g. reports at HEPiX etc.)

Selection

Selections
- dCache: still has SPoFs
- Ceph: still under development; no lifecycle checksumming
- HDFS: partial POSIX support using FUSE
- OrangeFS: no file replication; a fault-tolerant version is in preparation
- Lustre: requires a server kernel patch (although the latest doesn't); no hot-file replication?

Some rejections...
- DPM: well known and widely deployed within HEP, but no hot-file replication, SPoFs, the NFS interface is not yet stable, and there are some questions about scalability
- AFS: massively scalable (>25k clients), but not deployed in high-throughput production, cumbersome to administer, and has security concerns
- GPFS: excellent performance, but to get all required features you are locked into IBM hardware, and it is licensed
- EOS: good performance and automatic file replication, but limited support

Tests
- IOZone tests, a la our disk server acceptance tests (an example invocation is sketched below)
- Read/write throughput tests (sequential and parallel) over file/gridFTP/xroot
- Deletions
- Draining
- Fault tolerance: rebuild times
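For illustration, a sketch of driving IOZone's throughput mode from Python to produce numbers of the kind shown in the preliminary result plots (2 GB files, varying numbers of parallel writers/readers). The flags are standard IOZone options, but the exact invocation used at RAL is not given in the slides, so the mount point and parameters here are assumptions:

```python
import subprocess

MOUNT = "/mnt/candidate-fs"  # hypothetical mount point of the file system under test

def run_iozone(n_procs):
    """Run an IOZone throughput test: sequential write (-i 0) and read (-i 1),
    one 2 GB file per process, 1 MiB records, n_procs parallel processes."""
    files = [f"{MOUNT}/iozone-{i}.tmp" for i in range(n_procs)]
    cmd = ["iozone", "-i", "0", "-i", "1",
           "-s", "2g", "-r", "1m",
           "-t", str(n_procs), "-F", *files]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Sweep the parallelism as in the preliminary results (1 to 50 parallel operations).
for n in (1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50):
    print(f"--- {n} parallel operations ---")
    print(run_iozone(n))
```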

Status
- dCache: under deployment; testing not yet started
- Lustre: deployed; IOZone tests complete, functional tests ongoing
- OrangeFS: deployed; IOZone tests complete, functional tests ongoing
- Ceph: RPMs built, being deployed
- HDFS: installation complete; problems running the IOZone tests, other tests ongoing

Results (preliminary)
[Plot: write time for a 2 GB test file; transfer time (secs) vs. number of parallel write operations (1-50), for Lustre, OrangeFS and Castor]
- Note Lustre hitting a wall doing parallel writes
- Not entirely understood yet (we have some hints from other sites); assume this is a setup/tuning issue
- Exactly the kind of thing that could disqualify it if we can't fix it

Results (preliminary)
[Plot: write rates for a 2 GB source file; transfer rate (MB/sec) vs. number of parallel write operations (1-50), for Lustre, OrangeFS and Castor]

Results (preliminary)
[Plot: read time for a 2 GB source file; transfer time (sec) vs. number of parallel read operations (1-50), for Lustre, OrangeFS and Castor]

Results (preliminary)
[Plot: read rate for 2 GB source files; transfer rate (MB/sec) vs. number of concurrent read operations (1-50), for Lustre, OrangeFS and Castor]

Results
Very provisional so far: Castor is rather well tuned at RAL, while Lustre and OrangeFS are hardly tuned.
Non-binding summary:
- OrangeFS and Ceph look promising in the long term, but are immature
- dCache surely could do most of what we need, but is still file based
- Lustre is promising: we like that it is block based and has no SPoFs, and it is stable; we could live with occasional downtimes for upgrades

Timeline (provisional)
- Dec 2012: final selection (primary and reserve)
- Mar 2013: pre-production instance available; WAN testing and tuning
- Jun 2013: production instance
Depends on...
- Hardware availability
- Quattor profiling (configuration should be less complex than CASTOR)
- Results from test instances
Open questions:
- One large instance, or multiple smaller ones as now? A large instance with dynamic quotas has some attractions
- Migration from CASTOR: could just use the lifetime of the hardware