Storage strategy and cloud storage evaluations at CERN
Dirk Duellmann, CERN IT (with slides from Andreas Peters and Jan Iven)
5th International Conference "Distributed Computing and Grid-technologies in Science and Education", July 16-21, 2012, Dubna, Russia
Outline
- CERN storage strategy
  - Disk vs Archive decoupling
  - EOS deployment status at CERN
  - Recent developments for CASTOR and EOS
- Cloud storage evaluation
  - Why cloud storage and protocols?
  - Test plans for two S3 implementations: OpenStack/Swift and the Openlab collaboration with Huawei
  - Preliminary test results
Model change from HSM to more loosely coupled data tiers
(diagram: the CASTOR tape archive tier decoupled from the EOS disk tier)
EOS Design Targets
- Project start: April 2010
- Initial focus: user analysis at CERN
  - many individual users with chaotic work patterns
  - many small output files, large shared read-only input files
  - often only partial file access: many file seeks over uninteresting input events or branches
- Using xroot as client/server framework
  - in-memory namespace (no database)
  - availability via file-level replication (configurable)
  - reduced operational effort at large volume scale
Main Choices taken in EOS
EOS Deployment Status Today
(charts: growth of ATLAS EOS vs ATLAS CASTOR and CMS EOS vs CMS CASTOR, mid-2011 to early 2012)
Installation for LHCb is still under test.
Deployment & Support Evolution
- Operation effort: EOS 1.5 FTE, CASTOR 3.3 FTE
- Similar number of hardware exceptions
  - spike caused by double-counting of low-level errors in EOS, later correlated into one ticket
- Total number of operator tickets stayed stable
- Significant decline in user tickets
(charts: incidents per instance (EOSALICE, EOSCMS, EOSATLAS, C2PUBLIC, C2LHCB, C2CMS, C2ATLAS, C2ALICE), Jun 2011 to Feb 2012; IT CM operator tickets (automatic and user/GGUS), Apr 2011 to May 2012)
CASTOR developments
- Successful migration from LSF-based I/O scheduling to the new TransferManager
  - removes the previous access-rate limitation for scheduled access in CASTOR
  - reduces the deployment complexity of the CASTOR LSF setup at CERN
  - reduces license costs for all CASTOR sites
- Tested in close collaboration with ATLAS as the first user of the new system
- Deployed first at CERN and ASGC, recently also at the RAL Tier-1
- Relatively smooth migration, given the key role of this component in CASTOR
CASTOR Tape highlights
- 62 PB on tape, 52k tapes, 9 libraries, 80 production drives (+20 legacy)
- Beta-tested, validated and deployed IBM TS1140 (4 TB) and Oracle T10000C (5 TB) drives
- Boosted tape writing speed by developing and deploying buffered tape marks (avoiding head repositioning): factor 10x achieved in 1 year
- Introduced "traffic lights" and "bus lanes" for prioritising bulk read requests, reducing tape mounts by ~50%
- Investigating suitability of commodity equipment (aka LTO)
- Active verification of archive contents by re-reading tapes and comparing checksums
  - all newly filled tapes and "dusty" (not recently mounted) tapes
- Cf. CHEP2012 posters 415 (S. Murray) and 247 (G. Cancio)
EOS - Recent Developments
- Recently released EOS 0.2.2-1 (GPL license)
- Update to recent xroot 3.2
  - stress testing the new xroot client inside the EOS setup
- Monitoring: additional metrics and aggregates
  - traffic stats by application or domain
  - UDP fan-out in JSON format, e.g. for the popularity service
- Support for an electronic operator's log book to annotate admin commands for later reference
- Import/export handler: low-overhead import/export via xroot, {S3}
EOS Recent Developments - Additional Redundancy Options
- Goal: less expensive disk archive storage with EOS
- Distribute data chunks with checksums to many disks and add redundancy blocks
- Redundant Array of Inexpensive Nodes (RAIN)
(diagram: a file split into blocks B1-B8, each stored with a header and CRC32C block checksum on different nodes, plus parity blocks P1-P4)
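To make the idea of redundancy blocks concrete, the following is a minimal sketch, not the actual EOS layout: it stripes a buffer into data chunks and adds a single XOR parity chunk (RAID-5-like), which is enough to rebuild any one lost chunk. The EOS RAIN layouts use stronger Reed-Solomon style encodings (see the following slides); all names here are illustrative.

```python
# Minimal single-parity striping sketch (RAID-5-like); EOS RAIN layouts use
# stronger Reed-Solomon style encodings, this only illustrates the principle.

def split_chunks(data, n_data):
    """Pad and split a byte string into n_data equally sized chunks."""
    chunk_len = (len(data) + n_data - 1) // n_data
    data = data.ljust(chunk_len * n_data, b'\0')
    return [bytearray(data[i * chunk_len:(i + 1) * chunk_len])
            for i in range(n_data)]

def xor_parity(chunks):
    """Compute one parity chunk as the byte-wise XOR of all data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return parity

if __name__ == '__main__':
    chunks = split_chunks(b'some file content' * 100, n_data=4)
    parity = xor_parity(chunks)
    # Simulate losing chunk 2: it can be rebuilt from the survivors + parity.
    lost = chunks[2]
    rebuilt = xor_parity([c for i, c in enumerate(chunks) if i != 2] + [parity])
    assert rebuilt == lost
```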
Checksumming Performance
- Compared different algorithms; CRC32C chosen
- Measured on an 8-core Xeon 2.27 GHz with 12x4 GB DDR3 1 GHz RAM
- Linear scaling with the number of cores
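As a rough illustration of such an algorithm comparison, the sketch below times a few common checksum routines over the same buffer. It assumes the third-party crcmod package for CRC32C; hardware-accelerated (SSE4.2) CRC32C implementations, as relevant for EOS, would be considerably faster than this pure-software version.

```python
# Rough single-core checksum comparison; assumes the third-party 'crcmod'
# package for CRC32C. Not the benchmark behind the slide, just a sketch.
import time
import zlib
import hashlib
import crcmod.predefined

crc32c = crcmod.predefined.mkPredefinedCrcFun('crc-32c')

def throughput(name, func, buf, repeat=20):
    """Print an approximate MB/s figure for one checksum routine."""
    start = time.time()
    for _ in range(repeat):
        func(buf)
    elapsed = time.time() - start
    print('%-8s %6.0f MB/s' % (name, repeat * len(buf) / elapsed / 1e6))

if __name__ == '__main__':
    buf = b'\xab' * (16 * 1024 * 1024)          # 16 MB test buffer
    throughput('adler32', zlib.adler32, buf)
    throughput('crc32',   zlib.crc32,   buf)
    throughput('crc32c',  crc32c,       buf)
    throughput('md5',     lambda b: hashlib.md5(b).digest(), buf)
```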
Integration of Tuneable Redundancy
- EOS is based on XRootD as framework
- Plug-ins for implementing the file and directory I/O interfaces already exist (OFS plug-in, file layout plug-in, 4k block checksums with CRC32C on top of XFS)
- A file layout plug-in on either the client or the server side provides access to the distributed file fragments
- Investigating the impact of both options together with new encoding schemes
(diagram: XRootD client with file layout plug-in talking to XRootD servers, each with OFS plug-in, file layout plug-in, 4k block CRC32C checksums, XFS, controller and HDDs)
Measured Upload Performance
- File upload benchmark using eoscp with the RAIN I/O driver and a localhost XRootD server, parallel threads, (n,k) Reed-Solomon encoding
- Result: encoding/decoding can run within the available CPU budget without significant impact on throughput
Cloud Storage Evaluation
What is cloud storage anyway?
- Storage used by jobs running in a computational cloud
  - network latency determines what is usable, depending on the type of application
- Storage built from (computational) cloud node resources
  - storage lifetime = lifetime of the node
- Storage services by commercial (or private) providers
  - exploiting similar scaling concepts as computational clouds
  - clustered storage with remote access via a cloud protocol and modified storage semantics
- The term may be valid for all of the above, but here it refers to the last group of storage solutions
Why Cloud Storage & the S3 Protocol?
- Cloud computing and storage are rapidly gaining popularity, both as private infrastructure and as commercial services
- Several investigations are taking place in the HEP and broader science community
- The price point of commercial offerings may not (yet?) be comparable with services run at CERN or WLCG sites, but changes in semantics, protocols and deployment model promise increased scalability with reduced deployment complexity (TCO)
- The market is growing rapidly and we need to understand whether these promises can be confirmed with HEP workloads
- Need to understand how cloud storage will integrate with (or change) current HEP computing models
S3 Semantics and Protocol
- Simple Storage Service (Amazon S3): just a storage service
  - in contrast to e.g. Hadoop, which comes with a distributed computation model exploiting data locality
- Uses a language-independent REST API over http(s)
- Provides additional scalability by focussing on a defined subset of POSIX functionality and by partitioning the namespace into independent buckets
- The S3 protocol alone cannot provide scalability, e.g. if added on top of a traditional storage system
- Scalability gains need to be proven for each S3 implementation
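As a minimal illustration of the REST/bucket model, the sketch below uses the Python boto library to create a bucket and put/get an object against a private S3-compatible endpoint; the host name, credentials and object names are placeholders, not a real CERN service.

```python
# Minimal S3 put/get sketch using boto (2.x); endpoint and credentials are
# placeholders for a private S3-compatible service, not a real CERN endpoint.
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

conn = S3Connection(aws_access_key_id='ACCESS_KEY',
                    aws_secret_access_key='SECRET_KEY',
                    host='s3.example.org',        # placeholder private endpoint
                    is_secure=False,
                    calling_format=OrdinaryCallingFormat())

# Buckets partition the namespace into independent units.
bucket = conn.create_bucket('test-bucket')

# Objects are written and read as whole entities via HTTP PUT/GET.
key = bucket.new_key('dir/like/prefix/object1')
key.set_contents_from_string(b'hello cloud storage')
print(bucket.get_key('dir/like/prefix/object1').get_contents_as_string())
```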
CERN Interest in Cloud Storage
- Main interest: scalability and TCO
  - can we run cloud storage systems to complement or consolidate existing storage services?
- Focus: storage for physics and infrastructure
  - analysis disk pools, home directories, virtual machine image storage
  - which areas of the storage phase space can be covered well?
- First steps: set up and run a cloud storage service at PB scale; confirm scalability and/or deployment gains
Potential Interest for WLCG
- The S3 protocol could be a standard interface for access, placement or federation of physics data
- Allows providing (or buying) storage services without changes to user applications
  - large sites may provide private cloud storage on acquired hardware
  - smaller sites may buy S3 or rent capacity on demand
- First steps
  - successful deployment at one site (e.g. CERN)
  - demonstrate data distribution across sites (S3 implementations) according to experiment computing models
Component Layering in current SEs
- Protocol layer: local & WAN efficiency, federation support, identity & role mapping
  - random client I/O via rfio/xroot, sequential point-to-point put/get via gridftp
  - Grid-to-local user & role mapping and DM admin access via SRM / BeStMan
- Cluster layer: scaling for large numbers of concurrent clients
  - clustering & scheduling, {replicated} file cluster catalogue & metadata DB (CASTOR / DPM, EOS)
- Media layer: stable, manageable storage, scaling in volume per $ (including ops effort)
  - {reliable} media on (RAIDed) disk servers, or raw media on JBODs
Potential Future Scenarios
- Integrate the cloud storage market: buy, or build and integrate via a standard protocol
  - protocol layer: xroot, http, S3
  - cluster layer: EOS or cloud storage
  - media layer: cloud storage
- Need to prove the S3 TCO gains; is S3 alone functionally sufficient?
Common Work Items
Common items for the OpenStack/Swift and Huawei systems:
- Define the main I/O patterns of interest, based on measured I/O patterns in current storage services (e.g. archive, analysis pool, home dir, grid home?)
- Define, implement and test an S3 functionality test
  - define the S3 API areas of main importance
- Develop an S3 stress/scalability test (a minimal concurrency sketch follows below)
  - scale up to several hundred concurrent clients (planned for August)
  - copy-local and remote access scenarios
- Define key operational use cases and classify human effort and resulting service unavailability
  - add/remove disk servers (incl. draining)
  - h/w intervention by vendor (does not apply to Huawei)
  - s/w upgrade
  - power outage
- Compare performance and TCO metrics to existing services at CERN
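A minimal sketch of such a concurrency test is shown below, assuming boto and a placeholder endpoint: each client thread repeatedly uploads and downloads a fixed-size object and the aggregate rate is reported at the end. The actual test suite is more elaborate; this only outlines the structure.

```python
# Minimal concurrent S3 load sketch using boto and threads; endpoint,
# credentials and sizes are placeholders, not the actual CERN test suite.
import time
import threading
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

ENDPOINT = 's3.example.org'     # placeholder private S3 endpoint
N_CLIENTS = 32                  # scaled towards several hundred on real hardware
OBJ_SIZE = 4 * 1024 * 1024      # 4 MB objects
OPS_PER_CLIENT = 10

def worker(client_id, results):
    conn = S3Connection('ACCESS_KEY', 'SECRET_KEY', host=ENDPOINT,
                        is_secure=False, calling_format=OrdinaryCallingFormat())
    bucket = conn.create_bucket('stress-%d' % client_id)   # independent buckets
    payload = b'x' * OBJ_SIZE
    start = time.time()
    for i in range(OPS_PER_CLIENT):
        key = bucket.new_key('obj-%d' % i)
        key.set_contents_from_string(payload)   # upload
        key.get_contents_as_string()            # download
    results[client_id] = time.time() - start

if __name__ == '__main__':
    results = {}
    threads = [threading.Thread(target=worker, args=(i, results))
               for i in range(N_CLIENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total_bytes = 2.0 * N_CLIENTS * OPS_PER_CLIENT * OBJ_SIZE
    print('aggregate throughput: %.1f MB/s'
          % (total_bytes / max(results.values()) / 1e6))
```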
Work Items - Huawei Appliance
- Support commissioning of the Huawei storage system
  - 0.8 PB system in place at CERN
- Share the CERN test suite and results with Huawei
  - tests (including ROOT-based ones) are regularly run by the Huawei development team
- Perform continuous functional and performance tests to validate new Huawei s/w releases
  - maintain a list of unresolved operational or performance issues
- Schedule and execute large-scale stress tests
  - S3 test suite
  - HammerCloud with experiment applications
Work Items - OpenStack/Swift
- Actively participate in fixing CERN issues with the released software
  - build up internal knowledge about the Swift implementation and its defects
  - probe our ability to contribute to and influence the release content of the OpenStack foundation
- Run the same functionality and stress tests as for the Huawei system (see the Swift-native access sketch below)
- Visit larger-scale Swift deployments
  - get in-depth knowledge about the level of overlap between the open software and in-house additions/improvements
  - compare I/O patterns in typical deployments with CERN services
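Besides the S3 compatibility layer, Swift can also be exercised through its native API. The sketch below assumes the python-swiftclient library and a test proxy with the stock tempauth demo credentials; all values are placeholders for an actual deployment.

```python
# Minimal Swift-native put/get sketch assuming python-swiftclient; the proxy
# URL and tempauth credentials are the usual demo defaults, i.e. placeholders.
from swiftclient import client as swift

conn = swift.Connection(authurl='http://swift-proxy.example.org:8080/auth/v1.0',
                        user='test:tester', key='testing', auth_version='1')

conn.put_container('functests')
conn.put_object('functests', 'object1', contents=b'hello swift')

headers, body = conn.get_object('functests', 'object1')
print(headers.get('etag'), len(body))
```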
Ongoing tests
- Data upload/download with 4 kB to 100 MB files
  - extending to larger files (1 GB, 10 GB) to investigate the impact of data splitting on the storage side
- Multiple buckets: confirm the scale-out via independent metadata
- Number of clients: tested up to 200 concurrent clients (limited by client h/w availability)
- Multi byte-range reads (vector reads): extended the ROOT S3 plugin to support this (a byte-range sketch follows below)
- Fill a 2 x 10 Gb network connection
- Graceful behaviour in overload conditions?
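A vector read can be emulated over S3 by issuing one HTTP Range request per requested chunk; the sketch below does this with boto's per-request headers. The bucket/object names are placeholders, and this is not the actual ROOT S3 plugin code.

```python
# Emulate a ROOT-style vector read as a sequence of HTTP Range GETs via boto;
# names are placeholders and this is not the actual ROOT S3 plugin.
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

conn = S3Connection('ACCESS_KEY', 'SECRET_KEY', host='s3.example.org',
                    is_secure=False, calling_format=OrdinaryCallingFormat())
key = conn.get_bucket('analysis-data').get_key('events.root')

def vector_read(key, chunks):
    """Read a list of (offset, length) chunks, one Range request each."""
    out = []
    for offset, length in chunks:
        rng = 'bytes=%d-%d' % (offset, offset + length - 1)
        out.append(key.get_contents_as_string(headers={'Range': rng}))
    return out

# e.g. three small, scattered reads typical of partial-file analysis access
pieces = vector_read(key, [(0, 1024), (1 << 20, 4096), (16 << 20, 4096)])
print([len(p) for p in pieces])
```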
Preliminary Results
- OpenStack/Swift and Huawei reach similar performance (10-20% less) to EOS for full-file access with a small to moderate number of clients (O(100))
- Analysis-type access using the ROOT S3 plugin
  - naive use (no TTreeCache) of both S3 implementations shows significant overhead
  - with the cache and vector reads enabled, this overhead is removed (see the sketch below)
- S3fs (= FUSE-mounted S3 storage) almost reaches the same performance for jobs accessing 10-100% of a file, assuming local cache space (/tmp) is available
- Authentication and authorisation are not yet mapped from the certificates used in WLCG
- Plan to publish a more quantitative comparison at the autumn HEPiX
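For reference, enabling the TTree cache in a ROOT analysis amounts to a couple of calls on the tree before the event loop; the sketch below shows this in PyROOT with placeholder file and tree names, and the same calls apply whether the file is opened locally or through the S3 plugin.

```python
# Enabling TTreeCache in PyROOT; file URL and tree/branch names are
# placeholders, the same calls apply when reading through the ROOT S3 plugin.
import ROOT

f = ROOT.TFile.Open('local_copy_or_s3_url.root')   # placeholder URL
tree = f.Get('Events')                              # placeholder tree name

tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB read cache
tree.AddBranchToCache('*', True)      # cache all branches that are read
tree.SetCacheLearnEntries(10)         # let ROOT learn the access pattern

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                  # reads are now grouped into vector reads
```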
Summary
- The strategy of decoupling the archive from disk storage has been implemented at CERN, reducing the total deployment effort and the interference impact for experiment users
- CASTOR: tape improvements and simplified scheduling have further consolidated the archive area
- EOS: automation of deployment tasks and additional redundancy options are being completed
- The cloud storage evaluation with two S3 implementations started at CERN this year
  - performance of local S3-based storage looks so far comparable to the current production system
  - WAN replication and O(1000) client tests are coming up
  - a realistic TCO estimate cannot yet be made on a small (1 PB) test system without real user access