Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier



Similar documents
DSS. Data & Storage Services. Cloud storage performance and first experience from prototype services at CERN

Development of Monitoring and Analysis Tools for the Huawei Cloud Storage

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

Storage strategy and cloud storage evaluations at CERN Dirk Duellmann, CERN IT

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC

Improved metrics collection and correlation for the CERN cloud storage test framework

files without borders

HEP Compu*ng in a Context- Aware Cloud Environment

DSS. The Data Storage Services (DSS) Strategy at CERN. Jakub T. Moscicki. (Input from J. Iven, M. Lamanna A. Pace, A. Peters and A.

Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary

DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

Testing the In-Memory Column Store for in-database physics analysis. Dr. Maaike Limper

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

Evolution of Database Replication Technologies for WLCG

Shoal: IaaS Cloud Cache Publisher

Software installation and condition data distribution via CernVM File System in ATLAS

IPv6 Traffic Analysis and Storage

Aspera Direct-to-Cloud Storage WHITE PAPER

Solution for private cloud computing

Building low cost disk storage with Ceph and OpenStack Swift

(Possible) HEP Use Case for NDN. Phil DeMar; Wenji Wu NDNComm (UCLA) Sept. 28, 2015

The Evolution of Cloud Computing in ATLAS

Data storage services at CC-IN2P3

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

Potential of Virtualization Technology for Long-term Data Preservation

Computing Model for SuperBelle

Object storage in Cloud Computing and Embedded Processing

Distributed Database Access in the LHC Computing Grid with CORAL

Status and Evolution of ATLAS Workload Management System PanDA

Cloud services for the Fermilab scientific stakeholders

Big Data and Storage Management at the Large Hadron Collider

SUPPORT FOR CMS EXPERIMENT AT TIER1 CENTER IN GERMANY

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Technical Overview Simple, Scalable, Object Storage Software

The dcache Storage Element

Next Generation Operating Systems

Building a Volunteer Cloud

NT1: An example for future EISCAT_3D data centre and archiving?

Scientific Computing Data Management Visions

Aspera Direct-to-Cloud Storage WHITE PAPER

Cloud Accounting. Laurence Field IT/SDC 22/05/2014

HPC on AWS. Hiroshi Kobayashi, Dev./Lab. IT System HGST Japan, Ltd. Jun 3, 2015

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

Testing & Assuring Mobile End User Experience Before Production. Neotys

Maximizing Data Value with HUAWEI OceanStor. Success Cases

Establishing Applicability of SSDs to LHC Tier-2 Hardware Configuration

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Virtualization. (and cloud computing at CERN)

Integration of Network Performance Monitoring Data at FTS3

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari

Prototyping a file sharing and synchronisation platform with owncloud

Moving Virtual Storage to the Cloud

Evolution of the Italian Tier1 (INFN-T1) Umea, May 2009

Amazon Cloud Storage Options

Investigation of storage options for scientific computing on Grid and Cloud facilities

Computing at the HL-LHC

The safer, easier way to help you pass any IT exams. Exam : E Backup Recovery - Avamar Expert Exam for Implementation Engineers.

Network Infrastructure Services CS848 Project

ATLAS Tier 3

EMC ISILON X-SERIES. Specifications. EMC Isilon X200. EMC Isilon X210. EMC Isilon X410 ARCHITECTURE

Data Management Plan (DMP) for Particle Physics Experiments prepared for the 2015 Consolidated Grants Round. Detailed Version

Data and Storage Services

StorReduce Technical White Paper Cloud-based Data Deduplication

Moving Virtual Storage to the Cloud. Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage

dcache, list of topics

Maginatics Cloud Storage Platform for Elastic NAS Workloads

Data sharing and Big Data in the physical sciences. 2 October 2015

Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Self service for software development tools

Archiving On-Premise and in the Cloud. March 2015

Building a Parallel Cloud Storage System using OpenStack s Swift Object Store and Transformative Parallel I/O

Chapter 12 Distributed Storage

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Analisi di un servizio SRM: StoRM

Current Status of FEFS for the K computer

Enterprise Applications

CERN s Scientific Programme and the need for computing resources

ITCertMaster. Safe, simple and fast. 100% Pass guarantee! IT Certification Guaranteed, The Easy Way!

The Resilient Smart Grid Workshop Network-based Data Service

Status of Grid Activities in Pakistan. FAWAD SAEED National Centre For Physics, Pakistan

Large Scale Storage. Orlando Richards, Information Services LCFG Users Day, University of Edinburgh 18 th January 2013

Experience in integrating enduser cloud storage for CMS Analysis

Scala Storage Scale-Out Clustered Storage White Paper

Scientific Storage at FNAL. Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015

Key Messages of Enterprise Cluster NAS Huawei OceanStor N8500

Creating a Cloud Backup Service. Deon George

EMC IRODS RESOURCE DRIVERS

Batch and Cloud overview. Andrew McNab University of Manchester GridPP and LHCb

Finding a needle in Haystack: Facebook s photo storage IBM Haifa Research Storage Systems

Betriebssystem-Virtualisierung auf einem Rechencluster am SCC mit heterogenem Anwendungsprofil

WOS OBJECT STORAGE PRODUCT BROCHURE DDN.COM Full Spectrum Object Storage

Hadoop Architecture. Part 1

Storage Options in the AWS Cloud

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

A Survey on Cloud Storage Systems

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Open access to data and analysis tools from the CMS experiment at the LHC

Building Cost-Effective Storage Clouds A Metrics-based Approach

Transcription:

Using S3 cloud storage with ROOT and CernVMFS Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

INDEX Huawei cloud storages at CERN Old vs. new Huawei UDS comparative Multi-client ROOT benchmarks Real CernVMFS application Summary 14/04/2015 Maria Arsuaga-Rios CERN openlab 2

NEW HUAWEI UDS GENERATION Old generation 384 disks 768 TB 7 front-end nodes S3 multipart uploads not supported New generation 300 disks 1.2 PB 4 front-end nodes Improve S3 support: multipart uploads Less rack space More compact storage nodes 14/04/2015 Maria Arsuaga-Rios CERN openlab 3

Throughput tests 100MB downloads 100MB uploads Metadata tests 4kB downloads 4kB uploads OLD vs. NEW HUAWEI UDS COMPARATIVE 14/04/2015 Maria Arsuaga-Rios CERN openlab 4

OLD vs. NEW HUAWEI UDS Throughput Download Tests Download speed [GB/s] 3 2.5 2 1.5 1 0.5 1 front-end 2 front-ends 3 front-ends 4 front-ends 5 front-ends 6 front-ends 7 front-ends Full network bandwidth = 20Gb/s Front-ends scale up Old UDS Overstressed 0 0 20 40 60 80 100 120 140 160 180 200 220 Concurrent clients [count] 14/04/2015 Maria Arsuaga-Rios CERN openlab 5

Download speed [GB/s] 3 2.5 2 1.5 1 0.5 1 front-end 2 front-ends 3 front-ends 4 front-ends OLD vs. NEW HUAWEI UDS Throughput Download Tests New UDS Full network bandwidth = 20Gb/s Front-ends scale up 0 0 20 40 60 80 100 120 140 160 180 200 220 Concurrent clients [count] 14/04/2015 Maria Arsuaga-Rios CERN openlab 6

OLD vs. NEW HUAWEI UDS Metadata Download Tests Download speed [1,000 files/s] 30 1 front-end 2 front-ends 25 20 15 10 5 3 front-ends 4 front-ends 5 front-ends 6 front-ends 7 front-ends Old UDS Front-ends scale up 0 0 200 400 600 800 1000 1200 1400 1600 Concurrent clients [count] 14/04/2015 Maria Arsuaga-Rios CERN openlab 7

Download speed [1,000 files/s] 30 25 20 15 10 5 1 front-end 2 front-ends 3 front-ends 4 front-ends OLD vs. NEW HUAWEI UDS Metadata Download Tests New UDS Front-ends scale up 0 0 200 400 600 800 1000 1200 1400 1600 Concurrent clients [count] 14/04/2015 Maria Arsuaga-Rios CERN openlab 8

MULTI-CLIENT ROOT BENCHMARKS Full parametric Heterogeneous clients Access protocol compatibility Multi-client ROOT benchmarks Offline analysis Concurrent client accesses to real ATLAS ROOT files Sparseness, entries accessed 100%, 0.1%, 0.001% 2 use cases: 1 bucket 64 buckets Tested only in UDSs Ceph and Amazon do not support multi-range requests 14/04/2015 Maria Arsuaga-Rios CERN openlab 9

MULTI-CLIENT ROOT BENCHMARKS % Entries Total Entries = 11918 Branches = 5860 Size = 760 MB Total Entries Read Calls MB read NTuple ATLAS W and Z bosons Analysis 100% 11918 23 758.42 0.1% 12 12 320.92 0.001% 1 5 42.08 14/04/2015 Maria Arsuaga-Rios CERN openlab 10

MULTI-CLIENT ROOT BENCHMARKS 1 Bucket Old UDS New UDS Full network bandwidth =20Gb/s Does not scale as expected Scales up until too many clients access the same small amount of data! 14/04/2015 Maria Arsuaga-Rios CERN openlab 11

MULTI-CLIENT ROOT BENCHMARKS 64 Buckets Old UDS New UDS Full network bandwidth =20Gb/s Full network bandwidth =20Gb/s 14/04/2015 Maria Arsuaga-Rios CERN openlab 12

MULTI-CLIENT ROOT BENCHMARKS Summary Full bandwidth is reached by executing sparse accesses with both UDS generations UDS supports multi-range get requests which are common in HEP analysis E.g. Ceph and Amazon S3 do not support multi-range requests 14/04/2015 Maria Arsuaga-Rios CERN openlab 13

REAL CernVMFS APPLICATION Introduction What is CernVMFS Read-only cached file system to deliver software Widely used in WLCG (Worldwide LHC Computing Grid) Mounted by users and files are downloaded on demand CernVMFS back-end challenges Publishing new software should be as fast as possible Files are accessed through the HTTP 14/04/2015 Maria Arsuaga-Rios CERN openlab 14

REAL CernVMFS APPLICATION S3 Backend New features to CernVMFS (release 2.1.20) S3 compatible storage Supports multiple buckets and accounts LHCb nightly builds test case LHCb software (binary and source files) Releases done daily for over 3 months ~1 million files in release (~150k new files) Completed successfully with UDSs and Ceph 14/04/2015 Maria Arsuaga-Rios CERN openlab 15

REAL CernVMFS APPLICATION Deletion Process Deletions should not affect the cloud storage functionality Old UDS does not handle deletions correctly New UDS is fast enough to keep up with average LHCb nightly builds repository size growth Old versions can be deleted to make space for new ones 14/04/2015 Maria Arsuaga-Rios CERN openlab 16

SUMMARY Scalable throughput and metadata performance verified Successful scalability with different ROOT access patterns LHCb nightly builds software stack distributed with CernVMFS using UDS backend 14/04/2015 Maria Arsuaga-Rios CERN openlab 17

THANK YOU 14/04/2015 Maria Arsuaga-Rios CERN openlab 18