The DESY Big-Data Cloud Service




The DESY Big-Data Cloud Service
Peter van der Reest, on behalf of the project team
Slides by Patrick Fuhrmann
The DESY BIG DATA Cloud Service, PvdR, HEPiX Spring 2014

Content
- motivation
- project goals
- suggested solution and components
- quick introduction of: ownCloud, dCache, the proposed hybrid system
- status and issues

About DESY: science program
- Photon Science
  - Petra III, Flash
  - Center for FEL research: CFEL
  - European X-Ray Free Electron Laser: EXFEL
  - Center for structural system biology: CSSB
- Accelerator Research and Development
- High Energy Physics & Astroparticle Physics
  - phasing out: HERA with Zeus, H1, Hermes, HERA-B
  - WLCG (Atlas, CMS and LHCb), Belle I and II, ILC
  - Cherenkov Telescope Array: CTA
  - IceCube

Why suddenly Cloud?
- due to the well publicized political affairs, DESY is banning all non-local mail and storage providers: DESY data should be kept at DESY
- for mail we had a replacement right away
- no instant replacement for DropBox, GoogleDrive
- a solution had to be available asap
- so we had to design and engineer a well featured cloud storage system for DESY within months

Project Goal
- currently maintained storage systems are focused on Scientific Big Data
  - access with POSIX semantics
  - sharing via ACLs
- customers, especially new/young communities (Photon Science), are requesting Cloud storage semantics
- project objective: installation of a modern Cloud Storage System for scientists
  - within 6 months
  - integrated into the existing AAI and storage infrastructure
- if possible: reducing the number of existing systems

We had to find out what Cloud means for our scientific customers.
- Big Data management
- support of the scientific data lifecycle
- Web 2.0 feeling

Big Data management?
- unlimited storage space, pay per use
  - quotas are a no go and/or pointless
- indestructible data store, never losing data
  - "Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years."
- offering Qualities of Service (related to costs)
  - Access Latency (how long do I have to wait)
  - Retention Policy (how safe is my data, durability)
- extremely high availability of the storage service
  - no regular maintenance breaks beyond once a year, 4 days
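
The arithmetic behind the quoted S3 figure can be checked directly. This is a minimal sketch, assuming the 99.999999999% figure is an independent per-object probability of surviving one year (an interpretation, not something the slide states). The variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted on the slide.
durability = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999 % per object per year
objects_stored = 10_000

# Assumption: each object is lost independently with this annual probability.
annual_loss_probability = 1 - durability

# Expected number of lost objects per year across the whole store.
expected_losses_per_year = objects_stored * annual_loss_probability

# Mean number of years between single-object losses.
years_per_lost_object = 1 / expected_losses_per_year

print(f"annual loss probability per object: {float(annual_loss_probability):.0e}")
print(f"years per lost object: {int(years_per_lost_object):,}")
```

Under that assumption the result matches the quoted "once every 10,000,000 years".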

Scientific Data Lifecycle
- High Speed Data Ingest
- Fast Analysis: NFS 4.1/pNFS
- Visualization & Sharing by WebDAV
- Wide Area Transfers (Globus Online, FTS) by GridFTP

The Web 2.0 experience?
- easy sharing with
  - registered users and groups
  - the public (publishing, anonymous sharing)
- synchronizing (bidirectionally) with all relevant OSes
- access from mobile devices, preferably using upload/download integrated in the OS
- web browser access and configuration

The DESY Cloud: what does that mean for DESY?
- Big Data part
- Web 2.0? Here we need some help

Web 2.0 Cloud interface
- for the Web 2.0 interface we need some experts
- reusing a previous evaluation (different context)
- going for the most popular solution
  - reduces the likelihood of the product disappearing
  - possibly building a user community
    - Germany: TU Berlin, FZ Jülich, TU Dresden
    - internationally: CERN, HEPiX, United Nations

What exactly do we need from ownCloud
- sync clients for all OSes
- upload/download clients for mobile devices
- sharing of data with individuals and groups (including public links)
- web browser based file access and configuration

Now, what's a dCache?

dCache cheat sheet
- dcache.org is an international collaboration, composed of developers and support people from DESY, Fermilab, NDGF and the HTW Berlin
- dCache is operated at about 70 sites around the world
- total space about 120 Petabytes
- we store 50 % of the entire WLCG storage
- the biggest dCache holds about 50 Petabytes
- the largest dCache spans 4 countries

dCache spec for Dummies
- access protocols: NFS/pNFS, HTTP/WebDAV, GridFTP, xrootd, dcap
- unlimited hierarchical storage space
- virtual file-system layer
- automatic and manual media transitions: SSDs, spinning disks, tape, Blu-ray

Starting with possibly the biggest: US-CMS Tier I
- 40 PBytes on tape, 14 PBytes on disk
- 770 write pools, 420 read pools, 26 stage pools
- total: 260 doors; physical hosts: 6 head, 280 pool/door
Information provided by Catalin Dumitrescu and Dmitry Litvintsev

To certainly the most widespread: 4 countries, one dCache
- sites shown: HPC Center North, Uni of Bergen, PDC, Uni of Oslo, CSC, National Supercomputer Center
- dCache head node, connected via NorduNet
Slide stolen from Mattias Wadenstein, NDGF

To very likely the smallest: one machine, one process
- services: NFS 4.1 door, WebDAV door, PoolManager, pool (1 TB), gPlazma
- hardware: 700 MHz ARM, 512 MB memory, 2 * USB 2, 100 Mbit Ethernet

dCache cheat sheet (cont.)
- protocol support
  - NFS 4.1 / pNFS (scalable NFS)
  - WebDAV
  - GridFTP (Grid transfers)
  - xrootd
  - dcap
- authentication and authorization support
  - Kerberos
  - user / password
  - X509 (certificates and proxies)
  - LDAP/NIS or other information providers
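
To make the WebDAV entry concrete, here is a minimal sketch of a directory listing request (PROPFIND, as defined by the WebDAV standard) built with the Python standard library. The host name and path are placeholders and do not refer to a real door. Authentication is left out, and the request is only constructed, not sent.

```python
import urllib.request

# Hypothetical endpoint; replace with the URL of a real WebDAV door.
WEBDAV_URL = "https://dcache.example.org/data/patrick/"


def build_propfind(url: str, depth: str = "1") -> urllib.request.Request:
    """Build a WebDAV PROPFIND request listing a collection and its children."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


request = build_propfind(WEBDAV_URL)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. A real deployment would also need one of the authentication methods listed above.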

What do we need from dCache
- massive scale out
- managed space (uptime, availability)
  - migration between media and decommissioning of hardware w/o downtime
- multi-protocol access (scientific use case)
  - NFS, CDMI (Cloud), WebDAV, GridFTP (Globus Online)
- Service Classes with automatic and manual transitions (Access Latency, Retention Policy)
- hot spot detection
- storage tiering: tape, spinning disk, SSDs
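
The idea of service classes with transitions can be sketched as a small data model. The value names ONLINE/NEARLINE and REPLICA/CUSTODIAL follow common SRM terminology. The mapping from a service class to storage media is a simplifying assumption made for illustration and is not taken from the slides or from dCache's configuration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "online"      # a copy is on disk and readable immediately
    NEARLINE = "nearline"  # data may have to be staged before reading


class RetentionPolicy(Enum):
    REPLICA = "replica"      # lower durability, e.g. a disk copy only
    CUSTODIAL = "custodial"  # higher durability, e.g. a tape copy


@dataclass(frozen=True)
class ServiceClass:
    latency: AccessLatency
    retention: RetentionPolicy

    def media(self) -> set[str]:
        """Media that must hold a copy (hypothetical mapping)."""
        required = set()
        if self.latency is AccessLatency.ONLINE:
            required.add("disk")
        if self.retention is RetentionPolicy.CUSTODIAL:
            required.add("tape")
        return required or {"disk"}


def transition(old: ServiceClass, new: ServiceClass) -> dict[str, set[str]]:
    """Copies to create and copies that may be dropped when changing class."""
    return {"add": new.media() - old.media(), "drop": old.media() - new.media()}


disk_only = ServiceClass(AccessLatency.ONLINE, RetentionPolicy.REPLICA)
archive = ServiceClass(AccessLatency.NEARLINE, RetentionPolicy.CUSTODIAL)
plan = transition(disk_only, archive)
print(plan)
```

In this toy model, moving a file from the disk-only class to the archive class means writing a tape copy, after which the disk copy may be released.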

What does the integration look like?

dCache ownCloud Integration
- ownCloud: Web 2.0 sync & share
- dCache: unlimited hierarchical storage space, accessed via NFS 4.1, GridFTP, WebDAV
- media: SSDs, spinning disks, tape, Blu-ray

dCache ownCloud: Scientific Data Lifecycle
- GridFTP: Globus Online
- NFS 4.1 / pNFS: HPC, HTC
- unlimited hierarchical storage space

dCache ownCloud: what does it look like for the user
- My ownCloud Home: Sync, Share, Web 2.0
- My dCache XXL Home: NFS 4.1/pNFS, GridFTP, WebDAV, SRM (some private Grid protocols), dcap, xrootd

dCache ownCloud: Scalability (NFS4.1/pNFS does it)
Diagram: three NFS clients and three pNFS doors.

dCache ownCloud integration
- simply running ownCloud on dCache was the easy bit and works nicely
- dCache provides an NFSv4.1/pNFS interface, which makes it look like a regular file system
  - this is exactly what ownCloud needs
- the fact that dCache doesn't allow files to be modified doesn't really bother ownCloud
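
One way an application can live with files that cannot be modified in place is to replace them as a whole: write the new version under a temporary name, then rename it over the old one. The sketch below shows that pattern on a local file system. It is an illustration only. The slides do not say that ownCloud works exactly this way, and the behaviour of a rename on a dCache mount is not verified here.

```python
import os
import tempfile


def replace_file(path: str, new_content: bytes) -> None:
    """Replace a file as a whole instead of modifying it in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
        os.replace(tmp_path, path)  # the old version is swapped out in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "notes.txt")
    replace_file(target, b"version 1")
    replace_file(target, b"version 2")
    with open(target, "rb") as handle:
        result = handle.read()

print(result)
```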

But how about ownership?
Ownership
- files owned by patrick in ownCloud are owned by apache/owncloud in dCache
- this prevents us from using the same data with NFS4.1, GridFTP or CDMI from dCache
- Tigran Mkrtchyan solved that issue: files are now owned by patrick, but fully accessible by apache/owncloud
dCache ACLs versus ownCloud sharing
- files shared in ownCloud should ideally have similar ACLs in dCache
- data shared in ownCloud is not automatically shared in dCache
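
The ownership fix described above can be pictured with a toy permission model: the file belongs to the real user, while the web server account keeps full access through an additional entry. This is a hypothetical sketch. The class, function and permission names are invented for illustration and do not reflect dCache's actual ACL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    owner: str
    # Maps a principal to the set of permissions granted to it.
    acl: dict = field(default_factory=dict)


def store_for_user(user: str, service_account: str = "apache") -> FileEntry:
    """Create an entry owned by the user but fully accessible to the service."""
    full_access = {"read", "write"}
    return FileEntry(
        owner=user,
        acl={user: set(full_access), service_account: set(full_access)},
    )


def may_access(entry: FileEntry, principal: str, permission: str) -> bool:
    """Return True if the principal holds the given permission."""
    return permission in entry.acl.get(principal, set())


entry = store_for_user("patrick")
print(entry.owner, may_access(entry, "apache", "write"), may_access(entry, "helge", "read"))
```

In this model a share created in ownCloud would still need an explicit extra ACL entry before other users could read the file through NFS, GridFTP or CDMI, which is the gap the second half of the slide describes.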

Sharing & ACL issue
- Web 2.0: Sync, Share
- DESY LDAP, Kerberos
- NFS, WebDAV, GridFTP, CDMI

More issues besides the permission one

Name Space Issue
Diagram: "We have" versus "We need", comparing two namespace layouts for the users Patrick, Helge and Sandy.

What we need
- WebDAV redirection to our nodes
- WebDAV/HTTP redirect

What actually would be good
- instead of requiring a mounted filesystem (POSIX) for the ownCloud primary space, a network API/protocol would be better
- best would be a standard (e.g. Cloud Data Management Interface, CDMI)
  - CDMI is provided by big vendors
  - it allows handling metadata, user and ownership as well
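
For a flavour of what a network API with metadata support looks like, the sketch below builds a CDMI-style request that creates a data object together with user metadata. The URL, the metadata key and the specification version string are assumptions chosen for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; not a real service.
CDMI_URL = "https://cloud.example.org/cdmi/patrick/report.txt"


def build_cdmi_put(url: str, text: str, metadata: dict) -> urllib.request.Request:
    """Build a request that stores a data object plus its metadata."""
    body = json.dumps(
        {"mimetype": "text/plain", "metadata": metadata, "value": text}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "X-CDMI-Specification-Version": "1.0.2",  # assumed version
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
        },
    )


cdmi_request = build_cdmi_put(CDMI_URL, "hello", {"experiment": "demo"})
print(cdmi_request.get_method(), cdmi_request.full_url)
```

Unlike a plain file write through a mounted file system, the payload and its metadata travel in one request.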

What's done
- we already installed two systems
  - one connected to the DESY LDAP for DESY account holders
  - one with the dcache.org private cloud
    - for HTW students (different user contract :-)
    - self registration with any valid certificate
- most features are already available
- ordering more hardware
  - about 200 Terabytes on top of the 100 Terabytes which are already deployed in the above two systems

What's still missing?
- access to ownCloud defined in the DESY User Registry, resulting in group membership in DESY LDAP
  - the platform adapter needs to be completed
  - full account lifecycle: create, expire, archive, change group membership, terminate service etc.
- integration of external users, e.g. light-weight accounts
- customizing the ownCloud name space to support our scheme
- evaluation of an ownCloud sync client working against dCache directly, by an HTW student

ToDo: Testing and verification
- defining a set of reproducible tests, which we can run on about 20-30 lab machines
  - verify scalability
  - check against future dCache or ownCloud updates
    - functional
    - performance
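
A reproducible functional test can be as simple as a write, read-back and checksum comparison against the mounted storage. The sketch below is an assumption about what such a test might look like, not the project's actual test suite. It runs against a temporary local directory, which on a lab machine would be replaced by the storage mount point.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def roundtrip_ok(directory: str, size_bytes: int = 256 * 1024) -> bool:
    """Write random data, read it back and compare checksums."""
    payload = os.urandom(size_bytes)
    expected = hashlib.sha256(payload).hexdigest()
    path = os.path.join(directory, "roundtrip-test.bin")
    with open(path, "wb") as handle:
        handle.write(payload)
    try:
        return sha256_of(path) == expected
    finally:
        os.remove(path)  # leave the directory clean for the next run


# Stand-in for the storage mount point under test.
with tempfile.TemporaryDirectory() as scratch:
    passed = roundtrip_ok(scratch)

print("round trip passed:", passed)
```

Running the same function from many machines at once, with timing added around it, would give a first scalability and performance probe.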

Further timeline
- we expect to have a full pre-production system ready in about 6-8 weeks
- DESY IT colleagues and HTW students will continue to be guinea pigs (or anyone managing to find the dcache.org cloud storage registration page)
- next report at HEPiX Fall 2014

Further reading: www.dcache.org

dCache Big Data Cloud
- LOFAR antenna: huge amounts of data
- X-FEL (Free Electron Lasers): fast ingest