The DESY Big-Data Cloud Service
Peter van der Reest, on behalf of the project team
Slides by Patrick Fuhrmann
HEPiX Spring 2014

Content
- motivation
- project goals
- suggested solution and components
- quick introduction of ownCloud and dCache
- the proposed hybrid system
- status and issues

About DESY: science program
- Photon Science
  - Petra III, Flash
  - Center for FEL research: CFEL
  - European X-Ray Free Electron Laser: EXFEL
  - Center for structural system biology: CSSB
- Accelerator Research and Development
- High Energy Physics & Astroparticle Physics
  - Phasing out: HERA with Zeus, H1, Hermes, HERA-B
  - WLCG (Atlas, CMS and LHCb), Belle I and II, ILC
  - Cherenkov Telescope Array: CTA
  - IceCube

Why suddenly Cloud?
- due to the well-publicized political affairs, DESY is banning all non-local mail and storage providers: DESY data should be kept at DESY
- for mail we had a replacement right away
- no instant replacement for Dropbox, Google Drive
- solution had to be available asap
- so we had to design and engineer a well-featured cloud storage system for DESY within months

Project Goal
- currently maintained storage systems are focused on
  - Scientific Big Data
  - access with POSIX semantics
  - sharing via ACLs
- customers, especially new/young communities (Photon Science), are requesting Cloud storage semantics
- project objective: installation of a modern Cloud Storage System for scientists
  - within 6 months
  - integrated into the existing AAI and storage infrastructure
  - if possible: reducing the number of existing systems

We had to find out what Cloud means for our scientific customers.
- Big Data management
- support of scientific data lifecycle
- Web 2.0 feeling

Big Data management?
- unlimited storage space, pay per use
  - quotas are a no-go and/or pointless
- indestructible data store, never losing data
  - "Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years."
- offering Qualities of Service (related to costs)
  - Access Latency (how long do I have to wait)
  - Retention Policy (how safe is my data, durability)
- extremely high availability of storage service
  - no regular maintenance breaks beyond once a year, 4 days

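The arithmetic behind the quoted S3 durability figure can be checked in a few lines. This is only a worked example of the numbers on the slide; the function name is made up for illustration, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def years_per_lost_object(durability_nines: int, objects: int) -> Fraction:
    """Average number of years between single-object losses.

    durability_nines=11 corresponds to 99.999999999 % annual durability,
    i.e. an annual loss probability of 1e-11 per object.
    """
    annual_loss_probability = Fraction(1, 10 ** durability_nines)
    expected_losses_per_year = objects * annual_loss_probability
    return 1 / expected_losses_per_year


# Figures quoted on the slide: eleven nines, 10,000 stored objects.
years = years_per_lost_object(durability_nines=11, objects=10_000)
print(f"one lost object every {int(years):,} years on average")
```

With 10,000 objects the expected loss rate is 10,000 × 1e-11 = 1e-7 objects per year, which is one object per 10,000,000 years, matching the quote.
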
Scientific Data Lifecycle
- High Speed Data Ingest
- Fast Analysis: NFS 4.1/pNFS
- Visualization & Sharing by WebDAV
- Wide Area Transfers (Globus Online, FTS) by GridFTP

The Web 2.0 experience?
- easy sharing with
  - registered users and groups
  - the public (publishing, anonymous sharing)
- synchronizing (bidirectionally) with all relevant OSes
- access from mobile devices, preferably using upload/download integrated in the OS
- web browser access and configuration

The DESY Cloud: what does that mean for DESY?
- Big Data part
- Web 2.0? Here we need some help

Web 2.0 Cloud interface
- for the Web 2.0 interface we need some experts
- reusing previous evaluation (different context)
- going for the most popular solution
  - reduces the likelihood of the product disappearing
  - possibly building a user community
    - Germany: TU-Berlin, FZ-Jülich, TU-Dresden
    - internationally: CERN, HEPiX, United Nations

What exactly do we need from ownCloud?
- sync clients for all OSes
- upload/download clients for mobile devices
- sharing of data with individuals and groups (including public links)
- web browser based file access and configuration

Now, what's a dCache?

dCache cheat sheet
- dcache.org is an international collaboration, composed of developers and support people from DESY, Fermilab, NDGF and the HTW Berlin
- dCache is operated at about 70 sites around the world
- total space about 120 petabytes
- we store 50% of the entire WLCG storage
- biggest dCache holds about 50 petabytes
- largest dCache spans 4 countries

dCache spec for Dummies
- access protocols: NFS/pNFS, HTTP/WebDAV, GridFTP, xrootd, dcap
- unlimited hierarchical storage space
- Virtual File-system Layer
- dCache: automatic and manual media transitions
- media: SSDs, spinning disks, tape, Blu-ray

Starting with possibly the biggest: US-CMS Tier I
- 40 PBytes tape
- 14 PBytes on disk
- 770 Write Pools
- 420 Read Pools
- 26 Stage Pools
- Total: 260 Doors, 6 Head, 280 Pool/Door Physical Hosts
Information provided by Catalin Dumitrescu and Dmitry Litvintsev

To certainly the most widespread: one dCache, 4 countries
[Diagram: HPC Center North, Uni of Bergen, PDC, Uni of Oslo, CSC and National Supercomputer Center, connected via NORDUnet, with one dCache head node]
Slide stolen from Mattias Wadenstein, NDGF

To very likely the smallest: one machine, one process
- NFS 4.1 Door, WebDAV Door, PoolManager, Pool, gPlazma
- 1 TB
- 700 MHz ARM, 512 MB memory, 2 × USB 2, 100 Mbit/s Ethernet

dCache cheat sheet (cont.)
- protocol support
  - NFS 4.1 / pNFS (scalable NFS)
  - WebDAV
  - GridFTP (Grid transfers)
  - xrootd
  - dcap
- authentication and authorization support
  - Kerberos
  - user / password
  - X.509 (certificates and proxies)
  - LDAP/NIS or other information providers

What do we need from dCache?
- massive scale-out
- managed space (uptime, availability)
  - migration between media and decommissioning of hardware without downtime
- multi-protocol access (scientific use case)
  - NFS, CDMI (Cloud), WebDAV, GridFTP (Globus Online)
- Service Classes with automatic and manual transitions (Access Latency, Retention Policy)
  - hot spot detection
  - storage tiering: tape, spinning disk, SSDs

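To make the idea of service classes and transitions between them more concrete, here is a minimal sketch in Python. It is a toy model, not dCache code or configuration: the value names are borrowed from the common SRM storage-class vocabulary as an assumption, and the placement table and the `transition` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "online"      # readable immediately
    NEARLINE = "nearline"  # may need staging before it can be read


class RetentionPolicy(Enum):
    REPLICA = "replica"      # lower protection against loss
    CUSTODIAL = "custodial"  # higher protection against loss


@dataclass(frozen=True)
class ServiceClass:
    latency: AccessLatency
    retention: RetentionPolicy


# Hypothetical placement rule: which media hold a copy for each class.
PLACEMENT = {
    ServiceClass(AccessLatency.ONLINE, RetentionPolicy.REPLICA): ("disk",),
    ServiceClass(AccessLatency.ONLINE, RetentionPolicy.CUSTODIAL): ("disk", "tape"),
    ServiceClass(AccessLatency.NEARLINE, RetentionPolicy.CUSTODIAL): ("tape",),
}


def transition(current: ServiceClass, target: ServiceClass):
    """Return (media to add, media to drop) for a service-class change."""
    before = set(PLACEMENT[current])
    after = set(PLACEMENT[target])
    return after - before, before - after


disk_only = ServiceClass(AccessLatency.ONLINE, RetentionPolicy.REPLICA)
archive = ServiceClass(AccessLatency.NEARLINE, RetentionPolicy.CUSTODIAL)
to_add, to_drop = transition(disk_only, archive)
print("add copies on:", sorted(to_add), "drop copies on:", sorted(to_drop))
```

A manual transition would be a user asking for a different class; an automatic one could be triggered by a policy such as hot spot detection. Both reduce to the same "add here, drop there" step in this model.
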
What does the integration look like?

dCache ownCloud integration
- ownCloud: Web 2.0, sync & share
- dCache: unlimited hierarchical storage space; NFS 4.1, GridFTP, WebDAV
- media: SSDs, spinning disks, tape, Blu-ray

dCache ownCloud: Scientific Data Lifecycle
- GridFTP: Globus Online
- NFS 4.1 / pNFS: HPC, HTC
- unlimited hierarchical storage space

dCache ownCloud: what does it look like for the user?
- My ownCloud Home: sync, share, Web 2.0
- My dCache XXL Home: NFS 4.1/pNFS, GridFTP, WebDAV, SRM (some private Grid protocols), dcap, xrootd

dCache ownCloud: scalability (NFS 4.1/pNFS does it)
[Diagram: three NFS clients and three pNFS doors]

dCache ownCloud integration
- simply running ownCloud on dCache was the easy bit and works nicely
- dCache provides an NFSv4.1/pNFS interface, which makes it look like a regular file system
- this is exactly what ownCloud needs
- the fact that dCache doesn't allow files to be modified doesn't really bother ownCloud

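One generic way an application can live with a file system that does not allow in-place modification is to write every new version as a fresh file and rename it over the old one. The sketch below shows that pattern in Python; whether ownCloud updates files exactly this way is not stated in the slides, and `replace_file` is a hypothetical helper name.

```python
import os
import tempfile


def replace_file(path: str, data: bytes) -> None:
    """Store a new version of `path` without modifying the old file in place.

    The new content goes into a temporary file in the same directory,
    which is then renamed over the target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # rename over the previous version
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Every version is written exactly once and never reopened for writing, which is the only access pattern a write-once store has to support.
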
But how about ownership?
- Ownership
  - files owned by patrick in ownCloud are owned by apache/owncloud in dCache
  - this prevents us from using the same data with NFS 4.1, GridFTP or CDMI from dCache
  - Tigran Mkrtchyan solved that issue: files are now owned by patrick, but fully accessible by apache/owncloud
- dCache ACLs versus ownCloud sharing
  - files shared in ownCloud should ideally have similar ACLs in dCache
  - data shared in ownCloud is not automatically shared in dCache

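The two points above (the real user owns the file while the web service account keeps full access, and an ownCloud share ideally showing up as an ACL entry) can be illustrated with a toy permission model. This is a sketch only: it does not use dCache's ACL interface, and the class and function names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    owner: str
    # principal -> set of rights, e.g. {"owncloud": {"read", "write"}}
    acl: dict = field(default_factory=dict)


def allowed(entry: FileEntry, principal: str, right: str) -> bool:
    """The owner may do anything; everyone else needs a matching ACL entry."""
    if principal == entry.owner:
        return True
    return right in entry.acl.get(principal, set())


def share(entry: FileEntry, principal: str, rights=("read",)) -> None:
    """Mirror a share as an ACL entry so other protocols would see it too."""
    entry.acl.setdefault(principal, set()).update(rights)


# File owned by the real user, but fully accessible by the service account.
data = FileEntry(owner="patrick", acl={"owncloud": {"read", "write"}})

print(allowed(data, "owncloud", "write"))  # service account can still write
print(allowed(data, "helge", "read"))      # not shared yet
share(data, "helge")                       # what a share would ideally trigger
print(allowed(data, "helge", "read"))
```

The open issue named on the slide corresponds to the missing `share` call: without it, a share exists only inside ownCloud and is invisible to NFS, GridFTP or CDMI access.
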
Sharing & ACL issue
[Diagram: Web 2.0 sync and share; DESY LDAP, Kerberos; NFS, WebDAV, GridFTP, CDMI]

More issues besides the permission one

Name Space Issue
[Diagram: "We have" versus "We need" name space layouts for the users Patrick, Helge and Sandy]

What we need
- WebDAV redirection to our nodes
- WebDAV/HTTP redirect

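On the client side, following an HTTP redirect means resolving the `Location` header of the response against the URL that was requested. The sketch below shows only that step in Python; the host names are placeholders, and nothing here describes how ownCloud or dCache actually issue redirects.

```python
from urllib.parse import urljoin

# Standard HTTP redirect status codes.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(request_url: str, status: int, headers: dict):
    """Return the URL to retry against, or None if the response is no redirect."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be absolute or relative to the request URL.
    return urljoin(request_url, location)


# Placeholder hosts: a front-end answers and points at a storage node.
target = next_url(
    "https://cloud.example.org/webdav/run42.dat",
    307,
    {"Location": "https://node1.example.org:2880/run42.dat"},
)
print(target)
```
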
What actually would be good
- instead of requiring a mounted filesystem (POSIX) for ownCloud primary space, a network API/protocol would be better
- best would be a standard (e.g. Cloud Data Management Interface, CDMI)
  - CDMI is provided by big vendors
  - allows handling metadata, user and ownership as well

What's done
- we already installed two systems
  - one connected to the DESY LDAP for DESY account holders
  - one with the dcache.org private cloud
    - for HTW students (different user contract :-)
    - self registration with any valid certificate
- most features are already available
- ordering more hardware
  - about 200 terabytes on top of the 100 terabytes which are already deployed in the above two systems

What's still missing?
- access to ownCloud defined in the DESY User Registry, resulting in group membership in DESY LDAP
  - the platform adapter needs to be completed
  - full account lifecycle: create, expire, archive, change group membership, terminate service etc.
- external users integration, e.g. light-weight accounts
- customizing the ownCloud name space to support our scheme
- evaluation of an ownCloud sync client working against dCache directly, by an HTW student

ToDo: Testing and verification
- defining a set of reproducible tests, which we can run on about 20-30 lab machines
  - verify scalability
  - check against future dCache or ownCloud updates
    - functional
    - performance

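A reproducible functional test can be as small as a checksum round trip: write deterministic data to the storage, read it back and compare hashes. The sketch below assumes the storage is reachable as a directory (for example an NFS mount); the function name, file name and seed are illustrative and not part of the project's test plan.

```python
import hashlib
import os


def round_trip_ok(directory: str, size: int = 1024 * 1024,
                  seed: bytes = b"cloud-test") -> bool:
    """Write deterministic data under `directory`, read it back, compare checksums."""
    # Same seed and size give the same payload on every machine and every run.
    block = hashlib.sha256(seed).digest()          # 32 bytes
    payload = block * (size // len(block))
    expected = hashlib.sha256(payload).hexdigest()

    path = os.path.join(directory, "roundtrip.bin")
    with open(path, "wb") as out:
        out.write(payload)
    try:
        with open(path, "rb") as back:
            actual = hashlib.sha256(back.read()).hexdigest()
    finally:
        os.remove(path)  # leave the test directory as we found it
    return actual == expected
```

Because the payload is derived from a fixed seed, the same check could be run unchanged on each lab machine and again after a dCache or ownCloud update; timing the call would give a first, rough performance figure.
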
Further timeline
- we expect to have a full pre-production system ready in about 6-8 weeks
- DESY IT colleagues and HTW students will continue to be guinea pigs (or anyone managing to find the dcache.org cloud storage registration page)
- next report at HEPiX Fall 2014

Further reading
- www.dcache.org

dCache Big Data Cloud
- LOFAR antenna: huge amounts of data
- X-FEL (Free Electron Lasers): fast ingest