The DESY Big-Data Cloud System



Similar documents
The DESY Big-Data Cloud Service

Patrick Fuhrmann. The DESY Storage Cloud

How To Share Data With The Cloud On Dcache

The dcache Storage Element

DESYcloud: an owncloud & dcache update

Scientific Storage at FNAL. Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015

Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft. dcache Introduction

Preview of a Novel Architecture for Large Scale Storage

Zadara Storage Cloud A

Next Generation Tier 1 Storage

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Data storage services at CC-IN2P3

DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

Managed GRID or why NFSv4.1 is not enough. Tigran Mkrtchyan for dcache Team

Maurice Askinazi Ofer Rind Tony Wong. Cornell Nov. 2, 2010 Storage at BNL

IBM ELASTIC STORAGE SEAN LEE

CMS Tier-3 cluster at NISER. Dr. Tania Moulik

The HP IT Transformation Story

Nexenta Performance Scaling for Speed and Cost

Breaking the Storage Array Lifecycle with Cloud Storage

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

Analisi di un servizio SRM: StoRM

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Experience in integrating enduser cloud storage for CMS Analysis

Mass Storage at GridKa

SURFsara HPC Cloud Workshop

SURFsara Data Services

SwiftStack Filesystem Gateway Architecture

irods at CC-IN2P3: managing petabytes of data

Solution for private cloud computing

High Availability Databases based on Oracle 10g RAC on Linux

HAMBURG ZEUTHEN. DESY Tier 2 and NAF. Peter Wegner, Birgit Lewendel for DESY-IT/DV. Tier 2: Status and News NAF: Status, Plans and Questions

INDIGO-DataCloud Wupi 4 (Resource Virtualization)

Understanding Object Storage and How to Use It

SURFsara HPC Cloud Workshop

Understanding Enterprise NAS

Tier0 plans and security and backup policy proposals

<Insert Picture Here> Cloud Archive Trends and Challenges PASIG Winter 2012

owncloud Architecture Overview

SOFTWARE-DEFINED STORAGE IN ACTION

owncloud Architecture Overview

WHITE PAPER. QUANTUM LATTUS: Next-Generation Object Storage for Big Data Archives

SOFTWARE DEFINED STORAGE IN ACTION

Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF

(Scale Out NAS System)

Advanced Computing. for large DESY. Volker Guelzow IT-Gruppe Hamburg, Oct 29th, 2014

(Possible) HEP Use Case for NDN. Phil DeMar; Wenji Wu NDNComm (UCLA) Sept. 28, 2015

SQL Server Virtualization

Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware

Cluster, Grid, Cloud Concepts

Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India talk2tamanna@gmail.com

Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

Storage strategy and cloud storage evaluations at CERN Dirk Duellmann, CERN IT

OPTIMIZING PRIMARY STORAGE WHITE PAPER FILE ARCHIVING SOLUTIONS FROM QSTAR AND CLOUDIAN

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week

Zero Downtime In Multi tenant Software as a Service Systems


High Performance Server SAN using Micron M500DC SSDs and Sanbolic Software

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

Quantum StorNext. Product Brief: Distributed LAN Client

Best Practices for Deploying Citrix XenDesktop on NexentaStor Open Storage

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Computing at the HL-LHC

Forschungszentrum Karlsruhe in der Helmholtz - Gemeinschaft. Holger Marten. Holger. Marten at iwr. fzk. de

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

Diagram 1: Islands of storage across a digital broadcast workflow

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

Increasing Storage Performance, Reducing Cost and Simplifying Management for VDI Deployments

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

KIT Site Report. Andreas Petzold. STEINBUCH CENTRE FOR COMPUTING - SCC

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

Amazon Cloud Storage Options

Transcription:

The DESY Big-Data Cloud System Patrick Fuhrmann On behave of the project team The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 1

Content (on a good day) About DESY Project Goals Suggested Solution and components Quick introduction of dcache owncloud The proposed hybrid System Status and issues The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 2

About DESY One laboratory, two locations: Hamburg Zeuthen The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 3

About DESY Science in Hamburg High Energy Physics Outphasing: HERA with Zeus, H1, Hermes, HERA-B WLCG (Atlas, CMS and LHCb): CERN Belle I and II (KEK) : Japan International Linear Collider (In planning) : Japan Photon Science Petra III, Flash CFEL X-Ray Free Electron Laser, XFEL (in preparation) Some biology The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 4

About DESY (Cont.) Science in Zeuthen is mostly Astro-Particle Physics Cherenkov Telescope Array (CTA) North and South Hemisphere Ice Cube South Pole (Neutrino) The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 5

Consequence for DESY IT About 20 Years of experience with huge amounts of Data, especially Tier 0 for HERA Petra III and XFEL (several Petabytes / month) Tier II for the World Wide LHC Computing Grid 15 Petabyes / year ( currently 200 Petabytes in total) DESY is currently operating/managing 11 Petabytes on Disk and 15 Petabytes on Tape mostly managed with dcache serving 10.000 cores in total from 6 storage instances. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 6

Why suddenly Cloud? Due to the well know political affaires, DESY banned all non-local mail and storage providers. For mail we had a replacement right away No replacement for DropBox Replacement had to be available asap. So we had to find a Cloud system for DESY within months. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 7

Project Goal Currently maintained storage systems are focused on Scientific Big Data. Access with POSIX semantics Sharing via ACLs. Customers, especially new/young communities (Photon Science), are requesting Cloud storage semantics. Project Objective: Installation of a modern Cloud Storage System for scientists within 6 months. Integrated into the existing AAI and storage infrastructure. If possible: Reducing amount of existing systems. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 8

We had to find out what Cloud means for our scientific customers. Big Data management Support of Scientific data lifecycle Web 2.0 feeling The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 9

The Big Data management? Unlimited storage space, pay per use Quotas are a no go and pointless Indestructible data store, never loosing data Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. Different Quality of Services (payments) Access Latency (How long do I have to wait) Retention Policy (How save is my data, durability) Extremely high availability of storage service No regular maintenance breaks below once a year, 4 days The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 10

Scientific Data Lifecycle High Speed Data Ingest Fast Analysis NFS 4.1/pNFS Visualization & Sharing by WebDAV Wide Area Transfers (Globus Online, FTS) by GridFTP The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 11

The Web 2.0 experience? Easy sharing with Registered Users and Groups The public (publishing) Synchronizing (bidirectional) with all relevant OS es Access from mobile devices, preferable upload/download OS integrated. Web Browser access and configuration The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 12

The DESY Cloud What does that mean for DESY? Big Data Part Web 2.0? Here we need some help The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 13

Web 2.0 Cloud interface For the web 2.0 interface we needed some experts. Not much time for evaluation. Going for the most popular solution Reduce likelihood for product disappearing Possibly building a user-community (like today) TU-Berlin, FZ-Jülich, TU-Dresden **** CERN, United Nations CERN is evaluating a similar approach and we are in contact anyway (WLCG) The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 14

What exactly do we need from owncloud The sync clients for all OS s Upload/download clients for mobile devices Sharing of data with individuals and groups (including public links) Web Browser based file access and configuration That s it for now. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 15

Now, what s a dcache? The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 16

dcache Cheat - sheet dcache.org is an international Collaboration, composed of developers and support people from DESY, Fermilab, NDGF and the HTW Berlin. dcache is operated on about 70 sites around the world. Total space about 120 Petabytes. We store 50 % of the entire WLCG storage. Biggest dcache holds about 50 Petabytes. Larges dcache spans 4 countries. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 17

dcache spec for Dummies NFS/pNFS httpwebdav gridftp xrootd/dcap Unlimited hierarchical Storage Space Virtual File-system Layer dcache Automatic and Manual Media transitions SSDs Spinning Disks Tape, Blue Ray The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 18

Starting with possibly the biggest 40 PBytes Tape 770 Write Pools 420 Read Pools 26 Stage Pools US-CMS Tier I 14 PBytes on Disk *** Total: 260 Doors 6 Head 280 Pool/Door Physical Hosts Information provided by Catalin Dumitrescu and Dmitry Litvintsev The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 19

To certainly the most widespread 4 Countries One dcache Slide stolen from Mattias Wadenstein, NDGF The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 20

To very likely the smallest One Machine One Process NFS 4.1 Door WebDAV Door PoolManager Pool 1 TB gplazma 700 MHz ARM 512 MB Memory 2 * USB 2 100 MB Ethernet The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 21

dcache cheat sheet (cont) Protocol support NFS 4.1 / pnfs (scalable NFS) WedDAV GridFTP (Grid transfers) xrootd dcap User/Authz support Kerberos User / password LDAP X509 (Certificates and Proxies) The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 22

What do we need from dcache Scales out massively Managed space (Uptime) Migration between media and decommissioning of hardware w/o downtime. Multi protocol access (Scientific use) NFS, CDMI(Cloud), WebDAV, gridftp(globusonline) Service Classes with automatic and manual transitions (Access Latency, Retention Policy) Hot spot detection Tape Spinning Disk SSD s The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 23

What does the integration look like? The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 24

dcache owncloud Integration WEB 2.0 Sync & share Unlimited hierarchical Storage Space NFS 4.1 GridFTP, WebDAV dcache SSDs Spinning Disks Tape, Blue Ray The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 25

dcache owncloud Scientific Data Lifecycle GridFTP Unlimited hierarchical Storage Space Globus Online NFS 4.1 / pnfs HPC, HTC The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 26

dcache owncloud What does it look like for the user My dcache XXL Home My owncloud Home Sync Share Web 2.0 NFS 4.1/pNFS GridFTP WebDAV SRM (some private Grid Protocols) dcap xrootd The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 27

dcache owncloud Scalability (NFS4.1/pNFS does it) NFS Client NFS Client NFS Client pnfs Door pnfs Door pnfs Door The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 28

dcache OwnCloud integration Simply running owncloud on dcache was the easy bit and works nicely. dcache provides an NFSv4.1/pNFS interface which lets it look like a regular file system. This is exactly what owncloud needs. The fact the dcache doesn t allow files to be modified doesn t really bother owncloud. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 29

But how about ownership? Owner ship Files owned by patrick in OwnCloud are owned by apache/owncloud in dcache That prevents us from using the same data with NFS4.1, gridftp or CDMI from dcache Tigran solved that issue. dcache ACL s versus OwnCloud Sharing Files shared in OwnCloud should have similar ACLs in dcache. Data shared in owncloud is not automatically shared in dcache The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 30

Ownership/mapping issue Web 2.0 Sync Share DESY LDAP Kerberos NFS WebDAV, GridFTP, CDMI The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 31

More issues Besides the permission one The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 32

Name Space Issue We have We need Patrick Patrick Paul Paul Tanja *** The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 33

What we need WebDAV redirection to our nodes WebDAV/http redirect The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 34

What actually would be good Instead of requiring a mounted filesystem (POSIX) for owncloud primary space, an network API/protocol would be better. Best would be a standard (e.g. Cloud Data Management Interface, CDMI). CDMI is provided by big vendors Allows to handle meta data and user and ownership as well. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 35

What s done We already installed two systems. One connected to the DESY LDAP for DESY employees One with the dcache.org private cloud For HTW students (different user contract ) Self registration with any valid Certificate Most features are already available Ordering more hardware About 200 Terabytes on top of the 100 Terabytes which are already deployed in two systems. The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 36

What s still missing? The platform adapter needs to be written Resource access to owncloud defined by group membership in DESY LDAP Customizing the owncloud name space to support our schema. HTW Student (Leonie) is evaluating a owncloud sync client working against dcache directly (under supervision of Tigran) The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 37

Testing and verification Defining a set of reproducible test, which we can run on about 20 machines Verify scalability Guaranty for future dcache or OwnCloud updates Functional Performance The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 38

Further timeline We expect to have a pre-production system ready in about 6-8 weeks. DESY IT colleagues and HTW students will be guinea pigs The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 39

dcache Big Data Cloud LOFAR antenna Huge amounts of data X-FEL (Free Electron Lasers) Fast Ingest The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 40

The End further reading www.dcache.org The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 41

The DESY BIG DATA Cloud Service Berlin Cloud Event Patrick Fuhrmann 5 May 2014 42