Prototyping a file sharing and synchronisation platform with owncloud



Similar documents
CERNBox + EOS: Cloud Storage for Science

Book of Abstracts CS3 Workshop

owncloud Architecture Overview

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

Implementing Internet Storage Service Using OpenAFS. Sungjin Dongguen Arum

The most comprehensive review and comparison of cloud storage services

owncloud Enterprise Edition on IBM Infrastructure

owncloud Architecture Overview

DESYcloud: an owncloud & dcache update

A block based storage model for remote online backups in a trust no one environment

SDFS Overview. By Sam Silverberg

MICROSOFT OFFICE 365 MIGRATION 2013/05/13

In Memory Accelerator for MongoDB

Wikimedia architecture. Mark Bergsma Wikimedia Foundation Inc.

Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary

Aspera Direct-to-Cloud Storage WHITE PAPER

Storage Made Easy Enterprise File Share and Sync (EFSS) Cloud Control Gateway Architecture

Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

Access All Your Files on All Your Devices

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC

5 Mistakes to Avoid on Your Drupal Website

Google File System. Web and scalability

Smartphone Enterprise Application Integration

SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX

White Paper. Anywhere, Any Device File Access with IT in Control. Enterprise File Serving 2.0

Document OwnCloud Collaboration Server (DOCS) User Manual. How to Access Document Storage

IBM Cloud Manager with OpenStack

Using Microsoft Active Directory (AD) with HA3969U in Windows Server

User Guide. UserGuide_VersionCS8.1

For example some Bookkeepers are using Dropbox to share the accounting files between them and their client.

PHP on IBM i: What s New with Zend Server 5 for IBM i

MID-TIER DEPLOYMENT KB

Drupal Performance Tuning

insync Installation Guide

SOS SO S O n O lin n e lin e Bac Ba kup cku ck p u USER MANUAL

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

BioHPC Cloud Storage. portal.biohpc.swmed.edu Updated for

ios Deployment Simplified FileMaker How To Guide

Interwise Connect. Working with Reverse Proxy Version 7.x

App Orchestration Setup Checklist

Are You Ready for the Holiday Rush?

Testing and Restoring the Nasuni Filer in a Disaster Recovery Scenario

Shoal: IaaS Cloud Cache Publisher

Technical Overview Simple, Scalable, Object Storage Software

LifeSize UVC Video Center Deployment Guide

Polish National Data Storage. Norbert Meyer, Maciej Brzeźniak, Maciej Stroiński PSNC

Interoperable Cloud Storage with the CDMI Standard

Gladinet Cloud Backup V3.0 User Guide

Embracing Open Source: Practice and Experience from Alibaba. Wensong Zhang The 11 th Northeast Asia OSS Promotion Forum

Introduction to the EIS Guide

Vembu VMBackup v3.1.0 BETA

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

DATABASE VIRTUALIZATION AND INSTANT CLONING WHITE PAPER

Features of AnyShare

SWIFT. Page:1. Openstack Swift. Object Store Cloud built from the grounds up. David Hadas Swift ATC. HRL 2012 IBM Corporation

DeployStudio Server Quick Install

HTTP connections can use transport-layer security (SSL or its successor, TLS) to provide data integrity

Cloud Sync White Paper. Based on DSM 6.0

About This Document 3. Integration and Automation Capabilities 4. Command-Line Interface (CLI) 8. API RPC Protocol 9.

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

McAfee Web Gateway 7.4.1

The deployment of OHMS TM. in private cloud

Scalable Internet Services and Load Balancing

DiskPulse DISK CHANGE MONITOR

Hitachi Content Platform (HCP)

Deploying Adobe Experience Manager DAM: Architecture blueprints and best practices

In order to upload a VM you need to have a VM image in one of the following formats:

Introduction to Mobile Access Gateway Installation

Product Analysis of owncloud Enterprise Edition 8

The Design and Implementation of the Zetta Storage Service. October 27, 2009

Deploying Business Virtual Appliances on Open Source Cloud Computing

Analisi di un servizio SRM: StoRM

NSS Volume Data Recovery

Contents First Time Setup... 2 Setting up the Legal Vault Client (KiteDrive)... 3 Setting up the KiteDrive Outlook Plugin Using the Legal Vault

VMware vcenter Log Insight Getting Started Guide

Back2Cloud: A highly Scalable and Available Remote Backup Service

Self service for software development tools

POSIX and Object Distributed Storage Systems

OS6 N2520 N2560 N4520 N4560 FWRN Build OS

How to use Cloud Solutions by Swisscom for Disaster Recovery. Whitepaper. Fabian Haldimann Stefan Lengacher Thomas Gfeller

SMB in the Cloud David Disseldorp

The cloud storage service bwsync&share at KIT

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Transcription:

Data & Storage Services Prototyping a file sharing and synchronisation platform with owncloud CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Jakub T. Moscicki Massimo Lamanna CERN IT- DSS CHEP 2013 - Amsterdam

Content Background & context of the cernbox project Details on core funcmonality of owncloud TesMng owncloud Outlook 2

The origins of the cernbox project We need a compemmve alternamve to Dropbox for CERN users Reasons SLAs: availability, confidenmality integramon into IT infrastructure archival & backup policies The scale of the problem is unknown but we have some indicamons 4500 dismnct IPs in DNS from cern.ch to *.dropbox.com (daily...) We also want to adapt to user expectamons We manage large- scale online- storage systems and we can leverage on them EOS (RAW) CEPH (RAW) Other services ~62 PB ~3.5 PB ~30 PB Bulk of disk storage operated by IT/DSS 3

Gateway to the future A unified pla<orm integrated with physics data storage Federated dropbox service for HEP community and possibly in wider science Novel ways for suppormng specialized scien@fic workflows based on a common sharing and syncing plaborm Novel ways of delivering home directories in the virtualized IT environment local folder replica lives within the VM snapshot however, first we need to posi@vely address the classic Dropbox use- case 4

OwnCloud Evaluation OSS Dropbox Market not- yet- mature but products ramp up with quality and features Why owncloud? It has a vision matching our needs It has the required funcmonality It has an extensible architecture for future use- cases It is open- source Afer a market survey we decided to seriously test owncloud and provide feedback to the company and to the community 5

What OwnCloud provides? Interface cross plaborm web/mobile/desktop access desktop integramon (drag/drop, nomficamons, ) web 2.0 Core funcmonality Syncing of folders Folder/file sharing with ACLs, expiry date Public (hashed) links with opmonal password protecmon Anonymous upload folders (hashed links) Currently NO support for user- defined groups Admin creates groups Needed: integramon with IT infrastructure (external groups) 6

7

File transport protocol WebDAV (extension to HTTP with XML body) OwnCloud Server is RFC 2518 compliant Protocol is HTTP with XML body so it is bloated Basic metadata query for a file ~0.5KB Compresses well: metadata for 1000 files ~16KB Some good points IntegraMon with other web- services Desktop browsers: OSX/Finder, Simply curl/wget to GET, PROPFIND, PUT, DELETE, MOVE Fuse mount (davfs2) HTTP POST/GET 2GB file limit for upload currently but it is removed in newer PHP version Sync client Chunked upload (10MB chunks) Extension: OC- CHUNKED header anribute 8

Server storage layout Simple and transparent, including trashbin and versions <user> - cache - files - - - dira - - - dirb - - - - - - dirc: hello.txt - files_versions - - - - - - dirc: hello.txt.v1380894998 - files_trashbin - - - files - - - - - - dira: byebye.txt.d1381078676 - - - versions - - - - - - dira: byebye.txt.v1380891350.d1381078676 byebye.txt.v1380891050.d1381078676 9

How sync works client owncloud server PROPFIND: get remote ETAG PUT, GET, MOVE files inomfy local changes sqlite DB propagate ETAG on file change Notes: ETAG is a standard HTTP header for cache control ETAG is a unique idenmfier generated by the server No file diffs over the wire icons: hnp://www.visualpharm.com 10

Testing principles Community edi@on latest server (5.0.11) and client (1.4.1) AutomaMc tesmng of crimcal funcmonality tesmng of new releases tesmng in local/changed environment product- agnosmc collecmon of test cases and test ideas Share our work We plan to move our test toolkit to github.com/opensmashbox Test plan Core logics tes,ng Stress tesmng Scale tesmng OperaMonal scenarios 11

Core logics testing Fundamental checks: break it! check funcmonality, trigger file conflicts, check consistency of behaviour symlinks, hardlinks, device files, characters: \ : < > %? <space> * extreme condimons How many files can we put in a single directory How deep may be the directory tree In parallel: Client A: remove directory, Client B: add files to this directory Field tesmng: see if system is reasonable under normal condi8ons realismc acmons e.g. keep on updamng a file somewhere in a directory tree realismc file size distribumons from CERN AFS home directories but random content realismc directory tree /afs/cern.ch/user/m/moscicki (30K files, 8K directories) automamc verificamon of integrity of files and directories (md5) check propagamon of changes between different clients of the same user InteracMve analysis: tap into the server framework, protocol, db, apache, 12

Scale / stress testing (in progress) Horizontal scaling How many concurrent clients may a single server support? Idle users, AcMve users System scaling How many files/directories/users? 10% prototype : 1 K acmve users, 10K dormant users, 100TB data Performance tesmng and tuning How efficient we are in sending and receiving files Many tuning opportunimes MySQL/Innodb/indexes/cluster, memcached PHP version/accelerators, Apache tuning, OperaMonal scenarios Network access cut Server reboot Client killed Database lost, 13

Test infrastructure Baseline server setup RHEL 6, Apache, MySQL, local disk should work out- of- the- box ulmmate reference Intended final configuramon DB test AS disk users HTTP LB test AS AS AS Fuse or EOS client E O S Goal: unified space with physics data DB DB 14

Upsides Integrates smoothly with our LDAP(S) ETAG propagamon works as expected Parallel directory trees are really independent Large number of files in the ETAG propagamon chain does not affect efficiency ~O(log N) In field tesmng we did not manage to see corrupted files Conflict files are(mostly) created in expected ways However it may be problemamc for some applicamons Decent WebDAV streaming for large files Example: 400MB file on my desktop hnp/upload: ~40MB/s, hnp/download: ~100MB/s hnps/upload: ~25MB/s, hnps/download: ~60MB/s 15

Downsides Parallel sync with chunking could corrupt data now fixed in devel: temporary chunk files should encode session ids Client tends to traverse directory tree out of order now fixed in devel: asymptomcally correct but very subopmmal Problems with more than 50 levels of directories should give a clear feedback to the user if limits exceeded Shared folders: ETAG propagamon problems if not at top- level Syncing rates are low (~1-5 files/s) callgraph analysis of the framework shows opportunimes for large improvements addimonaly this may also be due to server tuning or older PHP version Random glitches of the desktop clients SomeMmes hangs (wireless networks) Updates of client versions not smooth 16

Scalability concerns for large folders Adding many files to a single directory does not scale ~O(N) However the directory tree hierarchy does scale ~O(1) Reason: a bug in PROPFIND implementamon triggers the mime- type determinamon for the enmre folder this makes syncing many files in a single directory very slow 17

Bug reporting We report all bugs on github and discuss them with owncloud CEO directly some issues are fixed already (not release yet, needs retesmng) Expecta8ons? 18

Feedback on architecture Generic framework server- side Open and extensible, API for adding new apps A challenge for core business performance Decouple view from data model Whether current coupling incidental or purposeful We see many opportunimes for performance improvement and bener scalability DocumentaMon of sync model is badly needed ETAG: calculate it? concerns about hashing content (mounmng secondary storage) what about hashing metadata (size+mmme)? requires careful thinking of client- side mmmes, ACLs, 19

Horizontal scalability (preliminary) Test configuramon (field tesmng) 100 users account, 50 VM clients sync sessions run in parallel data: 1 directory with 100 files Scenarios Dormant accounts ( idle system footprint) Download, upload files Modify and sync Measurement number of parallel sync sessions speed of the system errors: incomplete transfers Not cirimcal by design (will catch up on the next sync session) ~100 MB/s measured on the server NIC 20

Download efficiency Server load (preliminary) 100 150 200 250 Sync sessions vs Mme Downloaded files 100 150 200 250 Download Mme overload à refuse client Download elapsed Mme (s) 21

Testing EOS storage for cernbox 1000 test user accounts and ramping up à 1K acmve, 10K idle 5 million test files and ramping up 2 TB of test data and ramping up à 100TB 22

What others do? ETHZ: public beta service (using owncloud enterprise) limited use- case: memory smck replacement 5GB user quota 1900 users, 500GB data since 28 June Backend storage: SONAS EMC UniversiMes starmng beta service TUB MPI owncloud.com: scaleout tests with commercial vendors (~250K users) CEPH integramon under discussion 23

Evaluation OwnCloud Excellent vision and great funcmonality Young product in (very) acmve development Challenging task: support different semanmcs of local OSes and FSes Strong points Plaborm integramon, web interface, mobile clients rapidly improving Open architecture LAMP stack: well known and used in industry Developers are responsive and give anenmon to our feedback Weak points Maturity of the product rapid development and many changes at all levels Challenges to address in the core framework and sync client 24

What comes next Beta service at CERN we are confident that our concerns will be addressed in due Mme Your feedback is important on the beta service at CERN (as a user) on your local deployment experience (as a service manager) We report all issues on github and discuss our concerns with owncloud.com we have posimve feedback We conmnue to get user feedback on the product within HEP We will share our test toolkit on github 25

Summary Dropbox- like service on premise solving an important corporate issue for our OrganizaMon excimng new ideas and opportunimes for the future Dropbox- like service has a huge potenmal for HEP community we want to see it fly at CERN We have a tesmng framework for QA assessment CollaboraMon with owncloud ContribuMon to/from the community Extend the test coverage Learn from others Your feedback? Beta service at CERN Your deployment experience 26

Questions? Comments? 27