Experience in integrating enduser cloud storage for CMS Analysis



Similar documents
The glite File Transfer Service

User-friendly access to Grid and Cloud resources for 18scientific th 19 th computing January / 21

dcache, list of topics

Caching SMB Data for Offline Access and an Improved Online Experience

How To Connect Your Cloud

Cloud computing an insight

The Big-Data Cloud. Patrick Fuhrmann. On behave of the project team. The BIG DATA Cloud 8 th dcache Workshop, DESY Patrick Fuhrmann 15 May

The DESY Big-Data Cloud System

HyperQ Remote Office White Paper

Book of Abstracts CS3 Workshop

Cloud Computing. Adam Barker

imis & Document Management Systems

Monitoring commercial cloud service providers

Alfresco Enterprise on AWS: Reference Architecture

CERNBox + EOS: Cloud Storage for Science

Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

Patrick Fuhrmann. The DESY Storage Cloud

What s New in SharePoint 2016 (On- Premise) for IT Pros

Analisi di un servizio SRM: StoRM

Installation and Setup: Setup Wizard Account Information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Database Monitoring Requirements. Salvatore Di Guida (CERN) On behalf of the CMS DB group

Business Intelligence Competency Partners

BITDEFENDER SECURITY FOR AMAZON WEB SERVICES

White Paper. Anywhere, Any Device File Access with IT in Control. Enterprise File Serving 2.0

Copyright 2012 Trend Micro Incorporated. All rights reserved.

Flexible Identity Federation

Freedom for Servers, Drives & Desktops

Status and Evolution of ATLAS Workload Management System PanDA

Security Overview Enterprise-Class Secure Mobile File Sharing

Data Management. Network transfers

Gladinet Cloud Access Solution Simple, Secure Access to Online Storage

Cisco EXAM Implementing Cisco Threat Control Solutions (SITCS) Buy Full Product.

DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

QoS and DLC in IaaS INDIGO-DataCloud

Cloud Computing Trends

Internet Storage Sync Problem Statement

Cloud Attached Storage 5.0

Scientific Storage at FNAL. Gerard Bernabeu Altayo Dmitry Litvintsev Gene Oleynik 14/10/2015

Desktop Virtualization for the Banking Industry. Resilient Desktop Virtualization for Bank Branches. A Briefing Paper

Document OwnCloud Collaboration Server (DOCS) User Manual. How to Access Document Storage

Integrating a heterogeneous and shared Linux cluster into grids

The most comprehensive review and comparison of cloud storage services

insync Installation Guide

Cloud Computing. Summary

This document details the procedure for installing Layer8 software agents and reporting dashboards.

Deploy Remote Desktop Gateway on the AWS Cloud

Tableau Online. Understanding Data Updates

Sync Security and Privacy Brief

From Internet Data Centers to Data Centers in the Cloud

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Grid and Cloud Computing at LRZ Dr. Helmut Heller, Group Leader Distributed Resources Group

Top. Reasons Federal Government Agencies Select kiteworks by Accellion

Geschäftsanwendungen bereit machen für die Cloud. Make your Business Applications ready for the Cloud

Challenges in Deploying Public Clouds

Introduction to the EIS Guide

The Data Quality Monitoring Software for the CMS experiment at the LHC

Mod 2: User Management

AWS Plug-in Guide. Qlik Sense 1.1 Copyright QlikTech International AB. All rights reserved.

Introduction to Cloud Computing

SharePoint 2013 Business Connectivity Services Hybrid Overview

Computing at the HL-LHC

Office 365 Migration Performance & Server Requirements

Network & HEP Computing in China. Gongxing SUN CJK Workshop & CFI

Configuration Guide. BES12 Cloud

Service Level Standard

Cloud Sync White Paper. Based on DSM 6.0

SOFTWARE UPDATER A unique tool to protect your business against known threats

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

SURFsara Data Services

Cloud Computing Backgrounder

Web/Cloud based CATI using quexs

grow your storage business

User Management Tool 1.6

Scaling out compute resources LOCAL (INTER)NATIONAL

The Hybrid Cloud Advantage White Paper

Contents Jive StreamOnce

FileDrawer An Enterprise File Sharing and Synchronization (EFSS) solution.

OSG PUBLIC STORAGE. Tanya Levshina

Orbiter Series Service Oriented Architecture Applications

The most comprehensive review and comparison of cloud storage services

Network Infrastructure Services CS848 Project

Anwendungsintegration und Workflows mit UNICORE 6

Contents Jive StreamOnce

Simple, Secure User Guide for OpenDrive Drive Application v for OS-X Platform May 2015

PaperStream Connect. Setup Guide. Version Copyright Fujitsu

File Transfer Software and Service SC3

An Integrated CyberSecurity Approach for HEP Grids. Workshop Report.

Chapter 19 Cloud Computing for Multimedia Services

MaaSter Microsoft Ecosystem Management with MaaS360. Chuck Brown Jimmy Tsang

OIS. Update on Windows 7 at CERN & Remote Desktop Gateway. Operating Systems & Information Services CERN IT-OIS

Cluster, Grid, Cloud Concepts

Informatica Cloud & Redshift Getting Started User Guide

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

WHITE PAPER: Egenera Cloud Suite

The most comprehensive review and comparison of cloud storage services

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

BioHPC Cloud Storage. portal.biohpc.swmed.edu Updated for

Transcription:

Experience in integrating enduser cloud storage for CMS Analysis Hassen Riahi CERN IT CS3 Workshop Zürich, 18 th January 2016 1/18/16 H. Riahi; CS3 workshop 2

Outline Overview Goals Integration architecture Challenges & Developments Commissioning experience Conclusions 1/18/16 H. Riahi; CS3 workshop 3

CERNBox CERNBox is a production CERN service providing a cloud synchronization service as commercial cloud storages for endusers (Dropbox, Amazon Cloud Drive, Google Drive, etc.) Data is located at CERN Offline access on all devices/all platforms Synchronize files 10% of CMS users download data to their desktop Easy sharing with other users Available for all CERN users (1TB/user) It has been integrated with: CERN batch E-Groups Root Viewer See related work: L. Mascetti et al. CERNBox: Cloud Storage for Science, CS3 workshop, Oral presentation, 1/19/16. 1/18/16 H. Riahi; CS3 workshop 4

Distributed Data Analysis in CMS +1000 individual user/month +50 sites +20k concurrent jobs Typically 1 output/job Files vary in size +200k completed jobs/day Minimal latencies Chaotic environment 1/18/16 H. Riahi; CS3 workshop 5

AsyncStageOut (ASO) ASO is the distributed user data management system for CMS Analysis It has started production in June 2014 It relies on File Transfer Service (FTS3) ~ 800 TB/8 M of files transferred during the last month ~ 200k transfers/day Peaks of ~ 500k transfers/day 1/18/16 H. Riahi; CS3 workshop 6

FTS3 FTS3 is a low level data movement service It is the WLCG middleware for data transfers Started officially production at August 1 st, 2014 Transfers more than 20PB/month (record of 2PB/day) Capable of auto-adapt the concurrency based on multiple parameters 1/18/16 H. Riahi; CS3 workshop 7

Goals Provide cloud storage to CMS users lacking of storage area in their home institute to perform physics analysis Enable CMS users to extend their storage in the grid by using their own storage in private/ public clouds Exploit Sync. and Share capabilities in CMS Analysis for: Prompt and offline access from the laptop to physics results produced by grid jobs Easy and fast sharing of results among the physics group 1/18/16 H. Riahi; CS3 workshop 8

Integration Architecture CERNBox 1/18/16 H. Riahi; CS3 workshop 9

Infrastructure: Challenges & Developments CMS supports only the transfer between SRM/ GridFTP storage endpoints for users data movement in the grid while cloud sites could expose other transfer protocols Ø The support of data movement from/to any transfer protocol exposed by the storage has been included in ASO CMS sites are required to follow CMS naming convention (/store/user/username) when configuring the storage paths for their users Ø The support of ad-hoc users paths in the storage has been included 1/18/16 H. Riahi; CS3 workshop 10

DM Challenges CMS storages are exposing SRM/GridFTP interface for users data transfer while it is not necessarily the case for cloud storages CERNBox relies so far on XROOT/HTTP protocols Dropbox exposes its own protocol AWS S3 3 rd party copy between 2 storages could be possible only if source and destination are exposing the same transfer protocol 1/18/16 H. Riahi; CS3 workshop 11

FTS3 Developments For Cloud FTS3 supports multiple protocols Normally used with 3 rd party enabled protocols Supports acting as a proxy between incompatible protocols S3, Dropbox and plain DAV/HTTP are supported Third party copies can be done if the other endpoint supports it Dropbox/S3 ó DPM/dCache WebDAV Pipelining is done otherwise Dropbox/S3 ó FTS3 ó Storage S3 credentials need to be configured on the FTS3 Dropbox access require OAuth negotiation via WebFTS or manual configuration 3.5 will include support for short-lived S3 credentials For further details: fts-support@cern.ch 1/18/16 H. Riahi; CS3 workshop 12

Commissioning Exercise The exercise ran for 4 days Jobs were configured to stage-out the outputs via ASO from the grid to CERNBox ~ 10 expert users were submitting continuous load FTS3 @pilot has been used to not interfere with the production activity ~ 8 TB in ~ 1M of files have been moved from the grid to CERNBox during this exercise 1/18/16 H. Riahi; CS3 workshop 13

Results 400k g 200k dd ~ 2 times the total number of transferred files per day for the overall analysis activity in CMS ~ 10 times the number of users files transferred per day to the CMS site @CERN CPU usage in FTS production instances can reach up to 85% in high load conditions This traffic could interfere with other transfers 20% 10% 1/18/16 H. Riahi; CS3 workshop 14

Conclusions Integrating the usage of the end-user cloud storages for CMS analysis activity allows to users to: make use of their own storage in private/public clouds benefit of Sync and Share capabilities CERNBox has been integrated with success as a grid resource for the distributed data analysis Gained experience for transparent integration of end-users cloud storage in CMS computing infrastructure The data movement from grid to cloud, CERNBox, using FTS3 has shown good performance during the commissioning exercise Next: Expose and ramp-up the service in production for real physics analysis Explore the actual level of interest of CMS physics groups to Sync. and Share capabilities 1/18/16 H. Riahi; CS3 workshop 15

Thank you! hassen.riahi@cern.ch 1/18/16 H. Riahi; CS3 workshop 16