Resource control in ATLAS distributed data management: Rucio Accounting and Quotas



Similar documents
DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

A simple object storage system for web applications Dan Pollack AOL

Finding a needle in Haystack: Facebook s photo storage IBM Haifa Research Storage Systems

Distributed Database Access in the LHC Computing Grid with CORAL

Status and Evolution of ATLAS Workload Management System PanDA

How to Choose Between Hadoop, NoSQL and RDBMS

Database Monitoring Requirements. Salvatore Di Guida (CERN) On behalf of the CMS DB group

Diagram 1: Islands of storage across a digital broadcast workflow

ATLAS job monitoring in the Dashboard Framework

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

SharePoint 2010 Interview Questions-Architect

ArcGIS for Server Deployment Scenarios An ArcGIS Server s architecture tour

Practical Cassandra. Vitalii

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

The Evolution of Cloud Computing in ATLAS

The Data Quality Monitoring Software for the CMS experiment at the LHC

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Survey of Filesystems for Embedded Linux. Presented by Gene Sally CELF

Mirjam van Olst. Best Practices & Considerations for Designing Your SharePoint Logical Architecture

Software Life-Cycle Management

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

TESTING AND OPTIMIZING WEB APPLICATION S PERFORMANCE AQA CASE STUDY

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Ceph. A complete introduction.

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*

Web Application Hosting Cloud Architecture

NoSQL and Hadoop Technologies On Oracle Cloud

EDG Project: Database Management Services

Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary

Copyright 1

SharePoint 2013 Logical Architecture

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

SharePoint 2010 Performance and Capacity Planning Best Practices

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

SSIS Scaling and Performance

The Hadoop Distributed File System

Monitoring individual traffic flows within the ATLAS TDAQ network

Developing Scalable Java Applications with Cacheonix

PARALLELS CLOUD STORAGE

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

Deploying a distributed data storage system on the UK National Grid Service using federated SRB

GSX Monitor & Analyzer. for IBM Collaboration Suite

Big Data With Hadoop

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Programming in C# with Microsoft Visual Studio 2010

Data Grids. Lidan Wang April 5, 2007

Scala Storage Scale-Out Clustered Storage White Paper

W I S E. SQL Server 2012 Database Engine Technical Update WISE LTD.

Cloud Computing at Google. Architecture

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

Design and Evolution of the Apache Hadoop File System(HDFS)

Evolution of Database Replication Technologies for WLCG

GIS Databases With focused on ArcSDE

Accelerating Real Time Big Data Applications. PRESENTATION TITLE GOES HERE Bob Hansen

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Hadoop IST 734 SS CHUNG

Building Views and Charts in Requests Introduction to Answers views and charts Creating and editing charts Performing common view tasks

In Memory Accelerator for MongoDB

Rakam: Distributed Analytics API

Performance And Scalability In Oracle9i And SQL Server 2000

WINDOWS 2000 Training Division, NIC

STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day

McAfee Web Gateway 7.4.1

10231B: Designing a Microsoft SharePoint 2010 Infrastructure

In-Memory Data Management for Enterprise Applications

(Possible) HEP Use Case for NDN. Phil DeMar; Wenji Wu NDNComm (UCLA) Sept. 28, 2015

Outline. Definition. Name spaces Name resolution Example: The Domain Name System Example: X.500, LDAP. Names, Identifiers and Addresses

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Simple Solution for a Location Service. Naming vs. Locating Entities. Forwarding Pointers (2) Forwarding Pointers (1)

Ceph. A file system a little bit different. Udo Seidel

Distributed File Systems

Performance Testing of Java Enterprise Systems

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Analisi di un servizio SRM: StoRM

The Google File System

The deployment of OHMS TM. in private cloud

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Transcription:

Resource control in ATLAS distributed data management: Rucio Accounting and Quotas Martin Barisits On behalf of the ATLAS Collaboration CERN PH-ADP, Geneva, Switzerland 13. April 2015 Martin Barisits CHEP 2015 13. April 2015 1 / 16

Rucio Rucio is the Distributed Data Management system of the ATLAS collaboration, which replaced DQ2 in 2014. Manages 160 PB of data (633 Mio replicas), 730 storage endpoints on more than 150 sites globally, More than 5000 users/applications. 240 Hz user/application frontend interaction rate. Architecture: Scalable distributed architecture, RESTful http interface to server, Oauth based authentication model, Client / server / daemon / resource layer. Martin Barisits CHEP 2015 13. April 2015 2 / 16

Rucio concepts Accounts represent users, groups or activities. Namespace: Data hierarchy with meta data support: Data name space is partitioned by scopes. Files can be grouped into datasets. Datasets can be grouped into containers. (scope, name) tuple identifies all data (Files, Datasets, Containers). Rucio Storage Element (RSE) logical abstraction for storage space. Can be assigned meta attributes. Replica management based on replication rules and RSE expressions. Synchronous accounting: Counters per RSE and per (Account, RSE) tuple. Asynchronous accounting: Many different views (based on dataset metadata) calculated in Hadoop and provided as dumps. Martin Barisits CHEP 2015 13. April 2015 3 / 16

Rucio concept: RSE expressions RSE expressions 1 use the RSE expression language to define a set of RSEs. RSE names can be directly used: SITEA SITEB Meta-attributes on RSEs: type=tape&country=de Set operators: (Intersect) represented by & (Union) represented by \ (Complement) represented by \ Order of operations can be given by brackets (, ) Examples: (type=tape&country=de)\sitea (tier=2\country=de) CERN_EOSDISK 1 Barisits et al. ATLAS Replica Management in Rucio, CHEP2013 Martin Barisits CHEP 2015 13. April 2015 4 / 16

Rucio concept: Replication rules Replica Management is based on replication rules set on files, datasets or containers. A replication rule consists of an RSE expression, defining a set of possible destination RSEs and the number of replicas it should create. Example: A user wants to replicate a dataset to two tier-2 sites in the United Kingdom. Data identifier: usera:ds1 copies: 2 RSE expression: tier=2&country=uk Rucio picks two RSEs out of the set specified by the RSE expression. Selection is based on existing replicas, quota, rule weights or just done randomly. Martin Barisits CHEP 2015 13. April 2015 5 / 16

About quota Goal: Quotas should partition the storage for different users. Quotas help in protecting the storage from overload by a single user. Types: Usage quota (or block quota): Based on volume (bytes). File quota (or inode quota): Based on number of files. Hard and Soft quota, allowing users to temporarily violate their quota. Complexity for distributed systems: Performance of accounting the used space. Usually organizations want have quotas for a set of storage areas and not only a single one. (Quota policy) Performance of identifying the applying quotas. Martin Barisits CHEP 2015 13. April 2015 6 / 16

Quota history DQ2: Passive quota based on replica accounting reports. Checked by a script generating automatic email alarms (Soft quota). Rucio (Quota v1): Simple quota per (account, RSE, bytes) tuple; bytes possible. Quotas are hard and instantly deny access to the resource. If a quota doesn t exist access to the resource will be denied. Rucio (Quota v2, described in this presentation) Allows quota on a group of resources (RSEs). Allows overlapping quotas and handles exceptions. Allows quota on # of replicas (File/inode quota). Martin Barisits CHEP 2015 13. April 2015 7 / 16

Quota requirements DQ2-like passive quotas: No performance requirement, as calculated passively. Problem is that the system usually notifies too late, when the damage is already done and requires painful, manual, cleanup. Rucio v1 qutoas: Architecture & workflow need to provide performance to do this actively. Support storage partitioning. Single-RSE quotas can only hardly be used to enforce advanced quota policy. (e.g.: A user can store 5TB of data on all Scratchdisks) Rucio v2 quotas: Performance requirement even harder to meet due to complexity. Support both storage partitioning and advanced quota policies. Martin Barisits CHEP 2015 13. April 2015 8 / 16

Quota concept A quota can be defined as (account, RSE expression, bytes, files, level) tuple. The account describes the account the quota is for, e.g. jdoe. The RSE expression describes the RSEs this quota is enforced for, e.g. country=de&disk=1. bytes gives the number of bytes, files the number of files to enforce. level is used in case of overlapping quotas to allow exceptions. Accounting for used resources, per (account, RSE) is done synchronously in the system. level attribute is used to manage overlapping quotas: Higher level quota overrides setting of lower level quota. Quotas with same level: The higher bytes/files value is taken. Martin Barisits CHEP 2015 13. April 2015 9 / 16

Quota concept - examples Users have a global quota of 20TB on all Scratchdisks: (account = jdoe, RSE expression = "scratchdisk=1", bytes = 20TB, files =, level = 0) But, users should not have more than 2TB on a single Scratchdisk: (jdoe, "CERN_SCRATCHDISK", 2TB,, 1) But user jdoe is a power-user of BNL-OSG2_SCRATCHDISK: (jdoe, "BNL-OSG2_SCRATCHDISK",,, 2) Martin Barisits CHEP 2015 13. April 2015 10 / 16

Architecture I/III Quotas are checked actively by the servers or daemons when creating rules. Current usage counters are managed actively, and lookups by primary key for the usage of (bytes, files) for an (account, RSE). Table size: O(#accounts #RSEs) Lookup: Fast Quotas are stored in a table indexed on the account column, to quickly retrieve all applicable quotas for an account. Table size: O(#accounts n) n #RSEs Lookup: Fast For privileged accounts (root, ddmadmin, etc.) quotas are disabled and no lookups are made in the workflow. Martin Barisits CHEP 2015 13. April 2015 11 / 16

Architecture II/III Bottleneck is the identification of applicable quotas if a user wants to write to an RSE 1. This is due to the case that, while quotas for an account can be retrieved fast, they are in the form (account, RSE expression,...) tuples. This means, the system has to evaluate the RSE expression of each quota to decide if RSE 1 is part of it. This is potentially slow. Martin Barisits CHEP 2015 13. April 2015 12 / 16

Architecture III/III The solution to solve this bottleneck is caching as the results of RSE Expression evaluations only change very rarely. A daemon reads all unique RSE Expressions from the quota and evaluates them regularily. It stores the (RSE expression, [RSE]) tuple in a cache. Lookups of RSE Expressions in the cache are very fast. The selection of quotas and resolving of overlapping quotas is purely done on the server/daemon side without the database. Changes in quotas are enforced immediately. Changes in RSE Expressions (e.g. a new scratchdisk is added) take one cache lifetime to be enforced. Martin Barisits CHEP 2015 13. April 2015 13 / 16

Workflow Resolve the RSE expression of the rule privileged accounts Create replication rule Get current usage values for the RSEs Get quotas of the account from the DB Get the RSEs of the quotas from the cache Calculate quotas DB Daemon: Populate cache Cache Figure: Rule creation Martin Barisits CHEP 2015 13. April 2015 14 / 16

Evaluation Evaluation is difficult, as Rucio quota (v2) allows a very different (and more complex) policy. Compared are the quota timings of random rule creations in production with quota v1 vs. random rule creation in the test framework with quota (v2). Figure: Quota checks [ms] during a period of 24h Martin Barisits CHEP 2015 13. April 2015 15 / 16

Conclusion Presented a quota framework for Rucio which offers a wide toolset to implement the data policies of the collaboration, allowing complex overlapping quotas with exceptions. The architecture solving the performance critical parts of the system was described, and the workflow of the system was shown. Future work includes quotas on the number of concurrent transfers and volume of concurrent transfers. Martin Barisits CHEP 2015 13. April 2015 16 / 16

Resource control in ATLAS distributed data management: Rucio Accounting and Quotas Martin Barisits On behalf of the ATLAS Collaboration CERN PH-ADP, Geneva, Switzerland 13. April 2015 Martin Barisits CHEP 2015 13. April 2015 16 / 16