irods at CC-IN2P3: managing petabytes of data


Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules
irods at CC-IN2P3: managing petabytes of data
Jean-Yves Nief, Pascal Calvat, Yonny Cardenas, Quentin Le Boulc'h, Rachid Lemrani

What is CC-IN2P3?
- IN2P3: one of the 10 institutes of CNRS, with 19 labs dedicated to research in high energy physics, nuclear physics and astroparticle physics.
- CC-IN2P3: computing resources provider for the experiments supported by IN2P3 (its own projects and international collaborations); resources open to both French and foreign scientists.

CC-IN2P3: some facts and figures
- CC-IN2P3 provides: storage and computing resources (local, grid and cloud access), database services, web site hosting and mail services.
- 2,100 local active users (even more when counting grid users), including 600 foreign users.
- ~140 active groups (labs, experiments, projects).
- ~20,000-core batch system.
- ~40 PB of data stored on disk and tape.

Storage at CC-IN2P3: disk
Hardware:
- Direct Attached Storage (DAS) servers: Dell R720xd + MD1200, ~240 servers, capacity 12 PB.
- Disk attached via SAS: Dell R620 + MD3260, capacity 1.7 PB.
- Storage Area Network (SAN) disk arrays: IBM V7000 and DCS3700, Pillar Axiom, capacity 240 TB.
Software:
- Parallel file system: GPFS (1.9 PB).
- File servers: xrootd, dCache (10.6 PB), used for High Energy Physics (LHC etc.).
- Mass Storage System: HPSS (600 TB), used as a disk cache in front of the tapes.
- Middleware: SRM (none), irods (840 TB).
- Databases: MySQL, PostgreSQL, Oracle (57 TB).

Storage at CC-IN2P3: tapes
Hardware:
- 4 Oracle/STK SL8500 libraries: 40,000 slots (T10K and LTO4), max capacity 320 PB (with T10KD tapes), 106 tape drives.
- 1 IBM TS3500 library: 3,500 slots (LTO6).
Software:
- Mass Storage System: HPSS, 24 PB; max traffic (from HPSS) 100 TB/day; interfaced with our disk services.
- Backup service: TSM (1 PB).

SRB and irods at CC-IN2P3: a little bit of history
- 2002: first SRB installation.
- 2003: put in production for CMS (CERN) and BaBar (SLAC).
- 2004: CMS data challenges; BaBar adopts SRB for data import from SLAC to CC-IN2P3.
- 2005: new groups using SRB: biology, astrophysics.
- 2006: first irods installation; beginning of contributions to the software.
- 2008: first groups in production on irods.
- 2010: 2 PB in SRB.
- 2009 until now: SRB phased out (in 2013) and migration to irods; ever-growing number of groups using our irods services.

Server side architecture
- Database cluster: Oracle 11g RAC.
- Two iCAT servers behind a single DNS alias (ccirods) contacted by the clients (a client-side configuration sketch follows below).
- 15 data servers (DAS): 840 TB.
- Interface to HPSS; 100 Gbps network.
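From the client point of view, the two iCAT servers are hidden behind the single DNS alias, so a client only has to point its environment at that alias. Below is a minimal sketch of an iRODS 3.x-style client environment file (~/.irods/.irodsEnv); the user name, zone name and default resource are placeholders, and the alias is shown without its site-specific domain.

    # Hypothetical ~/.irods/.irodsEnv pointing at the load-balanced alias
    irodsHost 'ccirods'               # DNS alias in front of the iCAT servers (domain omitted)
    irodsPort 1247                    # default irods port
    irodsUserName 'jdoe'              # placeholder user
    irodsZone 'ccin2p3'               # assumed zone name
    irodsDefResource 'diskCacheResc'  # assumed default resource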

Features used on the server side
- irods interfaced with: HPSS, Fedora Commons (via fuse), web servers (via fuse).
- Rules: irods disk cache management (purging older files when a quota is reached); automatic replication to HPSS or other sites; automatic metadata extraction and ingestion into irods (biomedical field); customized ACLs; feeding of external databases within workflows. A simplified rule sketch is given below.
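As an illustration of the replication and cache-management rules mentioned above, here is a deliberately simplified sketch in the iRODS rule language; acPostProcForPut and the two microservices are standard, but the resource names and the replica threshold are assumptions, not CC-IN2P3's actual policy (which also handles quotas, metadata extraction and ACLs).

    # Simplified sketch: after each ingest, replicate to an HPSS-backed resource,
    # then trim replicas on the disk cache while keeping at least two copies.
    acPostProcForPut {
        msiDataObjRepl($objPath, "destRescName=hpssResc", *replStatus);
        msiDataObjTrim($objPath, "diskCacheResc", "null", "2", "null", *trimStatus);
    }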

irods users profile @ CC-IN2P3
Researchers of various disciplines, for data sharing, management and distribution, data processing, and data archival:
- Physics: high energy physics, nuclear physics, astroparticle physics, astrophysics, fluid mechanics, nanotechnology.
- Biology: genetics, phylogenetics, ecology.
- Biomedical: neuroscience, medical imaging, pharmacology (in silico).
- Arts and humanities: archaeology, digital preservation, economic studies.
- Computer science.

irods @ CC-IN2P3: some of the users

irods in a few numbers
- 23 zones.
- 39 groups.
- 469 users; up to 800k connections per day and 6.4M connections per month.
- 80 million files.
- 8,560 TB of data as of today, growing by up to 30 TB per day.

irods in a few numbers (chart: irods storage evolution in PB, 2012-2014)

On the client side
- Client OS: Linux (Ubuntu, Debian, SUSE, Scientific Linux, CentOS), Mac OS X, Windows.
- Access via: icommands, C or Java APIs, Fuse, Parrot. CC-IN2P3 provides the PHP web browser; also testing irestserver from myirods. A typical icommands session is sketched below.
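For reference, a typical icommands session against such a setup could look like the lines below; the resource and file names are illustrative only.

    iinit                            # authenticate against the configured zone
    iput -R diskCacheResc data.tar   # upload to an assumed disk resource
    ils -l                           # list the current collection with replica details
    irepl -R hpssResc data.tar       # replicate to an assumed HPSS-backed resource
    iget -f data.tar ./data_copy.tar # download the file back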

Performance tests: Parrot, fuse, icommands (credit: Quentin Le Boulc'h)
- Download of files of different sizes from irods using fuse, iget and Parrot: Parrot performance is close to that of the icommands.
- Upload of files of different sizes to irods using fuse, iput and Parrot: Parrot performance is close to that of the icommands.
- Fuse shows performance differences between uploads and downloads.
(A possible timing harness is sketched below.)
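Such a comparison can be scripted with a simple timing loop; the sketch below only covers downloads (uploads are symmetric with iput/cp), and it assumes test files of the relevant sizes already exist in the current irods collection, under the fuse mount and under the Parrot-mapped path, both of which are site-dependent.

    # Hypothetical timing harness for the download path
    FUSE_MNT=${FUSE_MNT:?set to the irodsFs mount point}
    PARROT_IRODS=${PARROT_IRODS:?set to the Parrot-mapped irods path}
    for size in 10M 100M 1G; do
        /usr/bin/time -f "iget   $size: %e s" iget -f test_$size /tmp/
        /usr/bin/time -f "fuse   $size: %e s" cp "$FUSE_MNT/test_$size" /tmp/
        /usr/bin/time -f "parrot $size: %e s" parrot_run cp "$PARROT_IRODS/test_$size" /tmp/
    done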

Biomedical example
- A quantitative model of thrombosis in intracranial aneurysms: http://www.throbus-vph.eu
- Virtual simulation of the thrombosis.
- Partners can correlate any type of data when a simultaneous multidisciplinary analysis is required.
(diagram: data flow between partners, multiple patient data)

Biomedical example: neuroscience
- Epilepsy treatment.

Arts and Humanities example

Astrophysics example: LSST
- CC-IN2P3 will carry out half of the Data Release Production and will host all the processed data.
- irods used for data management: raw images & processed data; data transfers between CC-IN2P3 and NCSA; possibly the archival of 1 PB of camera-study data produced at SLAC.
- Data Challenge 2013: SDSS data processed by CC-IN2P3 and NCSA; results shared using irods (~100 TB), on disks interfaced with tapes (HPSS).
- ~100s of PB expected by 2030?

High Energy Physics example: BaBar
- Archival in Lyon of the entire BaBar data set (2 PB in total).
- Automatic tape-to-tape transfers: 3 TB/day (no hard limit).
- Automatic recovery of faulty transfers.
- Ability for a SLAC admin to recover files directly from the CC-IN2P3 zone if data is lost at SLAC (see the sketch below).
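Since the SLAC and CC-IN2P3 zones are federated, that last point boils down to addressing the remote zone by its logical path from the SLAC side; the zone, collection and file names below are purely illustrative.

    # Hypothetical recovery of a lost file from the CC-IN2P3 zone by a SLAC admin,
    # with checksum verification on transfer (-K):
    iget -K -f /ccin2p3Zone/home/babar/archive/run12345.root ./run12345.root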

Prospects
- irods: key application for IN2P3 data management; new big projects joining (LSST, Euclid); user community still growing.
- Our concerns: scalability (database connection pooling needed); irods v4.x: OS portability on various systems, «build in place» installation, Oracle support.
- Our needs: improvement of the connection control mechanism (we are interested in participating); rule naming and priorities on rules (there can be tens of thousands of rules to execute); SSL for uploads and downloads; REST APIs.

Acknowledgements
At CC-IN2P3:
- Pascal Calvat (user support: biology/biomedical apps, client developments)
- Yonny Cardenas (user support: biology/biomedical apps, client developments, rules)
- Rachid Lemrani (user support: astroparticle/astrophysics)
- Quentin Le Boulc'h (user support: astroparticle/astrophysics)
- Thomas Kachelhoffer (MRTG monitoring)
At Huma-Num:
- Pierre-Yves Jallud (user support: Arts and Humanities)
At SLAC:
- Wilko Kroeger (irods administrator)