Data storage at CERN

Overview:
- Some CERN / HEP specifics
- Where does the data come from, what happens to it
- General-purpose data storage @ CERN
- Outlook

(EAKC2014, J. Iven)

CERN vs Experiments

CERN = (Conseil* Européen* pour la Recherche Nucléaire*), today the European Organization for Nuclear Research
- Est. 1954, international treaty, 21 member states
- Provides lab facilities: water, electricity, cooling, offices, network, computing, various flavours of particle beams, ..
- ~2300 staff: clerical, engineers, firemen, ..
- (*: the expanded name is no longer accurate nowadays)

Experiments:
- International scientific collaborations with their own funding
- HEP: high-energy physics
- Build & install detectors
- Use lab facilities (MoU); computing reviewed yearly
- Generate & use & manage data
- 10...3000 physicists each
- WLCG: computing grid for the LHC

Data taking in HEP

- Think "digital video camera" (image: Wikipedia)
- Unwieldy & complicated & expensive

Data taking

- .. but gives 4D pictures
- Significant postprocessing required: calibration, track reconstruction, .. and subsequent analysis
- Result: good statistics on very rare events, which turn into scientific papers

Tier 0 at CERN: acquisition, first-pass reconstruction & distribution

- CDR: central data recording
- Data rates: 200(-400) MB/s, 1(-2) GB/s; 2011: 4(-6) GB/s

Some history of scale

Date          Collaboration size   Data volume, archive technology
Late 1950s    2-3                  kilobits, notebooks
1960s         10-15                kB, punchcards
1970s         ~35                  MB, tape
1980s         ~100                 GB, tape, disk
1990s         700-800              TB, tape, disk
2010s         ~3000                PB, tape, disk

For comparison:
- 1990s: the total LEP data set was ~a few TB, would fit on 1 tape today
- Today: 1 year of LHC data is ~25 PB

HEP data flow - schematic

[Diagram: Experiment (filter, online buffer) -> reconstruction -> T0 disk -> T0 tape (CERN, custodial copy) -> T1 disk / T1 tape (elsewhere) -> analysis on T2 disk (working copies)]

HEP data flow - realistic

[Diagram: Experiment (filter, online buffer) -> merge, compress, reconstruction over T0 disk pools (CERN), with fast analysis and repeated analysis passes -> T0 tape, T1 disk, T1 tape, T2 disk]
- Side note: CERN is small..

Physics data - the CERN IT part

[Diagram: batch computing alongside EOS and CASTOR at CERN]

Physics storage systems in CERN-IT:
- CASTOR: HSM
- EOS: disk-only, low-latency access, recent
- Both: homegrown; protocols that are standard within HEP but hardly beyond it (XrootD, RFIO, SRM, gridftp)
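To make the access model concrete, here is a minimal sketch of reading a file over the xrootd protocol with the XRootD Python bindings; the server name and file path are hypothetical placeholders, not real CERN endpoints.

```python
# Minimal sketch: reading a file over the xrootd protocol with the
# XRootD Python bindings (package "xrootd"). Server and path below
# are placeholders, not actual CERN endpoints.
from XRootD import client

URL = 'root://eosexample.cern.ch//eos/example/user/j/jdoe/data.root'  # hypothetical

with client.File() as f:
    status, _ = f.open(URL)
    if not status.ok:
        raise RuntimeError(status.message)
    status, data = f.read(offset=0, size=1024)  # read the first kilobyte
    print(len(data), 'bytes read')
```

The same root:// URL can also be opened directly from ROOT (TFile::Open), which is how most physics analysis code reaches CASTOR and EOS.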

CASTOR HSM

Key figures: 92 PB / 316M files / 350k
Client access: SRM, rfio, xroot, gridftp

- Born in 1999
- Common namespace
- Main role: data recording, Tier-1 data export, production activities
- Mainly tape-backed data
- Focus on tape performance (latency can be high)
- Database-centric
- Not optimized for concurrent access
- Diskservers: currently RAID-1 configuration
- Aimed at DAQ activities: limited transfer slots = QoS
- No (real) quotas..

[Architecture diagram: central services (NSd, vdqm, vmgr, databases), per-instance stager (STd, TMd, TGd), diskservers (DMd) and tapeservers (Td)]

CASTOR: current setup

7 instances:
- LHC experiments: ATLAS, ALICE, CMS and LHCb
- PUBLIC: users, non-LHC experiments, and a specific pool for DAQ activities of AMS, COMPASS, NA61, NA62
- Repack (tape media migration and compacting) and PPS (pre-production)

Totals:
- 92 PB, 316M files, ~650 diskservers

Mature release cycle:
- ~1 major release per year

In production also at RAL and ASGC

CASTOR historical data

[Plot of CASTOR data volume over time, with annotations: LHC first beam, LHC stop, EOS migration, tape cold-data verification]
- Repacking tapes is a major activity

EOS

Key figures: 25 PB / 158M files / 73k
Client access: xrdcp, gridftp, SRM, FUSE, ROOT (xrootd)

- Born in 2010
- In-memory namespace, split per instance
- Main role: end-user analysis
- Disk-only storage
- Focus on low latency
- Optimized for concurrency
- Multiple replicas on different diskservers
- No limit on transfer slots; throttling via overload
- Quota system for users & groups (for volume and files)
- Strong authentication: krb5, X509
- Diskservers: JBOD configuration

[Architecture diagram: headnode (MGM: NS, Auth, MQ, Sch, MB) and diskservers (FST: FS, SM, TE)]
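As an illustration of the two client paths listed above (FUSE mount vs. xrootd), here is a minimal sketch using the standard library and the XRootD Python bindings; the mount point, instance name and directory are hypothetical.

```python
# Minimal sketch of the two common access paths to an EOS instance:
# POSIX-style through a FUSE mount, and remote access via xrootd.
# Mount point, instance name and path are hypothetical.
import os
from XRootD import client
from XRootD.client.flags import DirListFlags

FUSE_DIR = '/eos/example/user/j/jdoe'        # hypothetical FUSE mount path
XRD_URL  = 'root://eosexample.cern.ch'       # hypothetical instance

# 1) Through FUSE the namespace looks like a local filesystem.
print(os.listdir(FUSE_DIR))

# 2) The same listing over xrootd, with per-entry stat information.
fs = client.FileSystem(XRD_URL)
status, listing = fs.dirlist('/eos/example/user/j/jdoe', DirListFlags.STAT)
if status.ok:
    for entry in listing:
        print(entry.name, entry.statinfo.size)
```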

EOS: current setup

6 instances:
- LHC experiments: ATLAS, ALICE, CMS and LHCb
- PUBLIC: recently deployed
- AMS and COMPASS experiments

Totals:
- 20 PB, 158M files, ~1100 diskservers

Release lifecycle driven by functionality:
- ~2 major releases per year, constant updates

Used at Fermilab (Tier-3 functionality)

PUBLIC EOS deployment

[Timeline: EOS instances for ATLAS, CMS, ALICE and LHCb deployed between 06/2011 and 09/2013]

Other data storage services

(Data storage is everywhere at CERN...)
- Experiments: e.g. DAQ 'disk buffers' holding up to several days of data taking
  - Often a prototype solution, but trouble with running it long-term
- Department- & group-scale independent solutions
  - CERN IT services ought to cover these... but do not always do so ("NIH syndrome"?)
- (Structured data / databases not considered here)
- Here: looking at CERN IT(-DSS) services

General-purpose storage

General-purpose shared filesystems (home directories = untrusted clients = strong authentication):
- AFS: Linux/Mac (see CERN site report), 950 TB, 3G files
- DFS: Windows-only
- Future: (NFSv4), (FUSE-mounted EOS), (local FS + OwnCloud)

Cluster filesystems (typically weak authentication):
- Not used for computing at CERN
  - General problem: open network + (too) many machines
  - CERN computing is embarrassingly parallel
  - CERN computing is worldwide (experiments have their own networks)
- NetApp filers: NFSv3 (190 TB, 210M files)
  - Re-exported for CVMFS, higher-level services (TWiki, VCs, ..), DBs
  - Standard product, will not void warranties

- Archive/tape: TSM (9 PB, 2.1G files) (see CERN AFS Backup talk)
  - Most machines are not backed up
  - (CASTOR), (EOS Reed-Solomon)
- Block storage: CEPH
- R&D: Hadoop/HDFS, Huawei S3, SWIFT, ..
- Lots of auxiliary storage services:
  - Protocol gateways (SRM, gridftp, http/WebDAV, ..)
  - File transfer engines (FTS), OwnCloud
  - Bookkeeping, accounting, monitoring, ...
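For the S3/SWIFT R&D mentioned above, client access to an S3-compatible object store looks roughly like the following sketch using boto3; the endpoint, credentials, bucket and key names are placeholders and not tied to any particular CERN service.

```python
# Minimal sketch: talking to a generic S3-compatible object store with
# boto3. Endpoint, credentials, bucket and key names are placeholders.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='https://s3.example.cern.ch',   # hypothetical gateway
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='benchmark-bucket')
s3.put_object(Bucket='benchmark-bucket', Key='run1/object-0001',
              Body=b'some detector payload')
obj = s3.get_object(Bucket='benchmark-bucket', Key='run1/object-0001')
print(obj['Body'].read())
```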

Looks like a zoo? Yes.

But the services differ along several dimensions:
- Usage
- Size / number of files
- I/O pattern
- Layering
- Lifecycle (eval / prototype / production / legacy)

"Tool for the job" approach (and a bit of history, and co-evolution at work..)
Historically, lots of use cases started on AFS... and once big enough, moved somewhere else. Mostly.

Size matters

(Orders of magnitude only. No sweat.)
[Chart 1: concurrency (10 .. 1k) vs. latency (ms .. s) for AFS, CEPH, EOS, CASTOR, TSM and local storage]
[Chart 2: number of files (10^6 .. 10^9) vs. space (TB .. PB) for the same services]
- Size, performance, acceptable limitations
- Map new use cases to our toolbox

Our building blocks

"Batchus serverus communalus":
- 32..64 GB RAM; 2..3 HDs; Intel or AMD; Scientific Linux CERN
- Used for (replicated) headnodes, metadata servers

"Discus serverus vulgaris" (= the above + SATA tray(s) + 10GbE):
- 24x 3TB + 3x 2TB disks, 1x 10GbE
- Used by EOS, CASTOR, AFS, CEPH, TSM, HDFS
- $$$/TB/month varies by replication factor (+ manpower)

Build up higher-value services from more basic services:
- OwnCloud = EOS + FUSE + HTTP, or NetApp + HTTP
- CVMFS = NetApp + HTTP
- AFS backups = (TSM or CASTOR) + scripts
- Redo when needed

All is well?

Problems / inefficiencies:
- Manpower issues: training, split (but: no formal standby for any data service)
- Overprovisioning, per-service safety margins (but: we tend to split too-big services into instances anyway)
- "One size fits all" rarely does, c.f.:
  - AFS servers are half-empty (not enough cold data)
  - Idle CPU capacity on diskservers
  - Spare disks on other machines (the computer centre has 107 PB raw disk)
- Allocation & procurement cycle is slooow: formal tendering, anti-corruption safeguards, national interests..

Current: huge interest in CEPH

- Nice "building blocks" fit at several layers
- Have a 3 PB 'prototype'
- Used for OpenStack (VM images + volumes)
- Ogled by CASTOR, EOS, (AFS), (NFS) to replace the current "talk-to-disk-and-handle-errors" layer
- (See D. van der Ster, "Ceph storage for the cloud", http://indico.cern.ch/event/300076/)
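To show what "talking to disk" through Ceph looks like from a client, here is a minimal sketch using the librados Python bindings (python-rados); the config path, pool and object names are placeholders rather than the actual CERN configuration.

```python
# Minimal sketch: storing and retrieving an object in a Ceph pool via
# the librados Python bindings. Config path, pool and object names are
# placeholders, not the actual CERN setup.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('test-pool')        # hypothetical pool
    try:
        ioctx.write_full('hello-object', b'hello from a diskserver')
        print(ioctx.read('hello-object'))
        print('cluster stats:', cluster.get_cluster_stats())
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```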

Outlook

- 2nd computer centre at the Wigner Institute in Budapest: "LAN" access now means 23 ms..
- Currently the LHC is not running (LS1)
  - Restart for Run 2 in early 2015: expect higher data rates (~2x) and different data flows
  - Other experiments will start earlier
  - Run 3: 75 GB/s from ALICE?
- RAID works less and less (disk size vs. reconstruction speed): forced towards RAIN

Summary

- Some CERN data storage use cases are special, some are not
- Toolbox / building-block approach
  - Common HW reduces cost
  - But manpower / know-how is an issue
- Nothing is cast in stone, but some upheavals take a long time..