Managed Storage @ GRID or why NFSv4.1 is not enough Tigran Mkrtchyan for dcache Team

What the hell do physicists do? Physicists are hackers: they just want to know how things work. In modern physics a given cause does not always produce the same effect; statistics is used to describe the behavior. Physics data is IMMUTABLE: you keep it forever or you remove it, but you never FIX it!

The right tool for the right job. Large Hadron Collider (~27 km long): expected start July 2008; 800 million collisions per second; data rate ~1.5 GB per second; ~15 PB per year.
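A back-of-the-envelope check of these numbers as a small Java sketch; the ~10^7 seconds of data taking per year is my own assumption (a typical accelerator duty-cycle figure), not something stated on the slide:

/**
 * Rough check: 1.5 GB/s of data, assuming ~1e7 live seconds per year,
 * gives on the order of 15 PB per year.
 */
public class LhcDataRate {
    public static void main(String[] args) {
        double bytesPerSecond = 1.5e9;       // ~1.5 GB/s
        double liveSecondsPerYear = 1e7;     // assumed duty cycle, not from the slide
        double bytesPerYear = bytesPerSecond * liveSecondsPerYear;
        System.out.printf("~%.0f PB per year%n", bytesPerYear / 1e15);  // ~15 PB
    }
}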

Multiple tier model

GRID as core infrastructure. GRID middleware is applied to address two major goals: physical (space, power, cooling, connectivity) and political (let regional investors spend money on regional centers).

What is a GRID? The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid.

What is a GRID? The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid. While for most people GRID means distributed CPU resources, it's all about distributed storage!

Storage Resource Manager. To hide the storage system implementation, a top-level management interface was defined: SRM. SRM together with an 'Information Provider', which allows querying the storage system, is called a 'Storage Element' (SE).

Storage Resource Manager. Storage Resource Managers (SRMs) are middleware components whose function is to provide dynamic space allocation and file management on shared storage components on the Grid. The SRM interface defines the following functions: data transfer, file pinning/unpinning, space management, request status queries, directory operations and permission management.
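To make the list above concrete, here is a hypothetical Java interface grouping those functions; the method names and signatures are my own illustration, not the actual SRM v2.2 API (which is defined as a web service, see the references at the end):

import java.net.URI;
import java.util.List;

/** Illustrative sketch of the SRM function groups; not the real SRM v2.2 interface. */
public interface SrmSketch {
    // Data transfer: resolve a site URL into a transfer URL
    URI prepareToGet(URI surl, List<String> transferProtocols);
    URI prepareToPut(URI surl, List<String> transferProtocols);

    // File pinning/unpinning: keep a file online for a while
    void pin(URI surl, long lifetimeSeconds);
    void unpin(URI surl);

    // Space management: reserve space before a transfer
    String reserveSpace(long bytes, String retentionPolicy, String accessLatency);

    // Request status queries
    String statusOfRequest(String requestToken);

    // Directory operations
    List<String> ls(URI surl);
    void mkdir(URI surl);
    void rm(URI surl);

    // Permission management
    void setPermission(URI surl, String permission);
}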

SRM Data Transfer. SRM data transfer is based on two concepts: SURL and TURL. The SURL is a site URL of the form srm://host.at.site/<path>. The TURL is the transfer URL that an SRM returns to a client for the client to get or put a file in that location. It consists of protocol://tfn, where the protocol must be a specific transfer protocol selected by the SRM from the list of protocols provided by the client. The SRM thus behaves de facto as a load balancer and redirector; the GSI-enabled FTP protocol is used for transfers.
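A minimal Java sketch of the SURL-to-TURL idea; the class and the protocol-selection logic are illustrative assumptions, not the real SRM client or server code:

import java.net.URI;
import java.util.List;

/** Sketch only: the SRM picks one transfer protocol from the list offered by the client. */
public class SurlToTurlSketch {

    static URI negotiateTurl(URI surl, List<String> clientProtocols) {
        // A real SRM would also choose a concrete data server (load balancing);
        // here we just keep host and path and swap the scheme.
        String protocol = clientProtocols.contains("gsiftp") ? "gsiftp" : clientProtocols.get(0);
        return URI.create(protocol + "://" + surl.getHost() + surl.getPath());
    }

    public static void main(String[] args) {
        URI surl = URI.create("srm://host.at.site/some/path/file.root"); // site URL
        URI turl = negotiateTurl(surl, List.of("gsiftp", "dcap"));       // transfer URL
        System.out.println("SURL: " + surl);
        System.out.println("TURL: " + turl);
    }
}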

SRM PUT (ftp) (diagram): the client asks the SRM for a TURL (SRM data flow), then writes the file over ftp to the dcache destination pool.

SRM GET (ftp) (diagram): the client asks the SRM for a TURL, then reads the file over ftp from the dcache pool.

SRM for pnfs people (diagram): the SRM PUT/GET returning a TURL corresponds roughly to LOCK + LAYOUTGET on a bunch of files; the client's READ/WRITE against the data server corresponds to pNFS READ/WRITE; DONE with SUCCESS/FAIL corresponds to UNLOCK + LAYOUTRETURN, and finally DESTROY_SESSION for the bunch of files.
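The same correspondence written out as a small lookup table (the pairing is only a rough analogy, not a formal protocol mapping):

import java.util.LinkedHashMap;
import java.util.Map;

/** Rough SRM-to-pNFS analogy from the slide above, as a printable table. */
public class SrmVsPnfs {
    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("SRM PUT/GET -> TURL", "LOCK + LAYOUTGET on a bunch of files");
        mapping.put("READ/WRITE via TURL", "READ/WRITE to the data server");
        mapping.put("DONE (SUCCESS/FAIL)", "UNLOCK + LAYOUTRETURN + DESTROY_SESSION");

        mapping.forEach((srm, pnfs) -> System.out.printf("%-28s ~ %s%n", srm, pnfs));
    }
}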

SRM COPY-PUSH (diagram): the client asks the source SRM to copy a file; the source SRM PUTs the data to the destination SRM. Need It!

SRM COPY-PULL (diagram): the client asks the destination SRM to copy a file; the destination SRM GETs the data from the source SRM. Need It!

SRM Space Management: allows reserving space prior to the transfer; acts as a quota system, so you never run into 'file system full'; has three space descriptions and allows transitions between them: CUSTODIAL, ONLINE (Tape1Disk1); CUSTODIAL, NEARLINE (Tape1Disk0); REPLICA, ONLINE (Tape0Disk1).
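A tiny Java sketch of these three space descriptions, modelling retention policy and access latency as enums; the class is illustrative, not dcache code:

/** Sketch of the SRM space descriptions listed above. */
public class SrmSpaceTypes {
    enum RetentionPolicy { CUSTODIAL, REPLICA }
    enum AccessLatency  { ONLINE, NEARLINE }

    static String shorthand(RetentionPolicy r, AccessLatency a) {
        String tape = (r == RetentionPolicy.CUSTODIAL) ? "Tape1" : "Tape0"; // CUSTODIAL keeps a tape copy
        String disk = (a == AccessLatency.ONLINE) ? "Disk1" : "Disk0";      // ONLINE keeps a disk copy
        return tape + disk;
    }

    public static void main(String[] args) {
        System.out.println(shorthand(RetentionPolicy.CUSTODIAL, AccessLatency.ONLINE));   // Tape1Disk1
        System.out.println(shorthand(RetentionPolicy.CUSTODIAL, AccessLatency.NEARLINE)); // Tape1Disk0
        System.out.println(shorthand(RetentionPolicy.REPLICA,   AccessLatency.ONLINE));   // Tape0Disk1
    }
}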

SRM Space Management (use case): reserve space for a data set in T0D1 space; put the data set into the space; analyze the data set; if the data is not OK, remove the data set; if it is OK, move the data set into T1Dx space and send it to a remote site. Need It!
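The use case above, written as straight-line workflow code against a hypothetical SrmClient interface; every method name here is an illustrative stand-in, not a real SRM call:

import java.net.URI;

/** Sketch of the space-management use case; the SrmClient API is hypothetical. */
public class SpaceManagementUseCase {

    interface SrmClient {
        String reserveSpace(long bytes, String spaceType);     // e.g. "T0D1"
        void put(String spaceToken, URI dataset);
        boolean analyze(URI dataset);                          // "Is Data OK?"
        void remove(URI dataset);
        void changeSpace(URI dataset, String targetSpaceType); // e.g. "T1Dx"
        void copyTo(URI dataset, URI remoteSite);
    }

    static void run(SrmClient srm, URI dataset, URI remoteSite, long size) {
        String token = srm.reserveSpace(size, "T0D1");  // reserve space for the data set
        srm.put(token, dataset);                        // put the data set into the space
        if (!srm.analyze(dataset)) {                    // analyze the data set
            srm.remove(dataset);                        // data not OK -> remove it
        } else {
            srm.changeSpace(dataset, "T1Dx");           // data OK -> move into T1Dx space
            srm.copyTo(dataset, remoteSite);            // and send to a remote site
        }
    }
}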

GRID Security. Need It! X.509-based certificates with extensions for Virtual Organization (VO) support; no trusted hosts.
subject   : /O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan/CN=proxy
issuer    : /O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan
identity  : /O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan
type      : proxy
strength  : 512 bits
timeleft  : 11:59:40
=== VO desy extension information ===
VO        : desy
subject   : /O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan
issuer    : /C=DE/O=GermanGrid/OU=DESY/CN=host/grid-voms.desy.de
attribute : /desy/role=null/capability=null
timeleft  : 11:59:40
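A trivial Java illustration of the listing above: the proxy's subject DN is the identity DN with an extra '/CN=proxy' component appended (a string-level check only, not real certificate validation):

/** String-level view of the proxy subject shown above; not certificate-chain validation. */
public class ProxySubjectCheck {
    public static void main(String[] args) {
        String subject  = "/O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan/CN=proxy";
        String identity = "/O=GermanGrid/OU=DESY/CN=Tigran Mkrtchyan";

        boolean looksLikeProxy = subject.equals(identity + "/CN=proxy");
        System.out.println("identity: " + identity);
        System.out.println("is proxy: " + looksLikeProxy);   // true
    }
}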

SRM Uniform Data Access (diagram): an SRM client talks to the common SRM interface in front of different storage systems: dcache+MSS, dcache, CASTOR and DPM. Need It!

Mission impossible: dcache installations. We are doing well!

dcache - Background (diagram): the storage hierarchy trades access time and size against price: CPU cache (~10^-9 s), disk arrays (~10^-4 s) and tape storage (~10^3 s); dcache sits between the disk arrays and the tape storage.

The goals of the project are: to share and optimize access to non-sharable storage devices, like tape drives; to make use of slower and cheaper drive technology without overall performance reduction; and to provide a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.

The requirement is: to provide a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods.

dcache Design: a Name Space Provider (size, owner, ACLs, checksum, ...), a Pool Selection Unit, protocol-specific Doors, and multiprotocol Pools that can talk several protocols simultaneously.

dcache Design (diagram): clients send namespace operations to the Namespace component and IO requests to the Doors; a Door performs data server selection and redirects the client's IO to the Disk Pools.

dcache Design: Pools are grouped into PoolGroups. A PoolGroup is selected by flow direction, 'path' (file set), protocol and client IP. Within the group a Pool is selected by cost, where cost = n*<cpu cost> + m*<space cost>: with n=1, m=0 the network bandwidth is filled first; with n=0, m=1 the empty servers are filled first.
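A minimal Java sketch of this cost formula and the two extreme settings of n and m; the Pool class and its cost fields are illustrative, not the actual dcache PoolManager:

import java.util.Comparator;
import java.util.List;

/** Sketch of cost-based pool selection: cost = n * cpuCost + m * spaceCost. */
public class PoolCostSketch {

    static class Pool {
        final String name;
        final double cpuCost;    // grows with load on the pool
        final double spaceCost;  // grows as the pool fills up
        Pool(String name, double cpuCost, double spaceCost) {
            this.name = name; this.cpuCost = cpuCost; this.spaceCost = spaceCost;
        }
    }

    static Pool select(List<Pool> group, double n, double m) {
        return group.stream()
                .min(Comparator.comparingDouble((Pool p) -> n * p.cpuCost + m * p.spaceCost))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Pool> group = List.of(
                new Pool("pool-a", 0.2, 0.9),   // lightly loaded but nearly full
                new Pool("pool-b", 0.8, 0.1));  // busy but mostly empty

        System.out.println(select(group, 1, 0).name); // n=1, m=0: fill network bandwidth first -> pool-a
        System.out.println(select(group, 0, 1).name); // n=0, m=1: fill empty servers first     -> pool-b
    }
}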

MSS connectivity, write: files arrive on a disk pool and are declared Precious; Precious files are flushed to the MSS according to a policy based on time, size and number of files (a sketch of such a policy follows the MSS slides below).

MSS connectivity, read: cached files can be delivered immediately from the disk pools; read with restore: missing files are retrieved from the MSS into the disk pools first.

MSS connectivity (diagram): write pools flush to the MSS, read pools restore from it, and files can be copied pool-to-pool on request.
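A sketch of the flush policy mentioned above (time, size, number of files) as a simple predicate; the thresholds and names are assumptions, not actual dcache configuration parameters:

import java.time.Duration;

/** Precious files are flushed when an age, size or file-count threshold is reached. */
public class FlushPolicySketch {

    static boolean shouldFlush(Duration sincePrevFlush, long preciousBytes, int preciousFiles) {
        final Duration maxAge   = Duration.ofHours(1);       // time trigger (assumed)
        final long     maxBytes = 10L * 1024 * 1024 * 1024;  // size trigger: 10 GB (assumed)
        final int      maxFiles = 1000;                      // file-count trigger (assumed)

        return sincePrevFlush.compareTo(maxAge) >= 0
                || preciousBytes >= maxBytes
                || preciousFiles >= maxFiles;
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(Duration.ofMinutes(10), 512L * 1024 * 1024, 42)); // false
        System.out.println(shouldFlush(Duration.ofMinutes(90), 512L * 1024 * 1024, 42)); // true (age)
    }
}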

Current Status: dcache lets us build very large (capacity- and bandwidth-wise) storage systems from small, independent building blocks. A building block needs to provide: a JVM >= 1.5 (all components are Java based), a local filesystem and a network interface. There is no IO penalty from using Java.

Current Status: the project started in June 2000 as a joint effort of DESY and FNAL; first prototype April 2001; in production since March 2002. Supported local access protocols: dcap, xrootd. Supported WAN access protocols: ftp, http. Deployed on AIX, Linux (x86, Power, x64) and Solaris (SPARC, AMD). Runs across country borders. Has an interface to OSM, Enstore, HPSS, TSM and DMF, and it is easy to add any other MSS. Largest installation: 2 PB (FNAL), ~1800 pools, ~1.2 GB/s WAN (peak rate 2.5 GB/s!), 60 TB/day read (~100000 files!), 2 TB/day write (~8000 files).

Current Status (NorduGrid) (diagram): distributed dcache Pools connected to a central dcache Core Service.

Pnfs != pNFS: dcache's namespace provider is called Pnfs, the 'Perfectly Normal File System'; it was developed in 1997 and is currently being replaced.

Uniform Data Access (diagram): behind the common SRM interface each storage system speaks its own access protocol: dcap for dcache (+MSS), rfio-castor for CASTOR and rfio-dpm for DPM.

Why new protocols? There are three 'popular' protocols used in High Energy Physics: dcap (dcache Access Protocol), rfio (Remote File IO) and xroot (extended ROOT IO). All of these protocols were designed when NFSv2/3 was not distributed, while the existing distributed solutions did not fit well, were expensive (all of them) and required special hardware or special OS/kernel versions.

NFSv4.1 fits well with the dcache (and others') architecture: it is an open standard protocol supported by industry, and the NFSv4.1 client comes 'for free' with the operating system. (Diagram: the NFSv4.1 client sends control traffic to the NFSv4.1 door / NameSpace Manager and DATA directly to the pools.)
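An illustration of 'the client comes for free': once dcache is mounted via NFSv4.1, an application just uses ordinary file I/O. The mount point in this Java sketch is a hypothetical example path:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Plain POSIX-style file I/O against an NFSv4.1-mounted namespace; no special client library. */
public class ReadFromNfsMount {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("/pnfs/desy.de/data/experiment/run001.root"); // hypothetical NFS mount point
        try (var in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(64);    // ordinary reads over the kernel NFS client
            System.out.println("read " + header.length + " bytes");
        }
    }
}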

The Vision (diagram): an SRM up-link to the Grid on top of the distributed storage, with NFSv4.1 serving the local analysis farm. Need It!

References: www.dcache.org SRM V2.2 spec. http://sdm.lbl.gov/srm-wg/doc/srm.v2.2.html NFSv4.1 spec. http://www.nfsv4-editor.org/

Special Thanks to: Andy Adamson (CITI), Benny Halevy (Panasas), Lisa Week (SUN), Sam Falkner (SUN), Robert Gordon (SUN)