Chapter 12 Distributed Storage



Files: file location and addressing
What is a file? Normally we collapse several concepts into one: the name, the contents and the GUI representation. What about the backup of this file? How do we distinguish the two? We are now going to have to distinguish carefully.
A file is an area on disk which contains data. The name of a file is a pointer to that area.
A directory is not a region of a disk which contains files. A directory is a file which contains the names of files and pointers to the place on disk where the information about those files resides.
A file name is actually a reference to a pointer. It contains a path, /home/username/teaching, and a name, Lecture2.ppt.

inodes: the root directory /
The inodes start at a fixed position on the disk. They are the place on disk where the metadata about a file is stored: size, modification date, owner and so on. The file name itself is held in the directory that lists the file, next to the inode number.
Inode 2 contains the information about the root directory /. The file pointed to by inode 2 contains the names and inodes of the files (and directories) in the root directory.
So doing ls /user/home/kyberd means:
Accessing inode 2.
Reading the file pointed to by inode 2 and finding the inode of the user directory.
Reading the file pointed to by that inode and finding the inode of the home directory.
Reading the file pointed to by that inode and finding the inode of the kyberd directory.
Reading the file pointed to by that inode and listing the file names it contains.
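The lookup steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the inode numbers other than 2, the owners, the sizes and the dictionary layout are all invented for illustration and do not correspond to any real file system format.

```python
ROOT_INODE = 2  # inode 2 describes the root directory, as on the slide

# Hypothetical toy "disk": inode number -> metadata and contents.
# In this toy a directory's entries map names to inode numbers.
INODES = {
    2: {"type": "dir", "owner": "root", "entries": {"user": 11}},
    11: {"type": "dir", "owner": "root", "entries": {"home": 12}},
    12: {"type": "dir", "owner": "root", "entries": {"kyberd": 13}},
    13: {"type": "dir", "owner": "kyberd", "entries": {"Lecture2.ppt": 14}},
    14: {"type": "file", "owner": "kyberd", "size": 4096},
}


def resolve(path):
    """Walk the path one component at a time.

    Returns the final inode number and the list of inodes visited on the way.
    """
    current = ROOT_INODE
    visited = [current]
    for name in [part for part in path.split("/") if part]:
        node = INODES[current]
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        current = node["entries"][name]
        visited.append(current)
    return current, visited


def ls(path):
    """List the names stored in the directory file that the path leads to."""
    inode, _ = resolve(path)
    node = INODES[inode]
    if node["type"] != "dir":
        raise NotADirectoryError(path)
    return sorted(node["entries"])
```

Calling ls("/user/home/kyberd") in this toy visits four inodes, one per step listed above, before it can read a single file name.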

inodes
The full pathname of a file is in fact an access to a data structure. This data structure provides a pointer to the physical location on disk. It is an index; one might even say a database.

Photos
Photo album software:
Store the photo on disk.
Create an album "Tracker Solenoid" and place the photo in the album.
Create an album "Publicity 2014" and place the photo in the album.
The organisation software places the photos on disk but adds an abstraction layer which allows you to organise the photos.
How many copies do we have? When you delete a photo from an album, does it disappear from disk? If you delete the photo directly, does it disappear from the album?
A photo album is a database with a simple DBMS.
Grid storage
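One possible set of answers to the photo album questions can be written down as a small Python sketch. It assumes that the software keeps a single copy of each photo and that albums hold references only; real album software may make different choices, and the class and method names here are hypothetical.

```python
class PhotoLibrary:
    """Toy album manager: one stored copy per photo, albums hold references."""

    def __init__(self):
        self.disk = {}    # photo id -> image bytes (the single stored copy)
        self.albums = {}  # album name -> set of photo ids

    def store(self, photo_id, data):
        self.disk[photo_id] = data

    def add_to_album(self, album, photo_id):
        if photo_id not in self.disk:
            raise KeyError(photo_id)
        self.albums.setdefault(album, set()).add(photo_id)

    def remove_from_album(self, album, photo_id):
        # Only the reference goes; the copy on disk is untouched.
        self.albums[album].discard(photo_id)

    def delete_photo(self, photo_id):
        # Assumed policy: deleting the photo also removes every reference,
        # so no album is left pointing at nothing.
        del self.disk[photo_id]
        for members in self.albums.values():
            members.discard(photo_id)
```

With this design a photo placed in both albums exists once on disk, removing it from one album leaves it on disk, and deleting it from disk removes it from every album.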

Distributed files
How do we name them? They can be anywhere in the distributed system, and there will often be duplicate copies of the files. Ideally we support a heterogeneous directory structure.
Rather than talk in generalities, we talk about a real system which has enormous amounts of data spread across the world. A solution was developed by the Worldwide LHC Computing Grid (WLCG).
LFC (LCG File Catalog): provides a virtual interface to storage. It gives meaningful names, hides the file system differences and is manipulated by the user. It provides a global namespace which makes no reference to the file location. It understands duplicate files and, when a job requests data, will normally return the closest one (highest bandwidth, least contention).
SRM (Storage Resource Manager): the protocol which governs access to the storage, both disk and tape, where the physical copy lives. It hides the complexities of the different underlying structures. It allows the user to reserve physical space, including in advance, and to give a lifetime to a file.
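The idea of a catalogue that knows about duplicate files and hands back the closest one can be illustrated with a short Python sketch. This is not the LFC interface. The class names, the storage host names and the scoring rule (bandwidth divided by one plus the number of active transfers) are assumptions made purely to illustrate "highest bandwidth, least contention".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    surl: str                # where this physical copy lives
    bandwidth_mbps: float    # hypothetical bandwidth to the requesting site
    active_transfers: int    # hypothetical measure of contention


class ToyCatalog:
    """Maps one logical name to every registered physical copy."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of Replica

    def register(self, lfn, replica):
        self._replicas.setdefault(lfn, []).append(replica)

    def closest(self, lfn):
        # Assumed scoring rule: prefer high bandwidth and low contention.
        return max(
            self._replicas[lfn],
            key=lambda r: r.bandwidth_mbps / (1 + r.active_transfers),
        )


catalog = ToyCatalog()
catalog.register(
    "lfn:/grid/cms/run1.root",
    Replica("srm://se-a.example.org/data/run1.root", 1000.0, 9),
)
catalog.register(
    "lfn:/grid/cms/run1.root",
    Replica("srm://se-b.example.org/data/run1.root", 400.0, 1),
)
best = catalog.closest("lfn:/grid/cms/run1.root")
```

The job only ever asks for the logical name; which physical copy it receives is decided by the catalogue at request time.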

Manipulation
There are two ways of accessing files.
LCG commands operate on the data. Some of them have side effects in the LFC. For example, lcg-cr uploads a file to the storage and registers it in the LFC. When using the lcg applications you refer to the files on the storage using the SRM protocol.
LFC commands manipulate the database. For example, lfc-mkdir creates a directory in the LFC (the database) but does not affect an SE.
Your user interface machine connects to the correct LFC server using the LFC_HOST environment variable.
Conventionally every VO has /grid/<vo> as a top-level directory. You can also define an LFC_HOME environment variable.

Manipulation: lcg-cr (copy and register)
lcg-cr --vo CMS -d srm://srm.grid.xx.yy:8443/<directory_path><name> -l lfn:/<path>/<file> file://<path><localfile>
The srm: argument says where on the data storage machine to place the file.
The lfn: argument says how to enter the name in the LFC database.
The file: argument is the name on your local machine (UI).
The command creates a GUID, a unique identification number for this file, written as a hexadecimal identifier.
Replication shortens data transfer paths, provides backup against data loss and protects against system failure.
The name of the file on a particular storage element is referred to as the SURL (SRM uniform resource locator).

Names: file references
GUID: identifies the file, independent of physical location. 1 GUID per file.
LFN: names the file, independent of physical location. 1 LFN entry per file.
SURL: physical replica location. 1 SURL per replica.
TURL: transport. 1 TURL per transfer.
GUID: a 36 byte unique string.
guid:38ed3f-60c402-11d7a6-b0f53e-e5a37e
LFN: a user-defined friendly name of a grid file.
lfn:/grid/<vo>/<name>
SURL: the physically stored version of a file.
srm://<hostname><port>/<somestring>
<port> is given only if it is not standard. <somestring> depends on the implementation of SRM.
TURL: a URL that an SE issues for a SURL and that can be used to store or retrieve data. It includes the transfer protocol.
<protocol>://<se hostname><port>/<path>
e.g. gsiftp://gridftp.xx. :2222/<path>
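The four kinds of reference can be told apart by their prefix, and their cardinalities suggest a simple record structure. The Python sketch below is illustrative only: the function and class names are hypothetical, the set of transfer protocols contains just the gsiftp example from the slide, and the host names in the usage lines are made up.

```python
from dataclasses import dataclass, field

FIXED_SCHEMES = {"guid": "GUID", "lfn": "LFN", "srm": "SURL"}
TRANSFER_PROTOCOLS = {"gsiftp"}  # illustrative; a real SE may offer others


def classify(reference):
    """Return which of the four grid names a reference string is."""
    scheme, sep, rest = reference.partition(":")
    if not sep or not rest:
        raise ValueError(f"no scheme in {reference!r}")
    scheme = scheme.lower()
    if scheme in FIXED_SCHEMES:
        return FIXED_SCHEMES[scheme]
    if scheme in TRANSFER_PROTOCOLS and rest.startswith("//"):
        return "TURL"
    raise ValueError(f"unrecognised scheme {scheme!r}")


@dataclass
class GridFile:
    """One GUID and one LFN per file, one SURL per replica.

    TURLs are not stored here because one is issued per transfer.
    """

    guid: str
    lfn: str
    surls: list = field(default_factory=list)


record = GridFile(
    guid="guid:38ed3f-60c402-11d7a6-b0f53e-e5a37e",
    lfn="lfn:/grid/cms/run1.root",
)
record.surls.append("srm://se-a.example.org/data/run1.root")
record.surls.append("srm://se-b.example.org/data/run1.root")
kinds = [classify(name) for name in (record.guid, record.lfn, *record.surls)]
```

Keeping the TURL out of the record reflects the table above: it belongs to a transfer event, not to the file or to a replica.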

Replica: automatic
Data in files can be replicated automatically as required. Consider a site which needs access to some data. The system can decide not to stream the data to that site but instead to make a copy available at a new storage location. Information on that location is associated with the metadata for that data set, and future accesses will have the option of using that replica.
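A minimal Python sketch of this behaviour follows. The slide does not say how the system decides between streaming and copying, so the rule used here (replicate once a site has made a given number of remote reads) is purely an assumption, as are the class name, the threshold and the site names.

```python
class ReplicaManager:
    """Toy policy: stream at first, replicate to a site after repeated reads."""

    def __init__(self, threshold=3):
        self.locations = {}     # data set -> set of sites holding a replica
        self.remote_reads = {}  # (data set, site) -> number of streamed reads
        self.threshold = threshold  # assumed tuning knob, not from the slide

    def register(self, dataset, site):
        self.locations.setdefault(dataset, set()).add(site)

    def access(self, dataset, site):
        """Serve one access and report how it was served."""
        sites = self.locations[dataset]
        if site in sites:
            return "local"
        key = (dataset, site)
        self.remote_reads[key] = self.remote_reads.get(key, 0) + 1
        if self.remote_reads[key] >= self.threshold:
            # Stands in for the real copy: record the new location in the
            # metadata so that later accesses can use this replica.
            sites.add(site)
            return "replicated"
        return "streamed"


manager = ReplicaManager(threshold=2)
manager.register("run1", "site-a")
history = [manager.access("run1", "site-b") for _ in range(3)]
```

In this toy the first access from site-b is streamed, the second triggers a copy, and the third is served from the new local replica.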

Alice
"The name of the song is called 'Haddocks' Eyes'."
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man'."
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways And Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."
Through the Looking-Glass, Lewis Carroll
The name of the song is called: Haddocks' Eyes.
The name of the song is: The Aged Aged Man.
The song is called: Ways And Means.
The song is: A-sitting On A Gate.

Lessons: transparency
Like Alice, we tend to mix up a thing and the name of a thing. When we are planning distributed computing we need to realise that the object is different from its name.
We would like a way of referring to the contents of the file which makes no reference to its physical location, so that replicas have the same name (the LFN).
The management software might like a compact numeric way of referring to the same concept (the GUID).
We need a way to talk about a particular replica (the SURL).
In the grid, since a single file may be copied to a number of temporary places simultaneously, we create a separate identity for each transfer event (the TURL).