DataGrids 2.0: iRODS - A Second Generation Data Cyberinfrastructure
Arcot (Raja) Rajasekar, DICE/SDSC/UCSD





What is SRB (Storage Resource Broker)?
First-generation data grid middleware developed at the San Diego Supercomputer Center (SDSC).
A distributed file system based on a client-server architecture.
Allows users to access files seamlessly across a distributed environment, based on their attributes rather than just their names or physical locations.
Replicates, synchronizes, and archives data, connecting heterogeneous resources in a logical and abstracted manner.

User Base & Diversity of Applications
Collections at SDSC: one petabyte, 200+ million files
Multi-disciplinary scientific data:
  Astronomy, cosmology
  Neuroscience, cell signalling, and other biomedical informatics
  Environmental and ecological data
  Educational (web) and research data (Chem, Phys, ...)
  Archival and library collections
  Earthquake data, seismic simulations
  Real-time sensor data
Growing at 1 TB a day
Supporting large projects: TeraGrid, NVO, SCEC, SEEK/Kepler, GEON, ROADNet, JCSG, AfCS, SIO Explorer, SALK, PAT, UCSD Library, ...

BIRN: Biomedical Informatics Research Network

NOAO Zone Architecture

ROADNet Architecture
ISGC 2004, Taipei, Taiwan

What is iRODS?
Second-generation data grid.
It is a data grid system: data virtualization
  A distributed file system based on a client-server architecture.
  Allows users to access files seamlessly across a distributed environment, based on their attributes rather than just their names or physical locations.
  Replicates, synchronizes, and archives data, connecting heterogeneous resources in a logical and abstracted manner.
It is a distributed workflow system: policy virtualization
  Policy is a first-class object to be managed.
  Long-term policy captures provenance.
  Policy can be declarative/descriptive as well as procedural/normative.
  Policy can be a process enforced on new objects.
  Policy can be a constraint whose integrity can be checked at any time.

Some sample policies
Every dataset should have two copies in two distributed locations.
Every data stream should have associated metadata in the catalog showing lat, long, elev, ...
Data from an observatory is accessible only to users from that observatory for the first 3 months.
When a new thermal vent instrument comes online, send email to aaa@bbb.
When data stream X has not received any data packet in 2 days, send email to ccc@ddd.
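The first sample policy can be read as a constraint that is checked against the catalog at any time. The following is a minimal Python sketch of that idea, not iRODS code: the Replica record, the function name, and the dataset and site names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    dataset: str   # logical dataset name (hypothetical)
    location: str  # site holding this copy (hypothetical label)


def violates_two_copy_policy(replicas, min_copies=2):
    """Return sorted names of datasets held at fewer than min_copies distinct sites."""
    sites = {}
    for r in replicas:
        sites.setdefault(r.dataset, set()).add(r.location)
    return sorted(name for name, locs in sites.items() if len(locs) < min_copies)


catalog = [
    Replica("survey.fits", "SDSC"),
    Replica("survey.fits", "MBARI"),
    Replica("vent-log.csv", "SDSC"),
    Replica("vent-log.csv", "SDSC"),  # a second copy, but at the same site
]

print(violates_two_copy_policy(catalog))
```

Counting distinct locations rather than raw copies is what makes the check match the wording "two copies in two distributed locations".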

Data Virtualization with iRODS
Logical name space
  Location-independent identifier
  Persistent identifier
Collection-owned data
  Access controls
  Audit trails
  Checksums
Descriptive metadata
  Common naming convention and set of attributes for describing digital entities
Inter-realm authentication
  Single sign-on system
[Diagram: a user application accessing an archive at SDSC, a database at MBARI, and a file system at Woods Hole]
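A logical name space can be pictured as a catalog that maps a persistent, location-independent name to whatever physical copies currently exist. The Python sketch below illustrates that mapping under stated assumptions: the LogicalNamespace class, its methods, and all paths and resource names are hypothetical and are not part of the iRODS API.

```python
class LogicalNamespace:
    """Toy catalog mapping logical paths to physical replicas."""

    def __init__(self):
        # logical path -> list of (resource name, physical path)
        self._replicas = {}

    def register(self, logical_path, resource, physical_path):
        self._replicas.setdefault(logical_path, []).append((resource, physical_path))

    def resolve(self, logical_path):
        """Return the physical replicas behind a logical name."""
        try:
            return list(self._replicas[logical_path])
        except KeyError:
            raise FileNotFoundError(logical_path) from None

    def migrate(self, logical_path, old_resource, new_resource, new_physical_path):
        """Move one replica; the logical name seen by users does not change."""
        self._replicas[logical_path] = [
            (new_resource, new_physical_path) if res == old_resource else (res, path)
            for res, path in self.resolve(logical_path)
        ]


ns = LogicalNamespace()
ns.register("/zone/home/obs/cast42.dat", "sdsc-archive", "/hpss/a1/000123")
ns.register("/zone/home/obs/cast42.dat", "whoi-fs", "/export/data/cast42.dat")
ns.migrate("/zone/home/obs/cast42.dat", "sdsc-archive", "sdsc-disk", "/disk7/000123")
print(ns.resolve("/zone/home/obs/cast42.dat"))
```

After the migration the application still asks for the same logical path; only the catalog entry changed.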

Policy Virtualization with iRODS
Micro-services
  Functions with well-defined semantics
  Transactional: recovery
  Context of application
  Message queues
Rules
  Triggered by events
  Conditional execution of alternative rule declarations
  System constructs: loops, recursion, branching
Workflows
  Distributed execution
  Immediate, deferred, periodic
[Diagram: a user application with execution at SIO, MBARI, and Woods Hole]

Data Management Virtualization
Layers, from client to storage:
  Access Interface
  Standard Access Actions
  Data Grid Services
  Standard Micro-services
  Storage Protocol
  Storage System
Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system.
Policies & Practices
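The mapping described on this slide can be sketched as a dispatch table: each access interface names its actions differently, but all of them resolve to the same small set of standard micro-services, which alone talk to storage. This is a hedged Python illustration; the interface names, action names, ms_put, ms_get, and InMemoryStore are assumptions made for the example, not iRODS identifiers.

```python
class InMemoryStore:
    """Stand-in for a storage system reached through a storage protocol."""

    def __init__(self):
        self.blobs = {}

    def write(self, key, data):
        self.blobs[key] = bytes(data)

    def read(self, key):
        return self.blobs[key]


# Standard micro-services: the only code that touches the storage system.
def ms_put(store, key, data):
    store.write(key, data)
    return len(data)


def ms_get(store, key):
    return store.read(key)


# (access interface, requested action) -> standard micro-service
ACTION_MAP = {
    ("posix", "write"): ms_put,
    ("posix", "read"): ms_get,
    ("web", "PUT"): ms_put,
    ("web", "GET"): ms_get,
}


def dispatch(interface, action, store, *args):
    """Translate an interface-specific action into a standard micro-service call."""
    return ACTION_MAP[(interface, action)](store, *args)


store = InMemoryStore()
written = dispatch("web", "PUT", store, "obs/cast42.dat", b"abc")
print(written, dispatch("posix", "read", store, "obs/cast42.dat"))
```

Adding a new access method in this sketch means adding rows to the table, not new storage code.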

Management Virtualization
Examples of management policies
Integrity
  Validation of checksums
  Synchronization of replicas
  Data distribution
  Data retention
  Access controls
Authenticity
  Chain of custody: audit trails
  Track required preservation metadata: templates
  Generation of Archival Information Packages
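Checksum validation, the first integrity policy listed, amounts to recomputing a digest for each replica and comparing it with the value recorded at ingest. The sketch below uses Python's standard hashlib with SHA-256; the choice of hash, the function names, and the sample data are assumptions for illustration and do not describe how iRODS itself stores or verifies checksums.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest used as the recorded checksum (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(recorded, replicas):
    """Return sorted resource names whose content no longer matches the recorded digest.

    recorded: hex digest stored in the catalog at ingest time
    replicas: mapping of resource name -> current replica bytes
    """
    return sorted(res for res, data in replicas.items() if checksum(data) != recorded)


original = b"seismic trace 0017"
recorded = checksum(original)
replicas = {
    "sdsc": original,
    "mbari": b"seismic trace 0O17",  # one character differs from the original
}

print(find_corrupt_replicas(recorded, replicas))
```

A replica flagged here would then be a candidate for the "synchronization of replicas" policy, repaired from a copy that still matches.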

iRODS Architecture
[Architecture diagram. Components shown: Client Interface; Admin Interface; Resources; Resource-based Services; Metadata-based Services; Rule Invoker; Service Manager; Rule Engine; Micro Service Modules; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Modules (three); Current State; Rule Base; Confs; Metadata Base]

Distributed Management System
[Diagram. Components shown: Rule Engine; Policy Management; Data Transport; Virtualization; Metadata Catalog; Persistent State Information; Execution Engine; Server-Side Workflow; Messaging System; Execution Control; Scheduling]

iRODS Rule
Each rule defines:
  Event
  Condition
  Action sets (micro-services and rules)
  Recovery sets (transaction-oriented)
Rule types:
  Atomic: applied immediately
  Deferred: support deferred consistency constraints
  Periodic: typically used to validate assertions
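The event, condition, action, and recovery parts of a rule can be illustrated with a small Python sketch. It assumes one plausible reading of "recovery sets": each action is paired with a recovery step, and when an action fails, the recovery steps of the actions that already completed run in reverse order. The Rule class and the helper functions are hypothetical, not the iRODS rule engine.

```python
class Rule:
    """Toy event-condition-action rule with per-action recovery."""

    def __init__(self, event, condition, actions):
        self.event = event
        self.condition = condition
        self.actions = actions  # list of (action, recovery) pairs

    def fire(self, event, ctx):
        """Run the actions if event and condition match; roll back on failure."""
        if event != self.event or not self.condition(ctx):
            return False
        completed = []
        try:
            for action, recovery in self.actions:
                action(ctx)
                completed.append(recovery)
        except Exception:
            # Undo completed actions in reverse order (assumed semantics).
            for recovery in reversed(completed):
                recovery(ctx)
            return False
        return True


def create(ctx):
    ctx["state"].append("created")


def undo_create(ctx):
    ctx["state"].remove("created")


def register(ctx):
    raise RuntimeError("catalog unavailable")  # simulated failure


def undo_register(ctx):
    pass  # nothing to undo: register never completes in this demo


rule = Rule("ingest", lambda c: c["dept"] == "sdsc",
            [(create, undo_create), (register, undo_register)])
ctx = {"dept": "sdsc", "state": []}
ok = rule.fire("ingest", ctx)
print(ok, ctx["state"])
```

Because the second action fails, the first one is rolled back and the context ends in its starting state, which is the transactional behaviour the slide attributes to recovery sets.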

Sample Rules
ingestobject(*f)
    $userdept == sdsc OR $userdept == sio
    createfile(*f), registerfile(*f), computechksum(*f), !, findbackuprsrc(*f, *R), replicatefile(*f, *R), computechecksum(*f, *R), comparechecksum(*f).
ingestobject(*f)
    $userdept == nvo
    createfile(*f), registerfile(*f), extractfitsmetadata(*f).
ingestobject(*f)
    createfile(*f), registerfile(*f).
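The three alternative declarations of ingestobject can be read as an ordered list of (condition, action chain) pairs. The Python sketch below models only the selection step, assuming the first declaration whose condition holds is the one used; it does not model the "!" marker or failure handling, and plan_ingest and the session dictionary are hypothetical names.

```python
# Ordered alternatives for the ingest event, mirroring the three sample rules.
RULES = [
    (lambda s: s.get("userdept") in ("sdsc", "sio"),
     ["createfile", "registerfile", "computechksum", "findbackuprsrc",
      "replicatefile", "computechecksum", "comparechecksum"]),
    (lambda s: s.get("userdept") == "nvo",
     ["createfile", "registerfile", "extractfitsmetadata"]),
    (lambda s: True,  # unconditional fallback declaration
     ["createfile", "registerfile"]),
]


def plan_ingest(session):
    """Return the micro-service chain of the first rule whose condition holds."""
    for condition, actions in RULES:
        if condition(session):
            return actions
    raise LookupError("no applicable rule")


for dept in ("sio", "nvo", "ucsd"):
    print(dept, plan_ingest({"userdept": dept}))
```

Users from sdsc or sio get the replication and checksum chain, nvo users get FITS metadata extraction, and everyone else falls through to plain create and register.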

Rule-based Data Management
Administrator-controlled rules to implement management policies:
  Administrative: adding/deleting users, resources
  Data ingestion: pre-processing, post-processing
  Data transport/deletion: parallel I/O streams, disposition
  Data retention policies: expiration, over-writes, versions
  Data reliability policies: copies, formats, migration, checking, ...

NVO Micro-Services
msiobjbyname
  Accesses the NASA/IPAC Extragalactic Database (NED)
  Uses the web service provided by: http://voservices.net/ned
  Given an object name, returns RA, Dec, and Type
msisdssimgcutout_getjpeg
  Accesses SDSS
  Uses the web service provided by: http://skyserver.sdss.org/
  Given RA, Dec, and a cut-out size, returns an image cutout file in a buffer

NVO iRODS Rules
getobjpositionbyname.ir
  Executes the micro-service msiobjbyname
  Shows RA, Dec, and Type as output
getcutoutbyposition.ir
  Executes the micro-service msisdssimgcutout_getjpeg
  Stores the image cutout as a file in iRODS
  Uses other iRODS system micro-services
getcutoutbybyobjname.ir
  Chains the two micro-services
  Chaining of two web services from two service providers
  Takes an object name and cutout parameters, and stores an image cutout in an iRODS file
With other NVO and image-manipulating micro-services, with similar or different functionality, one can write complex (and alternate) iRODS rules.
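The chaining performed by the third rule can be sketched in Python with local stand-ins for the two micro-services. No web service is called: lookup_position and fetch_cutout are hypothetical stubs, the coordinates are made-up values rather than NED data, and a plain dictionary stands in for iRODS storage.

```python
def lookup_position(name):
    """Stand-in for msiobjbyname: object name -> (ra, dec, type)."""
    table = {"demo-object": (10.0, 20.0, "G")}  # made-up values, not real NED data
    return table[name]


def fetch_cutout(ra, dec, size):
    """Stand-in for msisdssimgcutout_getjpeg: returns image bytes in a buffer."""
    return f"cutout ra={ra} dec={dec} size={size}".encode()


def cutout_by_name(name, size, store):
    """Chain the two steps and store the result under a logical file name."""
    ra, dec, _obj_type = lookup_position(name)
    image = fetch_cutout(ra, dec, size)
    key = f"{name}.jpg"
    store[key] = image  # the dict plays the role of an iRODS file
    return key


store = {}
key = cutout_by_name("demo-object", 64, store)
print(key, store[key])
```

The output of the first step feeds the second directly, which is the same composition the rule achieves across two independent service providers.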

Conclusion
More information: www.diceresearch.org
Contact: sekar@sdsc.edu