iRODS Policy-Driven Data Preservation: Integrating Cloud Storage and Institutional Repositories




Reagan W. Moore, Arcot Rajasekar, Mike Wan
{moore,sekar,mwan}@diceresearch.org
http://irods.diceresearch.org

Abstract

The integrated Rule-Oriented Data System (iRODS) organizes distributed data into a sharable collection. The data may reside in cloud storage, institutional repositories, tape archives, or laptop file systems. We will demonstrate the enforcement of management policies across the multiple storage locations; access mechanisms ranging from web browsers to Fedora, WebDAV, and EnginFrame interfaces; and the types of assertions that can be made about data in cloud storage.

Projects

- TPAP: Transcontinental Persistent Archive Prototype. Preservation research in collaboration with NARA.
- DCAPE: Distributed Custodial Archival Preservation Environment. Collaboration with State Archives to build preservation policies.
- PoDRI: Policy-Driven Repository Interoperability. IMLS collaboration with DuraSpace to integrate Fedora and iRODS.
- OOI: Ocean Observatories Initiative. NSF collaboration to archive sensor data.
- CDR: Carolina Digital Repository. UNC-CH collaboration to build an institutional repository.
- TUCASI: Triangle Universities Center for Advanced Studies, Inc. Collaboration to build cyberinfrastructure between UNC, Duke, NCSU, and RENCI.

Overview of iRODS Architecture

[Figure: overview of the iRODS data system. Labels in the figure:]
- User: can search, access, add, and manage data and metadata*
- iRODS Data Server: disk, tape, etc.
- iRODS Server
- iRODS Rule Engine: tracks policies
- iRODS Metadata Catalog: tracks information

* Access data with a web-based browser, the iRODS GUI, or command-line clients.

Integrated Rule-Oriented Data System

- Software to organize distributed data into a sharable collection, while enforcing management policies
- Manage properties of the shared collection
- Developed by the DICE Center
  - University of North Carolina at Chapel Hill: 6 staff, 4 students
  - University of California, San Diego: 5 staff
- Funded by
  - NSF OCI-0848296, "NARA Transcontinental Persistent Archives Prototype"
  - NSF SDCI-0721400, "Data Grids for Community Driven Applications"

Preservation Concept

- Maintain properties of records while utilizing new storage systems
  - Records are permanent
  - Federate storage across cloud, archives, and file systems
- Implications: infrastructure independence
  - Active management of records
  - Multiple indirection mechanisms to protect records from dependencies upon technology
  - Validation of state information and parsing of audit trails to verify assertions about records

Preservation Data Virtualization

[Figure: layered stack, top to bottom: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System]

- Map from the tasks requested by the access method to a standard set of micro-services.
- The standard micro-services are mapped to standard operations.
- The operations are mapped to the protocol of the storage system.
- View clients and storage as ephemeral.
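The three mappings on this slide can be pictured as a chain of lookup tables. The sketch below is only an illustration under stated assumptions: the task names, micro-service names, and protocol verbs are hypothetical and are not taken from iRODS.

```python
# Illustrative model only: every table entry and name below is hypothetical.
from typing import Dict, List, Optional, Tuple

# Layer 1: tasks requested by an access method -> standard micro-services.
TASK_TO_MICROSERVICES: Dict[str, List[str]] = {
    "download": ["get_object"],
    "upload": ["put_object", "checksum_object"],
}

# Layer 2: standard micro-services -> standard operations.
MICROSERVICE_TO_OPERATIONS: Dict[str, List[str]] = {
    "get_object": ["open", "read", "close"],
    "put_object": ["create", "write", "close"],
    "checksum_object": ["open", "read", "close"],
}

# Layer 3: standard operations -> the protocol of one storage system.
# None means that protocol has no counterpart for the operation.
POSIX_PROTOCOL: Dict[str, Optional[str]] = {
    "create": "creat", "open": "open", "read": "read",
    "write": "write", "close": "close",
}
S3_PROTOCOL: Dict[str, Optional[str]] = {
    "create": None, "open": "HEAD", "read": "GET",
    "write": "PUT", "close": None,
}


def translate(task: str, path: str,
              protocol: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Expand a client task into the storage-protocol calls it implies."""
    calls: List[Tuple[str, str]] = []
    for microservice in TASK_TO_MICROSERVICES[task]:
        for operation in MICROSERVICE_TO_OPERATIONS[microservice]:
            verb = protocol[operation]
            if verb is not None:
                calls.append((verb, path))
    return calls
```

Because only the third table mentions a storage system, swapping `POSIX_PROTOCOL` for `S3_PROTOCOL` changes the emitted calls without touching the client task or the micro-services, which is one way to read "view clients and storage as ephemeral".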

Policy-based Record Management

- Express policies as computer-actionable rules
- Define explicit locations in the data management framework where policies will be enforced
- Express procedures as remotely executable micro-services
- Create new preservation services by linking micro-services into a workflow
- Manage state information to track the application of preservation procedures
- Maintain audit trails to track the evolution of policies and procedures

iput With Replication

[Figure: a client issues iput. Labels in the figure: Client, data, Institutional Repository (/<filesystem>), metadata, iCAT, Rule Base, Cloud Storage, Cache. Caption: replication rule added to rule base.]

Policies Implemented Outside of the (Cloud) Storage

- Automation of preservation procedures
  - Descriptive metadata extraction
  - Creation of archival form (AIP)
  - Transformative migration
- Automation of administrative functions
  - Distribution, replication, retention, disposition
  - Report generation: usage, performance, error tracking
- Periodic validation of assessment criteria
  - Trustworthiness, integrity, authenticity, chain of custody
  - Parsing of audit trails for policy compliance over time
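As one illustration of parsing an audit trail for compliance over time, the sketch below reports which files have not had a checksum verification within a given window. The event layout (`path`, `action`, `when`) and the action name `checksum_verified` are assumptions made for this example, not the iRODS audit format.

```python
# Hypothetical audit-trail check; the event fields are assumed, not iRODS's.
from datetime import date, timedelta
from typing import Dict, List


def overdue_for_validation(audit_trail: List[Dict], as_of: date,
                           max_age_days: int) -> List[str]:
    """Return paths whose latest checksum verification is missing or too old."""
    last_verified: Dict[str, date] = {}
    seen = set()
    for event in audit_trail:
        path = event["path"]
        seen.add(path)
        if event["action"] == "checksum_verified":
            previous = last_verified.get(path)
            if previous is None or event["when"] > previous:
                last_verified[path] = event["when"]
    limit = timedelta(days=max_age_days)
    return sorted(
        path for path in seen
        if path not in last_verified or as_of - last_verified[path] > limit
    )


trail = [
    {"path": "/zone/a", "action": "ingest", "when": date(2010, 1, 1)},
    {"path": "/zone/a", "action": "checksum_verified", "when": date(2010, 6, 1)},
    {"path": "/zone/b", "action": "ingest", "when": date(2010, 1, 1)},
    {"path": "/zone/c", "action": "checksum_verified", "when": date(2010, 1, 15)},
]
overdue = overdue_for_validation(trail, as_of=date(2010, 7, 1), max_age_days=90)
```

Here `/zone/b` is reported because it was never verified and `/zone/c` because its last verification is older than 90 days.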

Management Framework

- Instrument the data management infrastructure at all locations where policy should be enforced (65 hooks)
  - File: create, open, read, write, delete
  - Collection: create, delete
  - User: create, modify, delete, group
  - Resource: create, modify, delete, group
  - Metadata: file modify, collection modify, descriptive
  - ACL: modify
- Support pre-processing policy
  - Authorization, selection, redirection
- Support post-processing policy
  - Audit trails, redaction, derived product generation
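A minimal sketch of pre- and post-processing policies attached to a management hook follows. The class, the hook name `file_delete`, and the group name `archivist` are hypothetical; they only illustrate an authorization policy running before an operation and an audit policy running after it.

```python
# Hypothetical policy-enforcement-point model; not the iRODS implementation.
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class PolicyFramework:
    def __init__(self) -> None:
        self.pre: Dict[str, List[Callable]] = defaultdict(list)
        self.post: Dict[str, List[Callable]] = defaultdict(list)

    def run(self, hook: str, operation: Callable[..., Any], **context: Any) -> Any:
        # Pre-processing policies may raise to refuse the operation.
        for policy in self.pre[hook]:
            policy(context)
        result = operation(**context)
        # Post-processing policies see the context and the result.
        for policy in self.post[hook]:
            policy(context, result)
        return result


files: Dict[str, bytes] = {"/zone/report.pdf": b"%PDF"}
audit_trail: List[Tuple[str, str, str]] = []


def require_archivist(context: Dict[str, Any]) -> None:
    if context["user_group"] != "archivist":
        raise PermissionError(f"{context['user']} may not delete {context['path']}")


def record_audit(context: Dict[str, Any], result: Any) -> None:
    audit_trail.append(("file_delete", context["user"], context["path"]))


def delete_file(user: str, user_group: str, path: str) -> bytes:
    return files.pop(path)


framework = PolicyFramework()
framework.pre["file_delete"].append(require_archivist)
framework.post["file_delete"].append(record_audit)
```

Calling `framework.run("file_delete", delete_file, user=..., user_group=..., path=...)` refuses the request for any group other than `archivist` and appends an audit record only when the deletion actually happened.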

iRODS Distributed Data Management

Types of Rules

- Synchronous rules that are applied by the framework at management hook locations
  - Stored in the rule base (core.irb file)
- Asynchronous rules that are queued for deferred or periodic execution
  - Batch system to manage the queue
- Interactively executed rules defined by a user
  - Executed through the irule command
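The asynchronous case can be sketched as a priority queue ordered by due time, with periodic rules re-queued after each run. This is a toy model under stated assumptions: `DeferredQueue` and its methods are hypothetical names, and an integer logical clock stands in for real time.

```python
# Toy deferred/periodic rule queue; names are hypothetical, time is a logical clock.
import heapq
import itertools
from typing import Any, Callable, List, Optional, Tuple


class DeferredQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Callable[[], Any], Optional[int]]] = []
        # Tie-breaker so that two entries with the same due time never
        # fall back to comparing the callables themselves.
        self._counter = itertools.count()

    def submit(self, rule: Callable[[], Any], due: int,
               period: Optional[int] = None) -> None:
        if period is not None and period <= 0:
            raise ValueError("period must be positive")
        heapq.heappush(self._heap, (due, next(self._counter), rule, period))

    def run_until(self, now: int) -> List[Tuple[int, Any]]:
        """Execute every queued rule that is due at or before `now`."""
        executed: List[Tuple[int, Any]] = []
        while self._heap and self._heap[0][0] <= now:
            due, _, rule, period = heapq.heappop(self._heap)
            executed.append((due, rule()))
            if period is not None:
                # Periodic rules go back on the queue for their next run.
                self.submit(rule, due + period, period)
        return executed


queue = DeferredQueue()
queue.submit(lambda: "replicate", due=5)            # deferred, runs once
queue.submit(lambda: "verify", due=0, period=10)    # periodic
history = queue.run_until(12)
```

In this run the periodic rule fires at times 0 and 10, the deferred rule fires once at time 5, and the next periodic run stays queued for time 20.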

iRODS Rules

- Server-side workflows
  - Action | condition | workflow chain | recovery chain
- Condition: test on any attribute
  - Collection, file name, storage system, file type, user group, elapsed time, IRB approval flag, descriptive metadata
- Workflow chain
  - Micro-services / rules that are executed at the storage system
- Recovery chain
  - Micro-services / rules that are used to recover from errors
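The four-part rule shape can be modelled in a few lines. The sketch below is one plausible reading, not the iRODS rule engine: it assumes that `recovery[i]` undoes `workflow[i]` and that, on failure, recovery steps for the completed workflow steps run in reverse order. All names, including the action `post_put`, are hypothetical.

```python
# Simplified model of action | condition | workflow chain | recovery chain.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    action: str
    condition: Callable[[Dict], bool]
    workflow: List[Callable[[Dict], None]]
    recovery: List[Callable[[Dict], None]]  # assumed: recovery[i] undoes workflow[i]


def apply_rule(rule: Rule, action: str, ctx: Dict) -> bool:
    """Run the workflow chain if the rule matches; roll back on failure."""
    if action != rule.action or not rule.condition(ctx):
        return False
    done = 0
    try:
        for step in rule.workflow:
            step(ctx)
            done += 1
    except Exception:
        # Assumption: undo only the steps that completed, newest first.
        for undo in reversed(rule.recovery[:done]):
            undo(ctx)
        return False
    return True


def fail_replication(ctx: Dict) -> None:
    raise IOError("remote resource unavailable")


rule = Rule(
    action="post_put",
    condition=lambda ctx: ctx["file_type"] == "tiff",
    workflow=[lambda ctx: ctx["log"].append("checksum"), fail_replication],
    recovery=[lambda ctx: ctx["log"].append("undo checksum"),
              lambda ctx: ctx["log"].append("undo replicate")],
)
ctx = {"file_type": "tiff", "log": []}
succeeded = apply_rule(rule, "post_put", ctx)
```

The second workflow step fails, so only the first step's recovery runs and `apply_rule` reports failure; a file whose type does not satisfy the condition never enters the workflow at all.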

Checksum Validation Rule

    myChecksumRule{
      msiMakeQuery("DATA_NAME, COLL_NAME, DATA_CHECKSUM",*Condition,*Query);
      msiExecStrCondQuery(*Query,*B);
      assign(*A,0);
      forEachExec (*B) {
        msiGetValByKey(*B,COLL_NAME,*C);
        msiGetValByKey(*B,DATA_NAME,*D);
        msiGetValByKey(*B,DATA_CHECKSUM,*E);
        msiDataObjChksum(*B,*Operation,*F);
        ifExec (*E != *F) {
          writeLine(stdout,file *C/*D has registered checksum *E and computed checksum *F);
        }
        else {
          assign(*A,*A + 1);
        }
      }
      ifExec(*A > 0) {
        writeLine(stdout, have *A good files);
      }
    }

*Condition can be COLL_NAME like /ils161/home/moore/genealogy/%
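The same loop can be followed outside iRODS with an in-memory catalog. The sketch below is an assumption-laden stand-in: the catalog is a plain dictionary, SHA-256 is chosen only for illustration, and `startswith` approximates the `COLL_NAME like '…/%'` condition.

```python
# Stand-alone analogue of the checksum rule; catalog layout and hash are assumed.
import hashlib
from typing import Callable, Dict, List, Tuple


def validate_checksums(
    catalog: Dict[Tuple[str, str], str],
    read_bytes: Callable[[str, str], bytes],
    coll_prefix: str,
) -> Tuple[int, List[Tuple[str, str, str]]]:
    """Compare registered checksums with freshly computed ones."""
    good = 0
    mismatches: List[Tuple[str, str, str]] = []
    for (coll_name, data_name), registered in sorted(catalog.items()):
        if not coll_name.startswith(coll_prefix):
            continue  # outside the collection selected by the condition
        computed = hashlib.sha256(read_bytes(coll_name, data_name)).hexdigest()
        if computed != registered:
            mismatches.append((f"{coll_name}/{data_name}", registered, computed))
        else:
            good += 1
    return good, mismatches


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# What is actually on storage; b.txt has been altered since registration.
storage = {
    ("/zone/home/genealogy", "a.txt"): b"alpha",
    ("/zone/home/genealogy", "b.txt"): b"beta (corrupted)",
    ("/zone/home/other", "c.txt"): b"gamma",
}
# What the catalog registered at ingest time.
catalog = {
    ("/zone/home/genealogy", "a.txt"): digest(b"alpha"),
    ("/zone/home/genealogy", "b.txt"): digest(b"beta"),
    ("/zone/home/other", "c.txt"): digest(b"gamma"),
}
good, mismatches = validate_checksums(
    catalog, lambda coll, name: storage[(coll, name)], "/zone/home/genealogy"
)
```

As in the rule, each mismatch carries both the registered and the computed checksum, and the count of good files is reported separately.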

Highly Extensible

- To integrate cloud storage:
  - Wrote an S3 driver to execute the cloud storage protocol
  - Implemented cloud storage as a compound resource
  - Added a disk cache in front of the cloud storage to enable enforcement of policies across all cloud accesses
- All iRODS clients can then be used to access data stored in the cloud
  - Web browser, WebDAV, FUSE, Kepler and Taverna workflows, Unix shell commands, C libraries, EnginFrame portal
- Can write policies to distribute data between cloud, file systems, archives, and laptops
- Can federate other data grids with the data grid that uses the cloud storage
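The idea of a disk cache in front of cloud storage can be sketched as a small class in which every access passes through one policy check. This is a hedged illustration, not the iRODS compound resource: dictionaries stand in for the cache and the cloud store, and `CompoundResource`, `put`, `get`, and `trim` are hypothetical names.

```python
# Hypothetical cache-in-front-of-cloud model; dictionaries stand in for storage.
from typing import Callable, Dict, List, Tuple


class CompoundResource:
    def __init__(self, policy: Callable[[str, str], None]) -> None:
        self.cache: Dict[str, bytes] = {}
        self.cloud: Dict[str, bytes] = {}
        self.policy = policy  # called on every access; may raise to refuse

    def put(self, path: str, data: bytes) -> None:
        self.policy("put", path)
        self.cache[path] = data
        self.cloud[path] = data  # synchronize the cached copy to the cloud

    def get(self, path: str) -> bytes:
        self.policy("get", path)
        if path not in self.cache:
            # Stage the object from the cloud back into the cache.
            self.cache[path] = self.cloud[path]
        return self.cache[path]

    def trim(self, path: str) -> None:
        """Drop the cached copy; the cloud copy remains."""
        self.cache.pop(path, None)


access_log: List[Tuple[str, str]] = []
resource = CompoundResource(lambda op, path: access_log.append((op, path)))
resource.put("/zone/scan.tiff", b"pixels")
resource.trim("/zone/scan.tiff")
restored = resource.get("/zone/scan.tiff")
```

Because clients never talk to the cloud store directly, the policy callable sees every `put` and `get`, including reads that have to be staged back from the cloud.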

NARA Transcontinental Persistent Archive Prototype

- Use data grid technology to build a preservation environment
- Conduct research on preservation concepts
  - Infrastructure independence
  - Enforcement of preservation properties
  - Automation of administrative preservation processes
  - Validation of preservation assessment criteria
- Demonstrate preservation on selected NARA digital holdings
- Integration of generic infrastructure with preservation technologies (Cheshire, MVD, JHOVE, PRONOM, Fedora, DSpace)

National Archives and Records Administration Transcontinental Persistent Archive Prototype

[Figure: federation of seven independent data grids, each with its own MCAT: NARA I, NARA II, Georgia Tech, Rocket Center, U NC, U Md, UCSD]

- Extensible environment: can federate with additional research and education sites.
- Each data grid uses different vendor products.

iRODS is a "coordinated NSF/OCI Nat'l Archives research activity" under the auspices of the President's NITRD Program.

Reagan W. Moore
rwmoore@renci.org
http://irods.diceresearch.org

NSF OCI-0848296, "NARA Transcontinental Persistent Archives Prototype"
NSF SDCI-0721400, "Data Grids for Community Driven Applications"