Data grid storage for digital libraries and archives using iRODS




Mark Hedges, Centre for e-Research, King's College London
eResearch Australasia, Melbourne, 30th Sept. 2008

Background: Project History
- Data grid project at AHDS and STFC ended at the demise of AHDS
- Used SRB (Storage Resource Broker)
- AHDS Executive -> Centre for e-Research at King's College London
- Centre incorporates staff and expertise of AHDS and other groups
- Continuity, but some change of focus
- New data grid project (using iRODS)

Background: Data Challenge
- Subject areas: History, Visual Arts, Performing Arts, Archaeology, Literature/Linguistics
- Ongoing growth of corpora due to major digitisation projects
- Highly diverse in type and size: images, text, music, video, database, multi-media
- Require specialised knowledge
- Highly complex, contextual, fuzzy, uncertain, inconsistent, incomplete
- Rapid expansion: AHDS data size increased 20-fold between 2005 & 2008
- Increasing number of large objects (e.g. video, archaeology scans)

Data Grids
- Storage Resource Broker (SRB), a widely used data grid technology developed by the San Diego Supercomputer Center
- Addresses storage issues for digital repository and preservation environments
- Provides uniform, searchable access to virtualised, distributed resources, so the digital library (DL) is insulated from:
  - physical location of data
  - types of storage
  - migrating to new hardware
- Scalable: as the library grows, new resources can be added dynamically
- Auditing facilities

SRB Storage
[Diagram. Elements: client application (e.g. digital repository); client request / client response; distributed / virtualised SRB storage holding object1 to object3 with datastream1 to datastream3; a "get" in which the entire object is retrieved; disseminator and disseminator implementation (web service).]

Issues
- Not open source
- Very effective for storage management, but not integrated with wider infrastructure.
- Not easy to integrate application-specific requirements (either change the core code, or implement in the client, or use proxy commands); some examples in later slides.
- No built-in implementation of workflow (this has to be scripted outside SRB, whether server or client side), or of asynchronous processing. Requires choreography between the SRB admin and the person running the workflow.
- Relatively restricted support for metadata extension

iRODS
- The open source successor to SRB
- Provides similar data virtualisation
- Rule-Oriented Data management System
- Rule Engine allows data management policies to be defined and realised as rules
- Rules are sets of operations that you want to impose on an object (e.g. file, user, resource, ...).
- Rules allow virtualisation of policies: the digital library is insulated from how these policies are implemented.

What are rules?
- Rules are built up cumulatively from atomic operations called micro-services
- Micro-services and rules can be added and modified to meet local needs
- Triggered by certain events: event-condition-action model
- Great potential to hide processing from the application layer
- Create server-side workflows
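The event-condition-action model described above can be illustrated with a small sketch. This is not iRODS code: the `Rule` and `RuleEngine` classes, the `post_put` event name, and the two micro-service functions are all hypothetical names chosen for illustration, and micro-services are modelled simply as Python functions that update a context dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
# Assumption: a micro-service is modelled as a function that updates a context dict.
MicroService = Callable[[Context], None]


@dataclass
class Rule:
    event: str                            # event that triggers the rule
    condition: Callable[[Context], bool]  # must hold for the rule to fire
    actions: List[MicroService]           # micro-services run in order


class RuleEngine:
    """Hypothetical toy engine: dispatches an event to every matching rule."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, context: Context) -> int:
        """Run each rule for `event` whose condition holds; return how many fired."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(context):
                for action in rule.actions:
                    action(context)
                fired += 1
        return fired


def record_ingest(ctx: Context) -> None:
    # Hypothetical micro-service: note that the object was ingested.
    ctx.setdefault("log", []).append("ingested " + ctx["path"])


def mark_for_text_processing(ctx: Context) -> None:
    # Hypothetical micro-service: queue the object for text processing.
    ctx["queue"] = "text"


engine = RuleEngine()
engine.register(Rule("post_put", lambda ctx: True, [record_ingest]))
engine.register(
    Rule(
        "post_put",
        lambda ctx: ctx.get("format") == "application/msword",
        [mark_for_text_processing],
    )
)

word_doc = {"path": "/zone/home/demo/report.doc", "format": "application/msword"}
image = {"path": "/zone/home/demo/scan.tiff", "format": "image/tiff"}
fired_for_doc = engine.fire("post_put", word_doc)
fired_for_image = engine.fire("post_put", image)
```

The client only raises the event; which micro-services run, and under what conditions, is decided on the engine side, which is the sense in which processing is hidden from the application layer.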

Definition of rules
The components of a rule definition are as follows:

    actiondef | condition | workflowchain | recoverychain

Where:
- actiondef identifies the action to be carried out
- condition is the necessary condition for execution
- workflowchain is the sequence of actions to be executed
- recoverychain is the corresponding sequence of recovery actions (to ensure a consistent state).
A rule can be built up cumulatively from other rules.
Data is passed into/within rules (via parameters/context).

Examples of rule use
Some examples of using rules:
- Digital preservation
- Processing digital material on ingest
- Fedora disseminators -> rules/micro-services
- Shibboleth integration
- Integration with provenance systems
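The pairing of a workflow chain with a recovery chain, as set out in the rule definition above, can be sketched as follows. This is a simplified illustration, not the iRODS rule engine: `run_rule` and the example actions are hypothetical, and the recovery policy shown (undo only the steps that completed, in reverse order) is an assumption made for the sketch.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Action = Callable[[Context], None]


def nop(ctx: Context) -> None:
    """Recovery placeholder for steps that need no undo."""


def run_rule(
    condition: Callable[[Context], bool],
    workflow: List[Action],
    recovery: List[Action],
    ctx: Context,
) -> str:
    """Run the workflow chain; on failure, run recovery to restore a consistent state."""
    if len(workflow) != len(recovery):
        raise ValueError("each workflow action needs a matching recovery action")
    if not condition(ctx):
        return "skipped"
    done = 0
    try:
        for action in workflow:
            action(ctx)
            done += 1
    except Exception:
        # Assumption: undo only the completed steps, most recent first.
        for undo in reversed(recovery[:done]):
            undo(ctx)
        return "recovered"
    return "completed"


# Hypothetical actions standing in for micro-services.
def make_replica(ctx: Context) -> None:
    ctx["replicas"] += 1


def remove_replica(ctx: Context) -> None:
    ctx["replicas"] -= 1


def passing_check(ctx: Context) -> None:
    ctx["checked"] = True


def failing_check(ctx: Context) -> None:
    raise RuntimeError("integrity check failed")


def always(ctx: Context) -> bool:
    return True


ok_ctx = {"replicas": 1}
ok_status = run_rule(always, [passing_check, make_replica], [nop, remove_replica], ok_ctx)

bad_ctx = {"replicas": 1}
bad_status = run_rule(always, [make_replica, failing_check], [remove_replica, nop], bad_ctx)
```

In the failing run the replica created by the first step is removed again, so the object ends in the state it started in.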

Example: digital preservation
Execution triggered when an object has been ingested

    acpostprocforput||accheckobjectintegrity##acanalyseobject##acnormaliseobject##msisysrepldataobj(presrescgrp,all)|nop##nop##nop##msicleanupreplicas

Example
Rule processing text objects on ingest
Processing depends on type of object.

    acpostprocforput|$format == "application/msword" && $objectcategory == "textcategoryx"|acvalidateobjfortextprocessingx##acexecutetextprocessingx##acvalidatetextprocessingx|nop##msicleanuptextprocx##nop
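To make the structure of the example rules easier to see, the sketch below splits a rule line into its four components. It assumes the layout used by early iRODS rule files, with `|` separating the four fields and `##` separating the actions within a chain; `ParsedRule`, `split_chain`, and `parse_rule` are hypothetical helper names. It also assumes, as in both examples above, that the workflow and recovery chains have the same number of entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedRule:
    action: str          # action the rule is attached to
    condition: str       # empty string means "always"
    workflow: List[str]  # actions to execute, in order
    recovery: List[str]  # matching recovery actions


def split_chain(chain: str) -> List[str]:
    """Split a '##'-separated chain into trimmed action names."""
    return [step.strip() for step in chain.split("##") if step.strip()]


def parse_rule(line: str) -> ParsedRule:
    # Limitation of this sketch: a condition containing '|' would be split wrongly.
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    action, condition, workflow, recovery = fields
    rule = ParsedRule(action, condition, split_chain(workflow), split_chain(recovery))
    if len(rule.workflow) != len(rule.recovery):
        raise ValueError("workflow and recovery chains differ in length")
    return rule


preservation = parse_rule(
    "acpostprocforput||accheckobjectintegrity##acanalyseobject##acnormaliseobject##"
    "msisysrepldataobj(presrescgrp,all)|nop##nop##nop##msicleanupreplicas"
)

for step, undo in zip(preservation.workflow, preservation.recovery):
    print(f"{step:40s} recovery: {undo}")
```

Printing the chains side by side shows that only the final replication step has a real recovery action; the earlier steps are paired with `nop`.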

Example: data-side processing
- Fedora retrieves entire objects for processing
- Inefficient, and not always necessary
- Implement processing close to the data
  - Fedora disseminators -> iRODS rules
  - Client-side workflows -> iRODS rules

iRODS Storage & Rules
[Diagram. Elements: client application (e.g. digital repository); client request / client response; distributed / virtualised iRODS storage layer + rule execution holding object1 to object3 with datastream1 to datastream3; disseminator; iget / irule; rule triggers and executes; Rule Engine; rule definition; processing implementation.]

iRODS Access Management: Shibboleth
[Diagram. Elements: access request; Apache with mod_shib; PIP (capture & store attributes, admin attributes); iRODS + RE; PDP (rule response); PEP in front of the services; client.]

Provenance & iRODS
[Diagram. Elements: client stores data in iRODS; iRODS + iCAT + RE; rule engine runs, manipulations recorded in the internal provenance system; rule causes a micro-service to access the external provenance system; update iCAT file metadata.]

Next steps
- More prototyping
- Developing a more comprehensive set of rules for curation and preservation
- Finish Shibboleth & provenance integration
- Dynamic deployment of rules
- Prototypes -> production

Acknowledgements
Thanks for contributions from:
- Tobias Blanke, King's College London
- Adil Hasan, University of Liverpool
- Jens Jensen, Science & Technology Facilities Council
- Andrea Weise, Science & Technology Facilities Council
Also, thanks to the JISC, which funded part of the work.

Contacts
mark.hedges at kcl.ac.uk