Data grid storage for digital libraries and archives using iRODS
Mark Hedges, Centre for e-Research, King's College London
eResearch Australasia, Melbourne, 30th Sept

Background: Project History
- Data grid project at AHDS and STFC ended at the demise of AHDS
- Used SRB (Storage Resource Broker)
- AHDS Executive -> Centre for e-Research at King's College London
- Centre incorporates staff and expertise of AHDS and other groups
- Continuity, but some change of focus
- New data grid project (using iRODS)
Background: Data Challenge
- Ongoing growth of corpora due to major digitisation projects
- Highly diverse in type and size: images, text, music, video, database, multi-media
- Require specialised knowledge
- Highly complex, contextual, fuzzy, uncertain, inconsistent, incomplete
- Rapid expansion: AHDS data size increased 20-fold between 2005 and 2008
- Increasing number of large objects (e.g. video, archaeology scans)
[Figure labels: History, Visual Arts, Performing Arts, Archaeology, Literature/Linguistics]

Data Grids
- Storage Resource Broker (SRB), a widely used data grid technology developed by the San Diego Supercomputer Center
- Addresses storage issues for digital repository and preservation environments
- Provides uniform, searchable access to virtualised, distributed resources, so the digital library (DL) is insulated from:
  - physical location of data
  - types of storage
  - migrating to new hardware
- Scalable: as the library grows, new resources can be added dynamically
- Auditing facilities
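The insulation described on the Data Grids slide can be illustrated with a small sketch: a catalogue maps logical object names to physical replicas, so a hardware migration changes the physical side only. This is not SRB or iRODS code; the class names, resource names, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes actually live


@dataclass
class LogicalCatalogue:
    """Maps logical object names to physical replicas (illustrative only)."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        self.entries.setdefault(logical_name, []).append(replica)

    def migrate(self, old_resource, new_resource, new_root):
        # Repoint every replica held on the old resource; logical names never change.
        for logical_name, replicas in self.entries.items():
            for replica in replicas:
                if replica.resource == old_resource:
                    replica.resource = new_resource
                    replica.physical_path = new_root + logical_name

    def locate(self, logical_name):
        return [(r.resource, r.physical_path) for r in self.entries[logical_name]]


catalogue = LogicalCatalogue()
name = "/archive/scan001.tif"
catalogue.register(name, Replica("disk-old", "/mnt/a/archive/scan001.tif"))
catalogue.register(name, Replica("tape-1", "/tape/7/archive/scan001.tif"))

# The client keeps using the same logical name before and after the migration.
catalogue.migrate("disk-old", "disk-new", "/mnt/new")
print(catalogue.locate(name))
```

The client-facing name stays fixed while the replica records change underneath it, which is the sense in which the digital library is insulated from location, storage type, and hardware migration.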
SRB Storage
[Diagram: a client application (e.g. digital repository) with a disseminator and disseminator implementation (web service); SRB storage holding object1 to object3 and datastream1 to datastream3 on distributed/virtualised resources; arrows labelled client request, get (entire object retrieved), client response.]

Issues
- Not open source
- Very effective for storage management, but not integrated with wider infrastructure
- Not easy to integrate application-specific requirements (either change the core code, or implement in the client, or use proxy commands); some examples in later slides
- No built-in implementation of workflow (have to script this outside SRB, whether server or client side), or of asynchronous processing
- Requires choreography between the SRB admin and the person running the workflow
- Relatively restricted support for metadata extension
iRODS
- The open source successor to SRB
- Provides similar data virtualisation
- Rule-Oriented Data management System
- Rule Engine allows data management policies to be defined and realised as rules
- Rules are sets of operations that you want to impose on an object (e.g. file, user, resource, ...)
- Rules allow virtualisation of policies: the digital library is insulated from how these policies are implemented

What are rules?
- Rules are built up cumulatively from atomic operations called micro-services
- Micro-services and rules can be added and modified to meet local needs
- Triggered by certain events: event-condition-action model
- Great potential to hide processing from the application layer
- Create server-side workflows
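The event-condition-action model on the slide above can be sketched in a few lines. This is an illustration of the idea only, not the iRODS rule engine: the event name `post_put`, the micro-service names, and the registry are all hypothetical.

```python
import hashlib

# Registry of micro-services: small atomic operations, keyed by name.
MICRO_SERVICES = {}


def micro_service(func):
    """Register a function as a micro-service under its own name."""
    MICRO_SERVICES[func.__name__] = func
    return func


@micro_service
def compute_checksum(obj):
    obj["checksum"] = hashlib.sha256(obj["data"]).hexdigest()


@micro_service
def tag_as_large(obj):
    obj["large"] = True


# Each rule: (event, condition, ordered list of micro-service names).
RULES = [
    ("post_put", lambda obj: True, ["compute_checksum"]),
    ("post_put", lambda obj: len(obj["data"]) > 10, ["tag_as_large"]),
]


def fire(event, obj):
    """Run every rule whose event matches and whose condition holds."""
    executed = []
    for rule_event, condition, actions in RULES:
        if rule_event == event and condition(obj):
            for name in actions:
                MICRO_SERVICES[name](obj)
                executed.append(name)
    return executed


small = {"data": b"abc"}
big = {"data": b"x" * 20}
print(fire("post_put", small))
print(fire("post_put", big))
```

The caller only signals that an object was stored; which operations run is decided server-side by the rule table, which is how processing can be hidden from the application layer.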
Definition of rules
The components of a rule definition are as follows:
    actiondef | condition | workflowchain | recoverychain
Where:
- actiondef identifies the action to be carried out
- condition is the necessary condition for execution
- workflowchain is the sequence of actions to be executed
- recoverychain is the corresponding sequence of recovery actions (to ensure a consistent state)
A rule can be built up cumulatively from other rules.
Data is passed into and within rules (via parameters/context).

Examples of rule use
Some examples of using rules:
- Digital preservation
- Processing digital material on ingest
- Fedora disseminators -> rules/micro-services
- Shibboleth integration
- Integration with provenance systems
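The pairing of a workflow chain with a recovery chain, described in the rule definition above, can be sketched as follows. This is a minimal illustration, not iRODS behaviour as specified: it assumes that when a step fails, the recovery actions for that step and all earlier steps run in reverse order. All function names are hypothetical.

```python
def nop(context):
    """Recovery placeholder that does nothing."""


def run_rule(condition, workflow, recovery, context):
    """Run workflow steps in order; on failure, undo attempted steps in reverse."""
    if len(workflow) != len(recovery):
        raise ValueError("each workflow step needs a matching recovery action")
    if not condition(context):
        return "skipped"
    for index, step in enumerate(workflow):
        try:
            step(context)
        except Exception:
            # Assumption of this sketch: recover the failed step and all before it.
            for undo in reversed(recovery[: index + 1]):
                undo(context)
            return "recovered"
    return "done"


def check_integrity(ctx):
    ctx["log"].append("checked")


def replicate(ctx):
    ctx["replicas"].append("copy-1")


def normalise(ctx):
    raise RuntimeError("unsupported format")


def cleanup_replicas(ctx):
    ctx["replicas"].clear()


context = {"log": [], "replicas": []}
status = run_rule(
    lambda ctx: True,
    [check_integrity, replicate, normalise],
    [nop, cleanup_replicas, nop],
    context,
)
print(status, context)
```

Here the third step fails, so the replica created by the second step is removed again, leaving the store in a consistent state.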
Example: digital preservation
Execution is triggered when an object has been ingested:
    acpostprocforput||accheckobjectintegrity##acanalyseobject##acnormaliseobject##msisysrepldataobj(presrescgrp,all)|nop##nop##nop##msicleanupreplicas

Example
Rule processing text objects on ingest. Processing depends on the type of object:
    acpostprocforput|$format == "application/msword" && $objectcategory == "textcategoryx"|acvalidateobjfortextprocessingx##acexecutetextprocessingx##acvalidatetextprocessingx|nop##msicleanuptextprocx##nop
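To make the structure of the example rules easier to see, the sketch below splits a rule string into its four parts. It assumes the parts are separated by `|` and the actions within a chain by `##`; the `|` separator is an assumption about the rule-file layout and may not be visible in the transcribed text. The function `parse_rule` is hypothetical and not part of iRODS.

```python
def parse_rule(text):
    """Split 'action|condition|workflow|recovery' into its components."""
    # Split the action off from the left and the two chains off from the right,
    # so that a condition containing '||' is left intact.
    action, rest = text.split("|", 1)
    condition, workflow, recovery = rest.rsplit("|", 2)

    def split_chain(chain):
        return [step.strip() for step in chain.split("##") if step.strip()]

    rule = {
        "action": action.strip(),
        "condition": condition.strip(),
        "workflow": split_chain(workflow),
        "recovery": split_chain(recovery),
    }
    if len(rule["workflow"]) != len(rule["recovery"]):
        raise ValueError("workflow and recovery chains differ in length")
    return rule


preservation = parse_rule(
    "acpostprocforput||accheckobjectintegrity##acanalyseobject##"
    "acnormaliseobject##msisysrepldataobj(presrescgrp,all)"
    "|nop##nop##nop##msicleanupreplicas"
)
for step, undo in zip(preservation["workflow"], preservation["recovery"]):
    print(f"{step:45s} recover with: {undo}")
```

Printing the chains side by side shows that each workflow step has one recovery action in the same position, with `nop` where nothing needs undoing.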
Example: data-side processing
- Fedora retrieves entire objects for processing
- Inefficient, and not always necessary
- Implement processing close to the data
- Fedora disseminators -> iRODS rules
- Client-side workflows -> iRODS rules

iRODS Storage & Rules
[Diagram: a client application (e.g. digital repository) with a disseminator sends a client request via iget / irule to the iRODS storage layer + rule execution; the request triggers a rule, which the Rule Engine executes using a rule definition and processing implementation; storage holds object1 to object3 and datastream1 to datastream3 on distributed/virtualised resources; a client response is returned.]
iRODS Access Management: Shibboleth
[Diagram: a client sends an access request through Apache with mod_shib; attributes are captured and stored (PIP) in iRODS + RE, alongside admin attributes; a rule invokes micro-services acting as the PDP; the response is enforced at the PEP and returned to the client.]

Provenance & iRODS
[Diagram: the client stores data in iRODS. External Provenance System: iRODS + iCAT + RE, where a rule causes a micro-service to access the external system and the iCAT file metadata is updated. Internal Provenance System: iRODS + RE, where the rule engine runs, manipulations are recorded, and the iCAT file metadata is updated.]
Next steps
- More prototyping
- Developing a more comprehensive set of rules for curation and preservation
- Finish Shibboleth & provenance integration
- Dynamic deployment of rules
- Prototypes -> production

Acknowledgements
Thanks for contributions from:
- Tobias Blanke, King's College London
- Adil Hasan, University of Liverpool
- Jens Jensen, Science & Technology Facilities Council
- Andrea Weise, Science & Technology Facilities Council
Also, thanks to the JISC, which funded part of the work.
Contacts
mark.hedges at kcl.ac.uk
