DA-NRW: A distributed architecture for longterm preservation



Similar documents
Digital preservation in the Die Deutsche Bibliothek and cooperative initiatives in Germany

Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context

Long-term archiving and preservation planning

Why archiving erecords influences the creation of erecords. Martin Stürzlinger scopepartner Vienna, Austria

Ex Libris Rosetta: A Digital Preservation System Product Description

Digital Assets Repository 3.0. PASIG User Group Conference Noha Adly Bibliotheca Alexandrina

Database Preservation Toolkit: a flexible tool to normalize and give access to databases

Long-term preservation activities of the Bavarian State Library

Digital Preservation. OAIS Reference Model

The challenges of becoming a Trusted Digital Repository

Columbia University Libraries / Information Services

Preservation Action: What, how and when? Hilde van Wijngaarden Head, Digital Preservation Department National Library of the Netherlands

Preserving digital data - risk assessment and digital preservation strategies

Data grid storage for digital libraries and archives using irods

Chapter 5: The DAITSS Archiving Process

DIGITAL ARCHIVES & PRESERVATION SYSTEMS

Database preservation toolkit:

2009 ikeep Ltd, Morgenstrasse 129, CH-3018 Bern, Switzerland (

Preserving French Scientific data

Technical concepts of kopal. Tobias Steinke, Deutsche Nationalbibliothek June 11, 2007, Berlin

Interagency Science Working Group. National Archives and Records Administration

nestor-studies 10 Into the Archive A guide for the information transfer to a digital repository Draft for public comment

#MMTM15 #INFOARCHIVE #EMCWORLD 1

Long Term Knowledge Retention and Preservation

Digital Preservation: the need for an open source digital archival and preservation system for small to medium sized collections,

Best Archiving Practice Guidance

Conceptualizing Policy-Driven Repository Interoperability (PoDRI) Using irods and Fedora

Digital Preservation Workshop

Copying Archives. Ngoni Munyaradzi (MNYNGO001)

Data Warehouses in the Path from Databases to Archives

Digital Repository Initiative

Summary Report of the PREMIS Implementation Fair

INTEGRATING RECORDS SYSTEMS WITH DIGITAL ARCHIVES CURRENT STATUS AND WAY FORWARD

Florida Digital Archive (FDA) Policy and Procedures Guide

Adding Robust Digital Asset Management to Oracle s Storage Archive Manager (SAM)

DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories

Cost and Value analysis of digital data archiving ANNA PALAIOLOGK

Digital Preservation Services: State of the Art Analysis. Raivo Ruusalepp and Milena Dobreva

Florida Digital Archive (FDA) Policy and Procedures Guide

Harnessing Media Micro-Services for Stewardship of Digital Assets at CUNY TV. NDSR Project:

The Rutgers Workflow Management System. Workflow Management System Defined. The New Jersey Digital Highway

Assessing a Scientific Data Center as a Trustworthy Digital Repository

Technology Watch Report

Implementing Open Source Systems for Digital Asset Management and Preservation

PAPER Data retrieval in the PURE CRIS project at 9 universities

Preservation and Dissemination Policy of the LISS Data Archive

METADATA STANDARDS AND GUIDELINES RELEVANT TO DIGITAL AUDIO

dati.culturaitalia.it a Pilot Project of CulturaItalia dedicated to Linked Open Data

Big Data in the Digital Cultural Heritage

Building An Institutional Repository With DSpace

Response to Invitation to Tender: requirements and feasibility study on preservation of e-prints

Islandora: An Open Source Institutional Repository Solution. Consortium of MnPALS Libraries Annual Meeting April 2014

Building Semantic Content Management Framework

FOR A PAPERLESS FUTURE. Petr DOLEJŠÍ Senior Solution Consultant SEFIRA Czech Republic

Fixity Checks: Checksums, Message Digests and Digital Signatures Audrey Novak, ILTS Digital Preservation Committee November 2006

North Carolina Digital Preservation Policy. April 2014

<Insert Picture Here> Solution Direction for Long-Term Archive

Oxford Digital Asset Management System (DAMS) Update

ARCHIVEMATICA: USING MICRO-SERVICES AND OPEN-SOURCE SOFTWARE TO DELIVER A COMPREHENSIVE DIGITAL CURATION SOLUTION

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Organization of VizieR's Catalogs Archival

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

METS and the CIDOC CRM a Comparison

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un

Collaborative Open Market to Place Objects at your Service

Service Oriented Architecture

Transcription:

DA-NRW: A distributed architecture for longterm preservation Manfred Thaller, Sebastian Cuy, Jens Peters, Daniel de Oliveira, Martin Fischer Universität zu Köln International Workshop on Semantic Digital Archives Berlin, September 29 th 2011

AIP AIP AIP AIP Digitales Archiv NRW DA NRW: Systemarchitektur IT Verbund Prozess: Access RZZ mit (zertifizierten) Digitalen Archiven Prozesse: Daten-Management, Administration HBZ UzK LVR Ifk NN IT-Dienstleister Präsentation (HBZ) Prozess: Access LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker u.a. DIP Presen tation Repo sitory DIP Presenter DIP Portale Prozess: SIP SIP SIP SIP Ingest Kultur- und Wissenschaftseinrichtungen in NRW Produzenten (Behörden, Einrichtungen, Verlage, Private ) SIP = Submission Information Packages AIP = Archival Information Packages DIP =Dissemination Information Packages Wolf-Rüdiger Schleidgen: Digitales Archiv NRW, 06.07.2011 2

Basic (political) concepts 1 / 2 (1) Each cultural heritage institution in North Rhine Westphalia (NRW) can deposit digital material into the Digital Archive of the state. (2) Such institutions are: Libraries, museums, archives, universities (publications as well as examination records), archaeological survey institutes, audiovisual archives (3) This includes a technology watch triggering migrations as necessary.

Basic (political) concepts 2 / 2 (4) The Digital Archive is a cluster of synchronized digital repositories at existing institutions. As a whole it implements OAIS. (5) It also contains a presentation component which holds copies for unrestricted access via the large cultural heritage portals. (6) Each depositor can decide, down to the item level, which policies are to be applied to the stored objects.

Technical purpose (1) Prove of concept. (2) Precise financial planning data for long term viability. (3) For a restricted number of data formats and metadata standards. (4) With a capacity of ca. 200 TB. (5) No restrictions on scalability of number of formats and metadata standards. (6) Capacity scalable by at least one order of magnitude. (7) Supposed to be operative in April 2012.

Small print If the Eifel volcanoes explode, and only individual media survive, they should still be fully self descriptive.

AIP AIP AIP AIP Digitales Archiv NRW DA NRW: Systemarchitektur IT Verbund Prozess: Access RZZ mit (zertifizierten) Digitalen Archiven Prozesse: Daten-Management, Administration HBZ UzK LVR Ifk NN IT-Dienstleister Präsentation (HBZ) Prozess: Access LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker LZA- Reposi tory Content Broker u.a. DIP Presen tation Repo sitory DIP Presenter DIP Portale Prozess: SIP SIP SIP SIP Ingest Kultur- und Wissenschaftseinrichtungen in NRW Produzenten (Behörden, Einrichtungen, Verlage, Private ) SIP = Submission Information Packages AIP = Archival Information Packages DIP =Dissemination Information Packages Wolf-Rüdiger Schleidgen: Digitales Archiv NRW, 06.07.2011 7

Object DB LZA Repository AIP Pres Repository DIP Disposer Packager Contract DB Decider Format DB Listener Grapper Handle SIPs Harvest SIPs

Synchronizer Hardware Archive 1 Hardware Archive 2 Hardware Archive 3 Portal Hardware Presenter Software

Contracts 1 / 2 Generally speaking, an object will be kept on the repositories the system chooses and made available without restriction to the whole world. A contract may: Restrict an object to specific repositories. Keep it out of the presentation system. Restrict the amount of metadata delivered to harvesters. Restrict the quality of data objects disclosed to harvester.

Contracts 2 / 2 A contract may define these restrictions on: an institutional level. a media level. a object group level. the level of an individual object. out of which the concretely applicable contract for each archived object is computed.

Small print printed large If the Eifel volcanoes explode, and only individual media survive, they should still be fully self descriptive. Translated into: No separation of data and metadata, ever. Therefore use BagIt objects.

BagIt Container format for AIPs: paket4223/ data/ bitstreams/ img2346.tif img2347.tif metadata/ dc.xml premis.xml mets.xml manifest-md5.txt bagit.txt

<rightsextension> <ar xmlns="http://danrw.de/access" xmlns:xsi="http://www.w3.org/2001/xmlschemainstance" schemalocation="http://danrw.de/access file:/da- NRW/RechtemanagementAKI/PremisAbbildung/ar.xsd"> <conditions> <PUBLICATION > <ip_range required="false"> <val>44.56.81.211</val> <val>64.56.81.211</val> </ip> <displayresolutionimage> <width_px>200</width_px> </displayresolutionimage> <streamingquality> <kbps>64</kbps> </streamingquality> <pagelimit> <pages>2</pages> </pagelimit> </PUBLICATION> </conditions> </ar> </rightsextension>

BagIt / Premis extended Premis within BagIt extended by contractual information stored within objects in OWL and OWL representation of contracts handled as contract DB within a triplestore. Probably.

Technology decisions (1) Own uppermost orchestration layer (2) supported by irods micro service layer https://www.irods.org (3) No Fedora, when performance counts. Use inpresentation level, however. http://fedora-commons.org

Standards to be used (1) LTP objects: BagIT http://tools.ietf.org/html/draft-kunzebagit-06 (2) object identifiers: URN (3) Structural metadata: METS / MODS, as far as applicable; otherwise domain specific equivalent (4) [ format identifiers: PRONOM / UDFR version, RDF based ] (5) PREMIS: PREMIS. Currently XML binding, probably RDF or UML binding

Why probably? If the Eifel volcanoes explode, and only individual media survive, they should still be fully self descriptive. Do Semantic Technologies make long term preservation easier or harder? Gladney: Digital Archiving v. Long Term Preservation

Thank you for listening! sebastian.cuy@uni-koeln.de d.de-oliveira@uni-koeln.de martin.fischer@uni-koeln.de jens.peters@uni-koeln.de manfred.thaller@uni-koeln.de