A Digital Library Feasibility Study




C. Henshaw, D. Thompson, M. Savage-Jones
Wellcome Library, London, UK
LIBER Annual Conference, Aarhus, Denmark, June 2010

Introduction
1. Who we are
2. Vision and strategy
3. Aims of the Feasibility Study
4. Methodology
5. Outcomes
6. Concluding remarks

Who we are
- One of the world's major resources for the study of medical history
- Modern and historical collections
- Special collections: archives, manuscripts, artworks, audio/visual
- Digitised collections, picture library

Vision and strategy
Transformation of the Wellcome Library
- Ambitious plan to digitise up to 30m images over 5+ years.
- Mainly historic collections, but also modern (subject to copyright clearance).
- Almost entirely internally funded.
Wellcome Digital Library
- Phase 1: 2010-2012, infrastructure and pilot digitisation
- 2012: seek remainder of funding for main programme
- Main programme: 2012 onward, digitise all suitable content

Strategy Phase 1: 2010-2012
- Build a sustainable and expandable mechanism for creating, storing and delivering data, the foundation stone of the future WDL;
- Digitise key library holdings relating to one of the Trust's major challenges as set out in the strategic plan ("Modern Genetics and its Foundations");
- Fund the digitisation of important third-party content which complements our holdings;
- Use innovative web tools to encourage discovery and use of these collections;
- Explore commercial partnerships for cost-effective digitisation of other parts of our collection.

Digitisation Phase 1: 2010-2012
- Archival records: Crick, Sanger, etc., 500,000 images
- Printed books: 1,400 genetics-related books from 1850-1990
- Commercial partnership: to be determined!
- External content: will identify relevant external collections and fund digitisation; content will be ingested into the Wellcome Digital Library

Infrastructure
We do not have the infrastructure required to create an integrated digital library providing a seamless interface across catalogues and digital collections.
The Digital Library requires several components:
1. Search and Discovery (Encore, catalogues)
2. Digital Delivery system
3. Digital Asset Management system (SDB)
4. Full-text index
5. Workflow system for managing digitisation and ingest

Aims of the Feasibility Study
To answer some questions around the infrastructure, primarily:
- SDB (Safety Deposit Box, a preservation system for born-digital content)
  Q: Can it be used for large-scale digitisation?
- Delivery system
  Q: How does it fit into the system architecture, what do we need to look for, what are the options?
- METS
  Q: Is METS really the way to go, do we need a Profile, how will we create METS files?
- Full-text index
  Q: How is an index constructed, what do we need, how does it fit into the system architecture?
- Workflow system
  Q: What are the key requirements, what are the options?

Methodology
SDB: Commissioned the supplier, Tessella, to investigate how SDB could be used as a DAM for the digital library.
1. Report: recommend modifications to SDB to meet requirements.
2. Proof-of-concept demonstration: demonstrate the capabilities of SDB to ingest and manage digitised content and to make that content available to a third-party system.

Methodology
CCS: Commissioned CCS (Content Conversion Systems) to provide information and recommendations.
1. Report: recommendations for implementation of:
   - METS
   - Full-text index
   - Front-end delivery, including conversion of JPEG 2000 on-the-fly
2. Proof-of-concept demonstration: using CCS's Veridian system, demonstrate the ability to request and retrieve content from SDB.
Did NOT look at front-end design, Web 2.0, authentication.

Methodology
Workflow system: Research carried out in-house. We came to realise that ad hoc tracking and project monitoring systems were not suitable for large-scale digitisation and ingest of this content into the DAM.
Workflow system requirements:
- Track and monitor digitisation and ingest activities
- Aggregate metadata
- Output XML as METS and other required formats
- Open question: how will the system fit into the overall infrastructure?

Outcomes - SDB
1. PoC successfully ingested JPEG 2000 files
   - Used a mocked-up SIP containing content and metadata for a Logical Object (book, file of letters, etc.)
   - Characterised the JPEG 2000 on ingest using JHOVE, adding administrative metadata to SDB
   - Does not currently characterise audio/visual formats, but this can be added at any time with a tool such as MediaInfo
2. PoC successfully delivered content to Veridian
   - Accepted a remote request for content and was able to pass that content to the remote system (Veridian)

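Outcomes - SDB/Veridian interop
[Flow diagram: request path between Veridian and SDB]
1. Veridian requests an image and checks whether the image is in its cache.
2. If yes: Veridian displays the image for the end user.
3. If no: Veridian submits a web services request to SDB.
4. SDB informs Veridian that the call was received.
5. SDB processes the request and makes the image available on the FTP server.
6. SDB sends the request callback to Veridian.
7. Veridian downloads the source image from the FTP server.
8. The JPEG 2000 is converted to JPEG and the JPEG is copied to the cache.
9. Veridian displays the image for the end user.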
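
The request flow on this slide can be sketched roughly as follows. This is a minimal in-memory simulation under stated assumptions, not the SDB or Veridian interface: the names `FakeSdb` and `DeliveryFrontEnd`, the dictionary standing in for the FTP server, and the byte-prefix "conversion" are all hypothetical.

```python
from typing import Callable, Dict


class FakeSdb:
    """Hypothetical stand-in for the DAM; not the real SDB interface."""

    def __init__(self, store: Dict[str, bytes], ftp: Dict[str, bytes]) -> None:
        self.store = store  # archived JPEG 2000 masters, keyed by image id
        self.ftp = ftp      # dictionary standing in for the FTP server

    def request_image(self, image_id: str, callback: Callable[[str], None]) -> str:
        # In this synchronous sketch the staging and callback happen before the
        # acknowledgement is returned; a real system would acknowledge first
        # and call back once the file is ready.
        self.ftp[image_id] = self.store[image_id]
        callback(image_id)
        return "received"


class DeliveryFrontEnd:
    """Hypothetical stand-in for the delivery system with a simple cache."""

    def __init__(self, sdb: FakeSdb) -> None:
        self.sdb = sdb
        self.cache: Dict[str, bytes] = {}
        self.sdb_requests = 0  # counts how often the DAM had to be asked

    def _on_ready(self, image_id: str) -> None:
        source = self.sdb.ftp.pop(image_id)           # download from the "FTP server"
        self.cache[image_id] = self._to_jpeg(source)  # convert, then cache

    @staticmethod
    def _to_jpeg(jp2: bytes) -> bytes:
        # Placeholder for a real JPEG 2000 to JPEG conversion.
        return b"JPEG:" + jp2

    def get_image(self, image_id: str) -> bytes:
        if image_id not in self.cache:  # cache miss: ask the DAM
            self.sdb_requests += 1
            self.sdb.request_image(image_id, self._on_ready)
        return self.cache[image_id]     # served from cache
```

In this sketch a second call to `get_image` for the same identifier is answered from the cache without contacting the DAM, which is the behaviour the cache check in the diagram is there to provide.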

Further work - SDB
1. Investigate further Tessella's recommended modifications to SDB, including development of APIs for remote request, and new ingest workflows.
2. Compare the value of customising SDB to other potential DAM systems on the market.
3. Carry out a full tender for the system as appropriate.

Outcomes - Delivery system
As neither the Library nor SDB has an appropriate delivery system on hand, CCS's Veridian system was used as part of the proof-of-concept.
1. Veridian successfully requested and retrieved content from SDB.
2. Veridian successfully converted JPEG 2000 files on-the-fly, using a limited cache. The Library will use JPEG 2000 archive and access files for all of its image content, so it was important to look at how on-the-fly conversion could be implemented and whether it was feasible.
3. On-the-fly conversion was slow. The remote locations meant using the Internet, rather than an internal network as in a real-life situation.

Further work - Delivery
1. Actual speed of content delivery to users to be tested further using the Wellcome's server and storage network.
2. Draw up complete specifications for delivery system.
3. Carry out a full tender for the system.

Outcomes - Full-text index
We will OCR all printed text. CCS made some recommendations:
1. Use of ICR (Intelligent Content Structure)
2. Dictionaries, word lists, alternate spellings
3. Set up indexing profiles for different types of content
4. Solr, based on Lucene, was a suitable architecture for a large-scale word index
5. Consider transcribing hand-written content
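
To illustrate how a word list of alternate spellings can feed a word index, here is a toy inverted index. It is a sketch only and is not how Solr or Lucene is configured: the spelling table, function names and sample text are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical word list mapping variant spellings to one indexed form.
ALTERNATE_SPELLINGS = {"physick": "physic", "chirurgeon": "surgeon"}


def normalise(token: str) -> str:
    """Lower-case a token, trim punctuation and apply the spelling table."""
    word = token.strip(".,;:!?()\"'").lower()
    return ALTERNATE_SPELLINGS.get(word, word)


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each normalised word to the set of page ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in text.split():
            word = normalise(token)
            if word:
                index[word].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted page ids matching a single-word query."""
    return sorted(index.get(normalise(query), set()))
```

Because the same normalisation runs at index time and at query time, a search for either the historical or the modern spelling finds the same pages.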

Further work - Full-text index
1. Investigate further the recommended indexing options to improve search results.
2. Investigate further how the index could be accessed by Encore.
3. Draw up specifications for indexing solution.
4. Carry out tender for the final system as appropriate.

Outcomes - METS
We expect to use METS as a wrapper for descriptive and administrative metadata. This would be used by the delivery system. CCS made some recommendations:
1. METS is a useful metadata format for this purpose.
2. The Wellcome should develop its own profile as a reference.
3. Use METS/ALTO for full-text content to provide a structure for the textual content.
4. Use MODS and MIX metadata standards in the METS.
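
As an illustration of METS acting as a wrapper, the sketch below builds a very small METS document with a MODS title, a file section and a physical structure map. It is not the Wellcome profile: the function name, identifiers such as `DMD1` and `FILE1`, the `USE` and `TYPE` values and the URLs are assumptions, and technical metadata (for example MIX in an `amdSec`) is left out for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"

# Register readable prefixes so serialised output uses mets:, mods:, xlink:.
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_mets(title: str, image_urls: list) -> ET.Element:
    """Build a minimal METS tree for one book with one file per page."""
    root = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata: a MODS record wrapped in a dmdSec.
    dmd = ET.SubElement(root, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = title

    # File inventory and a flat physical structure, one div per page image.
    file_sec = ET.SubElement(root, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="access")
    struct = ET.SubElement(root, f"{{{METS}}}structMap", TYPE="physical")
    book = ET.SubElement(struct, f"{{{METS}}}div", TYPE="book", DMDID="DMD1")
    for n, url in enumerate(image_urls, start=1):
        file_el = ET.SubElement(
            group, f"{{{METS}}}file", ID=f"FILE{n}", MIMETYPE="image/jp2"
        )
        ET.SubElement(
            file_el,
            f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": url},
        )
        page = ET.SubElement(book, f"{{{METS}}}div", TYPE="page", ORDER=str(n))
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=f"FILE{n}")
    return root
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` serialises the tree. A delivery system would follow the structure map to find page order and the file section to locate each image.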

Further work - METS
1. Ensure METS will indeed be implemented by chosen delivery system.
2. Determine what metadata standard(s) to use in the METS for descriptive and technical metadata.
3. Consider further the use of ALTO extension.
4. Finalise model for Wellcome METS profile.

Outcomes - Workflow system
It became clear, through looking at existing workflow systems on the market, that metadata processing should be carried out by a separate system.
1. A workflow tracking system (WTS) should be implemented.
2. This should focus on tracking and managing processing of content and ingest.
3. A separate system, a metadata normaliser (MNS), should be implemented.

Outcomes - WTS key requirements
1. Allow projects to be managed on a Project, Batch and Unit level
2. Associate descriptive metadata with each unit using barcodes
3. Perform command line actions (such as converting images to JPEG 2000) where possible
4. Allow for flexibility in workflow steps for different workstreams
5. Store metadata in an industry-standard database
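
One way to picture the Project, Batch and Unit levels with barcode lookup and per-project workflow steps is the small data model below. It is an illustrative sketch, not a specification of the WTS: the class and method names and the step names used in the usage note are hypothetical, and persistence and command-line actions are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Unit:
    barcode: str  # the barcode ties the physical item to its metadata
    completed: List[str] = field(default_factory=list)


@dataclass
class Batch:
    name: str
    units: List[Unit] = field(default_factory=list)


@dataclass
class Project:
    name: str
    steps: List[str]  # ordered workflow steps; each project defines its own
    batches: List[Batch] = field(default_factory=list)

    def find_unit(self, barcode: str) -> Unit:
        for batch in self.batches:
            for unit in batch.units:
                if unit.barcode == barcode:
                    return unit
        raise KeyError(barcode)

    def next_step(self, unit: Unit) -> Optional[str]:
        done = len(unit.completed)
        return self.steps[done] if done < len(self.steps) else None

    def complete_step(self, barcode: str, step: str) -> None:
        unit = self.find_unit(barcode)
        if step != self.next_step(unit):
            raise ValueError(f"{step!r} is not the next step for {barcode}")
        unit.completed.append(step)

    def progress(self) -> Dict[str, int]:
        """Count, per step, how many units have completed it."""
        counts = {step: 0 for step in self.steps}
        for batch in self.batches:
            for unit in batch.units:
                for step in unit.completed:
                    counts[step] += 1
        return counts
```

For example, a project created with the steps `["scan", "convert_to_jp2", "ingest"]` refuses to mark a unit as ingested before it has been scanned and converted, while a different workstream can simply supply a different step list.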

Outcomes - MNS key requirements
1. Separate system, but would utilise the same database as the WTS
2. Map descriptive metadata from the Library cataloguing systems to a set of database fields
3. Map administrative metadata from the DAM (file names and unique identifiers, etc.)
4. Aggregate all ingested metadata into a unified database
5. Output XML from the database using a number of templates (depending on type of content), e.g. METS
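
The map, aggregate and output steps listed here could look roughly like the sketch below, which uses an in-memory SQLite database and a string template. Everything specific in it is an assumption made for illustration: the catalogue field codes, the table layout, the function names and the simple `<record>` template (a real template would target METS or another required format).

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# Hypothetical mapping from catalogue field codes to database columns.
CATALOGUE_TO_DB = {"245a": "title", "100a": "creator"}

# Hypothetical output template; one template per content type in practice.
RECORD_TEMPLATE = Template(
    '<record id="$identifier"><title>$title</title>'
    "<creator>$creator</creator><file>$file_name</file></record>"
)


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE unit (identifier TEXT PRIMARY KEY, "
        "title TEXT, creator TEXT, file_name TEXT)"
    )
    return db


def add_descriptive(db: sqlite3.Connection, identifier: str, record: dict) -> None:
    """Map catalogue fields to columns; unmapped fields are ignored."""
    row = {CATALOGUE_TO_DB[k]: v for k, v in record.items() if k in CATALOGUE_TO_DB}
    db.execute(
        "INSERT INTO unit (identifier, title, creator) VALUES (?, ?, ?)",
        (identifier, row.get("title"), row.get("creator")),
    )


def add_administrative(db: sqlite3.Connection, identifier: str, file_name: str) -> None:
    """Attach administrative metadata supplied by the DAM to the same row."""
    db.execute(
        "UPDATE unit SET file_name = ? WHERE identifier = ?",
        (file_name, identifier),
    )


def export_xml(db: sqlite3.Connection, identifier: str) -> str:
    """Render one aggregated row through the template, escaping XML characters."""
    names = ("identifier", "title", "creator", "file_name")
    row = db.execute(
        "SELECT identifier, title, creator, file_name FROM unit WHERE identifier = ?",
        (identifier,),
    ).fetchone()
    values = {n: escape(v or "", {'"': "&quot;"}) for n, v in zip(names, row)}
    return RECORD_TEMPLATE.substitute(values)
```

Keeping the mapping table and the templates as data rather than code is what would let different content types share one database while producing different XML outputs.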

Further work - WTS and MNS
1. Complete specifications for WTS and MNS.
2. Investigate options for off-the-shelf and bespoke systems.
3. Carry out a full tender for the systems.

Concluding remarks
The feasibility study gave us:
1. Far more understanding about how all the elements of a Digital Library fit together.
2. The tools to develop precise specifications for what we need to develop and/or procure.
3. The ability to start addressing the issues around gaps and dependencies in existing and new systems.
4. Insights into how to work with suppliers, particularly where multiple suppliers need to communicate with each other.
5. A plan of action to start actually developing our Digital Library.