Digital Assets Repository 3.0
PASIG User Group Conference
Noha Adly, Bibliotheca Alexandrina
DAR 3.0
  DAR manages the full lifecycle of a digital asset: its creation, ingestion, metadata management, storage, dissemination, publishing and archival.
  An ecosystem of components for an integrated institutional repository.
DAR 3.0
  Modular design with integrated components
  Consolidation of assets
  Flexible content model for different types of digital objects based on current standards
  Integration with different sources of metadata, e.g. ILS, repositories, databases
  Repository-bound applications
  Preservation
Conceptual Overview
  Digital Assets Factory (DAF)
    Flexible management for the digitization workflow
    Unified means of ingestion into the system
    Supports both physical and born-digital materials
  Digital Assets Metadata (DAM) manages the metadata even in an incomplete state.
  Digital Assets Publishing (DAP) components allow applications to synchronize objects and their metadata stored in their databases/indexes with the repository.
  Digital Assets Keeper (DAK) manages access to the object files, versions and caching.
Conceptual Overview
  Collections/Sets:
    DAR manages one instance of the object
    Objects are consolidated into sets/collections
    An object can belong to different sets
    Objects are shared among applications
    Applications sync with the repository to get the latest updates of their objects
    Applications maintain different derivatives of the same object
    Relies on RDF to define sets and relations between objects
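The set model above can be sketched with plain subject-predicate-object triples. This is a minimal illustration only: the `isMemberOf` predicate and the identifiers are assumptions, since the slides do not say which RDF vocabulary DAR uses.

```python
# Hypothetical triples: (subject, predicate, object). The identifiers and the
# predicate name are illustrative and not taken from DAR.
TRIPLES = {
    ("obj:1", "isMemberOf", "set:books"),
    ("obj:1", "isMemberOf", "set:rare"),
    ("obj:2", "isMemberOf", "set:books"),
}


def members_of(set_id, triples=TRIPLES):
    """Return the sorted ids of the objects that belong to a set."""
    return sorted(s for s, p, o in triples if p == "isMemberOf" and o == set_id)


def sets_of(object_id, triples=TRIPLES):
    """Return the sorted sets that a single stored object belongs to."""
    return sorted(o for s, p, o in triples if p == "isMemberOf" and s == object_id)
```

One object instance (`obj:1`) appears in two sets without being duplicated, which is the property the slide describes.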
Conceptual Overview
  Discovery layer
    Core files are kept online on spinning drives
    Simple derivatives for display
    Users can browse and search using simple viewers
    Provides full text search across the whole collection, based on the access rights granted to the user
  Ingestion plugins
    Flexible integration with different sources of metadata
    Allow ingestion and synchronization with external sources
Digital Assets Factory (DAF)
  Full control over the digitization process workflow
  Configurable and flexible management tool for any digitization workflow
  Flexible workflow definition including
    Definition of sequence of phases
    Pre-phase and post-phase checks
    Redirects
  Special workflows are defined for different object types
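A workflow built from a sequence of phases, pre-phase and post-phase checks, and redirects could look roughly like the sketch below. The class, the function, and the phase names are all hypothetical; DAF's real configuration format is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Check = Callable[[dict], bool]


@dataclass
class Phase:
    name: str
    pre_checks: List[Check] = field(default_factory=list)
    post_checks: List[Check] = field(default_factory=list)
    redirect_on_failure: Optional[str] = None  # phase a failed job is sent to


def next_phase(phases: List[Phase], current: str, job: dict) -> Optional[str]:
    """Return the phase a job moves to after finishing `current`."""
    names = [p.name for p in phases]
    idx = names.index(current)
    phase = phases[idx]
    # Post-phase checks decide whether the job may leave the current phase.
    if not all(check(job) for check in phase.post_checks):
        return phase.redirect_on_failure or current
    if idx + 1 == len(phases):
        return None  # workflow finished
    following = phases[idx + 1]
    # Pre-phase checks decide whether the next phase can accept the job.
    if not all(check(job) for check in following.pre_checks):
        return current
    return following.name


# Hypothetical workflow for one object type (books).
BOOK_WORKFLOW = [
    Phase("scanning", post_checks=[lambda j: j.get("pages", 0) > 0]),
    Phase("ocr", post_checks=[lambda j: j.get("ocr_done", False)],
          redirect_on_failure="scanning"),
    Phase("quality_assurance"),
]
```

A different object type would get its own list of phases, which mirrors the "special workflows for different object types" point.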
Digital Assets Factory (DAF)
  Automated integrity checks at each step of the workflow
  Automated ingestion into the repository and archiving
  Integrates with external sources of metadata through plugins
  Integrates with enterprise tools and automated software used for digitization
  Compliant with OAIS
  Available for download at http://wiki.bibalex.org/dafwiki
Metadata Management
  METS and MODS standards for recording metadata
  Fedora as a metadata registry
  Content Models (Hybrid)
    Photo (atomistic) / Album (aggregate)
    Book (compound) / Bibliographic (aggregate)
Triple Store and Handles
  Triple Store
    RDF relations between objects are stored in the triple store
    Currently using Mulgara
    Scalability issues
    Alternatives: 4Store?
    Integration with Fedora
  Handles
    Each object has a unique identifier (UUID)
    The UUID is used to generate the Handle
    A list of external identifiers is maintained
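Deriving a Handle from an object's UUID can be sketched as follows. Handles have the general form `prefix/suffix`; the prefix used here and the idea of using the UUID directly as the suffix are assumptions, not DAR's documented scheme.

```python
import uuid

HANDLE_PREFIX = "12345"  # hypothetical naming-authority prefix


def new_object_id() -> str:
    """Create a random UUID for a newly ingested object."""
    return str(uuid.uuid4())


def handle_for(object_uuid: str, prefix: str = HANDLE_PREFIX) -> str:
    """Build a Handle string of the form prefix/suffix from a UUID."""
    # Parsing validates the input and normalizes it to lowercase with hyphens;
    # a malformed UUID raises ValueError.
    canonical = str(uuid.UUID(object_uuid))
    return f"{prefix}/{canonical}"


# External identifiers (e.g. an ILS record number) kept alongside the UUID.
# The key names are illustrative.
external_ids = {"12345678-1234-5678-1234-567812345678": {"ils": "b1000001"}}
```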
METS Store
  A METS skeleton is created for each object even if metadata is incomplete
  When metadata is complete, it is sent to Fedora and disseminated
  Accommodates digitizing objects before metadata is ready
  The METS store can be used to reconstruct Fedora
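A METS skeleton that exists before any descriptive metadata arrives might be created like this. The sketch uses the standard METS namespace and element names, but it is deliberately incomplete (it is not schema-valid as written), and the completeness test is an assumed stand-in for whatever rule DAR applies.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets_skeleton(object_id: str) -> ET.Element:
    """Create an empty METS record for an object with no metadata yet."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    ET.SubElement(root, f"{{{METS_NS}}}structMap")
    return root


def has_descriptive_metadata(mets: ET.Element) -> bool:
    """Assumed readiness rule: a dmdSec is present in the record."""
    return mets.find(f"{{{METS_NS}}}dmdSec") is not None
```

Under this assumption, a record would be forwarded to Fedora only once `has_descriptive_metadata` returns true, while digitization can proceed against the skeleton in the meantime.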
Metadata Synchronization
  External sources
    Synchronization is based on XML templates
    Templates map the output of ILS or DB into MODS
    Templates can be easily created for different sources
  Metadata Tool
    Used when there is no source from which to extract metadata
    Relies on human data entry (normal users)
    Generates human-friendly forms through configurable XML templates
    Offers type validation, controlled vocabulary, authority lists
    Metadata is synchronized with the METS store
    Allows full text search (Solr) across items in sets/collections
    Represents objects in a hierarchy depicting sets/collections
    Supports simple workflow with designated roles, e.g. editors, reviewers, etc.
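The idea of a template that maps a source record into MODS can be illustrated as below. DAR's templates are XML; this sketch substitutes a Python dictionary for brevity, and the source field names are invented. The MODS element names and namespace are the standard ones.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical template: source field -> path of nested MODS elements.
TEMPLATE = {
    "title": ["titleInfo", "title"],
    "author": ["name", "namePart"],
    "year": ["originInfo", "dateIssued"],
}


def to_mods(record: dict, template: dict = TEMPLATE) -> ET.Element:
    """Map a flat ILS/DB record into a MODS element tree."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for source_field, path in template.items():
        value = record.get(source_field)
        if value is None:
            continue  # incomplete metadata is allowed; skip missing fields
        parent = mods
        for tag in path:
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        parent.text = str(value)
    return mods
```

Supporting a new source then means writing a new template rather than new mapping code.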
Copyright and Access Module
  Access control policy for specific sets or objects
  Can define rights to certain operations (e.g. view, print, download, etc.) based on the application requesting access
  Can define exceptions to override rules (e.g. prevent a certain object from being displayed)
  Coordinates access to objects based on the number of licenses
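The three ideas on this slide (per-application rights on sets, per-object exceptions, and license counts) can be combined in one check. Everything below is a hypothetical data model, offered only to show how the rules might interact.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # (application, set_id) -> set of allowed operations
    rules: dict = field(default_factory=dict)
    # (object_id, operation) pairs that are denied regardless of the rules
    exceptions: set = field(default_factory=set)
    # object_id -> maximum number of concurrent uses (absent means unlimited)
    licenses: dict = field(default_factory=dict)
    # object_id -> number of uses currently open
    in_use: dict = field(default_factory=dict)

    def allowed(self, app: str, set_id: str, object_id: str, operation: str) -> bool:
        if (object_id, operation) in self.exceptions:
            return False  # exceptions override set-level rules
        if operation not in self.rules.get((app, set_id), set()):
            return False
        limit = self.licenses.get(object_id)
        return limit is None or self.in_use.get(object_id, 0) < limit
```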
Authentication and Authorization
  Single Sign On module
  Set management and ACLs
  LDAP integration and local users
Digital Assets Keeper
  Keeps a working copy of the object online
  Maintains a unique copy of the object with a persistent identifier
  Handle entries and external identifiers
  A storage abstraction layer isolates the repository from the storage implementation
  Manages different versions of items
  Manages caching and derivatives
  Load balancing among nodes
Online Archive (OnA)
  Complete hardware and software solution for archival
  Provides reliable and scalable storage based on commodity hardware with spinning hard drives
  Uses in-house developed software for data management
  Any AIP ingested is mirrored at least once
  Relies heavily on checksums to ensure the integrity of the data
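Mirroring an AIP and confirming the copy by checksum can be sketched with the standard library. SHA-256 and the directory layout are assumptions; the slides do not name the checksum algorithm or describe OnA's storage layout.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 digest, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_and_verify(source: Path, mirror_dir: Path) -> str:
    """Copy a file to a mirror location and check that the copy is intact."""
    mirror_dir.mkdir(parents=True, exist_ok=True)
    expected = sha256_of(source)
    copy = mirror_dir / source.name
    shutil.copy2(source, copy)
    if sha256_of(copy) != expected:
        raise IOError(f"checksum mismatch for {copy}")
    return expected  # stored so later audits can re-verify both copies
```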
Digital Assets Publishing (DAP)
  Different viewers and applications are built using the RESTful API
  Applications are highly integrated with the repository, not separate silos: repository-bound
  DAR manages one instance of each object
  Applications have access to a slice of the data (sets of objects) based on their access rights
  Applications sync with DAR: they query the API for new or updated metadata and files
  Applications maintain different derivatives independently
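The synchronization step, where an application asks only for objects in its own sets that changed since its last sync, can be modelled in memory as follows. The record layout and function name are hypothetical; the actual API endpoints are not given in the slides.

```python
from datetime import datetime, timezone


def changed_since(repository: dict, allowed_sets: set, since: datetime) -> list:
    """Return ids of objects the application may see that changed after `since`.

    `repository` maps object id -> {"sets": set of set ids, "modified": datetime}.
    """
    return sorted(
        oid
        for oid, rec in repository.items()
        if rec["sets"] & allowed_sets and rec["modified"] > since
    )


# Illustrative repository state.
REPO = {
    "obj:1": {"sets": {"books"}, "modified": datetime(2012, 5, 1, tzinfo=timezone.utc)},
    "obj:2": {"sets": {"photos"}, "modified": datetime(2012, 6, 1, tzinfo=timezone.utc)},
    "obj:3": {"sets": {"books", "rare"}, "modified": datetime(2012, 6, 2, tzinfo=timezone.utc)},
}
```

An application would record the time of each successful sync and pass it as `since` on the next run, then rebuild its own derivatives for the returned objects.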
Discovery Layer
  Stores simple derivatives for all objects
  Users can browse and search all assets stored within using simple viewers.
  Provides full text search across the metadata and textual content, based on the access rights granted to the user.
  Full text search is built on Solr with support for 5 languages: Arabic, English, French, Spanish and Italian
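One common way to restrict Solr results by access rights is a filter query. The sketch below builds the request parameters using Solr's standard `q`, `fq` and `wt` parameters; the `set_id` field name is an assumption about the index schema, and the slides do not say that DAR enforces access this way.

```python
from urllib.parse import urlencode


def solr_search_params(query: str, allowed_sets: list) -> str:
    """Build a Solr query string limited to the sets a user may access."""
    if not allowed_sets:
        raise ValueError("user has no accessible sets")
    # Filter query restricting hits to the permitted sets (hypothetical field).
    fq = "set_id:(" + " OR ".join(f'"{s}"' for s in sorted(allowed_sets)) + ")"
    # Note: the user query is passed through as-is; escaping of Solr special
    # characters is left out of this sketch.
    return urlencode({"q": query, "fq": fq, "wt": "json"})
```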
Current Status
  More than 430,000 objects including
    Books
    Photos
    Manuscripts
    Maps
    Documents
  Specialized viewers have been built to display items stored within the repository, such as books and photos.
  More viewers are still under development, e.g. tiled image viewer and manuscript viewer.
  Print on demand (POD) integration layer makes part of DAR available through the POD system.
  Several interfaces can also be built on top of this API to integrate DAR with other systems.
DAR Books
  Application built on top of DAR using the RESTful API; displays books stored in the repository (185,000)
  Faceted search, including content
  Morphological full text search (5 languages)
  Search results highlighting
  Embeddable book viewer that can be added to any webpage
  Whenever a book is added to or updated in DAR, it is automatically retrieved by DAR Books.
DAR Books
  Annotation Tools
    Sticky notes
    Highlight and underline, colors
    More to come: Open Annotations, Annotea, etc.
  Web 2.0 Social Features:
    Rating and comments
    Create your own bookshelves
    Sharing and embedding
    Adding to other social sites: Facebook, Twitter
Text Highlighting
Text Underlining
Adding Sticky Notes
Future Work
  Enhance the Storage Layer: exploring iRODS, pair trees, etc.
  Extending the Copyright and Access module
  Explore the potential of triple stores
    Beyond defining sets and collections
    Scalability
  Migrating existing applications into repository-bound
Thank You