MultiMimsy database extractions and OAI repositories at the Museum of London



Similar documents
Functional Requirements for Digital Asset Management Project version /30/2006

Open Access Repositories Technical Considerations. Introduction. Approaches to Setting up Repositories

How To Manage Your Digital Assets On A Computer Or Tablet Device

Building integration environment based on OAI-PMH protocol. Novytskyi Oleksandr Institute of Software Systems NAS Ukraine

Taking full advantage of the medium does also mean that publications can be updated and the changes being visible to all online readers immediately.

2015, André Melancia (Andy.PT) 1

OCLC CONTENTdm. Geri Ingram Community Manager. Overview. Spring 2015 CONTENTdm User Conference Goucher College Baltimore MD May 27, 2015

Building An Institutional Repository With DSpace

OPENGREY: HOW IT WORKS AND HOW IT IS USED

Oct 15, Internet : the vast collection of interconnected networks that all use the TCP/IP protocols

OCLC CONTENTdm and the WorldCat Digital Collection Gateway Overview

Flattening Enterprise Knowledge

The FAO Open Archive: Enhancing Access to FAO Publications Using International Standards and Exchange Protocols

GeoNetwork, The Open Source Solution for the interoperable management of geospatial metadata

Improving software quality with an automated build process

Five Steps to Integrate SalesForce.com with 3 rd -Party Systems and Avoid Most Common Mistakes

Avatier Identity Management Suite

James Hardiman Library. Digital Scholarship Enablement Strategy

Notes about possible technical criteria for evaluating institutional repository (IR) software

DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories

DESKTOP COMPUTER SKILLS

dati.culturaitalia.it a Pilot Project of CulturaItalia dedicated to Linked Open Data

A collaborative platform for knowledge management

Migrating helpdesk to a new server

CHAPTER 5: BUSINESS ANALYTICS

Braindumps.C questions

Evaluation Checklist Data Warehouse Automation

Alfresco. Wiley Publishing, Inc. PROFESSIONAL. PRACTICAL SOLUTIONS FOR ENTERPRISE. John Newton CONTENT MANAGEMENT. Michael Farman Michael G.

DIGITAL OBJECT an item or resource in digital format. May be the result of digitization or may be born digital.

IBM Rational Asset Manager

Building Views and Charts in Requests Introduction to Answers views and charts Creating and editing charts Performing common view tasks

Local Loading. The OCUL, Scholars Portal, and Publisher Relationship

Vilas Wuwongse, Thiti Vacharasintopchai, Neelawat Intaraksa Asian Institute of Technology

The Czech Digital Library and Tools for the Management of Complex Digitization Processes

Digital Asset Management Developing your Institutional Repository

Institutional Repositories: Staff and Skills Set

Adam Rauch Partner, LabKey Software Extending LabKey Server Part 1: Retrieving and Presenting Data

UNIVERSE DESIGN BEST PRACTICES. Roxanne Pittman, InfoSol May 8, 2014

The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.

SPHOL207: Database Snapshots with SharePoint 2013

Getting Started with the Ed-Fi ODS and Ed-Fi ODS API

September 2009 Cloud Storage for Cloud Computing

Getting Started with STATISTICA Enterprise Programming

The Online Research Database Service (ORDS)

GeoNetwork, The Open Source Solution for the interoperable management of geospatial metadata

General principles and architecture of Adlib and Adlib API. Petra Otten Manager Customer Support

Data Consumer's Guide. Aggregating and consuming data from profiles March, 2015

Oracle BI 11g R1: Build Repositories

AWS Schema Conversion Tool. User Guide Version 1.0

Kofax Export Connector for Microsoft SharePoint

PAPER Data retrieval in the PURE CRIS project at 9 universities

CHAPTER 4: BUSINESS ANALYTICS

EFFECTIVE STORAGE OF XBRL DOCUMENTS

Institutional Repositories: Staff and Skills requirements

Building Library Website using Drupal

Setting up SQL Translation Framework OBE for Database 12cR1

Lightweight Data Integration using the WebComposition Data Grid Service

HYPERION SYSTEM 9 N-TIER INSTALLATION GUIDE MASTER DATA MANAGEMENT RELEASE 9.2

14 Configuring and Setting Up Document Management

Metadata Quality Control for Content Migration: The Metadata Migration Project at the University of Houston Libraries

1 2DB Introduction

Technical Specifications (Excerpt) TrendInfoWorld Web Site

Digital Asset Management. Content Control for Valuable Media Assets

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy Page 1 of 8

Cloud Storage Standards Overview and Research Ideas Brainstorm

Installing and Administering VMware vsphere Update Manager

SAP BODS - BUSINESS OBJECTS DATA SERVICES 4.0 amron

news from Tom Bacon about Monday's lecture

TERMS OF REFERENCE. Revamping of GSS Website. GSS Information Technology Directorate Application and Database Section

Using Metadata Manager for System Impact Analysis in Healthcare

Mamut Business Software. Additional Products and Enterprise Extensions. Mamut Client Manager

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

XML Processing and Web Services. Chapter 17

Audit compliance and long-term archiving for SharePoint

Design and Functional Specification

Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd

Microsoft in an Integrated. Update. decisions faster. Presented by Steve Studer for the AIIM

Databases in Organizations

A Plan for the Continued Development of the DNS Statistics Collector

9.1 SAS/ACCESS. Interface to SAP BW. User s Guide

Victorian Mapping and Address Service. System Overview. And. Trial Access Definition & Request Form

Invenio: A Modern Digital Library for Grey Literature

CERN Document Server

Performance Management Platform

Records management in SharePoint 2010

Museums and Online Archives Collaboration Digital Asset Management Database

Transcription:

MultiMimsy database extractions and OAI repositories at the Museum of London Mia Ridge Museum Systems Team Museum of London mridge@museumoflondon.org.uk

Scope Extractions from the MultiMimsy 2000/MultiMimsy XG database The possibilities of an OAI Repository

Before I go on... What is OAI?

OAI is... In this context, 'OAI' is short-hand for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) It supports 'verbs' things the repository can do, like identify itself, return a list of records, tell you what metadata format its using, get individual records It's a big box of metadata and files that you can browse or search

If you're small or poor... You can use a Static OAI-PMH repository instead of a full or dynamic repository It's basically an XML file in a special format You can create this XML via a Word or Excel export with a bit of scripting (or macros)

About the Museum of London's environment We migrated from MultiMimsy 2000 to MultiMimsy XG last year We have another big database of archaeological data from MoLAS, databases for Archive Management Systems and the LAARC We have a Content Management System for the websites with thematic or interpretive content

Museum Systems Team Small permanent in-house IT team I design and develop bespoke database applications for recording, analysing and publishing archaeological or museum information We design and develop database-driven websites (with Web Developer and Content Manager) with content from various sources

Technical Infrastructure at MoL Desktop MultiMimsy client forms MultiMimsy server (database) day-to-day Collections Management System, holds data in own format Staging server (database/web server) runs extraction scripts as queries against MM database, holds development version of data structures and scripts and static pages for testing Live server (database and web servers) Host the sites you see on the internet

Thinking about extractions? Work out what you need for the final product Reports can help you test your data is fit for purpose

Typical Site Goals Enable access to the collection Provide for specialist and general audiences Increase knowledge and understanding of the collection But how do we do that with a system designed for collections management?

Typical design challenges Different audiences, different goals General public Researchers and specialists Dynamic content Complex relationships between: Categories Object records Other records

Website design and development Consultation with target audiences Consultation with cataloguing team and curators Design site architecture and navigation Iterative process, taking into account audience needs, structure of data Design presentation of objects, categories, publications, images, and their relationships Test and re-test with your users

Typical Site Infrastructure Static content: extended texts (stored externally) Regional study Site report, 'about' section Dynamic content from MultiMimsy: object records subject authority or group records images publications

Extraction proceses the geeky bit MultiMimsy Server (Oracle) Extraction scripts (SQL queries) Staging Server (SQL Server/web server) Staging server runs extraction scripts against MultiMimsy (Oracle) database server: Scripts are a set of SQL queries on Microsoft SQL Server, stored as Data Transformation Services (DTS) so they can be scheduled to automatically re-run and update the web database Results stored in database tables on staging server

Extraction processes the tricky bit Work out what schemas or data structures are needed in the final site create wireframes with every possible item that might be displayed the page don't forget the 'invisible' fields needed on the backend to present that data appropriately e.g. what determines which items are displayed on each page. These might include back-end fields for navigation or search.

Extraction processes the tricky bit Map from publication schema to MultiMimsy fields Figure out how content from other sources will be linked in, e.g.: string matching on authority names unique IDs or accession numbers

Extraction processes Consistency helps if you have used different fields in each project, you need to re-map web schemas to MultiMimsy each time

Catalogue records The interface you see doesn't match the backend so mapping can be tricky Allow time for finding these fields then working out how they're related to other tables Use reports and exports to generate SQL and test relationships Depending on your version of MM you can try and get the field name from the form.

Images In our implementation: Image metadata is stored in the media table Image files are stored on our filesystem I run a SQL query to generate a list of files and their locations on the filesystem (path, image name) I then run an ASP script to generate a DOS batch file which then creates the necessary directories and copies the images into them, retaining the path structure

Information records [authorities] More scripts to pull related authorities using links between object and information records: Publications People/Organisations Places Subjects Groups

The final result A website! Hopefully you will have seen Exploring 20th Century London or another Museum of London microsite Here's how Exploring worked

Where does OAI come in? It was the model we used for the Hub partnership project, Exploring 20 th Century London

Extraction models considered for X20CL Single static database, data loaded manually Automatic harvesting to central database Distributed System, data stored locally and queried live Result: the harvesting model was chosen, with OAI-PMH implementation

Advantages of OAI More reliable and faster than querying distributed partners; saves bandwidth and processing time, as only new, updated or deleted data is moved Customisable project schema and data repository allow the production of re-usable and interoperable content Can support metadata standards such as Dublin Core and Spectrum XML Open standard: reduces risk of 'lock-in' Variety of open-source tools are available on all platforms Established museums, libraries, archives user base

MoL OAI repository The use of OAI was inherited from Exploring 20 th Century London Since we have it, we hope that our repository will have a use beyond providing an OAI-PMH-compliant data source for partnership projects and our own internal requirements

MoL repository We're going to use a DSpace repository for collections data - object metadata, media files and metadata and information record (people, places, events, publications) metadata - for selected records from our Mimsy XG collections management system

Possibilities for MoL OAI repository Permanent, stable URI (unique address) for each object an 'object home' Other people may query the repository offers a fully-featured collections search while providing greater visibility for data Semantic web and other cool stuff?

Possibilities for repository Authoritative index into our collections database, with links to every online instance of an object, regardless of project, showing different thematic or interpretive uses of the object in other websites Link from object to all related information or authority records and media such as images, audio files, transcripts, object captions and descriptions; related objects

Semantic web and the repository? The 'object home' means our data is ready for the semantic web Lightweight 'semantic web' technologies can be already used with that data The search interface of the repository could act as an API a 'box of tricks'

OpenSearch, RDF, feeds We can be fully buzzwordcompliant Queries can be converted to RSS streams (OpenSearch, GeoRSS) or RDF These streams can be used by others in 'mash ups'

New uses of our data We aren't resourced to provide interfaces to meet every requirement We can provide data for others to create interfaces to browse or search data in new ways Mashups allow people to merge our content with other sources, e.g. online maps, other collections User-centric, not museum-centric

It's a bit experimental Working with supportive suppliers (BioMed Central) Currently resolving issues of how records relate to each other, as Dublin Core doesn't handle it well - possibly ORE to create 'bundles' of records Could we incorporate user-generated content such as links to the object from user sites, comments, tags?

Questions?