How to port a collection to the Semantic Web
Presenter: Borys Omelayenko
Contributors: A. Tordai, G. Schreiber




Datasets
  Content
    Collection: meta-data describing works
    Local thesauri
      terms, such as materials, styles, techniques
      geographical places, such as villages
      directory of people involved
  Representation-independent solutions
    variety of databases (via SQL), XML dumps, text files
    different schemas and values

Process model

                                  Thesaurus   Meta Data
  Convert schema to a standard        1           2
  Align data values to terms          4           3

1. Convert thesaurus schema to SKOS
  SKOS is a W3C model for expressing the structure of thesauri and vocabularies
  Preferred labels, alternative labels, broader and narrower relations

    <skos:Concept rdf:about="Nederland">
      <skos:prefLabel xml:lang="nl">Nederland</skos:prefLabel>
      <skos:prefLabel xml:lang="en">Netherlands</skos:prefLabel>
      <skos:broader rdf:resource="Europe"/>
    </skos:Concept>

  Pretty straightforward: just a few concepts involved
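The conversion in step 1 can be sketched in a few lines. This is only an illustration in Python, not AnnoCultor code: the input record layout (`id`, `labels`, `broader`) and the function names are assumptions made for the example, while the SKOS and RDF namespace URIs are the standard W3C ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML = "http://www.w3.org/XML/1998/namespace"

# Ask ElementTree to serialize with the usual rdf: and skos: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def term_to_skos(term):
    """Build one skos:Concept element from a local thesaurus record.

    The record layout (id, labels, broader) is hypothetical.
    """
    concept = ET.Element(f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": term["id"]})
    for lang, label in term["labels"].items():
        pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {f"{{{XML}}}lang": lang})
        pref.text = label
    for parent in term.get("broader", []):
        ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": parent})
    return concept


def thesaurus_to_rdf(terms):
    """Wrap all concepts in an rdf:RDF root and return the XML as a string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for term in terms:
        root.append(term_to_skos(term))
    return ET.tostring(root, encoding="unicode")


local_terms = [
    {"id": "Nederland",
     "labels": {"nl": "Nederland", "en": "Netherlands"},
     "broader": ["Europe"]},
]
rdf_xml = thesaurus_to_rdf(local_terms)
print(rdf_xml)
```

A real thesaurus would also need alternative labels and proper URIs instead of bare identifiers; both are left out to keep the sketch short.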

2. Convert metadata schema to VRA
  VRA is a de facto standard for describing visual resources
  It is a specialization of Dublin Core for visual resources

    <vra:Work rdf:about="SK-C-5">
      <vra:creator rdf:resource="Rembrandt_van_Rijn"/>
      <vra:title xml:lang="en">The company of Captain ...</vra:title>
      <vra:title xml:lang="nl">Het korporaalschap ...</vra:title>
      <vra:date>1642</vra:date>
    </vra:Work>

  Conversion may be complicated, as museums tend to use their own data models (database schemas) to describe works

Technology: AnnoCultor
  Java-based platform
    infrastructure
    basic conversion rules
    open to custom rules
    open to other systems, such as GATE
    GPL
  A converter is a Java program built according to a template that invokes rules
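To make the idea of a converter that invokes rules more concrete, here is a minimal sketch. It is written in Python for brevity and does not reproduce AnnoCultor's actual Java API: the `Rule` class, the `convert` function and the source field names (`object_number`, `maker`, `dating`) are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """One conversion rule: source field -> target property (hypothetical design)."""
    source_field: str
    target_property: str
    # Default transform: trim whitespace, drop empty values.
    transform: Callable[[str], Optional[str]] = lambda value: value.strip() or None


def extract_year(value):
    """Pull the first four-digit year out of a free-text dating field."""
    match = re.search(r"\b(\d{4})\b", value)
    return match.group(1) if match else None


def convert(record, rules, subject_field="object_number"):
    """Apply every rule to one source record and collect (s, p, o) triples."""
    subject = record[subject_field]
    triples = []
    for rule in rules:
        raw = record.get(rule.source_field)
        if raw is None:
            continue  # field absent in this record
        value = rule.transform(raw)
        if value is not None:
            triples.append((subject, rule.target_property, value))
    return triples


RULES = [
    Rule("maker", "vra:creator", lambda v: v.strip().replace(" ", "_")),
    Rule("title_en", "vra:title"),
    Rule("dating", "vra:date", extract_year),
]

record = {"object_number": "SK-C-5", "maker": "Rembrandt van Rijn", "dating": "ca. 1642"}
triples = convert(record, RULES)
for triple in triples:
    print(triple)
```

Keeping each rule as a small, separate object is what makes the rule counts per collection a usable measure of conversion effort.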

Complexity of schema conversion
(counts given as collection + thesaurus + directory)

  Museum         Rules          Types of rules
  Rijksmuseum    56 + 13 + 17   11 + 6 + 7
  Tropenmuseum   19 + 6 + 4     8 + 6 + 4
  Volkenkunde    22 + 19 + NA   9 + 8 + NA
  Bibliopolis    34 + 15 + NA   11 + 5 + NA

3. Metadata: value alignment
  Look at each meta-data value (word) and find a corresponding vocabulary term
  1. Concept from local thesaurus
  2. Concept from other known thesaurus
  3. Concept from implicit thesaurus
  4. Value that should become part of some thesaurus in the future
  5. Typed literals
  6. Just literals
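The list above reads naturally as a cascade: try the most specific interpretation first and fall back to a plain literal. A minimal Python sketch follows, under stated assumptions: thesauri are plain dictionaries keyed by normalized label, the only typed literal recognized is a four-digit year, and cases 3 and 4 (implicit thesaurus, future terms) are left out because they need human judgement. All names and identifiers are made up for the example.

```python
import re


def normalize(value):
    """Lowercase and collapse whitespace so that lookups are less brittle."""
    return " ".join(value.lower().split())


def align_value(value, local_thesaurus, other_thesauri):
    """Return (kind, result) for one meta-data value.

    local_thesaurus: dict of normalized label -> concept id
    other_thesauri:  dict of thesaurus name -> (normalized label -> concept id)
    """
    key = normalize(value)
    # 1. Concept from the local thesaurus
    if key in local_thesaurus:
        return ("local_concept", local_thesaurus[key])
    # 2. Concept from another known thesaurus
    for thesaurus in other_thesauri.values():
        if key in thesaurus:
            return ("external_concept", thesaurus[key])
    # 5. Typed literal (here: only a bare four-digit year)
    if re.fullmatch(r"\d{4}", key):
        return ("typed_literal", int(key))
    # 6. Just a literal
    return ("literal", value)


local = {"olieverf": "local:olieverf"}
other = {"AAT": {"oil paint": "aat:oil_paint"}}

for raw in ["Olieverf ", "oil  paint", "1642", "onbekend"]:
    print(raw, "->", align_value(raw, local, other))
```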

3. Value alignment: Rijksmuseum
  29,000 (online) records
  43,000 terms in local thesaurus
  1. Concept from local thesaurus: 150,000
  2. Concept from other thesaurus:
     1. concepts: 2,000
     2. places in descriptions: 7,800
  3. Concept from implicit thesaurus: NA
  4. New terms: 8,600 2,000
  5. Interpret typed literals: NA
  6. Leave as literal: NA

Usefulness: Etretat
  Title: Zeegezicht aan de kust (Seascape on the coast)
  Description: Gezicht op het strand en de krijtrotsen bij Etretat in Normandië (View of the beach and the chalk cliffs near Etretat in Normandy)
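The Etretat record shows why looking for places inside descriptions pays off: the title never mentions the place, but the description does. A minimal Python sketch of such term spotting is given below. The place dictionary and its identifiers are invented for the example, and only single-word place names are handled.

```python
import re


def find_places(text, places):
    """Return the concept ids of known places mentioned in a free-text field.

    places: dict of case-folded place name -> concept id (hypothetical ids).
    Only single-word names are matched in this sketch.
    """
    found = []
    for token in re.findall(r"\w+", text):
        concept = places.get(token.casefold())
        if concept is not None and concept not in found:
            found.append(concept)
    return found


places = {"etretat": "place:Etretat", "normandië": "place:Normandie"}

title = "Zeegezicht aan de kust"
description = "Gezicht op het strand en de krijtrotsen bij Etretat in Normandië"

print(find_places(title, places))
print(find_places(description, places))
```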

4. Thesaurus alignment
  Local thesauri overlap with standard ones
  Sometimes thesauri explicitly borrow parts from others
  Focus on correct matches
    each mapping mistake is propagated with each use of the term
    manual verification is an option

Alignment technology
  Shared information extraction technique to find terms in thesauri and collection entries
    based on advanced, tailored string matching
    plus term context, e.g. Rembrandt in 1920
    open to other methods, e.g. NLP
  Efficient human interaction
    show the top unmapped terms, e.g. foto
  Successful partly due to structured text and context
  Fails when different words are used
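Two of the points above lend themselves to a short sketch: using context to reject a string match, and ranking unmapped terms for human review. The Python below is an illustration only. It reads the "Rembrandt in 1920" example as a name match that the date context should rule out, which is an interpretation rather than something the slides spell out; the directory layout and function names are hypothetical.

```python
from collections import Counter


def match_person(name, year, directory):
    """Return ids of directory entries whose name matches AND whose
    life span covers the given year (the context check)."""
    key = name.casefold().strip()
    return [
        person["id"]
        for person in directory
        if key in (n.casefold() for n in person["names"])
        and person["born"] <= year <= person["died"]
    ]


def top_unmapped(values, mapped, n=10):
    """Rank values that have no mapping yet by frequency, most frequent first."""
    counts = Counter(v.casefold().strip() for v in values)
    ranked = [(term, count) for term, count in counts.most_common() if term not in mapped]
    return ranked[:n]


directory = [
    {"id": "Rembrandt_van_Rijn",
     "names": ["Rembrandt", "Rembrandt van Rijn"],
     "born": 1606, "died": 1669},
]

print(match_person("Rembrandt", 1642, directory))  # name and context agree
print(match_person("Rembrandt", 1920, directory))  # name matches, context does not

values = ["foto", "Foto", "ets", "foto", "olieverf"]
print(top_unmapped(values, mapped={"ets"}))
```

Showing the most frequent unmapped terms first means that each manual decision resolves as many records as possible.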

Success of thesauri alignment
(mapped / total, in thousands; values in source order)

  Target thesauri: AAT, TGN, ULAN
  Rijksmuseum: 6 / 43, 13 / 56
  Ethnographic: 4 / 5, 3 / 5.5
  RKD, Bibliopolis: 0.4 / 1.1, 41 / 300, 0.25 / 1

Conclusions
  Methodology for porting collections
    convert thesauri to SKOS
    convert collections to DC/VRA
    align thesauri
    search for terms in collections
  Open and flexible technology
    Java-based and open to extensions
  Cost/success stories
    a collection requires 50-70 rules, or 1-2 weeks
    alignment can be costly
    can align up to 80% of records automatically
  http://annocultor.sourceforge.net