STAR Semantic Technologies for Archaeological Resources. http://hypermedia.research.glam.ac.uk/kos/star/



Similar documents
STAR Semantic Technologies for Archaeological Resources.

Semantic Technologies for Archaeology Resources: Results from the STAR Project

Following a guiding STAR? Latest EH work with, and plans for, Semantic Technologies

Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction via the CIDOC CRM

Pattern based mapping and extraction via CIDOC CRM

Introduction. Aim of this document. STELLAR mapping and extraction guidelines

Achille Felicetti" VAST-LAB, PIN S.c.R.L., Università degli Studi di Firenze!

Semantic Interoperability

Information for the Semantic Web. Procedures for Data Integration through h CIDOC CRM Mapping

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature

A collaborative platform for knowledge management

Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web

dati.culturaitalia.it a Pilot Project of CulturaItalia dedicated to Linked Open Data

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

12 The Semantic Web and RDF

Semantic Technologies and Linked Data

How to port a collection to the Semantic Web. presenter: Borys Omelayenko contributors: A. Tordai, G. Schreiber

How semantic technology can help you do more with production data. Doing more with production data

Introduction to SKOS. Bob DuCharme October 6, 2011

Integration of domain and social ontologies in a CMS based collaborative platform

Why was it built? AGROVOC (big agriculture vocabulary developed by FAO) In 2004: > concepts in up to 22 languages

ThManager: An Open Source Tool for creating and visualizing SKOS

A HUMAN RESOURCE ONTOLOGY FOR RECRUITMENT PROCESS

Design and Implementation of an Automatic Semantic Annotation Service

D3.1: SYSTEM TEST SUITE

COMBINING AND EASING THE ACCESS OF THE ESWC SEMANTIC WEB DATA

We have big data, but we need big knowledge

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

Linked Data Management and INSPIRE Compliant Data Export

Linked Data Publishing with Drupal

Addressing Self-Management in Cloud Platforms: a Semantic Sensor Web Approach

TECHNICAL Reports. Discovering Links for Metadata Enrichment on Computer Science Papers. Johann Schaible, Philipp Mayr

Standards, Tools and Web 2.0

The Ontological Approach for SIEM Data Repository

Lift your data hands on session

Concept for an Ontology Based Web GIS Information System for HiMAT

Guideline for Implementing the Universal Data Element Framework (UDEF)

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

An Ontology Model for Organizing Information Resources Sharing on Personal Web

Introduction to Ontologies

LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together

nmqwertyuiopasdfghjklzxcvbnmrtyuio sdfghjklzxcvbnmqwertyuiopasdfghjklz vbnmqwertyuiopasdfghjklzxcvbnmqw

Model Driven Interoperability through Semantic Annotations using SoaML and ODM

Linked Open Data A Way to Extract Knowledge from Global Datastores

DISCOVERING RESUME INFORMATION USING LINKED DATA

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology

Gerald Hiebel 1, Øyvind Eide 2, Mark Fichtner 3, Klaus Hanke 1, Georg Hohmann 4, Dominik Lukas 5, Siegfried Krause 4

AGRIS: an RDF-aware system in the agricultural domain

ISTEC.MIP Measurement Data Integration Platform

GetLOD - Linked Open Data and Spatial Data Infrastructures

Oracle Application Express MS Access on Steroids

Acronym: Data without Boundaries. Deliverable D12.1 (Database supporting the full metadata model)

Applying Semantic Web Technologies in Service-Oriented Architectures

Semantic Web Applications

Publishing Linked Data Requires More than Just Using a Tool

Leveraging existing Web frameworks for a SIOC explorer to browse online social communities

Towards the Russian Linked Culture Cloud: Data Enrichment and Publishing

Visualization of Semantic Windows with SciDB Integration

Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM)

How To Write A Drupal Rdf Plugin For A Site Administrator To Write An Html Oracle Website In A Blog Post In A Flashdrupal.Org Blog Post

ANDS Prototype Controlled Vocabulary Service

EUR-Lex 2012 Data Extraction using Web Services

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

MULTILINGUAL ACCESS TO CONTENT THROUGH CIDOC CRM ONTOLOGY

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

ONKI-SKOS Publishing and Utilizing Thesauri in the Semantic Web

Improving EHR Semantic Interoperability Future Vision and Challenges

Benjamin Heitmann Digital Enterprise Research Institute, National University of Ireland, Galway

Enabling End User Access to Big Data in the O&G Industry

Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint

Rich Metadata and Context Capturing through CIDOC/CRM and MPEG-7 Interoperability

Reason-able View of Linked Data for Cultural Heritage

Information Standards on the Net

BauDenkMalNetz Creating a Semantically Annotated Web Resource of Historical Buildings

Databases in Organizations

Semantic Advertising for Web 3.0

Theodoros. N. Arvanitis, RT, DPhil, CEng, MIET, MIEEE, AMIA, FRSM

Data-Gov Wiki: Towards Linked Government Data

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

Web Services - Consultant s View. From IT Stategy to IT Architecture. Agenda. Introduction

Seman&c Web: Benefits For Clinical Decision Support At The Bedside. Emory Fry, MD SemTechBiz 2013

A generic approach for data integration using RDF, OWL and XML

Towards the Integration of a Research Group Website into the Web of Data

Department of Defense. Enterprise Information Warehouse/Web (EIW) Using standards to Federate and Integrate Domains at DOD

Semantic Method of Conflation. International Semantic Web Conference Terra Cognita Workshop Oct 26, Jim Ressler, Veleria Boaten, Eric Freese

Semantic SharePoint. Technical Briefing. Helmut Nagy, Semantic Web Company Andreas Blumauer, Semantic Web Company

Integrating data from The Perseus Project and Arachne using the CIDOC CRM An Examination from a Software Developer s Perspective

D4.1 - Functional Specifications & Portal Architecture

Implementing an RDF/OWL Ontology on Henry the III Fine Rolls

data.bris: collecting and organising repository metadata, an institutional case study

HOW TO DO A SMART DATA PROJECT

A Semantic web approach for e-learning platforms

Information Services for Smart Grids

Long Term Knowledge Retention and Preservation

Ontology based Recruitment Process

TRANSFoRm: Vision of a learning healthcare system

From MARC21 and Dublin Core, through CIDOC CRM: First Tenuous Steps towards Representing Library Data in FRBRoo

Annotation: An Approach for Building Semantic Web Library

Transcription:

STAR Semantic Technologies for Archaeological Resources http://hypermedia.research.glam.ac.uk/kos/star/

Project Outline 3 year AHRC funded project Started January 2007, finish December 2009 Collaborators English Heritage RSLIS, Denmark Aims To investigate the potential of semantic terminology tools for widening access to digital archaeology resources, including disparate datasets and associated grey literature To demonstrate cross search and browsing at detailed, meaningful level Introduction Modelling Mapping Extraction Applications Indexing

Introduction Data Modelling (RDF) CIDOC Conceptual Reference Model (CRM) English Heritage Ontological Model (CRMEH) English Heritage thesauri & glossaries (SKOS) Data Mapping Mapping issues domain specific vs. general model Granularity, coverage Data normalisation issues Data Extraction Custom extraction utility Consolidation to RDF triple store database Pilot Applications SKOS Web Service / Client applications CRM Web Service / Client applications Introduction Modelling Mapping Extraction Applications Indexing

General Architecture Applications Server Side, Rich Rich Client, Browser Web Web Services, SQL, SPARQL RDF RDF Based Common Ontology Data Layer (CRM // CRMEH // SKOS) Indexing Conversion Data Mapping // Normalisation Grey Grey literature EH EH thesauri, glossaries STAN STAN RRAD RRAD IADB IADB LEAP LEAP RPRE RPRE Introduction Modelling Mapping Extraction Applications Indexing

Data Modelling - RDF CRM [ http://cidoc.ics.forth.gr/ ] CIDOC Conceptual Reference Model International standard ISO 21127:2006 CRMEH [ http://hypermedia.research.glam.ac.uk/kos/crm/ ] English Heritage Ontological Model Extends CIDOC CRM for archaeological domain SKOS [ http://www.w3.org/2004/02/skos/ ] Simple Knowledge Organisation System RDF representation of thesauri, glossaries, taxonomies, classification schemes etc. Introduction Modelling Mapping Extraction Applications Indexing

Potential Data Mapping Problems General schema (CRM) General schema (CRM)? Domain specific datasets the abstractness of the [CRM] concepts makes them ambiguous to any human user. If several experts specify mappings independently they will produce incompatible mappings and fail the goal of enabling interoperability. [from BRICKS FP6 IP Poster, ECDL2007] Domain specific datasets STAR approach - domain specific schema (CRMEH) Introduction Modelling Mapping Extraction Applications Indexing

Data Mapping Exercise Mapped DB Field CRMEH Entity manual process, requires expert domain knowledge Needed to Consider Granularity & completeness of mapping Model coverage vs. dataset coverage Event based vs. relational model Events only implicit in mappings and datasets Introduction Modelling Mapping Extraction Applications Indexing

Data Extraction - Scope Extraction of data to RDF triples 5 archaeological datasets Custom data extraction application Conversion of controlled terminology 6 thesauri converted to SKOS 27 glossaries created in SKOS Created based on recording manuals MultiTes XSL transformation to SKOS Introduction Modelling Mapping Extraction Applications Indexing

Custom Data Extraction Application <?xml version="1.0"?> <rdf:rdf xml:base= http://tempuri/star/base# xmlns:crm= http://cidoc.ics.forth.gr/rdfs/cidoc_v4.2.rdfs# xmlns:crmeh= http://tempuri/star/crmeh# xmlns:rdf= http://www.w3.org/1999/02/22-rdf-syntax-ns# xmlns:rdfs= http://www.w3.org/2000/01/rdf-schema# > <crmeh:ehe0007.context rdf:about="http://tempuri/star/base#ehe0007.rrad.context.contextno.1"> <crm:p3f.has_note> <crmeh:ehe0046.contextnote rdf:about="http://tempuri/star/base#ehe0046.rrad.context.description.1"> <rdf:value>upper ploughsoil over whole site no Sub-division for the convenience of finds processing '1' contains finds contexts '3759', '3760' and '3763'.</rdf:value> </crmeh:ehe0046.contextnote> </crm:p3f.has_note> </crmeh:ehe0007.context> <crmeh:ehe0007.context rdf:about="http://tempuri/star/base#ehe0007.rrad.context.contextno.10"> <crm:p3f.has_note> <crmeh:ehe0046.contextnote rdf:about="http://tempuri/star/base#ehe0046.rrad.context.description.10"> <rdf:value>original recorded coordinates: 0980/0960</rdf:value> </crmeh:ehe0046.contextnote> </crm:p3f.has_note> </crmeh:ehe0007.context> Etc. Generated RDF Data

Data Extraction and Consolidation Data Extraction resultant files 6 EH thesauri 6 RDF files 27 EH glossaries 27 RDF files 5 archaeological datasets 305 RDF files Data Consolidation - RDF triple store database 1,148,882 entities, 2,998,005 triples Diverse data held within single DB schema CRM, CRMEH, SKOS, OWL, DC etc. Platform independent import/export RDF-XML, NTriples, Turtle Inbuilt support for SPARQL queries Custom extension for full text search

Linking CRM and SKOS CRM data instance SKOS thesaurus concept EHE0009.ContextFind [http://...#..12345] skos:concept [http://...#97992] crm:p105f.consists_of skos:broader EHE0030.ContextFindMaterial [http://...] EHP10F.is_represented_by skos:concept [http://...#97805] rdf:value skos:preflabel skos:scopenote cast iron cast iron Dating from the 15th century, it it is is a hard alloy of of iron and carbon, melted and shaped into various moulded forms Property: EHP10F.is_represented_by (represents) Domain: crm:e55.type Range: skos:concept

RCHME Archaeological Periods in CRM NE rdf:value E42.Identifier [http://...] P1F.is_identified_by E62.String [http://...] E52.Time-Span [http://..#..7] P79F.beginning_is_qualified_by rdf:value NEOLITHIC rdf:value E49.Time_Appellation [http://...] P78F.is_identified_by P80F.end_is_qualified_by E62.String [http://...] rdf:value -4000-2200 <crm:e52.time-span rdf:about="&base;e52.rchme.periods.termuid.7"> <crm:p1f.is_identified_by> <crm:e42.identifier rdf:about="&base;e49.rchme.periods.code.7"> <rdf:value>ne</rdf:value> </crm:e42.identifier> </crm:p1f.is_identified_by> <crm:p78f.is_identified_by> <crm:e49.time_appellation rdf:about="&base;e49.rchme.periods.term.7"> <rdf:value>neolithic</rdf:value> </crm:e49.time_appellation> </crm:p78f.is_identified_by> <crm:p79f.beginning_is_qualified_by> <crm:e62.string rdf:about="&base;e62.rchme.periods.mindate.7"> <rdf:value>-4000</rdf:value> </crm:e62.string> </crm:p79f.beginning_is_qualified_by> <crm:p80f.end_is_qualified_by> <crm:e62.string rdf:about="&base;e62.rchme.periods.maxdate.7"> <rdf:value>-2200</rdf:value> </crm:e62.string> </crm:p80f.end_is_qualified_by> </crm:e52.time-span>

Relationships Between EH Concepts Concept [http://...#69659] skos:inscheme skos:preflabel ConceptScheme [http://..#eh1] dc:title MONUMENT TYPE SUNDIAL skos:scopenote A structure used to show the time of day by means of the sun shining on a 'gnomon', the shadow of which falls on the surface of the dial which is marked with a diagram showing the hours. Can be freestanding, usually on a pillar, or fixed to a building. Concept [http://...#95316] skos:inscheme skos:preflabel ConceptScheme [http://..#eh128] dc:title MDA OBJECT TYPE SUNDIAL skos:scopenote A device for telling the time by the sun, using the shadow cast by a gnomon, or sometimes a spot of light through a small hole. They were often used for checking the accuracy of clocks. They range from panels on the sides of churches to pocket watch size.

SKOS Web Service and Client Applications Windows based client application SKOS Web Service Web browser based components ( widgets ) SKOS Client Applications

SKOS Client Windows Application

SKOS Client - Widgets Concept Search Concept Schemes Concept Details Concept Expansion

CRMEH Web Service and Client Application CRMEH Web Service Search and browse seamlessly across multiple archaeological datasets CRMEH Client Application

CRMEH Client Application

Next Steps for Database Cross Search Query builder expose support for complex query patterns (SPARQL) Closer integration of thesauri into overall search process Browser-based application components (AJAX) Include further archaeological datasets

Indexing Grey Literature Documents Information Extraction for enabling rich, semantic aware indexing of Archaeology fieldwork reports. Rule Based method for NER name entity recognition and semantic annotation of terms with respect to the ontological model CRMEH Thesaurus and Flat Gazetteer lists to support the task of NER; connecting annotated terms to thesaurus and model classes. XML to represent simple and combined annotation structures.

Information Extraction Framework EH Thesaurus - Object Types -Archaeological Periods Ontology -CIDOC CRM-EH Java Pattern Engine General Architecture for Text Engineering Gazetteer Lists Corpus Collection XML structures to represent simple and combined annotation forms

The Process of Information Extraction 3000 terms from EH Thesauri (Periods Object Types) supported the NER task. Flat gazetteer entries to expand on single term annotation Linguistic pattern rules for revealing connections between annotations Prototype development Initially targeted to E49.Time Appellation entities, expanded to E19.Physcical Object entities. 9 Annotation Types Produced in a cascading process using Jape transducers. Attribute assignment to semantic descriptions of annotated terms Nested annotation structures to express connections between terms.

Linking Text to Semantic Descriptions eleven of these seventeen pits contained Bronze Age pottery. ({Context} {Context_Peri od}) ({Token.category == VB} {Token.category == VBZ} {Token.category == VBP} {Token.category == VBN} {Token.category == VBG} {Token.category == VBD}) ({Archeo_Object} {Ob ject_type}) EHE0007.Context EH Thesaurus Periods EHE0039.TimespanAppellation JAPE TRANSDUCER Linguistic Pattern Rules EH Thesaurus Object Types EHE0009.ContextFind Gazetteer List

XML Exports <Context_Find gate:gateid="222760" rule="context Find"> <Context gate:gateid="221765" rule="context" >pits</context> contained <Archeo_Object gate:gateid="221710" rule="archeoobject_simple" > <E49_Time_Appellation gate:gateid="220303" SKOS- EH="134732 rule="thesaurus term">early Bronze Age</E49_Time_Appellation> <Object_Type gate:gateid="220577" SKOS- EH= 051544" rule="thesaurus term">pottery</object_type> </Archeo_Object> </Context_Find> PHP DOM_XML employed over XML structures to render the annotations and to display thesaurus descriptions

Model Adaptation Issues eleven of these seventeen pits contained Bronze Age pottery. <Context_Find> <Context>pits</Context> contained <Archeo_Object> <E49_Time_Appellation>Bronze Age </E49_Time_Appellation> <Object_Type>pottery </Object_Type> </Archeo_Object> </Context_Find> Context E53.Place EH_E0007 P108. Produced by ContextFind ProductionEvent E12.Production Event EH_E1002 P26.Moved By ContextFind E19.Physical Object EH_E0009 P108. has timespan ContextFind DepositionEvent E9.Move EH_E1004 P25.Moved ContextFind ProductionEvent Timespan E52.Timespan EH_E0038 RDF ContextFind ProductionEvent TimespanAppellation E49.TimeAppelation EH_E0039 P1.Indentified By

Work in progress Closer integration between ontological model and Jape rules Mapping of annotation structures to RDF triples Enriching the process with additional Knowledge resources ie glossaries Expand the method to larger corpus collection andronikos.kyklos.co.uk Negation detection and document pre-processing

STAR Semantic Technologies for Archaeological Resources http://hypermedia.research.glam.ac.uk/kos/star/ http://andronikos.kyklos.co.uk avlachid@glam.ac.uk dstudhope@glam.ac.uk cbinding@glam.ac.uk