Describing Humanities Objects by Ontologies What have the DH to offer, based on the CIDOC-CRM and TEI?




Øyvind Eide, University of Passau, Germany
Christian-Emil Ore, University of Oslo, Norway
12 June 2015

Text and ontology
TEI XML
  Physical and logical structure
  Semantic content
RDF/OWL ontology
  Network of associations
  Additional statements and interpretative layers
Henry III Fine Rolls Project (Ciula, Vieira: Complementing and extending TEI documents with an ontology. TEI Members Meeting 2008)

<persName key="ashford_de_william">William de <placeName key="ashford1">Ashford</placeName></persName>
<rs key="abjuration" type="subject">on the day he abjured the kingdom<persName key="rumberue_de_thomas">Thomas de <placeName key="rumberue">Rumberue</placeName></persName></rs>

Encoding for extraction
A fragment of an imaginary archaeological excavation report:
The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.

Information extraction
Actor: Dr. Diggey
Relation: performed
Event: E1
  Type: excavation
  Place: Wasteland
  Time-span: 2005
Actor: Dr. Diggey
Relation: performed
Event: E2
  Type: Modification
  Descr: Breaking the sword into 30 pieces
  Relation: part of E1
  Relation: in presence of
  Object: Sword
    Relation: identified by
    Identifier: C50435

<TEI>
  <teiHeader> </teiHeader>
  <text>
    <p xml:id="p1"><rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs> was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>
  </text>
</TEI>
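The markup on this slide is what makes the extraction mechanical: every span that carries an xml:id can be read back as a candidate entity. The following is a minimal sketch in Python, assuming the fragment is well-formed and has no TEI namespace declaration (as on the slide); the helper names `text_of` and `extract_spans` are illustrative and not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The excavation fragment from the slide, with tag case restored.
TEI_FRAGMENT = """<TEI>
  <teiHeader/>
  <text>
    <p xml:id="p1"><rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs> was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>
  </text>
</TEI>"""


def text_of(elem):
    """Return all text inside an element, with whitespace normalised."""
    return " ".join("".join(elem.itertext()).split())


def extract_spans(tei_xml):
    """Map each xml:id to (tag, type attribute, text content)."""
    root = ET.fromstring(tei_xml)
    spans = {}
    for elem in root.iter():
        ident = elem.get(XML_ID)
        if ident is not None:
            spans[ident] = (elem.tag, elem.get("type"), text_of(elem))
    return spans


spans = extract_spans(TEI_FRAGMENT)
for ident, (tag, kind, text) in spans.items():
    print(ident, tag, kind, text)
```

The sketch only recovers the marked spans. Deciding that e1 is an event performed by n2, or that o_id1 identifies o1, is still an interpretative step that the markup alone does not state.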

CIDOC-CRM
E55 Types: refer to / refine the other entities
E39 Actors (persons, inst.): participate in E2 Temporal Entities (Events)
E2 Temporal Entities (Events): affect or refer to E28 Conceptual Objects and E18 Physical Things
E2 Temporal Entities (Events): at E52 Time-Spans and E53 Places
E18 Physical Things: have location E53 Places
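As an illustration of how the excavation example could be stated against these classes, here is a minimal sketch that writes the extracted information as subject-predicate-object triples and prints them as N-Triples. The choice of CRM classes and properties is an assumption made for this sketch, not a mapping given in the slides. The identifiers follow the naming convention of the CIDOC-CRM RDFS files and should be checked against the CRM version in use. The `http://example.org/wasteland/` base URI and all local names are hypothetical.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/wasteland/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def crm(name):
    """Full URI of a CIDOC-CRM class or property."""
    return CRM + name


def local(name):
    """Full URI of a resource in the (hypothetical) local dataset."""
    return BASE + name


# Assumed mapping of the two events from the excavation report.
TRIPLES = [
    (local("e1"), RDF_TYPE, crm("E7_Activity")),
    (local("e1"), crm("P2_has_type"), local("type_excavation")),
    (local("e1"), crm("P14_carried_out_by"), local("dr_diggey")),
    (local("e1"), crm("P7_took_place_at"), local("wasteland")),
    (local("e1"), crm("P4_has_time-span"), local("year_2005")),
    (local("e1"), crm("P9_consists_of"), local("e2")),
    (local("e2"), RDF_TYPE, crm("E11_Modification")),
    (local("e2"), crm("P14_carried_out_by"), local("dr_diggey")),
    (local("e2"), crm("P12_occurred_in_the_presence_of"), local("sword")),
    (local("sword"), RDF_TYPE, crm("E18_Physical_Thing")),
    (local("sword"), crm("P1_is_identified_by"), local("C50435")),
    (local("C50435"), RDF_TYPE, crm("E42_Identifier")),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def subjects_with(triples, predicate, obj):
    """Return all subjects linked to obj through predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(to_ntriples(TRIPLES))
# Which events did Dr. Diggey carry out?
print(subjects_with(TRIPLES, crm("P14_carried_out_by"), local("dr_diggey")))
```

Note that "part of E1" is expressed here from the other direction, as E1 consists of E2.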

TEI integration routes
1 TEI document
  Header: <place>...</place> <event>...</event>
  Body: <name>...</name> <rs>...</rs> <name>...</name>
2 TEI document
  Header: <rdf:>...</rdf:> <rdf:>...</rdf:>
  Body: <name>...</name> <rs>...</rs> <name>...</name>
3 TEI document
  Header: <...> </...>
  Body: <name>...</name> <rs>...</rs> <name>...</name>

Charter by King Hákon Hákonsson, 1225

Collection 1: Diplomatarium Norvegicum
  Summary
  Source info
  Text number
  Date
  Place
  Edited text

Collection 2: more recent transcripts

Collection 3: Regesta Norvegica
  Persons, places, subjects, etc. are in the registries
  Text witnesses
  Where the charter text is published, e.g. in Diplomatarium Norvegicum

Norwegian Charters
Diplomatarium Norvegicum
  22 volumes, covering 1100 to 1582
  Published 1846-1995
  Retro-digitized, TEI P5 encoding
Newer transcripts
  Old Norwegian, 1170-1405, 4000 transcripts
  TEI P5, no metadata, only identifier
Regesta Norvegica
  9 volumes, covering 1100 to 1408
  Very rich metadata
  TEI P5 encoded

The principle of Entropy Fallacy
Massive data aggregation:
  Increased amount of data = increase in amount of information
  Increased interlinking = increase in information
Popular view: everything is connected to everything

Ontology
An ontology is a conceptual model, that is, a formally defined model resulting from an analysis of a specific domain; it is not necessarily a data model in the computer science sense.
  Core ontologies with universals
  General ontologies with particulars (thesauri/authority systems)
a formal ontology can be expressed

Tools & Methods
Encoding the original texts as XML documents
  Text Encoding Initiative, tei-c.org
  Medieval Nordic Text Archive, menota.org
Metadata expressed in compliance with ontologies
  Cultural heritage view: CIDOC-CRM (ISO 21127)
  Library/bibliographic view: FRBR (FRBRoo)
Encoding of metadata
  TEI-XML for archival purposes
  RDF for linked data

Linked data
TEI-XML documents
Part 1, the proper text
<TEI...>
  <teiHeader>
    <fileDesc>
      <!-- All kinds of metadata -->
      <!-- Persons, places, bibl. ref, text witnesses etc. -->
    </fileDesc>
  </teiHeader>
  <text>
    <!-- XML-encoded proper text goes here -->
    ...
  </text>
</TEI>
Part 2, data for Linked Data (semantic web)
Additional structure with extracted assertions/metadata from the document, expressed in RDF/XML
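Part 2 can be sketched as a small serialisation step: assertions extracted from the TEI document are written out as RDF/XML. The following minimal Python sketch uses only the standard library. The assertions about the 1225 charter, the `http://example.org/charters/` base URI and the function name `build_rdf_xml` are hypothetical and chosen for illustration; the property names follow the CIDOC-CRM RDFS naming convention and should be verified against the CRM version in use.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/charters/"  # hypothetical base URI

# Use readable prefixes in the serialised output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("crm", CRM_NS)

# Hypothetical assertions: (subject, CRM property, object).
ASSERTIONS = [
    ("creation_1", "P14_carried_out_by", "hakon_hakonsson"),
    ("creation_1", "P4_has_time-span", "year_1225"),
    ("creation_1", "P94_has_created", "charter_text_1"),
]


def build_rdf_xml(assertions):
    """Serialise (subject, property, object) assertions as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, prop, obj in assertions:
        desc = descriptions.get(subject)
        if desc is None:
            # One rdf:Description element per subject resource.
            desc = ET.SubElement(
                root,
                f"{{{RDF_NS}}}Description",
                {f"{{{RDF_NS}}}about": BASE + subject},
            )
            descriptions[subject] = desc
        # Each assertion becomes a property element pointing at a resource.
        ET.SubElement(
            desc,
            f"{{{CRM_NS}}}{prop}",
            {f"{{{RDF_NS}}}resource": BASE + obj},
        )
    return ET.tostring(root, encoding="unicode")


rdf_xml = build_rdf_xml(ASSERTIONS)
print(rdf_xml)
```

Keeping the RDF in a separate structure, rather than inside the text body, leaves the TEI document unchanged for archival purposes while the assertions can be published as linked data.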

Possible points for external links
Regesta Norvegica/Diplomatarium Norvegicum
  Persons, places, subject, onomastic information
  Creation date, place
  Text witnesses, archival signature
  Cross references for copies (vidimus) etc.
  Published, mentioned, bibliographic references
Transcripts
  Text witnesses, archival signature
  Linguistic information

For consideration
Well-defined ontologies may assist in clarifying a scholarly analysis
Without the use of common standard models like the CIDOC-CRM, data integration can only be done on a trivial level
Bottom-up methods for data integration may be useful but must be complemented by top-down methods