Describing Humanities Objects by Ontologies
What have the DH to offer, based on the CIDOC-CRM and TEI?
Øyvind Eide, University of Passau, Germany
Christian-Emil Ore, University of Oslo, Norway
12 June 2015
Text and ontology
TEI XML: physical and logical structure; semantic content
RDF/OWL ontology: network of associations; additional statements and interpretative layers
Example: Henry III Fine Rolls Project (Ciula, Vieira: Complementing and extending TEI documents with an ontology. TEI Members Meeting 2008)

  <persName key="ashford_de_william">william de <placeName key="ashford1">ashford</placeName></persName>
  <rs key="abjuration" type="subject">on the day he abjured the kingdom
    <persName key="rumberue_de_thomas">thomas de <placeName key="rumberue">rumberue</placeName></persName>
  </rs>
Encoding for extraction
A fragment of an imaginary archaeological excavation report:
The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.
Information extraction
Actor: Dr. Diggey; Relation: performed; Event: E1
  Type: excavation
  Place: Wasteland
  Time-span: 2005
Actor: Dr. Diggey; Relation: performed; Event: E2
  Type: Modification
  Descr: Breaking the sword into 30 pieces
  Relation: part of E1
  Relation: in presence of Object: Sword
    Relation: identified by Identifier: C50435

  <TEI>
    <teiHeader> </teiHeader>
    <text>
      <p xml:id="p1">
        <rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs>
        was performed by <name type="person" xml:id="n2">Dr. Diggey</name>.
        He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.
      </p>
    </text>
  </TEI>
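The step from the encoded paragraph to the list of extracted items can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: it assumes the fragment carries no TEI namespace (as on the slide; real TEI P5 files declare http://www.tei-c.org/ns/1.0), and the function name `annotated_spans` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined in XML; ElementTree exposes xml:id under this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's fragment, without a TEI namespace (an assumption of this sketch).
TEI_FRAGMENT = """<TEI>
  <teiHeader></teiHeader>
  <text>
    <p xml:id="p1"><rs xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs> was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of <rs xml:id="e2">breaking <rs xml:id="o1">the beautiful sword <rs xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>
  </text>
</TEI>"""


def annotated_spans(xml_text):
    """Map each xml:id to (element name, type attribute or None, text content)."""
    root = ET.fromstring(xml_text)
    spans = {}
    for elem in root.iter():
        ident = elem.get(XML_ID)
        if ident is None:
            continue
        # Join all descendant text and collapse runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        spans[ident] = (elem.tag, elem.get("type"), text)
    return spans


spans = annotated_spans(TEI_FRAGMENT)
for ident, (tag, kind, text) in spans.items():
    print(ident, tag, kind, text)
```

The identifiers (e1, e2, n1, n2, o1, o_id1) then serve as anchors to which statements about events, actors, places and objects can be attached.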
CIDOC-CRM
(Diagram: top-level classes and the relations between them)
  E55 Types: refer to / refine
  E39 Actors (persons, inst.): participate in E2 Temporal Entities (Events)
  E2 Temporal Entities (Events): affect or refer to E28 Conceptual Objects and E18 Physical Things
  E2 Temporal Entities (Events): at E52 Time-Spans
  have location: E53 Places
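As an illustration of how the statements extracted from the excavation report might be expressed with these classes, the sketch below writes them as plain subject, property, object tuples. The choice of CRM classes and properties (E7 Activity, E11 Modification, P14 carried out by, P7 took place at, P4 has time-span, P9i forms part of, P12 occurred in the presence of, P1 is identified by, P2 has type, P3 has note) is an assumed mapping made for this example, not one given on the slides; the identifiers E1, E2 and Sword are local labels.

```python
# Assumed mapping of the slide's statements to CIDOC-CRM labels (illustrative only).
TRIPLES = [
    ("E1", "rdf:type", "crm:E7_Activity"),
    ("E1", "crm:P2_has_type", "excavation"),
    ("E1", "crm:P14_carried_out_by", "Dr. Diggey"),
    ("E1", "crm:P7_took_place_at", "Wasteland"),
    ("E1", "crm:P4_has_time-span", "2005"),
    ("E2", "rdf:type", "crm:E11_Modification"),
    ("E2", "crm:P3_has_note", "Breaking the sword into 30 pieces"),
    ("E2", "crm:P14_carried_out_by", "Dr. Diggey"),
    ("E2", "crm:P9i_forms_part_of", "E1"),
    ("E2", "crm:P12_occurred_in_the_presence_of", "Sword"),
    ("Sword", "crm:P1_is_identified_by", "C50435"),
]


def statements_about(subject, triples=TRIPLES):
    """Return all (property, object) pairs recorded for one subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def events_performed_by(actor, triples=TRIPLES):
    """Return the events an actor carried out, in the order they were recorded."""
    return [s for s, p, o in triples if p == "crm:P14_carried_out_by" and o == actor]


print(events_performed_by("Dr. Diggey"))
print(statements_about("Sword"))
```

Because both events share the same actor and the second is recorded as part of the first, a query by actor or by event reaches the sword and its identifier without re-reading the text.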
TEI integration routes
1  TEI document
     Header: <place>...</place> <event>...</event>
     Body:   <name>...</name> <rs>...</rs> <name>...</name>
2  TEI document
     Header: <rdf:>...</rdf:> <rdf:>...</rdf:>
     Body:   <name>...</name> <rs>...</rs> <name>...</name>
3  TEI document
     Header: <...> </...>
     Body:   <name>...</name> <rs>...</rs> <name>...</name>
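Route 1 can be sketched as follows, under several assumptions: body elements point to header entries through a `ref` attribute of the form `#id`, the header entries sit directly in the header (a real TEI P5 file would wrap them in list elements), and no TEI namespace is declared. The document content and the function name `resolve_refs` are invented for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical route-1 document: entities described in the header, referenced from the body.
DOC = """<TEI>
  <teiHeader>
    <place xml:id="pl1"><placeName>Wasteland</placeName></place>
    <event xml:id="ev1" when="2005"><label>Excavation</label></event>
  </teiHeader>
  <text>
    <p><rs ref="#ev1">The excavation in <name type="place" ref="#pl1">Wasteland</name></rs> was performed in 2005.</p>
  </text>
</TEI>"""


def resolve_refs(xml_text):
    """Return (body text, header element name, header id) for every resolvable ref."""
    root = ET.fromstring(xml_text)
    header = root.find("teiHeader")
    body = root.find("text")
    # Index header entries by their xml:id.
    index = {e.get(XML_ID): e for e in header.iter() if e.get(XML_ID)}
    links = []
    for elem in body.iter():
        ref = elem.get("ref", "")
        if ref.startswith("#") and ref[1:] in index:
            text = " ".join("".join(elem.itertext()).split())
            links.append((text, index[ref[1:]].tag, ref[1:]))
    return links


links = resolve_refs(DOC)
for link in links:
    print(link)
```

Routes 2 and 3 differ in where the structured description is stored, not in the way the body is marked up.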
Charter by king Hákon Hákonsson 1225
Collection 1: Diplomatarium Norvegicum
  Summary, source info, text number, date, place, edited text
Collection 2: more recent transcripts
Collection 3: Regesta Norvegica
  Persons, places, subject, etc. are in the registries
  Text witnesses
  Where the charter text is published, e.g. in Diplomatarium Norvegicum
Norwegian Charters
Diplomatarium Norvegicum
  22 volumes, cover 1100 to 1582
  Published 1846 to 1995
  Retro-digitized, TEI P5 encoding
Newer transcripts
  Old Norwegian, 1170 to 1405, 4000 transcripts
  TEI P5, no metadata, only identifier
Regesta Norvegica
  9 volumes, cover 1100 to 1408
  Very rich metadata
  TEI P5 encoded
The principle of Entropy Fallacy
Massive data aggregation:
  Increased amount of data = Increase of amount of information
  Increased interlinking = Increase in information
Popular view: Everything is connected to everything
Ontology
An ontology is a conceptual model, that is, a formally defined model resulting from an analysis of a specific domain. It is not necessarily a data model in the computer science sense.
  Core ontologies with universals
  General ontologies with particulars (thesauri/authority systems)
  a formal ontology can be expressed
Tools & Methods
Encoding the original texts as XML documents
  Text Encoding Initiative, tei-c.org
  Medieval Nordic Text Archive, menota.org
Metadata expressed compliant with ontologies
  Cultural heritage view: CIDOC-CRM (ISO 21127)
  Library/bibliographic view: FRBR (FRBRoo)
Encoding of metadata
  TEI-XML for archival purposes
  RDF for linked data
Linked data
TEI-XML documents
Part 1, the proper text

  <TEI...>
    <teiHeader>
      <fileDesc>
        <!-- All kinds of metadata -->
        <!-- Persons, places, bibl. ref, text witnesses etc. -->
      </fileDesc>
    </teiHeader>
    <text>
      <!-- XML-encoded proper text goes here -->
      ...
    </text>
  </TEI>

Part 2, data for Linked Data (semantic web)
Additional structure with extracted assertions/metadata from the document, expressed in RDF/XML
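What Part 2 could look like is sketched below: a handful of assertions serialized as RDF/XML using only the Python standard library. The base URI, the assertion list and the function name `to_rdf_xml` are hypothetical, and the CRM property names are an assumed mapping; a production workflow would more likely rely on a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "http://example.org/report.xml#"  # hypothetical base URI for the TEI document

ET.register_namespace("rdf", RDF)
ET.register_namespace("crm", CRM)

# Assertions as (subject id, CRM property, object id); the ids stand for xml:id values.
ASSERTIONS = [
    ("e1", "P14_carried_out_by", "n2"),
    ("e1", "P7_took_place_at", "n1"),
    ("e2", "P9i_forms_part_of", "e1"),
]


def to_rdf_xml(assertions):
    """Serialize the assertions as RDF/XML, one rdf:Description per subject."""
    root = ET.Element(f"{{{RDF}}}RDF")
    descriptions = {}
    for subj, prop, obj in assertions:
        node = descriptions.get(subj)
        if node is None:
            node = ET.SubElement(
                root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": BASE + subj}
            )
            descriptions[subj] = node
        # Each statement becomes a property element pointing at the object resource.
        ET.SubElement(node, f"{{{CRM}}}{prop}", {f"{{{RDF}}}resource": BASE + obj})
    return ET.tostring(root, encoding="unicode")


rdf_xml = to_rdf_xml(ASSERTIONS)
print(rdf_xml)
```

Keeping the fragment identifiers identical to the xml:id values in Part 1 lets each RDF statement point back to the exact place in the text it was extracted from.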
Possible points for external links
Regesta Norvegica/Diplomatarium Norvegicum
  Persons, places, subject, onomastic information
  Creation date, place
  Text witnesses, archival signature
  Cross references for copies (vidimus) etc.
  Published, mentioned, bibliographic references
Transcripts
  Text witnesses, archival signature
  Linguistic information
For consideration
  Well-defined ontologies may assist in clarifying a scholarly analysis
  Without the use of common standard models like the CIDOC-CRM, data integration can only be done on a trivial level
  Bottom-up methods for data integration may be useful but must be complemented by top-down methods