GAMS - a Humanities Asset Management System. Georg Vogeler & Martina Semlak


Infrastructure to store and publish digital data from the humanities (e.g. digital scholarly editions)
Technically:
- FEDORA repository
- Apache Cocoon
- Administration client in Java
- Extended OpenRDF Sesame
- IIPImage
In action: http://gams.uni-graz.at
In preparation: package solution http://gams.uni-graz.at/download/cirilo-installer-2.4.tar.gz

GAMS architecture (diagram): ingest and dissemination
- Fedora Commons Repository, integrating a Lucene full-text index, a Mulgara triple store for object handling, and content models
- Cirilo Admin-Client
- Cocoon: XSLT processing
- Extended: OpenRDF Sesame triple store, further content models, IIPImage
Source code: http://github.com/acdh/cirilo.git

What is FEDORA (commons)?
Flexible Extensible Digital Object Repository Architecture
http://www.fedora-commons.org
"Repository": scalable, persistent, reusable storage and retrieval infrastructure for content and metadata

Flexible Extensible Digital Object Repository Architecture
- Extensible via web services (SOAP)
- Operating system independent
- Scalable via a distributed architecture in a JSP container environment

FEDORA functionalities
- Includes semantic technologies and full-text search engines (e.g. Lucene)
- Supports standardized protocols for data exchange, e.g. OAI-PMH
- Definition of access rights with the eXtensible Access Control Markup Language (XACML)
- LDAP and Shibboleth based authentication and authorization
- Includes version management strategies for datastreams
- XML based standardized object formats, e.g. METS
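As a rough illustration of what data exchange via OAI-PMH looks like from the harvesting side, the following sketch builds a request URL. The verb `ListRecords` and the argument `metadataPrefix=oai_dc` are part of the OAI-PMH protocol; the endpoint URL is hypothetical, and the actual OAI endpoint of a given repository has to be looked up.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    # OAI-PMH passes the verb and all arguments as ordinary query parameters.
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
BASE_URL = "http://repository.example.org/oai"

url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A harvester would fetch this URL and parse the returned XML; that step is left out here.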

Apache Cocoon: handling XML
- Framework integrating data management with XSLT processing workflows and task-focused coding for (multilingual) web applications
- Separation of content, logic, presentation and management layers in website design (MVC pattern)
- Multiple delivery channels in multilingual usage scenarios

Fedora Content Models
- A structural definition for a type of object (e.g. scholarly article, digital edition, learning object, podcast, ontology etc.)
- A pattern of datastreams (number and type)
- A pattern of datastreams and their disseminators
- A set of rules for creating a digital object
- A set of constraints on a digital object

Content Model
- Dublin Core metadata: the object's "press release"
- XACML metadata: define access rules
- RELS-EXT metadata: describe object-to-object relationships
- Datastream (e.g. TEI file)
- Datastream (e.g. image file)
- Datastream (e.g. RDF/XML file)
- Pointers to service definitions that provide service-mediated views, e.g. gethtml, getpdf, ImageViewer, gettei
Example "Digital Edition":
- Content: XML file with parallel segmentation apparatus, images
- Disseminators: Versioning Machine, DFG-Viewer, TEI Critical Edition Toolbox

Content Model

Fedora Content Models HTML

Fedora Content Models PDF

GAMS default content models: what they are
- cirilo:context: Aggregates objects
- cirilo:tei: Yeah, it's a TEI file
- cirilo:dfgmets: Aggregates files/datastreams and metadata of a single object
- cirilo:ontology: It's a formal ontology in RDF
- cirilo:skos: It's a Simple Knowledge Organisation System (SKOS) ontology
- cirilo:query: It's a search environment
- cirilo:html: It's an HTML file
- cirilo:pdf: It's a PDF file
- cirilo:bibtex: It's a bibliography in BibTeX
- cirilo:resource: It's, well, a resource

GAMS default content models: what they do
- cirilo:context: Displays ordered lists of objects
- cirilo:tei: Displays the TEI as a readable text
- cirilo:dfgmets: Displays the data in the DFG-Viewer
- cirilo:ontology: Lets you navigate through hierarchies of concepts
- cirilo:skos: Extracts names in different languages, for example
- cirilo:query: Does a multi-category search
- cirilo:html: Displays the HTML
- cirilo:pdf: Yes, displays the PDF
- cirilo:bibtex: Creates a bibliography in a specific style
- cirilo:resource

http://www.fedora-commons.org http://cocoon.apache.org http://gams.uni-graz.at http://github.com/acdh/cirilo.git

Workflow

First steps
- Assessment of material
- Explanation of research interests and desired outcome
- Possibilities and benefits of a digital edition
- Developing a data model
- Formalization of the data model in TEI
- Data acquisition

Data acquisition & TEI data model
Prerequisite: a valid TEI document
- Write your own XML and import the result
- Ingest from Excel
- Ingest from a text processing program
- Use OxGarage and import the result
- Ingest from eXist

Data acquisition: Excel
- Data acquisition in Excel
- Excel template to TEI

Data acquisition: Excel
The resulting TEI document
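The idea of turning tabular data into TEI can be sketched as follows. This is not the Cirilo Excel template: it is a minimal stand-in that reads CSV text (the Python standard library cannot read Excel files directly) and emits a TEI `table` fragment with `row` and `cell` elements. The column names and the sample row are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of a generated prefix.
ET.register_namespace("", TEI_NS)


def rows_to_tei_table(csv_text):
    """Convert CSV text into a TEI <table>; the first record becomes a label row."""
    table = ET.Element(f"{{{TEI_NS}}}table")
    for i, record in enumerate(csv.reader(io.StringIO(csv_text))):
        row = ET.SubElement(table, f"{{{TEI_NS}}}row")
        if i == 0:
            # TEI marks header rows with role="label".
            row.set("role", "label")
        for value in record:
            cell = ET.SubElement(row, f"{{{TEI_NS}}}cell")
            cell.text = value
    return table


# Made-up sample data, standing in for an exported spreadsheet.
SAMPLE = "name,place\nCaspar Coiner Henkel,Shenandoah Valley\n"

table = rows_to_tei_table(SAMPLE)
print(ET.tostring(table, encoding="unicode"))
```

The fragment would still have to be embedded in a complete TEI document (with `teiHeader` and `text`) before it validates.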

Data acquisition: text processing

Client: Set up the environment
- Creating a project-specific environment: Extras > Create environment
- Define stylesheets for web and print versions
- Customization of mappings: TEI > Dublin Core; TEI > RDF

Client: Ingest and edit objects
Mass ingest:
- File > Ingest objects
- Select a content model > cirilo:tei.dixit
- Select the user
- Ingest from "filesystem", "eXist" or "Excel spreadsheet"

View objects in a browser
- Open an object in a browser: http://glossa.uni-graz.at/[pid]
- Open individual datastreams, e.g. the TEI_SOURCE: http://glossa.uni-graz.at/[pid]/TEI_SOURCE
- Every single datastream is citable
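The URL pattern above can be written down as a small helper. This is only a sketch of the pattern shown on the slide; the function name is hypothetical, and the PID `o:dixit.01` is the example identifier used elsewhere in this talk.

```python
from urllib.parse import quote

BASE_URL = "http://glossa.uni-graz.at"


def object_url(pid, datastream=None):
    """Return the URL of an object, or of one of its datastreams."""
    # Keep ":" readable, since PIDs such as "o:dixit.01" contain one.
    url = f"{BASE_URL}/{quote(pid, safe=':')}"
    if datastream is not None:
        url += f"/{quote(datastream)}"
    return url


print(object_url("o:dixit.01"))
print(object_url("o:dixit.01", "TEI_SOURCE"))
```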

Dissemination XSL processor webservices

Visualization of contexts

Visualization of contexts: one object in different project contexts

Disseminator: TEI to HTML

Disseminator: TEI to PDF

Presentation: index of persons

Semantic enrichment
- Enrich markup and links with machine-processable meaning
- Explicit, public and reusable data models
- Use of existing resources on the web (in the sense of LOD)
- Use of controlled and standardized vocabularies: GND, VIAF

Semantic enrichment: Linked (Open) Data
Data that is available on the web, addressable through a URI, linked with other data, (ideally) described in RDF and queryable with SPARQL.

Semantic enrichment: Dublin Core (DC)
- DC_MAPPING: system metadata is extracted from the content data following project-specific rules
- TEI content > DC_MAPPING > Dublin Core
- The result is stored in the Dublin Core datastream
- Preferences > Extract Dublin Core metadata

Semantic enrichment: DC
DC_MAPPING:
<mm:metadata-mapping xmlns:mm="http://mml.uni-graz.at/v1.0">
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/oai/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:tei="http://www.tei-c.org/ns/1.0">
    <dc:title>
      <mm:map select="//tei:titleStmt/tei:title" />
    </dc:title>
    <dc:publisher>
      <mm:map select="//tei:publicationStmt/tei:publisher" />
    </dc:publisher>
    <dc:identifier>this:pid</dc:identifier>
  </oai_dc:dc>
</mm:metadata-mapping>

Semantic enrichment: DC
TEI source (excerpt):
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Physicians in the Shenandoah Valley: Letters, 1850-1854</title>
        <author>Caspar Coiner Henkel</author>
      </titleStmt>
      <publicationStmt>
        <idno type="pid">o:dixit.01</idno>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  ...
</TEI>
Resulting Dublin Core datastream:
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/oai/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Physicians in the Shenandoah Valley: Letters, 1850-1854</dc:title>
  <dc:creator>Caspar Coiner Henkel</dc:creator>
  <dc:identifier>o:dixit.01</dc:identifier>
</oai_dc:dc>
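The principle behind such a mapping (Dublin Core field on one side, XPath into the TEI header on the other) can be sketched in a few lines of Python. This is not Cirilo's implementation: the dictionary-based mapping and the function name are assumptions for illustration, the XPaths are limited to the subset that ElementTree supports, and the identifier is read from the `idno` element here rather than from the object's PID.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Dublin Core field -> XPath into the TEI header (illustrative mapping).
DC_MAPPING = {
    "title": ".//tei:titleStmt/tei:title",
    "creator": ".//tei:titleStmt/tei:author",
    "identifier": ".//tei:publicationStmt/tei:idno[@type='pid']",
}

TEI_SOURCE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Physicians in the Shenandoah Valley: Letters, 1850-1854</title>
        <author>Caspar Coiner Henkel</author>
      </titleStmt>
      <publicationStmt>
        <idno type="pid">o:dixit.01</idno>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_dc(tei_xml, mapping):
    """Apply each XPath to the TEI document and collect the first match per field."""
    root = ET.fromstring(tei_xml)
    record = {}
    for field, xpath in mapping.items():
        node = root.find(xpath, NS)
        if node is not None and node.text:
            record[field] = node.text.strip()
    return record


dc = extract_dc(TEI_SOURCE, DC_MAPPING)
for field, value in dc.items():
    print(f"dc:{field} = {value}")
```

Note that TEI element names are case-sensitive (`titleStmt`, not `titlestmt`), so the XPaths have to match the spelling in the source exactly.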

Semantic enrichment: geonames
- Automated resolution of place names using the web service geonames.org
- Encoding of place names in TEI: <placeName key="geonameid:2778067">Graz</placeName>
- Preferences > Resolvement of place names

Semantic enrichment: geonames
Full data record:
<keywords scheme="cirilo:normalizedplacenames">
  <list>
    <item>
      <placeName xml:id="gn.1">
        <country>Austria</country>
        <settlement>Graz</settlement>
        <name ref="geonameid:2778067" type="fcode:pplc">Graz</name>
        <location>
          <geo>47.066667 15.433333</geo>
        </location>
      </placeName>
    </item>
  </list>
</keywords>
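Once a place name has been resolved, the record can be read back programmatically. The following sketch parses a simplified copy of the record above (namespace declarations omitted for brevity) and pulls out the geonames ID and the coordinates; the function name and the returned dictionary layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the resolved record, without the TEI namespace.
RECORD = """<placeName xml:id="gn.1">
  <country>Austria</country>
  <settlement>Graz</settlement>
  <name ref="geonameid:2778067" type="fcode:pplc">Graz</name>
  <location><geo>47.066667 15.433333</geo></location>
</placeName>"""


def parse_place(xml_text):
    """Extract name, geonames ID and coordinates from a resolved place record."""
    place = ET.fromstring(xml_text)
    name = place.find("name")
    # The reference has the form "geonameid:<number>".
    prefix, _, geoname_id = name.get("ref").partition(":")
    if prefix != "geonameid":
        raise ValueError(f"not a geonames reference: {name.get('ref')!r}")
    # <geo> holds latitude and longitude separated by whitespace.
    lat, lon = (float(v) for v in place.findtext("location/geo").split())
    return {
        "name": name.text,
        "country": place.findtext("country"),
        "geoname_id": int(geoname_id),
        "lat": lat,
        "lon": lon,
    }


place = parse_place(RECORD)
print(place)
```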

Semantic enrichment: SKOS
- The content model cirilo:skos allows storing thesauri in SKOS format
- Storage in a triple store (Sesame)
- Resolution of SKOS concepts during the TEI ingest process
- Preferences > Resolve SKOS concepts
- Encoding of the reference in TEI: ana="ocm:130 ocm:180"

Semantic enrichment: SKOS
Full data record:
<keywords scheme="http://glossa.uni-graz.at/archive/objects/o:ocm">
  <term ref="http://gams.uni-graz.at/skos/scheme/o:ocm/#130" type="skos:Concept">
    <term type="skos:prefLabel" xml:lang="en">Geography</term>
  </term>
  <term ref="http://gams.uni-graz.at/skos/scheme/o:ocm/#180" type="skos:Concept">
    <term type="skos:prefLabel" xml:lang="en">Total Culture</term>
  </term>
</keywords>
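The step from the short references in the `ana` attribute to full concept URIs can be sketched as follows. The URI pattern is taken from the record above; the prefix table (`ocm` pointing to the object `o:ocm`) and the function name are assumptions for illustration, not the way Cirilo configures it.

```python
SCHEME_BASE = "http://gams.uni-graz.at/skos/scheme/"


def expand_ana(ana, scheme_pids):
    """Expand whitespace-separated references like "ocm:130" into concept URIs."""
    uris = []
    for token in ana.split():
        prefix, sep, notation = token.partition(":")
        if not sep or prefix not in scheme_pids:
            raise ValueError(f"unknown concept reference: {token!r}")
        uris.append(f"{SCHEME_BASE}{scheme_pids[prefix]}/#{notation}")
    return uris


# Assumed prefix table: the prefix "ocm" refers to the SKOS object "o:ocm".
uris = expand_ana("ocm:130 ocm:180", {"ocm": "o:ocm"})
for uri in uris:
    print(uri)
```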

Semantic enrichment: index of persons
Controlled vocabularies and data records (GND, VIAF, ...)

Semantic enrichment: index of persons
Generate the index of persons via an ontology.
Reference from the TEI to the ontology:
<persName ref="#p10119">F. Kafka</persName>
Ontology (in RDF format):
<rdf:Description xml:id="p10119" rdf:about="http://d-nb.info/118559230">
  <dc:identifier>p10119</dc:identifier>
  <g2o:type>person</g2o:type>
  <g2o:prefName>Kafka, Franz</g2o:prefName>
  <g2o:dateofbirth>1883-07-03</g2o:dateofbirth>
  <g2o:dateofdeath>1924-06-03</g2o:dateofdeath>
  <g2o:profession>writer</g2o:profession>
</rdf:Description>
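How a reference such as `ref="#p10119"` leads to an index entry can be sketched with the standard library alone. This is a minimal illustration, not the GAMS query mechanism: the namespace URI bound to `g2o` is a placeholder (the real one is not given on the slide), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "g2o": "http://example.org/g2o#",  # placeholder URI, assumed for this sketch
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

ONTOLOGY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:g2o="http://example.org/g2o#">
  <rdf:Description xml:id="p10119" rdf:about="http://d-nb.info/118559230">
    <g2o:prefName>Kafka, Franz</g2o:prefName>
    <g2o:dateofbirth>1883-07-03</g2o:dateofbirth>
  </rdf:Description>
</rdf:RDF>"""


def build_person_index(rdf_xml):
    """Map each xml:id in the ontology to its authority URI and preferred name."""
    root = ET.fromstring(rdf_xml)
    index = {}
    for desc in root.findall("rdf:Description", NS):
        index[desc.get(XML_ID)] = {
            "uri": desc.get(f"{{{NS['rdf']}}}about"),
            "name": desc.findtext("g2o:prefName", namespaces=NS),
        }
    return index


def resolve(ref, index):
    """Resolve a TEI reference such as "#p10119" against the index."""
    return index[ref.lstrip("#")]


index = build_person_index(ONTOLOGY)
person = resolve("#p10119", index)
print(person["name"], person["uri"])
```

Sorting the index values by `name` would give the alphabetical register shown in the presentation layer.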

Semantic enrichment: index of persons
- RDF mapping on the TEI source > object-specific statements will be stored in the FEDORA internal triple store (Mulgara)
- Storage of the ontology in the Sesame triple store
- Common query via SPARQL

Projects in GAMS: a selection
- Visual Archive Southeastern Europe
  Collection of historical and contemporary visual materials on Southeastern Europe (postcards, photographs)
  http://gams.uni-graz.at/vase
- Arms and Portrait Books of Regensburg
  The collection of arms and portrait books from the city archive of Regensburg
  http://gams.uni-graz.at/rpb
- Alexander Rollett: Letters
  Digital edition of the correspondence of Alexander Rollett, the first holder of the chair of physiology and histology in Graz
  http://gams.uni-graz.at/rollett

Cirilo Client

Cirilo Client Tasks
- Java application for data curation in Fedora based repositories
- Front-end for FEDORA object management
- Mass operations
- Ingest processes from directories, databases, spreadsheets
- Predefined Content Models

Fedora Object Model Content Model

Cirilo Client Default content models
- cirilo:context
- cirilo:tei
- cirilo:dfgmets
- cirilo:ontology
- cirilo:skos
- cirilo:query
- cirilo:html
- cirilo:pdf
- cirilo:bibtex
- cirilo:resource

cirilo:tei Content datastreams
- TEI_SOURCE
- BIBTEX
- DC
- IMAGES
- THUMBNAIL

cirilo:tei System data
- STYLESHEET and FO_STYLESHEET
- DC_MAPPING
- RDF_MAPPING
- RELS-INT
- REPLACEMENT_RULESET
- QUERY
- RELS-EXT
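A content model as "a pattern of datastreams" and "a set of constraints on a digital object" can be illustrated with a small data structure. The datastream names are the ones listed for cirilo:tei; the class, its method, and the assumption that every listed datastream is required are illustrative only and do not describe how Fedora or Cirilo enforce content models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentModel:
    """A named pattern of content and system datastreams."""
    name: str
    content_datastreams: frozenset
    system_datastreams: frozenset

    def missing_datastreams(self, present):
        """Return the expected datastreams that an object does not have."""
        required = self.content_datastreams | self.system_datastreams
        return sorted(required - set(present))


CIRILO_TEI = ContentModel(
    name="cirilo:tei",
    content_datastreams=frozenset(
        {"TEI_SOURCE", "BIBTEX", "DC", "IMAGES", "THUMBNAIL"}
    ),
    system_datastreams=frozenset(
        {"STYLESHEET", "FO_STYLESHEET", "DC_MAPPING", "RDF_MAPPING",
         "RELS-INT", "REPLACEMENT_RULESET", "QUERY", "RELS-EXT"}
    ),
)

# Check a hypothetical object that only carries three datastreams so far.
missing = CIRILO_TEI.missing_datastreams(["TEI_SOURCE", "DC", "RELS-EXT"])
print(missing)
```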

cirilo:tei Disseminators: Voyant Tools

cirilo:tei Disseminators: Versioning Machine

cirilo:tei Disseminators: Google Maps / GeoBrowser

cirilo:tei Disseminators: Project specific STYLESHEET

cirilo:tei Semantic enrichment
- DC_MAPPING > DC
- RDF_MAPPING > RELS-INT > Triplestore
- referenced place names > geonames.org > RDF_MAPPING > RELS-INT > Triplestore
- semantic concepts > Sesame repository
- QUERY: object searches in the Mulgara and Sesame triplestores (e.g. dynamic registers)

Cirilo Client Functionalities
- Create objects and datastreams:
  File > Edit objects > New
  File > Ingest objects
  File > Edit objects > Edit > [select your datastream] > New
- Edit objects and datastreams:
  File > Edit objects > Edit > [select your datastream] > Add (upload a file), Edit (cirilo editor) or Delete
- Create and manage metadata:
  File > Edit objects > [select your object] > Edit > Content datastreams > DC > Edit
- Assign disseminators to objects:
  File > Edit objects > System datastreams > STYLESHEET
- Extract semantic information

cirilo:tei Ingest options