Automation of metadata processing




CLARIN Conference in Wroclaw, Poland, 15-17 October. Except where otherwise noted, content on this poster is licensed under a Creative Commons Attribution 4.0 International license.

Introduction
- Repositories: introduction to the HZSK-Repository (Daniel Jettka) and the LAUDATIO-Repository (Dennis Zielke)
- Open-source technologies
- Generalized model of the data ingest process
- Role of standardized metadata in the import process
- Validation of data
- Modelling import formats and data structures
- Indexing of metadata

Introduction: HZSK-Repository
- is based on the software triad Fedora, Islandora, and Drupal
- currently contains 19 corpora of transcribed spoken language
- stored research data includes texts, transcripts, audio and video data, images, metadata, and other data types
- is connected to the CLARIN-D infrastructure on several levels, e.g. the central services Virtual Language Observatory (for metadata search) and CLARIN Federated Content Search (for searching directly in the content)

Introduction: LAUDATIO-Repository
- is an open access environment for persistent storage of historical texts and their annotations
- currently contains historical corpora from various disciplines, with a total of 2000 texts comprising about two million word forms
- the main focus lies on German historical texts and linguistic annotations, including all dialects, from time periods ranging from the 9th to the 19th century

Introduction: LAUDATIO-Repository (technical)
- the technical repository infrastructure is based on generalizable software modules, such as the graphical user interface and the data exchange module between research data and the Fedora REST API
- the metadata search, including indexing and faceting, is based on the Lucene-based technology ElasticSearch
- the imported corpora are stored in their original structure in a permanent and unchangeable version
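As a rough illustration of what faceting on top of ElasticSearch can look like, the sketch below builds a search request body that combines a full-text query with one terms aggregation per facet. The field names (`language`, `century`) and the helper `build_faceted_query` are hypothetical and not taken from the LAUDATIO code base; only the general shape of the Elasticsearch query DSL is assumed.

```python
import json


def build_faceted_query(text, facet_fields, size=10):
    """Return an Elasticsearch request body: one full-text query plus
    one terms aggregation (facet) per metadata field."""
    return {
        "query": {"query_string": {"query": text}},
        "aggs": {
            # Hypothetical naming scheme: each facet is called "by_<field>".
            f"by_{field}": {"terms": {"field": field, "size": size}}
            for field in facet_fields
        },
    }


# Field names below are invented for illustration.
body = build_faceted_query("Kräuterbuch", ["language", "century"])
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The resulting dictionary would be sent as the JSON body of a search request; the buckets returned for each aggregation then supply the facet values and their counts.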

LAUDATIO: Open-source technologies used (1)
- CakePHP 2.4 as the MVC web framework (PHP5)
  - authorization and authentication in the user management via Access Control List
- Fedora 3.6 for data storage
  - REST API for data exchange
- ElasticSearch as search engine
  - REST API for data exchange
  - customized and versioned IndexMapping implemented
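To make the "REST API for data exchange" point more concrete, here is a minimal sketch of how URLs for the Fedora 3.x REST API can be assembled. The base URL, the PID `laudatio:example`, and the datastream ID `TEI-header` are assumptions made up for this example; the path layout (`/objects/{pid}` and `/objects/{pid}/datastreams/{dsID}/content`) follows the Fedora 3 REST API.

```python
from urllib.parse import quote

# Assumption: a local Fedora 3.x instance; adjust to the real deployment.
FEDORA_BASE = "http://localhost:8080/fedora"


def object_profile_url(pid, base=FEDORA_BASE):
    """URL of the object profile (XML) for a Fedora object."""
    return f"{base}/objects/{quote(pid, safe=':')}?format=xml"


def datastream_content_url(pid, dsid, base=FEDORA_BASE):
    """URL that returns the raw content of one datastream."""
    return f"{base}/objects/{quote(pid, safe=':')}/datastreams/{quote(dsid)}/content"


# Hypothetical identifiers, for illustration only.
print(object_profile_url("laudatio:example"))
print(datastream_content_url("laudatio:example", "TEI-header"))
```

Keeping URL construction in small helpers like these makes it easier to swap the base URL between test and production instances.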

LAUDATIO: Open-source technologies used (2)
- external PID web service (EPIC API Version 2) to assign persistent identifiers
- third-party open-source libraries on GitHub: http://tinyurl.com/lf26u97
- flat design (HTML5, CSS3) (coming soon)
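The slides do not show how the PID web service is called. The sketch below only prepares a request for a handle-based PID service; the host `epic.example.org`, the prefix `12345`, the endpoint layout, and the payload shape (a list of `type`/`parsed_data` records) are all assumptions for illustration and should be checked against the EPIC API documentation before use.

```python
import json


def build_pid_request(prefix, target_url, service="https://epic.example.org/v2"):
    """Prepare (endpoint, JSON body) for registering a new persistent identifier.

    Assumption: the service accepts a POST to /handles/<prefix>/ with a list
    of typed records, where the "URL" record holds the landing page.
    """
    endpoint = f"{service}/handles/{prefix}/"
    payload = [{"type": "URL", "parsed_data": target_url}]
    return endpoint, json.dumps(payload)


# Hypothetical prefix and landing page.
endpoint, body = build_pid_request("12345", "https://repository.example.org/corpus/1")
print(endpoint)
print(body)
```

Nothing is sent over the network here; an HTTP client with the appropriate credentials would submit the prepared body to the endpoint.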

LAUDATIO: Adopted data structure
- TEI XML P5
- description of the corpus data structure using the TEI metadata standards
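As a small example of working with TEI P5 metadata, the following sketch reads the title and authors from a `teiHeader` using only the Python standard library. The sample document is invented for illustration and is far smaller than a real corpus header; the function name `read_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample header, not taken from any LAUDATIO corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example corpus</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p/></body></text>
</TEI>"""


def read_header(xml_text):
    """Extract title and authors from the titleStmt of a TEI header."""
    root = ET.fromstring(xml_text)
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    if stmt is None:
        raise ValueError("no titleStmt found in teiHeader")
    return {
        "title": stmt.findtext("tei:title", default="", namespaces=TEI_NS),
        "authors": [a.text or "" for a in stmt.findall("tei:author", TEI_NS)],
    }


print(read_header(SAMPLE))
```

Raising an error when `titleStmt` is missing is one simple way to catch incomplete headers before import.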

LAUDATIO: View/Index mapping in ElasticSearch

LAUDATIO: Examples of ElasticSearch for indexing
- IndexMapping
- ViewMapping
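The transcript names an IndexMapping and a ViewMapping but does not reproduce them. One possible reading, sketched here under that assumption, is that the index mapping declares how metadata fields are typed for search, while the view mapping decides which fields are displayed and under which label. All field names, labels, and the version counter are hypothetical.

```python
# Hypothetical index mapping: field types as Elasticsearch would index them.
INDEX_MAPPING = {
    "version": 1,  # assumed version counter, used to name the index
    "properties": {
        "title": {"type": "text"},
        "language": {"type": "keyword"},
        "year": {"type": "integer"},
    },
}

# Hypothetical view mapping: (index field, display label), in display order.
VIEW_MAPPING = [("title", "Title"), ("language", "Language"), ("year", "Year")]


def index_name(base, mapping):
    """Derive a versioned index name, e.g. 'corpora_v1'."""
    return f"{base}_v{mapping['version']}"


def render(hit, view_mapping):
    """Turn one search hit into labelled display lines, skipping absent fields."""
    return [f"{label}: {hit[field]}" for field, label in view_mapping if field in hit]


print(index_name("corpora", INDEX_MAPPING))
print(render({"title": "Example corpus", "year": 1543}, VIEW_MAPPING))
```

Separating the two mappings in this way means the display can change without reindexing, and a mapping change can be rolled out under a new index name.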

LAUDATIO: Fedora object model, illustrated with the RIDGES corpus

LAUDATIO: Schema configuration stored in Fedora

If you have questions, please contact us:
- Dennis Zielke, Humboldt-Universität zu Berlin, e-mail: dennis.zielke@hu-berlin.de
- Daniel Jettka, Hamburg Centre of spoken language corpora, e-mail: daniel.jettka@uni-hamburg.de