TRANSKRIBUS. Research Infrastructure for the Transcription and Recognition of Historical Documents



Similar documents
CCS Content Conversion Specialists. METS / ALTO introduction

GCP - Records Managers Association

Digital libraries of the future and the role of libraries

Crowdsourcing manuscript transcription: the Transcribe Bentham project

Web based document management Simple, Yet Sophisticated Workflow HTML Forms. Business Process Automation A guide to capture tools for VisualVault

CONTINIA DOCUMENT CAPTURE FACTSHEET ENGLISH LANGUAGE

A Digital Library Feasibility Study

TIRED OF TYPING? ACCELERATE YOUR BUSINESS WITH: AUTOMATED DATA ENTRY PURCHASE APPROVAL WORKFLOW DIGITAL DOCUMENT ARCHIVE DOCUMENT CAPTURE

User Guide of edox Archiver, the Electronic Document Handling Gateway of

DATA MANAGEMENT FOR QUALITATIVE DATA USING NVIVO9

PROCESSING & MANAGEMENT OF INBOUND TRANSACTIONAL CONTENT

Switch to Electronic Document Management save time, space, money and frustration.

Appendix A. Functional Requirements: Document Management

PRINCIPLES OF RESEARCH DATA MANAGEMENT AT THE RECTOR S DECISION, 9 SEPTEMBER 2014

Guidelines for the submission of invoices

Key Considerations for Documentation Management Technology. Learning from Local Experience

EndNote Beyond the Basics

DOCUMENT MANAGEMENT. Evo2: YOUR FLEXIBLE FRIEND Evo3: SEEK AND YE SHALL FIND

BUSINESS PROCESS AUTOMATION. Document digitisation and records management solutions. Delivering value. Enabling success. Integrated Services

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

About the Digitization Programme & History of ITU Portal

The challenges of digital preservation to support research in the digital age

Less paper less costly way to manage documents! Document and Process Management System

Access IARD at or through Firm Gateway at

Course 1: Digital Libraries Introduction Unit 4: Digital libraries functional components software

Xerox Mobile Link App Benefits Examples

Technical specifications for Digitization of Consular documents

Introduction to WIPOScan Software

DAISY PRODUCER: AN INTEGRATED PRODUCTION MANAGEMENT SYSTEM FOR ACCESSIBLE MEDIA

Xerox Workflow Automation Services Solutions Brochure. Xerox DocuShare 7.0. Enterprise content management for every organization.

Collaborative Computational Projects: Networking and Core Support

Newspaper Digitization Brief Background

A P P L Y I N G F O R S T U D E N T E X C H A N G E I N H E L S I N K I M E T R O P O L I A :

E-Content Service Group Virtual Meeting. Digital Preservation: How to Get Started

Dematerialisation and document collaboration

Introducing. Product Overview White Paper. Enterprise Content Management Platform. November

Invenio: A Modern Digital Library for Grey Literature

ENHANCED PUBLICATIONS IN THE CZECH REPUBLIC

The Big Data methodology in computer vision systems

Are you ready for more efficient and effective ways to manage discovery?

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis

Scientific Knowledge and Reference Management with Zotero Concentrate on research and not re-searching

Filing Information Rich Digital Asset Management Coca-Cola s Archive Research Assistant: Using DAM for Competitive Advantage IDC Opinion

White Paper. The Five Keys to a Successful Document Management System ABSTRACT. Command Your Content

Kofax Solution Brief. Kofax Enterprise Capture Solutions Enable Document-driven Business Processes in SharePoint

Document Capture and Distribution

Crowdsourcing and digitization

EPSON PERFECTION SCANNING BASICS

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

SURVEY ON THE LONG-TERM PRESERVATION OF DIGITAL DOCUMENTS IN EUROPEAN LIBRARIES Monika Krimbacher Michael Neuhauser Martina Vogl

Quality Assurance Guidance

Crowdsourcing for Big Data Analytics

Document Management Software. Find what you need fast Break through organizational barriers Work from wherever you want, whenever you want

Glossary of terms used in the survey

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy Page 1 of 8

The UM-Google Project (aka MDP)

Personalizing Your Signature Appearances

Maryland Management Company CaseStudy

Robert B. McGeachin. Introduction. Most academic scholars have usually amassed personal collections of works relevant to

Overview of NDNP Technical Specifications

The A-Z of Building a Digital Newspaper Archive: A Case Study of the Upper Hutt City Leader

Visendo Fax Server Integration With SharePoint Server

BUSINESS CORE PRINTING SOLUTIONS

PDF Accessibility Overview

Nuance Power PDF is PDF uncompromised.

Stu Van Dusen Marketing Manager, Lexbe LC. September 18, 2014

Version5 +AutoCapture & Workflow

Declaration of Conformity 21 CFR Part 11 SIMATIC WinCC flexible 2007

Business Software Defined DMS DOCUMENT MANAGEMENT SYSTEM ZETA SOFTWARE

Change Control Process

AccuRead OCR. Administrator's Guide

Connections to External File Sources

Xerox Easy Translator Service User Guide

Veco User Guides. Document Management

Going Paperless The Utah Experience. Mike Pecorelli Project Manager Utah DEQ

FP7 Project reporting. FP7 INCO National Contact Points Meeting 9 June Athens Anne Mandenoff - European Commission - RTD A.

Our mission is to provide the best back offices solutions available in the market today. You will not find another service provider that is more

Transcription:

TRANSKRIBUS. Research Infrastructure for the Transcription and Recognition of Historical Documents Günter Mühlberger, Sebastian Colutto, Philip Kahle Digitisation and Digital Preservation group (University of Innsbruck)

TRANSKRIBUS enables collaboration among humanities scholars, computer scientists, archives and volunteers with the ultimate goal to revolutionise recognition, transcription and access to historical handwritten documents. 16/06/2015 2

ARCHIVES & LIBRARIES HUMANITIES SCHOLARS COMPUTER SCIENTISTS PUBLIC USERS

ARCHIVES & LIBRARIES HUMANITIES SCHOLARS TRANSCRIPITON & RECOGNITON PLATFORM COMPUTER SCIENTISTS PUBLIC USERS

ARCHIVES & LIBRARIES Provide images Receive standard formats (METS/ALTO/TEI/PDF-A) DOCUMENTS HUMANITIES SCHOLARS TRP COMPUTER SCIENTISTS PUBLIC USERS

ARCHIVES & LIBRARIES Provide images Receive standard formats (METS/ALTO/TEI/PDF-A) Transcribe text DOCUMENTS HUMANITIES SCHOLARS Receive TEI, PDF EXPERT & CROWD CLIENTS TRP COMPUTER SCIENTISTS PUBLIC USERS

ARCHIVES & LIBRARIES Provide images Receive standard formats (METS/ALTO/TEI/PDF-A) Transcribe text DOCUMENTS Work with reference data HUMANITIES SCHOLARS Receive TEI, PDF EXPERT & CROWD CLIENTS TRP HTR DIA KWS NLP, HPC Invent algorithms COMPUTER SCIENTISTS PUBLIC USERS

ARCHIVES & LIBRARIES Provide images Receive standard formats (METS/ALTO/TEI/PDF-A) Transcribe text DOCUMENTS Work with reference data HUMANITIES SCHOLARS Receive TEI, PDF EXPERT & CROWD CLIENTS TRP HTR DIA KWS NLP, HPC Improve algorithms COMPUTER SCIENTISTS WEBSITE Annotate, evaluate, contribute Search and retrieve handwritten documents PUBLIC USERS

ARCHIVES & LIBRARIES Provide images Receive standard formats (METS/ALTO/TEI/PDF-A) HUMANITIES SCHOLARS Transcribe text Receive TEI, PDF TRANSKRIBUS TRANSCRIPITON RECOGNITION PLATFORM Work with reference data Improve algorithms COMPUTER SCIENTISTS Annotate, evaluate, contribute Search and retrieve handwritten documents PUBLIC USERS

Cycle of growth HTR TECHNOLOGY ACCELERATED DIGITISATION NEW SERVICES MORE TRANSCRIPTS MORE USERS 16/06/2015 10

Platform Software as a Service Integrated Tools External Services Images Document Management User Management TRP Expert GUI Crowd- Sourcing GUI METS ALTO PDF TEI RTF PAGE Website Search Interface 16/06/2015 11

Website http://transkribus.eu/ 16/06/2015 12

TRANSKRIBUS Expert GUI 16/06/2015 13

TSX Crowd-sourcing platform 16/06/2015 14

Archives/collection holders are enabled to Manage the transcription process of large and heterogeneous document collections in a standardized and effective way Expose their digitised documents to their own employees, humanities scholars, volunteers and the crowd Involve users according to their background Support humanities scholars in their research Outsource transcription (or the training of an HTR model) to service providers (e.g. off-shore) Make large amounts of handwritten documents searchable (if a trained HTR model is available) Import transcribed documents in machine readable formats (METS/ALTO) Provide standardized requirements to researchers and technology providers dealing with issues of interest (e.g. manage tendering) 16/06/2015 15

Humanities scholars are enabled to (1) Use TRANSKRIBUS for free (registration) Upload as many single documents as they want Transcribe their documents semi-automated in a way that image and text are linked to each other (improved transcription quality!) Enrich their documents with Named Entities (person and geo names, dates) and personal tags Normalize their transcription in a transparent way (abbreviations, character sets, editorial declaration) Export their document in various formats TEI: transcribed text with machine and human readable text ( scientific style ) PDF-Text: transcribed text with some formatting ( book style ) PDF-Image: transcribed text in the background, image in the foreground ( Scanned PDF style ) RTF: transcribed text for human readability ( working style ) METS/ALTO: full machine readable package ( digital library style ) METS/PAGE: full machine readable package ( computer science style ) 16/06/2015 16

Humanities scholars are enabled to (2) Manage their documents in a private collection (no one else has access!) Invite other users to collaborate (e.g. students, colleagues) and to work on the same document See different versions of their documents Expose their documents also in a simplified web-based transcription GUI (TSX) Transcribe documents with the support of HTR if a model is available (needs offline training in beforehand) Exploit language resource database in the background (e.g. words, variants, frequencies, etc.) Search within a handwritten text collection Benefit from other TRANSKRIBUS users in an indirect way. The following resources will be available to every user of the platform: Trained HTR models Normalized editorial declarations Normalized Named Entities Language resources 16/06/2015 17

Sustainability model Free usage Single documents (who ever wants to work with the platform) E.g. humanities scholars, volunteers, genealogists, archivists, Service Level Agreement Document collections E.g. transcription projects (grants), archives, libraries (OCR collections) Public-Public-Partnership Subsidiary model, not only for TRANSKRIBUS itself, but also for participants (e.g. archives, collection holders) E.g. DARIAH, direct support via e-infrastructure funds, research agencies (may have interest on standardized formats and workflows, etc.) Research agencies Vision Dozens of archives expose their collections via TRANSKRIBUS, hundreds of humanities scholars are actually working with it and thousands of volunteers are involved 16/06/2015 18

transcriptorium READ READ Recognition and Enrichment of Archival Documents E-Infrastructure programme: Virtual Research Environments 13 partners, coordinated by UIBK 3,5 years (very likely 1.1.2016 until 30.6.2019) 8,2 mill. EUR funding 10 associated partners Highlights Release of TRANSKRIBUS e2015 (=all main features included) Implementation of HTR technology in four Large Scale Demonstrators (Zurich, Finland, Passau, Venice Time Machine) 10 associated partners (archives, collection holders) joining the project in Y1 ScanApp (mobile phone as document scanner) E-Learning application (history students!) European Hands (marketing campaign) Large Data Sets for Research Competitions (put HTR on the research agenda) 16/06/2015 19

Thank you for your attention! 16/06/2015 20