Electronic Critical Edition of Ancient Digital Manuscript Sources




Institute for Computational Linguistics, Pisa
Andrea Bozzi
Electronic Critical Edition of Ancient Digital Manuscript Sources
Archivi e biblioteche: dalla memoria del passato al web (Archives and libraries: from the memory of the past to the web)
Cagliari, November 25-26, 2009

Terminological note
- Electronic edition
- Computational (or digital) philology

Basic tools for scholarly editing of digital documents
- Text indexing and concordances
- Image enhancement
- Texts and images: an integrated open-source environment for images and texts
- Annotations: collaborative scholarly editing
- 3D stemmatology (graphical representation of the relations among witnesses)
- NLP tools: lemmatization, morphological analysis, creation of data banks of syntactic structures, sense extraction, identification of named entities, etc.
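Text indexing and concordances are the simplest of these tools to illustrate. The slides do not say how PKT builds its indexes, so the following is only a minimal sketch of a keyword-in-context (KWIC) concordance over one transcription. It assumes a naive tokenizer (runs of word characters) and case-insensitive lookup; the names `build_index` and `kwic` are hypothetical.

```python
import re
from collections import defaultdict

# Naive tokenizer (an assumption): a token is any run of word characters,
# so punctuation is dropped while accented letters such as "à" are kept.
TOKEN = re.compile(r"\w+")


def build_index(text):
    """Return the token list and a map from lower-cased wordform to positions."""
    tokens = TOKEN.findall(text)
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok.lower()].append(pos)
    return tokens, index


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens, index = build_index(text)
    rows = []
    for pos in index.get(keyword.lower(), []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        rows.append((left, tokens[pos], right))
    return rows


if __name__ == "__main__":
    sample = "In principio erat Verbum, et Verbum erat apud Deum"
    for left, key, right in kwic(sample, "verbum", width=2):
        print(f"{left:>20} [{key}] {right}")
```

The same position index could also drive an index verborum; a real system would replace the regular expression with a tokenizer suited to the language and script of each source.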

The Pinakes Text (PKT) editing criteria
- Linear transcription of a single source (bon manuscrit?)
- A positive apparatus recording the variants of the collated sources
- A specific area of the apparatus storing the readings selected or proposed by the critical editor
- Automatic generation of the textus constitutus
- Automatic generation of the text of every other reviewed and collated source
- Computer-assisted assessment of the variants and a man-machine user interface for hypothesizing stemmata from the apparatus data
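These criteria imply that, once the base transcription and the positive apparatus are stored, both the textus constitutus and the text of every collated witness can be derived mechanically. A minimal sketch of that derivation follows. The token-span data model (`ApparatusEntry`, `reconstruct`, `textus_constitutus`, `witness_text`) is a hypothetical illustration, not PKT's actual storage format, and the variant readings in the example are invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ApparatusEntry:
    """One place of variation (hypothetical model of a positive apparatus).

    Tokens base[start:end] of the base transcription are in question;
    `readings` maps each collated witness's siglum to the text it attests
    there, including witnesses that agree with the base transcription.
    """
    start: int
    end: int
    readings: Dict[str, str]
    editor_choice: Optional[str] = None  # reading selected or proposed by the editor


def reconstruct(base, apparatus, pick):
    """Rebuild a text, replacing each apparatus span with pick(entry)."""
    out, cursor = [], 0
    for entry in sorted(apparatus, key=lambda e: e.start):
        out.extend(base[cursor:entry.start])
        reading = pick(entry)
        if reading:  # an empty reading models an omission
            out.append(reading)
        cursor = entry.end
    out.extend(base[cursor:])
    return " ".join(out)


def textus_constitutus(base, apparatus):
    """Editor's choice where one is recorded, otherwise the base reading."""
    def pick(e):
        if e.editor_choice is not None:
            return e.editor_choice
        return " ".join(base[e.start:e.end])
    return reconstruct(base, apparatus, pick)


def witness_text(base, apparatus, siglum):
    """Text of one witness; a witness absent from an entry keeps the base reading."""
    return reconstruct(
        base, apparatus,
        lambda e: e.readings.get(siglum, " ".join(base[e.start:e.end])))


if __name__ == "__main__":
    base = "arma virumque cano Troie qui primus ab oris".split()  # witness A
    apparatus = [
        ApparatusEntry(3, 4, {"A": "Troie", "B": "Troiae", "C": "Troiae"},
                       editor_choice="Troiae"),
        ApparatusEntry(5, 6, {"A": "primus", "B": "primus", "C": ""}),
    ]
    print(textus_constitutus(base, apparatus))
    print(witness_text(base, apparatus, "C"))
```

Keeping the editor's choice in a separate field of each entry mirrors the "specific area of the apparatus" in the criteria: the collated evidence is never overwritten by the editorial decision.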

Aims of PKT
- To browse digital libraries and view documents
- To edit documents (add, modify, delete):
  - edit texts
  - edit images
- To search documents:
  - basic search: by title, by author, by volume, etc.
  - advanced search: by text (wordforms, lemmas) and/or by the concepts contained in the document (using ontologies defined and tagged by each individual user)

Aims of PKT
- To enrich documents with metadata (annotations) relevant to philological analysis, supplied by a single user or by a community of users studying the same textual or image archive
- To enrich documents with linguistic analysis, supplied either:
  - by the user (or by a community of users studying the same textual or image archive), or
  - by a computational tool
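Because annotations may come from one user, from a community studying the same archive, or from a computational tool, one common design is to store them stand-off, i.e. separately from the transcription and anchored to character offsets. The sketch below assumes that design; the `Annotation` fields, the `kind` labels and the community-based visibility rule are hypothetical and are not taken from PKT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation (hypothetical model): it points into the text
    by character offsets instead of being embedded in the transcription."""
    doc_id: str
    start: int           # first annotated character (inclusive)
    end: int             # end offset (exclusive)
    body: str
    author: str          # a user name, or the name of a computational tool
    kind: str            # e.g. "philological" or "linguistic" (assumed labels)
    community: Optional[str] = None  # group it is shared with; None means private


def visible_to(annotations, user, communities):
    """Annotations a user may see: their own, plus those shared with a
    community the user belongs to."""
    return [a for a in annotations
            if a.author == user
            or (a.community is not None and a.community in communities)]


def annotations_at(annotations, doc_id, offset):
    """Annotations whose span covers a given character offset."""
    return [a for a in annotations
            if a.doc_id == doc_id and a.start <= offset < a.end]


if __name__ == "__main__":
    notes = [
        Annotation("ms1", 0, 5, "abbreviation expanded", "anna", "philological"),
        Annotation("ms1", 3, 9, "lemma: verbum", "lemmatizer", "linguistic",
                   community="latinists"),
        Annotation("ms1", 10, 14, "doubtful reading", "marco", "philological",
                   community="papyrologists"),
    ]
    mine = visible_to(notes, "anna", {"latinists"})
    print([a.body for a in annotations_at(mine, "ms1", 4)])
```

Stand-off spans may overlap freely (as the first two example annotations do), which inline markup cannot express without workarounds.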

Pinakes Text: document view

Pinakes Text: edit document (actual view)

Pinakes Text: search documents by content
- Users can specify one or more words to be retrieved from the document archive
- For each word it is possible to specify:
  - type: whole word, word fragment, lemma, ...
  - what to take into account: case sensitivity, accent (stress-mark) sensitivity, ...
  - where to search: body, notes, titles, prose, ...
- Terms can be combined into a more complex search expression with the Boolean operators AND, OR, NOT
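The search options above can be combined in a small query evaluator. This sketch assumes documents stored as dictionaries from field name to text, a wordform-to-lemma lookup table standing in for a real lemmatizer, and queries written as nested tuples such as `("and", term, ("not", term))`; none of these names come from PKT. Accent-insensitive matching is approximated by Unicode decomposition (NFD) followed by removal of combining marks.

```python
import re
import unicodedata
from dataclasses import dataclass

WORD = re.compile(r"\w+")


def normalize(word, case_sensitive, accent_sensitive):
    """Fold case and/or strip diacritics according to the search options."""
    if not accent_sensitive:
        # NFD splits "à" into "a" + combining grave; the mark is then dropped.
        word = "".join(c for c in unicodedata.normalize("NFD", word)
                       if not unicodedata.combining(c))
    if not case_sensitive:
        word = word.casefold()
    return word


@dataclass
class Term:
    word: str
    kind: str = "whole"          # "whole", "fragment" or "lemma"
    case_sensitive: bool = False
    accent_sensitive: bool = False
    fields: tuple = ("body",)    # where to search: "body", "notes", "titles", ...


def term_matches(term, doc, lemmas):
    """True if the term occurs in any of the requested fields of `doc`."""
    want = normalize(term.word, term.case_sensitive, term.accent_sensitive)
    for name in term.fields:
        for token in WORD.findall(doc.get(name, "")):
            if term.kind == "lemma":
                # Hypothetical lemma table keyed by lower-cased wordform.
                hit = lemmas.get(token.lower()) == term.word.lower()
            else:
                tok = normalize(token, term.case_sensitive, term.accent_sensitive)
                # "whole" needs equality; anything else is a fragment match.
                hit = tok == want if term.kind == "whole" else want in tok
            if hit:
                return True
    return False


def evaluate(query, doc, lemmas):
    """Query = Term | ("and", q, ...) | ("or", q, ...) | ("not", q)."""
    if isinstance(query, Term):
        return term_matches(query, doc, lemmas)
    op, *args = query
    if op == "and":
        return all(evaluate(q, doc, lemmas) for q in args)
    if op == "or":
        return any(evaluate(q, doc, lemmas) for q in args)
    if op == "not":
        return not evaluate(args[0], doc, lemmas)
    raise ValueError(f"unknown operator: {op}")


def search(query, docs, lemmas=None):
    """Ids of the documents (field-to-text dicts) that satisfy the query."""
    lemmas = lemmas or {}
    return [doc_id for doc_id, doc in docs.items()
            if evaluate(query, doc, lemmas)]
```

For example, with the pages "Città e università", "La città è grande" and "Le città sono grandi", the query `("and", Term("citta"), ("not", Term("grand", kind="fragment")))` selects only the first page, while `Term("citta", accent_sensitive=True)` matches none of them.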

Pinakes Text: document annotations (view)

Please test some of PKT's search functions at the following address: http://pinakes.imss.fi.it:8080/pinakestext/home.jsf
- 16,000 images of the printed pages of the National Edition
- 16,000 text-file transcriptions
- Access by wordforms and by lemmas (so far, only for Il Saggiatore). The complete lemmatisation will be available next spring.

Flexibility of PKT: some case studies and areas of application
- Greek papyrology and classical philology
- Egyptology: demotic documents on ostraka
- Romance philology
- Philology of early printed books
- Linguistic tools: morphological analyzer and lemmatization engine

Annotations and critical apparatus

Indexes

Textual criticism for medieval manuscripts: link to collated sources

Analysis of the variant reading in the collated source: selection of the variant eixens

Recording the variant in the apparatus: storing the variant eixens in the critical apparatus

List of collated editions: variant search across different early printed editions of the same work
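Locating the places where two editions of the same work differ is essentially a token-alignment problem. PKT's own collation procedure is not described in the slides; as a stand-in, Python's standard `difflib.SequenceMatcher` can align two whitespace-tokenized texts and report every non-matching stretch as a variant. The function names `collate` and `collate_editions`, the sigla and the sample readings are assumptions for illustration.

```python
import difflib


def collate(base, other):
    """Align two texts word by word and list the places where they differ.

    Returns (position in base, base reading, other reading) tuples; an empty
    reading means an omission (or, on the base side, an addition in `other`).
    """
    a, b = base.split(), other.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(i1, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]


def collate_editions(base, editions):
    """Collate every edition (siglum -> text) against the base text."""
    return {siglum: collate(base, text) for siglum, text in editions.items()}


if __name__ == "__main__":
    base = "in principio erat verbum et verbum erat apud deum"
    editions = {
        "ed1": "in principio fuit verbum et verbum erat aput deum",
        "ed2": "in principio erat verbum et erat apud deum",
    }
    for siglum, variants in collate_editions(base, editions).items():
        print(siglum, variants)
```

Whitespace tokenization ignores punctuation, abbreviations and line breaks; real collation of early printed books would normalize these before alignment.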

Image of the corresponding page in the selected edition

Future activities
- Scholarly editing of the manuscripts of modern and contemporary authors (critique génétique)
- Exporting the edited text, variant apparatuses, annotations and indexes (e.g. index locorum, index verborum) for printed editions
- Linkage with NLP tools (e.g. automatic lemmatizers for Latin, Italian, Greek, ...)
- Classifying variants to support user-dependent hypotheses of stemmata
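One simple input for hypothesizing stemmata is the rate at which pairs of witnesses agree across the variation units the editor considers significant. The sketch below computes such pairwise agreement rates, with a `significant` predicate standing for the user-dependent classification of variants. It is an illustrative similarity measure under these assumptions, not a stemma-building algorithm and not PKT's method: a Lachmannian stemma rests on shared errors, which only the editor can classify. The function name and the sample readings are hypothetical.

```python
from itertools import combinations


def agreement_matrix(units, witnesses, significant=lambda unit: True):
    """Pairwise agreement rates over the variation units judged significant.

    `units` is a list of dicts mapping a witness siglum to its reading;
    `significant` is the user's classification of which units count.
    A pair with no shared significant unit gets None.
    """
    result = {}
    for w1, w2 in combinations(witnesses, 2):
        agree = total = 0
        for unit in units:
            if not significant(unit) or w1 not in unit or w2 not in unit:
                continue
            total += 1
            agree += unit[w1] == unit[w2]
        result[(w1, w2)] = agree / total if total else None
    return result


if __name__ == "__main__":
    units = [
        {"A": "erat", "B": "fuit", "C": "fuit"},
        {"A": "apud", "B": "aput", "C": "aput"},
        {"A": "deum", "B": "deum", "C": "dominum"},
    ]

    def not_orthographic(unit):
        # Example user judgement: treat apud/aput as a mere spelling variant.
        return set(unit.values()) != {"apud", "aput"}

    print(agreement_matrix(units, ["A", "B", "C"]))
    print(agreement_matrix(units, ["A", "B", "C"], not_orthographic))
```

Changing the predicate changes the rates (here B and C agree in two of three units, but in only one of two once the spelling variant is excluded), which is exactly why the classification of variants has to remain in the user's hands.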

Pinakes Text and INTEREDITION
PKT is a web-based platform with integrated modules for computer-assisted scholarly editing, positioned within the electronic publishing roadmap of INTEREDITION (Interoperable Supranational Infrastructure for Digital Editions), COST Action IS-0704, European Science Foundation.

General description of PKT: http://pinakes.imss.fi.it/index.php/pinakestext
Partners:
- CNR, Istituto di Linguistica Computazionale, Pisa
- Fondazione Rinascimento Digitale, Firenze
- Istituto e Museo della Storia della Scienza, Firenze
- Ministero per i Beni Culturali, Roma