Electronic Critical Edition of Ancient Digital Manuscript Sources

Andrea Bozzi, Institute for Computational Linguistics, Pisa
Electronic Critical Edition of Ancient Digital Manuscript Sources
Archivi e biblioteche: dalla memoria del passato al web (Archives and libraries: from the memory of the past to the web), Cagliari, November 25-26, 2009

Terminological note
- Electronic edition
- Computational (or digital) philology

Basic tools for scholarly editing of digital documents
- Text indexing and concordances
- Image enhancement
- Texts and images: an integrated open source environment for images and texts
- Annotations: collaborative scholarly editing
- 3D stemmatology (graphical representation of the relations among witnesses)
- NLP tools: lemmatization, morphological analysis, creation of data banks of syntactic structures, sense extraction, identification of named entities, ...
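
As an illustration of the first tool in the list, text indexing and concordances, here is a minimal Python sketch. It is not PKT code: the tokenizer, the function names and the sample sentence are all assumptions made for the example. It builds a positional index of word forms and prints keyword-in-context lines from it.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def build_index(tokens):
    """Map each word form to the list of token positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index

def concordance(tokens, index, word, width=2):
    """Return keyword-in-context lines with `width` tokens on each side."""
    lines = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{left} [{tokens[pos]}] {right}".strip())
    return lines

# Invented sample sentence, used only to exercise the functions.
tokens = tokenize("The scribe copied the text and the editor checked the text")
index = build_index(tokens)
kwic = concordance(tokens, index, "text")
for line in kwic:
    print(line)
```

A real concordancer would also keep references to page and line, which this sketch leaves out.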

The Pinakes Text (PKT) Editing Criteria
- Linear transcription of a single source (bon manuscrit?)
- Positive apparatus in which the variants of the collated sources are recorded
- Specific area of the apparatus in which the readings selected or proposed by the critical editor are stored
- Automatic generation of the textus constitutus
- Automatic generation of the text of all the other reviewed and collated sources
- Computer-assisted assessment of the variants, and a man-machine user interface for hypothesizing stemmata from the apparatus data
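
The criteria above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the PKT data model: the `Entry` class, the sigla A, B, C and the Latin sample readings are invented for the example, and each apparatus entry is tied to a single token of the base transcription. The point is that, once the base transcription and the apparatus are stored separately, both the textus constitutus and the text of every collated source can be generated automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Entry:
    """One apparatus entry attached to a token of the base transcription."""
    position: int                                            # token index in the base text
    variants: Dict[str, str] = field(default_factory=dict)   # siglum -> reading
    editor_reading: Optional[str] = None                     # chosen or proposed by the editor

def witness_text(base: List[str], apparatus: List[Entry], siglum: str) -> str:
    """Rebuild the text of one collated source from base text plus apparatus."""
    tokens = list(base)
    for entry in apparatus:
        if siglum in entry.variants:
            tokens[entry.position] = entry.variants[siglum]
    return " ".join(tokens)

def constituted_text(base: List[str], apparatus: List[Entry]) -> str:
    """Rebuild the textus constitutus: base text with the editor's readings applied."""
    tokens = list(base)
    for entry in apparatus:
        if entry.editor_reading is not None:
            tokens[entry.position] = entry.editor_reading
    return " ".join(tokens)

# Invented data: base transcription (witness A) and two collated witnesses B and C.
base = "in principio erat verbum".split()
apparatus = [
    Entry(position=2, variants={"B": "fuit"}),
    Entry(position=3, variants={"B": "uerbum", "C": "verbo"}, editor_reading="uerbum"),
]

print(witness_text(base, apparatus, "B"))
print(witness_text(base, apparatus, "C"))
print(constituted_text(base, apparatus))
```

Variants that span several words, omissions and transpositions would need a richer entry type than this one-token model.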

Aims of PKT
- To browse digital libraries and view documents
- To edit documents (add, modify, delete)
  - Edit texts
  - Edit images
- To search documents
  - Basic search: by title, by author, by volume, etc.
  - Advanced search: by text (word forms, lemmas) and/or by concepts contained in the document (thanks to ontologies defined and tagged by each individual user)
- To enrich documents with metadata (annotations) relevant for philological analysis, provided by a single user or by a community of users studying the same textual or image archive
- To enrich documents with linguistic analysis
  - provided by the user (or by a community of users studying the same textual or image archive)
  - provided by a computational tool
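
The difference between searching by word form and searching by lemma, listed among the aims above, can be shown with a small Python sketch. The lexicon, the document identifiers and the texts are hypothetical toy data, and the lookup table stands in for a real lemmatizer. It is not a description of how PKT implements its indexes.

```python
from collections import defaultdict

# Hypothetical toy lexicon mapping word forms to lemmas.
LEXICON = {
    "dico": "dire", "dice": "dire", "disse": "dire",
    "libro": "libro", "libri": "libro",
}

def build_indexes(docs):
    """Return two inverted indexes: word form -> doc ids, lemma -> doc ids."""
    by_form, by_lemma = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for form in text.lower().split():
            by_form[form].add(doc_id)
            # Unknown forms fall back to themselves as lemma.
            by_lemma[LEXICON.get(form, form)].add(doc_id)
    return by_form, by_lemma

# Invented mini-archive.
docs = {"d1": "io dico", "d2": "egli disse", "d3": "i libri"}
by_form, by_lemma = build_indexes(docs)

print(sorted(by_form["dico"]))   # documents containing the exact form
print(sorted(by_lemma["dire"]))  # documents containing any form of the lemma
```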

Pinakes Text Document View

Pinakes Text Edit Document (actual view)

Pinakes Text Search documents by content
- Users can specify one or more words to be retrieved in the document archive
- For each word it is possible to specify
  - Type: whole word, fragmented word, lemma, ...
  - What to consider: case sensitive, stress sensitive, ...
  - Where to search: body, notes, titles, prose, ...
- Terms can be combined to obtain a more complex search expression
  - Boolean operators: and, or, not
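
The search options listed above can be modelled as a small expression tree. The following Python sketch is an assumption-laden illustration rather than the PKT implementation: the class names `Term`, `And`, `Or` and `Not`, the `search` helper and the three sample documents are invented, and the "where to search" option is left out. Stress insensitivity is approximated by removing Unicode combining marks.

```python
import re
import unicodedata
from dataclasses import dataclass

def strip_stress(s):
    """Remove combining marks so that 'è' is treated like 'e'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

@dataclass
class Term:
    word: str
    whole_word: bool = True        # False = fragment (substring) match
    case_sensitive: bool = False
    stress_sensitive: bool = False

    def matches(self, text):
        word, hay = self.word, text
        if self.stress_sensitive:
            # Compare accented characters in the same composed form.
            word = unicodedata.normalize("NFC", word)
            hay = unicodedata.normalize("NFC", hay)
        else:
            word, hay = strip_stress(word), strip_stress(hay)
        if not self.case_sensitive:
            word, hay = word.lower(), hay.lower()
        if self.whole_word:
            return word in re.findall(r"\w+", hay)
        return word in hay

@dataclass
class And:
    left: object
    right: object
    def matches(self, text):
        return self.left.matches(text) and self.right.matches(text)

@dataclass
class Or:
    left: object
    right: object
    def matches(self, text):
        return self.left.matches(text) or self.right.matches(text)

@dataclass
class Not:
    operand: object
    def matches(self, text):
        return not self.operand.matches(text)

def search(docs, expr):
    """Return the sorted ids of the documents that satisfy the expression."""
    return [doc_id for doc_id, text in sorted(docs.items()) if expr.matches(text)]

# Invented sample documents.
docs = {
    "d1": "La Terra è mobile",
    "d2": "la terra e il Sole",
    "d3": "Il sole è fermo",
}

print(search(docs, And(Term("terra"), Not(Term("sole")))))
print(search(docs, Term("è", stress_sensitive=True)))
print(search(docs, Term("terr", whole_word=False)))
```

Keeping each option as a field on the term means that a combined expression can mix, for instance, a case-sensitive whole word with a stress-insensitive fragment.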

Pinakes Text Documents Annotations (view)

You are invited to try some of the PKT search functions at http://pinakes.imss.fi.it:8080/pinakestext/home.jsf
- 16,000 images of the printed pages of the National Edition
- 16,000 text-file transcriptions
- Access by word forms and by lemmas (so far, only for Il Saggiatore). The complete lemmatisation will be available next spring.

Flexibility of PKT: some case studies and areas of application
- Greek papyrology and classical philology
- Egyptology: demotic documents on ostraka
- Romance philology
- Philology of ancient printed books
- Linguistic tools: morphological analyzer and lemmatization engine

Annotations and critical apparatus

Indexes

Textual criticism for medieval manuscripts: link to collated sources

Analysis of the variant reading in the collated source: selection of the variant "eixens"

Recording the variant in the apparatus: the variant "eixens" is stored in the critical apparatus

List of collated editions: variant search in different ancient printed editions of the same work

Image of the corresponding page in the selected edition

Future activities
- Scholarly editing of manuscripts by modern and contemporary authors (critique génétique)
- Exporting the edited text, variant apparatuses, annotations and indexes (e.g. index locorum, index verborum) for printed editions
- Linkage with NLP tools (e.g. automatic lemmatizers for Latin, Italian, Greek, ...)
- Classifying variants for user-dependent hypotheses of stemmata
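
One conceivable starting point for the last item, offered purely as an assumption and not as the method planned for PKT, is to count how often each pair of witnesses disagrees across the loci recorded in the apparatus. The Python sketch below uses invented sigla and readings. Witnesses that rarely disagree are candidates for closer inspection when a scholar hypothesizes a stemma.

```python
from itertools import combinations

def disagreement_counts(loci):
    """Count, for every pair of sigla, the loci where their readings differ.

    `loci` is a list of dicts, one per locus, mapping siglum -> reading.
    A witness missing at a locus is simply not compared there.
    """
    sigla = sorted({siglum for locus in loci for siglum in locus})
    counts = {pair: 0 for pair in combinations(sigla, 2)}
    for locus in loci:
        for a, b in counts:
            if a in locus and b in locus and locus[a] != locus[b]:
                counts[(a, b)] += 1
    return counts

# Invented collation of three witnesses at three loci.
loci = [
    {"A": "erat", "B": "fuit", "C": "erat"},
    {"A": "verbum", "B": "uerbum", "C": "verbo"},
    {"A": "apud", "B": "apud", "C": "apud"},
]

counts = disagreement_counts(loci)
closest = min(counts, key=counts.get)
print(counts)
print(closest)
```

Raw disagreement counts treat every variant as equally significant, which is exactly why a classification of variants by the user matters before any stemma is proposed.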

Pinakes Text and INTEREDITION
PKT is a web-based platform with integrated modules for computer-assisted scholarly editing, within the electronic publishing roadmap of INTEREDITION (Interoperable Supranational Infrastructure for Digital Editions), COST Action IS-0704, European Science Foundation.

General description of PKT: http://pinakes.imss.fi.it/index.php/pinakestext
Partners:
- CNR, Istituto di Linguistica Computazionale, Pisa
- Fondazione Rinascimento Digitale, Firenze
- Istituto e Museo della Storia della Scienza, Firenze
- Ministero per i Beni Culturali, Roma