Institute for Computational Linguistics, Pisa
Andrea Bozzi
Electronic Critical Edition of Ancient Digital Manuscript Sources
Archivi e biblioteche: dalla memoria del passato al web
Cagliari, November 25-26, 2009
Terminological note
- Electronic edition
- Computational (or digital) philology
Basic tools for scholarly editing of digital documents
- Text indexing and concordances
- Image enhancement
- Texts and images: integrated open source environment for images and texts
- Annotations: collaborative scholarly editing
- 3D stemmatology (graphical representation of relations among witnesses)
- NLP tools: lemmatization, morphological analysis, creation of data banks of syntactic structures, sense extraction, identification of named entities, etc.
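To make the first item above concrete, here is a minimal sketch of a wordform index and a keyword-in-context concordance. It is not PKT code: the function names (`tokenize`, `build_index`, `concordance`) and the sample sentence are assumptions chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (Unicode letters only)."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def build_index(text):
    """Map each wordform to the token positions where it occurs."""
    index = defaultdict(list)
    for pos, token in enumerate(tokenize(text)):
        index[token].append(pos)
    return dict(index)


def concordance(text, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens on each side."""
    tokens = tokenize(text)
    target = keyword.lower()
    lines = []
    for pos, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, pos - width):pos])
            right = " ".join(tokens[pos + 1:pos + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Sample sentence used only to exercise the functions.
sample = "In principio erat Verbum, et Verbum erat apud Deum"

index = build_index(sample)
for line in concordance(sample, "verbum"):
    print(line)
```

A real system would index by document and page rather than by token position alone, and would store lemmas next to wordforms so that both kinds of access are possible.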
The Pinakes Text (PKT) Editing Criteria
- Linear transcription of a single source (bon manuscrit?)
- Positive apparatus in which to record the variants of the collated sources
- Specific area of the apparatus in which to store the readings selected or proposed by the critical editor
- Automatic generation of the textus constitutus
- Automatic generation of the text of all the other reviewed and collated sources
- Computer-assisted assessment of the variants and a man-machine user interface for hypothesizing stemmata from the apparatus data
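The criteria above imply a simple data model: one base transcription plus an apparatus from which every other text can be regenerated. The following is a minimal sketch under that assumption, not the PKT implementation. The class `ApparatusEntry`, the two functions, the sigla A, B, C and the variant readings are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ApparatusEntry:
    position: int                        # token index in the base transcription
    readings: Dict[str, str]             # siglum -> reading (positive: each witness is listed)
    editor_choice: Optional[str] = None  # reading selected or proposed by the editor


def witness_text(base: List[str], apparatus: List[ApparatusEntry], siglum: str) -> str:
    """Regenerate the text of one collated witness from base + apparatus."""
    tokens = list(base)
    for entry in apparatus:
        if siglum in entry.readings:
            tokens[entry.position] = entry.readings[siglum]
    return " ".join(tokens)


def textus_constitutus(base: List[str], apparatus: List[ApparatusEntry]) -> str:
    """Regenerate the established text by applying the editor's choices."""
    tokens = list(base)
    for entry in apparatus:
        if entry.editor_choice is not None:
            tokens[entry.position] = entry.editor_choice
    return " ".join(tokens)


# Linear transcription of the base witness A (invented example).
base = ["dominus", "regit", "me"]

# One locus with invented readings for witnesses A, B and C.
apparatus = [
    ApparatusEntry(
        position=1,
        readings={"A": "regit", "B": "reget", "C": "pascit"},
        editor_choice="pascit",
    )
]

print(witness_text(base, apparatus, "B"))
print(textus_constitutus(base, apparatus))
```

This sketch handles only one-for-one substitutions. Omissions, additions and transpositions would need entries that span a range of tokens.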
Aims of PKT
- To browse digital libraries and view documents
- To edit documents (add, modify, delete)
  - Edit texts
  - Edit images
- To search documents
  - Basic search: by title, by author, by volume, etc.
  - Advanced search: by text (wordforms, lemmas) and/or by concepts contained in the document (thanks to ontologies defined and tagged by each individual user)
Aims of PKT
- To enrich documents with metadata (annotations) relevant for philological analysis
  - given by a single user (or by a community of users who are studying the same textual or image archive)
- To enrich documents with linguistic analysis
  - given by the user (or by a community of users who are studying the same textual or image archive)
  - given by a computational tool
Pinakes Text Document View
Pinakes Text Edit Document (actual view)
Pinakes Text Search documents by content
- Users can specify one or more words to be retrieved in the document archive
- For each word it is possible to specify
  - Type: whole word, fragmented word, lemma, ...
  - What to consider: case sensitive, stress sensitive, ...
  - Where to search: body, notes, titles, prose, ...
- Terms can be combined to obtain a more complex search expression
  - Boolean operators: and, or, not
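The search options listed above can be sketched as a small query model. This is an illustration only, not the PKT search engine: the classes `Term`, `And`, `Or`, `Not`, the function `search` and the two sample documents are assumptions. Lemma search is left out because it would need a lemmatized index.

```python
import string
import unicodedata
from dataclasses import dataclass


def strip_stress(s):
    """Remove combining diacritics so that stress marks are ignored."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


@dataclass
class Term:
    text: str
    kind: str = "whole"            # "whole" word or "fragment" (substring of a word)
    case_sensitive: bool = False
    stress_sensitive: bool = False
    where: str = "body"            # document part to search: body, notes, title, ...

    def _norm(self, s):
        if not self.case_sensitive:
            s = s.lower()
        if not self.stress_sensitive:
            s = strip_stress(s)
        return s

    def matches(self, doc):
        query = self._norm(self.text)
        raw_words = doc.get(self.where, "").split()
        words = [self._norm(w.strip(string.punctuation)) for w in raw_words]
        if self.kind == "fragment":
            return any(query in w for w in words)
        return query in words


@dataclass
class And:
    left: object
    right: object

    def matches(self, doc):
        return self.left.matches(doc) and self.right.matches(doc)


@dataclass
class Or:
    left: object
    right: object

    def matches(self, doc):
        return self.left.matches(doc) or self.right.matches(doc)


@dataclass
class Not:
    operand: object

    def matches(self, doc):
        return not self.operand.matches(doc)


def search(docs, query):
    """Return the titles of the documents that satisfy the query."""
    return [d["title"] for d in docs if query.matches(d)]


# Invented sample archive.
docs = [
    {"title": "Lettera prima", "body": "La natura è scritta in lingua matematica", "notes": "vedi il libro"},
    {"title": "Lettera seconda", "body": "Il libro della Natura e le stelle", "notes": ""},
]

print(search(docs, And(Term("natura"), Not(Term("stelle")))))
```

Because every node exposes the same `matches` method, expressions nest freely, for example `Or(Term("libro", where="notes"), Term("stelle"))`.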
Pinakes Text Documents Annotations (view)
Please test some PKT search functions at the following address: http://pinakes.imss.fi.it:8080/pinakestext/home.jsf
- 16,000 images of the printed pages of the National Edition
- 16,000 text file transcriptions
- Access by wordforms and by lemmas (so far, only for Il Saggiatore). The complete lemmatisation will be available next spring.
Flexibility of PKT: some case studies and areas of application
- Greek papyrology and classical philology
- Egyptology: demotic documents on ostraka
- Romance philology
- Philology of ancient printed books
- Linguistic tools: morphological analyzer and lemmatization engine
Annotations and critical apparatus
Indexes
Textual criticism for medieval manuscripts: link to collated sources
Analysis of the variant reading in the collated source: selection of the variant eixens
Recording the variant in the apparatus: storing the variant eixens in the critical apparatus
List of collated editions: variant search in different ancient printed editions of the same work
Image of the corresponding page in the selected edition
Future activities
- Scholarly editing of manuscripts by modern and contemporary authors (critique génétique)
- Exporting the edited text, variant apparatuses, annotations and indexes (e.g. index locorum, index verborum) for printed editions
- Linkage with NLP tools (e.g. automatic lemmatizers for Latin, Italian, Greek, ...)
- Classifying variants for user-dependent hypotheses of stemmata
Pinakes Text and INTEREDITION
- PKT is a web-based platform with integrated modules for computer-assisted scholarly editing within the roadmap of electronic publishing
- INTEREDITION (Interoperable Supranational Infrastructure for Digital Editions), COST Action IS-0704, European Science Foundation
General description of PKT: http://pinakes.imss.fi.it/index.php/pinakestext
Partners:
- CNR, Istituto di Linguistica Computazionale, Pisa
- Fondazione Rinascimento Digitale, Firenze
- Istituto e Museo della Storia della Scienza, Firenze
- Ministero per i Beni Culturali, Roma