Development of a Topographical Transcription Method


Introduction

In the past years, digitization was about transforming analog documents into a digital representation. Problems concerning color management, lightness scale, resolution and geometrical distortion had to be solved, and they have been. Today, the necessary methods can be considered well elaborated and satisfactory for creating digital images of analog documents. Numerous digitization initiatives have thus led to web portals that make available digital facsimiles, the corresponding metadata, and tools to search and browse them. However, only a few tools are available to uncover the full potential of these digital facsimiles for humanities research.

Noticing this deficit, we are developing SALSAH, a Virtual Research Environment (VRE) for the humanities. SALSAH (System for Annotation and Linkage of Sources in Arts and Humanities) is a collaborative research platform that allows for the visualization, annotation and linkage of digital resources in the humanities. The application is completely web-based and makes it possible to use digital resources in humanities research directly on the web. This way, research data such as annotations and linkages emerge in a born-digital form.

When thinking about digital resources such as digital facsimiles and about methods to make use of them in the humanities, it is also necessary to think of methods for transcription and text constitution in the digital medium. Furthermore, we have to consider the digital facsimile to be a representation of an analog document that offers text but possibly also pictorial information such as illustrations. Putting the focus on pictorial aspects, we can also conceive of text as being pictorial in the first place. For this reason, we are developing a topographical transcription method for digital facsimiles within SALSAH.
This article first briefly describes the general purpose, the functionality and the data model of SALSAH. It then presents general thoughts about the benefit of digital facsimiles and describes the topographical transcription method currently being developed as an extension of SALSAH's core functionality.

SALSAH

SALSAH has been developed at the Imaging & Media Lab of the University of Basel since summer 2009. It originated in an art historical context and today comprises further humanities disciplines. Besides the Narrenschiff project [1], SALSAH is used by two edition projects: the Anton Webern Gesamtausgabe [2] and the Kritische Robert Walser-Ausgabe [3]. SALSAH is designed as a general VRE for the humanities (Schweizer 2011: 147ff.). Currently, SALSAH offers methods to work with digital facsimiles (support for audio and moving images is already planned). SALSAH offers the functionality to:

- visualize various digital resources simultaneously
- annotate digital resources and share these annotations collaboratively
- create links between digital resources and annotate them
- create Regions of Interest (ROI) within digital resources and annotate and link them
- access external repositories of digital resources and apply SALSAH's functionality to them

[1] Together with Prof. Barbara Schellewald, Institute of Art History, University of Basel
[2] Institute of Musicology, University of Basel
[3] Institute of German Philology, University of Basel

Through the use of SALSAH's annotation and linkage functionality, the research data emerges in the digital medium and is directly connected to the digital resources it refers to. Figure 1 shows an example from the art historical Narrenschiff project. The elliptical shapes represent digital objects (here: books and pages), while annotations are indicated by rectangles. The arrows show how the digital objects and annotations are related to each other. The book is characterized by two annotations: title and date of publication. A book is a compound object, which means that it consists of other objects: single pages. Each page belonging to the Narrenschiff thus refers to the digital object representing the book. Like the book itself, each page can be annotated [4]. Because pages have a certain order, they are annotated with a pagination. Furthermore, each page can be described by composing a page description. By creating links between digital objects, relations between them can be expressed. Each link is again treated as a digital object that can be annotated (here with a description). In this way, the link's semantics can be expressed. By annotating and linking digital objects, the research knowledge emerges as a network-like structure which can be browsed and extended by other researchers. By working on the same digital corpus, humanities researchers can collaborate with each other, either within a working group or even in an interdisciplinary setting.

[Figure 1: Structure of the Research Data within SALSAH. Labels in the figure: Book (Title: Das Narrenschiff; Date of Publication: 3rd March 1495); Page (several instances); Pagination: 1 verso; Page Description: This page shows a fool; Link; Description: Interesting illustrations of a fool.]
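The structure shown in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the idea that books, pages and links are all annotatable digital objects; the class and field names (`DigitalObject`, `Link`, `part_of`, `targets`) are hypothetical and do not reflect SALSAH's actual implementation. The assignment of annotations to particular pages is likewise chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """A node in the research-data network (hypothetical, not SALSAH's API)."""
    kind: str                                        # e.g. "book", "page", "link"
    annotations: dict = field(default_factory=dict)  # annotations and metadata alike
    part_of: Optional["DigitalObject"] = None        # a page refers to its book


@dataclass
class Link(DigitalObject):
    """A link is itself a digital object, so it can be annotated too."""
    targets: list = field(default_factory=list)


def parts_of(whole, objects):
    """Return the objects that refer to `whole` as their compound object."""
    return [o for o in objects if o.part_of is whole]


# Values taken from Figure 1; which page carries which annotation is assumed.
book = DigitalObject("book", {"title": "Das Narrenschiff",
                              "date_of_publication": "3rd March 1495"})
pages = [DigitalObject("page", part_of=book) for _ in range(3)]
pages[1].annotations["pagination"] = "1 verso"
pages[1].annotations["page_description"] = "This page shows a fool"

link = Link("link", {"description": "Interesting illustrations of a fool"},
            targets=[pages[0], pages[1]])

print(len(parts_of(book, pages)), "pages belong to", book.annotations["title"])
print("link annotation:", link.annotations["description"])
```

Because `Link` derives from `DigitalObject`, a link carries annotations in the same way as a book or a page, which mirrors the statement that each link is again treated as a digital object.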
The annotations and even the digital objects available can be defined specifically for each project. Each project within SALSAH can define the semantics of its digital objects and which annotations they may have. Digital objects may have a digital representation, that is, digital data representing some physical aspects of the analog object (for example, the digital data of a digital facsimile represents the local reflectance of a page of text). But they can also be abstract constructs (e.g. a person who may be characterized by name and birthdate, without any further digital data representing the physical aspects of that person).

[4] In terms of the data model, annotations and metadata are not distinguished: metadata are also annotations. In the Graphical User Interface (GUI), however, metadata will be presented separately from the annotations since they are quite definite, while annotations can be regarded as more subjective and thus open to discussion.

For digital facsimiles, we have recently developed a method to define Regions of Interest. These regions are geometrically described areas on the digital facsimile and can be annotated and linked like other digital objects. This functionality makes it possible to reference parts of pictorial resources directly.

[Figure 2: Creation of a Region of Interest]

Figure 2 shows the creation of a Region of Interest consisting of two polygon shapes. The region can be annotated with a comment. Art historians could use this functionality to describe specifically defined areas of pages of the Narrenschiff. Each region consists of one or more geometrical shapes and of annotations, which can be configured according to the research project's needs.

All of this functionality can also be applied to remote resources not stored in SALSAH's local database. We have already implemented a connection to the assets of the e-codices project [5]. Due to SALSAH's flexible data model, the facsimiles of e-codices can be annotated as if they were stored locally in SALSAH. In fact, only the annotations created in SALSAH are stored locally; the remote facsimiles are merely referenced in the SALSAH database. SALSAH is thus designed as a shared system: remote resources can be accessed and annotated within the SALSAH environment. Conversely, all the annotations and links stored in SALSAH could be made available to the outside by implementing an interface accessible via a web service. We are currently working on such an interface to export SALSAH's data. This method would also allow for online connections. For example, the e-codices website could then indicate whether annotations created in SALSAH exist for certain manuscripts.

[5] The project can be accessed here: http://www.e-codices.ch. It currently (last access: 23rd November 2011) encompasses 833 manuscripts from 34 different libraries.
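A Region of Interest as described above, consisting of one or more geometrical shapes plus annotations and pointing to a facsimile that may be stored elsewhere, could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the `Region` class, the `facsimile_ref` field, the relative coordinates (fractions of the image width and height) and the sample identifier are all hypothetical. The containment test uses the standard even-odd ray-casting rule, which is a common choice and not necessarily what SALSAH does.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Even-odd rule: count how often a ray going right from p crosses an edge."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class Region:
    """A Region of Interest (hypothetical model, not SALSAH's schema)."""
    facsimile_ref: str          # local identifier or URL of a remote facsimile
    shapes: list[list[Point]]   # one or more polygons
    annotations: dict = field(default_factory=dict)

    def contains(self, p: Point) -> bool:
        return any(point_in_polygon(p, shape) for shape in self.shapes)


# A region made of two polygons, as in Figure 2 (coordinates are invented).
roi = Region(
    facsimile_ref="narrenschiff/page-1-verso",  # placeholder identifier
    shapes=[
        [(0.1, 0.1), (0.4, 0.1), (0.4, 0.3), (0.1, 0.3)],  # rectangle
        [(0.5, 0.5), (0.9, 0.5), (0.7, 0.8)],              # triangle
    ],
    annotations={"comment": "Illustration of a fool"},
)

print(roi.contains((0.2, 0.2)))   # inside the rectangle
print(roi.contains((0.7, 0.6)))   # inside the triangle
print(roi.contains((0.45, 0.4)))  # between the two shapes
```

Storing only a reference to the facsimile keeps the region independent of where the image actually resides, which corresponds to the way remote e-codices facsimiles are referenced rather than copied.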

Transcribing Digital Facsimiles

Having digital images of analog sources available and digital tools to address them, we are able to conceive a method to transcribe digital facsimiles and subsequently to constitute texts. First, a brief outline of the importance of facsimiles is given. Then a method is described which allows for the creation of transcriptions directly in the SALSAH environment.

Importance of Facsimiles

The digitization of analog documents makes them available as digital images, that is, as digital facsimiles. These digital images represent the analog documents with reference to their visual appearance. This allows for the examination of illustrations and all other kinds of pictorial elements contained in these documents. Unlike text-based representations, digital images represent the original material in a non-abstract way [6], without presuming the separation of textual information from the document itself by identifying textual characters. In the (digital) facsimile, the surface of the document and the textual information are still one entity (Gabler 2007: 198).

[6] Of course, the making of digital images can also be conceived as an abstraction from the original, implying decisions about perspective, resolution, color adjustment etc.

Take the example of the Burgunderchronik of the fifteenth-century scribe Diebold Schilling from Bern. The most easily accessible edition is the purely text-based edition of Gustav Tobler (Tobler 1897 and Tobler 1901), which presents the manuscript kept in Zurich (known as the Zürcher Schilling) in the edition text and the official chronicle from Bern (known as the Berner Schilling) as a variant in the critical apparatus. The illuminations of both manuscripts are only briefly described in a register in the appendix of the edition. The editor seems to have assumed that the text of the Zürcher Schilling is more authentic because it is thought of as a more original version, while the text of the official Berner Schilling is conceived as a censored copy (Tobler 1901: 347). The editor's interest is thus directed not at the documents themselves (their reception etc.) but at finding the best text available. As a consequence, both manuscripts are presented in one edition (implying that they are manifestations of the same text), but not without establishing a hierarchy between them: the text of one manuscript is presented in the edition text, the other manuscript's text in the critical apparatus. The printed facsimile editions existing for both manuscripts (available only in a few libraries and archives) show that the overall impression of the two manuscripts is very different. While it can be said that they offer a very similar text where they converge [7], the illuminations offered by the two manuscripts are of a very different kind and thus constitute very different relations between texts and pictorial elements, significantly influencing the perception of the manuscripts. Besides a look at the original documents, only facsimile editions reveal these aspects. But since these print editions are expensive and not widespread, their benefit is limited [8].

[7] The manuscripts differ in their temporal extent.
[8] In fact, e-codices has already digitized the Berner Schilling. Once it is made available on their website, the accessibility of this manuscript will be unproblematic.

Looking at the recent history of editing in German philology, we can see a paradigm change in the seventies towards an edition technique that consistently integrates facsimiles. The Frankfurter Hölderlin-Ausgabe (FHA) realized by Dietrich E. Sattler (Sattler 1975-2008) applied a novel way of editing. Instead of presenting the constituted text as the edition text accompanied by a critical apparatus containing its variants, this edition made visible the entire analytical process, beginning with the facsimile and ending with a constituted text (Martens 1982: 52ff.). The edited text representing the final state in the constitution process can thus be conceived as the result of an analytical process openly presented to the reader via the consistent integration of facsimiles, their diplomatic transcription and a phase analysis (Martens 1982: 53f.). The integration of the facsimile in the edition ensures the transparency of the analytical process the edition has undertaken. The reference to the facsimile also emphasizes the status of the document on which the transcription and the process of text constitution are based (Gabler 2007: 199).

Topographical Transcription Method

The transcription method being developed in SALSAH is topographically oriented. The transcription process begins by defining visually coherent areas on the digital facsimile (using SALSAH's functionality to create geometrical figures on the facsimile), which are then encoded into textual characters line by line. Manuscripts may offer not one overall text area but several distinct areas of textual information (text blocks, annotations, notes, marginalia, glosses etc.). Addressing them topographically makes it possible to transcribe them individually. By transcribing these areas line by line, the correspondence between the encoded text and the facsimile is sustained.

[Figure 3: Diplomatic Transcription of a Page of the Narrenschiff in SALSAH]

Figure 3 [9] shows SALSAH's transcription tool, which is still in an early state of development. On the left-hand side, the facsimile is displayed. The regions defined on the facsimile are shown accordingly on the right side as rectangular shapes. Each of these rectangular shapes offers an editable area where the transcription of the corresponding part of the facsimile can be entered. While the facsimile can be freely zoomed and panned, the transcription area on the right side always shows the whole page because it is thought of as a typification of the textual information given by the facsimile.
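To make the idea of topographically defined transcription areas more concrete, the following sketch models each area as a rectangle on the facsimile together with its transcribed lines. It is an assumption-laden illustration only: the `TranscriptionArea` class, the pixel coordinates and the placeholder line texts are invented, and the equal division of an area's height among its lines is a deliberate simplification for relating a transcribed line back to a band on the image.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionArea:
    """A visually coherent area on a facsimile (hypothetical model)."""
    label: str        # e.g. "text block", "marginal note"
    x: int            # left edge on the facsimile, in pixels
    y: int            # top edge on the facsimile, in pixels
    width: int
    height: int
    lines: list[str] = field(default_factory=list)  # transcription, line by line

    def line_band(self, index: int) -> tuple[float, float]:
        """Approximate (top, bottom) of a transcribed line on the facsimile.

        Assumes all lines of the area have equal height, which is a
        simplification made for this sketch.
        """
        if not 0 <= index < len(self.lines):
            raise IndexError(f"area has no line {index}")
        step = self.height / len(self.lines)
        return (self.y + index * step, self.y + (index + 1) * step)


# Two independent areas of one page; their order in the list carries no meaning.
text_block = TranscriptionArea("text block", x=150, y=100, width=600, height=400,
                               lines=["first transcribed line",
                                      "second transcribed line"])
marginal_note = TranscriptionArea("marginal note", x=800, y=200, width=150,
                                  height=100, lines=["transcribed note"])
page_areas = [text_block, marginal_note]

for area in page_areas:
    for i, text in enumerate(area.lines):
        print(area.label, area.line_band(i), text)
```

Keeping the text of each area as a list of lines is what preserves the link between the encoded characters and their position on the facsimile.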
While the facsimile is conceived as an image (even though it offers textual information), the transcription area on the right side requires the encoding of the transcription as textual characters. By doing this area by area, the sequential relation between the areas can initially be left open. The characters within the areas have to be entered in a linear order, but such an order is not presupposed between the individual areas themselves.

[9] This is an example from the Narrenschiff, which often combines textual and pictorial information. The transcription method will soon be used in the Webern project to transcribe supplementary material such as letters, notes etc.

Properties can be assigned to the transcription text of each area. Similar to a word processor, the user is able to select text and to choose a property (like bold, underline etc.). Because SALSAH is designed as a generic and general system for the humanities, the available properties can be defined specifically for each project. These properties represent visual attributes of the transcribed text.

Furthermore, structural relations can be defined either within a single transcription area or between several such areas. In the current state of development, we are thinking of the two basic operations insertion and deletion, which could then be combined into a substitution or a transposition. In practice, it would be possible to express textual dynamics by defining structural relations resulting in alternative sequences of textual characters. For example, we could think of characters being overwritten by others: an initial set of characters would then have been substituted by others. Or we could think of additional text which could be considered an insertion.

The transcription of a facsimile as described above may offer more than one linear text. The sequential combination of transcription areas and the definition of structural relations [10] (deletions, insertions, substitutions, transpositions) allow for the building of multiple readings. A reading is thought of as an unambiguous sequence of characters representing a certain interpretation of the facsimile. Each reading may be annotated with a comment etc. by other researchers, and various readings could be interrelated in order to express their semantic difference. For example, several readings could represent different states in the genesis of a text. These different states are based on the analysis of corrections (insertions, deletions, substitutions, transpositions) present in the digital facsimile. Each reading built by using this transcription method is transparent because it can be traced back to the diplomatic transcription, which is directly related to the facsimile.

[10] Even a simple insertion offers two alternative sequences: a reading without it and another including it.

The described method is not a special tool offered by SALSAH but an integral application of its annotation and linkage possibilities. That way, the constitution of texts representing the content of documents can be seen as a task not fundamentally different from the constitution of research knowledge among sources within SALSAH as a VRE. Like any other form of knowledge within SALSAH, the process of transcribing and the constitution of readings can be reconstructed in their generation as well as criticized by annotation.

Bibliography

GABLER Hans Walter (2007), The Primacy of the Document in Editing, Ecdotica, vol. 4, pp. 197-207.
MARTENS Gunter (1982), Texte ohne Varianten? Überlegungen zur Bedeutung der Frankfurter Hölderlin-Ausgabe in der gegenwärtigen Situation der Editionsphilologie, Zeitschrift für deutsche Philologie, vol. 101, Sonderheft: Probleme neugermanistischer Edition, pp. 43-64.

SATTLER Dietrich E. (ed.) (1975-2008), Friedrich Hölderlin. Sämtliche Werke. Frankfurter Ausgabe, Frankfurt am Main, 20 vols plus supplements.
SCHWEIZER Tobias, ROSENTHALER Lukas (2011), SALSAH eine virtuelle Forschungsumgebung für die Geisteswissenschaften, in Konferenzband. EVA 2011 Berlin. Elektronische Medien & Kunst, Kultur, Historie, Berlin, pp. 147-153.
TOBLER Gustav (ed.) (1897-1901), Die Berner-Chronik des Diebold Schilling 1468-1484, 2 vols, Bern.
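As a supplement to the section on the topographical transcription method, the way structural relations give rise to multiple readings can be sketched in code. This is a simplified illustration under stated assumptions, not the method as implemented in SALSAH: the `Segment` class and the helper functions are hypothetical, each correction is treated as independent of the others, transpositions are left out, and the sample words are invented. The sketch enumerates every combination of applied and unapplied corrections; in practice an editor would select only the combinations that represent plausible states of the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Segment:
    """A stretch of transcribed text, possibly affected by a correction."""
    before: str  # text before the correction ("" for an insertion)
    after: str   # text after the correction ("" for a deletion)


def plain(text):
    return Segment(text, text)


def insertion(text):
    return Segment("", text)


def deletion(text):
    return Segment(text, "")


def substitution(old, new):
    # A deletion and an insertion combined at the same position.
    return Segment(old, new)


def readings(segments):
    """Enumerate all character sequences obtained by applying or not applying
    each correction independently (a simplifying assumption of this sketch)."""
    corrected = [i for i, s in enumerate(segments) if s.before != s.after]
    result = []
    for choice in product((False, True), repeat=len(corrected)):
        applied = dict(zip(corrected, choice))
        result.append("".join(s.after if applied.get(i, False) else s.before
                              for i, s in enumerate(segments)))
    return result


# Invented example: a single insertion yields two readings (cf. footnote 10).
line = [plain("the "), insertion("old "), plain("fool")]
print(readings(line))

# A substitution combined with a deletion yields four candidate readings.
line2 = [substitution("a", "the"), plain(" fool"), deletion(" laughs")]
print(readings(line2))
```

Each resulting string is an unambiguous sequence of characters, and because it is derived from the segments of the diplomatic transcription, it can be traced back to them.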