Mass Digitization of Manuscripts and Rare Books: Challenges and Experiences at the Bavarian State Library
Dr. Markus Brantl
Agenda
  1. The Bavarian State Library (BSB)
     1. Institute for Book and Manuscript Conservation (IBR)
     2. Munich Digitization Center (MDZ)
  2. Mass digitization and BSB's digitization strategy
  3. The digitization process of manuscripts and rare prints
  4. Project examples: robot scanners and hand-operated book scanners in 16th-century digitization projects
  5. Throughput
The Bavarian State Library (1)
  Founded in 1558
  European universal library and international research library of world renown
  Central regional and archival library of Bavaria (legal deposit since 1668)
  713 employees; annual budget: 48.2 million
  9.5 million volumes; 55,000 current periodicals
  Acquisitions per year: 140,000 volumes
  Open daily from 8 a.m. to midnight (112 hours per week)
  Visits to the General Reading Hall: 1.1 million (2009)
  Loans: 1.9 million (2009); document delivery: 400,000 (2009)
The Bavarian State Library (2)
  92,000 medieval manuscripts (No. 4 worldwide)
  20,000 incunabula (No. 1 worldwide)
  140,000 16th-century rare books (No. 1 in Germany)
Institute for Book and Manuscript Conservation (IBR)
  Founded in 1963
  Staff: 16
  Focus on preventive conservation
  Training of conservators: Bachelor's and Master's programs
The Munich Digitization Center (MDZ)
  National competence center for digitization technology and workflows
  More than 100 projects since 1997
  Mass digitization of 16th-century books with state-of-the-art technology (scan robots)
  Long-term preservation in cooperation with the Leibniz Supercomputing Centre (LRZ)
  Staff, including the Scanning Center: 45, mostly third-party funded
MDZ homepage: collection of Mss. graec. (Greek manuscripts) available
Mass digitization
  Production of more than a million pages? Volumes?
  Definition today: production of more than a million pages within a limited time, with different depths of indexing (shallow or deep)
BSB's strategy for mass digitization
  Objective: to digitize all copyright-free BSB library holdings (~1.2 million objects) and make them accessible free of charge on the Internet
How?
  Materials from the 6th-16th century (manuscripts, incunabula, special collections): third-party funds
  17th-19th century: public-private partnership
  20th-21st century: third-party funds
Current projects
  EU project "Europeana Regia": collaborative initiative between European libraries for the digitization of royal manuscripts in Carolingian and Renaissance Europe; funded by the EU
    BSB participates with 116 delicate manuscripts, ca. 42,000 pages
    Term: 2010-2012
  DFG-funded project "Digitization of the BSB Incunables"
    Ca. 9,000 titles, 1.8 million pages
    Term: 2008-2011
  DFG-funded project "VD16": BSB books printed in the 16th century
    37,000 titles; 7.5 million pages
    Term: 2007-2013
    Operated with ScanRobots
  In comparison: BSB's public-private partnership with Google
    More than 1 million books (in less than 10 years)
    More than 250 million pages
    Contract signed in February 2007
The digitization process of manuscripts and rare prints
  1. Preparation
  2. Image capture
  3. Workflow and indexing
  4. Storage and digital long-term preservation
  5. Access
Preparation: cooperation between the IBR and MDZ
  Conservational checks at the shelves
  Transport to the in-house Scanning Center
  Selection of scanners
  Training of scan staff in gentle handling of rare books
  Provision of tools
  Assistance by conservators in the scanning of sensitive and high-value books
Handling of the originals: manuscripts, incunabula, rare books, maps, and special materials, on different writing materials and in various sizes
Handling of delicate books: digitization of the Fugger Ehrenbuch. Two conservators and one scan operator.
Handling: materiality. Inflexible paper. Paper distortions. Book spine. Difference between the papermaking and printing processes; caused by the printing process or by tight stapling. Size and thickness. Stapling, gutter, back gluing.
Handling: the opening angle
  120° opening angle: spine complex stretched but tolerable
  Same book at a 180° opening angle: spine complex (endbands, sewing, spine lining) extremely overstretched, covering leather detached. Not allowed at BSB!
Handling: no glass plate
  70% of manuscripts and rare books cannot be opened at 180°. Result: reduction of throughput.
  No plane pressure
  No direct contact between the glass and the original
Lighting: as short as possible
  Cold fluorescent lamp, synchronized with the CCD line
  Flash with Pyrex dome
  LED
  No continuous lighting during reproduction: exposure damage is cumulative
BSB conservational requirements for the reproduction of manuscripts and rare books
  The scanning devices have to follow the book's requirements: a suitable opening angle for the book
  Short exposure time
  Different book cradles
MDZ scan facilities for manuscripts and rare books
Standard book scanner with 180° book cradle. Reproduction without glass plate.
Working without a glass plate, you need assistance: the "Munich Digit", invented by our conservators
Angle bracket from 90° up to 140°, with holder
Traverse support with 110° aperture angle
Foam wedge covered with acid-free cardboard
Special cradle: Grazer Camera Table
or the mobile version: Grazer Traveller
The ScanRobot cradle: very flexible, from 60° upward. Steplessly adjustable to the book's requirements. Self-centering cradle (the book's position in relation to the scanning head).
Image production parameters: manuscripts, rare books and special collections
  Color depth: 24 bit
  Resolution: 400 up to 600 ppi optical, always in relation to the size of the original document
  Digital master file: TIFF, uncompressed
  Media-neutral, with attached ICC profile (color management) of the scanning device
  Authentic, i.e. visible border around the page; color, grayscale and size target
  Image storage size: between 20 megabytes and 800 megabytes per image
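The relation between page size, resolution, color depth and file size can be sketched with a little arithmetic. This is a rough estimate only: it ignores the TIFF header, the embedded ICC profile and the extra border and targets around the page. The function name and the two page sizes are hypothetical and chosen purely for illustration.

```python
def uncompressed_size_mb(width_cm: float, height_cm: float,
                         ppi: int, bits_per_pixel: int = 24) -> float:
    """Approximate size of an uncompressed raster image in megabytes (10**6 bytes)."""
    cm_per_inch = 2.54
    # Convert the physical size to pixel dimensions at the given resolution.
    width_px = round(width_cm / cm_per_inch * ppi)
    height_px = round(height_cm / cm_per_inch * ppi)
    # 24-bit color means 3 bytes per pixel.
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return size_bytes / 1_000_000


# Hypothetical originals: a small book page and a large-format sheet.
small_page_mb = uncompressed_size_mb(15.0, 20.0, ppi=400)
large_sheet_mb = uncompressed_size_mb(60.0, 80.0, ppi=600)
print(f"small page:  {small_page_mb:.0f} MB")
print(f"large sheet: {large_sheet_mb:.0f} MB")
```

Under these assumptions the small page lands near the lower end of the stated range and the large sheet near the upper end, which suggests that the spread of 20 to 800 megabytes comes mainly from the size of the original.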
Scanning Output: Examples
Workflow and indexing: basic conditions
  ZEND = Zentrale Erfassungs- und NachweisDatenbank (central capture and record database)
  MySQL database, Apache Cocoon and Solr
  Mapping of the entire production process in a modular system
  Different service providers (scanning, text capture) can supply unlimited data to ZEND
  Workflow control: every BSB item that is to be digitized follows the ZEND workflow only
  Time and cost reduction through automation of standard processes
The Workflow with ZEND at a glance
ZEND-modules
Indexing: basic tasks
  All metadata in one XML framework: TEI P5
  Administrative metadata: job management
  Technical metadata: image information (ICC profiles, formats)
  Bibliographical metadata: data import from the catalog via Z39.50; allocation of a URN (catalog)
  Structural metadata: table of contents or full text
  Backbone of the production line: the unique, persistent identifier (ID) and standardized image names, e.g. bsb00001119_00001.tif
  Assignment of a URN (National Bibliography Number), e.g. urn:nbn:de:bvb:12-bsb00001119
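The naming scheme can be illustrated with a short sketch. The digit widths (8 digits for the object number, 5 for the page) and the URN prefix are inferred from the single example on the slide and are assumptions, as are all function names. Any check digit that a URN resolver might require is not handled here.

```python
import re

# Prefix copied from the slide's example URN; treated as a constant here.
URN_PREFIX = "urn:nbn:de:bvb:12-"

# Pattern inferred from the one example file name on the slide (assumption).
_IMAGE_NAME = re.compile(r"^(bsb\d{8})_(\d{5})\.tif$")


def object_id(number: int) -> str:
    """Build an identifier such as 'bsb00001119' from a running number."""
    return f"bsb{number:08d}"


def image_name(obj_id: str, page: int) -> str:
    """Build a standardized image file name for one page of an object."""
    return f"{obj_id}_{page:05d}.tif"


def urn_for(obj_id: str) -> str:
    """Build the URN in the form shown on the slide."""
    return URN_PREFIX + obj_id


def parse_image_name(name: str) -> tuple[str, int]:
    """Split an image file name back into (object id, page number)."""
    match = _IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a standardized image name: {name!r}")
    return match.group(1), int(match.group(2))


obj = object_id(1119)
print(image_name(obj, 1))
print(urn_for(obj))
```

Because every file name carries the object identifier, each image can be traced back to its object and its URN without a database lookup, which is presumably why the identifier is called the backbone of the production line.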
Example: TEI P5 XML-Data with OCR-text
Indexing example: from XML to HTML, with hit highlighting in the image
Data storage and long-term preservation: status and forecast (chart, 2004 to 2013; values in terabytes: 2, 10, 25, 50, 100, 190, 300, 500, 800, 1,200)
Leibniz Supercomputing Centre (LRZ): partner of the Bavarian State Library in storage hosting and digital long-term preservation
LRZ backup and archive system. Total capacity: 300,000 tapes, 146 petabytes
Access: different ways
  Internet: viewers (DFG-Viewer, MDZ-Viewer); PDF, download, printing; 3D; new: iPad, iPhone
  Gesture-based 3D presentation system for exhibitions
  Print
  Document delivery of high-resolution images
Direct access to the digital object via catalog, Google, WorldCat
and PDF download: the entire book on your PC. 1,000 PDF downloads per day.
3D viewer: selected treasures of BSB, see http://www.bayerische-landesbibliothek-online.de/3d (requires Java)
iPad/iPhone application: Famous Books of the Bavarian State Library
BSB-Explorer gesture-based 3D-exhibition system http://www.youtube.com/watch?v=qmmmmvnnxli
Digitization of 16th-century rare books in the VD16-1 and VD16-2 projects
  Goal: online publication of all 16th-century books that are unique to the BSB holdings
Project: VD16-1
  Unique books printed between 1500 and 1517
  Project duration: 2006-2008
  Production: 4,700 books in 24 months with 3 hand-operated scanners
  Reduced opening angle handled with a book cradle support: only one-sided scanning was possible
  1 scan (click) = 1 image
  3 process steps: 1. scanning all left pages; 2. scanning all right pages; 3. assembling left and right pages
  Problem: no pagination in 16th-century books! Higher error rate, reduction of throughput
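The third process step, assembling separately scanned left and right pages, can be sketched as a simple interleave. This is an illustration, not the production software: the function name, the assumption that a book starts with a right-hand page, and the count check are all hypothetical. Since the originals carry no page numbers, a mismatch in the number of images is the only error this sketch can detect automatically; a skipped page on both sides would go unnoticed.

```python
from itertools import chain


def assemble_pages(left_pages, right_pages, first_side="right"):
    """Interleave two one-sided scan series into reading order.

    first_side names the side of the very first image in the book
    (assumed to be 'right' unless stated otherwise).
    """
    if first_side not in ("left", "right"):
        raise ValueError("first_side must be 'left' or 'right'")
    if first_side == "right":
        first, second = right_pages, left_pages
    else:
        first, second = left_pages, right_pages
    # The series that starts the book may be at most one image longer.
    if len(first) - len(second) not in (0, 1):
        raise ValueError(
            f"page counts do not fit: {len(left_pages)} left, "
            f"{len(right_pages)} right"
        )
    merged = list(chain.from_iterable(zip(first, second)))
    if len(first) > len(second):
        merged.append(first[-1])  # unpaired final page
    return merged


left = ["L1", "L2"]
right = ["R1", "R2", "R3"]
print(assemble_pages(left, right))
```

The sketch also shows why one-sided scanning is error-prone: every book needs two passes plus a merge, and a single missed page shifts all following pairs.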
The way to the ScanRobot
  Objective: optimization of throughput for the scanning of early and rare prints with limited opening angle
  2006: market evaluation of automatic book scanners (hardware and software) and tender
  Since 2007: development partnership with Treventus for the scanning of 16th-century books (ongoing)
The use of the ScanRobot in the VD16-2 project
  Follow-up project: books printed 1518-1600
  Project phases: 2008-2009 and 2010-2012
  Production: 37,000 books, 7.5 million pages
  Book cradle with 60° opening angle, continuously adjustable
  Up-and-down movement of the scan unit, continuously adjustable
  Pages taken up gently by volume flow (no suction!)
  1 scan = 2 images
Throughput: our measurement base
  The entire base for the throughput calculation covers:
  1. Preprocessing: transport, creating a check form, selection of a suitable scanner
  2. Scan operation: creating a scan job, positioning of the book, scanning, target scanning, storage, data operations
  3. Postprocessing: quality control, complaints and rescanning, WWW delivery, return transport
  4. Long-term preservation: automated data transfer into the archiving system, quality control, deletion of the production files in the Scanning Center after successful long-term preservation
Scanning throughput at the MDZ Scanning Center
  Results of 2009, based on:
    the state of the art of our scanner equipment (among them 8 devices from 2005)
    BSB's strict conservational requirements: more than 70% of manuscripts and rare books cannot be opened at 180°
    the assumption that the real working time is 6 hours per day
  Manuscripts/rare books and difficult objects, hand-operated scanner: up to ca. 200 pages/day
  Manuscripts/rare books in normal condition, hand-operated scanner: ca. 380 pages/day
  Rare books with ScanRobot: ca. 1,000 pages/day
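To put these daily rates into perspective, the following sketch converts them into pages per hour and into scanner-days for a given page volume. The rates and the 6-hour working day come from the slide; the dictionary keys, the function names and the choice of the 7.5 million VD16 pages as the example volume are assumptions made for illustration. The result is a back-of-the-envelope figure for one device, not a project plan.

```python
import math

# Daily throughput figures from the slide (pages per scanner per day).
PAGES_PER_DAY = {
    "hand_operated_difficult": 200,
    "hand_operated_normal": 380,
    "scan_robot": 1000,
}
WORKING_HOURS_PER_DAY = 6  # "real working time" assumed on the slide


def pages_per_hour(scanner: str) -> float:
    """Average pages per working hour for one scanner type."""
    return PAGES_PER_DAY[scanner] / WORKING_HOURS_PER_DAY


def scanner_days(total_pages: int, scanner: str) -> int:
    """Working days one device of this type needs for total_pages."""
    return math.ceil(total_pages / PAGES_PER_DAY[scanner])


vd16_pages = 7_500_000  # page volume of the VD16 project, used as an example
for scanner in PAGES_PER_DAY:
    print(
        f"{scanner}: {pages_per_hour(scanner):.0f} pages/hour, "
        f"{scanner_days(vd16_pages, scanner)} scanner-days"
    )
```

Dividing the scanner-days by the number of devices and the working days per year gives a first estimate of project duration.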
What have we done so far?
  Target: 1.2 million copyright-free books
  Books available online (28 October 2010): 394,000
  Up to 1600: ~59,000 books
Contact: MDZ: brantl[at]bsb-muenchen.de IBR: irmhild.schaefer[at]bsb-muenchen.de All images: Copyright BSB