Image quality issues in digitization projects of historical documents




Istituto di Fisica Applicata Nello Carrara (CNR), Firenze, Italy - www.ifac.cnr.it
Image quality issues in digitization projects of historical documents
Franco Lotti
IAPR Int. Workshop on Document Analysis Systems, Florence, 8-10 September 2004

IFAC activity in Italian digitisation projects of ancient documents
- IMAGO-II projects (1998-2000). State Archives of Florence and Lucca (more than 100,000 parchment rolls of the Diplomatico fonds, VIII-XIV century).
  http://www.archiviodistato.firenze.it/progetti/attivite.htm
  http://www.comune.lucca.it/archiviostato/supp-inf.html
- Candido project (2000-2003). University Library of Pisa (about 85,000 images of manuscripts, drawings, printed books on paper and parchment, collections of letters of famous personalities, and periodicals from the XIV to the XIX century).
  http://www.cab.unipd.it/eventi/pisa.php3; http://www.pisa.sbn.it
- Datini project (2001-2004). State Archive of Prato (450,000 images of Francesco Datini's collection of private and trade letters, bookkeeping notes, management books, etc., XIV century).
  http://www.archiviodistato.prato.it
- Monumenti Nazionali project (2002-2004). State Archive of Frosinone (feasibility study for the digitisation and web consultation of parchments preserved by various medieval abbeys and monasteries in Central Italy).

Digitisation policies of the European Union
Lund, 4 April 2001: EC experts' meeting to establish coordination mechanisms for digitisation policies and programmes across Europe
- Lund Principles
- Lund Action Plan
European culture can be made freely accessible through the digitisation of cultural content. This will also support and promote cultural diversity in a global scenario.

Digitisation policies of the European Union
MINERVA PROJECT (coordinator: Italy): a network of Member States' Ministries
- to implement the Lund Action Plan;
- to discuss, correlate and harmonise activities in the digitisation of cultural and scientific content;
- to create agreed European common recommendations and guidance about:
  - digitisation
  - metadata
  - long-term accessibility
  - preservation
http://www.minervaeurope.org

Good practices in digitisation projects
THE FIRST STEPS
- Analysis of documents
- Safe document treatment and management
- Definition of users' requirements
- Quality of acquired images
- Efficient and fast retrieval
- Cost plan

Simplified layout of a digitisation project
[Diagram. Elements shown: Document; Image acquisition; Compression A; Compression B; Indexing and metadata; Master archive; LAN images DB; WEB images DB; Index DB; LAN server (intranet); WEB server (internet).]

Development of the project
- Acquisition methodology and hardware requirements
- Ongoing image quality assessment and tests
- Indexing: formatting and organisation of metadata
- Accessibility and dissemination strategy
- Maintenance plan

Good practices in digitisation projects
COSTS
- Image quality
- Hardware
- Image size
- Storage
- Operational times

Factors affecting image quality
- Document characteristics
- Sensor characteristics
- Optical and mechanical performance
- Illumination system
- Calibration procedures and stability
- Type of benches and document supports
- Preprocessing
- Compression

Choosing the right instrumentation
DOCUMENT CHARACTERISTICS:
- Material typology
- Binding type
- Status of conservation and fragility
- Light sensitivity
- Size and type of content
- B&W, grey or colour

Choosing the right instrumentation
SENSOR CHARACTERISTICS:
- Type of view (fixed-planetary, flat-bed, ...)
- Number of elements and geometry (linear, matrix)
- Type of scan (one shot, three shots)
- Sampling rate (ppi - pixels per inch)
- Bit depth (bpp - bits per pixel)
- Spatial resolution, MTF (cycles/mm)
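The sampling rate (ppi) and the spatial resolution (cycles/mm) in the list above are linked by the sampling theorem: one cycle needs at least two samples. The following minimal sketch converts a sampling rate into the corresponding Nyquist limit. The function name and the list of sampling rates are illustrative assumptions, and the actual resolution of a sensor is normally lower than this limit because of optics and mechanics.

```python
MM_PER_INCH = 25.4


def nyquist_cycles_per_mm(ppi: float) -> float:
    """Upper bound on the spatial frequency (cycles/mm) that a given
    sampling rate in pixels per inch can represent."""
    samples_per_mm = ppi / MM_PER_INCH
    return samples_per_mm / 2.0  # at least two samples per cycle


if __name__ == "__main__":
    # Hypothetical sampling rates, chosen only for illustration.
    for ppi in (150, 200, 300, 400):
        limit = nyquist_cycles_per_mm(ppi)
        print(f"{ppi} ppi -> Nyquist limit {limit:.2f} cycles/mm")
```

The MTF measured on test charts tells how much contrast actually survives at frequencies below this limit.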

Evaluation of the sensor performance
Test charts for spatial resolution
- Sine-wave patterns: to evaluate the modulation transfer function (MTF)
- Square-wave patterns: to evaluate the contrast transfer function (CTF)

Evaluation of the sensor performance
MTF and CTF
- Computing the Modulation Transfer Function (MTF) from the acquisition of sine-wave test charts is the correct way to evaluate the spatial resolution.
- Evaluating the Contrast Transfer Function (CTF) from the acquisition of square-wave test charts tends to over-estimate the quality of the sensor.
[Figure: MTF and CTF curves (*)]
(*) From: Handbook of Optics, Michael Bass Ed., Donnelly & Sons Publ, 1995, Vol. II, 15
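The over-estimation can be illustrated with Coltman's series, which expresses the square-wave response through the sine-wave response at the odd harmonics: CTF(f) = (4/pi) [MTF(f) - MTF(3f)/3 + MTF(5f)/5 - ...]. The sketch below is not taken from the slides. It assumes a hypothetical MTF that falls linearly to zero at a cutoff of 10 cycles/mm, and all names are illustrative.

```python
import math


def mtf_linear(f: float, cutoff: float = 10.0) -> float:
    """Hypothetical MTF: 1 at f = 0, falling linearly to 0 at the cutoff."""
    return max(0.0, 1.0 - f / cutoff)


def ctf_from_mtf(f: float, mtf=mtf_linear, max_harmonic: int = 199) -> float:
    """Square-wave response computed from the sine-wave response
    using Coltman's series over the odd harmonics of f."""
    total = 0.0
    sign = 1.0
    for k in range(1, max_harmonic + 1, 2):  # k = 1, 3, 5, ...
        total += sign * mtf(k * f) / k
        sign = -sign
    return 4.0 / math.pi * total


if __name__ == "__main__":
    for f in (1.0, 2.0, 5.0, 8.0):
        print(f"f = {f:4.1f} cycles/mm  MTF = {mtf_linear(f):.3f}  "
              f"CTF = {ctf_from_mtf(f):.3f}")
```

Above one third of the cutoff only the first term survives, so the CTF equals 4/pi times the MTF, roughly 27% higher. A resolution figure read off a square-wave chart is therefore more flattering than the MTF value at the same frequency.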

Evaluation of the sensor performance
Colour test charts

Choosing the right instrumentation
Light, UV fraction and temperature measurements

Image compression
A - Reversible (lossless) methods:
- Bitmap
- TIFF uncompressed
- TIFF LZW (Lempel-Ziv-Welch)
- PNG
- GIF
- JPEG 2000 - lossless
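What "reversible" means can be checked directly: decompressing must return the original data bit for bit. The sketch below uses DEFLATE from Python's standard `zlib` module (the algorithm family used inside PNG) as a stand-in; it is not the LZW coder of TIFF or GIF, and the pixel data are a made-up example.

```python
import zlib

# Hypothetical 8-bit grey-level scan line: light background with a dark stroke.
row = bytes([230] * 200 + [40] * 20 + [230] * 200)
raw = row * 100  # 100 identical lines, 42,000 bytes in total

packed = zlib.compress(raw, 9)      # 9 = strongest compression level
restored = zlib.decompress(packed)

if __name__ == "__main__":
    print(len(raw), "bytes ->", len(packed), "bytes")
    print("bit-exact after decompression:", restored == raw)
```

The compression ratio obtained on real scans is far lower than on such a regular pattern, since paper texture and sensor noise are hard to predict.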

Image compression
B - Lossy methods:
- JPEG
- JPEG 2000 - lossy

JPEG Compression 1:20 - artifacts
[Figure panels: 300 ppi; 150 ppi]
Consiglio Nazionale delle Ricerche, Istituto di Fisica Applicata Nello Carrara - Firenze

JPEG Compression 1:20 - artifacts
[Figure panels with 1 cm scale bar: 300 ppi; 150 ppi]

JPEG Compression 1:50 - artifacts

JPEG 2000 file format
- Compatibility with ISO standard
- Openness
- Interoperability (systems compliant with the standard)
- Non-proprietary
- Supports embedded metadata
- IPR and XML boxes for property and vendor information
- Scalability, both in quality and resolution
- Lossy and lossless decompression
- Progressive display
- Tiling of images (useful for large images)
- ROI (regions of interest)

Example: Comparison between JPEG and J2K
Datini Project (State Archive of Prato)
Francesco di Marco Datini - Private Correspondence
- Case A: Bit rate = 2.55 (CR = 9.4). Good quality: for LAN consultation (intranet)
- Case B: Bit rate = 0.46 (CR = 51.8). Low quality: for web dissemination (internet)
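The two figures quoted for each case are tied together: the bit rate is the bit depth of the uncompressed image divided by the compression ratio. The sketch below assumes that the bit rate is expressed in bits per pixel and that the originals are 24 bpp colour images. The slide states neither, but both assumptions reproduce its numbers.

```python
def bit_rate(compression_ratio: float, source_bpp: int = 24) -> float:
    """Bits per pixel left after compressing a source_bpp image by the given ratio."""
    return source_bpp / compression_ratio


if __name__ == "__main__":
    print(round(bit_rate(9.40), 2))   # Case A -> 2.55
    print(round(bit_rate(51.83), 2))  # Case B -> 0.46
```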

Example: Comparison between JPEG and J2K
[Figure panels: High-Q JPG (intranet); Low-Q JPG (internet); High-Q J2K; Low-Q J2K]

Example: Comparison between JPEG and J2K

                     Bit rate   CR      RMSE (*)   PE (*)   Q-index (**)
High quality JPEG    2.55        9.40    5.05       55      0.73
Low quality JPEG     0.46       51.83   10.93      187      0.33
High quality J2K     2.55        9.40    4.12       40      0.81
Low quality J2K      0.46       51.83    9.53      186      0.39

(*) Green channel
(**) After: Z. Wang and A. C. Bovik, IEEE Signal Proc. Letters, 9, n. 3, March 2002
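The three quality measures in the table can be sketched as follows. The slide does not define PE, so the code assumes it means the peak (maximum absolute) error. The Q-index follows the universal image quality index of Wang and Bovik, but is computed here once over the whole signal, whereas the published method averages it over sliding windows. The pixel values are invented.

```python
import math


def rmse(ref, test):
    """Root mean square error between two equally long pixel sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref))


def peak_error(ref, test):
    """Largest absolute pixel difference (assumed meaning of PE)."""
    return max(abs(a - b) for a, b in zip(ref, test))


def q_index(ref, test):
    """Global universal quality index: 1.0 means identical signals."""
    n = len(ref)
    mx, my = sum(ref) / n, sum(test) / n
    vx = sum((a - mx) ** 2 for a in ref) / (n - 1)
    vy = sum((b - my) ** 2 for b in test) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(ref, test)) / (n - 1)
    return 4 * cxy * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))


if __name__ == "__main__":
    original = [10, 50, 90, 130, 170, 210]   # hypothetical green-channel values
    decoded = [12, 47, 95, 128, 160, 215]    # the same pixels after lossy coding
    print("RMSE   :", round(rmse(original, decoded), 2))
    print("PE     :", peak_error(original, decoded))
    print("Q-index:", round(q_index(original, decoded), 4))
```

RMSE and PE are pure error magnitudes, while the Q-index also reflects loss of correlation and changes in mean level and contrast, which is why it is reported alongside them.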

Some examples
1 - Parchments
2 - Seals
3 - Printed paper
4 - Manuscripts
5 - Drawings

Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
State Archive of Florence:
- More than 140,000 parchment rolls
- 681 provenances
- VIII-XIX century
[Figure: A box of parchment rolls of the Diplomatico. State Archive of Florence]

Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
State Archive of Lucca:
- About 23,000 parchment rolls
- 70 provenances
- VIII-XIX century
[Figure: The parchment rolls of the Diplomatico]

Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
To ensure safe document handling, suitable sliding windows have been designed for the acquisition of parchment rolls on both sides (recto and verso).

Example 1 - Project: IMAGO II
Digitisation of parchment manuscripts
[Figure panels: Presentation 1:1; Enlargement 4:1]
- Sensor: matrix, three shots (no interpolation)
- Spatial sampling: 200 ppi
- Colour depth: 36 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, average CR ~ 10
- Access: intranet
The compressed images showed a barely perceptible loss of quality, even at enlargements greater than 3:1.
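The parameters above determine the storage needed per image, which is what connects image quality to cost. The sketch below is a rough estimate only. The document size (254 x 508 mm) is a hypothetical value chosen for round numbers, not a measurement from the project, and the function name is illustrative.

```python
MM_PER_INCH = 25.4


def uncompressed_bytes(width_mm: float, height_mm: float, ppi: int, bpp: int) -> int:
    """Size in bytes of an uncompressed scan of the given physical area."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px * height_px * bpp // 8


if __name__ == "__main__":
    # Slide parameters: 200 ppi, 24 bpp, average compression ratio of about 10.
    raw = uncompressed_bytes(254, 508, ppi=200, bpp=24)
    compressed = raw / 10
    print(f"uncompressed: {raw / 1e6:.1f} MB")
    print(f"JPEG, CR 10 : {compressed / 1e6:.1f} MB")
```

Doubling the sampling rate quadruples the pixel count, so the choice of ppi dominates the storage budget far more than the bit depth does.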

Example 2 - Project: IMAGO II
Digitisation of the Diplomatico fonds
ACQUISITION OF WAX SEALS
- Sensor: matrix, three shots (no interpolation)
- Spatial sampling: 400 ppi
- Colour depth: 36 bit/pixel, rescaled to 24 bpp
- Illumination: 2 flashes, asymmetric
- Compression: JPEG, average CR 10
- Access: intranet

Example 3 - CANDIDO Project
University Library of Pisa: Periodicals (XIX century)
- Sensor: three-linear scanner
- Spatial sampling: 200 ppi
- Colour depth: 24 bit/pixel
- Compression: JPEG, CR 10
- Access: intranet

Example 4 - CANDIDO Project
University Library of Pisa: Correspondence Rosselmini-Gualandi
- Sensor: scan back, three-linear array
- Spatial sampling: 300 ppi
- Colour depth: 48 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, CR 10
- Access: intranet
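Rescaling from 48 to 24 bit/pixel means reducing each of the three colour channels from 16 to 8 bits. The slides do not say how this was done; the sketch below assumes a plain linear mapping with rounding, and the function name is illustrative. Real workflows often apply a tone curve or colour profile at this step.

```python
def rescale_16_to_8(value: int) -> int:
    """Map a 16-bit channel value (0..65535) linearly onto 8 bits (0..255),
    rounding to the nearest level with integer arithmetic."""
    return (value * 255 + 32767) // 65535


if __name__ == "__main__":
    for v in (0, 257, 300, 32768, 65535):
        print(f"{v:5d} -> {rescale_16_to_8(v)}")
```

Since about 257 input levels collapse onto each output level, the step is not reversible, which is one reason to apply it only after acquisition and calibration are complete.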

Example 4 - CANDIDO Project
University Library of Pisa: Correspondence Rosselmini-Gualandi
[Three further slides with the same title and no additional text]

Example 5 - CANDIDO Project
University Library of Pisa: Rosellini - Drawings
- Sensor: scan back, three-linear array
- Spatial sampling: 300 ppi
- Colour depth: 48 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, CR 10
- Access: intranet

Conclusions
- Image quality depends on a number of factors related to the capture methods and to the subsequent processing steps.
- It is difficult to weigh all those factors in a rigorous way.
- Suitable test procedures are advisable, tuned to the specific targets.
Key points:
- Correct acquisition of masters
- Adaptive and progressive compression and smart, robust IRP methods
- Standards
- Sustainability

Thank you for your kind attention!