Image quality issues in digitization projects of historical documents
Franco Lotti
Istituto di Fisica Applicata "Nello Carrara" (CNR), Firenze, Italy - www.ifac.cnr.it
IAPR Int. Workshop on Document Analysis Systems, Florence, 8-10 September 2004
IFAC activity in Italian digitisation projects of ancient documents
- IMAGO II project (1998-2000). State Archives of Florence and Lucca (more than 100,000 parchment rolls of the Diplomatico fonds, VIII-XIV century). http://www.archiviodistato.firenze.it/progetti/attivite.htm http://www.comune.lucca.it/archiviostato/supp-inf.html
- Candido project (2000-2003). University Library of Pisa (about 85,000 images of manuscripts, drawings and printed books on paper and parchment, collections of letters of famous personalities, and periodicals from the XIV to the XIX century). http://www.cab.unipd.it/eventi/pisa.php3; http://www.pisa.sbn.it
- Datini project (2001-2004). State Archive of Prato (450,000 images of Francesco Datini's collection of private and trade letters, bookkeeping notes, management books, etc., XIV century). http://www.archiviodistato.prato.it
- Monumenti Nazionali project (2002-2004). State Archive of Frosinone (feasibility study for the digitisation and web consultation of parchments preserved by various medieval abbeys and monasteries in Central Italy).
Franco Lotti - 2
Digitisation policies of the European Union
- Lund, 4 April 2001: meeting of EC experts to establish coordination mechanisms for digitisation policies and programmes across Europe
- Lund Principles
- Lund Action Plan
- Digitisation of cultural content can make European culture freely accessible. It will also support and promote cultural diversity in a global context.
Digitisation policies of the European Union
MINERVA project (coordinator: Italy): a network of Member States' ministries
- to implement the Lund Action Plan;
- to discuss, correlate and harmonise digitisation activities for cultural and scientific content;
- to create agreed European common recommendations and guidelines on:
  - digitisation
  - metadata
  - long-term accessibility
  - preservation
http://www.minervaeurope.org
Good practices in digitisation projects
THE FIRST STEPS
- Analysis of documents
- Safe document treatment and management
- Definition of users' requirements
- Quality of acquired images
- Efficient and fast retrieval
- Cost plan
Simplified layout of a digitisation project
[Diagram: document → image acquisition → compression (A, B) and indexing (metadata); storage in a master archive, a LAN images DB, a WEB images DB and an index DB; a LAN server for INTRANET access and a WEB server for INTERNET access]
Development of the project
- Acquisition methodology and hardware requirements
- Ongoing image quality assessment and tests
- Indexing: formatting and organisation of metadata
- Accessibility and dissemination strategy
- Maintenance plan
Good practices in digitisation projects
COSTS
- Image quality
- Hardware
- Image size
- Storage
- Operational times
Factors affecting image quality
- Document characteristics
- Sensor characteristics
- Optical and mechanical performance
- Illumination system
- Calibration procedures and stability
- Type of benches and document supports
- Preprocessing
- Compression
Choosing the right instrumentation
DOCUMENT CHARACTERISTICS:
- Material type
- Binding type
- State of conservation and fragility
- Light sensitivity
- Size and type of content
- B&W, greyscale or colour
Choosing the right instrumentation
SENSOR CHARACTERISTICS:
- Type of view (fixed-planetary, flat-bed, ...)
- Number of elements and geometry (linear, matrix)
- Type of scan (one shot, three shots)
- Sampling rate (ppi - pixels per inch)
- Bit depth (bpp - bits per pixel)
- Spatial resolution, MTF (cycles/mm)
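The sampling rate and bit depth together fix the size of every master image, and therefore the image size and storage costs. Below is a minimal sketch. It assumes an uncompressed bitmap with no file header. The function names, the A4 page size and the CR of 10 used in the example are hypothetical and are not taken from the projects described here.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Pixel size of a width_mm x height_mm document sampled at `ppi`."""
    return (round(width_mm / MM_PER_INCH * ppi),
            round(height_mm / MM_PER_INCH * ppi))


def uncompressed_bytes(width_px: int, height_px: int, bpp: int) -> int:
    """Raw bitmap size: pixels x bits per pixel / 8 (file header ignored)."""
    return width_px * height_px * bpp // 8


def collection_bytes(n_images: int, bytes_per_image: float,
                     compression_ratio: float = 1.0) -> float:
    """Rough storage estimate for a whole collection at a given CR."""
    return n_images * bytes_per_image / compression_ratio


if __name__ == "__main__":
    # Hypothetical example: an A4 sheet (210 x 297 mm) at 300 ppi, 24 bpp.
    w, h = pixel_dimensions(210, 297, 300)      # (2480, 3508)
    raw = uncompressed_bytes(w, h, 24)          # 26,099,520 bytes, about 26 MB
    print(w, h, raw)
    # Halving the sampling rate roughly quarters the pixel count.
    print(pixel_dimensions(210, 297, 150))      # (1240, 1754)
    # 450,000 such images at an assumed CR of 10, in terabytes.
    print(collection_bytes(450_000, raw, 10) / 1e12)
```

Because storage grows with the square of the sampling rate, the choice between, for example, 150 and 300 ppi affects storage by a factor of four before any compression is applied.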
Evaluation of sensor performance
Test charts for spatial resolution:
- Sine-wave patterns: to evaluate the modulation transfer function (MTF)
- Square-wave patterns: to evaluate the contrast transfer function (CTF)
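With a sine-wave chart, the MTF at each frequency is the ratio between two modulations: the one measured in the image and the one of the chart pattern itself. Modulation is defined as (Imax - Imin)/(Imax + Imin). Below is a minimal sketch under two assumptions. First, the scanned profile has already been linearised, i.e. converted to values proportional to reflectance. Second, the chart's own modulation at each frequency is known from its calibration data. The function names are hypothetical.

```python
import math


def modulation(profile: list[float]) -> float:
    """Michelson modulation (Imax - Imin) / (Imax + Imin) of a sampled profile."""
    i_max, i_min = max(profile), min(profile)
    return (i_max - i_min) / (i_max + i_min)


def mtf(image_profile: list[float], target_modulation: float) -> float:
    """MTF at one frequency: image modulation divided by chart modulation."""
    return modulation(image_profile) / target_modulation


if __name__ == "__main__":
    # Synthetic profile across one sine-wave patch: mean 128, amplitude 64,
    # 16 samples per cycle (stands in for a real, averaged scan line).
    profile = [128 + 64 * math.sin(2 * math.pi * k / 16) for k in range(64)]
    print(modulation(profile))   # 0.5
    # If the chart patch itself has modulation 0.8, the MTF here is 0.625.
    print(mtf(profile, 0.8))
```

Repeating this for each patch of the chart gives MTF samples at the chart's frequencies. In practice the profile is usually averaged over several scan lines to reduce noise.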
Evaluation of sensor performance: MTF and CTF
Computing the modulation transfer function (MTF) from acquisitions of sine-wave test charts is the correct way to evaluate spatial resolution. Evaluating the contrast transfer function (CTF) from acquisitions of square-wave test charts tends to overestimate the quality of the sensor.
[Figure: MTF and CTF curves (*)]
(*) From: Handbook of Optics, M. Bass (Ed.), Donnelly & Sons Publ., 1995, Vol. II, 15.
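Coltman's relation makes the over-estimation explicit: it expresses the square-wave response in terms of the sine-wave response. It assumes a linear, shift-invariant system and a square-wave target of unit contrast.

```latex
\mathrm{CTF}(f) = \frac{4}{\pi}\left[\mathrm{MTF}(f) - \frac{\mathrm{MTF}(3f)}{3} + \frac{\mathrm{MTF}(5f)}{5} - \frac{\mathrm{MTF}(7f)}{7} + \cdots\right]
```

Near the limiting resolution of the sensor, MTF(3f) and the higher harmonics are practically zero. The relation then reduces to CTF(f) ≈ (4/π)·MTF(f) ≈ 1.27·MTF(f). So for a monotonically decreasing MTF, a square-wave chart can report a contrast up to about 27% higher than the true MTF at the same frequency.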
Evaluation of sensor performance
Colour test charts
Choosing the right instrumentation
Light, UV fraction and temperature measurements
Image compression
A - Reversible (lossless) methods:
- Bitmap
- TIFF uncompressed
- TIFF LZW (Lempel-Ziv-Welch)
- PNG
- GIF (limited to 256 colours)
- JPEG 2000 lossless
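LZW is used by TIFF LZW and, in a variant, by GIF. It is reversible because the decoder rebuilds exactly the same string dictionary that the encoder built, so the dictionary never needs to be stored. Below is a minimal sketch of the core algorithm that emits plain integer codes. It omits what a real TIFF codec adds: 9- to 12-bit variable-width codes, the Clear and EndOfInformation codes, and dictionary resets. It is therefore an illustration, not a TIFF-compatible implementation.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Encode bytes as LZW codes, growing the dictionary as strings repeat."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                           # keep extending the current match
        else:
            codes.append(dictionary[w])      # emit longest known string
            dictionary[wc] = len(dictionary) # learn the new string
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the original bytes; the dictionary is reconstructed on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]                # code defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(sample)
    print(len(sample), "bytes ->", len(codes), "codes")
    print(lzw_decode(codes) == sample)       # True: the round trip is lossless
```

The gain comes from repetition in the data. Flat backgrounds and repeated patterns compress well, while the sensor noise in a scanned parchment limits the achievable CR.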
Image compression
B - Lossy methods:
- JPEG
- JPEG 2000 lossy
JPEG compression 1:20 - artifacts
[Figure: 300 ppi and 150 ppi]
Consiglio Nazionale delle Ricerche - Istituto di Fisica Applicata "Nello Carrara" - Firenze
JPEG compression 1:20 - artifacts
[Figure: 300 ppi and 150 ppi, with 1 cm scale bar]
JPEG compression 1:50 - artifacts
JPEG 2000 file format
- ISO standard (ISO/IEC 15444)
- Openness
- Interoperability (systems compliant with the standard)
- Non-proprietary
- Support for embedded metadata: IPR and XML boxes for intellectual property and vendor information
- Scalability, both in quality and in resolution
- Lossy and lossless compression
- Progressive display
- Tiling of images (useful for large images)
- ROI (regions of interest)
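Resolution scalability in JPEG 2000 comes from the wavelet decomposition. Each decomposition level halves the image dimensions, so a viewer can decode a reduced-resolution image from part of the code-stream. Tiling splits the image into independently coded rectangles. Below is a minimal sketch of the resulting sizes. It assumes that the image and the tile grid both start at (0, 0); the standard also allows offsets, which are ignored here. The function names and example dimensions are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def resolution_sizes(width: int, height: int, levels: int) -> list[tuple[int, int]]:
    """Image size at reduction d = 0..levels (d = 0 is full resolution)."""
    return [(ceil_div(width, 2 ** d), ceil_div(height, 2 ** d))
            for d in range(levels + 1)]


def tile_grid(width: int, height: int, tile_w: int, tile_h: int) -> tuple[int, int]:
    """Number of tile columns and rows needed to cover the image."""
    return ceil_div(width, tile_w), ceil_div(height, tile_h)


if __name__ == "__main__":
    # Hypothetical 2480 x 3508 master coded with 5 decomposition levels.
    for d, size in enumerate(resolution_sizes(2480, 3508, 5)):
        print(d, size)                         # d = 5 gives (78, 110), a thumbnail
    print(tile_grid(2480, 3508, 1024, 1024))   # (3, 4)
```

This lets a single master file serve both a thumbnail and a full-resolution view, instead of keeping separate derivative files for each use.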
Example: Comparison between JPEG and J2K
Datini Project (State Archive of Prato)
[Image: Francesco di Marco Datini, private correspondence]
- Case A: bit rate = 2.55 bpp (CR = 9.4). Good quality, for LAN consultation (intranet)
- Case B: bit rate = 0.46 bpp (CR = 51.8). Low quality, for web dissemination (internet)
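Bit rate and compression ratio (CR) express the same quantity. Assume the uncompressed source has 24 bits per pixel (8 bits per RGB channel), which is consistent with the figures quoted for both cases. Then CR = 24 / bit rate. Below is a minimal sketch; the function names are illustrative.

```python
def compression_ratio(bit_rate_bpp: float, source_bpp: float = 24) -> float:
    """CR = uncompressed bits per pixel / compressed bits per pixel."""
    return source_bpp / bit_rate_bpp


def bit_rate(cr: float, source_bpp: float = 24) -> float:
    """Compressed bits per pixel for a given compression ratio."""
    return source_bpp / cr


def compressed_size_bytes(width_px: int, height_px: int, bit_rate_bpp: float) -> float:
    """Approximate compressed file size, ignoring headers and metadata."""
    return width_px * height_px * bit_rate_bpp / 8


if __name__ == "__main__":
    print(round(compression_ratio(2.55), 1))   # Case A: 9.4
    print(round(bit_rate(51.8), 2))            # Case B: 0.46
```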
Example: Comparison between JPEG and J2K
[Figure panels: high-Q JPEG (intranet), low-Q JPEG (internet), high-Q J2K, low-Q J2K]
Example: Comparison between JPEG and J2K
                     Bit rate (bpp)   CR      RMSE (*)   PE (*)   Q-index (**)
High quality JPEG    2.55             9.40    5.05       55       0.73
Low quality JPEG     0.46             51.83   10.93      187      0.33
High quality J2K     2.55             9.40    4.12       40       0.81
Low quality J2K      0.46             51.83   9.53       186      0.39
(*) Green channel
(**) After: Z. Wang and A. C. Bovik, "A universal image quality index", IEEE Signal Processing Letters, vol. 9, no. 3, March 2002.
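The three figures of merit in the table can be computed from a reference (uncompressed) channel and its decompressed version:
- RMSE;
- peak error (PE), assumed here to be the largest absolute pixel difference;
- the Wang-Bovik quality index Q = 4 σxy μx μy / ((σx² + σy²)(μx² + μy²)), which reaches 1 only when the two images are identical.

Below is a minimal sketch on flat lists of pixel values. Wang and Bovik compute Q over a small sliding window and average the local values. This sketch computes a single global Q for brevity, so its values are not comparable with the table.

```python
import math


def _check(ref: list[float], test: list[float]) -> None:
    if len(ref) != len(test) or len(ref) < 2:
        raise ValueError("need two equal-length sequences of at least 2 pixels")


def rmse(ref: list[float], test: list[float]) -> float:
    """Root-mean-square error between two pixel sequences."""
    _check(ref, test)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref))


def peak_error(ref: list[float], test: list[float]) -> float:
    """Largest absolute pixel difference."""
    _check(ref, test)
    return max(abs(a - b) for a, b in zip(ref, test))


def q_index(ref: list[float], test: list[float]) -> float:
    """Global (single-window) universal quality index, in [-1, 1]."""
    _check(ref, test)
    n = len(ref)
    mx, my = sum(ref) / n, sum(test) / n
    vx = sum((a - mx) ** 2 for a in ref) / (n - 1)
    vy = sum((b - my) ** 2 for b in test) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(ref, test)) / (n - 1)
    denom = (vx + vy) * (mx ** 2 + my ** 2)
    if denom == 0:
        raise ValueError("Q is undefined when both inputs are constant or zero-mean")
    return 4 * cxy * mx * my / denom


if __name__ == "__main__":
    ref = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical green-channel values
    dec = [50, 56, 60, 68, 69, 60, 66, 71]   # after lossy compression
    print(rmse(ref, dec), peak_error(ref, dec), q_index(ref, dec))
```

RMSE and PE measure the size of pixel errors. Q also accounts for loss of correlation, luminance shift and contrast change, which is why it separates the JPEG and J2K results more clearly than RMSE does at equal bit rate.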
Some examples
1 - Parchments
2 - Seals
3 - Printed paper
4 - Manuscripts
5 - Drawings
Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
[Image: a box of parchment rolls of the Diplomatico, State Archive of Florence]
State Archive of Florence: more than 140,000 parchment rolls, 681 provenances, VIII-XIX century
Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
State Archive of Lucca: about 23,000 parchment rolls, 70 provenances, VIII-XIX century
[Image: the parchment rolls of the Diplomatico]
Example 1 - Project: IMAGO II
Digitisation of the Diplomatico fonds
To ensure safe document handling, purpose-designed sliding windows were used to acquire both sides (recto and verso) of the parchment rolls.
Example 1 - Project: IMAGO II
Digitisation of parchment manuscripts
[Figure panels: presentation 1:1; enlargement 4:1]
- Sensor: matrix, three shots (no interpolation)
- Spatial sampling: 200 ppi
- Colour depth: 36 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, average CR ~10
- Access: intranet
The loss of quality in the compressed images was barely noticeable, even at enlargements greater than 3:1.
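Rescaling 36 bit/pixel (12 bits per RGB channel) to 24 bpp (8 bits per channel) maps each channel value from the range 0-4095 to 0-255. The 48 bit/pixel CANDIDO captures map 0-65535 to 0-255 in the same way. The slides do not say how this step was performed. Below is a minimal sketch that assumes a plain linear mapping with rounding. Production workflows often apply a tone curve or a colour-managed conversion before discarding bits.

```python
def rescale_channel(value: int, in_bits: int, out_bits: int = 8) -> int:
    """Linearly map an in_bits channel value to out_bits, rounding to nearest."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    if not 0 <= value <= in_max:
        raise ValueError(f"value {value} out of range for {in_bits} bits")
    # Integer arithmetic: adding in_max // 2 before dividing rounds the result.
    return (value * out_max + in_max // 2) // in_max


def rescale_pixel(rgb: tuple[int, int, int], in_bits: int) -> tuple[int, int, int]:
    """Rescale one RGB pixel, e.g. 3 x 12 bits (36 bpp) to 3 x 8 bits (24 bpp)."""
    r, g, b = (rescale_channel(c, in_bits) for c in rgb)
    return r, g, b


if __name__ == "__main__":
    print(rescale_pixel((4095, 2048, 0), in_bits=12))     # (255, 128, 0)
    print(rescale_pixel((65535, 32768, 0), in_bits=16))   # (255, 128, 0)
```

The extra bits captured by the sensor are still useful even if they are discarded: they leave headroom for the tone and colour adjustments made before the reduction to 8 bits per channel.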
Example 2 - Project: IMAGO II
Digitisation of the Diplomatico fonds
ACQUISITION OF WAX SEALS
- Sensor: matrix, three shots (no interpolation)
- Spatial sampling: 400 ppi
- Colour depth: 36 bit/pixel, rescaled to 24 bpp
- Illumination: 2 flashes, asymmetric
- Compression: JPEG, average CR 10
- Access: intranet
Example 3 - CANDIDO Project
University Library of Pisa: periodicals (XIX century)
- Sensor: three-linear scanner
- Spatial sampling: 200 ppi
- Colour depth: 24 bit/pixel
- Compression: JPEG, CR 10
- Access: intranet
Example 4 - CANDIDO Project
University Library of Pisa: Rosselmini-Gualandi correspondence
- Sensor: scan back, three-linear array
- Spatial sampling: 300 ppi
- Colour depth: 48 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, CR 10
- Access: intranet
Example 5 - CANDIDO Project
University Library of Pisa: Rosellini drawings
- Sensor: scan back, three-linear array
- Spatial sampling: 300 ppi
- Colour depth: 48 bit/pixel, rescaled to 24 bpp
- Compression: JPEG, CR 10
- Access: intranet
Conclusions
- Image quality depends on a number of factors related to the capture methods and to the subsequent processing steps.
- It is difficult to weigh all these factors rigorously.
- Suitable test procedures, tuned to the specific targets, are advisable.
Key points:
- Correct acquisition of masters
- Adaptive and progressive compression, and smart, robust IRP methods
- Standards
- Sustainability
Thank you for your kind attention!