Determining the Resolution of Scanned Document Images



Similar documents
Assessment of Camera Phone Distortion and Implications for Watermarking

Visual Structure Analysis of Flow Charts in Patent Images

SIGNATURE VERIFICATION

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Face detection is a process of localizing and extracting the face region from the

OCR and 2D DataMatrix Specification for:

ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan

Poker Vision: Playing Cards and Chips Identification based on Image Processing

ACE: Illustrator CC Exam Guide

SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS. Nitin Khanna and Edward J. Delp

Segmentation and Automatic Descreening of Scanned Documents

Personal Identity Verification (PIV) IMAGE QUALITY SPECIFICATIONS FOR SINGLE FINGER CAPTURE DEVICES

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

The accurate calibration of all detectors is crucial for the subsequent data

How To Filter Spam Image From A Picture By Color Or Color

Designing forms for auto field detection in Adobe Acrobat

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

A Method of Caption Detection in News Video

Galaxy Morphological Classification

Samsung Rendering Engine for Clean Pages (ReCP) Printer technology that delivers professional-quality prints for businesses

Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Get the Best Digital Images Possible. What s it all about anyway?

Guide To Creating Academic Posters Using Microsoft PowerPoint 2010

Analecta Vol. 8, No. 2 ISSN

Fluid structure interaction of a vibrating circular plate in a bounded fluid volume: simulation and experiment

Holes & Selective Laser Sintering

Visibility optimization for data visualization: A Survey of Issues and Techniques

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

DIGITAL IMAGE PROCESSING AND ANALYSIS

ELFRING FONTS INC. MICR FONTS FOR WINDOWS

Form Design Guidelines Part of the Best-Practices Handbook. Form Design Guidelines

Palmprint Recognition. By Sree Rama Murthy kora Praveen Verma Yashwant Kashyap

Open Access A Facial Expression Recognition Algorithm Based on Local Binary Pattern and Empirical Mode Decomposition

Detection and Restoration of Vertical Non-linear Scratches in Digitized Film Sequences

Nonlinear Iterative Partial Least Squares Method

Automatic License Plate Recognition using Python and OpenCV

The Wondrous World of fmri statistics

Novel Automatic PCB Inspection Technique Based on Connectivity

Technical Bulletin. Threshold and Analysis of Small Particles on the BD Accuri C6 Flow Cytometer

MEASURES OF VARIATION

Uses of Derivative Spectroscopy

3-D Object recognition from point clouds

The Scientific Data Mining Process

Research on Chinese financial invoice recognition technology

An Iterative Image Registration Technique with an Application to Stereo Vision

Document Image Retrieval using Signatures as Queries

Module 13 : Measurements on Fiber Optic Systems

Morphological segmentation of histology cell images

Copyright 2007 Casa Software Ltd. ToF Mass Calibration

Spectral Line II. G ij (t) are calibrated as in chapter 5. To calibrated B ij (ν), observe a bright source that is known to be spectrally flat

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Understanding CIC Compensation Filters

What Resolution Should Your Images Be?

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Vision based Vehicle Tracking using a high angle camera

A Dynamic Approach to Extract Texts and Captions from Videos

Face Model Fitting on Low Resolution Images

Automatic Extraction of Signatures from Bank Cheques and other Documents

COMPARISON OF OBJECT BASED AND PIXEL BASED CLASSIFICATION OF HIGH RESOLUTION SATELLITE IMAGES USING ARTIFICIAL NEURAL NETWORKS

Elfring Fonts, Inc. PCL MICR Fonts

Predictive Indicators for Effective Trading Strategies By John Ehlers

Combining an Alternating Sequential Filter (ASF) and Curvelet for Denoising Coronal MRI Images

Lesson 4 Measures of Central Tendency

Optical Character Recognition (OCR)

Copyright 2000 IEEE. Reprinted from IEEE MTT-S International Microwave Symposium 2000

Summarizing and Displaying Categorical Data

MestRe-C User Guide Megan Bragg 04/14/05

Creating Interactive PDF Forms

Super-resolution method based on edge feature for high resolution imaging

SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY

A Lightweight and Effective Music Score Recognition on Mobile Phone

Remote Sensing of Clouds from Polarization

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

LOCAL SURFACE PATCH BASED TIME ATTENDANCE SYSTEM USING FACE.

Labs in Bologna & Potenza Menzel. Lab 3 Interrogating AIRS Data and Exploring Spectral Properties of Clouds and Moisture

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Template-based Eye and Mouth Detection for 3D Video Conferencing

Organize your project in a way that identifies the research questions and methodology you will use.

Tips for optimizing your publications for commercial printing

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Drawing a histogram using Excel

Figure 1. An embedded chart on a worksheet.

Received in revised form 24 March 2004; accepted 30 March 2004

The Array Scanner as Microdensitometer Surrogate: A Deal with the Devil or... a Great Deal?

The Effect of Network Cabling on Bit Error Rate Performance. By Paul Kish NORDX/CDT

A. Scan to PDF Instructions

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

White Paper: Microcells A Solution to the Data Traffic Growth in 3G Networks?

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

Analysis of Wing Leading Edge Data. Upender K. Kaul Intelligent Systems Division NASA Ames Research Center Moffett Field, CA 94035

SAMPLE MIDTERM QUESTIONS

Document Image Processing - A Review

Transcription:

Presented at IS&T/SPIE EI 99, Conference 3651, Document Recognition and Retrieval VI Jan 26-28, 1999, San Jose, CA. Determining the Resolution of Scanned Document Images Dan S. Bloomberg Xerox Palo Alto Research Center Palo Alto, CA 9434 Abstract Given the existence of digital scanners, printers and fax machines, documents can undergo a history of sequential reproductions. One of the most important determiners of the quality of the resulting image is the set of underlying resolutions at which the images were scanned and binarized. In particular, a low resolution scan produces a noticeable degradation of image quality, and produces a set of printed fonts that cause omnifont OCR systems to operate with a relatively high error rate. This error rate can be reduced if the OCR system is trained on text having fax scanner degradations, but this also requires that the OCR system can determine in advance if such degraded fonts are present. Methods are found for determining the lowest resolution that a binary document image has been scanned (or printed) in the past. Observing that the signature for a low resolution scan is contained in the boundary contours of the black components, the connected components that constitute the contours are measured, and both the size histogram and its power spectrum are used to determine the underlying low resolution, if any. The primary application is to fax, both standard and fine. Degradation of both font appearance and OCR performance is far more severe for standard than fine fax, so the most important practical problem is to identify documents of binary scanned text that have a standard fax signature. Only a small part of the page image is required to determine the existence of such a signature; consequently, the image is pre-processed to locate a subregion that is well suited for analysis. Results for discriminating between originals, standard fax and fine fax on a small but diverse test suite are given. Keywords: image resolution, facsimile, fax, pattern spectrum, power spectrum, image analysis 1 Introduction The purpose of this study is to find accurate and efficient methods for determining if a digital image has been previously scanned at low resolution, measured in pixels/inch (ppi). The most 1

common fax resolutions are standard, approximately 2 x 1 ppi, and fine, approximately 2 x 2 ppi. Images scanned at standard fax resolution suffer obvious visible degradation. This is also reflected by the generally poor performance of omni-font OCR systems on such images [3]. For example, in the OCR bakeoff from the 1996 SDAIR Conference, the character error rate of the two best recognizers on standard fax was nearly 6 percent; the others were more than 1 percent. This should be compared with best character error rates of less than 2 percent on 3 ppi binary, and less than 3 percent on 2 ppi binary and fine fax. OCR performance can be improved using special recognizers trained on low-resolution fonts, but their use requires prior determination of the effective resolution of the image. If the fax is stored as an electronic image, it is easy to infer the scan resolution from the image size. But if the fax is printed and then scanned at high resolution, with or without prior copying, the inherent low resolution of the image contents must be inferred from the image itself. Our approach is morphological and relies on the statistics that can be accumulated from a region of the image. The methods have general validity, but we apply them for the common situations where the low resolution scan is fax (either about 1 ppi or 2 ppi) and the high resolution scan is either 3 ppi or 4 ppi. For the standard fax identification problem, it is necessary to extract a set of measurements from the image for which a discriminator can unambiguously place the image into one of two categories: high resolution or low resolution. The approach is empirical. Histograms constructed from data extracated by various measurements are analyzed for structural or periodic characteristics. Where low and high resolution images show significant differences, the feasability of building a discriminator is investigated. It is reasonable to vary the discrimination function depending on the final scan resolution, which is typically known or can be inferred from the image size. Fax documents are typically scanned in standard mode and in fine mode. In the sequel, although all images we study have been most recently scanned at high resolution (3 or 4 ppi), for simplicity we refer to an image that was never scanned at low resolution as an original, an image previously scanned at standard fax resolution as a standard-fax, and an image previously scanned at fine fax resolution, which is about 2 x 2 ppi, as a fine-fax. Naturally, it should be easier to discriminate a standard-fax image from an original than a fine-fax image. Fortunately, as a practical matter, it is far more important to identify standard-fax images, because omni-font OCR systems do fairly well on fine-fax. 2 Approach Visually, the difference in appearance between standard fax and original images is striking. Three significant components of the difference are: Thickening introduced by the low resolution fax scanner. A tendency for larger parts of the fax boundaries to be horizontal and vertical. 2

The original boundaries are smoother. The thickness of strokes varies greatly between scanned documents, and is not a robust parameter for discriminating between original and fax images. The other two observations relate to the boundary characteristics, and much of the signature of a low resolution scan is expected to reside in the edge pixels. Thus, the focus of investigation will be on the statistics of boundary pixels, and in particular, functions еof the lengthðof runs or connected components of the boundary. The most important operations for generating еare expected to be image morphology, granulometries, and connected component labelling. There are a number of questions to consider, and the final arbiter is empirical: 1. Should the images be filtered morphologically for noise before extracting the edge pixels? 2. Should all edge pixels be considered together, or only horizontal or vertical edge pixels separately? 3. For standard fax, because the lowest resolution is in the vertical direction, should we concentrate exclusively on vertical edge pixels? 4. What functions еshould be derived from the edge pixels? 5. Of what use is the power spectrum of these functions? 6. How are the measurements combined to form robust and size-invariant discriminators? Regarding the methodology, rather than using a model-based theoretical approach, where you try to guess the salient features in detail a priori and build a specific set of operations to extract exactly those features, we use an iterative method where you start with a guess about which features might be salient and take measurements where differences are expected to arise. Results are noted, and the extraction method is varied in an attempt to improve the results. Through such iterations, a subspace of methods and parameters is investigated. We show a few of the parameters that were varied, along with the best methods found. Aiming for saliency on the low-resolution vertical scan for standard fax, we begin by extracting the vertical edge pixels½. The image is eroded with a horizontal structuring element of length 3, with the center at the center of the structuring element, and the result is XORed back with the uneroded image. Fig. 1 shows the vertical edge pixels extracted from the original, fine fax, and standard fax images. Possibilities for еinclude the number of runlengths of each size,ö е, perhaps weighted by a power of the length of the runs, or the number of connected components at each size, е, again optionally weighted. In Fig. 1, the original appears to have many more isolated single pixels, and far fewer short vertical 8-connected components (8-cc), than the standard fax; the fine fax ½If we use all boundary pixels, the 8-connected components of the boundary are just the size of the connected components in the image. 3

Figure 1: Vertical edge pixels: top (original); middle (fine fax); lower (standard fax) 4

is somewhere in between. For these boundary images, the number of vertical runlengthsöú е is equal to the number of 4-connected components (4-cc) of that size, е. Further, the pattern spectrum[4], which is the number of pixels within runs of a given length, is just the first moment ÐÖÚ Ðµof the runlength histogram,öú е. The histogram of 8-cc cannot be derived simply from vertical runlengths. In general, we measure the distribution of -connected component heights, both 4-cc and 8-cc, and with different momentsñfor the weighting factors; namely, е ÐÑ Ðµ (1) 2.1 Typical component height data Fig. 2 shows the distribution of 4-cc heights, е е, from a typical original and standard fax, both taken from images sampled at 4 ppi. There is relatively little structural difference between the two. Both histograms are dominated by components with very low counts, as is evident from Fig. 1. 8 edge 4-c.c., m =, original height histogram 5 edge 4-c.c., m =, standard fax height histogram 7 6 5 4 3 2 1 45 4 35 3 25 2 15 1 5 5 1 15 2 25 3 35 5 1 15 2 25 3 35 Figure 2: Counts of 4-cc heights, as a function of height, from original and standard fax images, at 4 ppi. Fig. 3 shows the first moment of the distribution of 4-cc heights, е Ре, from a typical original and standard fax. Except forð ½andÐ ¾, the counts are significantly larger for standard fax, and this data may be useful for discrimination. Fig. 4 shows the distribution of 8-cc heights, е е, from a typical original and standard fax. The differences are striking. Most of the original counts are inð ½; the signal forð ½is very small. For the fax, a significant part of the signal is inð ½, and the underlying periodicity of the low-resolution scan at multiples of 4 pixels is evident. Fig. 5 shows the first moment of the distribution of 8-cc heights, е Ре, from a typical original and standard fax. Use of the first moment spreads the data in the original image beyone Ð ½, and the broad peak between 2 and 3 is due to components extending to the x-height of the text. (See Fig. 1). The periodicity of the fax at higher multiples of 4 is clearly evident, accentuated by the weighting factor. 5

8 edge 4-c.c., m = 1, original height histogram 5 edge 4-c.c., m = 1, standard fax height histogram 7 6 5 4 3 2 1 45 4 35 3 25 2 15 1 5 5 1 15 2 25 3 35 5 1 15 2 25 3 35 Figure 3: Counts of the first moment of 4-cc heights, as a function of height, from original and standard fax images, at 4 ppi. 25 edge 8-c.c., m =, original height histogram 9 edge 8-c.c., m =, standard fax height histogram 8 2 15 1 5 7 6 5 4 3 2 1 5 1 15 2 25 3 35 5 1 15 2 25 3 35 Figure 4: Counts of 8-cc heights, as a function of height, from original and standard fax images, at 4 ppi. 25 edge 8-c.c., m = 1, original height histogram 18 edge 8-c.c., m = 1, standard fax height histogram 16 2 15 1 5 14 12 1 8 6 4 2 5 1 15 2 25 3 35 5 1 15 2 25 3 35 Figure 5: Counts of the first moment of 8-cc heights, as a function of height, from original and standard fax images, at 4 ppi. 6

Finally, in Fig. 6 we show comparable 8-cc histograms from original and standard fax images that have been rescanned at 3 ppi. The periodicity of the low resolution scan, this time in units of 3 pixels, is evident. 14 edge 8-c.c., m =, 3 ppi, original height histogram 6 edge 8-c.c., m =, 3 ppi, standard fax height histogram 12 5 1 8 6 4 4 3 2 2 1 5 1 15 2 25 3 35 5 1 15 2 25 3 35 Figure 6: Counts of 8-cc heights, as a function of height, from original and standard fax images, at 3 ppi. Because the differences between original and fax are much larger for 8-cc than 4-cc, we use them exclusively. Pre-filtering with small vertical openings and closings can be done to remove small pixel noise on the vertical edges, before the edges are extracted from the image. However, it was observed that such filtering reduced the differences between original and fax histograms, for both 4-cc and 8-cc; consequently, it is not used. 2.2 Discrimination functions and power spectra È Ðµ combinations of the е, or numbers, ¼¼ ¾µ È Ð Ðµ By inspection, from Fig. 4 for 4 ppi, we ¼¼ È Ð ¾ е choose and from Fig. 6 for 3 ppi, The discrimination functions formed from these features must be invariant with respect to the size of the image and the number of runs or connected components in the image. The score, which is to be compared with a threshold, can be constructed by taking the ratio of two linear Ð µ ½µ (2) ½µ (3) (4) more simply by dividing a linear combination by any one of the 7

Note that these discrimination functions do not explicitly use the low-resolution periodic structure. Where such structure exists, discriminators that are sensitive to the specific wavelengths can be constructed. To emphasize the periodicity, it is useful to take the power spectrum of the histogram data, and use appropriate terms to form discriminating functions. Because the data is real, the power spectrum is symmetric about the center. 1.8e+7 8-cc, m =, Original power spectrum of height histogram 7e+7 8-cc, m = 1, Original power spectrum of height histogram spectral power 1.6e+7 1.4e+7 1.2e+7 1e+7 8e+6 6e+6 4e+6 2e+6 2 4 6 8 1 12 14 16 18 inverse / wavelength spectral power 6e+7 5e+7 4e+7 3e+7 2e+7 1e+7 2 4 6 8 1 12 14 16 18 inverse / wavelength Figure 7: Power spectra of the zeroth and first moments of 8-cc heights for an original at 4 ppi, as a function of inverse distance. 9e+6 8-cc, m =, Standard fax power spectrum of height histogram 7e+7 8-cc, m = 1, Standard fax power spectrum of height histogram spectral power 8e+6 7e+6 6e+6 5e+6 4e+6 3e+6 2e+6 1e+6 spectral power 6e+7 5e+7 4e+7 3e+7 2e+7 1e+7 2 4 6 8 1 12 14 16 18 inverse / wavelength 2 4 6 8 1 12 14 16 18 inverse / wavelength Ô µ ½Æ µ ¾ (5) Figure 8: Power spectra of the zeroth and first moments of 8-cc heights for a standard fax at 4 ppi, as a function of inverse distance. Fig. 7 shows the power spectrum of 8-cc heights with zeroth and first moments, for a typical µ Æ ½ original scanned at 4 ppi. These spectral components are the absolute value squared of the fourier components Ð ¼ е ¾ Ð Æ (6) 8

4e+7 8-cc, m =, Fine fax power spectrum of height histogram 7e+7 8-cc, m = 1, Fine fax power spectrum of height histogram spectral power 3.5e+7 3e+7 2.5e+7 2e+7 1.5e+7 1e+7 5e+6 2 4 6 8 1 12 14 16 18 inverse / wavelength spectral power 6e+7 5e+7 4e+7 3e+7 2e+7 1e+7 2 4 6 8 1 12 14 16 18 inverse / wavelength Figure 9: Power spectra of the zeroth and first moments of 8-cc heights for a fine fax at 4 ppi, as a function of inverse distance. The power spectra were constructed fromæ ¾samples, and are shown up to the Nyquist limit at ½, where the periodicity,, is given by Æ ¾pixels. The original has a very uniform spectrum for m =, and structure with a concentration of weight at a frequency corresponding to about 7 or 8 pixels for m = 1. The power spectra of the zeroth and first moments of 8-cc heights for a typical standard fax are shown in Fig. 8. In contrast to the m = spectrum from the original, the fax spectrum displays considerable structure. The m = 1 spectrum has a notable signature, where the large weight at exactly 4 pixels (, half the Nyquist cutoff) is highly prominent. The power spectra of zeroth and first moments of 8-cc heights for a typical fine fax are given in Fig. 9. In contrast to the standard fax spectrum for m = 1, this shows nearly zero weight at the 4 pixel periodicity. It also has a concentration of weight at a long periodicity of 1 to 12 pixels. For m = 1, the standard fax is more similar to the original than to the fine fax, and these differences will be used to make a discriminator between standard and fine fax. Because the spectral shapes are quite uniform from image to image, discrimination functions can be generated to separate these images reliably. For the zeroth moment spectra, a size-invariant discrimination function can be made using just the ratio of the maximum to minimum values in the spectrum. 3 Subregion selection Empirically, it is observed that the statistics gathered from a small part of a scanned page suffice for making an accurate determination of the underlying resolution. In fact, there are two reasons for not using the entire image. First, and of most importance, stipples and halftones can have an uncharacteristically large population of very small components, which can bias the edge pixel statistics to simulate a higher resolution. Second, the analysis of a subimage is faster than the full page. The image is arbitrarily divided into N equal tiles, where N is typically taken to be 16, and the 9

tile with statistics most similar to text is selected for resolution analysis. The operation to identify textness is as follows. For each tile: The image is threshold reduced[1] by a factor of 8, using a threshold of 1 on each 2x2 reduction. This is equivalent to performing an 8x8 dilation before subsampling, and consolidates the image pixels. The textlines are enhanced by closing with a horizontal structuring element (of size 15). The halftone and stippled regions are removed by first opening with a vertical structuring element (of size 1) to remove the text only, and then XORing to restore only the text. The tile with the largest number of ON pixels is selected. These operations, taking place on a 8x reduced image, are much faster than doing a the resolution analysis on the full image. On a 167 MHz Sun Ultra-Sparc, for a 4 ppi image, the selection of a subregion takes.6 seconds. The resolution analysis requires computation of connected component bounding boxes and, on a subregion of size 1/16 (about 1 million pixels) of a typical image, takes.15 seconds. The total time is less than 1 ms. 4 Discrimination results A set of 2 diverse pages was scanned and printed on a plain paper fax, at both standard and fine fax resolution. The 2 originals and the two sets of fax pages were then scanned at 3 ppi and 4 ppi. The image set is shown in Fig. 1. The discrimination functions given in (4) and (3) were used for the 3 ppi and 4 ppi sets, respectively, and applied to the zeroth and first moments of 8-cc heights. The results using the 8-cc height histogram for 4 ppi originals, fine fax and standard fax are given in Fig. 11, with the zeroth moment on the left and the first moment on the right. In these and following plots, the standard fax is the solid line with diamonds, the originals are the dashed line with + marks, and the fine fax is the dotted line with open squares. Here, the standard fax is far above the originals, with the fine fax in between. A threshold is easily chosen to separate standard fax from originals; thresholds can also be chosen to separate fine fax from each of the others. Similar scores, generated at 3 ppi from the 8-cc height histograms, are shown in Fig. 12. Again the standard fax is easily separated by a single threshold from the original. Using the power spectra for 8-cc, with the zeroth and first moments as shown in Fig. 7, Fig. 8 and Fig. 9, discrimination functions can be made from which the underlying resolution can be identified by application of a threshold. The result on the set of 2 pages is seen in Fig. 13. On the left, for m =, we use as a discrimination function the ratio of the max/min of the spectral values, capping the standard fax ratio at 1 to make visible the lower scores of the other plots. A threshold at about 15 separates the standard fax from fine fax and originals. On the right, for m = 1, we use the spectral discrimination function 1

Figure 1: 2 images analyzed in this section 11

1.8 4 ppi orig and fax scores, m = : (-y2 + Sum(y3,..., y8))/y1 9 4 ppi orig and fax scores, m = 1: (-y2 + Sum(y3,..., y8))/y1 1.6 8 score 1.4 1.2 1.8.6.4.2 score 7 6 5 4 3 2 1 -.2 2 4 6 8 1 12 14 16 18 2 image 2 4 6 8 1 12 14 16 18 2 image Figure 11: Scores of 4 ppi original, fine and standard fax images, using (3). Zeroth and first moments of the 8-cc height histograms are used on the left and right. 3.5 3 ppi orig and fax scores, m = : Sum(y2,..., y8)/y1 14 3 ppi orig and fax scores, m = 1: Sum(y2,..., y8)/y1 3 12 2.5 1 score 2 score 8 6 1.5 4 1 2.5 2 4 6 8 1 12 14 16 18 2 image 2 4 6 8 1 12 14 16 18 2 image Figure 12: Scores of 3 ppi original, fine and standard fax images, using (4). Zeroth and first moments of the 8-cc height histograms are used on the left and right. 12

Spectrum 4 ppi orig and fax scores, m=: ratio max/min 1 Spectrum 4 ppi orig and fax scores, m=1: p(8)/(p(2)+p(3)+p(4)+p(5)) 4 9 8 7 6 35 3 25 score 5 4 3 2 1 score 2 15 1 5 2 4 6 8 1 12 14 16 18 2 image 2 4 6 8 1 12 14 16 18 2 image Ô Ø Figure 13: Spectral power ¼¼ ½¼¼Ô µ scores of 4 ppi original, fine and standard fax images, using the max/min ratio for m = on the left, and using (7) for m = 1 on the right. Ô ¾µ Ô µ Ô µ Ô µ (7) The result is particularly interesting because the fine fax spectrum scores are lower than those for the original image, for all images in the set. This inversion allows a threshold around 2 to make a clear separation between fine fax and standard fax documents. 5 Conclusions Measurements of the vertical edge pixels have been used in various ways to identify an underlying low resolution structure in binary document images. Vertical, rather than horizontal, edge pixels have been chosen for illustration because standard fax has the lowest resolution in the vertical direction. The histogram of 8-cc heights is most sensitive to such structure, clearly displaying the periodicity. Note also the presence of single pixel 8-cc primarily in the original (top) of Fig. 1. By contrast, in all three images, 4-cc heights have many single pixel runs near corners where the boundary direction is changing between vertical and horizontal. Images with many small components, such as fine stipples or halftones, have most vertical edge pixels near corners. There are very few longer runs, and as a result, the 8-cc height histogram is significantly different from that for text and graphics. Consequently, it is important to avoid such areas, and to select the most text-like regions for spectral analysis. This is done using morphological filters on a highly reduced version of the image, and counting the resulting pixels. Because the selection operations assume that most of the text on the page is horizontal, it is important to verify the orientation first, and efficient methods exist to do this[2]. Once a subregion is selected and the 8-cc statistics are accumulated, a discrimination function with a threshold is used to separate images with differing underlying resolutions. The discriminator 13

must give a result that is (1) independent of the size of the image and (2) independent of the size and in the image. Normalization is easily achieved by taking, in general, a ratio of linear combinations of histogram values or power spectral values. The most important distinction is between standard fax and higher resolutions, and this can be done either directly on the histogram or on its power spectrum. The power spectrum can be used as a narrow spectral filter to determine the underlying resolution. As an example, if the high/low resolution ratio is 4, most of the spectral power can be put into the 4 pixel mode, whereas for an image with high/low resolution ratio of 2, the 4 pixel mode will be depleted. With an appropriate discrimination function, this permits accurate separation between standard fax and fine fax images. There are interesting and practical questions that have not been addressed here. For example, a relatively high quality fax scanner was used, on the assumption that the low frequency signature of cheaper scanners will be at least as easy to see. What image quality variations are possible with various low-resolution scanners and fax printers? How is the low-frequency resolution information changed with successive copies? And one possible application is to provide information for image restoration that will smooth boundaries and improve the appearance of degraded text. This may also be helpful as a pre-processing step for OCR. 6 Acknowledgments I am grateful to Ron Danisewicz at Xerox ScanSoft for encouraging work on this interesting problem, and to Judy Bares for helpful discussions. References [1] D. S. Bloomberg, Image analysis using threshold reduction, SPIE Conf. 1568, Image Algebra and Morphological Image Processing II, pp. 38-52, San Diego, CA, July 23-24, 1991. [2] D. S. Bloomberg, G. E. Kopec and L. Dasari, Measuring document image skew and orientation, SPIE Conf. 2422, Document Recognition II, pp. 32-316, San Jose, CA, Feb 6-7, 1995. [3] S. V. Rice, F. R. Jenkins and T. A. Nartker, The Fourth Annual Test of OCR Accuracy, an addendum to Fifth Annual Symp. on Document Analysis and Information Retrieval, Las Vegas, NV, Apr 15-17, 1996. [4] L. Vincent, Fast opening functions and morphological granulometries, SPIE Conf. 23, Image Algebra and Morphological Image Processing V, San Jose, CA, July 1994. 14