Removing Shading Distortions in Camera-based Document Images Using Inpainting and Surface Fitting With Radial Basis Functions

Li Zhang, Andy M. Yip, Chew Lim Tan
School of Computing, 3 Science Drive 2, National University of Singapore
Department of Mathematics, 2 Science Drive 2, National University of Singapore
{dcszl,andyyip,dcstcl}@nus.edu.sg

Abstract

Shading distortions are often perceived in geometrically distorted document images due to the change of the surface normal with respect to the illumination direction. Such distortions are undesirable because they severely hamper OCR performance even when the geometric distortions are corrected. In this paper, we propose an effective method that removes shading distortions in images of documents with various geometric shapes based on the notion of intrinsic images. We first derive the shading image using an inpainting technique with an automatic mask generation routine, and then apply a surface fitting procedure with radial basis functions to remove pepper noise in the inpainted image and return a smooth shading image. Once the shading image is extracted, the reflectance image can be obtained automatically. Experiments on a wide range of distorted document images demonstrate robust performance. Moreover, we also show the method's potential application to the restoration of historical handwritten documents.

1. Introduction

The popularity of hand-held digital devices such as digital cameras, cellphones and PDAs has made camera imaging a convenient way of recording information. With such a camera-enabled device, people can snap photos of documents whenever and wherever needed as a way of daily note-taking. However, this also gives rise to many distorted images, especially when the imaging environment is uncontrollable. One such distortion is shading, including shadows.
Strictly speaking, shading is the variation in luminance caused by a change of the surface normal with respect to the illumination direction, while shadow refers to the variation caused by occlusions of the light source. In particular, when capturing documents of non-planar geometric shapes, we often receive images containing both geometric and shading distortions. These make it very difficult for current OCR systems to identify the words correctly and in the right sequence. To obtain a good recognition rate, it is necessary to correct both distortions to a certain extent.

Brown and Tsoi propose a boundary interpolation method to correct both distortions on images of warped art materials [3]. The method produces good results for a variety of geometric warpings but is restricted to iso-parametric folding lines. Since the shape of the warped surface is not required, the uniform parametrization needs to be guided by a checkerboard pattern placed beneath the document. Furthermore, image boundaries must be present and an unobstructed white border needs to be enforced for the estimation of the shading. These conditions are often hard to satisfy when people just take snapshots for convenience. Sun et al. present a system to restore both geometric and photometric artifacts of arbitrarily distorted documents [11]. This system requires a special 3D scanning setup to acquire the depth map of the warped surface, and it mainly handles non-smooth shading caused by folds.

On the other hand, methods have been proposed to separate reflectance and illumination images based on the notion of intrinsic images [1], which defines an image as composed of a reflectance component and a shading component. The illumination image here includes both shading and shadow. Color information has been exploited to separate reflectance from shading based on the observation that shading is almost exclusively defined by luminance while reflectance is defined by both luminance and color [9]. Funt et al.
[7] propose a method to recover shading from color images by removing the reflectance component based on associated abrupt chromaticity changes. In other words, they use the fact that a change of reflectance is usually caused by a change in color. Similarly, Tappen et al. [12] introduce another method to recover shading and reflectance images using both color information and a classifier trained to recognize local gray-scale patterns, in order to distinguish derivatives caused by reflectance changes from those caused by shading. The intrinsic images are recovered from their derivatives using the same method as introduced by Weiss [14]. In both methods, diffuse surfaces are assumed and the thresholding process can potentially flatten out discontinuous geometric features that may appear in the shading image. Toro et al. [13] describe an approach that addresses both diffuse and specular reflections with a known illumination direction. Despite all these efforts in deriving the intrinsic images, there is no single exact solution because the decomposition of the intensity image into its two intrinsic components is theoretically not unique.

In view of the daily snapshots of document images containing mainly text and graphics information, we propose a simple yet effective method that extracts the reflectance image by removing the shading distortions, including both geometrically caused shading and cast shadows. Assuming that a given document has a constant-colored background, the reflectance image will contain only the printed text and graphics, which indicate a color change. Our objective is to derive this reflectance image so as to improve the document's visual appearance and the OCR performance. To do this, we first extract the shading image through an inpainting technique, followed by a surface fitting process when appropriate. The inpainting mask is obtained using an edge-based method followed by some morphological operations. Once the shading image is obtained, the reflectance image can be derived easily based on the notion of intrinsic images. Experiments on various document images demonstrate robust performance. In addition, we also show that this method can be used to clean up historical handwritten document images with stains and patch noise.

2. A General Work Flow

Figure 1 illustrates a detailed work flow of the proposed method. Given a distorted document image, we first extract an inpainting mask, which masks the text and graphics contents that cause a reflectance change. This is done using an edge-based method followed by a morphological operation.
Next, a harmonic or Total Variation (TV) inpainting technique is applied to the original image to remove the printed contents using the extracted mask. If the mask does not fully cover the printed contents, the inpainted image may contain scattered pepper noise due to unremoved ink. This can be further refined through an iterative mask enhancement process. Alternatively, if the shading is smooth, a surface fitting scheme can be exploited to eliminate the noise and produce a smooth shading image. Once the shading image is extracted, the reflectance image can be easily derived based on the notion of intrinsic images.

Figure 1. Work flow of the restoration method.

3. Shading Extraction using Inpainting

Assuming the given document image has a uniformly colored background, as in ordinary printed notes, plain book pages, etc., an effective cue for differentiating shading from reflectance is the printed regions. It has been observed that luminance variations accompanied by color variations are usually variations in reflectance, while luminance variations unaccompanied by color variations are variations in illumination [9]. Therefore, the printed text regions essentially imply the reflectance changes. If we can remove the luminance variations caused by the colored text, we will be left with pure shading variations. The first step is thus to identify the text and graphics locations and remove all the colors that have high contrast to the background.

3.1. Automatic Mask Generation

Text localization has been widely researched both on document images and on digital videos. The techniques can be broadly classified as component-based [6, 10] or texture-based [16, 8]. The component-based methods usually analyze the geometrical arrangements of edges or uniformly colored components of the characters. The texture-based methods utilize the texture characteristics of text lines to extract the text. Here we are interested in not only texts but also graphics.
Anything that may induce a reflectance change is within our consideration. Therefore, we make use of an edge-based method that essentially identifies pixels that are of high contrast to the background. Next, morphological operations are applied to the edge-detected image, which generates a mask for the printed contents. The detailed procedure is as follows:
1) Convert the color image into gray-scale. This can be done by picking the luminance component of a color model, such as the V-component of the HSV model or the I-component of the HSI model;
2) Detect edges using the Canny edge detector. Post-processing steps such as non-maximum suppression and streaking elimination are also applied for better results;
3) Perform morphological dilation followed by closing. The size of the structuring element can be tuned manually or adjusted automatically based on an estimated average character height when applicable.
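The three steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: a plain gradient-magnitude threshold stands in for the Canny detector of step 2, the structuring element is a fixed square, and the function names (`to_gray`, `edge_map`, `make_mask`) as well as the defaults `thresh=50` and `radius=1` are assumptions chosen for this example.

```python
def to_gray(rgb):
    """Step 1: V component of HSV, i.e. max(R, G, B) for every pixel."""
    return [[max(px) for px in row] for row in rgb]


def edge_map(gray, thresh):
    """Step 2 stand-in (not Canny): threshold the gradient magnitude."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, with indices clamped at the image border.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > thresh
    return edges


def _morph(mask, radius, combine):
    """Apply `combine` (any or all) over a square window around each pixel."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(y - radius, 0), min(y + radius, h - 1) + 1)
                      for i in range(max(x - radius, 0), min(x + radius, w - 1) + 1)]
            out[y][x] = combine(window)
    return out


def dilate(mask, radius=1):
    return _morph(mask, radius, any)


def erode(mask, radius=1):
    return _morph(mask, radius, all)


def make_mask(rgb, thresh=50, radius=1):
    """Steps 1-3: gray-scale, edges, then dilation followed by closing."""
    edges = edge_map(to_gray(rgb), thresh)
    dilated = dilate(edges, radius)
    # Closing is a dilation followed by an erosion with the same element.
    return erode(dilate(dilated, radius), radius)
```

The first dilation fills the interior of thin strokes, which the edge detector marks only along their borders; the closing then merges nearby components. In practice `radius` would be tied to the estimated character height mentioned in step 3.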

3.2. Harmonic/TV Inpainting

Once the mask of the printed regions is generated, an inpainting technique can be used to fill in the masked regions based on the neighboring background pixels. This essentially recovers the shading in the printed regions under the assumption that the local variation of shading is small. Digital inpainting was pioneered by Bertalmio et al. [2] and has since been applied to a variety of image processing applications. Here we use it as a way of recovering the shading. In particular, we look at two non-texture variational inpainting models, harmonic and TV inpainting [5].

Mathematically, inpainting can be considered as a local interpolation problem: given an image $I_0$ with a hole $H$ inside, we want to find an image $I$ that matches $I_0$ outside the hole and has consistent information inside the hole. To do this, we try to find $I$ that minimizes the following energy in a continuous domain $\Omega$:

$$E(I) = \int_\Omega \chi\,(I - I_0)^2\,dx + \lambda \int_\Omega |\nabla I|^2\,dx \tag{1}$$

where $\lambda > 0$ is a smoothness parameter and $\chi$ denotes the characteristic function:

$$\chi(x) = \begin{cases} 1, & x \in \Omega \setminus H \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

To minimize the energy in Eq. 1, we solve the Euler-Lagrange equation:

$$\frac{\partial E}{\partial I} = 2\,[\chi\,(I - I_0) - \lambda \Delta I] = 0 \tag{3}$$

By using a gradient-descent method and a finite-difference discretization, we obtain the iterative update formula:

$$I^{n+1}_{i,j} = I^n_{i,j} + \Delta t \left( \frac{\lambda}{h^2} \left( I^n_{i+1,j} + I^n_{i-1,j} + I^n_{i,j+1} + I^n_{i,j-1} - 4 I^n_{i,j} \right) - \chi_{i,j} \left( I^n_{i,j} - I^0_{i,j} \right) \right) \tag{4}$$

where $h$ is the grid size and the smoothness parameter $\lambda$ is chosen through trial and error. The time step $\Delta t$ can be any small constant that makes the iteration stable.

We noticed that harmonic inpainting constructs a smooth solution, which may cause problems when the text or graphics at image boundaries are masked out or when interior edges are occluded by overlaid text. This can be remedied by using TV inpainting. Instead of using the penalty term $\int_\Omega |\nabla I|^2\,dx$ in Eq. 1, which is infinite for discontinuous functions, we use $\int_\Omega |\nabla I|\,dx$, which allows discontinuous functions as minimizers. The energy function now becomes:

$$E(I) = \int_\Omega \chi\,(I - I_0)^2\,dx + \lambda \int_\Omega |\nabla I|\,dx \tag{5}$$

where $\lambda = 2\sigma^2/\nu$. A minimizer for this energy function can be computed using a scheme similar to that for harmonic inpainting. Note that both harmonic and TV inpainting are essentially local models, in which the inpainting is mainly determined by the existing information $I_0$ in the vicinity of the inpainted domain $H$. Moreover, Eq. 1 has a built-in denoising capacity so that it is robust to noise. The main difference is that harmonic inpainting builds very smooth solutions and thus does not cope well with edges, while TV inpainting is able to restore narrow broken smooth edges, which often exist in document images due to overlaid text.

3.3. Surface Fitting with RBF

With the mask generated, the inpainting process removes all the masked text and graphics and returns a first estimate of the shading image. However, the result is often not ideal due to errors in the extracted mask. For example, some unmasked printed pixels will be considered as background and therefore cause pepper noise in the inpainted image. One way to solve this problem is to iteratively improve the mask until no sharp edges are identified. Alternatively, we can remove the pepper noise by using a surface fitting algorithm with radial basis functions (RBF) [4]. This is especially useful when the shading needs to be smooth for further surface reconstruction tasks. Typically, given a set of 3D points $\{(x_i, f(x_i)),\ i = 1, 2, \ldots, m\}$, where $x_i$ is the x-y coordinate and $f(x_i)$ is the z coordinate, a fitted surface can be expressed as:

$$g(x) = \sum_{j=1}^{n} \alpha_j\, h(x - y_j) \tag{6}$$

where $\{y_j,\ j = 1, 2, \ldots, n\}$ is a set of selected collocation points and $h(x)$ is the radial basis function. The number of collocation points is selected based on the dimension of the image, e.g. 12 × 12 for the image in Figure 2.
The goal is to find the coefficients $\alpha_j$ that minimize the least-squares error defined as:

$$e = \min_{\alpha_1, \ldots, \alpha_n} \left\{ \sum_{i=1}^{m} \left( g(x_i) - f(x_i) \right)^2 \right\} \tag{7}$$

with optional boundary conditions. Various kernel functions of different smoothness can be used. Here we use multiquadrics: $h(x) = \sqrt{\|x\|^2 + c^2}$, where $c$ is a constant with $c = 10$ in our experiments. The advantages of using RBF fitting are: 1) it gives explicit formulas for derivatives, which are more accurate and less noisy than finite differences; 2) it is easy to incorporate various types of boundary conditions; 3) unlike polynomial fitting, RBF fitting is more flexible and can be used to fit more complicated surfaces. Finally, Figure 2 shows an example of how surface fitting helps to extract smooth shading images.
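The least-squares fit of Eqs. 6 and 7 with the multiquadric kernel can be illustrated with the sketch below. It is written under several assumptions that are not in the paper: the minimization is solved through the normal equations with a small Gaussian-elimination routine, no boundary conditions are imposed, and the names `fit_rbf`, `eval_rbf` and `solve` are hypothetical. The toy example uses four collocation points and `c=2.0` on a 5 × 5 sample grid instead of the paper's setting, simply to keep the system small and well conditioned.

```python
import math


def multiquadric(r, c):
    """Multiquadric kernel evaluated at distance r."""
    return math.sqrt(r * r + c * c)


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_rbf(points, values, centers, c=10.0):
    """Coefficients alpha_j minimizing Eq. 7, via the normal equations."""
    a = [[multiquadric(math.dist(p, y), c) for y in centers] for p in points]
    n = len(centers)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    atf = [sum(row[i] * f for row, f in zip(a, values)) for i in range(n)]
    return solve(ata, atf)


def eval_rbf(p, centers, alpha, c=10.0):
    """Evaluate the fitted surface g of Eq. 6 at point p."""
    return sum(aj * multiquadric(math.dist(p, y), c) for aj, y in zip(alpha, centers))


# Toy data: a smooth surface sampled on a 5 x 5 grid, with one dark
# "pepper" sample at the centre that the fit should largely ignore.
centers = [(0.0, 0.0), (0.0, 8.0), (8.0, 0.0), (8.0, 8.0)]
points = [(float(x), float(y)) for y in range(0, 9, 2) for x in range(0, 9, 2)]
true_alpha = [1.0, -0.5, 0.25, 2.0]
clean = [eval_rbf(p, centers, true_alpha, c=2.0) for p in points]
noisy = clean[:]
k = points.index((4.0, 4.0))
noisy[k] -= 50.0
alpha = fit_rbf(points, noisy, centers, c=2.0)
fitted_center = eval_rbf((4.0, 4.0), centers, alpha, c=2.0)
```

Because the surface has far fewer degrees of freedom than there are samples, a single outlier moves the fitted value only slightly, which is the mechanism by which the fit suppresses pepper noise. For larger systems or larger values of `c` the normal equations become ill conditioned, and a QR- or SVD-based least-squares solver would be a safer choice.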

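The harmonic update of Eq. 4 in Section 3.2 can be sketched in the same dependency-free style. The function name `harmonic_inpaint`, the replicated-border handling, and the defaults `lam=1.0`, `dt=0.2`, `h=1.0` and `iters=500` are assumptions made for illustration; the paper states only that the smoothness parameter is chosen by trial and error and that the time step must keep the iteration stable. With these defaults every update is a convex combination of neighboring values, which keeps the explicit scheme stable.

```python
def harmonic_inpaint(image, hole, lam=1.0, dt=0.2, h=1.0, iters=500):
    """Iterate the update of Eq. 4; hole[i][j] is True inside the hole H."""
    rows, cols = len(image), len(image[0])
    cur = [row[:] for row in image]
    for _ in range(iters):
        nxt = [row[:] for row in cur]
        for i in range(rows):
            for j in range(cols):
                # Four neighbours, with the border value replicated.
                up = cur[max(i - 1, 0)][j]
                down = cur[min(i + 1, rows - 1)][j]
                left = cur[i][max(j - 1, 0)]
                right = cur[i][min(j + 1, cols - 1)]
                lap = (up + down + left + right - 4.0 * cur[i][j]) / (h * h)
                # chi is 1 outside the hole (fidelity term on), 0 inside it.
                chi = 0.0 if hole[i][j] else 1.0
                nxt[i][j] = cur[i][j] + dt * (lam * lap - chi * (cur[i][j] - image[i][j]))
        cur = nxt
    return cur


# Flat background of 200 with a three-pixel dark "ink" stroke as the hole H.
image = [[200.0] * 7 for _ in range(7)]
hole = [[False] * 7 for _ in range(7)]
for j in (2, 3, 4):
    image[3][j] = 20.0
    hole[3][j] = True
restored = harmonic_inpaint(image, hole)
```

Inside the hole only the Laplacian term acts, so the background value diffuses inward and replaces the ink; outside the hole the fidelity term keeps the result close to the input.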
Figure 2. (a) Image of an arbitrarily warped document page; (b) Extracted inpainting mask; (c) One-pass inpainted image; (d) Shading image using RBF fitting; (e) Fitted 3D surface of the shading image; (f) Extracted reflectance image.

4. Deriving the Reflectance Image

Once the shading image is extracted, it is easy to derive the reflectance image based on the notion of intrinsic images. For Lambertian surfaces, the intensity image is the product of the shading image and the reflectance image [1]. Considering the luminance component of the HSV model, we have $I = I_s I_r$. Given the shading image $I_s$, the reflectance image $I_r$ can be computed as $I_r = e^{\log I - \log I_s}$.

5. Experimental Results

We have evaluated the proposed method on a set of images captured using both ordinary digital cameras and cellphone cameras. Figure 3(a1) shows a multi-folded paper with printed characters taken in a complex lighting environment. Figure 3(a3) shows that the folded edges are well restored using the TV inpainting algorithm. Figures 3(b1) and (b4) show a warped map image and its restored reflectance image, respectively. This demonstrates that our method can also deal with graphical documents as long as the background is of constant color. In addition, (a5) and (b5) show the images after geometric restoration, which is done independently of the current work as reported in [15]. Next, Figure 3(c1) shows an image taken using a cellphone camera with the phone's shadow on it, and (c4) shows the extracted reflectance image with the shadow removed. Lastly, Figure 3(d1) is an image of a pure text document taken using a cellphone camera under non-uniform lighting. A set of such text images with different lighting and shadows is used for conducting OCR experiments. For a total of 2,600 words from 30 document images, we obtained an average word precision of 98.8% on the restored images, compared with 86.7% on the original distorted images.

Besides daily snapshots of printed documents, we also evaluated our method on digitized images of historical handwritten documents. These documents contain substantial noise due to the deterioration of the materials and non-uniform lighting. Figure 4 shows that our method can help clean up the noise and return a better image for further Document Image Analysis (DIA) tasks.

Figure 4. (a) Noisy historical handwritten documents; (b) Noise cleaned images.

Due to the uncontrolled imaging environment, the shading may contain arbitrary illumination variations or cast shadows. Therefore, it does not follow any exact illumination model. Our method provides a way of estimating the shading and is specifically designed for document images. In some cases where the documents contain images of non-uniform colors such as embedded figures, we can create a mask that covers the whole image and then apply the inpainting algorithm. In addition, this is a standalone shading removal method, which can handle either flat images or geometrically distorted images and does not rely on a shape recovery process.

6. Conclusions

In this paper, we propose a method that removes various shading artifacts from distorted document images and recovers the reflectance images for better visualization and further DIA tasks. The main idea is to use the notion of intrinsic images to separate the shading and reflectance images, in which the shading image is extracted based on an inpainting technique followed by a surface fitting procedure to smooth out the noise. Experiments have shown encouraging results and the method's potential application to the cleanup of noisy historical documents.
Further studies will be carried out to improve the shading extraction algorithm for documents with non-uniformly colored backgrounds and for those with figures lying across a folded edge.

7. Acknowledgment

This research is supported by A*STAR grant 0421010085 and NUS URC grant R252-000-202-112.
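As a supplementary sketch of the reflectance recovery in Section 4, the log-domain division can be written in a few lines. The function name `reflectance` and the `eps` floor are assumptions; the floor is not part of the paper and only guards against taking the logarithm of zero when a pixel is completely black.

```python
import math


def reflectance(intensity, shading, eps=1e-6):
    """Pixel-wise I_r = exp(log I - log I_s), i.e. I divided by I_s."""
    return [[math.exp(math.log(max(i, eps)) - math.log(max(s, eps)))
             for i, s in zip(irow, srow)]
            for irow, srow in zip(intensity, shading)]


# Luminance in [0, 1]: background pixels equal the shading, ink is darker.
intensity = [[0.5, 0.25], [0.8, 0.08]]
shading = [[0.5, 0.5], [0.8, 0.8]]
restored_reflectance = reflectance(intensity, shading)
```

Background pixels, where the intensity equals the shading, map to a reflectance of 1 regardless of how dark the shading is, while ink pixels keep their contrast relative to the local background.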

Figure 3. (a1)(b1)(c1)(d1) Original distorted image; (a2)(b2)(c2)(d2) Extracted inpainting mask; (a3)(b3)(c3)(d3) Extracted shading image; (a4)(b4)(c4)(d4) Restored reflectance image; (a5)(b5) Geometrically restored image.

References

[1] H. Barrow and J. Tenenbaum. Recovering intrinsic scene characteristics from images. Computer Vision Systems, pages 3-26, Academic Press, New York, 1978.
[2] M. Bertalmio, G. Sapiro, C. Ballester, and V. Caselles. Image inpainting. SIGGRAPH 2000, pages 417-424, 2000.
[3] M. S. Brown and Y. C. Tsoi. Geometric and shading correction for images of printed materials using boundary. IEEE Trans. on Image Processing, 15(6):1544-1554, Jun 2006.
[4] J. C. Carr, R. K. Beatson, B. C. McCallum, W. R. Fright, T. McLennan, and T. J. Mitchell. Smooth surface reconstruction from noisy range data. Graphite 2003, pages 119-297, 2003.
[5] T. F. Chan and J. H. Shen. Mathematical models for local nontexture inpaintings. SIAM Journal on Applied Mathematics, 62(3):1019-1043, 2002.
[6] P. Clark and M. Mirmehdi. Recognizing text in real scenes. Int'l Journal on Document Analysis and Recognition, 4:243-257, 2002.
[7] B. V. Funt, M. S. Drew, and M. Brockington. Recovering shading from color images. 2nd European Conference on Computer Vision, pages 124-132, May 1992.
[8] H. Li, D. Doermann, and O. Kia. Automatic text detection and tracking in digital video. IEEE Trans. on Image Processing, 9(1):147-156, 2000.
[9] A. Olmos and F. A. A. Kingdom. A biologically inspired algorithm for the recovery of shading and reflectance images. Perception, 33(12):1463-1473, 2004.
[10] M. Pietikainen and O. Okun. Edge-based method for text detection from complex document images. Sixth Int'l Conf. on Document Analysis and Recognition, pages 286-291, 2001.
[11] M. X. Sun, R. G. Yang, L. Yun, G. Landon, B. Seales, and M. S. Brown. Geometric and photometric restoration of distorted documents. IEEE Int'l Conf. on Computer Vision, 2:1117-1123, Oct 2005.
[12] M. F. Tappen, W. T. Freeman, and E. H. Adelson. Recovering intrinsic images from a single image. Pattern Analysis and Machine Intelligence, 27(9):1459-1472, 2005.
[13] J. Toro, D. Ziou, and M. F. Auclair-Fortier. Recovering the shading image under known illumination. 1st Canadian Conf. on Computer and Robot Vision, pages 92-96, 2004.
[14] Y. Weiss. Deriving intrinsic images from image sequences. IEEE Int'l Conf. on Computer Vision, 2:68-75, 2001.
[15] L. Zhang, A. M. Yip, and C. L. Tan. Shape from shading based on Lax-Friedrichs fast sweeping and regularization techniques with applications to document image restoration. Computer Vision and Pattern Recognition, 2007.
[16] Y. Zhong, H. Zhang, and A. K. Jain. Automatic caption localization in compressed video. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(4):385-392, 2000.