Feature Description Based on Center-Symmetric Local Mapped Patterns

Carolina Toledo Ferraz, Osmando Pereira Jr., Adilson Gonzaga
Department of Electrical Engineering, University of São Paulo
Av. Trabalhador São Carlense, 400 - 13566-590, São Carlos, SP - Brazil
caferraz8@usp.br, osmandoj@gmail.com, agonzaga@sc.usp.br

ABSTRACT
Local feature description has gained a lot of interest in many applications, such as texture recognition, image retrieval and face recognition. This paper presents a novel method for local feature description based on gray-level difference mapping, called the Center-Symmetric Local Mapped Pattern (CS-LMP). The proposed descriptor is invariant to image scale, rotation, illumination and partial viewpoint changes. Furthermore, this descriptor more effectively captures the nuances of the image pixels. The training set is composed of rotated and scaled images with changes in illumination and viewpoint; the test set is composed of rotated and scaled images. In our experiments, the descriptor is compared to the Center-Symmetric Local Binary Pattern (CS-LBP). The results show that our descriptor performs favorably compared to the CS-LBP.

Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology - feature evaluation and selection

Keywords
region description, image matching, region detection, feature description

1. INTRODUCTION
In image processing, local feature description plays an important role in applications such as texture recognition, content-based image retrieval and face recognition. The objective is to build a descriptor histogram vector providing a representation that allows efficient matching of local structures between images. A good descriptor has some important properties, including invariance to illumination, rotation and scale. Furthermore, it should be robust to errors in detection, geometric distortions from different points of view and photometric deformations caused by different lighting conditions [3].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC'14, March 24-28, 2014, Gyeongju, Korea.
Copyright 2014 ACM 978-1-4503-2469-4/14/03...$15.00.
http://dx.doi.org/10.1145/2554850.2554895

One of the most widely known local feature descriptors published in the literature is the Scale-Invariant Feature Transform (SIFT) [12]. It was introduced by Lowe in 1999 [11]. It is characterized by a 3D histogram of gradient locations and orientations and stores the bins in a vector of 128 positions. Many variations of SIFT have been proposed, such as Principal Components Analysis SIFT (PCA-SIFT) [9], the Gradient Location and Orientation Histogram (GLOH) [10], Speeded Up Robust Features (SURF) [3], Colored SIFT (CSIFT) [1], the Center-Symmetric Local Binary Pattern (CS-LBP) [8] and Kernel Projection Based SIFT (KPB-SIFT) [17]. PCA-SIFT applies the standard Principal Components Analysis (PCA) technique to the normalized gradient patches extracted around local features. PCA is able to linearly project high-dimensional descriptors onto a low-dimensional feature space and to reduce the high-frequency noise in the descriptors. However, PCA-based feature descriptors need an offline stage to train and estimate the covariance matrix used for the PCA projection, which limits their application. The GLOH descriptor also uses PCA to reduce the descriptor size. It is very similar to SIFT; however, it uses a log-polar location grid instead of a Cartesian one. SURF approximates SIFT using Haar wavelet responses and integral images to compute the histogram bins. It uses a descriptor vector of 64 positions, providing a better processing speed.
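As a concrete illustration of the integral-image technique that SURF relies on, the sketch below (illustrative only, not taken from any SURF implementation) shows how a summed-area table lets any box sum be computed with four lookups:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1), independent of the box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Once the table is built in one pass, every histogram bin that reduces to a rectangular sum costs four array reads, which is what gives SURF its speed advantage.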
CSIFT tries to solve the problem of color image description, improving on SIFT, which is designed mainly for gray-level images. It uses a color invariance model to build the descriptor. CS-LBP [8] combines the strengths of the well-known SIFT descriptor and the Local Binary Pattern (LBP) texture operator. In [8], it is shown that the construction of the CS-LBP descriptor is simpler than that of SIFT, but it generates a feature vector of 256 positions, which is twice the SIFT vector size. KPB-SIFT [17] reduces the dimension of the descriptor from 128 to only 36 bins by applying a kernel projection to the orientation gradient patches rather than using smoothed weighted histograms. The approach is tolerant to geometric distortions and does not require a training stage; however, its performance is not superior to SIFT in all cases. In a recent work, Vieira, Chierici, Ferraz and Gonzaga [15] presented a new descriptor based on fuzzy numbers that they called the Local Fuzzy Pattern (LFP). This method interprets the gray-level values of an image neighborhood as a fuzzy set and each pixel level as a fuzzy number. A membership function is used to describe the membership degree of the central pixel to a particular neighborhood. Moreover,
the LFP methodology is a generalization of previously published techniques such as the LBP, the Texture Unit [6], the Fuzzy Number Edge Detector (FUNED) [4] and the Census Transform [16]. One evolution of the LFP is the Local Mapped Pattern (LMP) [5], in which the authors consider the sum of the differences between each gray-level of a given neighborhood and the central pixel as a local pattern that can be mapped to a histogram bin using a mapping function. In this paper, we propose a new interest region descriptor, denoted the Center-Symmetric Local Mapped Pattern (CS-LMP). CS-LMP combines the desirable properties of the CS-LBP with the LMP methodology. This new descriptor captures the small transitions of pixels in the image more accurately, resulting in a greater number of correct matches than with CS-LBP. Furthermore, it does not use PCA to decrease the size of the feature vector, but instead quantizes the histogram into a fixed number of bins. The rest of this paper is organized as follows. In Section 2, we briefly describe the CS-LBP and LMP methods. Section 3 gives details on the proposed CS-LMP approach. The experimental evaluation and results are presented in Section 4. Finally, we conclude the paper in Section 5.

2. LOCAL DESCRIPTORS
In computer vision, local descriptors are based on image features extracted from compact image regions, using histograms to generate a feature vector. In this section, two local descriptors are presented: the CS-LBP and the LMP.

2.1 Center-Symmetric Local Binary Pattern (CS-LBP)
The CS-LBP is a modified version of the LBP texture feature and the SIFT descriptor, inheriting the desirable properties of both texture features and gradient-based features. The features are extracted by an operator similar to the LBP, and the construction of the descriptor is similar to SIFT. This methodology has many properties, such as tolerance to illumination changes, robustness on flat image areas, and computational simplicity [7].
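For reference, the basic LBP operator that CS-LBP modifies can be sketched for a single 3 x 3 patch as follows (a minimal illustration; the clockwise neighbor ordering is one common convention, not prescribed by the paper):

```python
import numpy as np

def lbp(patch):
    """Basic LBP code for the center pixel of a 3x3 patch.

    Each of the 8 neighbors is compared with the central pixel
    (1 if neighbor >= center), producing a code in [0, 255].
    """
    # neighbors in clockwise order starting at the top-left corner
    n = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
         patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    g = patch[1, 1]
    return sum(2 ** i for i, v in enumerate(n) if v >= g)
```

Applying this code to every pixel and histogramming the results yields the 256-bin LBP histogram mentioned below; CS-LBP shortens it to 16 bins by comparing opposite neighbors instead.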
In [8], the authors state that the LBP produces a rather long histogram and is therefore difficult to use in the context of an image descriptor. To solve this problem, they modified the scheme of how pixels in the neighborhood are compared. Instead of comparing each pixel with the center one, the method compares center-symmetric pairs of pixels, as illustrated in Figure 1. We can see that for 8 neighbors, LBP produces 256 different binary patterns (LBP pattern in [0, 255]), whereas for CS-LBP, this number is only 16 (pattern in [0, 15]).

Interest region detectors, such as Hessian-Affine, Harris-Affine, Hessian-Laplace and Harris-Laplace [14, 13], provide the regions that are used to compute the descriptors. After computing the regions, they are mapped to a circular region of constant radius to obtain scale and affine invariance. These regions are fixed at 41 x 41 pixels and the pixel values are in the range [0, 1]. Rotation invariance is obtained by rotating the normalized regions in the direction of the dominant gradient orientation. For the construction of the descriptor, the image regions are divided into cells with a location grid, and for each cell, a CS-LBP histogram is built.

Figure 1: LBP and CS-LBP features for a neighborhood of 8 pixels

The CS-LBP feature extraction for each pixel of the region is accomplished according to Equation 1:

CS-LBP_{R,N,T}(x, y) = \sum_{i=0}^{(N/2)-1} s(n_i - n_{i+(N/2)}) 2^i,   (1)

where n_i and n_{i+(N/2)} correspond to the gray values of center-symmetric pairs of pixels of N equally spaced pixels on a circle of radius R, and

s(x) = 1 if x > T; 0 otherwise.

To avoid boundary effects, in which the descriptor changes abruptly as a feature shifts from one cell to another, bilinear interpolation over the x and y dimensions is used to share the weight of the feature between the four nearest cells. The share for a cell is determined by the bilinear interpolation weights [8]. Thus, the resulting descriptor is a 3D histogram of the CS-LBP feature locations and values.
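Equation 1 can be sketched for a single pixel as follows (a minimal illustration assuming R = 1 and N = 8, so the neighbors are taken from a 3 x 3 patch; the neighbor ordering and the threshold value are illustrative choices):

```python
import numpy as np

def cs_lbp(patch, T=0.01):
    """CS-LBP code for the center pixel of a 3x3 patch (N=8, R=1).

    Compares the 4 center-symmetric pixel pairs of the 8-neighborhood
    and packs the thresholded differences into a 4-bit code in [0, 15].
    """
    # 8 neighbors in clockwise order starting at the top-left corner
    n = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
         patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for i in range(4):                  # N/2 = 4 center-symmetric pairs
        if n[i] - n[i + 4] > T:         # s(n_i - n_{i+N/2})
            code += 2 ** i
    return code
```

On a flat patch every difference is below the threshold, so the code is 0; this is what gives the operator its robustness on flat image areas.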
The final descriptor is built by concatenating the feature histograms computed for the cells, followed by a normalization to unit length. To eliminate the influence of very large descriptor elements, the bins of the histogram are clipped to a fixed threshold and the histogram is normalized again. For a 4 x 4 Cartesian grid, the final feature vector contains 256 positions.

2.2 Local Mapped Pattern (LMP)
The LMP methodology assumes that each gray-level distribution within an image neighborhood is a local pattern. This pattern can be represented by the gray-level differences around the central pixel. Figure 2 shows an example of a 3 x 3 neighborhood and a 3D graphic of the respective local pattern.

Figure 2: Gray-level differences as a local pattern

Each pattern defined by a W x W neighborhood is mapped to a histogram bin h_b using Equation 2, where f_{g(i,j)} is the mapping function, P(k, l) is a weighting matrix of predefined values for each pixel position within the neighborhood, and B is the number of histogram bins. This equation represents the weighted sum of each gray-level difference
between the neighboring pixels and the central one, mapped onto the [0, 1] interval by a mapping function and rounded to B possible bins:

h_b = round( [ \sum_{k=1}^{W} \sum_{l=1}^{W} f_{g(i,j)} P(k, l) ] / [ \sum_{k=1}^{W} \sum_{l=1}^{W} P(k, l) ] \cdot (B - 1) ),   (2)

Through Equation 2, it is possible to derive some previously published approaches for local pattern analysis. The Local Binary Pattern (LBP) can be derived from Equation 2 using the Heaviside step function shown in Equation 3:

f_{g(i,j)} = H[A(k, l) - g(i, j)],   (3)

where

H[A(k, l) - g(i, j)] = 0 if [A(k, l) - g(i, j)] < 0; 1 if [A(k, l) - g(i, j)] >= 0.

Taking into account the basic LBP with a 3 x 3 neighborhood of pixels, the weighting matrix will be:

P(k, l) = [ 1    2    4
            128  0    8
            64   32   16 ]

The LBP is thus a particular case of the LMP approach. Using different mapping functions, the LMP methodology can extract specific features from the image using local pattern processing. For texture analysis, the LMP approach uses a smooth approximation to the Heaviside step function: a logistic function, or common sigmoid curve, as in Equation 4:

f_{g(i,j)} = 1 / (1 + e^{-[A(k,l) - g(i,j)] / \beta}),   (4)

where \beta is the curve slope and [A(k, l) - g(i, j)] are the gray-level differences within the neighborhood centered at g(i, j). The proposed weighting matrix is a matrix of ones:

P(k, l) = [ 1  1  1
            1  1  1
            1  1  1 ]

When using the sigmoid curve, the method is called LMP-s. Sigmoid functions, whose graphs are S-shaped curves, appear in a great variety of contexts, such as the transfer functions of many neural networks. Their ubiquity is no accident; these curves are among the simplest nonlinear curves, striking a graceful balance between linear and nonlinear behavior. Figure 3 shows the mapping functions f_{g(i,j)} used in Equation 2 for the LMP-s (sigmoid) and LBP (Heaviside step) approaches.
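The LMP-s mapping of Equations 2 and 4 can be sketched for one 3 x 3 neighborhood as follows (a minimal sketch; the all-ones weighting matrix and beta = 1.0 are illustrative defaults, and the central pixel is included in the sum with a difference of zero):

```python
import numpy as np

def lmp_bin(patch, beta=1.0, B=256, P=None):
    """Map a 3x3 gray-level pattern to a histogram bin h_b (Equation 2).

    f is the sigmoid of the differences A(k,l) - g(i,j) between each
    pixel and the central one (Equation 4); P is a weighting matrix,
    a matrix of ones by default as in the LMP-s texture method.
    """
    patch = np.asarray(patch, dtype=float)
    g = patch[1, 1]                                  # central pixel
    if P is None:
        P = np.ones_like(patch)
    f = 1.0 / (1.0 + np.exp(-(patch - g) / beta))    # Equation 4
    h_b = round((f * P).sum() / P.sum() * (B - 1))   # Equation 2
    return int(h_b)
```

A flat patch maps every difference to f = 0.5, landing in the middle of the histogram; neighborhoods brighter than the center map to higher bins, darker ones to lower bins, so small gray-level transitions move the bin gradually instead of flipping a binary code.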
3. CENTER-SYMMETRIC LOCAL MAPPED PATTERN (CS-LMP)
The CS-LBP descriptor described in the previous section has the following disadvantage: the differences between the symmetric pairs of pixels are thresholded to 0 or 1, and therefore do not capture the nuances that can occur in the image.

Figure 3: Gray-level differences mapped by Equation 2. (a) Mapping function for the LMP-s method. (b) Mapping function for the LBP method.

Considering the flexibility of the LMP method, which can be adapted by changing the mapping function and the weighting matrix in Equation 2, and the disadvantage of the CS-LBP descriptor (the stiffness of the step function), we propose a new descriptor named CS-LMP, based on the CS-LBP but using a mapping function suitable for the feature calculation of the LMP method. Equation 5 shows the feature extraction for each pixel of the interest region, defined in a W x W neighborhood:

CS-LMP_{R,N,f}(x, y) = round( [ \sum_{i=0}^{(N/2)-1} f(i) P(i) ] / [ \sum_{i=0}^{(N/2)-1} P(i) ] \cdot b ),   (5)

where b is the number of histogram bins minus one, P(i) is the i-th element of the weighting matrix P defined in Equation 6, and f(i) is the sigmoid mapping function defined in Equation 7:

P = [p_0  p_1  p_2 ... p_{(N/2)-1}]^t,  with  p_i = 2^i,   (6)

f(i) = 1 / (1 + e^{-[n_i - n_{i+(N/2)}] / \beta}),   (7)

where n_i and n_{i+(N/2)} correspond to the gray-level values of the center-symmetric pairs of N equally spaced pixels on a circle of radius R, and \beta is the sigmoid curve slope. If the coordinates of n_i on the circle do not correspond to image coordinates, the gray value of this point is interpolated. Figure 4 illustrates the operation of the CS-LMP after setting b to 15 (16 bins minus one). We adopted the Hessian-Affine detector because it presents better results for detecting interest regions in images [8]. Afterwards, the regions are mapped to a circular region of constant radius and rotated in the direction of the dominant gradient, as described in [8] and [12].
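Equations 5-7 can be combined into a single sketch for one pixel (illustrative only; beta = 1.0 is an assumed default, and b = 15 matches the 16-bin quantization used in the experiments):

```python
import numpy as np

def cs_lmp(neighbors, beta=1.0, b=15):
    """CS-LMP feature for one pixel (Equations 5-7).

    `neighbors` holds the gray values n_0..n_{N-1} of N equally spaced
    pixels on a circle of radius R around the pixel. Each of the N/2
    center-symmetric differences is mapped through a sigmoid, combined
    with the binomial weights p_i = 2^i, and scaled to a bin in [0, b].
    """
    n = np.asarray(neighbors, dtype=float)
    half = len(n) // 2
    p = 2.0 ** np.arange(half)                                # Equation 6
    f = 1.0 / (1.0 + np.exp(-(n[:half] - n[half:]) / beta))   # Equation 7
    return int(round((f * p).sum() / p.sum() * b))            # Equation 5
```

Unlike CS-LBP, a flat neighborhood does not map to bin 0: every sigmoid returns 0.5, so the feature lands in the middle of the range, and small asymmetries shift it smoothly up or down.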
Each region is divided into cells with a grid of size 4 x 4. The features are extracted for each pixel of the interest region using the CS-LMP operator, and a histogram is built for each cell. The parameters R, N and \beta were established empirically; the values that gave the best results are discussed in Section 4. As presented in [7], bilinear interpolation is performed over the x and y dimensions to share the weight of the feature between the four nearest cells. Thus, a 3D histogram of
the CS-LMP feature locations and values is built, as shown in Figure 5.

Figure 4: CS-LMP features for a neighborhood of 8 pixels

The descriptor is made by concatenating the feature histograms computed for the cells, yielding a vector of 256 positions (4 x 4 x 16). The descriptor is then normalized as in the CS-LBP, presented in Section 2.1.

Figure 5: CS-LMP descriptor: (a) input region; (b) division into 4 x 4 cells; (c) concatenated histogram of 4 x 4 x 16 = 256 bins.

4. EXPERIMENTAL EVALUATION
In this section, we present the performance of the CS-LMP and CS-LBP descriptors applied to the matching of image pairs using nearest-neighbor similarity. The image database is available on the Internet². It has 8 sets of images with 6 different types of transformations: viewpoint change, rotation, scale, illumination, JPEG compression and blur. The training images used in this paper are shown in Figure 6 and the test images are shown in Figure 7. Table 1 presents the evaluation of the parameter β in the CS-LMP method for the image bark. We tested several values of the β parameter, generated the corresponding curves, and measured their respective areas under the curve (AUC). For most of the test images, the best value of β for CS-LMP was approximately the same. The evaluation criterion is explained in the next subsection.

4.1 Evaluation criteria
The performance of the CS-LMP descriptor was evaluated and compared with CS-LBP using an evaluation criterion based on Heikkilä et al. [7, 8]. This criterion consists of the number of correct and false matches for a pair of images, knowing a priori the set of true matches (i.e., the ground truth). Heikkilä et al.
[8] determine the ground truth by the overlap error of two regions. The overlap error measures how well a region A of image 1 overlaps a region B of image 2, where B is projected onto image 1 by a homography matrix H. The overlap error is defined by the ratio of the intersection

¹ http://www.robots.ox.ac.uk/~vgg/research/affine/
² http://lear.inrialpes.fr/people/mikolajczyk/

Figure 6: Training images: graf (viewpoint change), bark (scale and rotation change), trees (blur), leuven (illumination change), ubc (JPEG compression), boat (scale and rotation change), bike (blur), wall (viewpoint change)

Figure 7: Test images: resid, east park, east south (scale and rotation change), zmars, zmonet (rotation change), axterix, belledonne, bip (scale change)
and the union of the regions, given by Equation 8:

\epsilon_S = 1 - (A \cap H^T B H) / (A \cup H^T B H)   (8)

A match is assumed to be correct if \epsilon_S < 0.5. To evaluate the performance of the descriptors, we present the results with recall versus precision curves [2] and their respective AUC. Precision represents the number of correct matches over the total number of possible matches returned (false + correct) and is defined by Equation 9:

Precision = n_c / n,   (9)

where n is the number of possible matches returned, i.e., pairs whose Euclidean descriptor distance is below a threshold t_d. The number of correct matches n_c comprises the matched pairs whose Euclidean descriptor distance is below the threshold t_d and whose overlap error satisfies \epsilon_S < 0.5. Recall represents the ratio between the correct matches and the total number of true matches (the ground truth) and is defined by Equation 10:

Recall = n_c / M,   (10)

where M is the number of elements in the set of true matches. To construct the curve, each point is defined by choosing n in [1, N], where N represents the total number of possible matches. Analyzing this curve, a perfect descriptor would give a precision equal to 1 for any recall.

Table 1: β evaluation example for the image bark

4.2 Matching results
The interest region detector used to compute the descriptors was the Hessian-Affine detector. As discussed in the previous subsection, the parameters set for the CS-LMP were b = 15, R = 2, N = 8, and the best-performing β from the evaluation of Section 4.1. For the CS-LBP, we used the parameters R = 2, N = 8, and T = 0.01 [7]. We generated the curves for all images of the test set and calculated the AUC for each of them; however, for reasons of clarity, we present only four of them in Figure 8. Note that the precision is good at low recall in all four cases, i.e., the first region pairs returned are correct matches; however, as the number of returned region pairs increases, the precision decreases. The reduction in precision is more pronounced for CS-LBP.
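The evaluation protocol of Equations 9 and 10 can be sketched as follows (a minimal illustration; the function name and the list-based representation of candidate matches are our own):

```python
def precision_recall(distances, is_correct, M, t_d):
    """Precision and recall at distance threshold t_d (Equations 9-10).

    distances:  Euclidean descriptor distance of each candidate pair;
    is_correct: ground-truth flag per pair (overlap error below limit);
    M:          total number of true matches (the ground truth).
    """
    # pairs returned at this threshold (possible matches, n)
    returned = [c for d, c in zip(distances, is_correct) if d < t_d]
    n = len(returned)
    n_c = sum(returned)                    # correct matches among them
    precision = n_c / n if n else 0.0      # Equation 9
    recall = n_c / M if M else 0.0         # Equation 10
    return precision, recall
```

Sweeping t_d (equivalently, choosing n in [1, N]) traces out one recall versus precision curve, whose area under the curve is the AUC reported in the tables.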
As the number of returned matches n increases, CS-LMP achieves a greater number of correct matches n_c than CS-LBP. The AUC results are shown in Table 2 for the CS-LMP and CS-LBP descriptors. Note that for all images, CS-LMP has a greater AUC value than CS-LBP.

Figure 8: Recall versus precision curves for the CS-LMP and CS-LBP descriptors: (a) resid; (b) east park; (c) east south; (d) zmars.

Table 2: Area under curve (AUC)
Image       | CS-LMP | CS-LBP
resid       | 43     | 332
east park   | 758    | 842
east south  | 27     | 86
zmars       | 659    | 499
zmonet      | 659    | 364
axterix     | 577    | 237
belledonne  | 878    | 74
bip         | 889    | 495

Table 3 presents the recall, the precision and the n best returned correspondences for the two descriptors. For example, for the image named resid, taking into account the first 27 possible matches returned, we have 58 correct matches for the CS-LMP descriptor and 545 for the CS-LBP descriptor. The image east south presents a difference of 60 correct matches in favor of the CS-LMP, and for the image east park, our proposed approach surpasses the CS-LBP by 52 correct matches. Figure 9 presents the correct matches achieved between the image pair called resid, obtained to visually verify the test results of the two descriptors.

5. CONCLUSIONS
In this paper, we proposed a novel method for feature description based on local pattern analysis. The method combines the LMP operator and the CS-LBP descriptor: it uses a CS-LBP-like descriptor and replaces the step function by the sigmoid function. The CS-LMP feature has many properties that make it well suited for this task: a relatively short histogram and tolerance to rotation and scale changes. Furthermore, it does not require setting many parameters. The performance of the CS-LMP descriptor was compared to the CS-LBP descriptor in terms of recall versus precision curves. As CS-LMP
Table 3: Correct Matches
Descriptor | Image      | Recall | Precision | Correct Matches
CS-LMP     | resid      | 265    | 648       | 58/27
CS-LBP     | resid      | 76     | 37        | 545/27
CS-LMP     | east park  | 79     | 94        | 64/37
CS-LBP     | east park  | 343    | 57        | 589/37
CS-LMP     | east south | 266    | 95        | 745/55
CS-LBP     | east south | 52     | 55        | 685/55
CS-LMP     | axterix    | 78     | 69        | 276/385
CS-LBP     | axterix    | 366    | 99        | 266/385
CS-LMP     | belledonne | .      | 93        | 5/6
CS-LBP     | belledonne | 667    |           | 3/6
CS-LMP     | bip        | 32     | 538       | 476/728
CS-LBP     | bip        | 596    | 223       | 453/728
CS-LMP     | zmars      | 689    | .764      | 68/22
CS-LBP     | zmars      | 568    | .755      | 66/22
CS-LMP     | zmonet     | 74     | 827       | 47/52
CS-LBP     | zmonet     | 427    | 596       | 395/52

Figure 9: Matches achieved between the image pair called resid, out of a total of 360 correspondences: (a) CS-LMP, 356 correct matches; (b) CS-LBP, 350 correct matches.

captures the gray-level nuances of the images better, the recall versus precision curves show a better performance than the CS-LBP methodology. We can conclude that CS-LMP is a promising descriptor, with better performance than the CS-LBP descriptor. Among the 8 tested images, the AUC of the curve was greater for CS-LMP than for CS-LBP for all images. Moreover, the number of correct correspondences also outperformed that of the CS-LBP.

6. ACKNOWLEDGMENTS
The authors would like to thank the São Paulo Research Foundation (FAPESP), grant #2011/08645-2, and the National Council for Scientific and Technological Development (CNPq) for their financial support of this research.

7. REFERENCES
[1] A. E. Abdel-Hakim and A. A. Farag. CSIFT: A SIFT descriptor with color invariant characteristics. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2:1978-1983, 2006.
[2] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, New York, 1999.
[3] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. European Conference on Computer Vision, 1:404-417, 2006.
[4] I. A. G. Boaventura and A. Gonzaga. Edge detection in digital images using fuzzy numbers. International Journal of Innovative Computing and Applications, 2009.
[5] C. Chierici, R. T. Vieira, C. T. Ferraz, et al.
A generalized method for texture classification by local pattern analysis. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (submitted), 2013.
[6] D.-C. He and L. Wang. Texture unit, texture spectrum, and texture analysis. IEEE Transactions on Geoscience and Remote Sensing, 28:509-512, 1990.
[7] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with center-symmetric local binary patterns. 5th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2006), LNCS 4338:58-69, December 2006.
[8] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, 42:425-436, 2009.
[9] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. IEEE Conference on Computer Vision and Pattern Recognition, 2:506-513, 2004.
[10] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630, 2005.
[11] D. G. Lowe. Object recognition from local scale-invariant features. International Conference on Computer Vision, pages 1150-1157, 1999.
[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[13] K. Mikolajczyk and C. Schmid. Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63-86, 2004.
[14] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43-72, 2005.
[15] R. T. Vieira, C. E. de Oliveira Chierici, C. T. Ferraz, and A. Gonzaga. Local fuzzy pattern: A new way for micro-pattern analysis. Intelligent Data Engineering and Automated Learning (IDEAL 2012), LNCS 7435:602-611, 2012.
[16] R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. Third European Conference on Computer Vision,
Springer-Verlag, pages 151-158, 1994.
[17] G. Zhao, L. Chen, G. Chen, and J. Yuan. KPB-SIFT: A compact local feature descriptor. Proceedings of the ACM International Conference on Multimedia (MM '10), pages 1175-1178, 2010.