C-Green: A Computer Vision System for Leaf Identification

Chinmai Basavaraj
Indiana State University
cbasavaraj@sycamores.indstate.edu
December 16, 2014

Abstract

I describe here a system that uses automatic visual recognition to identify tree species from pictures of their leaves. I mainly describe the image processing techniques and computer vision concepts required to build this system. I have used OpenCV, an open source computer vision and machine learning software library, to separate the leaf image from a light-colored background and to extract features representing the curvature of the leaf's contour over multiple scales.

1 Introduction

C-Green is a system that automatically identifies tree species using computer vision and image processing. Computer vision is the process of modeling and replicating human vision using computer software and hardware. The work described here is closely related to Leafsnap [1], a mobile app that implements a more advanced and more recent version of this kind of leaf identification system. I have used a different and simpler approach for segmenting the leaf from its background [Section 2]. Apart from that, the basic idea is the same. I have implemented the system in C++ using OpenCV.

The system requires that a single leaf specimen is photographed on a solid light-colored background (see Figure 1). The recognition process consists of:

- Segmenting the image to obtain a binary image that separates the leaf from the background. I do this by estimating a threshold value that separates the background from the foreground and then dilating the resulting image to get rid of the stem. [Section 2]
- Extracting curvature features from the binarized image that discriminatively represent the shape of the leaf. I compute histograms of curvature over multiple scales using integral measures of curvature. [Section 3]
- Comparing the features to those from a labeled database of leaf images and returning the species with the closest matches.

Figure 1: Image of a leaf on a light-colored solid background

The current version of the system does not yet include the comparison module. Segmentation and extraction together complete within 1 second for an image of 640x480 pixels.

2 Segmentation

Segmentation is the process of separating the leaf from its background to obtain a binary (black and white) image. My program relies solely on the shape of the leaf to identify its species. Other features, such as the color of the leaf or its venation pattern, are not suitable for various reasons: they are too variable across different leaves of the same species, undetectable, or present only at limited times of the year. Reliable leaf segmentation is thus crucial for obtaining shape descriptions that are accurate enough for recognition.

2.1 Uniform Thresholding

I use a simple segmentation method called uniform thresholding. The idea is straightforward: if a pixel value is greater than a threshold value, it is assigned one value (for example white); otherwise it is assigned another (for example black). I read the image in greyscale format (see Figure 2), since the color information is not required. This makes the computation simpler and more efficient. I then go through every pixel, converting the image to binary (see Figure 3).

Figure 2: Image of leaf in greyscale
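The thresholding rule of Section 2.1 can be sketched in a few lines of plain C++. This is a minimal illustration rather than the system's actual code: the `GrayImage` struct and the `uniformThreshold` function are hypothetical names introduced here, the image is assumed to be 8-bit greyscale stored row by row, and the threshold is passed in by the caller because the text does not say how it is estimated.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for an 8-bit greyscale image stored row by row;
// 0 is black and 255 is white.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Uniform thresholding: every pixel brighter than `threshold` becomes white
// (255, the light background) and every other pixel becomes black (0, the leaf).
GrayImage uniformThreshold(const GrayImage& input, std::uint8_t threshold) {
    GrayImage output = input;
    for (std::uint8_t& p : output.pixels) {
        p = (p > threshold) ? 255 : 0;
    }
    return output;
}
```

With OpenCV, the same operation is available as `cv::threshold` with the `cv::THRESH_BINARY` flag; the loop above only makes the per-pixel rule explicit.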

Figure 3: Binary image of leaf after thresholding

2.2 Dilation - Removing the Stem

At this point, the stem of the leaf may or may not be present in the segmentation. The original leaf might not have had a stem to begin with, or the stem might have been lost during segmentation. To standardize the shape, I remove the stems from all segmentations using dilation. A dilation operation consists of convolving an image with a kernel, which can have any shape or size but is usually a square or a circle. The kernel has a defined anchor point, usually its center. The function dilates the image using the specified structuring element, which determines the shape of the pixel neighborhood over which the maximum is taken. When the leaf is black on a white background, taking the maximum expands the background and erases thin structures such as the stem (see Figure 4).

Figure 4: Image of leaf without stem after dilation

2.3 Resizing

The image has to be resized to a standard size in pixels before feature extraction begins. I use the OpenCV resize() function with bilinear interpolation for this step.

3 Extraction

Leaf shape can be effectively represented using multiscale curvature measures. Curvature is a fundamental property of shape and has thus attracted much attention from the vision community. I use integral measures to compute functions of the curvature at a boundary point.

One such measure in 2D is the area of intersection of a disk centered at a contour point and the inside of the contour (see Figure 5). For straight, concave, and convex boundaries, the fraction of the disk intersected will be equal to, greater than, or less than 0.5, respectively.

Figure 5: Curvature = (white pixels in circle) / (area of circle)

Integral measures are fast and easy to compute for images on discrete grids, invariant to rotation, insensitive to small segmentation and discretization errors, and independent of topological complexity.

3.1 Edge Detection

To find the points along the contour of the leaf, I use a basic edge detection method. I traverse the image matrix, comparing each pixel's intensity with that of its immediate neighbors. Any change in intensity indicates an edge. I mark all of these points (see Figure 6) and measure the curvature at each of them.

Figure 6: Image of points along the contour resulting from edge detection

3.2 Computing Histograms of Curvature over Scale [HoCS]

Histograms are simply counts of the data organized into a set of predefined bins. While measuring the curvature along the contour of the leaf, I construct the histogram of curvature for the current scale: once the curvature measure at a particular point is obtained, it is added to the corresponding histogram bin. In the end, this yields the histogram of curvature for that scale.
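The steps of Sections 3.1 and 3.2 can be sketched together for a single scale. The code below is an illustration under stated assumptions, not the system's implementation: `Mask`, `Point`, `contourPoints`, `diskFraction` and `curvatureHistogram` are hypothetical names, leaf pixels are assumed to be stored as 1 and background as 0, a leaf pixel counts as a contour point when one of its four immediate neighbours is background, pixels outside the image are treated as background, and each curvature value goes into exactly one bin (hard binning).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical binary mask stored row by row: 1 marks a leaf pixel, 0 background.
struct Mask {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> data;

    // Pixels outside the image count as background (an assumption).
    bool inside(int x, int y) const {
        return x >= 0 && y >= 0 && x < width && y < height &&
               data[static_cast<std::size_t>(y) * width + x] != 0;
    }
};

struct Point { int x; int y; };

// Basic edge detection: a leaf pixel is a contour point when at least one of
// its four immediate neighbours is background.
std::vector<Point> contourPoints(const Mask& m) {
    std::vector<Point> pts;
    for (int y = 0; y < m.height; ++y)
        for (int x = 0; x < m.width; ++x)
            if (m.inside(x, y) &&
                (!m.inside(x - 1, y) || !m.inside(x + 1, y) ||
                 !m.inside(x, y - 1) || !m.inside(x, y + 1)))
                pts.push_back({x, y});
    return pts;
}

// Integral area measure: fraction of the disk of the given radius (>= 0),
// centred on p, that is covered by leaf pixels.
double diskFraction(const Mask& m, Point p, int radius) {
    int total = 0, covered = 0;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx * dx + dy * dy > radius * radius) continue;  // outside the disk
            ++total;
            if (m.inside(p.x + dx, p.y + dy)) ++covered;
        }
    return static_cast<double>(covered) / total;
}

// Histogram of curvature at one scale: each contour point adds one count to
// the bin that contains its disk fraction. `bins` must be positive.
std::vector<int> curvatureHistogram(const Mask& m, int radius, int bins) {
    std::vector<int> hist(bins, 0);
    for (const Point& p : contourPoints(m)) {
        int b = static_cast<int>(diskFraction(m, p, radius) * bins);
        if (b >= bins) b = bins - 1;  // a fraction of exactly 1.0 goes in the last bin
        ++hist[b];
    }
    return hist;
}
```

The radius plays the role of the scale: calling `curvatureHistogram` with several radii gives one histogram per scale. Because the contour pixel itself belongs to the leaf, a perfectly straight edge yields a fraction slightly above 0.5 in this discrete version.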

Figure 7: Image showing how curvature is measured at each point

I repeat this process for multiple scales and then concatenate the resulting histograms to form the HoCS feature. Figures 8 and 9 show how strongly the HoCS feature differs between two leaf images of varying shape.

Figure 8: Histogram of curvature for a maple leaf

Figure 9: Histogram of curvature for an arbitrary leaf
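The concatenation step can be sketched as follows. The function names `buildHoCS` and `l1Distance` are hypothetical, and two choices here are assumptions that do not come from the text: each per-scale histogram is normalized to sum to 1 before concatenation, and the L1 distance is used only as one simple way to put a number on the difference visible between Figures 8 and 9.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Concatenates per-scale curvature histograms into one HoCS feature vector.
// Each histogram is divided by its total count first, so every scale
// contributes equally (this normalization is an assumption).
std::vector<double> buildHoCS(const std::vector<std::vector<int>>& perScale) {
    std::vector<double> feature;
    for (const std::vector<int>& hist : perScale) {
        const double total = std::accumulate(hist.begin(), hist.end(), 0.0);
        for (int count : hist)
            feature.push_back(total > 0.0 ? count / total : 0.0);
    }
    return feature;
}

// Sum of absolute differences between two features, compared over their
// common length. An illustrative choice of distance, not the system's.
double l1Distance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        sum += std::abs(a[i] - b[i]);
    return sum;
}
```

Two leaves of similar shape should give a small distance between their features, while leaves as different as those in Figures 8 and 9 should give a large one.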

4 Future Work

Comparison: The first thing needed to make the system complete is a comparison module. To promote further research in leaf recognition, Leafsnap [1] has released its extensive dataset, which consists of images of leaves taken from two different sources as well as their automatically generated segmentations. I can use this dataset to develop a web-based interface where the user uploads a picture of a leaf and a simple nearest neighbor approach displays the top 5 matches. Users can then make the final identification themselves.

Classifier: A binary classifier applied to gist features [2] could decide whether an image shows a valid leaf and is therefore worth processing further.

Advanced Thresholding and Segmentation: I can use more advanced adaptive thresholding and segmentation methods, such as Otsu's method.

Efficiency: Currently the system runtime is O(m*n), where m and n are the width and height of the image in pixels. The segmentation routine described above already makes use of multi-threading. I can also write a multi-threaded version of the code that computes the histograms of curvature over scale, which should improve the speed considerably.

5 References

[1] Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David W. Jacobs, W. John Kress, Ida C. Lopez, João V. B. Soares. Leafsnap: A Computer Vision System for Automatic Plant Species Identification. Proceedings of the 12th European Conference on Computer Vision (ECCV), October 2012.

[2] Adrian Kaehler, Gary Bradski. Learning OpenCV: Computer Vision in C++ with the OpenCV Library (2nd ed.). O'Reilly Media.