Binary Image Scanning Algorithm for Cane Segmentation


Ricardo D. C. Marin, Tom Botterill, Richard D. Green
Department of Computer Science, University of Canterbury, Christchurch
ricardo.castanedamarin@pg.canterbury.ac.nz

Abstract—In this paper we present a novel approach to vine cane segmentation. The method is robust to scale, invariant to rotation of the image by factors of 90°, and is fast to use on high-resolution images. We present a formal definition of a vine cane segment and describe an algorithm that finds an approximation of a full cane segmentation for a 2D vine image. We have called the algorithm Binary Image Scanning (BIS), since it scans a binary image collecting cross sections of vine pixels in both directions, horizontal and vertical. The two scans are processed to form cane segments, which are merged into a full cane segmentation. We show results and compare them to previous cane segmentation algorithms. We also discuss ideas to further extend our findings in the future.

I. INTRODUCTION

In this paper we describe a new method to perform the segmentation of canes in grape vine binary images (see Figure 1). The segmentation is to be used to infer the 2D tree structure of the vine, which in turn is to be used by a vine pruning robot to reconstruct a 3D model of vines. Modeling and extracting each cane segment separately will allow us to observe and analyze the vine in a high-level way. For instance, in Botterill et al. [1] we are currently researching bottom-up/top-down methods to parse vine images and recover the underlying 2D structure. Here the terminals of the parsing tree for a given input image would correspond to the cane segments, or furthermore to the pixels that form these segments.
Therefore, we are interested in a cane segmentation algorithm that is consistent between different vines, independent of scale and orientation, and efficient enough to be used on the frames of a video of vines. Our main contributions in this paper are, firstly, a formal definition of cane segments; secondly, the formulation of the vine cane segmentation problem; and finally, a constructive algorithm that approximates a solution to this problem. On the applications side, the resulting cane segments from our method have the desired properties for being used as primitives in a bottom-up/top-down vine structure parser.

The rest of this document is structured as follows. Section II reviews other research on vine modeling and on methods that are relevant to this topic. The main contributions of this paper, as described above, are presented in Section III. We show results of applying our method in comparison to other vine segmentation approaches in Section IV. Finally, Section V discusses ideas that could be used for improving our method in the future.

II. RELATED RESEARCH

This research is related to segmentation of binary images. In particular, cane segmentation has been researched before in Marin et al. [2] and Botterill et al. [1]. In [2] the idea is to automatically cluster together all vine pixels that belong to one cane by learning a mixture of Gaussians with a custom split-and-merge expectation maximization algorithm. The method works well at recognizing different cane segments. However, the cane segments in the same output segmentation are not consistent with a common definition: some of them may include overlapping sections or holes in between, while many others do not. Instead, our algorithm uses a formal definition of a cane segment (Section III-A). This ensures the consistency of the output, which we think is a desirable characteristic for using the output as primitives in a grammar for inferring the vine structure. In Botterill et al.
[1] the authors showed promising results of bottom-up parsing of cane segments. These segments are extracted by simplifying edge polylines using the Ramer-Douglas-Peucker algorithm [1]. Here a hierarchical bottom-up representation is carried from small cane edge segments, to full cane edges, then to cane parts, and finally to complete canes. The method makes use of an SVM to help decide which pairs of parts should be joined. Results were reported to be limited by the training procedure [1]. Other research on this technique points to looking at alternative primitives and making the parsing system more robust.

Our binary scanning algorithm and our cane segment definition are related to medial representations and skeletonisation methods [3]. Our resulting cane segmentation output can be seen as a medial representation of parts of the vines, plus information on width, orientation, and segment assignment per pixel. In contrast, skeletonisation methods do not retain any information apart from the skeleton points. Nevertheless, skeletons are a natural representation of articulated object shapes, and so have been used extensively in plant and tree modeling research. Approaches for constructing 2D skeletons from binary images are usually adaptations of morphological thinning algorithms, ridge finding, the medial axis transform, or hybrid methods [4]–[7]. Gittoes et al. [4], [8] performed a quantitative analysis of the use of these methods for modeling

vines. The authors found that, in the general case, connectivity and topology information is lost. Also, Lopez et al. [7] reported that when using skeletonisation for trees, user input refinement is often necessary for resolving overlaps and occlusions between different tree segments.

Fig. 1. From left to right: input vine image, vine binary map, and result of cane segmentation with our BIS algorithm (see Section III).

III. VINE IMAGE SEGMENTATION

In this section we describe a novel method for vine image segmentation. We start with a binary image I formed by vine (1) and non-vine (0) pixels. Each vine consists of many individual branches (canes). In turn, each cane is composed of one or more cane segments. It is useful to have a precise definition of what a cane segment is, since our goal is to identify all of them given I. Therefore, we start by giving some definitions related to vine image segmentation and then proceed to describe our methods.

A. Preliminaries

We define a cane segment S as a compact and connected set of vine pixel coordinates x, such that there exists a continuous and differentiable curve l ⊂ R^2 satisfying:

1) Maximality: There is no other cane segment Ŝ such that S ⊂ Ŝ.
2) Constant Width: S has approximately constant cross-sectional width (as defined below) along the curve l, up to some constant value δ.

Figure 2 (a) shows an example of a cane segment where the curve l corresponds to the line shown in red. For a given point p ∈ l there exists a local frame formed by the tangent and normal directions of the curve at that point. A cane cross section c_p is the intersection of the vine pixels with the normal line at p. These are shown in green in Figure 2 (a). In particular, when the normal lines are aligned with either (1, 0) or (0, 1), we say that the cross sections of the segment are aligned to the horizontal or vertical axis of the image, respectively.
In this case horizontal cross sections have pixels with a constant row value in the image, and vertical cross sections have pixels with a constant column value.

The width of a cross section c_p is defined as a value w_cp ∈ R such that ||x_i − x_j|| ≤ w_cp for all x_i, x_j ∈ c_p. Property 2 of the cane segment definition means that |w_cp1 − w_cp2| / max(w_cp1, w_cp2) < δ for all pairs of cross sections c_p1, c_p2 ∈ S.

The vine image segmentation problem can now be stated as: given the vine binary map I, find all cane segments S according to the definition written above. In the rest of this section we describe a method to approximate a solution to this problem.

B. Binary Image Scanning Algorithm

To find segments in I, we could start by obtaining a set of potential segment curves, and then iterate over cross sections along these curves to perform segmentation according to width changes. Instead of finding the set of curves directly, an alternative method is to construct cane segments as collections of cross sections with nearly constant width. Our approach is based on this observation and on the assumption that by taking only cross sections aligned to the image axes, horizontally or vertically, we are able to approximate real cane segments. Figures 2 (b) and (c) show the results of approximating the cane segment of Figure 2 (a) with cross sections in the horizontal and vertical directions. As we can see, the vertically aligned cross sections approximate the original cane segment well, whereas the horizontally aligned cross sections do not. Observe that in a vine image, cane segments aligned in any direction can appear. Therefore, our method has two main stages: first we compute a representation of the image with cane segments reconstructed horizontally and vertically; second we merge both results and discard bad approximations according to a criterion we will define.
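To make the definitions above concrete, the cross-sectional width and the constant-width test of Property 2 can be sketched in a few lines of code. This is our own minimal illustration, not the paper's implementation: the function names are ours, and an axis-aligned cross section is taken as a list of the column (or row) indices of its pixels.

```python
def cross_section_width(pixels):
    """Smallest w with |x_i - x_j| <= w for every pair of pixels in an
    axis-aligned cross section (given as a run of column indices)."""
    return max(pixels) - min(pixels)


def is_constant_width(widths, delta):
    """Property 2: the relative width difference of every pair of cross
    sections along the segment must stay below the constant delta."""
    return all(
        abs(wi - wj) / max(wi, wj) < delta
        for i, wi in enumerate(widths)
        for wj in widths[i + 1:]
    )
```

For example, widths [5, 5, 6] pass the test for δ = 0.5 (relative differences of at most 1/6), while [5, 20] fail (relative difference 0.75).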
1) Axis Aligned Cross Sections Scanning: In this stage, we construct cane segments with cross sections aligned in the horizontal and vertical directions. Since the scanning procedures in both directions are analogous, we explain here only the procedure for horizontal cross sections.

Algorithm 1 presents the pseudo-code of the horizontal cross section scanning procedure. We start by iterating through the rows of I. For a fixed row number r of I, there exist K_r cross sections that share that row number: C_r = {c_r^k : k = 1, ..., K_r}. This set can be found by simply iterating over all columns of I and detecting the changes from non-vine to

(a) Cane segment. (b) Cane segment approximation with horizontal cross sections. (c) Cane segment approximation with vertical cross sections.

Fig. 2. Cane segments with an associated curve l shown in red and cross sections shown in green. In (a) cross sections are found from the normal lines to l. In (b) and (c) the segments shown are approximations to the cane segment in (a), obtained by collecting cross sections aligned with the horizontal and vertical directions respectively. In (b) and (c) the associated curves are defined from the mean points of the cross sections.

Algorithm 1: Horizontal Cross Section Scanning
Data: Binary vine image I
Result: Collection of cane segments S_H = {S_hi} with horizontal cross sections
  Initialize S_H to an empty array;
  for r ← 1 to rows(I) do
    Find C_r;
    foreach c_r^k in C_r do
      if r == 1 then
        insert into S_H a new segment S with the single cross section c_r^k;
      else
        compute S* = argmin over S ∈ S_H of m_S, where m_S is the score of c_r^k belonging to S;
        if m_S* ≤ 3δ/2 + 1 then
          insert cross section c_r^k into S*;
        else
          insert into S_H a new segment S with the single cross section c_r^k;

Algorithm 2: Merging Cane Segments
Data: Collections of horizontal (S_H) and vertical (S_V) cane segments
Result: Merged collection of cane segments S_M
  Initialize S_M to an empty array;
  Prune S_H and S_V;
  foreach S_hi in S_H do
    Find the intersections I_ij of S_hi with the segments S_vj of S_V;
    if there are no intersections then
      insert segment S_hi into S_M;
      continue;
    Find j* = argmax over j of AR(I_ij);
    if AR(I_ij*) > T_AR then
      if S_hi is better than S_vj* then
        insert segment S_hi into S_M;
        tag S_vj* to not be included;
      continue;
    insert segment S_hi into S_M;
  foreach non-tagged S_vj in S_V do
    insert segment S_vj into S_M;

vine pixels and vice versa. After processing the first row, we have a set of segments S_H, each segment initialized with only one cross section. On subsequent rows, we match the current cross sections to every segment S in S_H. This is done by computing a compatibility score of the target cross section of the current row being part of S.
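Runnable sketches of both procedures may help make the pseudo-code concrete. The following is our own minimal Python approximation, not the paper's implementation: the function names and data layout (cross sections as inclusive column intervals, merged segments as pixel sets) are ours, the compatibility score follows the m described in the text (relative width difference plus mean-point dislocation, with matches restricted to the immediately preceding row so that segments stay connected), pruning is omitted, and `t_ar` is an illustrative threshold with "better" resolved by the segment-length criterion the paper also mentions.

```python
def find_cross_sections(row):
    """Find C_r for one image row: maximal runs of vine (1) pixels,
    returned as (start_col, end_col) inclusive intervals."""
    runs, start = [], None
    for col, v in enumerate(row):
        if v and start is None:
            start = col
        elif not v and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def score(sec, seg, r):
    """Compatibility score m of cross section `sec` (in row r) with
    segment `seg`: relative width difference to the segment's last
    cross section plus the dislocation of their mean points."""
    prev_r, (s2, e2) = seg[-1]
    if r - prev_r != 1:  # only the previous row keeps segments connected
        return float("inf")
    s1, e1 = sec
    w1, w2 = e1 - s1 + 1, e2 - s2 + 1
    width_term = abs(w1 - w2) / max(w1, w2)
    dislocation = (r - prev_r) + abs((s1 + e1) / 2 - (s2 + e2) / 2)
    return width_term + dislocation


def horizontal_scan(image, delta=2.0):
    """Algorithm 1: append each cross section to the best-matching open
    segment when m <= 3*delta/2 + 1; otherwise start a new segment."""
    segments = []
    for r, row in enumerate(image):
        for sec in find_cross_sections(row):
            best = min(segments, key=lambda s: score(sec, s, r), default=None)
            if best is not None and score(sec, best, r) <= 3 * delta / 2 + 1:
                best.append((r, sec))
            else:
                segments.append([(r, sec)])
    return segments


def to_pixels(segment):
    """Expand a scanned segment into its set of (row, col) pixels."""
    return {(r, c) for r, (s, e) in segment for c in range(s, e + 1)}


def area_ratio(seg_h, seg_v):
    """AR(I_ij): fraction of the horizontal segment's area (pixel count)
    covered by its intersection with a vertical segment."""
    return len(seg_h & seg_v) / len(seg_h)


def merge_segments(segments_h, segments_v, t_ar=0.5):
    """Merge step of Algorithm 2 (pruning omitted). A horizontal segment
    whose best overlap exceeds t_ar competes with that vertical segment;
    the larger one survives."""
    merged, dropped = [], set()
    for seg_h in segments_h:
        ratios = [(area_ratio(seg_h, sv), j) for j, sv in enumerate(segments_v)]
        best_ar, j = max(ratios, default=(0.0, None))
        if j is None or best_ar <= t_ar:
            merged.append(seg_h)          # exclusive region: keep both
        elif len(seg_h) >= len(segments_v[j]):
            merged.append(seg_h)          # seg_h wins; exclude the rival
            dropped.add(j)
        # otherwise seg_h loses and is discarded
    merged += [sv for j, sv in enumerate(segments_v) if j not in dropped]
    return merged
```

For example, a 3-row image containing a single vertical bar of width 3 produces one segment with three cross sections, while two separate 1-pixel bars in the same rows produce two segments.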
The algorithm stops when all the rows have been processed. As we can see, the core of the procedure lies in computing the score of how likely a cross section c_1 in the current row r_1 is to be part of a segment S. Denote by c_2 the last cross section of S, lying in row r_2. Then, in our method the score of c_1 being part of S is computed as the sum of two terms:

m(c_1, c_2) = |w_c1 − w_c2| / max(w_c1, w_c2) + d(c_1, c_2)

Since cane segments should have nearly constant width, compatible sections have the first term of this equation small, up to some threshold δ. On the other hand, the dislocation factor d should ensure that cross sections are close enough to form a connected set. For this, we compute the distance between the mean points p_c1 and p_c2 of sections c_1 and c_2 respectively, so that d(c_1, c_2) = ||p_c1 − p_c2||. In particular, we compute this distance as the sum of the absolute differences in rows and columns:

d(c_1, c_2) = |r_1 − r_2| + |col(p_c1) − col(p_c2)|

This allows us to threshold d(c_1, c_2) with a function of the width threshold δ. In fact, since we want connected segments, we should have r_1 − r_2 = 1. Furthermore, if we

want sections that are separated in columns by at most half of the width threshold δ, then |col(p_c1) − col(p_c2)| ≤ δ/2 and thus d(c_1, c_2) ≤ δ/2 + 1. Finally, this implies that m(c_1, c_2) < 3δ/2 + 1. In practice, δ can be treated as a parameter, but it should be kept small enough to ensure a consistent cross-sectional width in all segments. In our experiments we used δ = 2. Figure 3 shows both axis-aligned scans for a sample vine image.

Fig. 3. From left to right: input vine image, vine binary map, full horizontal cross sections, horizontal sections after pruning, full vertical cross sections, vertical sections after pruning, and merged cane segments.

2) Merging and Discarding Bad Approximations: Using the algorithm described in the previous subsection, two scans of the input I in the horizontal and vertical directions give collections of cane segment approximations S_H and S_V respectively. However, as we can see in Figure 2 and Figure 3, a particular segment may be better represented in one direction than in the other. Furthermore, we have to address the intersection of segments of S_H with those of S_V. Our goal is to derive a set of segments S_M that merges both kinds of segments and discards bad approximations.

Algorithm 2 presents the pseudo-code for our merging procedure. We start by pruning small and badly shaped segments from both sets S_H and S_V. For this, we use a size threshold T_l on the number of cross sections of a segment. In our experiments we found heuristically that T_l = 5 is good enough for vine images. We can also recognize bad approximation segments (such as the one in Figure 2 (b)) by analyzing the average width of the cross sections, W_S (the mean of w_cp over p ∈ l), and the arc length L_S of the curve l. We found that segments with W_S > L_S are in general bad approximations of the underlying region, whenever we want the curve l to follow the dominant orientation of the image region it covers¹. Figure 3 shows the before- and after-pruning cane segments for a sample vine image.

¹ Another method could be to compute the local orientation of all segment pixels and compare that to the orientation of l.

Fig. 4. Criteria for choosing between vertical and horizontal segments that intersect in a region of I. Here V and H stand for the vertical and horizontal directions respectively. If the average local orientation of that region is in the blue area (between 45° and 135°), then we choose the vertical segment. Otherwise we choose the horizontal segment.

After pruning the sets S_H and S_V, we proceed to merge the remaining cane segments. The main idea is to include in S_M all segments of S_H and S_V that exclusively describe regions of I. A fundamental issue here is vine regions where segments of both types intersect. To analyze these intersections, for a fixed segment S_hi we define the area ratio function AR(I_ij) = Area(I_ij) / Area(S_hi), where I_ij = S_hi ∩ S_vj for all S_vj ∈ S_V. The area ratio values can be used to determine how much of the region represented by S_hi is actually part of the intersections. Observe that on branches or overlapping parts of the vines,

some segments may intersect only in a small proportion, namely at the borders and/or border cross sections. We therefore use a threshold value T_AR = p such that, if the intersection area is less than p% of the segment S_hi, then both segments can still be included in the final S_M. On the other hand, if there is more than one S_vj for which the intersection covers at least p% of the fixed segment, we know that any of these models is as good as S_hi at representing the intersection region. However, we select only the model j* = argmax over j of AR(I_ij), and we need a criterion to decide which of the two models, S_hi or S_vj*, is better to include in S_M, excluding the other. This type of intersection usually happens in regions of I where the orientation of the vine pixels is around 45° or 135°, and is a consequence of using axis-aligned cross sections. In our experiments, we decide which segment is better according to the orientation of the vine region, as shown in Figure 4. Another efficient method is to look for the model that has the maximum length L_S; this means we prefer the model that covers the greater number of vine pixels. A hypothesis we have is that both criteria are equivalent, but we have no proof of this at the moment.

IV. RESULTS

Fig. 5. Results of our BIS algorithm for cane segmentation on vine images. Each cane segment is rendered as a line using the mean points of the first and last cross sections of the segment. Colors represent different cane segments.

Figure 5 shows results of applying the BIS algorithm to vine images. Here we render each cane segment by using the line connecting the mean points of the first and last cross sections of the segment. The algorithm is fast, taking between

10 ms and 1.5 s of execution time for images up to 2000 × 2000 pixels, on an Intel Xeon E3-1230 CPU. This is an improvement in cane segmentation efficiency compared to the method presented in [2], which was reported to be inefficient for big images with many branches. Our results are also consistent between different images, obtaining by construction the desired properties such as near-constant width of the cross sections. Also note how the segments' cross sections give a medial representation of the vine by parts, while at the same time keeping information on width, orientation, and segment assignment per pixel.

V. CONCLUSION AND FUTURE WORK

This paper described the Binary Image Scanning (BIS) algorithm, which performs cane segmentation of vine images. We have presented a formal definition of a cane segment. We have also formulated the cane segmentation problem and constructed an approximate solution using our scanning algorithm.

A natural direction of research is to develop a framework to analyze the error in approximating the ground truth cane segments according to our definition. In our method, different segmentations can be achieved by changing the parameters δ, T_l, and T_AR. Thus, such an error analysis could help in determining the set of parameters that yields the best possible solution using the BIS algorithm. Furthermore, alternative solutions to the cane segmentation problem stated in this paper may be researched.

On the applications side, a direct use of our cane segmentation method is recovering the 2D structure of the vine [1], [2]. In particular, the cane segments found by our scanning algorithm could be used as primitives for a bottom-up/top-down parsing system, or for an image grammar similar to that of [1]. We are also currently researching other connectivity methods; in fact, we already have promising preliminary results of connecting these cane segments using Gibbs sampling to recover full canes in 2D vine images.
Another application of this scanning algorithm is finding a simplified skeleton of thin and articulated objects in binary images. By construction, our method clusters together axis-aligned cross sections with similar width and minimal dislocation. The width parameter δ could even be manipulated to allow cross sections of objects that change their width considerably. The simplified skeleton is found as the curves resulting from the mean points of the cross sections of the segments. Beyond thinning objects in binary images, this skeleton may be used as parts in a hierarchy for analyzing the object from a high-level perspective. Yet another completely different application could be to use the segments as strokes in order to obtain a non-photorealistic rendering or sketch of the input image, similar to Mr. Painting Robot [9].

Improvements to our scanning algorithm could analyze in depth the intersection of cane segments in the two different orientations for the merging step. Though our method of choosing the longest segment may work well for vine images, for other objects it may be desirable to actually merge cross sections and allow segments to have both types of orientations. An interesting direction would be to perform two more scans at 45° and 135° orientations, and to analyze analytically the errors in approximating real cane segments with sets of axis-aligned cross sections. Furthermore, a detailed analysis of this error needs to be addressed for the procedure presented in this paper. Finally, though the current implementation proved to be fast and efficient even for large images, the algorithm could be optimized considerably and a parallelization of the method could be implemented.

REFERENCES

[1] T. Botterill, R. Green, and S. Mills, "Finding a vine's structure by bottom-up parsing of cane edges," in Image and Vision Computing New Zealand (IVCNZ), 2013 28th International Conference of, Nov 2013, pp. 112–117.
[2] R. Marin, T. Botterill, and R.
Green, "Split-and-merge EM for vine image segmentation," in Image and Vision Computing New Zealand (IVCNZ), 2013 28th International Conference of, Nov 2013, pp. 270–275.
[3] K. Siddiqi and S. Pizer, Medial Representations: Mathematics, Algorithms and Applications, 1st ed. Springer Publishing Company, Incorporated, 2008.
[4] W. Gittoes, T. Botterill, and R. Green, "Quantitative analysis of skeletonisation algorithms for agricultural modelling of branches," in Image and Vision Computing New Zealand (IVCNZ), 2011.
[5] C.-H. Teng, Y.-S. Chen, and W.-H. Hsu, "Tree segmentation from an image," in IAPR Conference on Machine Vision and Applications, 2005.
[6] L. D. Lopez, Y. Ding, and J. Yu, "Modeling complex unfoliaged trees from a sparse set of images," vol. 29, no. 7, 2010, pp. 2075–2082.
[7] L. D. Lopez, D. Shantharaj, H. B. Lu Liu, and J. Yu, "Robust image-based 3D modeling of root architecture," in Computer Graphics International, 2011.
[8] W. Gittoes, "Vine model fitting using a composite skeletonisation method," University of Canterbury, Tech. Rep., 2011.
[9] S. Williams and R. Green, "Mr. Painting Robot," in Image and Vision Computing New Zealand (IVCNZ), 2011 26th International Conference of, Nov 2011.