Extracting Urban Road Networks from High-resolution True Orthoimage and Lidar



Extracting Urban Road Networks from High-resolution True Orthoimage and Lidar Junhee Youn, James S. Bethel, Edward M. Mikhail, and Changno Lee Abstract Automated or semi-automated feature extraction from remotely collected, large-scale image data has been a challenging issue in digital photogrammetry for many years. In the feature extraction field, fusing different types of data to provide complementary information about the objects is becoming increasingly important. In this paper, we present a newly developed approach for the automatic extraction of urban area road networks from a true orthoimage and lidar, assuming the road network to be a semi-grid pattern. The proposed approach starts from the subdivision of a study area into small regions based on homogeneity of the dominant road directions from the true orthoimage. Each region's road candidates are selected with a proposed free passage measure. This process is called the acupuncture method. Features around the road candidates are used as key factors for an advanced acupuncture method called the region-based acupuncture method. Extracted road candidates are edited to avoid collocation with non-road features such as buildings and grass fields. In order to produce a building map for the prior step, a first-last return analysis and morphological filter are used with the lidar point cloud. A grass area thematic map is generated by supervised classification techniques from a synthetic image, which contains the three color bands from the true orthoimage and the lidar intensity value. Those non-road feature maps are used as a blocking mask for the roads. The accuracy of the result is evaluated quantitatively with respect to manually compiled road vectors, and a completeness of 80 percent and a correctness of 79 percent are obtained with the proposed algorithm on an area of 1,081,600 square meters.
Introduction New and innovative sensors are providing us with massive archives of Earth imagery, and the accumulation shows no sign of slowing. Yet, there is no corresponding stream of accurate, edited feature vector data derived from this imagery to populate and update all of the Geographic Information Systems which require it. The discrepancy between this staggering potential and the actual, meager production rates from cartographic compilers shows that we must increase our capabilities to exploit new imagery in a timely manner. Junhee Youn is with SAMSUNG SDS, KOREA, and was formerly with the School of Civil Engineering, Purdue University (junhee.youn@samsung.com). James S. Bethel and Edward M. Mikhail are with the School of Civil Engineering, Purdue University. Changno Lee is with the Department of Civil Engineering, Seoul National University of Technology, KOREA. Manual delineation of vector features from imagery is tedious, labor-intensive work. It requires skill, attention to detail, and a scrupulous commitment to quality and completeness. Yet, this type of work has been consigned to technician or assembly-line status in the industrial economies of the world. How then do we fill the void between the increasing need for cartographic vector data and the very limited pool of technicians capable of producing it? The answer must be improved algorithms and software implementations to take over some aspects of the manual feature extraction process. One of the principal vector features requiring extraction is the road network. Depending on the scale, roads can be extracted as a single centerline, a pair of parallel lines corresponding to edge of pavement, or a detailed delineation showing medians, turning lanes, irregular widths, etc. Algorithms for road delineation in rural settings have received much attention in the literature.
This may be because the rural road often presents the algorithm developer with a simpler detection and extraction problem than the urban road network. Reviewing the previous work on urban area road extraction, a general strategy for the detection problem is to restrict the search space, followed by generation of a detailed delineation with various algorithms. In Haverkamp (2002), a non-vegetation area is identified from multispectral imagery by using a spectral average, the difference between the red and green bands, and the ratio of the near-infrared to the spectral average. Within the non-vegetation area on the panchromatic imagery, the author rotates a rectangular template and determines an alignment position whose intensity variance is lowest. An approach to detect non-building regions from the DSM and to connect lane segments from imagery is presented in Hinz and Baumgartner (2003). They make use of extracted lane markings and detected cars. Roads are also detected by a combination of classification methods, DSM/DTM analysis, and comparison to an existing road database in Zhang (2004). On the detected roads, the author identifies the lane markings from color stereo imagery with a morphological filter. As a final step, he finds optimal paths among the road segment candidates by dynamic programming. Restricting the allowable road corridors by using an existing road database is also presented in Péteri and Ranchin (2003). They use the road database as an initial input, and extract the road edges with a snake algorithm, adding a parallel term to the internal energy. In Zhu et al. (2005), road versus non-road pixels are discriminated and classified by adopting morphological leveling. (Photogrammetric Engineering & Remote Sensing, Vol. 74, No. 2, February 2008, pp. 227-237. © 2008 American Society for Photogrammetry and Remote Sensing.) The

authors generate coarse road networks by finding straight parallel line segments corresponding to roads, and refine the networks with the operations of mathematical morphology. Research on urban road extraction which integrates or fuses various sensor data is rather limited compared to work using a single sensor. Tönjes et al. (1999) proposed a knowledge-based approach integrating information from various sensors (e.g., optical, thermal, SAR) for detecting urban roads. To represent knowledge from the multiple sources, the authors store such information in a semantic network serving as a basis for extraction. By comparing the expected attribute values of the object with the actual values obtained from the image, each object is hypothesized top-down, and internally evaluated to select the best interpretation from the network trees. In Price (2000), urban area street grids are extracted with a feature-based hypothesis and verification scheme. In that work, hypotheses for the road segments are matched with road edges, and the verification process uses local context and DEM data. In Zhu et al. (2004), road edges shadowed or occluded by tall buildings are detected using aerial imagery and lidar. The authors detect road edges from aerial imagery, extract the edges caused by tall objects using lidar, and link shadowed road segments with a spline-approximation algorithm. Hu et al. (2004) proposed the integrated processing of high-resolution aerial imagery and lidar for dense urban areas. The authors detect the road candidates by an iterative Hough transform algorithm. Vegetation areas are assigned by statistical pattern recognition techniques from the color image. The road network is then refined by topology analysis. In our approach, we adopt the strategy of multi-source data fusion. We present a new road extraction algorithm for urban areas using aerial imagery and lidar, with a preliminary step of generating a true orthoimage.
This approach does not make use of existing GIS data. Aerial imagery and the output data from light detection and ranging (lidar) are quite different. An important characteristic of a typical aerial image is its high spatial resolution, but obtaining reliable height information by stereo extraction is slow and labor intensive (in urban areas). On the other hand, lidar provides quite reliable height information, but its spatial resolution is typically lower than aerial imagery. Thus, the two sources are very complementary. Also, lidar point cloud processing from selected sensors can be enhanced with multiple return analysis. The paper is organized in four sections. The first section presents an image segmentation algorithm based on the dominant road directions. Second, the acupuncture method (with variations) and the region-based acupuncture method, a newly developed method for obtaining road candidates from a true orthoimage, are introduced. A filtering of the provisional road network using non-road features (buildings and vegetation) is described in the third section. Finally, results are presented and the strengths and limitations of our algorithms are discussed. Our algorithm is exercised with aerial imagery (and the resultant true orthophoto) and lidar data collected over the Purdue University campus, West Lafayette, Indiana. The ground sample distance of the true orthoimage is 12.5 cm, and the density of the lidar point cloud is 1.8 points/m². Orientation-based Image Segmentation The basic concept of this approach is to split the image into regions based on dominant road directions. In urban areas, most of the roads are straight and the road network often has a grid form, so the roads will exhibit dominant directions. Having the same grid pattern for the whole area is an ideal case. Certain parts of the area may have roads in north-south and east-west directions, and other parts may have roads in northeast-southwest and northwest-southeast directions.
Considering such varied grid patterns, a partitioning scheme is proposed. The parent image is successively subdivided into four child image blocks (regions) when the parent image covers an area with more than two dominant directions. Research on this topic has been described in Faber and Förstner (2000). In that paper, they detected orthogonal structures in a moving window and connected components of centers of windows with similar orientation. Because only orthogonal structures are considered, more than two dominant directions (i.e., north-south, east-west, northeast-southwest) are not extracted. We propose an algorithm which can extract the regions having more than two dominant directions. The process has three steps: extracting line segments, dominant direction detection, and image splitting with the quadtree data structure. To extract the straight line segments associated with the roads, we first apply the Canny operator. The resulting edge pixels are grouped, and each group is fitted to a line. If a group's edge pixels all lie substantially on a line, then the group is treated as a straight line. If not, the group is eliminated. Many of the line segments in urban scenes come from road edges, lane markings, building roof boundaries, and sidewalks. These are often parallel with the prevailing road network grid. However, some line segments may come from a building roof pattern or from ornamental planting structures which may not be parallel with the road, and we need to exclude them. Road edges always have an opposite, parallel edge, whereas lines in other categories do not necessarily have this parallel relationship. Therefore, by requiring a detected line to be in such a parallel relationship, we can exclude many non-road features in the scene. To implement a filter based on this criterion, we calculate each line segment's parameters, θ (orientation angle) and ρ (perpendicular distance).
If two segments' θ difference is less than a threshold angle, and the two segments' ρ difference is between a predefined minimum and maximum road width, then the two line segments are treated as road-width parallel. Road-width parallel line segments are retained in a filtered line segment image to be used for dominant direction inference. By this algorithm, we have excluded many strong linear features not associated with the road network. Figure 1 shows an example result of detecting all line segments and road-width parallel lines. Figure 1a presents a part of the image of the study area, Figure 1b shows all line segments detected, and the filtered, road-width parallel line segments are shown in Figure 1c. We recursively subdivide an image block (parent region) into four quadrant image blocks (child regions) if the parent region has more than one dominant road direction. To make the decision about splitting or not, we must calculate the number of dominant directions in the region of interest. For determining the dominant direction in the scene, several approaches in the literature have been studied. A common approach is to calculate each line's gradient and length, and to accumulate them into a histogram. The problem is selecting dominant directions from this histogram. Sohn and Dowman (2001) used a hierarchical histogram-clustering method to obtain dominant directions. They derived line-angle information and quantized it into a histogram. Peaks in this histogram correspond to dominant directions. We make use of a modified version of hierarchical histogram-clustering to determine the dominant directions. Based on the extracted road-width parallel line segments, the directions present in this set are weighted by segment length and accumulated into a histogram. The study area is shown in Figure 2a. The angle-length relationships for the whole study area are shown in Figures 2b and 2c.
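The road-width parallel test described above can be sketched as follows (a minimal illustration, not the authors' code; the segment representation and all threshold values are assumptions):

```python
def road_width_parallel_pairs(segments, angle_tol_deg=5.0, w_min=5.0, w_max=20.0):
    """Return index pairs of segments that are road-width parallel:
    nearly equal orientation (theta) and a perpendicular offset (rho)
    within a plausible range of road widths.  Each segment is a
    (theta_deg, rho) tuple; the numeric thresholds are illustrative."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            ti, ri = segments[i]
            tj, rj = segments[j]
            d_theta = abs(ti - tj) % 180.0
            d_theta = min(d_theta, 180.0 - d_theta)  # orientations wrap at 180 deg
            if d_theta < angle_tol_deg and w_min <= abs(ri - rj) <= w_max:
                pairs.append((i, j))
    return pairs
```

Only the surviving pairs feed the dominant-direction histogram, which is what removes roof patterns and other isolated linear clutter.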
Figure 2b is really a 2D scatter diagram showing the line

segment occurrences by direction and length. Figure 1. (a) Original image segment, (b) All detected line segments, and (c) The road-width parallel line segments. Figure 2. (a) Study area image, (b) Angle and length scatter diagram for road-width parallel line segments, and (c) 1D histogram format. Figure 2c shows the same data integrated into a 1D histogram with direction along the horizontal axis. From Figure 2a, we see, by visual inspection, that this region has two dominant directions (two pairs): approximately 0°, 90° and 45°, 135°. To implement this decision process in the algorithm, we use the hierarchical histogram-clustering approach. This approach minimizes within-cluster variance and maximizes between-cluster variance. The estimated angles resulting from this technique are 0.4°, 90.5° and 41.2°, 130.0°. The objective of image splitting in this paper is to partition an image into regions until all regions have a unique dominant direction (pair). To partition the image, we apply the quadtree data structure. The image is recursively subdivided until the divisions have a single dominant direction pair, or until a minimum size is reached. The quadtree concept is presented in Figure 3. Let the size of the entire image be m × n. First, calculate the entire image's dominant directions, and if the entire region has only one dominant direction, then image splitting is stopped. Otherwise, this region is subdivided into four disjoint quadrant regions, each with size m/2 × n/2. Second, if each quadrant has only one dominant direction, then splitting is stopped. Otherwise, each quadrant with multiple dominant directions is further subdivided into four disjoint quadrant regions with size m/4 × n/4. This process proceeds until each region has its own dominant direction or the region reaches a lower size limit. The size limit is selected as a city block dimension. Figure 4a shows the region segmentation result.
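The recursive splitting described above can be sketched as follows (a simplified stand-in: a peak-ratio histogram replaces the hierarchical histogram clustering, orthogonal peaks are counted individually rather than grouped into direction pairs, and all parameter values are assumptions):

```python
import numpy as np

def dominant_direction_count(segments, bin_deg=5, peak_ratio=0.3):
    """Count dominant directions from a length-weighted angle histogram.
    Bins holding at least peak_ratio of the strongest bin count as peaks;
    a crude substitute for the hierarchical histogram clustering used in
    the paper.  Each segment is a (theta_deg, length) tuple."""
    hist = np.zeros(180 // bin_deg)
    for theta_deg, length in segments:
        hist[int(theta_deg % 180) // bin_deg] += length
    if hist.max() == 0:
        return 0
    return int((hist >= peak_ratio * hist.max()).sum())

def quadtree_split(x, y, w, h, count_directions, min_size):
    """Recursively split region (x, y, w, h) into quadrants until each
    region reports at most one dominant direction or hits the size limit
    (chosen as roughly a city-block dimension)."""
    if w <= min_size or h <= min_size or count_directions(x, y, w, h) <= 1:
        return [(x, y, w, h)]
    hw, hh = w // 2, h // 2
    regions = []
    for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
        regions += quadtree_split(x + dx, y + dy, hw, hh, count_directions, min_size)
    return regions
```

In practice `count_directions` would gather the road-width parallel segments falling inside the region and call `dominant_direction_count` on them.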
The lines in the rectangles show the regions' dominant directions in Figure 4a. Because we assume roads are straight, the dominant directions for curved roads are not detected. By using the proposed algorithm, directional information of road candidates in each region is obtained. Road Candidate Extraction Using True Orthoimage The Designation of Road Candidates with the Acupuncture Method Each region from the quadtree segmentation has its own dominant road direction and, as mentioned above, those directions have been filtered, so they are likely parallel with the road network. For the next step, imagine two virtual needles aligned with the dominant directions (needle m and needle n in Figure 5) that penetrate the two-dimensional edge image. We want to quantify whether a needle crosses edges or passes through voids. A free passage measure is defined to realize this concept. Specifically, a needle which meets many edges will have a small free passage measure, whereas a needle which meets few or no

edges will have a large free passage measure. Figure 3. Concept for quadtree image splitting. Figure 4. (a) Image segmentation result based on dominant road direction, and (b) Road candidates determined by the acupuncture method. A color version of this figure is available at the ASPRS website: www.asprs.org. Figure 5. Penetrating needles on the binary edge image. A color version of this figure is available at the ASPRS website: www.asprs.org. Figure 5 illustrates the needle piercing process through an edge image. Comparing needle m, which penetrates the middle of city blocks, and needle n, which passes through a road void, we can see that needle m meets more edges than needle n. As a consequence, needle n has a larger free passage measure, and the position of needle n would be a more probable location of a road. Because of the analogy with the needle, this process is called the acupuncture method. Expressing the needles as line equations on the edge image, aligned with the dominant directions, and stepping exhaustively across the edge image, we can compute a free passage measure for each candidate line. A candidate line can be expressed by its polar coordinate components: ρ is the perpendicular distance measured from the origin, and θ is an angle from the reference direction. The free passage measure is computed by the following steps. For each line, i, overlay

with each edge, k. The total number of coincident pixels (C_i) will be an indicator of obstacles encountered by that line, i. A large number of such obstacles will indicate that it is less likely to be a road feature, whereas a small number of obstacles will mean that it is more likely to be a road feature. The number of line/edge coincident pixels for line i is: C_i = Σ_k (P_i ∩ E_k) (1) where E_k is an edge pixel, and P_i is a line pixel. Finally, we characterize the degree to which the line is free from obstructing edges as: F_i = [N(P_i) - N(C_i)] / N(P_i) × 100, (0 ≤ F_i ≤ 100) (2) where F_i is the free passage measure for line i. A high value (near 100) of F_i is an indicator of a road. Repeating this process for each line, we can make a graph of the free passage measure vs. line number i. Peaks in this graph will become road candidates. To determine the several positions which represent the locations of the roads from the graph, we first choose the highest peak, and the corresponding ρ parameter is selected as the location of a road. Once the highest peak is obtained, we apply a proximity filter to the graph. The proximity filter eliminates lines within a threshold of the peak in the graph; therefore, it eliminates redundant lines in the vicinity of the peak. This process is continued until all peaks have been accounted for. With the selected θ and ρ parameters, we extract the road candidates in the selected region. Figure 6 shows an example sub-scene for applying the acupuncture algorithm. Figure 6a presents a part of the test image, with edges detected by the Canny algorithm shown in Figure 6b. Figure 7 is an example of applying the acupuncture algorithm to the same region. Figures 7a, 7b, and 7c are for the approximately horizontal direction, and 7d, 7e, and 7f are for the approximately vertical direction. Figures 7a and 7d show the graph that represents the relationship between line number and the free passage measure.
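The free passage computation (Equations 1 and 2) and the proximity filtering can be sketched as follows (a minimal sketch, not the authors' implementation; the rasterized needle representation, the acceptance threshold, and the gap value are assumptions):

```python
import numpy as np

def free_passage(edge_img, line_pixels):
    """Free passage measure (Equation 2): the percentage of a needle's
    pixels that do NOT coincide with edge pixels.  Counting the
    coincidences is Equation 1 (C_i)."""
    rows, cols = zip(*line_pixels)
    c_i = int(edge_img[list(rows), list(cols)].sum())  # C_i = sum over k of (P_i ∩ E_k)
    n = len(line_pixels)                               # N(P_i)
    return 100.0 * (n - c_i) / n

def pick_peaks(measures, min_gap, threshold=50.0):
    """Greedily select peaks of the free passage graph, suppressing
    (proximity-filtering) lines within min_gap of each chosen peak."""
    active = [True] * len(measures)
    peaks = []
    while True:
        best_i = None
        for i, v in enumerate(measures):
            if active[i] and v > threshold and (best_i is None or v > measures[best_i]):
                best_i = i
        if best_i is None:
            break
        peaks.append(best_i)
        for i in range(max(0, best_i - min_gap), min(len(measures), best_i + min_gap + 1)):
            active[i] = False
    return peaks
```

A needle crossing a single edge pixel out of five line pixels, for example, gets F_i = 80, while a needle buried in building edges scores near zero.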
The peaks determined with the proximity filter are shown in Figures 7b and 7e. Overlaid lines in Figures 7c and 7f represent the extracted road candidates. Applying the complete acupuncture method to the study area yields the results shown in Figure 4b. Region-based Acupuncture Method The basic idea of the acupuncture algorithm is that lines on the road in the 2D edge image space encounter fewer edge pixels than non-road lines. However, in the case that the candidate lines pass through a simple roof or a vegetation area, only a small number of edges will be encountered by the needles, and such lines can then be incorrectly detected as roads. The main defect of the acupuncture method is that it only considers one class of obstacles, and not the features around the line, or context. To overcome this limitation of the analysis, we use the features (i.e., parallel pairs) near the road candidate lines. By doing so, we introduce a two-dimensional analysis into the road detection. Parallel pairs overlapping in a direction perpendicular to the lines are often used as strong evidence of road segments (Heipke and Multhammer, 1995; Eckstein, 1995; Zhang, 2004). Among the road-width parallel line segments extracted in the previous section (see Figure 1), the parallel pairs occur in three arrangements as shown in Figure 8, noted as cases I, II, and III. In Figure 8, bold lines are road-width parallel line segments extracted in the previous section, and the shaded rectangles are directional rectangles, which we need for the region-based acupuncture method. In cases I and II, the two line segments are perfectly or partly overlapped when projected onto one another, and are inferred to be parallel road edges. However, in case III, the two segments are not overlapped and are less likely to be paired road edges. So, we discard case III.
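The case I/II versus case III distinction reduces to a one-dimensional interval-overlap test along the shared direction; a minimal sketch (the interval representation is an assumption):

```python
def overlaps_when_projected(seg_a, seg_b):
    """Return True if two parallel segments overlap when projected onto
    their common direction (cases I and II in Figure 8); pairs that do
    not overlap (case III) are discarded.  Each segment is given as its
    (start, end) interval along that direction."""
    (a0, a1), (b0, b1) = sorted(seg_a), sorted(seg_b)
    return max(a0, b0) <= min(a1, b1)
```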
After finding the road-width parallel pair, the two lines are extended to a common length, and the four end points of the two lines comprise a rectangle that represents the road segment. Because the rectangle has direction information obtained from its orientation, we call this construct a directional rectangle. The collection of directional rectangles is rasterized for overlaying with the prior road candidates to aid in further editing and refinement. The editing takes the following form. If a road candidate line coincides with directional rectangles and the road candidate line's angle is close to the directional rectangle's angle, then it is positive evidence for the presence of a road. But if the road candidate does not coincide with a directional rectangle or the angle attributes disagree, then the evidence does not suggest the presence of a road. Figure 6. Example scene: (a) Black-and-white aerial image, and (b) Edges detected by the Canny algorithm. We quantify this concept with a parameter named Road Positive

with Parallelism (RPP). Figure 7. Result of acupuncture method on a small example region: (a) through (c) are for the first direction, and (d) through (f) are for the second direction of the pair. (a) and (d) Free passage measure, (b) and (e) Proximity filtering result, and (c) and (f) Detected lines on the road. A color version of this figure is available at the ASPRS website: www.asprs.org. Figure 8. Schemes for extending parallel segments and generating the directional rectangle. RPP is calculated, using the directional rectangle image, as the length of line that coincides with same-direction rectangles minus the length of line that coincides with other directional rectangles. Figure 9 shows an example of RPP. In Figure 9, there are two road candidates, denoted as lines 1 and 2, and the assumed angles of the two lines are 0°. The dotted rectangles represent the directional rectangles. Counting the coincidences, line 1 overlays six 0° rectangles and one 90° rectangle. Line 2 overlays only one 90° rectangle. Because the angle of the lines is 0°, RPP for line 1 is calculated as 6 - 1 = 5, and RPP for line 2 is 0 - 1 = -1, where the length of a rectangle is assumed to be 1. A positive value for RPP is an indicator of a road; a negative value is a counter-indicator. With this new information, the acupuncture method is extended and restated as follows. First, we calculate F_i (the free passage measure, Equation 2) for each line, and make a graph. The position parameter, ρ, corresponding to the highest peak in the graph is selected as a road candidate. With the direction θ and ρ, a line equation is made, and RPP is calculated. If the RPP is positive, then such a line is accepted as a road; if not, then we take the next highest peak from the graph. The next candidate line's RPP is also examined. The revised process, called the region-based acupuncture method, is summarized in a flow diagram in Figure 10.
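The RPP computation can be sketched as follows (a minimal sketch; the rasterized directional-rectangle representation and the angle tolerance are assumptions):

```python
import numpy as np

def rpp(rect_angle, line_pixels, line_angle_deg, angle_tol_deg=10.0):
    """Road Positive with Parallelism: each pixel of a candidate line that
    falls inside a directional rectangle of similar orientation counts +1,
    each pixel inside a differently oriented rectangle counts -1.
    rect_angle is a raster holding the rectangle angle per pixel, NaN
    where no rectangle is present."""
    score = 0
    for r, c in line_pixels:
        a = rect_angle[r, c]
        if np.isnan(a):
            continue  # off-rectangle pixels contribute nothing
        d = abs(a - line_angle_deg) % 180.0
        d = min(d, 180.0 - d)
        score += 1 if d <= angle_tol_deg else -1
    return score
```

With the paper's Figure 9 example, a 0° candidate crossing six 0° rectangles and one 90° rectangle scores 6 - 1 = 5 and is accepted.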
Applying the complete region-based acupuncture method to the study area yields the results shown in Figure 11a. Refinement of Road Features Using Lidar and Spectral Analysis The prior road extraction results still need to be refined by exclusion editing with some obvious non-road features. In the scene under study, two readily detectable non-road features are buildings and grass. We first describe detection of buildings with lidar, then detection of grass by spectral analysis, and finally the exclusion editing of the provisional road data using these two blocking masks. The building extraction process is performed by separating the lidar point cloud into ground and non-ground points, and eliminating tree points from the non-ground category. To separate the lidar points into ground and non-ground points, a morphology filter (e.g., minimum filter) is a method commonly referenced in the literature. In minimum filtering, one passes a window over the image and replaces the elevation with the lowest value occurring in the moving window. Assuming that a ground point's height is lower than that of its neighboring object points, and assuming that the ground is reasonably horizontal, the terrain height is determined as the lowest height among neighboring points. Applying this process to the whole area, the composite of all such terrain points is an approximation to the DTM. We call it the near DTM. After making the near DTM, one calculates the differences

Figure 9. Examples of calculating the RPP: (a) directional rectangles, (b) 0-degree rectangles, and (c) 90-degree rectangles.

Figure 10. Flow chart for the region-based acupuncture method.

between the near DTM and the DSM. If a point's difference is more than a threshold, then that point likely belongs to either a building or a tree. Therefore, making the near DTM as close as possible to the real DTM is a key issue for this approach. Some interesting issues had to be addressed, such as the presence in the study scene of metal utility grates (permitting laser ranges to indicate a surface several meters too low). These effects would corrupt the minimum filtering result without special attention, which involves the judicious (and sequential) application of both minimum and maximum filtering. Window sizes for the maximum and minimum filtering are chosen in accordance with the largest building size and the utility grates expected in the area. To discriminate the ground versus the non-ground points, we compare the DSM with the near DTM. Points whose DSM height is higher than the near DTM by a chosen threshold value are determined to be above-ground objects in the building-tree map. Then, all components in the building-tree map are labeled with 8-connectivity to determine the contiguous components. Groups with pixel counts less than a selected minimum are eliminated, since such small features are more likely a car or other small object. If this has been successful, we now need only to exclude trees, and we will be left with a building map. To detect the trees and generate the tree map, we adopt a first-last return height analysis (Alharthy and Bethel, 2003). The building-tree map from the previous step is refined by excluding the tree points found in the multiple-return analysis. Figure 11b shows the result of building extraction derived by the preceding algorithm.
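The thresholding, 8-connected labeling, and small-component removal can be sketched as follows. The `height_thresh` and `min_pixels` values here are illustrative assumptions (the paper states its actual thresholds, such as the 25 m² minimum component size, separately).

```python
# Illustrative sketch of the building-tree map: threshold DSM minus near DTM,
# label 8-connected components, and drop components below a minimum pixel
# count (small objects such as cars). Thresholds are assumed values.

from collections import deque

def building_tree_map(dsm, dtm, height_thresh=2.5, min_pixels=3):
    rows, cols = len(dsm), len(dsm[0])
    mask = [[dsm[r][c] - dtm[r][c] > height_thresh for c in range(cols)]
            for r in range(rows)]
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                next_label += 1
                comp, queue = [], deque([(r, c)])
                labels[r][c] = next_label
                while queue:                          # flood fill one component
                    rr, cc = queue.popleft()
                    comp.append((rr, cc))
                    for dr in (-1, 0, 1):             # 8-connected neighbors
                        for dc in (-1, 0, 1):
                            nr, nc = rr + dr, cc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and mask[nr][nc] and labels[nr][nc] == 0):
                                labels[nr][nc] = next_label
                                queue.append((nr, nc))
                if len(comp) < min_pixels:            # too small: likely a car
                    for rr, cc in comp:
                        mask[rr][cc] = False
    return mask
```

The surviving mask still mixes buildings and trees; the first-last return analysis then removes the tree pixels to leave the building map.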
Since the true orthoimage we use is, by definition, orthorectified and georeferenced, the locations of features in the true orthoimage and the lidar are expected to be consistent. For this portion of the analysis, we make a new synthetic image (color bands from the orthoimage plus lidar intensity) and perform a classification to extract the grass areas. There are many classification approaches using color or multispectral imagery; however, classification using a synthesized image (orthorectified color image and lidar intensity image) appears to be a novel concept. In our case, the lidar intensity is in the near-infrared, so it makes a good complement to the RGB from the optical image. Researchers have tried before to use an intensity response from lidar for road or building extraction (Alharthy and Bethel, 2003; Hu, 2003; Rottensteiner et al., 2003). Also, an approach to separate roads from vegetation with three-band image classification is presented in Hu et al. (2004). From an examination of the literature, we draw the following conclusions. First, intensity values from lidar can be used for feature extraction with a thresholding method or with classification. Second, both three-band imagery and lidar intensity data have similar characteristics from the point of view of the present classification problem, i.e., separability between road and vegetation is high. A color true orthoimage has three bands, and lidar data has position and intensity values. Considering the intensity value as another image after resampling to a raster, we can make a synthetic image that has four bands (i.e., R (red), G (green), B (blue), and lidar intensity). Because lidar systems typically operate in the infrared part of the electromagnetic spectrum (Rottensteiner et al., 2003), we expect the fused image to be like a four-band (R, G, B, infrared)

Figure 11. (a) Extracted roads by the region-based acupuncture method, and (b) Generated building map using the proposed algorithm. A color version of this figure is available at the ASPRS website: www.asprs.org.

multispectral image. Before collecting sample data, five classes are chosen: asphalt, grass, concrete, brick, and tree. Next, training and test sample data are collected for each class. With the training and test data sets, we apply the maximum likelihood classifier using the software MultiSpec. Table 1 shows the error matrix, or confusion matrix, from the classification. In the table, Accuracy denotes 100 minus the percent omission error, also called the producer's accuracy or reference accuracy. Accuracy* denotes 100 minus the percent commission error, also called the user's accuracy or reliability accuracy. After performing the classification, we obtain a labeled thematic map with five classes. From the five classes, we choose the one corresponding to grass and make a binary image for the grass map. In this map, we remove small pixel groups, considering them to be noise. The thematic map is shown in Figure 12a, and Figure 12b presents the map for grass only. As the final step, we refine the prior road extraction results with the blocking or exclusion masks just obtained for buildings and grass. It is quite natural that roads cannot pass over a building or a grass area. For each candidate line, we generate the set of pixel coordinates along the line. Comparing this set with the raster exclusion masks, if a road pixel coincides with either mask, that pixel is eliminated. After this step, short lines are eliminated. Lastly, disparate road segments may be joined if they appear to represent a single road; this is done by comparing line parameters and gap widths. The final, extracted road network is shown in Figure 13.
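The exclusion editing step can be sketched as follows, with the building and grass masks represented as sets of pixel coordinates and hypothetical names throughout; the short-line threshold is an assumed value.

```python
# Sketch (assumed names) of exclusion editing: drop candidate road pixels
# that fall on the building or grass mask, then discard short remnants.

def refine_roads(candidate_lines, building_mask, grass_mask, min_length=3):
    """candidate_lines: list of pixel-coordinate lists; masks: sets of (row, col)."""
    blocked = building_mask | grass_mask     # union of the two exclusion masks
    refined = []
    for line in candidate_lines:
        kept = [px for px in line if px not in blocked]
        if len(kept) >= min_length:          # eliminate short leftover pieces
            refined.append(kept)
    return refined

# A 6-pixel candidate losing two pixels to a building and one to grass:
line = [(0, c) for c in range(6)]
building = {(0, 4), (0, 5)}
grass = {(0, 0)}
print(refine_roads([line], building, grass))   # -> [[(0, 1), (0, 2), (0, 3)]]
```

Joining disparate segments by comparing line parameters and gap widths would follow this step; it is omitted here for brevity.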
Assigned Constants Including Thresholds

The constants, including thresholds, assigned in the previous sections are as follows. When finding the road-width parallel line segments, the separation of segments must lie between the minimum and maximum road width, and the threshold angle between segments is chosen as 2°. The threshold angle from the peak in the hierarchical histogram clustering is 22.5°. When applying the proximity filter during the acupuncture method, the threshold from the peak is chosen as two-thirds of the minimum size of a city block. The window size chosen for minimum filtering corresponds to a 150 m × 150 m area, which is the maximum size of a building, and that for maximum filtering is a 5 m × 5 m area, corresponding to the size of the metal utility grates. The minimum size of components to be eliminated when generating the building map is selected as 25 m², corresponding to the maximum size of a car. For joining disparate road segments, the gap width must be less than the minimum size of a city block.

TABLE 1. CLASSIFICATION RESULT WITH FOUR-BAND SYNTHETIC IMAGE

                               Assigned Class
Actual Class    Asphalt   Grass   Concrete   Brick   Tree    Total   Accuracy (%)
Asphalt           15755     310       2327     207   3122    21721           72.5
Grass                 2    4601          4     116   1010     5733           80.3
Concrete             96       6       4563      46     10     4721           96.7
Brick                33       0         34    3836      8     3911           98.1
Tree                122      22         22      16   2326     2508           92.7
Total             16008    4939       6950    4221   6476    38594
Accuracy (%)*      98.4    93.2       65.7    90.9   35.9

Overall class performance (31081/38594) = 80.5%

Figure 12. (a) Thematic map after classification, and (b) Extracted grass map. A color version of this figure is available at the ASPRS website: www.asprs.org.

Figure 13. Extracted road network refined by lidar and spectral analysis. A color version of this figure is available at the ASPRS website: www.asprs.org.

Evaluation

In order to quantitatively evaluate this result, we manually digitized the road network (road centerlines and road segments) from the true orthoimage. We call this the reference data set. We then devised a way, in the raster domain, to compare the correspondence between the derived result (from the algorithm) and the reference data. In the counting units, pixels, a true positive (TP) is where the derived result coincides with the reference result. A false positive (FP) is where there is a road pixel in the derived result that is not in the reference data. A false negative (FN) is where there is a road pixel in the reference data that is not present in the derived result. We use two conventional quality metrics:

Completeness = TP/(TP + FN)
Correctness = TP/(TP + FP)

Some would term completeness the producer's accuracy, and correctness the user's accuracy. For our algorithm we obtained a completeness of 80 percent and a correctness of 79 percent. The geometric accuracy of the TP pixels is expressed as the root mean square (RMS) difference between the reference road centerlines and the derived results classified as TP. Wiedemann (2003) and Heipke et al.
(1997) obtained the RMS for the TP by calculating the shortest distance between extracted pieces and reference pieces, and the value of the RMS depends on the buffer width. Unlike Wiedemann (2003) and Heipke et al. (1997), we calculate the shortest distance at all TP-classified pixels and do not use a buffer width for the RMS:

RMS = √( Σ_{i=1}^{N} d(der_i, ref)² / N )

where N is the number of pixels classified as TP, and d(der_i, ref) is the shortest distance between the i-th derived pixel classified as TP and the reference data (road centerline). For our algorithm we obtained an RMS of 2.32 meters; the average distance between the derived and reference road centerlines is less than one traffic lane.
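A sketch of these evaluation metrics under the stated definitions follows. The tolerance `tol` that decides whether a derived pixel counts as a TP is an assumption of this sketch (the paper digitizes reference centerlines and computes shortest distances without a buffer width); the completeness and correctness follow the Wiedemann-style matched-pixel definitions.

```python
# Hedged sketch of the evaluation: completeness, correctness, and RMS of
# shortest distances from TP-classified pixels to the reference centerline.
import math

def road_metrics(derived, reference, tol=1.5):
    """derived/reference: sets of (row, col) road pixels."""
    def nearest(p, pts):
        return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts)

    # Derived pixels within tol of the reference are the TPs; keep distances.
    tp_dists = [nearest(p, reference) for p in derived
                if nearest(p, reference) <= tol]
    matched_ref = sum(1 for q in reference if nearest(q, derived) <= tol)
    completeness = matched_ref / len(reference)        # matched reference share
    correctness = len(tp_dists) / len(derived)         # matched extraction share
    rms = (math.sqrt(sum(d * d for d in tp_dists) / len(tp_dists))
           if tp_dists else 0.0)                       # no buffer width
    return completeness, correctness, rms

# Toy case: three derived pixels one row off the reference, one stray pixel.
derived = {(0, 0), (0, 1), (0, 2), (5, 5)}
reference = {(1, 0), (1, 1), (1, 2)}
print(road_metrics(derived, reference))   # -> (1.0, 0.75, 1.0)
```

On real rasters the brute-force nearest-distance search would be replaced by a distance transform of the reference centerline image; the definitions are unchanged.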

The majority of the FN occurs on roads covered by tree canopy, and a few curved roads also increase the FN. Since a tree canopy generates many edge pixels and obscures the road edges, roads under trees are not included in the candidates and cannot be recovered during the refining process. Because of the assumption that roads are straight, curved roads are not extracted. Such undetected roads reduce the achieved completeness. The presence of walkways and parking lots decreases the correctness. The study area is a university campus, and the widths of the walkways are wide enough for cars. Also, the materials of the pathways and parking lots are asphalt or concrete, similar to the roads; therefore, it is not easy to discriminate them with the proposed algorithm. It is difficult to directly compare the accuracy obtained from our proposed algorithm with previously published road extraction results, because the tested images are collected from different geographical areas with various sensors. Also, accuracy is influenced by the selected thresholds and by the initial information in semi-automatic approaches. Mayer et al. (2006) have compared various road extraction systems and noted that fair comparison of the tested systems is difficult due to the complexity of practical environments. Therefore, we assess the quality of the proposed algorithm by comparison with previously published results, especially those for geographical areas similar to this study (i.e., urban areas with more than one grid pattern), in terms of completeness and correctness. Hinz and Baumgartner (2003) reported a completeness of 75 percent and a correctness of 95 percent from aerial imagery, a DSM, and an existing GIS database. In Negri et al. (2006), completeness and correctness of up to 88 percent and 86 percent, respectively, were obtained from SAR imagery. Our result is comparable to the values derived from lidar data by Clode et al.
(2007), which is 81 percent in completeness and 80 percent in correctness. A completeness of 76 percent and a correctness of 54 percent from Ikonos imagery are shown in Shackelford and Davis (2003). With aerial imagery, Zhang and Couloigner (2006) presented a completeness of 51 percent and a correctness of 49 percent. From the published results, it can be seen that the overall quality of the proposed algorithm lies between the better results of Hinz and Baumgartner (2003) and Negri et al. (2006) and the qualities achieved by Shackelford and Davis (2003) and Zhang and Couloigner (2006).

Conclusions

This paper presents an integrated collection of techniques to extract the urban road network from a high-resolution true orthoimage and lidar. We segment an urban scene based on the dominant road directions by edge analysis on a true orthoimage. A new road extraction algorithm named the acupuncture method (with variations) and a region-based acupuncture method were developed to generate road candidates. The generated road candidates were then refined by eliminating candidates that overlay the blocking areas produced from lidar. A completeness of 80 percent and a correctness of 79 percent are obtained by the proposed algorithm. The approximately 80 percent accuracy score is not sufficient for production work. It is our opinion that cartographic compilers will need 95(+) percent accuracy before they will happily clean up the remaining errors. Our future plans include trying to improve the detection rate at the candidate stage, since candidates that are missed can never be recovered. Most of the undetected roads are located in areas occluded by trees. Those roads may be better detected using the classification result. Therefore, we will explore detecting roads using image classification followed by a refinement process with road edges. We will also look at enforcing topology and road design constraints to improve the result.
We intend to move from centerline detection to more detailed delineation. We will also test the method on a number of different urban areas. It is well known that urban road characteristics can have very significant regional attributes. In other words, rules and algorithms may need to be tuned for different geographic regions.

References

Alharthy, A., and J.S. Bethel, 2003. Automated road extraction from lidar data, Proceedings of the ASPRS Annual Conference, 03-09 May, Anchorage, Alaska, unpaginated CD-ROM.

Clode, S., F. Rottensteiner, P. Kootsookos, and E. Zelniker, 2007. Detection and vectorization of roads from lidar data, Photogrammetric Engineering & Remote Sensing, 73(5):517-535.

Eckstein, W., 1995. Extraction of roads in aerial images using different levels of resolution, Wissenschaftlich-Technische Jahrestagung der DGPF, pp. 283-292.

Faber, A., and W. Förstner, 2000. Detection of dominant orthogonal road structures in small scale imagery, Proceedings of the 2000 ISPRS Congress, Amsterdam, 33(B3/1):274-281.

Haverkamp, D., 2002. Extracting straight road structure in urban environments using IKONOS satellite imagery, Optical Engineering, 41(9):2107-2110.

Heipke, C., H. Mayer, C. Wiedemann, and O. Jamet, 1997. Evaluation of automatic road extraction, International Archives of Photogrammetry and Remote Sensing, 32(4W/2):151-160.

Heipke, C., C. Steger, and R. Multhammer, 1995. A hierarchical approach to automatic road extraction from aerial images, SPIE Aerosense 1995 Symposium, 17-21 April, Orlando, Florida, 2486:222-231.

Hinz, S., and A. Baumgartner, 2003. Automatic extraction of urban road networks from multi-view aerial imagery, ISPRS Journal of Photogrammetry and Remote Sensing, 58(1-2):83-98.

Hu, Y., 2003. Automated Extraction of Digital Terrain Models, Roads, and Buildings Using Airborne Lidar Data, Ph.D. dissertation, University of Calgary, Canada, 206 p.

Hu, X., C.V. Tao, and Y. Hu, 2004. Automatic road extraction from dense urban area by integrated processing of high resolution imagery and lidar data, International Archives of Photogrammetry and Remote Sensing, Istanbul, Turkey, 12-23 July, unpaginated CD-ROM.

Mayer, H., S. Hinz, U. Bacher, and E. Baltsavias, 2006. A test of automatic road extraction approaches, International Archives of Photogrammetry, Remote Sensing, and Spatial Information Sciences, 36(3):209-214.

Negri, M., P. Gamba, G. Lisini, and F. Tupin, 2006. Junction-aware extraction and regularization of urban road networks in high-resolution SAR images, IEEE Transactions on Geoscience and Remote Sensing, 44(10):2962-2971.

Péteri, R., J. Celle, and T. Ranchin, 2003. Detection and extraction of road networks from high resolution satellite images, Proceedings of the 2003 IEEE International Conference on Image Processing (ICIP'03), 14-17 September, Barcelona, Spain, Volume I, pp. 301-304.

Price, K., 2000. Urban street grid description and verification, IEEE Workshop on Applications of Computer Vision, Palm Springs, California, pp. 148-154.

Rottensteiner, F., J. Trinder, S. Clode, and K. Kubik, 2003. Building detection using LIDAR data and multi-spectral images, Proceedings of VIIth Digital Image Computing: Techniques and Applications, Sydney, Australia, URL: http://www.cmis.csiro.au/hugues.talbot/dicta2003/cdrom/pdf/0673.pdf (last date accessed: 08 November 2007).

Shackelford, A.K., and C.H. Davis, 2003. Urban road network extraction from high-resolution multispectral data, Proceedings of Urban 2003, 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, pp. 142-146.

Sohn, G., and I.J. Dowman, 2001. Extraction of buildings from high-resolution satellite data, Automatic Extraction of Man-Made Objects From Aerial and Space Images (III), A.A. Balkema Publishers, Lisse, Netherlands, pp. 345-355.

Tönjes, R., S. Growe, J. Bückner, and C.E. Liedtke, 1999. Knowledge based interpretation of remote sensing images using semantic nets, Photogrammetric Engineering & Remote Sensing, 65(7):811-821.

Wiedemann, C., 2003. External evaluation of road networks, International Archives of Photogrammetry, Remote Sensing, and Spatial Information Sciences, Munich, Germany, 17-19 September, 34(3W/8):93-98.

Zhang, C., 2004. Towards an operational system for automated updating of road databases by integration of imagery and geodata, ISPRS Journal of Photogrammetry and Remote Sensing, 58(3-4):166-186.

Zhang, Q., and I. Couloigner, 2006. Automated road network extraction from high resolution multi-spectral imagery, Proceedings of the ASPRS Annual Conference, Reno, Nevada, 01-05 May, unpaginated CD-ROM.

Zhu, C., W. Shi, M. Pesaresi, L. Liu, X. Chen, and B. King, 2005. The recognition of road network from high-resolution satellite remotely sensed data using image morphological characteristics, International Journal of Remote Sensing, 26(24):5493-5508.

Zhu, P., Z. Lu, X. Chen, K. Honda, and A. Eiumnoh, 2004. Extraction of city roads through shadow path reconstruction using laser data, Photogrammetric Engineering & Remote Sensing, 70(12):1433-1440.