Object recognition and image segmentation: the Feature Analyst approach



Chapter 2.3
Object recognition and image segmentation: the Feature Analyst approach

D. Opitz, S. Blundell
Visual Learning Systems, Missoula, MT 59803, USA. David.Opitz@overwatch.com

KEYWORDS: Automated Feature Extraction, Machine Learning, Geospatial Feature Collection

ABSTRACT: The collection of object-specific geospatial features, such as roads and buildings, from high-resolution earth imagery is a time-consuming and expensive problem in the maintenance cycle of a Geographic Information System (GIS). Traditional collection methods, such as hand-digitizing, are slow, tedious and cannot keep up with the ever-increasing volume of imagery assets. In this paper we describe the methodology underlying the Feature Analyst automated feature extraction (AFE) software, which addresses this core problem in GIS technology. Feature Analyst, a leading commercial AFE software system, provides a suite of machine learning algorithms that learn on-the-fly how to classify object-specific features specified by an analyst. The software uses spatial context when extracting features, and provides a natural, hierarchical learning approach that iteratively improves extraction accuracy. An adaptive user interface hides the complexity of the underlying machine learning system while providing a comprehensive set of tools for feature extraction, editing and attribution. Finally, the system will automatically generate scripts that allow batch processing of AFE models on additional sets of images to support large-volume geospatial data-production requirements.

1 Introduction

High-resolution satellite imaging of the earth and its environment represents an important new technology for the creation and maintenance of geographic information system (GIS) databases. Geographic features such as road networks, building footprints and vegetation form the backbone of GIS mapping services for military intelligence, telecommunications, agriculture, land-use planning, and many other vertical market applications. Keeping geographic features current and up-to-date, however, represents a major bottleneck in the exploitation of high-resolution satellite imagery. The Feature Analyst software provides users with a powerful toolset for extracting object-specific geographic features from high-resolution panchromatic and multi-spectral imagery. The result is a tremendous cost savings in labor and a new workflow process for maintaining the temporal currency of geographic data.

Until recently there were two approaches for identifying and extracting objects of interest in remotely sensed images: manual and task-specific automated. The manual approach involves the use of trained image analysts, who manually identify features of interest using various image-analysis and digitizing tools. Features are hand-digitized, attributed and validated during geospatial data-production workflows. Although this is still the predominant approach, it falls short of meeting government and commercial-sector needs for three key reasons: (1) the lack of available trained analysts; (2) the laborious, time-consuming nature of manual feature collection; and (3) the high labor costs involved in manual production methods. Given these drawbacks, researchers since the 1970s have been attempting to automate the object-recognition and feature-extraction process. This was commonly effected by writing a task-specific computer program (McKeown 1993; Nixon and Aguado 2002).
However, these programs take an exceedingly long time to develop, requiring expert programmers to spend weeks or months explaining, in computer code, visual clues that are often trivially obvious to the human eye. In addition, the resulting handcrafted programs are typically large, slow and complex. Most importantly, they are operational only for the specific task for which they were designed, typically failing when given a slightly different problem such as a change in spatial resolution, image type, surface material, geographic area, or season. Developing such programs is further complicated by the fact that user interest varies significantly. While some task-specific automated approaches have been successful, it is virtually impossible to create fully automated programs that will address all user needs for every possible future situation.

The Feature Analyst approach to object recognition and feature extraction overcomes these shortcomings by using inductive learning algorithms (Mitchell 1997; Quinlan 1993; Rumelhart et al. 1986) and techniques to model the object-recognition process. Using Feature Analyst, the user provides the system with several examples of desired features from the image. The system then automatically develops a model that correlates known data (such as spectral or spatial signatures) with targeted outputs (i.e., the features or objects of interest). The resulting learned model classifies and extracts the remaining targets or objects in the image. Feature models can be cached in a repository, known as the Feature Model Library, for later use. The accompanying workflow and metadata (information on spectral bandwidth, date and time stamp, etc.) can be used to quickly compose new models for changing target conditions such as geographic location or hour of day.

2 Learning Applied to Image Analysis

An inductive learner is a system that learns from a set of labeled examples. A teacher provides the output for each example, and the set of labeled examples given to a learner is called a training set. The task of inductive learning is to generate from the training set a concept description that correctly predicts the output of all future examples, not just those from the training set. Many inductive-learning algorithms have been studied (Quinlan 1993; Rumelhart 1986). These algorithms differ both in their concept-representation language and in their method (or bias) of constructing a concept within this language. These differences are important because they determine the concepts that a classifier induces.
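The train-on-labeled-examples, predict-on-future-examples loop described above can be sketched with any off-the-shelf classifier. The snippet below is a minimal illustration using a 1-nearest-neighbor learner on hypothetical two-band pixel signatures; it is not Feature Analyst's algorithm, merely the inductive-learning contract in miniature.

```python
# Minimal sketch of inductive learning: a teacher supplies labeled
# examples (the training set); the learner induces a concept that
# predicts the output of examples it has never seen. A 1-nearest-
# neighbor classifier stands in for the real learners here.

def train(training_set):
    """For 1-NN, the 'concept description' is simply the stored examples."""
    return list(training_set)

def predict(model, example):
    """Label an unseen example by its closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(model, key=lambda pair: dist(pair[0], example))
    return nearest[1]

# Hypothetical 2-band pixel signatures labeled by a teacher:
# 1 = "feature of interest", 0 = "background".
training_set = [((0.9, 0.8), 1), ((0.8, 0.9), 1),
                ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
model = train(training_set)
print(predict(model, (0.85, 0.85)))  # a future example, not in the training set → 1
```

The choice of distance function and neighbor count is the "bias" mentioned above: a different bias would induce a different concept from the same training set.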
Nearly all modern vision systems rely on hand-crafted determinations of which operators work best for an image and what parameter settings work best for those operators (Maloof 1998; McKeown 1996). Such operators vary not only across the desired object to be recognized, but also across resolutions of the same image. Learning in object-recognition tasks works by (a) acquiring task-specific knowledge by watching a user perform the task, and (b) refining the existing knowledge based on feedback provided by the user. In this approach, the parameters for these objects are tuned by the learning algorithm on-the-fly during its deployment. It is not surprising, therefore, that visual learning (Evangelia 2000; Nayar and Poggio 1996) can greatly increase the accuracy of a visual system (Burl 1998; Kokiopoulou and Frossard 2006). In addition to increasing overall accuracy, visual learning can yield object-recognition systems that are much easier and faster to develop for a particular problem and resolution (Opitz and Blundell 1999). A model can be trained for one task and then used to seed the development of a similar problem. This supports the immediate deployment of a new problem with the ability to fine-tune and improve itself through experience. By having a learning system at the core of the object-recognition task, one can easily transfer pertinent knowledge from one problem to another, even though that knowledge may be far from perfect. This approach overcomes prior research on visual learning, which primarily consisted of hard-coded, problem-specific programs.

3 Feature Analyst

In 2001 Visual Learning Systems, Inc. (VLS) developed Feature Analyst as a commercial off-the-shelf (COTS) feature-extraction extension for ESRI's ArcGIS software, in response to the geospatial market's need for automating the production of geospatial features from earth imagery. Feature Analyst is based on an inductive-learning approach to object recognition and feature extraction. It was developed as a plug-in toolset for established GIS and remote sensing software packages (ArcGIS, IMAGINE, SOCET SET, GeoMedia, and RemoteView) in order to integrate the AFE workflow into traditional map-production environments. In the Feature Analyst system, the image analyst creates feature-extraction models by simply classifying on the computer screen the objects of interest in a small subset of the image or images (Opitz and Blundell 1999). This approach leverages the natural ability of humans to recognize complex objects in an image. Users with little computational knowledge can effectively create object-oriented AFE models for the tasks under consideration.
In addition, users can focus on different features of interest, with the system dynamically learning these features. Feature Analyst provides a paradigm shift to AFE and distinguishes itself from other learning and AFE approaches in that it: (a) incorporates advanced machine learning techniques to provide unparalleled levels of accuracy; (b) utilizes spectral, spatial, temporal, and ancillary information to model the feature-extraction process; (c) provides the ability to remove clutter; (d) provides an exceedingly simple interface for feature extraction; (e) automatically generates scripts of each interactive learning process, which can be applied to a large set of images; and (f) provides a set of clean-up and attribution tools to support end-to-end workflows for geospatial data production.

3.1 Feature Analyst Learning Approach

Feature Analyst does not employ a single learning algorithm, but rather uses several different learning approaches depending on the data. The base learning algorithms for Feature Analyst are variants of artificial neural networks (Rumelhart 1986), decision trees (Quinlan 1993), Bayesian learning (Mitchell 1997), and K-nearest neighbor (Mitchell 1997); however, the power of Feature Analyst is believed to be largely due to ensemble learning (Opitz 1999). Research on ensembles has shown that they generally produce more accurate predictions than the individual predictors within the ensemble (Dietterich 2002; Opitz and Maclin 1999). A sample ensemble approach for neural networks is shown in Fig. 1, though any classification method can be substituted for a neural network (as is the case with Feature Analyst). Each network in the ensemble (networks 1 through N) is trained using the training instances for that network. Then, the predicted output of each of these networks is combined to produce the output of the ensemble (Ô in Fig. 1). Both theoretical research (Opitz and Shavlik 1999; Dietterich 2002) and empirical work (Opitz and Maclin 1999; Opitz and Shavlik 1996) have shown that a good ensemble is one in which (1) the individual networks are accurate, and (2) any errors that these individual networks make occur in different parts of the ensemble's input space. Much ensemble research has focused on how to generate an accurate yet diverse set of predictors; creating such an ensemble is the focus of the Feature Analyst ensemble algorithm.
Feature Analyst searches for an effective ensemble by using genetic algorithms to generate a set of classifiers that are accurate and diverse in their predictions (Opitz 1999). The Feature Analyst approach has proven to be more accurate on most domains than other current state-of-the-art learning algorithms (including Bagging and Boosting) and works particularly well on problems with numerous diverse inputs, such as high-resolution, multi-spectral and hyperspectral images (Opitz 1999).
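The accuracy-plus-diversity principle can be seen in a toy example. The three member classifiers below are hypothetical stand-ins for trained networks (Feature Analyst's actual members are found by genetic search, which is not shown); the point is only that majority voting lets members whose errors fall in different parts of the input space outvote each other's mistakes.

```python
def ensemble_predict(members, x):
    """Combine the individual predictions by unweighted majority vote."""
    votes = [m(x) for m in members]
    return 1 if sum(votes) > len(votes) / 2 else 0

# Three hypothetical member classifiers for the concept "x0 + x1 > 1".
# Each is mostly accurate, but each errs in a different region.
members = [
    lambda x: 1 if x[0] + x[1] > 1.0 else 0,   # accurate on this concept
    lambda x: 1 if x[0] > 0.4 else 0,          # errs when x[0] is high but x[1] low
    lambda x: 1 if x[1] > 0.4 else 0,          # errs when x[1] is high but x[0] low
]

# (0.9, 0.3) is a true positive; member 3 votes wrongly, but the
# other two outvote it, so the ensemble is still correct.
print(ensemble_predict(members, (0.9, 0.3)))  # → 1
```

If all three members made the same errors, voting would gain nothing; diversity of errors is what the genetic search described above is selecting for.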

Fig. 1. Predictor ensemble for classifying data

3.2 Spatial and Ancillary Information in Learning

When classifying objects in imagery, only a few attributes are accessible to human interpreters. For any single set of imagery these object-recognition attributes include shape, size, color, texture, pattern, shadow, and association. Traditional image-processing techniques incorporate only color (spectral signature), and perhaps texture or pattern, into an involved expert workflow process; Feature Analyst incorporates all of these attributes, behind the scenes, with its learning agents.

The most common (and intended) approach is to provide the learning algorithm with a local window of pixels from the image. The prediction task for the learner is to determine whether or not the center pixel is a member of the current feature theme being extracted. Fig. 2 demonstrates this approach. The center pixel for which the prediction is being made is represented in black; in this case, the 80 surrounding pixels are also given to the learning algorithm (i.e., there is a 9x9 pixel window). The learner's task is to develop a model between the 81 inputs and the one output (whether or not the center pixel is part of the feature). This approach has distinct advantages: it works well; it is general purpose and applies to any object-recognition task; and it can easily accommodate any image transformation (e.g., edge detection) by simply supplying the pixels of the transformed image. The main drawbacks to this approach are (1) its inability to take into account the spatial context of the larger image space, which can lead to false classifications, and (2) the large amount of information (i.e., pixels) to be processed, which taxes computer memory and slows down processing.

As a result of these drawbacks, Feature Analyst offers various input representations based upon the concept of foveal vision (Opitz and Bain 1999), shown in Fig. 3. With a foveal representation, a learning algorithm is given a region of the image with high spatial resolution at the center (where the prediction is being made) and lower spatial resolution away from the center. Such an approach mimics the visual process of most biological species, including humans (i.e., peripheral vision). Foveal representation provides contextual spatial information to the learning algorithm without overwhelming it when making a local decision (e.g., is the center pixel part of an armored vehicle?).

Fig. 2. Traditional Representation

Fig. 3. Foveal Representation

In Fig. 3, the foveal representation provides only 17 inputs to the learner when considering a 9x9 pixel region: each outer 3x3 region contributes the average of its 9 pixels as one input to the learning algorithm. The analyst can widen the range of foveal vision by making the next outer layer an average of a 9x9 region, and so on; thus a 27x27 region would provide only 25 inputs to the learner. Having the learner concentrate on the center pixels, while taking into account the gist of the outer pixels, is a great strength of using spatial context for object-recognition tasks. Foveal and other analogous input representations provided by Feature Analyst (such as Bullseye) are a major breakthrough in automated feature extraction, as they greatly reduce the amount of data given to the learner, which is especially important when extracting targets from cluttered scenes.
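The two representations above can be contrasted directly in code. This sketch assumes a single-band 9x9 window stored as a list of 9 rows of 9 values; the traditional representation feeds all 81 pixels to the learner, while the foveal one keeps the center 3x3 block at full resolution (9 inputs) and averages each of the 8 outer 3x3 blocks into one input, giving 9 + 8 = 17 inputs.

```python
# Traditional vs. foveal input representations for a 9x9 pixel window.

def traditional_inputs(window):
    """Every pixel is an input: 81 inputs for a 9x9 window."""
    return [p for row in window for p in row]

def foveal_inputs(window):
    """Centre 3x3 block at full resolution, outer 3x3 blocks averaged: 17 inputs."""
    def block(r, c):
        return [window[r * 3 + i][c * 3 + j] for i in range(3) for j in range(3)]
    inputs = block(1, 1)                           # centre block: 9 raw pixels
    for r in range(3):
        for c in range(3):
            if (r, c) != (1, 1):
                b = block(r, c)
                inputs.append(sum(b) / len(b))     # one averaged input per outer block
    return inputs

# Synthetic window for illustration: pixel value = row * 9 + column.
window = [[float(r * 9 + c) for c in range(9)] for r in range(9)]
print(len(traditional_inputs(window)), len(foveal_inputs(window)))  # → 81 17
```

Extending the same idea with a further ring of averaged 9x9 blocks yields the 25 inputs for a 27x27 region mentioned above.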

Figs. 4-6 show the value of using object-recognition attributes and spatial context in feature-extraction tasks. Fig. 4 is the sample image; here the objective is to extract the white lines on airport runways. Using only spectral information, the best an analyst can do is shown in Fig. 5: the results include all materials with similar white reflectance. Fig. 6 shows the Feature Analyst results for extracting the white lines using both spectral values and spatial parameters; in this case, knowledge of the adjacent pavement or grass pixels is included when extracting the white line. This example illustrates the need to take spatial information into account when conducting object-recognition tasks with imagery.

Fig. 4. Original image: the objective is to extract only the thin white lines on the runway

Fig. 5. Extraction results without the use of spatial attributes

Fig. 6. Feature Analyst classification using spatial attributes to extract only the targeted white lines on the runway
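The white-line example can be caricatured in a few lines of code. The pixel values and thresholds below are hypothetical: two pixels share the same bright "white" spectral value, but only the one surrounded by dark pavement is a runway line, so a spectral-only rule cannot separate them while a rule that also sees the neighborhood mean can.

```python
# Toy illustration of spectral-only vs. spectral-plus-spatial classification.

def spectral_only(pixel):
    """Label anything sufficiently bright as 'white line' (hypothetical threshold)."""
    return pixel["value"] > 0.9

def spectral_and_spatial(pixel):
    """Require a bright centre AND dark surroundings (runway pavement)."""
    return pixel["value"] > 0.9 and pixel["neighbor_mean"] < 0.4

line_pixel = {"value": 0.95, "neighbor_mean": 0.2}   # white line on dark runway
roof_pixel = {"value": 0.95, "neighbor_mean": 0.9}   # equally white rooftop

print(spectral_only(line_pixel), spectral_only(roof_pixel))                # → True True
print(spectral_and_spatial(line_pixel), spectral_and_spatial(roof_pixel))  # → True False
```

The spectral-only rule reproduces the Fig. 5 behavior (every white material extracted); adding the spatial input reproduces the Fig. 6 behavior.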

3.3 Hierarchical Learning

The Feature Analyst hierarchical workflow consists of the following steps:

1. The user digitizes a few examples of the target. Note that in the previous extraction example, the user only had to digitize 3 or 4 small examples for the learning algorithm to extract all of the features correctly.
2. The user selects the feature type from the graphical user interface, automatically setting all of the learning parameters behind the scenes.
3. The user extracts features using a One-Button approach.
4. The user examines the results and, if required, provides positive and negative examples to remove clutter using hierarchical learning.
5. The user refines the first-pass predictions, removing clutter with another pass of learning (or removing shape characteristics using the Feature Analyst Remove Clutter by Shape tool).
6. The user repeats steps 4 and 5 as necessary.

Clutter is the most common form of error in feature extraction. The objective of clutter mitigation is to remove false positives; the learning task is thus to distinguish between false positives and correctly identified positives. The user generates a training set by labeling the positive features from the previous classification as either positive or negative. The trained learner then classifies only the positive instances from the previous pass. The negative instances are considered correct in clutter mitigation and are thus masked out.

Hierarchical learning is necessary for learning complex targets in high-resolution imagery. The overall process iteratively narrows the classification task into sub-problems that are more specific and well defined. The user begins the hierarchical process in the same way as any baseline inductive-learning classification: select labeled examples for the feature being extracted, train the learner, and then classify every pixel in the image based on the learner's prediction.
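One clutter-mitigation pass can be sketched as follows. The instance names and the second-pass learner are hypothetical; the essential logic from the description above is that only first-pass positives are reclassified, while first-pass negatives are masked out as already correct.

```python
# Sketch of one clutter-mitigation pass in hierarchical learning.
# first_pass maps instance ids to the first learner's prediction
# (True = feature, False = background); second_learner is a trained
# second-pass classifier that keeps true features and rejects clutter.

def clutter_pass(first_pass, second_learner):
    refined = {}
    for instance, positive in first_pass.items():
        if positive:
            refined[instance] = second_learner(instance)  # keep feature or reject clutter
        else:
            refined[instance] = False                     # masked out: stays negative
    return refined

# Hypothetical first-pass results: a road, a rooftop wrongly extracted
# as road (clutter), and a correctly rejected field.
first_pass = {"road_1": True, "roof_7": True, "field_2": False}
second_learner = lambda inst: inst != "roof_7"            # hypothetical trained learner
print(clutter_pass(first_pass, second_learner))
```

Each further pass in the hierarchy repeats this pattern, targeting one remaining form of error in the previous pass's output.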
At this point, if not satisfied with the results, the user can apply a hierarchy of learners to improve the classification. The classification is improved in passes, where each new pass is designed to remove one form of error from the results of the previous pass.

3.4 Automation of Feature Extraction Tasks

Automated feature extraction has been the long-term goal of geospatial data-production workflows for the past 30 years. The challenge is developing a flexible approach for transferring the domain knowledge of a feature-extraction model from image to image that is capable of adapting to changing conditions (image resolution, pixel radiometric values, landscape seasonal changes, and the complexity of feature representation). In 2006, VLS introduced the Feature Modeler and the Feature Model Library (FML) tools. These additions to Feature Analyst serve to automate the feature-extraction workflow process. Feature Modeler provides users with a comprehensive set of tools for examining and refining feature models created with Feature Analyst (Fig. 7). Feature models, designated AFE models, comprise the parameter settings for a classifier to extract particular features, including settings for spatial context and hierarchical learning passes. Benefits of this approach include the following:

- Analysts can create, edit and refine the inner workings of an AFE model, including the pixels used for classification of a feature with spatial processing, the priority of input bands, rule extraction from a complex learned model, and the parameter settings for a learning algorithm.
- Technicians can access AFE models to run in an interactive mode or in a silent batch mode. In interactive mode, the technician need not be concerned with creating the proper workflow or setting parameters for the learning algorithm, but need only provide a labeled set of examples. In batch mode the process is completely automated: a single AFE model, or multiple AFE models, can be run against a single image or a directory of images.

Fig. 7. A simple AFE model showing the five steps used in processing: create training set, extract, remove clutter, aggregate, smooth

The Feature Model Library resides within a relational database and is used for storing AFE models to support enterprise-wide geospatial processing. Analysts can search and retrieve AFE models for use in batch-mode processing to extract features from imagery without any training sets. The Feature Modeler application allows users to import AFE models,

examine and adjust parameter settings, or deploy a learning model in a batch-processing mode.

3.5 Cleanup and Attribution Tools

Object recognition and feature extraction are steps in a chain of processes used in geospatial data production to collect features for a GIS database. Steps that follow feature extraction include feature editing (clean-up), feature generalization, feature attribution, quality-control checks, and storage in the GIS database. In almost every instance of feature collection, the designated object needs to be stored as a vector feature to support GIS mapping and spatial analyses. Vector features, commonly stored in Shapefile format, can be stored as points, lines, polygons or TINs. Defining characteristics of geospatial vector feature data are the ability to define topology, store feature attributes, and retain geopositional information. Feature Analyst provides tools for the majority of these tasks, with an emphasis on semi-automated and automated vector clean-up tools and feature attribution. Feature representation of a road network requires that the road feature be collected as a polygon or line feature, or both. In either case, tools are required to adjust extracted vector features to account for gaps due to occlusion from overhanging trees, to eliminate dangles or overshoots into driveways, to fix intersections, and to assist with a host of other tasks. As object-recognition systems evolve, there is an ever-increasing expectation on the part of the user for a complete solution to the feature-extraction problem.

5 Conclusions

Feature Analyst provides a comprehensive machine-learning-based system for assisted and automated feature extraction from earth imagery in commercial GIS, image processing and photogrammetry software. The AFE workflow, integrated with the supporting application tools and capabilities, provides a more holistic solution for geospatial data-production tasks.
The Feature Analyst user interface supports a simple feature-extraction workflow whereby the user provides the system with a set of labeled examples (the training set) and then corrects the predicted features of the learning algorithm during the clutter-removal process (hierarchical learning). Benefits of this design include:

- Significant time savings in the extraction of 2-D and 3-D geospatial features from imagery. O'Brien (2003) of the National Geospatial-Intelligence Agency (NGA) conducted a detailed study indicating that Feature Analyst is 5 to 10 times faster than manual extraction methods and more accurate than hand-digitizing on most features (Fig. 8).

Fig. 8. NGA AFE test & evaluation program timing comparisons, in minutes of extraction time, for VLS excluding processing, VLS total, and manual extraction (O'Brien 2003)

- Significant increases in accuracy. Feature Analyst has been shown to be more accurate than previous AFE methods and more accurate than hand-digitizing on numerous datasets (Brewer et al. 2005; O'Brien 2003).
- Workflow extension capabilities for established software. Analysts can leverage Feature Analyst within their preferred workflow on their existing ArcGIS, ERDAS IMAGINE, SOCET SET and soon RemoteView systems, increasing operator efficiency and output.
- A simple One-Button approach for extracting features using the Feature Model Library, as well as advanced tools for creation of geospecific features from high-resolution MSI, radar, LiDAR and hyperspectral data.
- An open and standards-based software architecture allowing third-party developers to incorporate innovative feature-extraction algorithms and tools directly into Feature Analyst.

- Interoperability amongst users on different platforms. Expert analysts can create and store AFE models in the Feature Model Library, while other analysts can use these models for easy one-button extractions.
- A simple workflow and user interface that hides the complexity of the AFE approaches.
- High accuracy with state-of-the-art learning algorithms for object recognition and feature extraction.
- Post-processing cleanup tools for editing and generalizing features, providing an end-to-end solution for geospatial data production.
- AFE modeling tools for capturing workflows and automating feature-collection tasks.

References

Brewer K, Redmond R, Winne K, Opitz D, Mangrich M (2005) Classifying and mapping wildfire severity. Photogramm Eng and Remote Sensing. vol 71, no 11. pp 1311-1320.
Burl M (1998) Learning to recognize volcanoes. Machine Learning Journal. vol 30 (2/3). pp 165-194.
Dietterich TG (2002) Ensemble learning. The Handbook of Brain Theory and Neural Networks. vol 2. pp 110-125.
Evangelia MT (2000) Supervised and unsupervised pattern recognition. CRC Press. Boca Raton, FL.
Kokiopoulou E, Frossard P (2006) Pattern detection by distributed feature extraction. IEEE International Conference on Image Processing. pp 2761-2764.
McKeown D et al. (1993) Research in automated analysis of remotely sensed imagery. Proc of the DARPA Image Understanding Workshop. Washington, DC.
McKeown D (1996) Top ten lessons learned in automated cartography. Technical Report CMU-CS-96-110. Computer Science Department, Carnegie Mellon University. Pittsburgh, PA.
Mitchell T (1997) Machine learning. McGraw Hill. New York, NY.
Nayar S, Poggio T (1996) Early visual learning. Oxford University Press. New York, NY.
Nixon MS, Aguado AS (2002) Feature extraction and image processing. Elsevier. Amsterdam.
O'Brien M (2003) Feature extraction with the VLS Feature Analyst System. ASPRS International Conference. Anchorage, AK.
Opitz D (1999) Feature selection for ensembles. Proc of the 16th National Conference on Artificial Intelligence. pp 379-384.
Opitz D, Bain W (1999) Experiments on learning to extract features from digital images. IASTED Signal and Image Processing.
Opitz D, Blundell S (1999) An intelligent user interface for feature extraction from remotely sensed images. Proc of the American Society for Photogrammetry and Remote Sensing. pp 171-177.
Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research. vol 11. pp 169-198.
Opitz D, Shavlik J (1996) Generating accurate and diverse members of a neural-network ensemble. Advances in Neural Information Processing Systems. vol 8. pp 535-541.
Opitz D, Shavlik J (1999) Actively searching for an effective neural-network ensemble. Springer-Verlag Series on Perspectives in Neural Computing. pp 79-97.
Quinlan J (1993) C4.5: Programs for machine learning. Morgan Kaufmann. San Mateo, CA.
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature. vol 323. pp 533-536.