Robotic Exploration Under the Controlled Active Vision Framework



Christopher E. Smith, Scott A. Brandt, Nikolaos P. Papanikolopoulos
chsmith@cs.umn.edu, sbrandt@cs.umn.edu, npapas@cs.umn.edu
Artificial Intelligence, Robotics, and Vision Laboratory
Department of Computer Science, University of Minnesota
4-192 EE/CS Building, 200 Union St. SE, Minneapolis, MN 55455

Abstract

Flexible operation of a robot in an uncalibrated environment requires the ability to recover unknown or partially known parameters of the workspace through sensing. Of the sensors available to a robotic agent, visual sensors provide information that is richer and more complete than other sensors. In this paper we present robust techniques for the derivation of depth from feature points on a target's surface and for the accurate and high-speed tracking of moving targets. We use these techniques in a system that operates with little or no a priori knowledge of the object-related parameters present in the environment. The system is designed under the Controlled Active Vision framework [16] and robustly determines parameters such as velocity for tracking moving objects and depth maps of objects with unknown depths and surface structure. Such determination of extrinsic environmental parameters is essential for performing higher level tasks such as inspection, exploration, tracking, grasping, and collision-free motion planning. For both applications, we use the Minnesota Robotic Visual Tracker (a visual sensor mounted on the end-effector of a robotic manipulator combined with a real-time vision system) to automatically select feature points on surfaces, to derive an estimate of the environmental parameter in question, and to supply a control vector based upon these estimates to guide the manipulator.

1. Introduction

(Presented at the 33rd IEEE Conference on Decision and Control, December 14-16, 1994, Lake Buena Vista, FL.)

In order to be effective, robotic agents in uncalibrated environments must operate in a flexible and robust manner. The computation of unknown parameters (e.g., the velocity of objects and the depth of object feature points) is essential information for the accurate execution of many robotic manipulation, inspection, and exploration tasks in unstructured settings. The determination of such parameters has traditionally relied upon accurate knowledge of other related environmental parameters. For instance, traditional approaches to the problem of depth recovery [4][5][11] have assumed that extremely accurate measurements of the camera parameters and the camera system geometry are provided a priori, making these methods useful in only a limited number of situations. Similarly, previous approaches to visual tracking assumed known and accurate measures of camera parameters, camera positioning, manipulator positioning, target depth, target orientation, and environmental conditions [5][9]. This type of detailed information is not always available or, when it is available, not always accurate. Inaccuracies are introduced by path constraints, changes in the robotic system, and changes in the operational environment. In addition, camera calibration and the determination of camera parameters can be computationally expensive and error prone. In particular, depth derivation and tracking techniques that rely upon stereo vision systems require careful geometry measurements and the solution of the correspondence problem, making the computational overhead prohibitive for real- or near-real-time systems.
Furthermore, many structure-from-motion algorithms use simple accidental motion of the camera that does not guarantee the best possible identifiability of the depth parameter. To be effective in uncalibrated environments, the robotic agent must perform under a variety of situations where only simple estimates of parameters (e.g., depth, focal length, pixel size, etc.) are used and with little or no a priori knowledge about the target, the camera, or the environment. One solution to these problems can be found under the Controlled Active Vision framework [16]. Instead of relying heavily on a priori information, this framework provides the flexibility necessary to operate under dynamic conditions where many environmental and target-related factors are unknown and possibly changing. The framework is based upon adaptive controllers that utilize the Sum-of-Squared Differences (SSD) optical flow measurement [1]. The SSD algorithm is used to measure the displacements of feature points in a sequence of images, where the displacements may be induced by manipulator motion, target motion, or both. These measured displacements are then used as an input into the robot controllers, thus closing the control loop. Adaptive control techniques are useful under a variety of situations, including our application areas: depth recovery and robotic visual tracking. Instead of the accidental motion of the eye-in-hand system used in many depth extraction techniques [11][18][22], we propose a controlled exploratory motion that provides identifiability of the depth parameter. To reduce the influence of workspace-, camera-, and manipulator-specific inaccuracies, an adaptive controller is used to provide accurate and reliable information regarding the depth of an object's features. This information may then be used to guide tracking, inspection, and manipulation operations [4][16]. Additionally, we propose a visual tracking system that does not rely upon accurate measures of environmental and target parameters. An adaptive controller is used to track feature points on a target's surface in spite of the unconstrained motion of the target, possible occlusion of feature points, and changing target and environmental conditions. Relatively high-speed targets are tracked with only rough operating parameter estimates and no explicit target models. Tracking speeds are ten times faster than, and performance is similar to, those reported by Papanikolopoulos [16].

In this paper, we present techniques for controlled robotic exploration. We first formulate the equations for visual measurements, including enhanced SSD surface construction strategies and optimizations, and present the automatic feature point selection scheme. Next, we derive the control and measurement equations used for the adaptive controller and elaborate on the selection of the feature trajectories for the application of depth recovery. Finally, we discuss results from experiments in both of the selected applications using the Minnesota Robotic Visual Tracker (MRVT).

2. Visual measurements

Our depth recovery and robotic visual tracking applications both use the same basic visual measurements, which are based upon a simple camera model and the measure of optical flow in a temporal sequence of images. The visual measurements are combined with search-specific optimizations and a dynamic pyramiding technique in order to enhance the visual processing and to optimize the performance of the system in our selected applications.

Camera model and optical flow

We assume a pinhole camera model with a world frame, R_W, centered on the optical axis. We also assume a focal length f. A point P = (X_W, Y_W, Z_W) in R_W projects to a point in the image plane with coordinates (x, y). We can define two scale factors, s_x and s_y, to account for camera sampling and pixel size, and include the center of the image coordinate system (c_x, c_y) given in frame F_A [16]. This results in the following equations for the actual image coordinates (x_A, y_A):

x_A = \frac{f X_W}{s_x Z_W} + c_x = x + c_x   (2.1)

y_A = \frac{f Y_W}{s_y Z_W} + c_y = y + c_y   (2.2)

Any displacement of the point P can be described by a rotation about an axis through the origin and a translation. If this rotation is small, then it can be described as three independent rotations about the three axes X_W, Y_W, and Z_W [3]. We will assume that the camera moves in a static environment with a translational velocity T = (T_x, T_y, T_z) and a rotational velocity R = (R_x, R_y, R_z). The velocity of the point P with respect to R_W can then be expressed as:

\frac{dP}{dt} = -T - R \times P   (2.3)
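
As a concrete illustration of equations (2.1) and (2.2), the short sketch below (not part of the original paper; all numeric camera parameters are placeholder values) projects a world-frame point onto actual image coordinates using the focal length, the pixel scale factors, and the image center.

```python
import numpy as np

def project(P_w, f=8.0e-3, s_x=1.1e-5, s_y=1.3e-5, c_x=256.0, c_y=240.0):
    """Pinhole projection with pixel scale factors, eqs. (2.1)-(2.2).

    P_w : (X_W, Y_W, Z_W) in the world frame R_W (metres), Z_W > 0.
    f   : focal length (m); s_x, s_y : pixel sizes (m/pixel);
    (c_x, c_y) : image centre (pixels).  All values here are placeholders.
    """
    X, Y, Z = P_w
    x_A = (f * X) / (s_x * Z) + c_x      # eq. (2.1)
    y_A = (f * Y) / (s_y * Z) + c_y      # eq. (2.2)
    return np.array([x_A, y_A])

# Example: a point 0.5 m in front of the camera, 2 cm off the optical axis.
print(project((0.02, 0.0, 0.5)))
```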
By taking the time derivatives and using equations (2.1), (2.2), and (2.3), we obtain:

u = \frac{T_z x}{Z_W} - \frac{f T_x}{s_x Z_W} + \frac{s_y x y}{f} R_x - \left( \frac{f}{s_x} + \frac{s_x x^2}{f} \right) R_y + \frac{s_y y}{s_x} R_z   (2.4)

v = \frac{T_z y}{Z_W} - \frac{f T_y}{s_y Z_W} + \left( \frac{f}{s_y} + \frac{s_y y^2}{f} \right) R_x - \frac{s_x x y}{f} R_y - \frac{s_x x}{s_y} R_z   (2.5)

We use a matching-based technique known as the Sum-of-Squared Differences (SSD) optical flow [1]. For the point p(k-1) = (x(k-1), y(k-1)) in the image (k-1), where k denotes the kth image in a sequence of images, we want to find the point p(k) = (x(k-1)+u, y(k-1)+v). This point p(k) is the new position of the projection of the feature point P in image (k). We assume that the intensity values in the neighborhood N of p remain relatively constant over the sequence k. We also assume that for a given k, p(k) can be found in an area Ω about p(k-1), and that the velocities are normalized by time to get the displacements. Thus, for the point p(k-1), the SSD algorithm selects the displacement d = (u, v) that minimizes the SSD measure

e(p(k-1), d) = \sum_{(m,n) \in N} \left[ I_{k-1}(x(k-1)+m, y(k-1)+n) - I_k(x(k-1)+m+u, y(k-1)+n+v) \right]^2   (2.6)

where u, v ∈ Ω, N is the neighborhood of p, m and n are indices for pixels in N, and I_{k-1} and I_k are the intensity functions in images (k-1) and (k).

The size of the neighborhood N must be carefully selected to ensure proper system performance. Too small an N fails to capture enough contrast, while too large an N increases the associated computational overhead and enhances the background. In either case, an algorithm based upon the SSD technique may fail due to inaccurate displacements. Such an algorithm may also fail due to too large a latency in the system or displacements resulting from motions of the object that are too large for the method to accurately capture. We include techniques related to search optimizations and dynamic pyramiding to counter these concerns.

Search optimizations

The primary source of latency in a vision system that uses the SSD measure is the time needed to identify (u, v) in equation (2.6). To find the true minimum, the SSD measure must be calculated over each possible (u, v). The time required to produce an SSD surface and to find the minimum can be greatly reduced by employing search-specific optimizations that, when combined, reduce the search time significantly in the expected case. These search-specific optimizations are presented in [20]. They allow the vision system described later in the paper to track three to four features at RS-170 video rates (16 msec per field, two fields per frame, 33 msec total with vertical blanking) without video under-sampling.

Dynamic pyramiding

Dynamic pyramiding is a heuristic technique that attempts to resolve the conflict between accurate positioning of a manipulator and high-speed tracking when the displacements of the feature points are large. Previous applications have typically depended upon one preset level of pyramiding to enhance either the top tracking speed of the manipulator or the positioning accuracy of the end-effector above the target [16]. We use the dynamic pyramiding method described in [20] to resolve this conflict.
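
The following sketch illustrates the SSD displacement search of equation (2.6) for a single feature point, assuming grayscale images stored as NumPy arrays. The exhaustive double loop stands in for the search-specific optimizations of [20], which prune candidate displacements without changing the measure; the window sizes and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ssd_displacement(I_prev, I_curr, p, half_N=4, half_Omega=8):
    """Find the displacement d = (u, v) in Omega that minimizes the SSD
    measure of eq. (2.6) for the feature point p = (x, y) in I_prev.

    half_N     : half-width of the neighbourhood N around the feature point.
    half_Omega : half-width of the search area Omega.
    Assumes p lies at least half_N + half_Omega pixels from the image border.
    """
    x, y = p
    template = I_prev[y - half_N:y + half_N + 1,
                      x - half_N:x + half_N + 1].astype(np.float64)
    best, best_d = np.inf, (0, 0)
    for v in range(-half_Omega, half_Omega + 1):
        for u in range(-half_Omega, half_Omega + 1):
            patch = I_curr[y + v - half_N:y + v + half_N + 1,
                           x + u - half_N:x + u + half_N + 1].astype(np.float64)
            e = np.sum((template - patch) ** 2)   # SSD measure e(p(k-1), d)
            if e < best:
                best, best_d = e, (u, v)
    return best_d, best
```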

3. Feature point selection

In addition to latency and the effect of large displacements, an algorithm based upon the SSD technique may fail due to repeated patterns in the intensity function of the image or due to large areas of uniform intensity in the image. Both cases can provide multiple matches within a feature point's neighborhood, resulting in incorrect displacement measures. In order to avoid this problem, our system automatically evaluates and selects feature points. Feature points are selected using the SSD measure combined with an auto-correlation technique to produce an SSD surface corresponding to an auto-correlation in the area Ω [1][16]. Several possible confidence measures can be applied to the surface to measure the suitability of a potential feature point. The selection of a confidence measure is critical since many such measures lack the robustness required by changes in illumination, etc. We use a two-dimensional displacement parabolic fit that attempts to fit the parabola e(x_r) = a x_r^2 + b x_r + c to a cross-section of the surface derived from the SSD measure [16]. The parabola is fit to the surface at predefined directions. Papanikolopoulos [16] selected a feature if the minimum directional measure was sufficiently high, according to equation (2.6). For the purpose of depth extraction, we extend this approach in response to the aperture problem [10]. Briefly stated, the aperture problem refers to the motion of points that lie on a feature such as an edge. Any sufficiently small local measure of motion can only retrieve that component of motion orthogonal to the edge. Any component of motion due to movement other than movement perpendicular to the edge is unrecoverable due to multiple matches for any given feature point [10]. The improved confidence measure for feature points permits the selection of a feature if it has a sufficiently high correlation with the parabola in a particular direction. For instance, a feature point corresponding to an edge will only have a high correlation in the direction perpendicular to the edge. This directional information is used during depth recovery to constrain the motion of the manipulator such that the displacements of the feature point only occur orthogonal to the directional feature. This ensures that the only component of motion is precisely that which can be recovered under consideration of the aperture problem. This ideal motion is not always achievable; therefore, motions will occur that are approximately, but not completely, orthogonal to the directional feature. In response to this uncertainty, SSD matches that lie closer to the direction of the ideal manipulator motion will be preferred. Directionally constrained features are not used during the tracking of moving objects, since the motion of such objects cannot be characterized until after the selection of tracking features. If used, multiple features with dissimilar directional measures are needed, increasing overhead by at least a factor of two. Therefore, only features that are determined to be omnidirectional (e.g., a corner point) should be used for tracking applications.
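
The sketch below illustrates one way to realize the directional confidence measure described above. It builds the SSD auto-correlation surface of a candidate point, extracts cross-sections along a few predefined directions, fits the parabola e(x_r) = a x_r^2 + b x_r + c to each cross-section, and scores the fit. The choice of four directions, the window sizes, and the use of a goodness-of-fit score as the "correlation with the parabola" are assumptions made for illustration; the paper's exact directions and thresholds are not reproduced here.

```python
import numpy as np

def directional_confidence(I, p, half_N=4, half_Omega=6):
    """Per-direction confidence of a candidate feature point p = (x, y).

    Builds the SSD surface of the point against its own image (an
    auto-correlation over the area Omega), extracts cross-sections along
    four predefined directions, fits e(x_r) = a*x_r**2 + b*x_r + c to each,
    and returns a goodness-of-fit score per direction.  A corner-like
    (omnidirectional) feature scores well in every direction; an edge
    scores well only perpendicular to the edge.
    """
    x, y = p
    T = I[y - half_N:y + half_N + 1, x - half_N:x + half_N + 1].astype(float)
    surf = np.empty((2 * half_Omega + 1, 2 * half_Omega + 1))
    for v in range(-half_Omega, half_Omega + 1):
        for u in range(-half_Omega, half_Omega + 1):
            P = I[y + v - half_N:y + v + half_N + 1,
                  x + u - half_N:x + u + half_N + 1].astype(float)
            surf[v + half_Omega, u + half_Omega] = np.sum((T - P) ** 2)

    directions = {"0 deg": (1, 0), "45 deg": (1, 1),
                  "90 deg": (0, 1), "135 deg": (-1, 1)}
    r = np.arange(-half_Omega, half_Omega + 1)
    scores = {}
    for name, (dx, dy) in directions.items():
        # Cross-section of the SSD surface through its centre along (dx, dy).
        cols = half_Omega + r * dx
        rows = half_Omega + r * dy
        e = surf[rows, cols]
        a, b, c = np.polyfit(r, e, 2)                    # parabolic fit
        resid = e - (a * r ** 2 + b * r + c)
        scores[name] = 1.0 - resid.var() / (e.var() + 1e-12)
    return scores
```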
4. Modeling and controller design

Modeling of the system is critical to the design of a controller to perform the task at hand. In our application areas, the modeling and controller designs are similar, but distinct. We present only the depth recovery modeling and controller design; Papanikolopoulos et al. [15] provide an in-depth treatment of the modeling and controller design for robotic visual tracking applications.

Several robotic servoing problems require accurate information regarding the depth of the object of interest. When this depth is not provided a priori, it must be recovered from observations of the environment via sensing. Traditional vision-based depth recovery techniques in robotic systems may suffer from various problems, including computational overhead, expensive calibration, frequent recalibration, obstruction, and solution of the correspondence problem. We present an alternative algorithm for depth recovery using the Controlled Active Vision framework. Consider an object at an unknown depth with a feature point, P, on the surface of the object. By moving the visual sensor, a sequence of images and the respective projections of P can be effected. By observing the changes of P's projections in successive images, an estimate of the depth of P is derived. Over multiple images, this estimate can be refined using adaptive techniques [19]. Previous work in depth recovery using active vision relied upon random camera displacements to provide displacement changes in the p(k)'s [7][12][18][22][23]. In our approach, we select features automatically, produce feature trajectories (a different trajectory for every feature), and predict future displacements using the depth estimate [19]. The errors in these predicted displacements, in conjunction with the estimated depth, are included as inputs in the control loop of the system. Thus, the next calculated movement of the camera will attempt to eliminate the largest possible portion of the observed error while adhering to various constraints. This purposeful movement of the visual sensor provides more accurate depth estimates and faster convergence.

For depth recovery, we use the SSD optical flow to measure displacements of feature points in the image plane. For a single feature point, p, we assume that the optical flow at time instant kT is (u(kT), v(kT)), where T is the time between two successive frames. At time instant kT, the optical flow is given by:

u(kT) = u_o(kT) + u_c(kT)   (4.1)
v(kT) = v_o(kT) + v_c(kT)   (4.2)

Since we are recovering the depth of feature points on a static object, the components of optical flow due to the movement of the object (u_o(kT) and v_o(kT)) are zero. Also, equations (4.1) and (4.2) do not account for the computational delays involved with the calculation of the optical flow. By taking these delays into account, and by eliminating the flow components due to object motion, these equations become

u(kT) = u_c((k-d+1)T)   (4.3)
v(kT) = v_c((k-d+1)T)   (4.4)

where d is the computational delay factor (d ∈ {1, 2, ...}). In order to simplify the notation, the index k will be used instead of kT. By utilizing the relations

u(k) = \frac{x(k+1) - x(k)}{T}, \quad v(k) = \frac{y(k+1) - y(k)}{T}

and including the inaccuracies of the model (e.g., neglected accelerations, inaccurate robot control) as white noise, equations (4.3) and (4.4) become

x(k+1) = x(k) + T u_c(k-d+1) + v_1(k)   (4.5)
y(k+1) = y(k) + T v_c(k-d+1) + v_2(k)   (4.6)

where v_1(k) and v_2(k) are white noise terms with variances σ_1^2 and σ_2^2, respectively. The above equations may be written in state-space form as

p(k+1) = A(k) p(k) + B(k-d+1) u_c(k-d+1) + H(k) v(k)   (4.7)

where A(k) = H(k) = I_2, p(k) and v(k) ∈ R^2, and u_c(k-d+1) ∈ R^6. The matrix B(k) ∈ R^{2×6} is given by:

B(k) = T \begin{bmatrix} -\dfrac{f}{s_x Z_W(k)} & 0 & \dfrac{x(k)}{Z_W(k)} & \dfrac{s_y x(k) y(k)}{f} & -\left( \dfrac{f}{s_x} + \dfrac{s_x x^2(k)}{f} \right) & \dfrac{s_y y(k)}{s_x} \\ 0 & -\dfrac{f}{s_y Z_W(k)} & \dfrac{y(k)}{Z_W(k)} & \dfrac{f}{s_y} + \dfrac{s_y y^2(k)}{f} & -\dfrac{s_x x(k) y(k)}{f} & -\dfrac{s_x x(k)}{s_y} \end{bmatrix}   (4.8)

The vector p(k) = (x(k), y(k)) is the state vector, u_c(k-d+1) = (T_x(k-d+1), T_y(k-d+1), 0, 0, 0, 0) is the control input vector (only movement in the x-y plane of the end-effector frame is used), and v(k) = (v_1(k), v_2(k)) is the white noise vector. The term B(k-d+1) u_c(k-d+1) can be simplified to

B(k-d+1) u_c(k-d+1) = \begin{pmatrix} -\dfrac{f T}{s_x Z_W} T_x(k-d+1) \\ -\dfrac{f T}{s_y Z_W} T_y(k-d+1) \end{pmatrix}   (4.9)

since Z_W does not vary over time and only the T_x(k-d+1) and T_y(k-d+1) control inputs are used. By substitution into equation (4.7) and simplification, we derive the following:

\begin{pmatrix} x(k+1) \\ y(k+1) \end{pmatrix} = \begin{pmatrix} x(k) \\ y(k) \end{pmatrix} - \begin{pmatrix} \dfrac{f T}{s_x Z_W} T_x(k-d+1) \\ \dfrac{f T}{s_y Z_W} T_y(k-d+1) \end{pmatrix} + \begin{pmatrix} v_1(k) \\ v_2(k) \end{pmatrix}   (4.10)

The previous equation is used under the assumption that the optical flow induced by the motion of the camera does not change significantly in the time interval T. Additionally, the interval T should be as small as possible so that the relations that provide T_x(k-d+1) and T_y(k-d+1) are as accurate as possible. The parameter T has as its lower bound 16 msec (the sampling rate of standard video equipment). If the upper bound on T can be reduced such that it is lower than the sampling rate of the manipulator controller (28 msec for the PUMA 560), then the system will no longer be constrained by the speed of the vision hardware.
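
A small sketch of the one-step prediction implied by equation (4.10): given the current image position of a feature, the camera translation commanded d steps earlier, and an estimate of the depth Z_W, it predicts where the feature projection should appear in the next frame. The error between this prediction and the SSD-measured position is what drives the estimator and control law of this section. Symbol names and the sign convention follow the reconstruction above; the numeric parameter values are placeholders.

```python
import numpy as np

def predict_feature(p, T_xy_cmd, Z_W_hat,
                    f=8.0e-3, s_x=1.1e-5, s_y=1.3e-5, T=0.033):
    """One-step prediction of a feature's image position, eq. (4.10).

    p         : current (x, y) image coordinates (pixels, relative to centre).
    T_xy_cmd  : (T_x, T_y) camera translation commanded d steps ago (m/s).
    Z_W_hat   : current estimate of the feature depth (m).
    T         : sampling interval (s).  All parameter values are placeholders.
    """
    x, y = p
    Tx, Ty = T_xy_cmd
    x_next = x - (f * T / (s_x * Z_W_hat)) * Tx
    y_next = y - (f * T / (s_y * Z_W_hat)) * Ty
    return np.array([x_next, y_next])
```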

The objective is to design a specific trajectory (a set of desired p*(k) = (x*(k), y*(k)) for successive k's) for every individual feature in order to identify the depth parameter Z_W. Thus, we have to design a control law that forces the eye-in-hand system to track the desired trajectory for every feature. Based on the information from the automatic feature selection procedure, we select a trajectory for the feature. For example, the ideal trajectory for a corner feature is depicted in Figure 1. If the feature belongs to an edge parallel to the y-axis, then y*(k) is held constant at y*_init. The control objective function is selected to be

F_o(k+d) = E\{ (p_m(k+d) - p^*(k+d))^T Q (p_m(k+d) - p^*(k+d)) + u_c^T(k) L u_c(k) \}   (4.11)

where E{Y} denotes the expected value of the random variable Y, p_m(k) is the measured state vector, and Q and L are control weighting matrices. Based on equation (4.10) and the minimization of the control objective function F_o(k+d) (with respect to the vector u_c(k)), we can derive the following control law:

u_c(k) = -\left[ \hat{B}^T(k) Q \hat{B}(k) + L \right]^{-1} \hat{B}^T(k) Q \left[ p_m(k) - p^*(k+d) + \sum_{i=1}^{d-1} \hat{B}(k-i) u_c(k-i) \right]   (4.12)

where \hat{B}(k) is the estimated value of the matrix B(k). The matrix \hat{B}(k) depends on the estimated value of the depth \hat{Z}_W. The estimation of the depth parameter is performed by using the procedure described in [16]. In order to use this procedure, we must rewrite equation (4.10) as follows (for one feature point):

\Delta p_m(k) = \zeta_W u_{cnew}(k-d) + n(k)   (4.13)

where \Delta p_m(k) = p_m(k) - p_m(k-1), \zeta_W = f / (s_x Z_W), n(k) \sim N(0, N(k)), and u_{cnew}(k) = ( T T_x(k), (s_x / s_y) T T_y(k) )^T. The objective of the estimation scheme is to estimate the parameter \zeta_W for each individual feature. When this computation is completed, it is trivial to compute the parameter 1/Z_W that is needed for the construction of the depth maps. It is assumed that the matrix N(k) is constant (N(k) = N). We use the estimation scheme described in [11][13] (for one feature point):

\hat{\zeta}_W^p(k) = \hat{\zeta}_W^u(k-1)   (4.14)
r^p(k) = r^u(k-1) + s(k-1)   (4.15)
r^u(k) = \left\{ (r^p(k))^{-1} + u_{cnew}^T(k-d) N^{-1} u_{cnew}(k-d) \right\}^{-1}   (4.16)
g(k) = r^u(k) u_{cnew}^T(k-d) N^{-1}   (4.17)
\hat{\zeta}_W^u(k) = \hat{\zeta}_W^p(k) + g(k) \left\{ \Delta p_m(k) - \hat{\zeta}_W^p(k) u_{cnew}(k-d) \right\}   (4.18)

where the superscript p denotes the predicted value of a variable, the superscript u denotes the updated value of a variable, and s(k) is a covariance scalar. The initial conditions are described in [16]. The term r^p(0) can be viewed as a measure of the confidence that we have in the initial estimate \hat{\zeta}_W(0). The depth recovery process does not require camera calibration nor precisely known environmental parameters. Camera parameters are taken directly from the manufacturer's documentation for both the CCD array and the lens. Experimental results under this process do not suffer significantly due to these estimates, as shown in the following sections.
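
A minimal sketch of the per-feature estimation recursion (4.14)-(4.18) under stated assumptions: the measurement noise covariance N is reduced to a scalar, the covariance inflation term s(k) is held constant, and all initial values are placeholders rather than the conditions described in [16]. The class tracks the estimate of zeta_W = f / (s_x * Z_W) and recovers the depth estimate Z_W from it after each update.

```python
import numpy as np

class DepthEstimator:
    """Recursive estimation of zeta_W = f / (s_x * Z_W) for one feature,
    following eqs. (4.13)-(4.18).  N is a scalar measurement noise
    covariance and s is the covariance scalar s(k); these values, and the
    initial conditions zeta0 and r0, are illustrative placeholders."""

    def __init__(self, zeta0=1.0, r0=10.0, N=1.0, s=0.01,
                 f=8.0e-3, s_x=1.1e-5, s_y=1.3e-5, T=0.033):
        self.zeta, self.r = zeta0, r0
        self.N, self.s = N, s
        self.f, self.s_x, self.s_y, self.T = f, s_x, s_y, T

    def update(self, dp_m, T_xy_cmd):
        """dp_m     : measured displacement p_m(k) - p_m(k-1) (2-vector).
           T_xy_cmd : (T_x, T_y) translation commanded d steps earlier."""
        u = np.array([self.T * T_xy_cmd[0],
                      (self.s_x / self.s_y) * self.T * T_xy_cmd[1]])  # u_cnew
        zeta_p = self.zeta                        # (4.14) predicted estimate
        r_p = self.r + self.s                     # (4.15) predicted covariance
        r_u = 1.0 / (1.0 / r_p + u @ u / self.N)  # (4.16) updated covariance
        g = r_u * u / self.N                      # (4.17) gain (2-vector)
        self.zeta = zeta_p + g @ (np.asarray(dp_m) - zeta_p * u)  # (4.18)
        self.r = r_u
        return self.zeta

    def depth(self):
        """Recover the depth estimate Z_W from the current zeta_W."""
        return self.f / (self.s_x * self.zeta)
```

In the full system, one such estimator would be run for every selected feature, and the resulting 1/Z_W values assembled into the depth map.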
5. Experimental design and results

The MRVT system

We have implemented both the depth recovery and the visual tracking on the Minnesota Robotic Visual Tracker (MRVT) system [2]. The MRVT is a multi-architectural system that consists of two main parts: the Robot/Control Subsystem (RCS) and the Vision Processing Subsystem (VPS).

Derivation of depth

The initial experimental runs were conducted using two soda cans wrapped in a checkerboard surface as targets. The leading edges of the cans were placed at 53 cm and 41 cm in depth (see Figure 2). The majority of the errors in the calculated depths were on the order of sub-pixel in magnitude. For these runs, the initial depth estimate was 25 cm. The reconstructed surfaces of the dual can experiment are shown in Figure 3; they show that the process has captured the curvature of both cans in this experiment.

[Figure 1: Desired trajectories of the features; x*(k) and y*(k) plotted against k, with levels x*_init, y*_init and x*_max, y*_max and breakpoints k_1, k_2, k_F.]
[Figure 2: Two can target.]
[Figure 3: Reconstructed surfaces.]

[Figure 4: Dual cans with real texture.]
[Figure 5: Reconstructed surfaces.]

We duplicated the first set of experiments without the checkerboard surfaces wrapped around the cans. Instead, we used the actual surfaces of the soda cans under normal lighting conditions. This reduced the number of suitable feature points as well as producing a non-uniform distribution of feature points across the surface of the cans. Additionally, the surfaces of the cans exhibited specularities and reflections that added noise to the displacement measures. The targets (Figure 4) and the result (Figure 5) demonstrate that the additional noise and the distribution of suitable feature points affected the recovery of the curvature on the two surfaces. The errors in these depth measures were typically on the order of a single pixel in nature.

Visual tracking

We conducted multiple runs for the tracking of objects that exhibited unknown two- and three-dimensional motion, with only a coarse estimate of the depth of the objects. The targets for these runs included books, batons, and a computer-generated target displayed on a large video monitor. During the experiments, we tested the system using targets with linear and curved trajectories. The first experiments were conducted using a book and a baton as targets in order to test the performance of the system both with and without the dynamic pyramiding. These initial experiments served to confirm the feasibility of performing real-time pyramid level switching and to collect data on the performance of the system under the pyramiding and search optimizing schemes. Figure 6 shows both the target trajectory and manipulator path and the distance versus time plot from one of these experiments.

[Figure 6: Spatial plot and distance vs. time plot.]

With the pyramiding and search optimizations, the system was able to track the corner of a book that was moving along a linear trajectory at 80 cm/sec. In contrast, the maximum speed reported by Papanikolopoulos [16] was 7 cm/sec, representing an order of magnitude improvement. In Figure 6, the dashed line is the target trajectory and the solid line represents the manipulator trajectory. The start of the experiment is in the upper left corner and the end is where the target and manipulator trajectories come together in the lower right. The oscillation at the beginning is due to the switch to higher pyramiding levels as the system compensates for the speed of the target. The oscillations at the end are produced when the target slows to a stop and the pyramiding drops down to the lowest level. This results in the manipulator centering above the feature accurately, due to the higher resolution available at the lowest pyramid level.

Once system performance met our minimal requirements, testing proceeded to a computer-generated target where the trajectory and speed could be accurately controlled. The target (a white box) traced either a square or a circle on the video monitor while the PUMA tracked the target. The first experiments traced a square and demonstrated robust tracking in spite of the target exhibiting infinite accelerations (at initial start-up) and discontinuities in the velocity curves (at the corners of the square). The results in Figure 7 clearly show the oscillatory nature of the manipulator trajectory at the points where the velocity curve of the target is discontinuous. The speed of the target in these experiments approached 15 cm/sec.

[Figure 7: Target and manipulator paths.]
Another set of experiments was conducted using the same setup, only with a target that traced a circle and exhibited Z-axis rotations. In these experiments, an appropriate controller was used to track both the translational and the rotational motion. A description of this controller can be found in [16]. The resulting data has been separated into the three components of motion. Figure 8 shows the components of translational motion along the X-axis and Y-axis, and the rotational component of motion about the Z-axis. The target increases speed over the first three circular revolutions to lessen the effect of the infinite accelerations mentioned earlier. Also, there is no rotational component of motion during this wind-up. The horizontal scale in these plots has a unit of cycles, corresponding to the 28 msec cycle of the PUMA's Unimate Computer/Controller.

[Figure 8: Components of motion for visual tracking; X-translation, Y-translation, and Z-rotation components plotted against cycles.]

6. Conclusion

This paper presents robust techniques for the operation of robotic agents in uncalibrated environments. The techniques presented provide ways of recovering unknown workspace parameters using the Controlled Active Vision framework [16]. In particular, this paper presents novel techniques for computing depth maps and for visual tracking through robotic exploration with a camera.

For the computation of depth maps, we propose a scheme that is based on the automatic selection of features and the design of specific trajectories on the image plane for each individual feature. Unlike similar approaches [8][11][12][18][22], this approach helps to design trajectories that provide maximum identifiability of the depth parameter. During the execution of the specific trajectory, the depth parameter is computed with the help of a simple estimation scheme that takes into consideration the previous movements of the camera and the computational delays. The approach has been tested and several experimental results have been presented. For visual tracking, we propose a technique based upon earlier work in visual servoing [16] that achieves superior speed and accuracy through the introduction of several performance-enhancing techniques. In particular, the dynamic pyramiding technique provides a satisfactory compromise to the speed/accuracy tradeoff inherent in static pyramiding techniques. These optimizations apply to various region-based vision processing applications and, in our application, provide the speedup required to directly increase the effectiveness of the real-time vision system. Issues for future research include the automatic selection of the feature window size (an issue discussed in [14]) in order to select a window that has some texture variations, and the implicit incorporation of the robot dynamics.

7. Acknowledgments

This work has been supported by the Department of Energy (Sandia National Laboratories) through Contract #AC-3752D, the National Science Foundation through Contracts #IRI-9410003 and #CDA-9222922, the Center for Transportation Studies through Contract #USDOT/DTRS 93-G-0017-1, the Army High Performance Computing Research Center, the 3M Corporation, the Graduate School of the University of Minnesota, and the Department of Computer Science of the University of Minnesota.

8. References

[1] P. Anandan, "Measuring visual motion from image sequences", Technical Report COINS-TR-87-21, COINS Department, University of Massachusetts, 1987.
[2] S. Brandt, C. Smith, and N. Papanikolopoulos, "The Minnesota Robotic Visual Tracker: a flexible testbed for vision-guided robotic research", to appear, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 1994.
[3] J. J. Craig, Introduction to Robotics: Mechanics and Control, Addison-Wesley, Reading, MA, 1985.
[4] J. T. Feddema and C. S. G. Lee, "Adaptive image feature prediction and control for visual tracking with a hand-eye coordinated camera", IEEE Transactions on Systems, Man, and Cybernetics, 20(5):1172-1183, 1990.
[5] D. B. Gennery, "Tracking known three-dimensional objects", Proceedings of the AAAI 2nd National Conference on AI, 13-17, 1982.
[6] J. Heel, "Dynamic motion vision", Robotics and Autonomous Systems, 6(3):297-314, 1990.
[7] C. P. Jerian and R. Jain, "Structure from motion: a critical analysis of methods", IEEE Transactions on Systems, Man, and Cybernetics, 21(3):572-588, 1991.
[8] K. N. Kutulakos and C. R. Dyer, "Recovering shape by purposive viewpoint adjustment", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 16-22, 1992.
[9] R. C. Luo, R. E. Mullen Jr., and D. E. Wessel, "An adaptive robotic tracking system using optical flow", Proceedings of the IEEE International Conference on Robotics and Automation, 568-573, 1988.
[10] D. Marr, Vision, W. H. Freeman and Company, San Francisco, CA, 1982.
[11] L. Matthies, T. Kanade, and R. Szeliski, "Kalman filter-based algorithms for estimating depth from image sequences", International Journal of Computer Vision, 3(3):209-238, 1989.
[12] L. Matthies, "Passive stereo range imaging for semi-autonomous land navigation", Journal of Robotic Systems, 9(6):787-816, 1992.
[13] P. S. Maybeck, Stochastic Models, Estimation, and Control, Vol. 1, Academic Press, Orlando, FL, 1979.
[14] M. Okutomi and T. Kanade, "A locally adaptive window for signal matching", International Journal of Computer Vision, 7(2):143-162, 1992.
[15] N. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: a combination of control and vision", IEEE Transactions on Robotics and Automation, 9(1):14-35, 1993.
[16] N. Papanikolopoulos, "Controlled active vision", Ph.D. Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1992.
[17] J. W. Roach and J. K. Aggarwal, "Computer tracking of objects moving in space", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(6):583-588, 1979.
[18] C. Shekhar and R. Chellappa, "Passive ranging using a moving camera", Journal of Robotic Systems, 9(6):729-752, 1992.
[19] C. Smith and N. Papanikolopoulos, "Computation of shape through controlled active exploration", Proceedings of the IEEE International Conference on Robotics and Automation, 2516-2521, 1994.
[20] C. Smith, S. Brandt, and N. Papanikolopoulos, "Controlled active exploration of uncalibrated environments", Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 792-795, 1994.
[21] D. B. Stewart, D. E. Schmitz, and P. K. Khosla, "CHIMERA II: a real-time multiprocessing environment for sensor-based robot control", Proceedings of the Fourth International Symposium on Intelligent Control, 265-271, September 1989.
[22] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: a factorization method", International Journal of Computer Vision, 9(2):137-154, 1992.
[23] D. Weinshall, "Direct computation of qualitative 3-D shape and motion invariants", IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(12):1236-1240, 1991.