Scale Drift-Aware Large Scale Monocular SLAM
Hauke Strasdat, Department of Computing, Imperial College London, UK; J.M.M. Montiel, Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Spain; Andrew J. Davison, Department of Computing, Imperial College London, UK

Abstract: State of the art visual SLAM systems have recently been presented which are capable of accurate, large-scale and real-time performance, but most of these require stereo vision. Important application areas in robotics and beyond open up if similar performance can be demonstrated using monocular vision, since a single camera will always be cheaper, more compact and easier to calibrate than a multi-camera rig. With high quality estimation, a single camera moving through a static scene of course effectively provides its own stereo geometry via frames distributed over time. However, a classic issue with monocular visual SLAM is that due to the purely projective nature of a single camera, motion estimates and map structure can only be recovered up to scale. Without the known inter-camera distance of a stereo rig to serve as an anchor, the scale of locally constructed map portions and the corresponding motion estimates is therefore liable to drift over time. In this paper we describe a new near real-time visual SLAM system which adopts the continuous keyframe approach of the best current stereo systems, but accounts for the additional challenges presented by monocular input. In particular, we present a new pose-graph optimisation technique which allows for the efficient correction of rotation, translation and scale drift at loop closures. Specifically, we describe the Lie group of similarity transformations and its relation to the corresponding Lie algebra. We also present in detail the system's new image processing front-end, which is able to accurately track hundreds of features per frame, and a filter-based approach for feature initialisation within keyframe-based SLAM.
Our approach is proven via large-scale simulation and real-world experiments where a camera completes large looped trajectories.

I. INTRODUCTION

Visual SLAM with a single camera is more challenging than when stereo vision can be used, but successful solutions have the potential to make a much wider impact because of the wealth of application domains in robotics and beyond where a single camera can more cheaply, compactly and conveniently be installed. Monocular SLAM research, which naturally overlaps with the highly developed Structure from Motion (SfM) field in computer vision, has seen great progress over the last ten years, and several impressive real-time systems have emerged as researchers have managed to transfer the best sequential SLAM techniques to the single camera domain. However, the successful sequential systems presented have mostly either been limited to operation in a local workspace (e.g. [7, 14, 9, 2]) or, more rarely, have been designed for open-loop visual odometry exploration of larger areas [21, 3]. Meanwhile, in stereo vision-based SLAM there have now been systems presented which offer end-to-end solutions to real-time large scale SLAM. Probably the most complete is that of Mei et al. [18], which combines robust and accurate local visual odometry with constant-time large-scale mapping (enabled by a continuous relative map representation), high-performance appearance-based loop closure detection, and global metric mapping if required. Another similar system is that presented by Konolige and Agrawal [16]. It has proven more difficult to achieve real-time large-scale mapping with a monocular camera, due to its nature as a purely projective sensor. Geometry does not just pop out of the data from a moving camera, but must be inferred over time from multiple images.
Difficulties had to be overcome before a probabilistic sequential approach could successfully be implemented for monocular SLAM due to the fact that image features must be observed from multiple viewpoints with parallax before fully constrained 3D landmark estimates can be made. Special attention was given to feature initialisation schemes which permitted sequential probabilistic filtering of the joint camera and map state [7, 28, 20]. Beyond this issue, the fact that a single camera does not measure metric scale means that monocular mapping must either proceed with scale as an undetermined factor or that scale has to be introduced by an additional information source, such as a calibration object as in [7], or even by exploiting nonholonomic motion constraints [27]. These issues, and the fact that monocular maps are just generally less well constrained than those built by metric sensors, meant that it was more difficult than in SLAM systems with other sensors to build algorithms which could tackle large areas by composing local fragments. One of the first attempts to put together a whole large scale monocular SLAM system was by Clemente et al. [4], who took a hierarchical mapping approach where locally filtered submaps were built and joined by an upper level joint map. Map matching based on feature correspondences between submaps allowed global correction of motion and scale drift around a large loop. In [24] a more accurate later evolution of this work, based on conditionally independent local maps, was presented. Eade and Drummond [10, 8] presented a system which had several similarities to these approaches, but also many improvements, in that it was able to operate automatically in true real-time, incorporating effective loop-closing and re-localisation methods, as well as making use of non-linear techniques at various different levels to improve accuracy. At its core, though, it built a graph of local
filtered submaps, each with a state vector and covariance, and with the similarity transforms between neighbouring submaps estimated via feature correspondences.

A. SLAM via Optimisation

Recently, Strasdat et al. [29] presented results which indicate, on the basis of a systematic investigation into the accuracy of local map components versus computational cost, that the keyframes + optimisation approach of Klein and Murray [14], based ultimately on well known bundle adjustment methods [30], is strongly advantageous compared to methods like those of Eade [8] or Clemente [4] whose building blocks are locally filtered submaps. Essentially, the accuracy analysis in [29] shows that it is the ability to harvest measurements from large numbers of image features per frame that plays the major role in increasing accuracy in local monocular motion estimation, while incorporating information from a large number of closely spaced camera frames provides less information for a given computation budget. This plays into the hands of the methods which cope much more efficiently with large numbers of features than filters can. In light of this analysis, the previous large scale monocular systems [4] and [8] can be seen as somewhat unsatisfactory approaches, which combine filtering at a local level with optimisation at the global level. For instance, when loop closures are imposed, the relative positions of their filtered submaps change, but any drift within the local maps themselves cannot be resolved. In this paper, we have therefore resolved to take a keyframe approach from top to bottom, aiming for maximum accuracy while taking into full account the special character of SLAM with monocular vision.

B. Gauge Freedoms, Monocular SLAM and Scale Drift

Metric SLAM systems aim to build coherent maps, in a single coordinate frame, of the areas that a robot moves through. But they must normally do this based on purely relative measurements of the locations of scene entities observable by their on-board sensors.
There will therefore always be certain degrees of gauge freedom in the maps that they create, even when the best possible job is done of estimation. These gauge freedoms are degrees of transformation freedom through which the whole map, consisting of feature and robot position estimates taken together, can be transformed without affecting the values of the sensor measurements. In SLAM performed by a robot moving on a ground plane and equipped with a range-bearing sensor, there are three degrees of gauge freedom, since the location of the entire map with regard to translations and rotation in the plane is undetermined by the sensor measurements. In SLAM by a robot moving in 3D and equipped with a sensor like calibrated stereo vision or a 3D laser range-finder, there are six degrees of gauge freedom, since the whole map could experience a rigid body transformation in 3D space. In monocular SLAM, however, there are fundamentally seven degrees of gauge freedom [30], since the overall scale of the map, as well as a 6 DoF (degrees of freedom) rigid transformation, is undetermined (scale and a rigid transformation taken together are often known as a similarity transformation). It is the number of gauge degrees of freedom in a particular type of SLAM which therefore determines the ways in which drift will inevitably occur between different fragments of a map. Consider two distant fragments in a large map built continuously by a single robot: local measurements in each of the fragments have no effect in pulling either towards a particular location in the degrees of gauge freedom. If they are not too distant from each other, they will share some coherence in these degrees of freedom, but only via compounded local measurements along the chain of fragments connecting them. 
The amount of drift in each of these degrees of freedom will grow depending on the distance between the fragments, and the distribution of the potential drift can be calculated if needed by uncertainty propagation along the chain. It is very well known that planar maps built by a ground-based robot drift in three degrees of freedom; so maps built by a monocular camera with no additional information drift in seven degrees of freedom. It is through these degrees of freedom, therefore, that loop closures must adjust local map fragments (poses or submaps in a graph).

II. GENERAL FRAMEWORK FOR STATE ESTIMATION USING OPTIMISATION

In this section, we give a short review of state estimation using optimisation. The goals of this are first to define common notation and terminology for the rest of the paper, and second to clarify the natural close relationship between the different types of optimisation our approach uses, such as bundle adjustment and pose-graph optimisation. In a general estimation problem, we would like to estimate a vector of parameters p given a vector of measurements f, where we know the form of the likelihood function p(f|p). The most probable solution is the set of values p which maximises this likelihood, which is equivalent to minimising the negative log-likelihood:

argmax_p p(f|p) = argmin_p (−log p(f|p)). (1)

In many problems, the negative log-likelihood is known as energy and the goal is to minimise it, which is fully equivalent to maximising the probability of the solution.

A.
Least Square Problems and Gauss-Newton Optimisation

We now speak in more specific but still very widely applicable terms, and assume that the likelihood distribution p(f|p) is jointly Gaussian, and thus has the distribution (up to a constant of proportionality):

p(f|p) ∝ exp(−(f − f̂(p))^T Λ_f (f − f̂(p))), (2)

where Λ_f = Σ_f^-1 is the information matrix, or inverse of the measurement covariance matrix Σ_f of the likelihood distribution, and f̂(p) is the measurement function which computes (predicts) the distribution of measurements f given a set of parameters p.
The negative log-likelihood, or energy, χ²(p) = −log p(f|p) therefore has the quadratic form:

χ²(p) = (f − f̂(p))^T Λ_f (f − f̂(p)). (3)

If Λ_f is block-diagonal, χ²(p) simplifies to:

χ²(p) = Σ_i (f_i − f̂_i(p))^T Λ_{f,i} (f_i − f̂_i(p)). (4)

This is the case if f consists of a stack of measurements which are independent, which can very commonly be assumed when many measurements are made from a calibrated sensor; of particular interest here is the case where each measurement is the image location of a scene feature in one frame taken by a camera. Further, and again in our case, Λ_f can very often be taken to equal the identity matrix. This implies that the uncertainty in each individual parameter of every measurement has exactly the same value, and that they are all independent. The absolute magnitude of the values on the diagonal of Λ_f does not affect the result, so they can be set to one. Now, χ²(p) is simply a sum of squares, and minimising it is called non-linear least squares. A common technique for non-linear least squares is the Gauss-Newton method, which is an approximation of the Newton method. This consists of iteratively updating an initial estimate of the parameter vector p by the rule

p^(i+1) = p^(i) + δ, (5)

where at each step the update vector δ is found by solving the normal equation:

(J_p^T Λ_f J_p) δ = J_p^T Λ_f r. (6)

Here, J_p = ∂f̂/∂p and r = f − f̂(p) is the residual error. All estimation described in this paper is done using a variant of Gauss-Newton called Levenberg-Marquardt (LM), which alters the normal equation as follows:

(J_p^T Λ_f J_p + µI) δ = J_p^T Λ_f r. (7)

The parameter µ rotates the update vector δ towards the direction of steepest descent. Thus, if µ → 0, pure Gauss-Newton is performed, whereas if µ → ∞, gradient descent is used. In LM, we only perform an update p^(i) + δ if it will be successful in significantly reducing the residual error. After a successful update, µ is decreased; otherwise µ is increased.
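As an illustration, the LM accept/reject loop just described can be written in a few lines. The following is a minimal numpy sketch of (3) with Λ_f = I and the damped normal equation (7), applied to a toy curve-fitting problem; the function names and the toy model are ours for illustration, not part of the system described in this paper:

```python
import numpy as np

def levenberg_marquardt(f_hat, jac, z, p0, iters=50):
    """Minimise sum((z - f_hat(p))**2), i.e. (3) with Lambda_f = I,
    using the damped normal equation (7) and the mu accept/reject rule."""
    p, mu = p0.astype(float), 1e-3
    for _ in range(iters):
        r = z - f_hat(p)                        # residual error r = f - f_hat(p)
        J = jac(p)                              # Jacobian J_p = d f_hat / d p
        # damped normal equation: (J^T J + mu*I) delta = J^T r
        delta = np.linalg.solve(J.T @ J + mu * np.eye(len(p)), J.T @ r)
        if np.sum((z - f_hat(p + delta))**2) < np.sum(r**2):
            p, mu = p + delta, mu * 0.5         # accept: move towards Gauss-Newton
        else:
            mu *= 10.0                          # reject: move towards gradient descent
    return p

# toy problem: fit y = a * exp(b * x) to noise-free samples
x = np.linspace(0.0, 1.0, 20)
z = 2.0 * np.exp(-1.0 * x)                      # ground truth a = 2, b = -1
f_hat = lambda p: p[0] * np.exp(p[1] * x)
jac = lambda p: np.stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)], axis=1)
p_est = levenberg_marquardt(f_hat, jac, z, np.array([1.0, 0.0]))
```

Note how the damping parameter mu implements exactly the heuristic above: halved on success, increased tenfold on failure.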
The motivation behind this heuristic is that Gauss-Newton can approach a quadratic rate of convergence, but gradient descent behaves much better far from the minimum.

B. Covariance Back-propagation

So far, we have considered only how to find the single most probable set of parameters p. If there is no gauge freedom in the minimisation argmin_p χ²(p), then J_p^T Λ_f J_p has full rank and we can determine a full distribution for p. A first-order approximation of the covariance Σ_p can be calculated using covariance back-propagation:

Σ_p = (J_p^T Λ_f J_p)^-1. (8)

Note that if p is high-dimensional, this calculation is very expensive because of the large matrix inversion required.

C. Optimising Rigid Body Transformations in R^3

In optimisation, especially in 3D SLAM, the correct treatment of angles consistently causes confusion. On one hand, a minimal parametrisation is desired, but singularities should also be avoided. Interpolation of angles is not straightforward, since the group of rotations is only locally Euclidean. Probably the most elegant way to represent rigid body transformations in optimisation is using a Lie group/algebra representation. It is well known that the Newton method has a quadratic convergence rate given that we optimise over a Euclidean space. However, this result can be generalised to general Lie groups [17]. A rigid body transformation in R^n can be expressed as an (n+1)×(n+1) matrix which can be applied to homogeneous position vectors:

T = [R t; 0^T 1] with R ∈ SO(n), t ∈ R^n, (9)

with SO(n) being the Lie group of rotation matrices. The rigid body transformations in R^n form a smooth manifold and therefore a Lie group, the Special Euclidean group SE(n) [11]. The group operator is matrix multiplication. A minimal representation of this transformation is defined by the corresponding Lie algebra se(n), which is the tangent space of SE(n) at the identity.
In R^3, the algebra elements are 6-vectors (ω, υ): ω = (ω_1, ω_2, ω_3) is the axis-angle representation for rotation, and υ is a rotated version of the translation t. Elements of the se(3) algebra can be mapped to the SE(3) group using the exponential mapping exp_SE(3):

exp_SE(3)(ω, υ) := [exp_SO(3)(ω) Vυ; 0^T 1] = [R t; 0^T 1]. (10)

Here,

exp_SO(3)(ω) = I + (sin θ / θ) (ω)× + ((1 − cos θ) / θ²) ((ω)×)²

is the well-known Rodrigues formula,

V = I + ((1 − cos θ) / θ²) (ω)× + ((θ − sin θ) / θ³) ((ω)×)²,

θ = ‖ω‖_2, and (·)× is the operator which maps a 3-vector to its skew-symmetric matrix. Since exp_SE(3) is surjective, there also exists an inverse relation log_SE(3). During optimisation, incremental updates δ are calculated in the tangent space se(3) around the identity and mapped back onto the manifold SE(3), leading to a modified version of (5):

T^(i+1) = exp_SE(3)(δ) · T^(i). (11)

In this way we avoid singularities, since δ is always close to the identity, while ensuring a minimal representation during optimisation. The Jacobian of (11) is linear in T:

∂(exp_SE(3)(δ) · T)/∂δ |_{δ=0} = [−(r_1)× 0_{3×3}; −(r_2)× 0_{3×3}; −(r_3)× 0_{3×3}; −(t)× I_{3×3}], (12)

where (r_1, r_2, r_3) := R. A general Jacobian ∂f(exp_SE(3)(δ) · T)/∂δ can be easily determined using the chain rule.
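The exponential map (10) can be transcribed directly into code. A minimal numpy sketch of exp_SE(3) via the Rodrigues formula and the V matrix above (function names are ours; a small-angle branch guards the division by θ):

```python
import numpy as np

def hat(w):
    """The (.)x operator: map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(omega, upsilon):
    """exp_SE(3): map (omega, upsilon) in se(3) to a 4x4 matrix in SE(3)."""
    theta = np.linalg.norm(omega)              # theta = ||omega||_2
    W = hat(omega)
    if theta < 1e-10:                          # small-angle limit
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1.0 - np.cos(theta)) / theta**2 * W @ W)      # Rodrigues formula
        V = (np.eye(3) + (1.0 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ upsilon                     # translation t = V * upsilon
    return T
```

For example, exp_se3 with ω = (0, 0, π/2) and υ = 0 yields a pure 90-degree rotation about the z-axis.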
III. A KEYFRAME OPTIMISATION SYSTEM FOR LARGE SCALE MONOCULAR SLAM

We will now describe the elements of our complete system for sequential monocular SLAM. Similar to Klein and Murray's PTAM system [14], our SLAM framework consists of two parts: a front-end which tracks features in the image and estimates the pose of each live frame given a set of known 3D points, and a back-end which performs joint bundle adjustment over a set of keyframes. In PTAM, which focuses on accurate and fast mapping in a small environment, the front-end only performs pose and feature tracking, while the back-end does everything else, such as feature initialisation and the dropping of new keyframes. Since our approach focusses on mapping large scenes, we use the opposite emphasis. The back-end does only bundle adjustment, but the tracking front-end takes over all important decisions, such as when to drop new keyframes and which features to initialise. This design puts a higher computational burden on the tracking front-end, but facilitates exploration since the stability of the tracking front-end does not depend on the back-end to drop new keyframes.

A. The Camera Projection Function and Camera Poses

Points in the world x_j ∈ R^3 are mapped to the camera image using the observation function ẑ:

ẑ(T_i, x_j) = proj(K T_i x_j). (13)

Here, x_j is a homogeneous point, T_i ∈ SE(3), K is the camera calibration matrix (which we assume is known from prior calibration) and proj is the 3D-2D projection:

proj(a) := (1/a_3)(a_1, a_2)^T for a ∈ R^3. (14)

We represent the pose of the camera at a given time-step i by the rigid body transformation T_i. Note that the position of the camera centre is not explicitly represented, but can be recovered as −R_i^T t_i [13, pp. 155].

B. Optimisation Back-end

In the back-end, we perform an optimisation jointly over a set of keyframes T_key:i and a large set of points x_j, called bundle adjustment (BA) [30]. In order to be constant-time, the number of keyframes included in BA is restricted.
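The projection of (13)-(14) and the camera-centre recovery above can be sketched as follows; the calibration values in K are made up for the example:

```python
import numpy as np

def project(K, T, x):
    """z_hat(T, x) = proj(K T x): map a world point x (3-vector) to pixel coords."""
    a = (T @ np.append(x, 1.0))[:3]        # homogeneous point into the camera frame
    b = K @ a                              # apply the calibration matrix
    return b[:2] / b[2]                    # proj(a) = (1/a3)(a1, a2)

def camera_centre(T):
    """Recover the camera centre -R^T t from a world-to-camera transform T."""
    R, t = T[:3, :3], T[:3, 3]
    return -R.T @ t

# illustrative pinhole calibration: focal length 500, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
z = project(K, np.eye(4), np.array([0.1, 0.0, 2.0]))   # -> (345, 240)
```

A point 2 m in front of an identity-pose camera and 0.1 m to the right thus lands 25 pixels right of the principal point.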
Since we focus on exploration, we optimise over a sliding window of keyframes. At loop closure, different measures have to be taken (see Sec. IV). In general in BA, not all points are visible in every frame. Let Z be a sparse matrix where an entry is either an observation z_{i,j}, if point x_j is visible in frame T_key:i, or zero otherwise. Let I be the sequence of non-zero row indices of Z: I = {i : Z_{i,j} ≠ 0}. The column index list J is defined similarly. Now, it is obvious how to define the stacked projection function:

ẑ(T_key:1, ..., T_key:m, x_1, ..., x_n, I, J). (15)

We solve BA conventionally using LM by minimising

χ²(p) = Σ_{(i,j) ∈ I×J} (z_{i,j} − ẑ(T_key:1, ..., T_key:m, x_1, ..., x_n, i, j))² (16)

with respect to the parameter vector p = (T_key:3, ..., T_key:m, x_1, ..., x_n). As a sliding window BA is applied, the window has to be anchored to the previous map. Thus, at least 7 DoF (rotation, translation and scale) should be fixed. Because of this, T_key:1 and T_key:2 are excluded from the optimisation. In order to guard against spurious matches, we use the pseudo-Huber cost function [13, p. 619] as a robust kernel. The primary sparseness pattern in BA, namely that there are only links between points and frames but no point-point or frame-frame constraints, is exploited in the usual way using the Schur complement [30]. However, our BA implementation also exploits the secondary sparseness pattern that not every point is visible in every frame. This is done using the CSparse library [6], which performs efficient symbolic Cholesky decomposition using elimination trees.

C. Feature Tracking and Pose Optimisation

The main task of the tracking front-end is to estimate the current camera pose T_i. However, it also deals with feature and keyframe initialisation. It takes as input a set of keyframe poses T_key:i and 3D points x_j ∈ X which are associated with 2D measurements z_key:i,j. The poses T_key:i and points x_j were already jointly estimated in the back-end (see Sec. III-B).
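The pseudo-Huber robust kernel used above can be sketched in one line: it is quadratic for small residuals and linear for large ones, so spurious matches cannot dominate the sum in BA. The threshold parameter b below is a design choice of ours, not a value given in the paper:

```python
import numpy as np

def pseudo_huber(e2, b=1.0):
    """Pseudo-Huber cost of a squared residual e2 = ||r||^2 with threshold b.
    Behaves like e2 for small residuals (quadratic regime) and like
    2*b*|r| for large ones (linear regime), down-weighting outliers."""
    return 2.0 * b**2 * (np.sqrt(1.0 + e2 / b**2) - 1.0)
```

In a robustified BA each squared reprojection error in the cost is passed through this kernel instead of being summed directly.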
For feature tracking and pose optimisation, we use a mixture of bottom-up (for every pixel) and top-down approaches (guided search techniques). First, we estimate the relative pose ΔT_i = T_i · T_{i-1}^-1 between the previous and current frames. This is done with the help of dense variational optical flow implemented very efficiently on the GPU [25]. Given the pose estimate of the previous frame T_{i-1}, we can calculate an initial pose estimate for the current frame: T_i^[0] = ΔT_i · T_{i-1}. Feature tracking is not done using frame-to-frame matching: whereas frame-to-frame matching is very reliable for pure visual odometry, it prevents mini-loop closures. Given our initial pose guess T_i^[0], we estimate the subset X' ⊂ X of landmarks which might be visible in the current frame. Based on the projection (13), we perform active search [7] for each point x_j ∈ X', where the centre of the search region is defined by the prediction ẑ(T_i^[0], x_j). Generally, the uncertainty of the projection, and thus the size and shape of the search region, depends on the uncertainty of x_j and T_i. Given that all points in X are already optimised using keyframe bundle adjustment, and that pose T_{i-1} is already optimised with respect to X, then apart from the measurement noise Σ_z, the only crucial uncertainty originates from the motion between the previous and the current frame. Let Σ_{ΔT_i} be the uncertainty of the relative motion. Then the
search area is constrained by the innovation covariance

(∂ẑ(T, x_j)/∂δ) Σ_{ΔT_i} (∂ẑ(T, x_j)/∂δ)^T + Σ_z, (17)

with T = exp_SE(3)(δ) · T_i^[0]. The search within this region is based on template matching using normalised cross-correlation. Using the initial guess T_i^[0], the templates are perspectively warped. In order to speed up the search, we do not apply the matching at every single pixel in the search region, but only where FAST features [26] are detected. As a result, we receive a set of 2D-3D matches between image locations z_k and feature points x_k. Finally, we optimise the pose T_i using motion-only BA. Thus, we minimise χ²(T_i) = Σ_k (z_k − ẑ(T_i, x_k))² wrt. T_i using LM. Again, we use the pseudo-Huber cost function as a robust kernel.

Fig. 1. Illustration of the feature initialisation process: (a) feature initialisation, (b) after a single update. First, features are initialised as inverse depth points with infinite uncertainty on the depth (a). A single update leads to a significant reduction of the depth uncertainty (b). The inverse depth features are plotted using a 99.7 percent confidence interval.

D. Feature Initialisation

In monocular SLAM approaches using filtering, no special treatment for feature initialisation is needed if an inverse depth representation [20] is used. New features are jointly estimated together with all other parameters, but at a cost of O(n³), where n is the number of features visible. In PTAM [14], new features are triangulated between keyframes. In order to restrict the matching, a strong depth prior is used (restricting PTAM to using features relatively close to the camera). We suggest a feature initialisation method for keyframe-based SLAM systems based on a set of three-dimensional information filters which can estimate the position of arbitrarily distant features. Very recently, a similar method was briefly described by Klein and Murray [15].
Compared to our approach, their filters are applied on every frame, are one-dimensional and are only used for data association. Ultimately, we would like to update a set of partially initialised 3D points x_new:j given the current camera pose T_i. If T_i is known, the features x_new:j become independent:

p(x_new:1, ..., x_new:n | T_i) = p(x_new:1 | T_i) ⋯ p(x_new:n | T_i). (18)

Since the current camera T_i is well-optimised wrt. the set of known points X (as described in the previous section), equation (18) is approximately true. Thus, our method employs a set of information filters. Each filter estimates the position of a single landmark given the current pose estimate. In this sense, our approach is similar to FastSLAM [19]. The difference to FastSLAM is that the partially initialised features x_new:j are not used for state estimation immediately. New features are only used for pose estimation after they are jointly bundle adjusted and added to X (see Sec. III-E). The design of a single information filter is greatly inspired by Eade's filtering framework [8]. Features are represented using inverse depth coordinates y = (u, v, q) wrt. the origin frame T_org in which they were seen first. They can be mapped to a Euclidean point x_org = (1/q)(u, v, 1)^T in this local coordinate frame. The uncertainty of an inverse depth feature is represented using the inverse covariance matrix Λ_y = Σ_y^-1. New features are jointly initialised at keyframes where FAST [26] produces candidate locations. In order to ensure an equal distribution of feature locations over the image, we employ a quadtree similar to Mei et al. [18]. Given the FAST feature location z = (u, v), we set y = (u, v, q) with q ∈ R^+. The uncertainty Λ_y^(0) is set to diag(1/σ_z², 1/σ_z², 0). Note that initially there is an infinite uncertainty along the feature depth (zero information), so we do not enforce any depth prior no matter which start value we assign for q.
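The inverse depth bookkeeping of this section in code, as a sketch of the representation only (not of the full information filter):

```python
import numpy as np

def inverse_depth_to_euclidean(y):
    """Map inverse depth coordinates y = (u, v, q) to the Euclidean point
    x_org = (1/q)(u, v, 1) in the feature's origin frame."""
    u, v, q = y
    return np.array([u, v, 1.0]) / q

def initial_information(sigma_z):
    """Initial information matrix diag(1/sigma_z^2, 1/sigma_z^2, 0): zero
    information, i.e. infinite uncertainty, along the inverse depth q."""
    return np.diag([1.0 / sigma_z**2, 1.0 / sigma_z**2, 0.0])
```

The zero in the last diagonal entry is what makes the initialisation depth-prior-free: whatever start value q receives, the filter assigns it no weight until observations with parallax arrive.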
In order to save computation and to avoid divergence, inverse depth features are only updated on selected frames with sufficient parallax. We filter y by minimising

χ²(y) = (y − ŷ)^T Λ_y (y − ŷ) + (Δz)^T Λ_z (Δz) (19)

wrt. y using LM. Here, ŷ is set to the initial guess y^[0] and Δz = z − ẑ(T_i, T_org^-1 x_org). The first term in χ²(y) ensures that the optimisation of y is based on its prior distribution (ŷ, Λ_y). The second term takes care that the projection error wrt. the current frame is reduced. In other words, the point y is moved along its uncertain depth, rather than via its certain u, v coordinates, in order to reduce the projection error in the current image. We also update the uncertainty using covariance back-propagation (see Fig. 1):

Λ_y^(k+1) = Λ_y^(k) + (∂Δz/∂y)^T Λ_z (∂Δz/∂y). (20)

E. Keyframe Dropping

A new keyframe is dropped as soon as the distance of the current camera from the closest keyframe exceeds a threshold. Only inverse depth features y which could be reliably tracked and whose position is certain enough are considered for adding to X. Beforehand, they are bundle adjusted with respect to a small number (e.g. three) of surrounding keyframes. Only those features which do not produce a large projection error in any of the surrounding keyframes are added to X.

IV. LOOP CLOSURE

A. Loop closure detection

It is well known that loop closures can be detected effectively using appearance information only [1, 5]. These methods often rely on visual bags of words based on SIFT or SURF features. Given that we have a loop closure detection between two frames, each associated with a set of SURF features, the
standard method would apply RANSAC in conjunction with the 5-point method [22] in order to estimate the epipolar geometry. Then the relative Euclidean motion, up to an unknown scale in translation, can be recovered. However, we can exploit the fact that in our SLAM system each frame is associated with a large set of three-dimensional feature points. First, we create a candidate set of SURF feature pairs by matching features between the current frame and the loop frame based on their descriptors. Then, we create a dense surface model using the three-dimensional feature points visible in the loop frame. Next, the unknown depth of the candidate SURF features of the loop frame is calculated using this dense surface model.¹ The computationally very efficient k-nearest neighbour regression algorithm proved to be sufficient for our needs. In other words, we calculate the depth of SURF features by simply interpolating the depth of nearby 3D points. Finally, a 6 DoF loop constraint T_loop is calculated based on the 2D-3D SURF correspondences using the P4P method plus RANSAC. Given this initial pose estimate, more matches can be found using perspectively warped image patches, and the pose is refined using robust optimisation. However, the relative change in scale s_loop due to scale drift has to be computed as well, so we also calculate the depth of the SURF features visible in the current frame using a dense surface model, giving a set of 3D-3D correspondences. We estimate s_loop robustly as the median scale difference over all correspondences.

B. Loop Closure Correction

Let us consider a loop closure scenario in a large-scale map. After the loop closure is detected and matches are found between the current frame and the loop frame, the loop closing problem can be stated as a large BA problem. We have to optimise over all frames and points in the map. However, optimising over a large number of frames is computationally demanding.
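The k-nearest-neighbour depth regression of Sec. IV-A can be sketched as follows; this is a minimal illustration (our own function names), where each map point contributes its image location and depth and a feature's depth is the mean over its k closest neighbours:

```python
import numpy as np

def knn_depth(query_px, known_px, known_depths, k=3):
    """Estimate the depth of a 2D feature at query_px as the mean depth of the
    k map points whose image locations known_px are closest to it."""
    d2 = np.sum((known_px - query_px)**2, axis=1)   # squared pixel distances
    nearest = np.argsort(d2)[:k]                    # indices of the k closest points
    return known_depths[nearest].mean()
```

This only gives sensible answers where the carrying surface is smooth, which is exactly the assumption discussed in the footnote of this section.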
More seriously, since BA is not a convex problem, and we are far away from the global minimum, it is likely that BA will get stuck in a local minimum. One solution would be to optimise over relative constraints between poses using pose-graph optimisation [12][23]. The relative constraints between two poses T_i and T_j can be calculated easily: T_{i,j} = T_j · T_i^-1. But while an optimisation over the 6 DoF constraints would efficiently correct translational and rotational drift, it would not deal with scale drift, and would lead to an unsatisfactory overall result, as we will show experimentally in Section V. Therefore, we perform optimisation based on 7 DoF constraints S, which are elements of the group of similarity transformations Sim(3):

S = [sR t; 0^T 1], (21)

where s ∈ R^+ is a positive scale factor. A minimal representation sim(3) can be defined by the following 7-vector (ω, σ, υ) with s = e^σ. Indeed, it can be shown that Sim(3) is a Lie group, sim(3) is the corresponding Lie algebra, and their exponential map is:

exp_Sim(3)(ω, σ, υ) = [e^σ exp_SO(3)(ω) Wυ; 0^T 1] = [sR t; 0^T 1], (22)

with

W = ((aσ + (1 − b)θ) / (θ(σ² + θ²))) (ω)× + ((c − ((b − 1)σ + aθ)/(σ² + θ²)) / θ²) ((ω)×)² + cI,

a = e^σ sin(θ), b = e^σ cos(θ) and c = (e^σ − 1)/σ. Furthermore, exp_Sim(3) is surjective, so an inverse relation log_Sim(3) exists. In order to prepare for the 7 DoF optimisation, we transform each absolute pose T_i to an absolute similarity S_i, and each relative pose constraint T_{i,j} to a relative similarity constraint S_{i,j}, by leaving rotation and translation unchanged and setting the scale s = 1.

¹ The underlying assumption is that the scene structure is smooth enough. However, this assumption is not only vital if dense surface models are used, but always if we aim to calculate the 3D position of a blob feature, no matter which method is used. In other words, the 3D position of a blob is only properly defined if its carrying surface is smooth enough.
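Equation (22) can be transcribed directly into code. The sketch below implements exp_Sim(3) with the coefficients a, b, c as defined above; the small-angle and small-scale branches fall back to the corresponding analytic limits (the guard thresholds are our own choices):

```python
import numpy as np

def hat(w):
    """The (.)x operator: map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_sim3(omega, sigma, upsilon):
    """exp_Sim(3) of (22): map (omega, sigma, upsilon) to a 4x4 similarity
    matrix [sR t; 0 1] with scale s = e^sigma."""
    theta = np.linalg.norm(omega)
    W = hat(omega)
    s = np.exp(sigma)
    if theta < 1e-10:                          # small-angle limit of Rodrigues
        R, W2 = np.eye(3), np.zeros((3, 3))
    else:
        W2 = W @ W
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1.0 - np.cos(theta)) / theta**2 * W2)
    a = s * np.sin(theta)
    b = s * np.cos(theta)
    c = (s - 1.0) / sigma if abs(sigma) > 1e-10 else 1.0   # limit for sigma -> 0
    if theta < 1e-10:
        W_mat = c * np.eye(3)                  # zero-rotation limit of W
    else:
        A = (a * sigma + (1.0 - b) * theta) / (theta * (sigma**2 + theta**2))
        B = (c - ((b - 1.0) * sigma + a * theta) / (sigma**2 + theta**2)) / theta**2
        W_mat = A * W + B * W2 + c * np.eye(3)
    S = np.eye(4)
    S[:3, :3] = s * R
    S[:3, 3] = W_mat @ upsilon                 # translation t = W * upsilon
    return S
```

With σ = 0 this reduces exactly to exp_SE(3), which is why setting s = 1 on the pose constraints leaves them unchanged.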
Only for the loop constraint S_loop do we set the scale to s_loop ≠ 1 (using the estimate obtained as explained in the previous section). We define the residual r_{i,j} between two transformations S_i and S_j minimally in the tangent space sim(3):

r_{i,j} = log_Sim(3)(S_{i,j} · S_i · S_j^-1). (23)

The graph of similarity constraints is optimised by minimising

χ²(S_2, ..., S_m) = Σ_{i,j} r_{i,j}^T Λ_{S_{i,j}} r_{i,j} (24)

using LM. The first transform S_1 is fixed and defines the coordinate frame. The corresponding Jacobian is sparse. We exploit the sparseness pattern using the CSparse library. After the similarities S_i^cor are corrected, we also need to correct the set of points. For each point x_j, a frame T_i is selected in which it is visible. Now we can map each point relative to its corrected frame: x_j^cor = (S_i^cor)^-1 (T_i x_j). Afterwards, each similarity transform S_i^cor is transformed back to a rigid body transform T_i^cor by setting the translation to t/s and leaving the rotation unchanged. Finally, the whole map can be further optimised using structure-only or full BA.

V. EXPERIMENTS

We performed the evaluation of our SLAM system using the Keble College data set [4], a sequence where a handheld sideways-facing camera completes a circuit around a large outdoor square. Images were captured at 30 FPS using a low cost IEEE Unibrain camera with a lens with an 80 degree horizontal field of view. Our SLAM framework performed almost in real-time: for the 5952 images in the sequence, the run-time was 465 seconds, which corresponds to a frame rate of more than 12 FPS. The computation was performed on a desktop computer with an Intel Core 2 Duo processor and an NVIDIA 8800 GT GPU, which was used for the dense optical flow computation. The trajectory before loop closure is shown in Fig. 2 (a). It consists of 766 keyframes together with their points and observations. In the back-end, BA with a sliding window
of 10 keyframes was used. A loop closure constraint $T_{loop}$ was successfully calculated and refined using the method described in Sec. IV. A relative scale change of 1 : 0.37 was detected. The large amount of drift can be partially explained by the fact that the visible scene is always very local and that only a rough intrinsic calibration was available. Also, future improvements in our visual odometry implementation could lead to a reduction of drift during exploration. Nevertheless, a certain amount of drift during exploration is unavoidable, and our main focus is how to deal with drift when it occurs. A traditional 6 DoF graph optimisation such as [12] closes the loop but leaves the scale drift unchanged, which leads to a deformed trajectory as shown in Fig. 2 (b). However, if we perform graph optimisation using the similarity transform, the result looks significantly better (see Fig. 2 (c)). The graph optimisation converged after a single iteration, taking 0.57 seconds. Afterwards, ten iterations of structure-only bundle adjustment were performed, taking 0.25 seconds. The application of full BA over all 766 keyframes did not lead to a notable further improvement (not shown here).

Fig. 2. Keble College data set: (a) before loop closure, (b) 6 DoF optimisation, (c) 7 DoF optimisation, (d) aerial photo.

In addition to this real-world experiment, we also performed a series of simulation experiments in full 3D space. A simulated camera was moved on a circular trajectory with radius 10 m, directed outwards. A set of 5000 points was drawn from a ring-shaped distribution with radius 11 m. The camera trajectory consists of 720 poses. In this simulation environment, our full SLAM system was applied, including feature initialisation and fixed-window bundle adjustment with window size 10. Only the image processing front-end was not simulated; data association was given. The difference between the true trajectory and the estimated one is shown in Fig. 3 (a). In this particular example, we simulated Gaussian image noise with a standard deviation of one pixel. Fig.
3 (b) and (c) show the loop closure correction using SE(3) and Sim(3) respectively. We varied the amount of image noise from 0 to 1.2 pixels and performed 10 simulation runs for each setting.

Fig. 3. Simulation experiments: (a) before loop closure, (b) 6 DoF, (c) 7 DoF, (d) whole plot of root mean square error (in m) and scale drift versus Gaussian image noise (in pixels), (e) close-up.

An important result of our experiment is that there is a clear relation between scale drift and image noise (see Fig. 3 (d), thin curve). This indicates the correctness of our characterisation of scale as a parameter in SLAM which drifts during exploration in a similar way to rotation and translation. If we define the first pose as our origin, there is still a scale ambiguity between possible maps. Therefore we define an error measure between the corrected poses $T_i^{cor}$ and the true poses $T_i^{true}$ using the minimum of the sum of squared differences over the scale $s$:

$$M = \min_s \sum_i \left\| s\, t_i^{cor} - t_i^{true} \right\|^2.$$

By dividing $M$ by the number of frames and taking the square root, we obtain the root mean square error $\mathrm{RMSE} = \sqrt{M/720}$. The average RMSE over the ten simulation runs is shown in Fig. 3 (d). One can see that Sim(3) optimisation (red curve) outperforms SE(3) (green curve) by a large amount, particularly if the scale change is large. But interestingly, even for small scale changes of one to four percent, Sim(3) optimisation performs significantly better than SE(3) (see Fig. 3 (e)). Finally, we performed an experiment to illustrate that our framework naturally extends to multiple loop closures. Imagine an aircraft flying over a sphere and performing monocular SLAM using a downward-directed camera (see Fig. 4 (a)). First we perform visual odometry, and afterwards we compute a set of loop closure constraints (shown as blue line segments in Fig. 4 (b)). In this particular example, Sim(3) optimisation leads to a small RMSE (Fig.
4 (c)), whereas SE(3) results in a considerably larger RMSE. Videos illustrating the experiments² as well as large parts of the source code³ are available online.

² strasdat/rss2010videos

Fig. 4. Multi-loop-closure example: an aircraft is flying over a sphere and doing SLAM using a single downward-directed camera. (a) aircraft over sphere, (b) before loop closure, (c) 7 DoF optimisation.

VI. CONCLUSIONS

We have presented a new monocular SLAM algorithm, based consistently on non-linear optimisation, which we believe takes advantage of the best current knowledge of the hard problem of large scale monocular SLAM to achieve state of the art results. Our algorithm is suitable for sequential operation in large domains, currently at near real-time rates. Importantly, our approach explicitly acknowledges the issue of scale drift at all stages, and offers a practical way to resolve this drift effectively upon loop closures. We show experimentally, both in simulation and using a real outdoor sequence, that a certain amount of scale drift is unavoidable during exploration and that this must be taken into account during loop closure to achieve the best results. We have also shown that our approach naturally extends to multiple loop closures. BA over all frames and scene points is the gold standard for SfM problems where camera networks are typically strong. However, a lightweight approach of 7 DoF graph optimisation plus structure-only BA seems to be sufficient to accurately close large loops. It is unclear, and worth further investigation, whether full BA leads to significantly better results for the weak camera networks which occur in monocular SLAM.

ACKNOWLEDGMENTS

This research was supported by the European Research Council Starting Grant, the Spanish MEC Grant DPI and EU FP7-ICT RoboEarth. We are very grateful to colleagues at Imperial College London and the University of Zaragoza for discussions and software collaboration. Special thanks to Giorgio Grisetti and Ethan Eade for the insightful discussions about pose-graph optimisation and Lie theory respectively.

REFERENCES

[1] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer. Fast and incremental method for loop-closure detection using bags of visual words. IEEE Transactions on Robotics (T-RO), 24(5).
[2] D. Chekhlov, M. Pupilli, W. W. Mayol, and A. Calway.
Real-time and robust monocular SLAM using predictive multi-resolution descriptors. In Proceedings of the 2nd International Symposium on Visual Computing.
[3] J. Civera, O. G. Grasa, A. J. Davison, and J. M. M. Montiel. 1-point RANSAC for EKF-based structure from motion. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS).
[4] L. A. Clemente, A. J. Davison, I. Reid, J. Neira, and J. D. Tardós. Mapping large loops with a single hand-held camera. In Proceedings of Robotics: Science and Systems (RSS).
[5] M. Cummins and P. Newman. Highly scalable appearance-only SLAM: FAB-MAP 2.0. In Proceedings of Robotics: Science and Systems (RSS).
[6] T. A. Davis. Direct Methods for Sparse Linear Systems. SIAM.
[7] A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In Proceedings of the International Conference on Computer Vision (ICCV).
[8] E. Eade. Monocular Simultaneous Localisation and Mapping. PhD thesis, University of Cambridge.
[9] E. Eade and T. Drummond. Scalable monocular SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] E. Eade and T. Drummond. Monocular SLAM as a graph of coalesced observations. In Proceedings of the International Conference on Computer Vision (ICCV).
[11] J. Gallier. Geometric Methods and Applications for Computer Science and Engineering. Springer-Verlag.
[12] G. Grisetti, R. Kümmerle, C. Stachniss, U. Frese, and C. Hertzberg. Hierarchical optimization on manifolds for online 2D and 3D mapping. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[13] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition.
[14] G. Klein and D. W. Murray. Parallel tracking and mapping for small AR workspaces. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR).
[15] G. Klein and D. W. Murray.
Parallel tracking and mapping on a camera phone. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR).
[16] K. Konolige and M. Agrawal. FrameSLAM: From bundle adjustment to real-time visual mapping. IEEE Transactions on Robotics (T-RO), 24.
[17] R. Mahony and J. H. Manton. The geometry of the Newton method on non-compact Lie groups. Journal of Global Optimization, 23(3).
[18] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid. A constant time efficient stereo SLAM system. In Proceedings of the British Machine Vision Conference (BMVC).
[19] M. Montemerlo and S. Thrun. Simultaneous localization and mapping with unknown data association using FastSLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[20] J. M. M. Montiel, J. Civera, and A. J. Davison. Unified inverse depth parametrization for monocular SLAM. In Proceedings of Robotics: Science and Systems (RSS).
[21] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd. Real-time localization and 3D reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell., 26(6).
[23] E. Olson, J. J. Leonard, and S. Teller. Fast iterative alignment of pose graphs with poor initial estimates. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[24] P. Pinies and J. D. Tardós. Large scale SLAM building conditionally independent local maps: Application to monocular vision. IEEE Transactions on Robotics (T-RO), 24(5).
[25] T. Pock, M. Unger, D. Cremers, and H. Bischof. Fast and exact solution of total variation models on the GPU. In Proceedings of the CVPR Workshop on Visual Computer Vision on GPUs.
[26] E. Rosten and T. Drummond. Fusing points and lines for high performance tracking.
In Proceedings of the International Conference on Computer Vision (ICCV).
[27] D. Scaramuzza, F. Fraundorfer, M. Pollefeys, and R. Siegwart. Absolute scale in structure from motion from a single vehicle mounted camera by exploiting nonholonomic constraints. In Proceedings of the International Conference on Computer Vision (ICCV).
[28] J. Solà, M. Devy, A. Monin, and T. Lemaire. Undelayed initialization in bearing only SLAM. In Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS).
[29] H. Strasdat, J. M. M. Montiel, and A. J. Davison. Real-time monocular SLAM: Why filter? In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[30] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment: a modern synthesis. In Vision Algorithms: Theory and Practice, volume 1883 of Lecture Notes in Computer Science. Springer-Verlag, 2000.
3D Tranformations. CS 4620 Lecture 6. Cornell CS4620 Fall 2013 Lecture 6. 2013 Steve Marschner (with previous instructors James/Bala)
3D Tranformations CS 4620 Lecture 6 1 Translation 2 Translation 2 Translation 2 Translation 2 Scaling 3 Scaling 3 Scaling 3 Scaling 3 Rotation about z axis 4 Rotation about z axis 4 Rotation about x axis
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
Incremental Surface Extraction from Sparse Structure-from-Motion Point Clouds
HOPPE, KLOPSCHITZ, DONOSER, BISCHOF: INCREMENTAL SURFACE EXTRACTION 1 Incremental Surface Extraction from Sparse Structure-from-Motion Point Clouds Christof Hoppe 1 [email protected] Manfred Klopschitz
Relating Vanishing Points to Catadioptric Camera Calibration
Relating Vanishing Points to Catadioptric Camera Calibration Wenting Duan* a, Hui Zhang b, Nigel M. Allinson a a Laboratory of Vision Engineering, University of Lincoln, Brayford Pool, Lincoln, U.K. LN6
Introduction to Robotics Analysis, Systems, Applications
Introduction to Robotics Analysis, Systems, Applications Saeed B. Niku Mechanical Engineering Department California Polytechnic State University San Luis Obispo Technische Urw/carsMt Darmstadt FACHBEREfCH
Space Perception and Binocular Vision
Space Perception and Binocular Vision Space Perception Monocular Cues to Three-Dimensional Space Binocular Vision and Stereopsis Combining Depth Cues 9/30/2008 1 Introduction to Space Perception Realism:
CSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
