SVO: Fast Semi-Direct Monocular Visual Odometry

Christian Forster, Matia Pizzoli, Davide Scaramuzza

The authors are with the Robotics and Perception Group, University of Zurich, Switzerland. This research was supported by the Swiss National Science Foundation through project number (Swarm of Flying Cameras), the National Centre of Competence in Research Robotics, and the CTI project number.

Abstract: We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need for costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.

I. INTRODUCTION

Micro Aerial Vehicles (MAVs) will soon play a major role in disaster management, industrial inspection and environment conservation. For such operations, navigating based on GPS information only is not sufficient. Precise, fully autonomous operation requires MAVs to rely on alternative localization systems. For minimal weight and power consumption, it was therefore proposed [1]-[5] to use only a single downward-looking camera in combination with an Inertial Measurement Unit. This setup allowed fully autonomous way-point following in outdoor areas [1]-[3] and collaboration between MAVs and ground robots [4], [5].

To our knowledge, all monocular Visual Odometry (VO) systems for MAVs [1], [2], [6], [7] are feature-based. In RGB-D and stereo-based SLAM systems, however, direct methods [8]-[11] based on photometric error minimization are becoming increasingly popular. In this work, we propose a semi-direct VO that combines the success factors of feature-based methods (tracking many features, parallel tracking and mapping, keyframe selection) with the accuracy and speed of direct methods. High frame-rate VO for MAVs promises increased robustness and faster flight maneuvers. An open-source implementation and videos of this work are available at:

A. Taxonomy of Visual Motion Estimation Methods

Methods that simultaneously recover camera pose and scene structure from video can be divided into two classes:

a) Feature-Based Methods: The standard approach is to extract a sparse set of salient image features (e.g., points, lines) in each image; match them in successive frames using invariant feature descriptors; robustly recover both camera motion and structure using epipolar geometry; and, finally, refine the pose and structure through reprojection-error minimization. The majority of VO algorithms [12] follows this procedure, independent of the applied optimization framework. A reason for the success of these methods is the availability of robust feature detectors and descriptors that allow matching between images even at large inter-frame movement.
The disadvantage of feature-based approaches is the reliance on detection and matching thresholds, the necessity for robust estimation techniques to deal with wrong correspondences, and the fact that most feature detectors are optimized for speed rather than precision, such that drift in the motion estimate must be compensated by averaging over many feature measurements.

b) Direct Methods: Direct methods [13] estimate structure and motion directly from intensity values in the image. The local intensity gradient magnitude and direction are used in the optimization, in contrast to feature-based methods that consider only the distance to some feature location. Direct methods that exploit all the information in the image, even from areas where gradients are small, have been shown to outperform feature-based methods in terms of robustness in scenes with little texture [14] or in the case of camera defocus and motion blur [15]. The computation of the photometric error is more intensive than that of the reprojection error, as it involves warping and integrating large image regions. However, since direct methods operate directly on the intensity values of the image, the time for feature detection and invariant descriptor computation can be saved.

B. Related Work

Most monocular VO algorithms for MAVs [1], [2], [7] rely on PTAM [16]. PTAM is a feature-based SLAM algorithm that achieves robustness through tracking and mapping many (hundreds of) features. At the same time, it runs in real-time by parallelizing the motion-estimation and mapping tasks and by relying on efficient keyframe-based Bundle Adjustment (BA) [17]. However, PTAM was designed for augmented-reality applications in small desktop scenes, and multiple modifications (e.g., limiting the number of keyframes) were necessary to allow operation in large-scale outdoor environments [2].

Early direct monocular SLAM methods tracked and mapped few, sometimes manually selected, planar patches [18]-[21]. While the first approaches [18], [19] used filtering algorithms to estimate structure and motion, later methods [20]-[22] used nonlinear least-squares optimization.

All these methods estimate the surface normals of the patches, which allows tracking a patch over a wide range of viewpoints, thus greatly reducing drift in the estimation. The authors of [19]-[21] reported real-time performance, however, only with few selected planar regions and on small datasets. A VO algorithm for omnidirectional cameras on cars was proposed in [22]. In [8], the local-planarity assumption was relaxed and direct tracking with respect to arbitrary 3D structures computed from stereo cameras was proposed. In [9]-[11], the same approach was also applied to RGB-D sensors. With DTAM [15], a novel direct method was introduced that computes a dense depth map for each keyframe through minimization of a global, spatially regularized energy functional. The camera pose is found through direct whole-image alignment using the depth map. This approach is computationally very intensive and only possible through heavy GPU parallelization. To reduce the computational demand, the method described in [23], which was published during the review process of this work, uses only pixels characterized by strong gradient.

C. Contributions and Outline

The proposed Semi-Direct Visual Odometry (SVO) algorithm uses feature correspondence; however, feature correspondence is an implicit result of direct motion estimation rather than of explicit feature extraction and matching. Thus, feature extraction is only required when a keyframe is selected to initialize new 3D points (see Figure 1). The advantage is increased speed due to the lack of feature extraction at every frame and increased accuracy through subpixel feature correspondence. In contrast to previous direct methods, we use many (hundreds of) small patches rather than few (tens of) large planar patches [18]-[21]. Using many small patches increases robustness and allows neglecting the patch normals. The proposed sparse model-based image alignment algorithm for motion estimation is related to model-based dense image alignment [8]-[10], [24]. However, we demonstrate that sparse information of depth is sufficient to get a rough estimate of the motion and to find feature correspondences. As soon as feature correspondences and an initial estimate of the camera pose are established, the algorithm continues using only point features; hence the name semi-direct. This switch allows us to rely on fast and established frameworks for bundle adjustment (e.g., [25]). A Bayesian filter that explicitly models outlier measurements is used to estimate the depth at feature locations. A 3D point is only inserted in the map when the corresponding depth filter has converged, which requires multiple measurements. The result is a map with few outliers and points that can be tracked reliably. The contributions of this paper are: (1) a novel semi-direct VO pipeline that is faster and more accurate than the current state of the art for MAVs, and (2) the integration of a probabilistic mapping method that is robust to outlier measurements.

Fig. 1: Tracking and mapping pipeline. The motion-estimation thread processes every new image through sparse model-based image alignment, feature alignment, and pose and structure refinement; the mapping thread extracts features and initializes depth filters on new keyframes, updates the depth filters with every frame, and inserts a new 3D point into the map once a filter has converged.

Section II provides an overview of the pipeline and Section III, thereafter, introduces some required notation. Sections IV and V explain the proposed motion-estimation and mapping algorithms. Section VII provides experimental results and comparisons.

II. SYSTEM OVERVIEW

Figure 1 provides an overview of SVO.
The algorithm uses two parallel threads (as in [16]), one for estimating the camera motion and a second one for mapping as the environment is being explored. This separation allows fast and constant-time tracking in one thread, while the second thread extends the map, decoupled from hard real-time constraints.

The motion-estimation thread implements the proposed semi-direct approach to relative-pose estimation. The first step is pose initialization through sparse model-based image alignment: the camera pose relative to the previous frame is found by minimizing the photometric error between pixels corresponding to the projected location of the same 3D points (see Figure 2). The 2D coordinates corresponding to the reprojected points are refined in the next step through alignment of the corresponding feature patches (see Figure 3). Motion estimation concludes by refining the pose and the structure through minimizing the reprojection error introduced in the previous feature-alignment step.

In the mapping thread, a probabilistic depth filter is initialized for each 2D feature for which the corresponding 3D point is to be estimated. New depth filters are initialized whenever a new keyframe is selected, in regions of the image where few 3D-to-2D correspondences are found. The filters are initialized with a large uncertainty in depth. At every subsequent frame the depth estimate is updated in a Bayesian fashion (see Figure 5). When a depth filter's uncertainty becomes small enough, a new 3D point is inserted in the map and is immediately used for motion estimation.
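To make the control flow of Figure 1 concrete, the following structural sketch spells out the order of the per-frame steps. All class and function names here are hypothetical placeholders and every step is stubbed out; the actual SVO implementation is written in C++ and structured differently.

```python
# Structural sketch of Figure 1; names are placeholders, steps are stubs.

class DepthFilterStub:
    def __init__(self, feature):
        self.feature, self.variance = feature, 1.0   # initialized large in SVO
    def update(self, frame):                         # Bayesian update, Section V
        self.variance *= 0.5                         # placeholder for the real update
    def converged(self, thresh=1e-2):
        return self.variance < thresh

def sparse_image_alignment(frame, last_frame):       # Section IV-A (stub)
    return last_frame.pose
def feature_alignment(frame, pose, map_points):      # Section IV-B (stub)
    return []
def pose_structure_refinement(pose, matches):        # Section IV-C (stub)
    return pose
def is_keyframe(frame, map_points):                  # Section VI criterion (stub)
    return True
def extract_features(frame):                         # FAST corners on keyframes (stub)
    return []

map_points, depth_filters, last_frame = [], [], None

def process_frame(frame):
    """One iteration of the motion-estimation and mapping loop of Figure 1."""
    global last_frame
    if last_frame is not None:
        pose = sparse_image_alignment(frame, last_frame)
        matches = feature_alignment(frame, pose, map_points)
        frame.pose = pose_structure_refinement(pose, matches)
    if is_keyframe(frame, map_points):
        depth_filters.extend(DepthFilterStub(f) for f in extract_features(frame))
    for df in list(depth_filters):
        df.update(frame)
        if df.converged():                           # insert new 3D point into the map
            map_points.append(df.feature)
            depth_filters.remove(df)
    last_frame = frame
```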

III. NOTATION

Before the algorithm is detailed, we briefly define the notation that is used throughout the paper. The intensity image collected at timestep k is denoted with I_k : Ω ⊂ R² → R, where Ω is the image domain. Any 3D point p = (x, y, z) ∈ S on the visible scene surface S ⊂ R³ maps to the image coordinates u = (u, v) ∈ Ω through the camera projection model π : R³ → R²:

    u = π(_k p),    (1)

where the prescript k denotes that the point coordinates are expressed in the camera frame of reference k. The projection π is determined by the intrinsic camera parameters, which are known from calibration. The 3D point corresponding to an image coordinate u can be recovered, given the inverse projection function π⁻¹ and the depth d_u ∈ R:

    _k p = π⁻¹(u, d_u),   ∀ u ∈ R,    (2)

where R ⊆ Ω is the domain for which the depth is known.

The camera position and orientation at timestep k is expressed with the rigid-body transformation T_{k,w} ∈ SE(3). It allows us to map a 3D point from the world coordinate frame to the camera frame of reference: _k p = T_{k,w} · _w p. The relative transformation between two consecutive frames can be computed as T_{k,k-1} = T_{k,w} · T_{k-1,w}⁻¹. During the optimization we need a minimal representation of the transformation and therefore use the Lie algebra se(3), corresponding to the tangent space of SE(3) at the identity. We denote the algebra elements, also named twist coordinates, with ξ = (ω, ν) ∈ R⁶, where ω is called the angular velocity and ν the linear velocity. The twist coordinates ξ are mapped to SE(3) by the exponential map [26]:

    T(ξ) = exp(ξ̂).    (3)
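As a concrete illustration of Equation (3), the following minimal sketch maps twist coordinates ξ = (ω, ν) to a 4×4 rigid-body transformation using the matrix exponential of the 4×4 matrix ξ̂. It is a generic SE(3) utility written for this text, not code from the SVO implementation.

```python
import numpy as np
from scipy.linalg import expm

def hat(xi):
    """Map twist coordinates xi = (omega, nu) in R^6 to the 4x4 matrix xi^."""
    omega, nu = xi[:3], xi[3:]
    Omega = np.array([[0.0, -omega[2], omega[1]],
                      [omega[2], 0.0, -omega[0]],
                      [-omega[1], omega[0], 0.0]])
    M = np.zeros((4, 4))
    M[:3, :3] = Omega
    M[:3, 3] = nu
    return M

def exp_se3(xi):
    """Exponential map se(3) -> SE(3), Equation (3): T(xi) = exp(xi^)."""
    return expm(hat(xi))

# Example: a small rotation about the z-axis combined with a translation in x.
xi = np.array([0.0, 0.0, 0.05, 0.1, 0.0, 0.0])
print(np.round(exp_se3(xi), 4))
```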
IV. MOTION ESTIMATION

SVO computes an initial guess of the relative camera motion and the feature correspondences using direct methods and concludes with a feature-based nonlinear reprojection-error refinement. Each step is detailed in the following sections and illustrated in Figures 2 to 4.

A. Sparse Model-based Image Alignment

The maximum-likelihood estimate of the rigid-body transformation T_{k,k-1} between two consecutive camera poses minimizes the negative log-likelihood of the intensity residuals:

    T_{k,k-1} = arg min_T ∫_{R̄} ρ[ δI(T, u) ] du.    (4)

The intensity residual δI is defined by the photometric difference between pixels observing the same 3D point. It can be computed by back-projecting a 2D point u from the previous image I_{k-1} and subsequently projecting it into the current camera view:

    δI(T, u) = I_k( π( T · π⁻¹(u, d_u) ) ) - I_{k-1}(u),   ∀ u ∈ R̄,    (5)

where R̄ is the image region for which the depth d_u is known at time k-1 and for which the back-projected points are visible in the current image domain:

    R̄ = { u | u ∈ R_{k-1} ∧ π( T · π⁻¹(u, d_u) ) ∈ Ω_k }.    (6)

Fig. 2: Changing the relative pose T_{k,k-1} between the current and the previous frame implicitly moves the position of the reprojected points in the new image. Sparse image alignment seeks to find the T_{k,k-1} that minimizes the photometric difference between image patches corresponding to the same 3D point (blue squares). Note: in all figures, the parameters to optimize are drawn in red and the optimization cost is highlighted in blue.

Fig. 3: Due to inaccuracies in the 3D point and camera pose estimation, the photometric error between corresponding patches (blue squares) in the current frame and previous keyframes r can further be minimized by optimizing the 2D position of each patch individually.

Fig. 4: In the last motion-estimation step, the camera pose and the structure (3D points) are optimized to minimize the reprojection error that has been established during the previous feature-alignment step.

For the sake of simplicity, we assume in the following that the intensity residuals are normally distributed with unit variance. The negative log-likelihood minimizer then corresponds to the least-squares problem: ρ[·] := ½‖·‖². In practice, the distribution has heavier tails due to occlusions and, thus, a robust cost function must be applied [10].

In contrast to previous works, where the depth is known for large regions in the image [8]-[10], [24], we only know the depth d_{u_i} at sparse feature locations u_i. We denote small patches of 4×4 pixels around the feature point with the vector I(u_i).

We seek to find the camera pose that minimizes the photometric error of all patches (see Figure 2):

    T_{k,k-1} = arg min_{T_{k,k-1}} ½ Σ_{i ∈ R̄} ‖ δI(T_{k,k-1}, u_i) ‖².    (7)

Since Equation (7) is nonlinear in T_{k,k-1}, we solve it with an iterative Gauss-Newton procedure. Given an estimate of the relative transformation T̂_{k,k-1}, an incremental update T(ξ) to the estimate can be parametrized with a twist ξ ∈ se(3). We use the inverse compositional formulation [27] of the intensity residual, which computes the update step T(ξ) for the reference image at time k-1:

    δI(ξ, u_i) = I_k( π( T̂_{k,k-1} · p_i ) ) - I_{k-1}( π( T(ξ) · p_i ) ),    (8)

with p_i = π⁻¹(u_i, d_{u_i}). The inverse of the update step is then applied to the current estimate using Equation (3):

    T̂_{k,k-1} ← T̂_{k,k-1} · T(ξ)⁻¹.    (9)

Note that we do not warp the patches, for computing-speed reasons. This assumption is valid in the case of small frame-to-frame motions and small patch sizes. To find the optimal update step T(ξ), we compute the derivative of (7) and set it to zero:

    Σ_{i ∈ R̄} ∇δI(ξ, u_i)ᵀ δI(ξ, u_i) = 0.    (10)

To solve this system, we linearize around the current state:

    δI(ξ, u_i) ≈ δI(0, u_i) + ∇δI(0, u_i) · ξ.    (11)

The Jacobian J_i := ∇δI(0, u_i) has dimension 16×6 because of the 4×4 patch size and is computed with the chain rule:

    ∂δI(ξ, u_i)/∂ξ = ∂I_{k-1}(a)/∂a |_{a = u_i} · ∂π(b)/∂b |_{b = p_i} · ∂(T(ξ) · p_i)/∂ξ |_{ξ = 0}.

By inserting (11) into (10) and by stacking the Jacobians in a matrix J, we obtain the normal equations

    Jᵀ J ξ = -Jᵀ δI(0),    (12)

which can be solved for the update twist ξ. Note that by using the inverse compositional approach, the Jacobian can be precomputed, as it remains constant over all iterations (the reference patch I_{k-1}(u_i) and the point p_i do not change), which results in a significant speedup [27].
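The following minimal sketch illustrates one iteration of the sparse image alignment of Equations (7)-(12) for an assumed pinhole camera with intrinsics K. For brevity it uses numerically differentiated, forward-compositional Jacobians instead of the precomputed inverse-compositional Jacobians described above, and all names are illustrative rather than taken from the SVO code.

```python
import numpy as np
from scipy.linalg import expm
from scipy.ndimage import map_coordinates

def hat(xi):
    o, v = xi[:3], xi[3:]
    O = np.array([[0, -o[2], o[1]], [o[2], 0, -o[0]], [-o[1], o[0], 0]], float)
    M = np.zeros((4, 4)); M[:3, :3] = O; M[:3, 3] = v
    return M

def project(K, p):
    """Pinhole projection of 3D points p (Nx3) to pixel coordinates (Nx2)."""
    return (K @ (p.T / p[:, 2])).T[:, :2]

def patch_intensities(img, uv, half=2):
    """Bilinearly sampled 4x4 patch around each pixel location uv (Nx2)."""
    d = np.arange(-half, half)
    du, dv = np.meshgrid(d, d)
    rows = uv[:, 1, None] + dv.ravel()     # image row corresponds to v
    cols = uv[:, 0, None] + du.ravel()     # image column corresponds to u
    return map_coordinates(img, [rows, cols], order=1, mode='nearest')

def residuals(xi, T_prev, K, img_k, img_km1, uv_ref, depths):
    """Stacked photometric residuals of Eq. (5)/(7) for an update twist xi."""
    T = expm(hat(xi)) @ T_prev             # current guess of T_{k,k-1}
    p_ref = np.c_[(uv_ref - K[:2, 2]) / np.diag(K)[:2] * depths[:, None], depths]
    p_cur = (T[:3, :3] @ p_ref.T).T + T[:3, 3]
    return (patch_intensities(img_k, project(K, p_cur))
            - patch_intensities(img_km1, uv_ref)).ravel()

def gauss_newton_step(T_prev, K, img_k, img_km1, uv_ref, depths, eps=1e-4):
    """One Gauss-Newton update of Eq. (12) with numerical Jacobians."""
    r0 = residuals(np.zeros(6), T_prev, K, img_k, img_km1, uv_ref, depths)
    J = np.zeros((r0.size, 6))
    for j in range(6):
        d = np.zeros(6); d[j] = eps
        J[:, j] = (residuals(d, T_prev, K, img_k, img_km1, uv_ref, depths) - r0) / eps
    xi = np.linalg.solve(J.T @ J, -J.T @ r0)
    return expm(hat(xi)) @ T_prev          # updated estimate of T_{k,k-1}
```

In the actual method, this step is additionally run coarse-to-fine over an image pyramid (Section VI) and the Jacobians are computed analytically once per frame, which is where most of the speed advantage comes from.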
B. Relaxation Through Feature Alignment

The last step aligned the camera with respect to the previous frame. Through back-projection, the found relative pose T_{k,k-1} implicitly defines an initial guess for the feature positions of all visible 3D points in the new image. Due to inaccuracies in the 3D points' positions and, thus, the camera pose, this initial guess can be improved. To reduce the drift, the camera pose should be aligned with respect to the map, rather than to the previous frame. All 3D points of the map that are visible from the estimated camera pose are projected into the image, resulting in an estimate of the corresponding 2D feature positions u_i′ (see Figure 3). For each reprojected point, the keyframe r that observes the point with the closest observation angle is identified. The feature-alignment step then optimizes all 2D feature positions u_i′ in the new image individually by minimizing the photometric error of the patch in the current image with respect to the reference patch in the keyframe r:

    u_i′ = arg min_{u_i′} ½ ‖ I_k(u_i′) - A_i · I_r(u_i) ‖²,   ∀ i.    (13)

This alignment is solved using the inverse compositional Lucas-Kanade algorithm [27]. Contrary to the previous step, we apply an affine warping A_i to the reference patch, since a larger patch size is used (8×8 pixels) and the closest keyframe is typically farther away than the previous image. This step can be understood as a relaxation step that violates the epipolar constraints in order to achieve a higher correlation between the feature patches.

C. Pose and Structure Refinement

In the previous step, we have established feature correspondence with subpixel accuracy at the cost of violating the epipolar constraints. In particular, we have generated a reprojection residual δu_i = u_i′ - π(T_{k,w} · _w p_i) ≠ 0, which on average is around 0.3 pixels (see Figure 11). In this final step, we again optimize the camera pose T_{k,w} to minimize the reprojection residuals (see Figure 4):

    T_{k,w} = arg min_{T_{k,w}} ½ Σ_i ‖ u_i′ - π(T_{k,w} · _w p_i) ‖².    (14)

This is the well-known problem of motion-only BA [17] and can efficiently be solved using an iterative nonlinear least-squares minimization algorithm such as Gauss-Newton. Subsequently, we optimize the positions of the observed 3D points through reprojection-error minimization (structure-only BA). Finally, it is possible to apply local BA, in which both the poses of all close keyframes and the observed 3D points are jointly optimized. The BA step is omitted in the fast parameter setting of the algorithm (Section VII).
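A minimal sketch of the motion-only refinement of Equation (14) by Gauss-Newton follows, again assuming a pinhole camera with intrinsics K and using numerical Jacobians for brevity; the real system relies on analytic Jacobians and, for local BA, on established frameworks (e.g., [25]), so this is only an illustration of the cost being minimized.

```python
import numpy as np
from scipy.linalg import expm

def hat(xi):
    o, v = xi[:3], xi[3:]
    O = np.array([[0, -o[2], o[1]], [o[2], 0, -o[0]], [-o[1], o[0], 0]], float)
    M = np.zeros((4, 4)); M[:3, :3] = O; M[:3, 3] = v
    return M

def reproj_residuals(T_kw, K, points_w, uv_obs):
    """Stacked reprojection residuals u'_i - pi(T_{k,w} * p_i) of Eq. (14)."""
    p_c = (T_kw[:3, :3] @ points_w.T).T + T_kw[:3, 3]
    uv = (K @ (p_c.T / p_c[:, 2])).T[:, :2]
    return (uv_obs - uv).ravel()

def refine_pose(T_kw, K, points_w, uv_obs, iters=10, eps=1e-6):
    """Motion-only bundle adjustment by Gauss-Newton with numerical Jacobians."""
    for _ in range(iters):
        r0 = reproj_residuals(T_kw, K, points_w, uv_obs)
        J = np.zeros((r0.size, 6))
        for j in range(6):
            d = np.zeros(6); d[j] = eps
            J[:, j] = (reproj_residuals(expm(hat(d)) @ T_kw, K, points_w, uv_obs) - r0) / eps
        xi = np.linalg.solve(J.T @ J, -J.T @ r0)
        T_kw = expm(hat(xi)) @ T_kw
        if np.linalg.norm(xi) < 1e-10:
            break
    return T_kw
```

Structure-only BA applies the same idea with the roles swapped: the pose is held fixed and each 3D point is refined to minimize its reprojection residuals.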

D. Discussion

The first (Section IV-A) and the last (Section IV-C) optimization of the algorithm seem to be redundant, as both optimize the 6-DoF pose of the camera. Indeed, one could directly start with the second step and establish feature correspondence through Lucas-Kanade tracking [27] of all feature patches, followed by nonlinear pose refinement (Section IV-C). While this would work, the processing time would be higher. Tracking all features over large distances (e.g., 30 pixels) requires a larger patch and a pyramidal implementation. Furthermore, some features might be tracked inaccurately, which would require outlier detection. In SVO, however, feature alignment is efficiently initialized by optimizing only six parameters (the camera pose) in the sparse image alignment step. The sparse image alignment step implicitly satisfies the epipolar constraint and ensures that there are no outliers.

One may also argue that the first step (sparse image alignment) would be sufficient to estimate the camera motion. In fact, this is what recent algorithms developed for RGB-D cameras do [10], however, by aligning the full depth map rather than sparse patches. We found empirically that using the first step only results in significantly more drift compared to using all three steps together. The improved accuracy is due to the alignment of the new image with respect to the keyframes and the map, whereas sparse image alignment aligns the new frame only with respect to the previous frame.

V. MAPPING

Given an image and its pose {I_k, T_{k,w}}, the mapping thread estimates the depth of 2D features for which the corresponding 3D point is not yet known. The depth estimate of a feature is modeled with a probability distribution. Every subsequent observation {I_k, T_{k,w}} is used to update the distribution in a Bayesian framework (see Figure 5), as in [28]. When the variance of the distribution becomes small enough, the depth estimate is converted to a 3D point using (2), the point is inserted in the map and immediately used for motion estimation (see Figure 1). In the following, we report the basic results and our modifications to the original implementation in [28].

Fig. 5: Probabilistic depth estimate d̂_i for feature i in the reference frame r. The point at the true depth projects to similar image regions in both images (blue squares). Thus, the depth estimate is updated with the triangulated depth d_i^k computed from the point u_i′ of highest correlation with the reference patch. The point of highest correlation always lies on the epipolar line in the new image.

Every depth filter is associated to a reference keyframe r. The filter is initialized with a high uncertainty in depth, and the mean is set to the average scene depth in the reference frame. For every subsequent observation {I_k, T_{k,w}}, we search for the patch on the epipolar line in the new image I_k that has the highest correlation with the reference patch. The epipolar line can be computed from the relative pose between the frames, T_{r,k}, and the optical ray that passes through u_i. The point of highest correlation u_i′ corresponds to the depth d_i^k, which can be found by triangulation (see Figure 5). The measurement d_i^k is modeled with a Gaussian + uniform mixture-model distribution [28]: a good measurement is normally distributed around the true depth d_i, while an outlier measurement arises from a uniform distribution in the interval [d_i^min, d_i^max]:

    p( d_i^k | d_i, ρ_i ) = ρ_i N( d_i^k | d_i, τ_i² ) + (1 - ρ_i) U( d_i^k | d_i^min, d_i^max ),

where ρ_i is the inlier probability and τ_i² the variance of a good measurement, which can be computed geometrically by assuming a photometric disparity variance of one pixel in the image plane [29].

Fig. 6: Very little motion is required by the MAV (seen from the side at the top) for the uncertainty of the depth filters (shown as magenta lines) to converge.

The recursive Bayesian update step for this model is described in detail in [28]. In contrast to [28], we use inverse depth coordinates to deal with large scene depths. The proposed depth estimation is very efficient when only a small range around the current depth estimate on the epipolar line is searched; in our case the range corresponds to twice the standard deviation of the current depth estimate. Figure 6 demonstrates how little motion is required to significantly reduce the uncertainty in depth. The main advantage of the proposed method over the standard approach of triangulating points from two views is that we observe far fewer outliers, as every filter undergoes many measurements until convergence. Furthermore, erroneous measurements are explicitly modeled, which allows the depth to converge even in highly similar environments. In [29] we demonstrate how the same approach can be used for dense mapping.
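The following simplified sketch illustrates how a per-feature depth filter can fuse triangulated depth measurements under the Gaussian + uniform mixture model above. It is not the exact parametric update of [28] (which tracks a Gaussian × Beta posterior); here the inlier probability and the Gaussian depth estimate are updated with a simple responsibility-weighted rule, and all names are ours.

```python
import numpy as np

def _gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

class DepthFilter:
    """Simplified per-feature depth filter for the Gaussian + uniform mixture.

    NOT the exact update of [28]; a reduced sketch that fuses each new
    triangulated depth in proportion to its inlier responsibility.
    """

    def __init__(self, mu, sigma2, d_min, d_max):
        self.mu = mu              # mean depth (average scene depth at init)
        self.sigma2 = sigma2      # depth variance (large at init)
        self.d_min, self.d_max = d_min, d_max
        self.rho = 0.5            # running inlier-probability estimate
        self.n_inlier, self.n_total = 1.0, 2.0

    def update(self, d_meas, tau2):
        """Fuse one triangulated depth d_meas with measurement variance tau2."""
        # Responsibility that the measurement is an inlier (Bayes' rule on the
        # Gaussian + uniform mixture of Section V).
        lik_in = self.rho * _gauss(d_meas, self.mu, self.sigma2 + tau2)
        lik_out = (1.0 - self.rho) / (self.d_max - self.d_min)
        w = lik_in / (lik_in + lik_out)

        # Gaussian fusion of mean and variance, weighted by the responsibility.
        s2 = 1.0 / (1.0 / self.sigma2 + w / tau2)
        self.mu = s2 * (self.mu / self.sigma2 + w * d_meas / tau2)
        self.sigma2 = s2

        # Running estimate of the inlier probability rho.
        self.n_inlier += w
        self.n_total += 1.0
        self.rho = self.n_inlier / self.n_total

    def converged(self, thresh):
        return np.sqrt(self.sigma2) < thresh
```

In SVO the same idea is applied to the inverse depth rather than the depth itself, and a filter is discarded if its inlier probability drops too low instead of being allowed to converge.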
VI. IMPLEMENTATION DETAILS

The algorithm is bootstrapped to obtain the pose of the first two keyframes and the initial map. As in [16], we assume a locally planar scene and estimate a homography. The initial map is triangulated from the first two views.

In order to cope with large motions, we apply the sparse image alignment algorithm in a coarse-to-fine scheme. The image is half-sampled to create an image pyramid of five levels. The intensity residual is then optimized at the coarsest level until convergence. Subsequently, the optimization is initialized at the next finer level. To save processing time, we stop after convergence on the third level, at which stage the estimate is accurate enough to initialize feature alignment.

For efficiency reasons, the algorithm keeps a fixed number of keyframes in the map, which are used as reference for feature alignment and for structure refinement. A keyframe is selected if the Euclidean distance of the new frame relative to all keyframes exceeds 12% of the average scene depth. When a new keyframe is inserted in the map, the keyframe farthest apart from the current position of the camera is removed.

In the mapping thread, we divide the image into cells of fixed size (e.g., pixels). A new depth filter is initialized at the FAST corner [30] with the highest Shi-Tomasi score in the cell, unless there is already a 2D-to-3D correspondence present. This results in evenly distributed features in the image. The same grid is also used for reprojecting the map before feature alignment. Note that we extract FAST corners at every level of the image pyramid to find the best corners independent of the scale.
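As a small illustration of the keyframe-selection rule above, the following sketch checks the distance of the current camera position to all keyframe positions against 12% of the average scene depth; the function and variable names are illustrative only.

```python
import numpy as np

def need_new_keyframe(cam_pos, keyframe_positions, avg_scene_depth, ratio=0.12):
    """Select a new keyframe if the current camera position is farther than
    ratio * average scene depth from every existing keyframe (Section VI)."""
    dists = [np.linalg.norm(cam_pos - kf_pos) for kf_pos in keyframe_positions]
    return min(dists) > ratio * avg_scene_depth

# Example: camera 0.2 m from the closest keyframe, average scene depth 1.2 m.
print(need_new_keyframe(np.array([0.2, 0.0, 0.0]),
                        [np.zeros(3), np.array([1.0, 0.0, 0.0])], 1.2))
```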

VII. EXPERIMENTAL RESULTS

Fig. 7: Comparison against the ground truth of SVO with the fast parameter setting (see Table I) and of PTAM. Zooming in reveals that the proposed algorithm generates a smoother trajectory than PTAM.

Experiments were performed on datasets recorded from a downward-looking camera¹ attached to a MAV and sequences from a handheld camera. The video was processed on both a laptop² and on an embedded platform³ that is mounted on the MAV (see Figure 17). Note that at maximum 2 CPU cores are used for the algorithm. The experiments on the consumer laptop were run with two different parameter settings, one optimized for speed and one for accuracy (Table I). On the embedded platform only the fast parameter setting is used.

TABLE I: Two different parameter settings of SVO.
    Parameter                            Fast    Accurate
    Max. number of features per image
    Max. number of keyframes
    Local Bundle Adjustment              no      yes

¹ Matrix Vision BlueFox, global shutter, pixel resolution.
² Intel i7, 8 cores, 2.8 GHz.
³ Odroid-U2, ARM Cortex-A9, 4 cores, 1.6 GHz.

We compare the performance of SVO with the modified PTAM algorithm of [2]. The reason we do not compare with the original version of PTAM [16] is that it does not handle large environments and is not robust enough in scenes of high-frequency texture [2]. The version of [2] solves these problems and constitutes, to our knowledge, the best-performing monocular SLAM algorithm for MAVs.

A. Accuracy

We evaluate the accuracy on a dataset that has also been used in [2] and is illustrated in Figure 7. The ground truth for the trajectory originates from a motion-capture system. The trajectory is 84 meters long and the MAV flew on average 1.2 meters above the flat ground. Figures 8 and 9 illustrate the position and attitude error over time. In order to generate the plots, we aligned the first 10 frames with the ground truth using [31]. The results of PTAM are in a similar range as reported in [2]. Since the plots are highly dependent on the accuracy of alignment of the first 10 frames, we also report the drift in meters per second in Table II, as proposed and motivated in [32].

Fig. 8: Position drift of SVO with fast and accurate parameter settings and comparison against PTAM.

Fig. 9: Attitude drift of SVO with fast and accurate parameter settings and comparison against PTAM.

Fig. 10: Scale drift over time of the trajectory shown in Figure 7.

Fig. 11: Average reprojection error over time of the trajectory shown in Figure 7. The initial error is after sparse image alignment (Section IV-A) and the final error after pose refinement (Section IV-C).

Fig. 12: Number of tracked features over time for two different parameter settings. For the accurate parameter setting, the number of features is limited to 200 and for the fast setting to 120.

TABLE II: Relative pose and rotation error per second of the trajectory in Figure 7.
                 Pos-RMSE [m/s]   Pos-Median [m/s]   Rot-RMSE [deg/s]   Rot-Median [deg/s]
    fast
    accurate
    PTAM

Overall, both versions of SVO are more accurate than PTAM. We suspect the main reason for this result to originate from the fact that the PTAM version of [2] does not extract features on the pyramid level of highest resolution, and subpixel refinement is not performed for all features in PTAM. Neglecting the highest-resolution image inevitably results in less accuracy, which is clearly visible in the close-up of Figure 7. In [2], the use of lower-resolution images is motivated by the fact that high-frequency self-similar texture in the image results in too many outlier 3D points. SVO efficiently copes with this problem by using the depth filters, which results in very few outliers.

Since a camera is only an angle sensor, it is impossible to obtain the scale of the map through a Structure-from-Motion pipeline. Hence, in the above evaluation we also align the scale of the first 10 measurements with the ground truth. The proposed pipeline propagates the scale, however with some drift, which is shown in Figure 10. The scale drift is computed by comparing the Euclidean norm of the relative translation against the ground truth. The unknown scale and the scale drift motivate the need for a camera-IMU state-estimation system for MAV control, as described in [33].

Figure 11 illustrates the average reprojection error. The sparse image alignment step brings the frame very close to the final pose, as the refinement step reduces the error only marginally. The reprojection error is generated in the feature-alignment step; hence, this plot also shows that patches move only a fraction of a pixel during this step. The difference in accuracy between the fast and accurate parameter settings is not significant. Optimizing the pose and the observed 3D points separately at every iteration (fast parameter setting) is accurate enough for MAV motion estimation.

B. Runtime Evaluation

Figures 13 and 14 show a break-up of the time required to compute the camera motion on the specified laptop and embedded platform, respectively, with the fast parameter setting. The laptop is capable of processing the frames at more than 300 frames per second (fps), while the embedded platform runs at 55 fps. The corresponding rates for PTAM are 91 fps and 27 fps, respectively. The main difference is that SVO does not require feature extraction during motion estimation, which constitutes the bulk of the time in PTAM (7 ms on the laptop, 16 ms on the embedded computer). Additionally, PTAM tracks between 160 and 220 features, while in the fast parameter setting this value is limited to 120. The reason why we can reliably track the camera with fewer features is the use of depth filters, which assures that the features being tracked are reliable. Motion estimation for the accurate parameter setting takes on average 6 ms on the laptop. The increase in time is mainly due to local BA, which is run at every keyframe and takes 14 ms.

Fig. 13: Timing results on a laptop computer. Pyramid creation: 0.06 ms; sparse image alignment: 0.81 ms; feature alignment: 1.73 ms; refinement: 0.16 ms; total motion estimation: 3.04 ms.

Fig. 14: Timing results on the embedded platform. Pyramid creation: 0.85 ms; sparse image alignment: 5.53 ms; feature alignment: 9.37 ms; refinement: 0.85 ms; total motion estimation: 18.17 ms.

Fig. 15: Successful tracking in scenes of high-frequency texture.

Fig. 16: Side view of a piecewise-planar map created by SVO (a, outliers marked) and PTAM (b). The proposed method has fewer outliers due to the depth filter.
The time required by the mapping thread to update all depth filters with the new frame is highly dependent on the number of filters. The number of filters is high after a keyframe is selected and reduces quickly as the filters converge. On average, the mapping thread is faster than the motion-estimation thread; thus, it is not a limiting factor.

C. Robustness

The speed and accuracy of SVO are partially due to the depth filter, which produces only a minimal number of outlier 3D points.

Fig. 17: Nano+ by KMel Robotics, customized with embedded processor and downward-looking camera. SVO runs at 55 frames per second on the platform and is used for stabilization and control.

The robustness is also due to the depth filter: precise, high frame-rate tracking allows the filter to converge even in scenes of repetitive and high-frequency texture (e.g., asphalt, grass), as is best demonstrated in the video accompanying this paper. Screenshots of the video are shown in Figure 15. Figure 16 shows a comparison of the maps generated with PTAM and SVO in the same scene. While PTAM generates outlier 3D points, SVO, by contrast, has almost no outliers thanks to the use of the depth filter.

VIII. CONCLUSION

In this paper, we proposed the semi-direct VO pipeline SVO, which is precise and faster than the current state of the art. The gain in speed is due to the fact that feature extraction and matching are not required for motion estimation. Instead, a direct method is used, which operates directly on the image intensities. The algorithm is particularly useful for state estimation onboard MAVs, as it runs at more than 50 frames per second on current embedded computers. High frame-rate motion estimation, combined with an outlier-resistant probabilistic mapping method, provides increased robustness in scenes of little, repetitive, and high-frequency texture.

REFERENCES

[1] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart, "Vision Based MAV Navigation in Unknown and Unstructured Environments," in Proc. IEEE Int. Conf. on Robotics and Automation.
[2] S. Weiss, M. W. Achtelik, S. Lynen, M. C. Achtelik, L. Kneip, M. Chli, and R. Siegwart, "Monocular Vision for Long-term Micro Aerial Vehicle State Estimation: A Compendium," Journal of Field Robotics, vol. 30, no. 5.
[3] D. Scaramuzza, M. Achtelik, L. Doitsidis, F. Fraundorfer, E. Kosmatopoulos, A. Martinelli, M. Achtelik, M. Chli, S. Chatzichristofis, L. Kneip, D. Gurdan, L. Heng, G. Lee, S. Lynen, L. Meier, M. Pollefeys, A. Renzaglia, R. Siegwart, J. Stumpf, P. Tanskanen, C. Troiani, and S. Weiss, "Vision-Controlled Micro Flying Robots: from System Design to Autonomous Navigation and Mapping in GPS-denied Environments," IEEE Robotics and Automation Magazine.
[4] C. Forster, S. Lynen, L. Kneip, and D. Scaramuzza, "Collaborative Monocular SLAM with Multiple Micro Aerial Vehicles," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.
[5] C. Forster, M. Pizzoli, and D. Scaramuzza, "Air-Ground Localization and Map Augmentation Using Monocular Dense Reconstruction," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.
[6] L. Kneip, M. Chli, and R. Siegwart, "Robust Real-Time Visual Odometry with a Single Camera and an IMU," in Proc. British Machine Vision Conference.
[7] J. Engel, J. Sturm, and D. Cremers, "Accurate Figure Flying with a Quadrocopter Using Onboard Visual and Inertial Sensing," in Proc. ViCoMoR Workshop at IEEE/RSJ IROS.
[8] A. Comport, E. Malis, and P. Rives, "Real-time Quadrifocal Visual Odometry," The International Journal of Robotics Research, vol. 29, no. 2-3, Jan.
[9] T. Tykkälä, C. Audras, and A. I. Comport, "Direct Iterative Closest Point for Real-time Visual Odometry," in Int. Conf. on Computer Vision.
[10] C. Kerl, J. Sturm, and D. Cremers, "Robust Odometry Estimation for RGB-D Cameras," in Proc. IEEE Int. Conf. on Robotics and Automation.
[11] M. Meilland and A. I. Comport, "On Unifying Key-frame and Voxel-based Dense Visual SLAM at Large Scales," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.
[12] D. Scaramuzza and F. Fraundorfer, "Visual Odometry, Part I: The First 30 Years and Fundamentals [Tutorial]," IEEE Robotics and Automation Magazine.
[13] M. Irani and P. Anandan, "All About Direct Methods," in Proc. Workshop on Vision Algorithms: Theory and Practice, 1999.
[14] S. Lovegrove, A. J. Davison, and J. Ibanez-Guzman, "Accurate Visual Odometry from a Rear Parking Camera," in IEEE Intelligent Vehicles Symposium.
[15] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, "DTAM: Dense Tracking and Mapping in Real-Time," in IEEE Int. Conf. on Computer Vision, Nov.
[16] G. Klein and D. Murray, "Parallel Tracking and Mapping for Small AR Workspaces," in IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 1-10, Nov.
[17] H. Strasdat, J. M. M. Montiel, and A. J. Davison, "Real-time Monocular SLAM: Why Filter?" in Proc. IEEE Int. Conf. on Robotics and Automation.
[18] H. Jin, P. Favaro, and S. Soatto, "A Semi-direct Approach to Structure from Motion," The Visual Computer, vol. 19, no. 6.
[19] N. D. Molton, A. J. Davison, and I. Reid, "Locally Planar Patch Features for Real-Time Structure from Motion," in Proc. British Machine Vision Conference.
[20] G. Silveira, E. Malis, and P. Rives, "An Efficient Direct Approach to Visual SLAM," IEEE Transactions on Robotics.
[21] C. Mei, S. Benhimane, E. Malis, and P. Rives, "Efficient Homography-based Tracking and 3-D Reconstruction for Single-Viewpoint Sensors," IEEE Transactions on Robotics, vol. 24, no. 6.
[22] A. Pretto, E. Menegatti, and E. Pagello, "Omnidirectional Dense Large-Scale Mapping and Navigation Based on Meaningful Triangulation," in Proc. IEEE Int. Conf. on Robotics and Automation.
[23] J. Engel, J. Sturm, and D. Cremers, "Semi-Dense Visual Odometry for a Monocular Camera," in Proc. IEEE Int. Conf. on Computer Vision.
[24] S. Benhimane and E. Malis, "Integration of Euclidean Constraints in Template-based Visual Tracking of Piecewise-planar Scenes," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.
[25] R. Kümmerle, G. Grisetti, and K. Konolige, "g2o: A General Framework for Graph Optimization," in Proc. IEEE Int. Conf. on Robotics and Automation.
[26] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models. Springer Verlag.
[27] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework: Part 1," International Journal of Computer Vision, vol. 56, no. 3.
[28] G. Vogiatzis and C. Hernández, "Video-based, Real-Time Multi-View Stereo," Image and Vision Computing, vol. 29, no. 7.
[29] M. Pizzoli, C. Forster, and D. Scaramuzza, "REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time," in Proc. IEEE Int. Conf. on Robotics and Automation.
[30] E. Rosten, R. Porter, and T. Drummond, "FASTER and Better: A Machine Learning Approach to Corner Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32.
[31] S. Umeyama, "Least-Squares Estimation of Transformation Parameters Between Two Point Patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4.
[32] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A Benchmark for the Evaluation of RGB-D SLAM Systems," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.
[33] S. Lynen, M. W. Achtelik, S. Weiss, M. Chli, and R. Siegwart, "A Robust and Modular Multi-Sensor Fusion Approach Applied to MAV Navigation," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2013.
