SVO: Fast Semi-Direct Monocular Visual Odometry


Christian Forster, Matia Pizzoli, Davide Scaramuzza

Abstract

We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need of costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state-estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.

I. INTRODUCTION

Micro Aerial Vehicles (MAVs) will soon play a major role in disaster management, industrial inspection and environment conservation. For such operations, navigating based on GPS information only is not sufficient. Precise fully autonomous operation requires MAVs to rely on alternative localization systems. For minimal weight and power-consumption it was therefore proposed [1]-[5] to use only a single downward-looking camera in combination with an Inertial Measurement Unit. This setup allowed fully autonomous way-point following in outdoor areas [1]-[3] and collaboration between MAVs and ground robots [4], [5].

To our knowledge, all monocular Visual Odometry (VO) systems for MAVs [1], [2], [6], [7] are feature-based. In RGB-D and stereo-based SLAM systems however, direct methods [8]-[11] based on photometric error minimization are becoming increasingly popular.
In this work, we propose a semi-direct VO that combines the success factors of feature-based methods (tracking many features, parallel tracking and mapping, keyframe selection) with the accuracy and speed of direct methods. High frame-rate VO for MAVs promises increased robustness and faster flight maneuvers. An open-source implementation and videos of this work are available at:

The authors are with the Robotics and Perception Group, University of Zurich, Switzerland. This research was supported by the Swiss National Science Foundation through project number ("Swarm of Flying Cameras"), the National Centre of Competence in Research Robotics, and the CTI project number.

A. Taxonomy of Visual Motion Estimation Methods

Methods that simultaneously recover camera pose and scene structure from video can be divided into two classes:

a) Feature-Based Methods: The standard approach is to extract a sparse set of salient image features (e.g. points, lines) in each image; match them in successive frames using invariant feature descriptors; robustly recover both camera motion and structure using epipolar geometry; finally, refine the pose and structure through reprojection error minimization. The majority of VO algorithms [12] follows this procedure, independent of the applied optimization framework. A reason for the success of these methods is the availability of robust feature detectors and descriptors that allow matching between images even at large inter-frame movement. The disadvantages of feature-based approaches are the reliance on detection and matching thresholds, the necessity for robust estimation techniques to deal with wrong correspondences, and the fact that most feature detectors are optimized for speed rather than precision, such that drift in the motion estimate must be compensated by averaging over many feature-measurements.

b) Direct Methods: Direct methods [13] estimate structure and motion directly from intensity values in the image.
In contrast to feature-based methods, which consider only the distance to some feature location, the optimisation uses the magnitude and direction of the local intensity gradient. Direct methods that exploit all the information in the image, even from areas where gradients are small, have been shown to outperform feature-based methods in terms of robustness in scenes with little texture [14] or in the case of camera defocus and motion blur [15]. The computation of the photometric error is more intensive than the reprojection error, as it involves warping and integrating large image regions. However, since direct methods operate directly on the intensity values of the image, the time for feature detection and invariant descriptor computation can be saved.

B. Related Work

Most monocular VO algorithms for MAVs [1], [2], [7] rely on PTAM [16]. PTAM is a feature-based SLAM algorithm that achieves robustness through tracking and mapping many (hundreds) of features. Simultaneously, it runs in real-time by parallelizing the motion estimation and mapping tasks and by relying on efficient keyframe-based Bundle Adjustment (BA) [17]. However, PTAM was designed for augmented reality applications in small desktop scenes and multiple modifications (e.g., limiting the number of keyframes) were necessary to allow operation in large-scale outdoor environments [2].

Early direct monocular SLAM methods tracked and mapped few, sometimes manually selected, planar patches [18]-[21]. While the first approaches [18], [19] used filtering algorithms to estimate structure and motion, later methods [20]-[22] used nonlinear least squares optimization. All these methods estimate the surface normals of the patches, which allows tracking a patch over a wide range of viewpoints, thus greatly reducing drift in the estimation. The authors of [19]-[21] reported real-time performance, however, only with few selected planar regions and on small datasets. A VO algorithm for omnidirectional cameras on cars was proposed in [22].

In [8], the local planarity assumption was relaxed and direct tracking with respect to arbitrary 3D structures computed from stereo cameras was proposed. In [9]-[11], the same approach was also applied to RGB-D sensors. With DTAM [15], a novel direct method was introduced that computes a dense depthmap for each keyframe through minimisation of a global, spatially-regularised energy functional. The camera pose is found through direct whole image alignment using the depthmap. This approach is computationally very intensive and only possible through heavy GPU parallelization. To reduce the computational demand, the method described in [23], which was published during the review process of this work, uses only pixels characterized by strong gradient.

C. Contributions and Outline

The proposed Semi-Direct Visual Odometry (SVO) algorithm uses feature-correspondence; however, feature-correspondence is an implicit result of direct motion estimation rather than of explicit feature extraction and matching. Thus, feature extraction is only required when a keyframe is selected to initialize new 3D points (see Figure 1). The advantage is increased speed due to the lack of feature-extraction at every frame and increased accuracy through subpixel feature correspondence. In contrast to previous direct methods, we use many (hundreds) of small patches rather than few (tens) large planar patches [18]-[21]. Using many small patches increases robustness and allows neglecting the patch normals. The proposed sparse model-based image alignment algorithm for motion estimation is related to model-based dense image alignment [8]-[10], [24].
However, we demonstrate that sparse information of depth is sufficient to get a rough estimate of the motion and to find feature-correspondences. As soon as feature correspondences and an initial estimate of the camera pose are established, the algorithm continues using only point-features; hence, the name "semi-direct". This switch allows us to rely on fast and established frameworks for bundle adjustment (e.g., [25]).

A Bayesian filter that explicitly models outlier measurements is used to estimate the depth at feature locations. A 3D point is only inserted in the map when the corresponding depth-filter has converged, which requires multiple measurements. The result is a map with few outliers and points that can be tracked reliably.

The contributions of this paper are: (1) a novel semi-direct VO pipeline that is faster and more accurate than the current state-of-the-art for MAVs, (2) the integration of a probabilistic mapping method that is robust to outlier measurements.

Fig. 1: Tracking and mapping pipeline. (Motion estimation thread: new image, sparse model-based image alignment against the last frame, feature alignment, pose and structure refinement. Mapping thread: frame queue; if the frame is a keyframe, feature extraction and initialization of depth-filters; if not, update of the depth-filters and, once a filter has converged, insertion of a new point into the map.)

Section II provides an overview of the pipeline and Section III, thereafter, introduces some required notation. Sections IV and V explain the proposed motion-estimation and mapping algorithms. Section VII provides experimental results and comparisons.

II. SYSTEM OVERVIEW

Figure 1 provides an overview of SVO. The algorithm uses two parallel threads (as in [16]), one for estimating the camera motion, and a second one for mapping as the environment is being explored. This separation allows fast and constant-time tracking in one thread, while the second thread extends the map, decoupled from hard real-time constraints.

The motion estimation thread implements the proposed semi-direct approach to relative-pose estimation.
The first step is pose initialisation through sparse model-based image alignment: the camera pose relative to the previous frame is found through minimizing the photometric error between pixels corresponding to the projected location of the same 3D points (see Figure 2). The 2D coordinates corresponding to the reprojected points are refined in the next step through alignment of the corresponding feature-patches (see Figure 3). Motion estimation concludes by refining the pose and the structure through minimizing the reprojection error introduced in the previous feature-alignment step.

In the mapping thread, a probabilistic depth-filter is initialized for each 2D feature for which the corresponding 3D point is to be estimated. New depth-filters are initialised whenever a new keyframe is selected in regions of the image where few 3D-to-2D correspondences are found. The filters are initialised with a large uncertainty in depth. At every subsequent frame the depth estimate is updated in a Bayesian fashion (see Figure 5). When a depth filter's uncertainty becomes small enough, a new 3D point is inserted in the map and is immediately used for motion estimation.
III. NOTATION

Before the algorithm is detailed, we briefly define the notation that is used throughout the paper.

The intensity image collected at timestep $k$ is denoted with $I_k : \Omega \subset \mathbb{R}^2 \mapsto \mathbb{R}$, where $\Omega$ is the image domain. Any 3D point $p = (x, y, z) \in \mathcal{S}$ on the visible scene surface $\mathcal{S} \subset \mathbb{R}^3$ maps to the image coordinates $u = (u, v) \in \Omega$ through the camera projection model $\pi : \mathbb{R}^3 \mapsto \mathbb{R}^2$:

$$ u = \pi({}_k p), \tag{1} $$

where the prescript $k$ denotes that the point coordinates are expressed in the camera frame of reference $k$. The projection $\pi$ is determined by the intrinsic camera parameters, which are known from calibration. The 3D point corresponding to an image coordinate $u$ can be recovered, given the inverse projection function $\pi^{-1}$ and the depth $d_u \in \mathbb{R}$:

$$ {}_k p = \pi^{-1}(u, d_u), \tag{2} $$

where $\mathcal{R} \subseteq \Omega$ is the domain for which the depth is known.

The camera position and orientation at timestep $k$ is expressed with the rigid-body transformation $T_{k,w} \in SE(3)$. It allows us to map a 3D point from the world coordinate frame to the camera frame of reference: ${}_k p = T_{k,w} \cdot {}_w p$. The relative transformation between two consecutive frames can be computed with $T_{k,k-1} = T_{k,w} \cdot T_{k-1,w}^{-1}$. During the optimization, we need a minimal representation of the transformation and, therefore, use the Lie algebra $\mathfrak{se}(3)$ corresponding to the tangent space of $SE(3)$ at the identity. We denote the algebra elements, also named twist coordinates, with $\xi = (\omega, \nu)^T \in \mathbb{R}^6$, where $\omega$ is called the angular velocity and $\nu$ the linear velocity. The twist coordinates $\xi$ are mapped to $SE(3)$ by the exponential map [26]:

$$ T(\xi) = \exp(\hat{\xi}). \tag{3} $$

IV. MOTION ESTIMATION

SVO computes an initial guess of the relative camera motion and the feature correspondences using direct methods and concludes with a feature-based nonlinear reprojection-error refinement. Each step is detailed in the following sections and illustrated in Figures 2 to 4.

A. Sparse Model-based Image Alignment

The maximum likelihood estimate of the rigid body transformation $T_{k,k-1}$ between two consecutive camera poses minimizes the negative log-likelihood of the intensity residuals:

$$ T_{k,k-1} = \arg\min_{T} \iint_{\bar{\mathcal{R}}} \rho\big[\delta I(T, u)\big] \, du. \tag{4} $$

The intensity residual $\delta I$ is defined by the photometric difference between pixels observing the same 3D point. It can be computed by back-projecting a 2D point $u$ from the previous image $I_{k-1}$ and subsequently projecting it into the current camera view:

$$ \delta I(T, u) = I_k\Big(\pi\big(T \cdot \pi^{-1}(u, d_u)\big)\Big) - I_{k-1}(u) \quad \forall\, u \in \bar{\mathcal{R}}, \tag{5} $$

where $\bar{\mathcal{R}}$ is the image region for which the depth $d_u$ is known at time $k-1$ and for which the back-projected points are visible in the current image domain:

$$ \bar{\mathcal{R}} = \big\{ u \;\big|\; u \in \mathcal{R}_{k-1} \,\wedge\, \pi\big(T \cdot \pi^{-1}(u, d_u)\big) \in \Omega_k \big\}. \tag{6} $$

Fig. 2: Changing the relative pose $T_{k,k-1}$ between the current and the previous frame implicitly moves the position of the reprojected points in the new image $u_i'$. Sparse image alignment seeks to find $T_{k,k-1}$ that minimizes the photometric difference between image patches corresponding to the same 3D point (blue squares). Note, in all figures, the parameters to optimize are drawn in red and the optimization cost is highlighted in blue.

Fig. 3: Due to inaccuracies in the 3D point and camera pose estimation, the photometric error between corresponding patches (blue squares) in the current frame and previous keyframes $r_i$ can further be minimised by optimising the 2D position of each patch individually.

Fig. 4: In the last motion estimation step, the camera pose and the structure (3D points) are optimized to minimize the reprojection error that has been established during the previous feature-alignment step.

For the sake of simplicity, we assume in the following that the intensity residuals are normally distributed with unit variance. The negative log-likelihood minimizer then corresponds to the least squares problem: $\rho[\cdot] \,\hat{=}\, \frac{1}{2}\|\cdot\|^2$. In practice, the distribution has heavier tails due to occlusions and thus, a robust cost function must be applied [10].

In contrast to previous works, where the depth is known for large regions in the image [8]-[10], [24], we only know the depth $d_{u_i}$ at sparse feature locations $u_i$.
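The intensity residual of Equation (5) can be illustrated with a minimal sketch. It is not taken from the SVO implementation: it assumes a pinhole camera with a single focal length `f` and principal point `(cx, cy)`, interprets the depth $d_u$ as the z-coordinate in the camera frame, samples the current image bilinearly, and performs no bounds checking (the restriction to the region of Equation (6) is left to the caller). All function names are hypothetical.

```python
import math

def back_project(u, depth, f, cx, cy):
    """pi^-1: pixel u = (x, y) plus depth (assumed to be the z-coordinate) -> 3D point."""
    return ((u[0] - cx) / f * depth, (u[1] - cy) / f * depth, depth)

def project(p, f, cx, cy):
    """pi: 3D point in the camera frame -> pixel coordinates (assumed pinhole model)."""
    return (f * p[0] / p[2] + cx, f * p[1] / p[2] + cy)

def transform(R, t, p):
    """Apply the rigid-body transformation T = (R, t) to the 3D point p."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))

def bilinear(img, u):
    """Sample img[row][col] at the sub-pixel location u = (x, y); no bounds checking."""
    x0, y0 = int(math.floor(u[0])), int(math.floor(u[1]))
    ax, ay = u[0] - x0, u[1] - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom

def photometric_residual(img_cur, img_prev, u, depth, R, t, cam):
    """Equation (5): I_k(pi(T * pi^-1(u, d_u))) - I_{k-1}(u) for an integer pixel u."""
    f, cx, cy = cam
    u_cur = project(transform(R, t, back_project(u, depth, f, cx, cy)), f, cx, cy)
    return bilinear(img_cur, u_cur) - img_prev[u[1]][u[0]]

# Synthetic check: a linear intensity ramp seen by a camera whose relative pose
# translates points by 0.1 along x. At depth 2 and f = 100 this is a 5-pixel shift.
SIZE = 32
cam = (100.0, 16.0, 16.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
img_prev = [[2.0 * x + 3.0 * y for x in range(SIZE)] for y in range(SIZE)]
img_cur = [[2.0 * (x - 5) + 3.0 * y for x in range(SIZE)] for y in range(SIZE)]

r_true = photometric_residual(img_cur, img_prev, (10, 12), 2.0, identity, (0.1, 0.0, 0.0), cam)
r_wrong = photometric_residual(img_cur, img_prev, (10, 12), 2.0, identity, (0.0, 0.0, 0.0), cam)
```

With the correct relative pose the residual vanishes, while the identity pose leaves a residual equal to the ramp slope times the 5-pixel shift. This nonzero residual is the signal that the alignment minimizes.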
We denote small patches of 4 × 4 pixels around the feature point with the vector $\mathbf{I}(u_i)$. We seek to find the camera pose that minimizes the photometric error of all patches (see Figure 2):

$$ T_{k,k-1} = \arg\min_{T_{k,k-1}} \frac{1}{2} \sum_{i \in \bar{\mathcal{R}}} \big\| \delta \mathbf{I}(T_{k,k-1}, u_i) \big\|^2. \tag{7} $$

Since Equation (7) is nonlinear in $T_{k,k-1}$, we solve it in an iterative Gauss-Newton procedure. Given an estimate of the relative transformation $\hat{T}_{k,k-1}$, an incremental update $T(\xi)$ to the estimate can be parametrised with a twist $\xi \in \mathfrak{se}(3)$. We use the inverse compositional formulation [27] of the intensity residual, which computes the update step $T(\xi)$ for the reference image at time $k-1$:

$$ \delta \mathbf{I}(\xi, u_i) = \mathbf{I}_k\big(\pi(\hat{T}_{k,k-1} \cdot p_i)\big) - \mathbf{I}_{k-1}\big(\pi(T(\xi) \cdot p_i)\big), \tag{8} $$

with $p_i = \pi^{-1}(u_i, d_{u_i})$. The inverse of the update step is then applied to the current estimate using Equation (3):

$$ \hat{T}_{k,k-1} \longleftarrow \hat{T}_{k,k-1} \cdot T(\xi)^{-1}. \tag{9} $$

Note that, for reasons of computational speed, we do not warp the patches. This assumption is valid in case of small frame-to-frame motions and for small patch sizes.

To find the optimal update step $T(\xi)$, we compute the derivative of (7) and set it to zero:

$$ \sum_{i \in \bar{\mathcal{R}}} \nabla \delta \mathbf{I}(\xi, u_i)^T \, \delta \mathbf{I}(\xi, u_i) = 0. \tag{10} $$

To solve this system, we linearize around the current state:

$$ \delta \mathbf{I}(\xi, u_i) \approx \delta \mathbf{I}(0, u_i) + \nabla \delta \mathbf{I}(0, u_i) \cdot \xi. \tag{11} $$

The Jacobian $J_i := \nabla \delta \mathbf{I}(0, u_i)$ has the dimension 16 × 6 because of the 4 × 4 patch size and is computed with the chain rule:

$$ \frac{\partial \delta \mathbf{I}(\xi, u_i)}{\partial \xi} = \left. \frac{\partial \mathbf{I}_{k-1}(a)}{\partial a} \right|_{a = u_i} \cdot \left. \frac{\partial \pi(b)}{\partial b} \right|_{b = p_i} \cdot \left. \frac{\partial T(\xi)}{\partial \xi} \right|_{\xi = 0} \cdot p_i. $$

By inserting (11) into (10) and by stacking the Jacobians in a matrix $J$, we obtain the normal equations:

$$ J^T J \, \xi = -J^T \delta \mathbf{I}(0), \tag{12} $$

which can be solved for the update twist $\xi$. Note that by using the inverse compositional approach, the Jacobian can be precomputed as it remains constant over all iterations (the reference patch $\mathbf{I}_{k-1}(u_i)$ and the point $p_i$ do not change), which results in a significant speedup [27].

B. Relaxation Through Feature Alignment

The last step aligned the camera with respect to the previous frame. Through back-projection, the found relative pose $T_{k,k-1}$ implicitly defines an initial guess for the feature positions of all visible 3D points in the new image. Due to inaccuracies in the 3D points' positions and, thus, the camera pose, this initial guess can be improved.
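Returning briefly to the Gauss-Newton procedure of Section IV-A: the structure of Equations (8) to (12) can be seen in a one-dimensional toy analogue. The sketch below is not the SE(3) implementation. It replaces the camera pose by a single translation parameter `p`, replaces the images by analytic signals, and uses hypothetical names throughout. It does keep the two properties highlighted in the text: the Jacobian is evaluated once on the reference signal, and the inverse of each update is composed with the current estimate.

```python
import math

def i_ref(x):
    """Reference signal (stands in for the reference patches I_{k-1})."""
    return math.sin(x) + 0.5 * math.sin(2.3 * x)

def i_ref_grad(x):
    """Analytic derivative of the reference signal."""
    return math.cos(x) + 1.15 * math.cos(2.3 * x)

def make_current(shift):
    """Current signal: the reference translated by `shift` (stands in for I_k)."""
    return lambda x: i_ref(x - shift)

def align(i_cur, xs, iterations=15):
    """Estimate the translation p such that i_cur(x + p) matches i_ref(x)."""
    # Residual as in Equation (8): dI(xi) = i_cur(x + p) - i_ref(x + xi).
    # Its Jacobian at xi = 0 is -grad(i_ref)(x); it depends only on the
    # reference signal, so it is computed once, outside the loop.
    jac = [-i_ref_grad(x) for x in xs]
    jtj = sum(j * j for j in jac)
    p = 0.0
    for _ in range(iterations):
        residuals = [i_cur(x + p) - i_ref(x) for x in xs]
        # Normal equations (12), scalar case: (J^T J) xi = -J^T dI(0).
        xi = -sum(j * r for j, r in zip(jac, residuals)) / jtj
        # Equation (9): compose with the inverse update; for a translation
        # the inverse of a shift by xi is a shift by -xi.
        p = p - xi
    return p

xs = [0.1 * i for i in range(40)]
estimated = align(make_current(0.2), xs)
```

In this toy setting the estimate converges to the true shift of 0.2 within a few iterations. In SVO the same pattern applies, with a 6-vector twist in place of the scalar and stacked 16 × 6 patch Jacobians in place of `jac`.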
To reduce the drift, the camera pose should be aligned with respect to the map, rather than to the previous frame.

All 3D points of the map that are visible from the estimated camera pose are projected into the image, resulting in an estimate of the corresponding 2D feature positions $u_i'$ (see Figure 3). For each reprojected point, the keyframe $r$ that observes the point with the closest observation angle is identified. The feature alignment step then optimizes all 2D feature-positions $u_i'$ in the new image individually by minimizing the photometric error of the patch in the current image with respect to the reference patch in the keyframe $r$:

$$ u_i' = \arg\min_{u_i'} \frac{1}{2} \big\| \mathbf{I}_k(u_i') - A_i \cdot \mathbf{I}_r(u_i) \big\|^2, \quad \forall\, i. \tag{13} $$

This alignment is solved using the inverse compositional Lucas-Kanade algorithm [27]. Contrary to the previous step, we apply an affine warping $A_i$ to the reference patch, since a larger patch size is used (8 × 8 pixels) and the closest keyframe is typically farther away than the previous image. This step can be understood as a relaxation step that violates the epipolar constraints to achieve a higher correlation between the feature-patches.

C. Pose and Structure Refinement

In the previous step, we have established feature correspondence with subpixel accuracy at the cost of violating the epipolar constraints. In particular, we have generated a reprojection residual $\| \delta u_i \| = \| u_i' - \pi(T_{k,w} \cdot {}_w p_i) \| \neq 0$, which on average is around 0.3 pixels (see Figure 11). In this final step, we again optimize the camera pose $T_{k,w}$ to minimize the reprojection residuals (see Figure 4):

$$ T_{k,w} = \arg\min_{T_{k,w}} \frac{1}{2} \sum_i \big\| u_i' - \pi(T_{k,w} \cdot {}_w p_i) \big\|^2. \tag{14} $$

This is the well known problem of motion-only BA [17] and can efficiently be solved using an iterative nonlinear least squares minimization algorithm such as Gauss-Newton.

Subsequently, we optimize the position of the observed 3D points through reprojection error minimization (structure-only BA). Finally, it is possible to apply local BA, in which both the pose of all close keyframes as well as the observed 3D points are jointly optimized. The BA step is omitted in the fast parameter settings of the algorithm (Section VII).

D. Discussion

The first (Section IV-A) and the last (Section IV-C) optimization of the algorithm seem to be redundant as both optimize the 6 DoF pose of the camera. Indeed, one could directly start with the second step and establish feature-correspondence through Lucas-Kanade tracking [27] of all feature-patches, followed by nonlinear pose refinement (Section IV-C). While this would work, the processing time would be higher. Tracking all features over large distances (e.g., 30 pixels) requires a larger patch and a pyramidal implementation. Furthermore, some features might be tracked inaccurately, which would require outlier detection. In SVO however, feature alignment is efficiently initialized by only optimizing six parameters, the camera pose, in the sparse image alignment step. The sparse image alignment step satisfies implicitly the epipolar constraint and ensures that there are no outliers.

One may also argue that the first step (sparse image alignment) would be sufficient to estimate the camera motion. In fact, this is what recent algorithms developed for RGB-D cameras do [10], however, by aligning the full depth-map rather than sparse patches. We found empirically that using the first step only results in significantly more drift compared to using all three steps together. The improved accuracy is due to the alignment of the new image with respect to the keyframes and the map, whereas sparse image alignment aligns the new frame only with respect to the previous frame.

V. MAPPING

Fig. 5: Probabilistic depth estimate $\hat{d}_i$ for feature $i$ in the reference frame $r$. The point at the true depth projects to similar image regions in both images (blue squares). Thus, the depth estimate is updated with the triangulated depth $\tilde{d}_i^k$ computed from the point $u_i'$ of highest correlation with the reference patch. The point of highest correlation lies always on the epipolar line in the new image.

Given an image and its pose $\{I_k, T_{k,w}\}$, the mapping thread estimates the depth of 2D features for which the corresponding 3D point is not yet known. The depth estimate of a feature is modeled with a probability distribution. Every subsequent observation $\{I_k, T_{k,w}\}$ is used to update the distribution in a Bayesian framework (see Figure 5) as in [28]. When the variance of the distribution becomes small enough, the depth-estimate is converted to a 3D point using (2), the point is inserted in the map and immediately used for motion estimation (see Figure 1). In the following we report the basic results and our modifications to the original implementation in [28].

Every depth-filter is associated to a reference keyframe $r$. The filter is initialized with a high uncertainty in depth and the mean is set to the average scene depth in the reference frame. For every subsequent observation $\{I_k, T_{k,w}\}$, we search for a patch on the epipolar line in the new image $I_k$ that has the highest correlation with the reference patch. The epipolar line can be computed from the relative pose between the frames $T_{r,k}$ and the optical ray that passes through $u_i$.
The point of highest correlation $u_i'$ corresponds to the depth $\tilde{d}_i^k$ that can be found by triangulation (see Figure 5).

The measurement $\tilde{d}_i^k$ is modeled with a Gaussian + Uniform mixture model distribution [28]: a good measurement is normally distributed around the true depth $d_i$ while an outlier measurement arises from a uniform distribution in the interval $[d_i^{\min}, d_i^{\max}]$:

$$ p(\tilde{d}_i^k \mid d_i, \rho_i) = \rho_i \, \mathcal{N}\big(\tilde{d}_i^k \mid d_i, \tau_i^2\big) + (1 - \rho_i) \, \mathcal{U}\big(\tilde{d}_i^k \mid d_i^{\min}, d_i^{\max}\big), $$

where $\rho_i$ is the inlier probability and $\tau_i^2$ the variance of a good measurement that can be computed geometrically by assuming a photometric disparity variance of one pixel in the image plane [29].

Fig. 6: Very little motion is required by the MAV (seen from the side at the top) for the uncertainty of the depth-filters (shown as magenta lines) to converge. Panels (a), (b), (c).

The recursive Bayesian update step for this model is described in detail in [28]. In contrast to [28], we use inverse depth coordinates to deal with large scene depths.

The proposed depth estimation is very efficient when only a small range around the current depth estimate on the epipolar line is searched; in our case the range corresponds to twice the standard deviation of the current depth estimate. Figure 6 demonstrates how little motion is required to significantly reduce the uncertainty in depth. The main advantage of the proposed methods over the standard approach of triangulating points from two views is that we observe far fewer outliers as every filter undergoes many measurements until convergence. Furthermore, erroneous measurements are explicitly modeled, which allows the depth to converge even in highly-similar environments. In [29] we demonstrate how the same approach can be used for dense mapping.

VI. IMPLEMENTATION DETAILS

The algorithm is bootstrapped to obtain the pose of the first two keyframes and the initial map. Like in [16], we assume a locally planar scene and estimate a homography. The initial map is triangulated from the first two views.

In order to cope with large motions, we apply the sparse image alignment algorithm in a coarse-to-fine scheme.
The image is half-sampled to create an image pyramid of five levels. The intensity residual is then optimized at the coarsest level until convergence. Subsequently, the optimization is initialized at the next finer level. To save processing time, we stop after convergence on the third level, at which stage the estimate is accurate enough to initialize feature alignment.

For efficiency reasons, the algorithm keeps a fixed number of keyframes in the map, which are used as reference for feature-alignment and for structure refinement. A keyframe is selected if the Euclidean distance of the new frame relative to all keyframes exceeds 12% of the average scene depth. When a new keyframe is inserted in the map, the keyframe farthest apart from the current position of the camera is removed.

In the mapping thread, we divide the image in cells of fixed size (e.g., pixels). A new depth-filter is initialized at the FAST corner [30] with highest Shi-Tomasi score in the cell unless there is already a 2D-to-3D correspondence present. This results in evenly distributed features in the image. The same grid is also used for reprojecting the map before feature alignment. Note that we extract FAST corners at every level of the image pyramid to find the best corners independent of the scale.
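The keyframe policy described above can be sketched as follows. This is a minimal illustration rather than the released implementation. Camera positions are plain 3-tuples, the function names and the `max_keyframes` parameter are hypothetical, and the sketch assumes that the farthest keyframe is dropped only once the fixed keyframe budget is exceeded.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two 3D positions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def needs_new_keyframe(position, keyframe_positions, mean_scene_depth, ratio=0.12):
    """True if the new frame is farther than ratio * mean scene depth from ALL keyframes."""
    min_distance = ratio * mean_scene_depth
    return all(euclidean(position, kf) > min_distance for kf in keyframe_positions)

def insert_keyframe(position, keyframe_positions, max_keyframes):
    """Add the new keyframe; if the budget is exceeded, drop the keyframe that is
    farthest from the current camera position (assumed interpretation)."""
    keyframes = list(keyframe_positions) + [position]
    if len(keyframes) > max_keyframes:
        farthest = max(keyframes[:-1], key=lambda kf: euclidean(position, kf))
        keyframes.remove(farthest)
    return keyframes

# Example: average scene depth 2.0 m, so the distance threshold is 0.24 m.
keyframes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
too_close = needs_new_keyframe((1.1, 0.0, 0.0), keyframes, 2.0)
far_enough = needs_new_keyframe((1.5, 0.0, 0.0), keyframes, 2.0)
updated = insert_keyframe((1.5, 0.0, 0.0), keyframes, max_keyframes=2)
```

Tying the threshold to the average scene depth makes the rule scale-aware: a camera flying close to the ground selects keyframes after a shorter displacement than one observing a distant scene.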
Fig. 7: Comparison against the ground-truth of SVO with the fast parameter setting (see Table I) and of PTAM. Zooming in reveals that the proposed algorithm generates a smoother trajectory than PTAM. (Axes: x [m], y [m]; curves: Groundtruth, Fast, PTAM.)

VII. EXPERIMENTAL RESULTS

Experiments were performed on datasets recorded from a downward-looking camera¹ attached to a MAV and sequences from a handheld camera. The video was processed on both a laptop² and on an embedded platform³ that is mounted on the MAV (see Figure 17). Note that at most two CPU cores are used for the algorithm. The experiments on the consumer laptop were run with two different parameter settings, one optimised for speed and one for accuracy (Table I). On the embedded platform only the fast parameter setting is used.

¹ Matrix Vision BlueFox, global shutter, pixel resolution.
² Intel i7, 8 cores, 2.8 GHz
³ Odroid-U2, ARM Cortex A9, 4 cores, 1.6 GHz

TABLE I: Two different parameter settings of SVO.

                                      Fast    Accurate
  Max number of features per image    120     200
  Max number of keyframes
  Local Bundle Adjustment             no      yes

We compare the performance of SVO with the modified PTAM algorithm of [2]. We do not compare with the original version of PTAM [16] because it does not handle large environments and is not robust enough in scenes of high-frequency texture [2]. The version of [2] solves these problems and constitutes, to our knowledge, the best performing monocular SLAM algorithm for MAVs.

A. Accuracy

We evaluate the accuracy on a dataset that has also been used in [2] and is illustrated in Figure 7. The ground-truth for the trajectory originates from a motion capture system. The trajectory is 84 meters long and the MAV flew on average 1.2 meters above the flat ground.

Figures 8 and 9 illustrate the position and attitude error over time. In order to generate the plots, we aligned the first 10 frames with the ground-truth using [31]. The results of PTAM are in a similar range as reported in [2]. Since the plots are highly dependent on the accuracy of alignment of the first 10 frames, we also report the drift in meters per second in Table II as proposed and motivated in [32].

Fig. 8: Position drift of SVO with fast and accurate parameter setting and comparison against PTAM. (Panels: x-error [m], y-error [m], z-error [m] over time [s].)

Fig. 9: Attitude drift of SVO with fast and accurate parameter setting and comparison against PTAM. (Panels: roll-error [rad], pitch-error [rad], yaw-error [rad] over time [s].)

Fig. 10: Scale-drift over time of the trajectory. (Axes: scale change [%] over time [s]; curves: Accurate, Fast, PTAM.)

Fig. 11: Average reprojection error over time of the trajectory shown in Figure 7. The initial error is after sparse image alignment (Section IV-A) and the final error after pose refinement (Section IV-C). (Axes: reprojection error [px] over time [s].)

Fig. 12: Number of tracked features over time for two different parameter settings. For the accurate parameter setting, the number of features is limited to 200 and for the fast setting to 120.

TABLE II: Relative pose and rotation error of the trajectory in Figure 7. (Columns: Pos-RMSE [m/s], Pos-Median [m/s], Rot-RMSE [deg/s], Rot-Median [deg/s]; rows: fast, accurate, PTAM.)

Overall, both versions of SVO are more accurate than PTAM. We suspect the main reason for this result is that the PTAM version of [2] does not extract features on the pyramid level of highest resolution and that subpixel refinement is not performed for all features in PTAM. Neglecting the highest resolution image inevitably results in less accuracy, which is clearly visible in the close-up of Figure 7. In [2], the use of lower resolution images is motivated by the fact that high-frequency self-similar texture in the image results in too many outlier 3D points. SVO efficiently copes with this problem by using the depth-filters, which results in very few outliers.

Since a camera is only an angle-sensor, it is impossible to obtain the scale of the map through a Structure from Motion pipeline. Hence, in the above evaluation we also align the scale of the first 10 measurements with the ground-truth. The proposed pipeline propagates the scale, however with some drift that is shown in Figure 10. The scale drift is computed by comparing the Euclidean norm of the relative translation against the ground-truth. The unknown scale and the scale drift motivate the need for a camera-IMU state estimation system for MAV control, as described in [33].

Figure 11 illustrates the average reprojection error. The sparse image alignment step brings the frame very close to the final pose, as the refinement step reduces the error only marginally. The reprojection error is generated in the feature-alignment step; hence, this plot also shows that patches move only a fraction of a pixel during this step.

The difference in accuracy between the fast and accurate parameter setting is not significant. Optimizing the pose and the observed 3D points separately at every iteration (fast parameter setting) is accurate enough for MAV motion estimation.

B. Runtime Evaluation

Figures 13 and 14 show a break-up of the time required to compute the camera motion on the specified laptop and embedded platform respectively with the fast parameter setting. The laptop can process the frames at more than 300 frames per second (fps) while the embedded platform runs at 55 fps. The corresponding rates for PTAM are 91 fps and 27 fps respectively. The main difference is that SVO does not require feature extraction during motion estimation, which constitutes the bulk of time in PTAM (7 ms on the laptop, 16 ms on the embedded computer). Additionally, PTAM tracks between 160 and 220 features while in the fast parameter setting, this value is limited to 120. The reason why we can reliably track the camera with fewer features is the use of depth-filters, which ensures that the features being tracked are reliable.

Fig. 13: Timing results on a laptop computer. Pyramid creation: 0.06 ms; sparse image alignment: 0.81 ms; feature alignment: 1.73 ms; refinement: 0.16 ms; total motion estimation: 3.04 ms.

Fig. 14: Timing results on the embedded platform. Pyramid creation: 0.85 ms; sparse image alignment: 5.53 ms; feature alignment: 9.37 ms; refinement: 0.85 ms; total motion estimation: 18.17 ms.

Fig. 15: Successful tracking in scenes of high-frequency texture.

Fig. 16: Side-view of a piecewise-planar map created by (a) SVO and (b) PTAM. The proposed method has fewer outliers due to the depth-filter.

Motion estimation for the accurate parameter setting takes on average 6 ms on the laptop. The increase in time is mainly due to local BA, which is run at every keyframe and takes 14 ms. The time required by the mapping thread to update all depth-filters with the new frame is highly dependent on the number of filters. The number of filters is high after a keyframe is selected and reduces quickly as filters converge. On average, the mapping thread is faster than the motion estimation thread, thus it is not a limiting factor.

C. Robustness

The speed and accuracy of SVO are partially due to the depth-filter, which produces only a minimal number of outlier 3D points. The robustness is also due to the depth-filter: precise, high frame-rate tracking allows the filter to converge even in scenes of repetitive and high-frequency texture (e.g., asphalt, grass), as is best demonstrated in the video accompanying this paper. Screenshots of the video are shown in Figure 15. Figure 16 shows a comparison of the map generated with PTAM and SVO in the same scene. While PTAM generates outlier 3D points, SVO has almost no outliers thanks to the use of the depth-filter.

Fig. 17: Nano+ by KMel Robotics, customized with embedded processor and downward-looking camera. SVO runs at 55 frames per second on the platform and is used for stabilization and control.

VIII. CONCLUSION

In this paper, we proposed the semi-direct VO pipeline SVO that is precise and faster than the current state-of-the-art. The gain in speed is due to the fact that feature-extraction and matching is not required for motion estimation. Instead, a direct method is used, which is based directly on the image intensities. The algorithm is particularly useful for state-estimation onboard MAVs as it runs at more than 50 frames per second on current embedded computers. High frame-rate motion estimation, combined with an outlier resistant probabilistic mapping method, provides increased robustness in scenes of little, repetitive, and high-frequency texture.

REFERENCES

[1] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart, "Vision based MAV navigation in unknown and unstructured environments," Proc. IEEE Int. Conf. on Robotics and Automation.
[2] S. Weiss, M. W. Achtelik, S. Lynen, M. C. Achtelik, L. Kneip, M. Chli, and R. Siegwart, "Monocular Vision for Long-term Micro Aerial Vehicle State Estimation: A Compendium," Journal of Field Robotics, vol. 30, no. 5.
[3] D. Scaramuzza, M. Achtelik, L. Doitsidis, F. Fraundorfer, E. Kosmatopoulos, A. Martinelli, M. Achtelik, M. Chli, S. Chatzichristofis, L. Kneip, D. Gurdan, L. Heng, G. Lee, S. Lynen, L. Meier, M. Pollefeys, A. Renzaglia, R. Siegwart, J. Stumpf, P. Tanskanen, C. Troiani, and S.
Wess, VsonControlled Mcro Flyng Robots: from System Desgn to Autonomous Navgaton and Mappng n GPSdened Envronments, IEEE Robotcs and Automaton Magazne, [4] C. Forster, S. Lynen, L. Knep, and D. Scaramuzza, Collaboratve Monocular SLAM wth Multple Mcro Aeral Vehcles, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, [5] C. Forster, M. Pzzol, and D. Scaramuzza, ArGround Localzaton and Map Augmentaton Usng Monocular Dense Reconstructon, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, [6] L. Knep, M. Chl, and R. Segwart, Robust RealTme Vsual Odometry wth a Sngle Camera and an IMU, Proc. Brtsh Machne Vson Conference, [7] J. Engel, J. Sturm, and D. Cremers, Accurate Fgure Flyng wth a Quadrocopter Usng Onboard Vsual and Inertal Sensng, n Proc. VCoMoR Workshop at IEEE/RJS IROS, [8] A. Comport, E. Mals, and P. Rves, Realtme Quadrfocal Vsual Odometry, The Internatonal Journal of Robotcs Research, vol. 29, no. 23, pp , Jan [9] T. Tykkälä, C. Audras, and A. I. Comport, Drect Iteratve Closest Pont for Realtme Vsual Odometry, n Int. Conf. on Computer Vson, [10] C. Kerl, J. Sturm, and D. Cremers, Robust Odometry Estmaton for RGBD Cameras, n Proc. IEEE Int. Conf. on Robotcs and Automaton, [11] M. Melland and A. I. Comport, On unfyng keyframe and voxelbased dense vsual SLAM at large scales, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, [12] D. Scaramuzza and F. Fraundorfer, Vsual Odometry, Part I: The Frst 30 Years and Fundamentals [Tutoral], IEEE RAM, [13] M. Iran and P. Anandan, All About Drect Methods, n Proc. Workshop Vs. Algorthms: Theory Pract., 1999, pp [14] S. Lovegrove, A. J. Davson, and J. IbanezGuzman, Accurate vsual odometry from a rear parkng camera, n Intellgent Vehcle, IEEE Symposum, [15] R. a. Newcombe, S. J. Lovegrove, and A. J. Davson, DTAM: Dense Trackng and Mappng n RealTme, IEEE Int. Conf. on Computer Vson, pp , Nov [16] G. Klen and D. 
Murray, Parallel Trackng and Mappng for Small AR Workspaces, IEEE and ACM Internatonal Symposum on Mxed and Augmented Realty, pp. 1 10, Nov [17] H. Strasdat, J. M. M. Montel, and A. J. Davson, Realtme Monocular SLAM: Why Flter? Proc. IEEE Int. Conf. on Robotcs and Automaton, pp , [18] H. Jn, P. Favaro, and S. Soatto, A semdrect approach to structure from moton, The Vsual Computer, vol. 19, no. 6, pp , [19] N. D. Molton, A. J. Davson, and I. Red, Locally Planar Patch Features for RealTme Structure from Moton, n Proc. Brtsh Machne Vson Conference, [20] G. Slvera, E. Mals, and P. Rves, An Effcent Drect Approach to Vsual SLAM, IEEE Transactons on Robotcs, [21] C. Me, S. Benhmane, E. Mals, and P. Rves, Effcent Homographybased Trackng and 3D Reconstructon for Sngle Vewpont Sensors, IEEE Transactons on Robotcs, vol. 24, no. 6, pp , [22] A. Pretto, E. Menegatt, and E. Pagello, Omndrectonal Dense LargeScale Mappng and Navgaton Based on Meanngful Trangulaton, n Proc. IEEE Int. Conf. on Robotcs and Automaton, [23] J. Engel, J. Sturm, and D. Cremers, SemDense Vsual Odometry for a Monocular Camera, n Proc. IEEE Int. Conf. on Computer Vson. [24] S. Benhmane and E. Mals, Integraton of Eucldean constrants n template based vsual trackng of pecewseplanar scenes, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, [25] R. Kümmerle, G. Grsett, and K. Konolge, g2o: A General Framework for Graph Optmzaton, Proc. IEEE Int. Conf. on Robotcs and Automaton, [26] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An Invtaton to 3D Vson: From Images to Geometrc Models. Sprnger Verlag, [27] S. Baker and I. Matthews, LucasKanade 20 Years On: A Unfyng Framework: Part 1, Internatonal Journal of Computer Vson, vol. 56, no. 3, pp , [28] G. Vogatzs and C. Hernández, Vdeobased, RealTme Mult Vew Stereo, Image and Vson Computng, vol. 29, no. 7, [29] M. Pzzol, C. Forster, and D. Scaramuzza, REMODE: Probablstc, Monocular Dense Reconstructon n Real Tme, n Proc. IEEE Int. Conf. 
on Robotcs and Automaton, [30] E. Rosten, R. Porter, and T. Drummond, FASTER and better: A machne learnng approach to corner detecton, IEEE Trans. Pattern Analyss and Machne Intellgence, vol. 32, pp , [31] S. Umeyama, LeastSquares Estmaton of Transformaton Parameters Between Two Pont Patterns, IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, [32] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, A Benchmark for the Evaluaton of RGBD SLAM Systems, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, [33] S. Lynen, M. W. Achtelk, S. Wess, M. Chl, and R. Segwart, A Robust and Modular MultSensor Fuson Approach Appled to MAV Navgaton, n Proc. IEEE/RSJ Int. Conf. on Intellgent Robots and Systems, 2013.