Introduction to Computer Vision for Robotics
Jan-Michael Frahm

Overview:
- Camera model
- Multi-view geometry
- Camera pose estimation
- Feature tracking & matching
- Robust pose estimation
Homogeneous coordinates

Unified notation: include the origin in the affine basis. A point $M$ with affine coordinates $(x, y, z)$ in the basis $(e_1, e_2, e_3)$ with origin $o$ is written using the affine basis matrix:

$$M = \begin{pmatrix} e_1 & e_2 & e_3 & o \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$$

$(x, y, z, 1)^T$ are the homogeneous coordinates of $M$.

Properties of affine transformations

An affine transformation $T_{affine}$ combines a linear mapping and a coordinate shift in homogeneous coordinates:
- linear mapping with a $3 \times 3$ matrix $A$
- coordinate shift with a translation vector $t \in \mathbb{R}^3$

$$M' = T_{affine} M, \qquad T_{affine} = \begin{pmatrix} A & t \\ 0^T & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

- Parallelism is preserved
- Ratios of length, area, and volume are preserved
- Transformations can be concatenated: if $M_1 = T_1 M$ and $M_2 = T_2 M_1$, then $M_2 = T_2 T_1 M = T_{21} M$
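As a minimal sketch of the homogeneous form above (numpy; the particular entries of $A$ and $t$ are illustrative assumptions, not from the slides), a $4 \times 4$ affine transform maps a homogeneous point and concatenates by matrix multiplication:

```python
import numpy as np

# Hypothetical 3x3 linear part A and translation t (illustrative values).
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

# Build the 4x4 affine transform in homogeneous coordinates:
# T = [[A, t], [0 0 0, 1]]
T = np.eye(4)
T[:3, :3] = A
T[:3, 3] = t

M = np.array([1.0, 1.0, 1.0, 1.0])   # homogeneous point (X, Y, Z, 1)
M2 = T @ M                           # linear map plus shift; w stays 1
print(M2)                            # [3. 3. 4. 1.]

# Concatenation: applying T twice equals the single transform T @ T
M3a = T @ (T @ M)
M3b = (T @ T) @ M
assert np.allclose(M3a, M3b)
```

The bottom row $(0, 0, 0, 1)$ is what keeps affine maps closed under concatenation and leaves $w = 1$ untouched.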
Projective geometry

The projective space $P^2$ is the space of rays emerging from a viewpoint $O$:
- $O$ forms the projection center for all rays
- rays $v$ emerge from the viewpoint into the scene
- a ray $g$ is called a projective point, defined as a scaled $v$: $g(\lambda) = \lambda v$, $\lambda \in \mathbb{R}$

Projective and homogeneous points

Given a plane $\Pi$ ($\mathbb{R}^2$ embedded in $P^2$) at coordinate $w = 1$:
- the viewing ray $g$ intersects the plane at $v$ (homogeneous coordinates)
- all points on the ray $g$ project onto the same homogeneous point $v$
- the projection of $g$ onto $\Pi$ is defined by scaling: $v = g/\lambda = g/w$

$$g(\lambda) = \lambda v = \lambda \begin{pmatrix} x \\ y \\ w \end{pmatrix}, \qquad v = \frac{g}{w} = \begin{pmatrix} x/w \\ y/w \\ 1 \end{pmatrix}$$
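A short sketch of that normalization (numpy; the helper name `to_affine` is a hypothetical label for the $g/w$ scaling): every point on a ray maps to the same point on the plane $w = 1$:

```python
import numpy as np

def to_affine(g):
    """Project a projective point g = (x, y, w) onto the plane w = 1
    by scaling with 1/w; all points on the same ray give the same result."""
    g = np.asarray(g, dtype=float)
    if g[-1] == 0:
        raise ValueError("point at infinity (a direction), no affine image")
    return g / g[-1]

# Any scaling lambda * v of the ray yields the same homogeneous point:
v = np.array([2.0, 4.0, 2.0])
print(to_affine(v))          # [1. 2. 1.]
print(to_affine(3.0 * v))    # same: [1. 2. 1.]
```

The `w == 0` branch is exactly the infinite points (directions) discussed next: they have no affine image.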
Finite and infinite points

All rays $g$ that are not parallel to $\Pi$ intersect $\Pi$ at an affine point $v$. A ray with $w = 0$ does not intersect $\Pi$; it is not an affine point but a direction. Directions have the coordinates $(x, y, z, 0)^T$. The projective space combines the affine space with these infinite points (directions).

Affine and projective transformations

An affine transformation leaves infinite points at infinity:

$$M' = T_{affine} M: \qquad \begin{pmatrix} X' \\ Y' \\ Z' \\ 0 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$$

A projective transformation can move infinite points into finite affine space:

$$M' = T_{projective} M: \qquad \lambda \begin{pmatrix} X_p \\ Y_p \\ Z_p \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ w_{41} & w_{42} & w_{43} & w_{44} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$$

Example: parallel lines intersect at the horizon (the line of infinite points). We can see this intersection due to perspective projection!
Pinhole camera (camera obscura)

[Figures: interior of a camera obscura (Sunday Magazine, 1838); camera obscura (France, 1830)]

Pinhole camera model: light from an object passes through an aperture (lens) and falls on the image sensor; the model is parameterized by the view direction, the focal length $f$, and the image center $(c_x, c_y)$.
Perspective projection

Perspective projection in $P^3$ models the pinhole camera:
- scene geometry is the affine $\mathbb{R}^3$ space with coordinates $M = (X, Y, Z, 1)^T$
- camera focal point at the origin $O = (0, 0, 0, 1)^T$, camera viewing direction along $Z$ (the optical axis)
- image plane $(x, y)$ in $\Pi(P^2)$ aligned with $(X, Y)$ at $Z = 1$

A scene point $M$ projects onto the point $M_p$ on the plane surface:

$$M_p = \begin{pmatrix} X/Z \\ Y/Z \\ 1 \\ 1 \end{pmatrix}$$

Projective transformation

A projective transformation $T_p$ maps $M$ onto $M_p$ in $P^3$ space:

$$\rho M_p = T_p M: \qquad \rho \begin{pmatrix} x_p \\ y_p \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \rho = Z \;\text{(projective scale factor)}$$

The projective transformation linearizes the projection.
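A tiny sketch of the canonical projection (numpy; the sample point is an illustrative assumption): camera at the origin, image plane at $Z = 1$, and the division by $Z$ done by normalizing the homogeneous result:

```python
import numpy as np

# Canonical perspective projection in P^3: camera at the origin looking
# along Z, image plane at Z = 1. A scene point M = (X, Y, Z, 1)^T maps to
# (X/Z, Y/Z, 1); in homogeneous form this is a linear map.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])   # keeps rho = Z as the last entry

M = np.array([2.0, 4.0, 2.0, 1.0])     # scene point with Z = 2
m = P @ M                              # projective image point, scale rho = Z
m = m / m[-1]                          # normalize: divide by Z
print(m)                               # [1. 2. 1.] -> image point (1, 2)
```

Note the nonlinearity of the pinhole model lives entirely in the final normalization; the matrix itself is linear, which is what makes the machinery of the following slides possible.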
Perspective projection (continued)

Dimension reduction from $P^3$ into $P^2$ by projection onto $\Pi(P^2)$: the perspective projection $P_0$ from $P^3$ onto $P^2$ drops the last coordinate,

$$\rho m_p = D_p T_p M = P_0 M, \qquad \rho = Z, \qquad P_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Image plane and image sensor

A sensor with picture elements (pixels) is added onto the image plane:
- pixel coordinates $m = (x, y)^T$
- pixel scale $f = (f_x, f_y)^T$ (focal length)
- image center $c = (c_x, c_y)^T$

Image-sensor mapping: $m = K m_p$. Pixel coordinates are related to image coordinates by an affine transformation $K$ with five parameters:

$$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$

- the image center $c$ lies at the optical axis
- the distance $Z_p$ (focal length) and the pixel size determine the pixel resolution $f_x$, $f_y$
- the image skew $s$ models the angle between pixel rows and columns
Projection in general pose

For a camera with rotation $R$ and projection center $C$ in world coordinates, the pose transformations are

$$T_{cam} = \begin{pmatrix} R & C \\ 0^T & 1 \end{pmatrix}, \qquad T_{scene} = T_{cam}^{-1} = \begin{pmatrix} R^T & -R^T C \\ 0^T & 1 \end{pmatrix}$$

Projection matrix P

The camera projection matrix $P$ combines:
- the inverse affine transformation $T_{cam}^{-1}$ from general pose to the origin
- the perspective projection $P_0 = [\,I_{3\times3} \mid 0\,]$ to the image plane at $Z = 1$
- the affine mapping $K$ (sensor calibration) from image to sensor coordinates

$$\rho m = P M, \qquad P = K P_0 T_{scene} = K \begin{pmatrix} R^T & -R^T C \end{pmatrix}$$
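The factored form $P = K[R^T \mid -R^T C]$ can be sketched as follows (numpy; the calibration values, pose, and the helper name `projection_matrix` are illustrative assumptions):

```python
import numpy as np

def projection_matrix(K, R, C):
    """P = K [R^T | -R^T C]: maps world points to pixels for a camera
    with rotation R and projection center C."""
    Rt = R.T
    return K @ np.hstack([Rt, (-Rt @ C).reshape(3, 1)])

# Illustrative calibration: focal lengths 800, zero skew, center (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                    # camera axes aligned with the world
C = np.array([0.0, 0.0, -5.0])   # camera 5 units behind the world origin

P = projection_matrix(K, R, C)
M = np.array([0.0, 0.0, 0.0, 1.0])   # world origin, homogeneous
m = P @ M
m = m / m[-1]                        # divide by the projective scale rho
print(m[:2])                         # [320. 240.]: projects to the image center
```

A point on the optical axis lands on the image center, a quick sanity check that the rotation/translation ordering in $P$ is right.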
Two-view geometry: the F-matrix

Projection onto two views (camera 1 at the origin, camera 2 at center $C$):

$$\rho_1 m_1 = P_1 M = K_1 R_1^T [\,I \mid 0\,] M, \qquad \rho_2 m_2 = P_2 M = K_2 R_2^T [\,I \mid -C\,] M$$

Substituting the first equation into the second eliminates $M$:

$$\rho_2 m_2 = \rho_1 K_2 R_2^T R_1 K_1^{-1} m_1 - K_2 R_2^T C = \rho_1 H_\infty m_1 + e_2$$

with the infinite homography $H_\infty = K_2 R_2^T R_1 K_1^{-1}$ and the epipole $e_2 = -K_2 R_2^T C$ (the projection of camera 1's center into view 2).

The Fundamental Matrix F

The projective points $e_2$ and $H_\infty m_1$ define a plane in camera 2 (the epipolar plane $\Pi_e$). The epipolar plane intersects the image plane in a line (the epipolar line $u_e$), and the corresponding point $m_2$ lies on that line: $m_2^T u_e = 0$. Since $e_2$, $m_2$, and $H_\infty m_1$ are collinear:

$$m_2^T \left( [e_2]_\times H_\infty m_1 \right) = 0, \qquad [e]_\times = \begin{pmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{pmatrix}$$

Fundamental matrix: $F = [e_2]_\times H_\infty$ (a $3 \times 3$ matrix). Epipolar constraint: $m_2^T F m_1 = 0$.
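The epipolar constraint can be checked numerically on a toy pair of views (numpy; the camera configuration is an illustrative assumption — identity rotations, unit calibration, camera 2 shifted along $X$ — so $F$ reduces to $[t]_\times$):

```python
import numpy as np

def skew(e):
    """Cross-product matrix [e]_x so that skew(e) @ v == np.cross(e, v)."""
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

# Two simple calibrated views: K = I, R = I, camera 2 translated along X.
R = np.eye(3)
C2 = np.array([1.0, 0.0, 0.0])

# With K = I and R = I the infinite homography is I and the epipole is
# e2 = -C2, so F = [e2]_x.
e2 = -R.T @ C2
F = skew(e2)

# Project a scene point into both views and check the epipolar constraint.
M = np.array([0.5, 0.5, 2.0])
m1 = M / M[2]               # view 1: canonical camera [I | 0]
m2 = (M - C2) / M[2]        # view 2: shifted camera, same depth
print(abs(m2 @ F @ m1))     # ~0: m2^T F m1 = 0
```

The product $F m_1$ is the epipolar line in image 2; the constraint just says $m_2$ lies on it.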
Estimation of F from correspondences

Given a set of corresponding points, solve linearly for the 9 elements of $F$ in projective coordinates:
- since the epipolar constraint is homogeneous (defined only up to scale), only eight elements are independent
- since the operator $[e]_\times$, and hence $F$, has rank 2, $F$ has only 7 independent parameters (all epipolar lines intersect at $e$)
- each correspondence gives one collinearity constraint, so $F$ can be solved with a minimum of 7 correspondences
- for $N > 7$ correspondences, minimize the point-line distance subject to the rank-2 constraint:

$$\sum_{n=1}^{N} \left( m_{2,n}^T F m_{1,n} \right)^2 \to \min \quad \text{s.t.} \quad \det(F) = 0$$

The Essential Matrix E

$F$ is the most general constraint on an image pair. If the camera calibration matrix $K$ is known, more constraints are available. With normalized coordinates $\tilde m = K^{-1} m$:

$$(K_2 \tilde m_2)^T F (K_1 \tilde m_1) = \tilde m_2^T (K_2^T F K_1) \tilde m_1 = \tilde m_2^T E \tilde m_1 = 0, \qquad E = K_2^T F K_1 = [e]_\times R$$

$E$ holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix $R$, and 2 from the direction of translation $e$, the epipole.
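The linear estimation step can be sketched with the classical 8-point-style solve (numpy; unnormalized and noise-free, so purely illustrative — with real data one would first normalize the coordinates; the data and helper name are assumptions):

```python
import numpy as np

def estimate_F(m1, m2):
    """Linear estimate of F from N >= 8 correspondences (no coordinate
    normalization; illustrative only). m1, m2: Nx3 homogeneous points
    satisfying m2^T F m1 = 0."""
    # Each correspondence gives one row: vec(F) . vec(m2 m1^T) = 0.
    A = np.array([np.outer(p2, p1).ravel() for p1, p2 in zip(m1, m2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # least-squares null vector of A
    # Enforce the rank-2 constraint det(F) = 0:
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

# Synthetic data: canonical camera 1, camera 2 shifted by 1 along X.
M = np.random.default_rng(0).uniform(1.0, 4.0, (12, 3))   # 3D points
m1 = np.column_stack([M[:, 0] / M[:, 2], M[:, 1] / M[:, 2], np.ones(12)])
m2 = np.column_stack([(M[:, 0] - 1.0) / M[:, 2], M[:, 1] / M[:, 2], np.ones(12)])

F = estimate_F(m1, m2)
residuals = [abs(p2 @ F @ p1) for p1, p2 in zip(m1, m2)]
print(max(residuals))   # ~0: every correspondence satisfies the constraint
```

Projecting the singular values to $(s_1, s_2, 0)$ is the standard way to impose the rank-2 constraint after the linear solve.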
Estimation of P from E

From $E$ we can obtain a camera projection matrix pair. With the SVD $E = U\,\mathrm{diag}(1, 1, 0)\,V^T$, set $P_1 = [\,I_{3\times3} \mid 0_3\,]$; there are four choices for $P_2$:

$$P_2 = [\,U W V^T \mid +u_3\,] \;\text{or}\; [\,U W V^T \mid -u_3\,] \;\text{or}\; [\,U W^T V^T \mid +u_3\,] \;\text{or}\; [\,U W^T V^T \mid -u_3\,], \qquad W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

where $u_3$ is the third column of $U$. Only one of the four configurations places the 3D points in front of both cameras.

3D feature reconstruction
- a corresponding point pair $(m_1, m_2)$ is the projection of a 3D feature point $M$
- $M$ is reconstructed from $(m_1, m_2)$ by triangulation
- with noise, the viewing rays $l_1, l_2$ do not intersect exactly; choose $M$ at the minimum distance $d$ between the rays, or minimize the reprojection error:

$$d(m_1, P_1 M)^2 + d(m_2, P_2 M)^2 \to \min$$
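Triangulation itself is often done with a linear (DLT) solve; a sketch under a toy two-camera setup (numpy; cameras and point are illustrative assumptions, and real pipelines would follow this with the nonlinear reprojection-error refinement mentioned above):

```python
import numpy as np

def triangulate(P1, P2, m1, m2):
    """Linear (DLT) triangulation: find the homogeneous M minimizing the
    algebraic error of m1 ~ P1 M and m2 ~ P2 M."""
    # Each view contributes two rows: x * (P row 3) - (P row 1), etc.
    A = np.vstack([m1[0] * P1[2] - P1[0],
                   m1[1] * P1[2] - P1[1],
                   m2[0] * P2[2] - P2[0],
                   m2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]
    return M / M[3]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                  # [I | 0]
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted in X

M_true = np.array([0.5, 0.5, 2.0, 1.0])
m1 = P1 @ M_true; m1 = m1 / m1[2]
m2 = P2 @ M_true; m2 = m2 / m2[2]

M_rec = triangulate(P1, P2, m1, m2)
print(M_rec)   # recovers (0.5, 0.5, 2.0, 1.0)
```

With exact correspondences the null vector of $A$ is the true point up to scale, which the final division by $M_4$ removes.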
Multi-view tracking
- 2D match: image correspondence $(m_1, m_i)$
- 3D match: correspondence transfer $(m_i, M)$ via $P_1$
- 3D pose estimation of $P_i$ by minimizing $\| m_i - P_i M \|$

Minimize the global reprojection error over all $N$ views and $K$ points:

$$\sum_{i=1}^{N} \sum_{k=1}^{K} \left\| m_{k,i} - P_i M_k \right\|^2 \to \min$$

Correspondences: matching vs. tracking

Image-to-image correspondences are essential to 3D reconstruction.
- SIFT matcher: extract features independently in each image, then match by comparing descriptors [Lowe 2004] — handles potentially large differences between views
- KLT tracker: extract features in the first image and find the same features back in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994] — assumes small differences between frames
SIFT detector

SIFT is a scale- and image-plane-rotation-invariant feature detector and descriptor [Lowe 2004]. Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.
Difference of Gaussian for scale invariance

The difference of Gaussians with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].
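A minimal sketch of a difference-of-Gaussian response (pure numpy; the blur helper, the toy image, and the scale values $\sigma = 1.6$, $k = \sqrt{2}$ are illustrative assumptions): a small bright blob produces a strong DoG extremum at its center.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur in pure numpy (reflect boundary)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()
    pad = np.pad(img, radius, mode="reflect")
    # Convolve rows, then columns (separability of the Gaussian).
    tmp = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, tmp)

img = np.zeros((64, 64))
img[30:34, 30:34] = 1.0          # small bright blob

k = 2.0 ** 0.5                   # constant ratio of scales
sigma = 1.6
dog = gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)

# A bright blob gives a strong DoG minimum at its center (the finer
# scale keeps more of the peak, so the difference is most negative there).
y, x = np.unravel_index(np.argmin(dog), dog.shape)
print(y, x)                      # near the blob center (rows/cols 30..33)
```

Repeating this at successive scales and stacking the responses gives the scale space in which extrema are detected on the next slide.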
Keypoint detection and localization
- Blur the image at successive scales and subtract to build the difference-of-Gaussian scale space
- Detect maxima and minima of the difference of Gaussians in scale space
- Fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe 2002): a Taylor expansion around the sample point gives the offset of the extremum, using finite differences for the derivatives

Orientation normalization
- Histogram of local gradient directions (over $2\pi$) computed at the selected scale
- Assign the principal orientation at the peak of the smoothed histogram
- Each key point then specifies stable 2D coordinates (x, y, scale, orientation)
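The quadratic-fit step can be illustrated in one dimension (numpy-free; the sampled parabola and the helper name are assumptions): three neighboring samples determine the sub-pixel offset of the extremum via finite differences.

```python
def subpixel_offset(f_m1, f_0, f_p1):
    """Fit a quadratic through three neighboring samples and return the
    offset of its extremum from the center sample (as in Brown & Lowe
    2002), using finite differences for the derivatives."""
    deriv = (f_p1 - f_m1) / 2.0         # first derivative at the center
    second = f_p1 - 2.0 * f_0 + f_m1    # second derivative
    return -deriv / second              # offset of the extremum

# Samples of a parabola whose true peak sits at x = 0.3:
f = lambda x: 1.0 - (x - 0.3) ** 2
off = subpixel_offset(f(-1.0), f(0.0), f(1.0))
print(off)   # 0.3: the sub-sample peak position is recovered exactly
```

In the full 3D case the same fit runs jointly over $(x, y, \sigma)$, and samples whose offset exceeds 0.5 are re-localized at the neighboring sample.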
Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):
(a) 233×189 image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures
(courtesy of Lowe)

SIFT vector formation
- Thresholded image gradients are sampled over a 16×16 array of locations in scale space
- Create an array of orientation histograms: 8 orientations × a 4×4 histogram array = 128 dimensions
- (a 2×2 histogram array is shown as an example in the figure from Lowe)
Robust data selection: RANSAC

Example: estimation of a plane from point data. Repeat:
1. Select m samples from the {potential plane points}
2. Compute an n-parameter solution from the minimal set
3. Evaluate the solution on the {potential plane points}, giving the {inlier samples}
4. If it is the best solution so far, keep it
5. Stop once $1 - \left(1 - (\#\text{inliers} / |\{\text{potential plane points}\}|)^n\right)^{steps} > 0.99$

The result is the best solution together with its {inlier} and {outlier} sets.
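The loop above can be sketched for the plane-fitting example (numpy; thresholds, seeds, and the synthetic data are illustrative assumptions — this is a sketch of RANSAC with the adaptive stopping criterion, not a production implementation):

```python
import numpy as np

def ransac_plane(points, threshold=0.05, p_success=0.99, max_steps=1000):
    """RANSAC sketch for plane fitting: sample 3 points (the minimal set),
    fit a plane, count inliers, and stop once
    1 - (1 - w^3)^steps > p_success for the best inlier ratio w."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    steps, needed = 0, max_steps
    while steps < needed:
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-12:        # degenerate (collinear) sample
            steps += 1
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((points - sample[0]) @ n)   # point-plane distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            w = inliers.mean()               # inlier ratio of the best model
            with np.errstate(divide="ignore"):
                needed = min(max_steps,
                             int(np.ceil(np.log(1.0 - p_success) /
                                         np.log(1.0 - w ** 3))))
        steps += 1
    return best_inliers

# 80 points on the plane z = 0 plus 20 gross outliers lifted off the plane:
rng = np.random.default_rng(2)
pts = np.column_stack([rng.uniform(-1, 1, (100, 2)), np.zeros(100)])
pts[80:, 2] = rng.uniform(1.0, 2.0, 20)
inliers = ransac_plane(pts)
print(inliers.sum())   # the 80 planar points are recovered as inliers
```

Note how `needed` shrinks as soon as a model with a high inlier ratio is found: with $w = 0.8$ only a handful of further samples are required for 99% confidence.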
RANSAC: evaluate hypotheses

Each hypothesis is scored with a robust cost function of the residual $\varepsilon$ — quadratic for small residuals and saturating beyond a threshold $\lambda$ (a truncated quadratic), so that gross outliers cannot dominate the score:

$$c(\varepsilon) = \begin{cases} \varepsilon^2 & \text{if } \varepsilon^2 < \lambda^2 \\ \lambda^2 & \text{else} \end{cases}$$

The residuals $\varepsilon_1, \ldots, \varepsilon_n$ of all potential points are evaluated against every plane hypothesis, and the hypothesis with the lowest total cost wins.

Robust pose estimation (calibrated camera)

Bootstrap:
- from {2D-2D correspondences}, estimate E with RANSAC (5-point algorithm)
- decompose E into P1, P2 and refine P1, P2 nonlinearly with all inliers
- triangulate points to obtain known 3D points

Tracking:
- from {2D-3D correspondences} to the known 3D points, estimate P with RANSAC (3-point algorithm)
- refine P nonlinearly with all inliers
References

[Lowe 2004] David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," IJCV, 60(2), 2004, pp. 91-110.
[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi, "Good Features to Track," IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[Baker & Matthews 2004] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework," International Journal of Computer Vision, 56(3):221-255, March 2004.
[Fischler & Bolles 1981] M.A. Fischler and R.C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," CACM, 24(6), June 1981.
[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors," CVPR 2003.
[Lindeberg 1998] T. Lindeberg, "Feature Detection with Automatic Scale Selection," International Journal of Computer Vision, 30(2), 1998.
[Hartley 2004] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd edition, Cambridge University Press, 2004.