From 2D to 3D: Monocular Vision with Applications to Robotics/AR

Motivation How many sensors do we really need?

Motivation What is the limit of what can be inferred from a single embodied (moving) camera frame?

Aim: AR with a hand-held camera, where visual tracking provides registration, without a prior model of the world. Challenges: speed, accuracy, robustness, and interaction with the real world.

Existing attempts: SLAM (Simultaneous Localization and Mapping). Well-established in robotics (using a rich array of sensors); demonstrated with a single hand-held camera by Davison (2003).

Model-based tracking vs SLAM

Model-based tracking vs SLAM: model-based tracking is more robust and more accurate. Why? Is SLAM fundamentally harder?

Pinhole camera model: (X, Y, Z) -> (fX/Z, fY/Z). In homogeneous coordinates:

[fX]   [f 0 0 0] [X]
[fY] = [0 f 0 0] [Y]
[ Z]   [0 0 1 0] [Z]
                 [1]

i.e. x = PX

Pinhole camera model with principal point (p_x, p_y): (X, Y, Z) -> (fX/Z + p_x, fY/Z + p_y). In homogeneous coordinates:

[fX + Z p_x]   [f 0 p_x 0] [X]
[fY + Z p_y] = [0 f p_y 0] [Y]
[    Z     ]   [0 0  1  0] [Z]
                           [1]

with calibration matrix

    [f 0 p_x]
K = [0 f p_y]
    [0 0  1 ]

and P = K [I | 0]
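The projection x = K [I | 0] X above can be sketched in a few lines of plain Python; the focal length and principal point below are made-up illustrative values, not ones from the talk.

```python
# Minimal sketch of pinhole projection with x = K [I | 0] X
# (assumed intrinsics: f = 500, principal point (320, 240)).

def project(K, X):
    """Project a homogeneous 3D point X = (X, Y, Z, 1) with P = K [I | 0]."""
    # P = K [I | 0]: the 4th coordinate is dropped, then K is applied
    x = [K[r][0] * X[0] + K[r][1] * X[1] + K[r][2] * X[2] for r in range(3)]
    # De-homogenize: divide by the third coordinate (depth Z)
    return (x[0] / x[2], x[1] / x[2])

f, px, py = 500.0, 320.0, 240.0
K = [[f, 0.0, px],
     [0.0, f, py],
     [0.0, 0.0, 1.0]]

u, v = project(K, (0.2, -0.1, 2.0, 1.0))
# u = f*X/Z + px = 500*0.2/2 + 320 = 370, v = 500*(-0.1)/2 + 240 = 215
```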

Camera rotation and translation. In non-homogeneous coordinates: X~_cam = R (X~ - C~), and x = K [I | 0] X_cam, where

X_cam = [R  -RC~] X
        [0    1 ]

so x = K [R | -RC~] X, i.e. P = K [R | t] with t = -RC~. Note: C is the null space of the camera projection matrix (PC = 0)
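The closing note, that C is the null space of P, can be checked numerically. A minimal sketch with an assumed pose (a 30-degree rotation about z and centre C~ = (1, 2, 3)) and illustrative intrinsics:

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Assumed example pose: rotation about the z-axis by 30 degrees,
# camera centre C~ = (1, 2, 3); illustrative intrinsics K.
th = math.radians(30)
R = [[math.cos(th), -math.sin(th), 0.0],
     [math.sin(th),  math.cos(th), 0.0],
     [0.0, 0.0, 1.0]]
C = [1.0, 2.0, 3.0]
t = [-x for x in matvec(R, C)]                    # t = -R C~

K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
P = matmul(K, [R[i] + [t[i]] for i in range(3)])  # P = K [R | t], 3x4

# The homogeneous camera centre (C~, 1) is the null space of P: P C = 0
residual = matvec(P, C + [1.0])                   # all entries ~0
```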

Triangulation: given projections x1, x2 of a 3D point in two or more images with known camera matrices (centers O1, O2), find the coordinates of the point X.
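The slide does not name a triangulation method; one simple option is the midpoint method, which back-projects each image point to a ray and takes the point closest to both rays. A sketch with a toy two-camera setup:

```python
# Midpoint triangulation: least-squares intersection of two viewing rays
# (toy geometry, not taken from the slides).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(O1, d1, O2, d2):
    """Closest point to rays O1 + s*d1 and O2 + t*d2: solve the 2x2
    normal equations, then average the two closest points."""
    r = [q - p for p, q in zip(O1, O2)]
    a11, a12, a22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    b1, b2 = dot(d1, r), dot(d2, r)
    det = a11 * a22 - a12 * a12        # zero iff the rays are parallel
    s = (a22 * b1 - a12 * b2) / det
    t = (a12 * b1 - a11 * b2) / det
    p1 = [p + s * d for p, d in zip(O1, d1)]
    p2 = [p + t * d for p, d in zip(O2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

# Two cameras at O1 and O2 observing the point (1, 1, 5); the viewing rays
# are taken directly from the geometry, so they intersect exactly there.
X = triangulate_midpoint([0.0, 0.0, 0.0], [1.0, 1.0, 5.0],
                         [2.0, 0.0, 0.0], [-1.0, 1.0, 5.0])
```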

Structure from Motion (SfM). Given: m images of n fixed 3D points, x_ij = P_i X_j, i = 1, ..., m, j = 1, ..., n. Problem: estimate the m projection matrices P_i and n 3D points X_j from the mn correspondences x_ij.

SfM ambiguity: if we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor 1/k, the projections of the scene points in the image remain exactly the same: x = PX = (1/k P)(k X). It is impossible to recover the absolute scale of the scene!
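The ambiguity can be checked numerically: scaling the 3D points and the camera translation by the same factor k leaves every image point unchanged. All numbers below are illustrative:

```python
# Toy check of the SfM scale ambiguity.

def project(P, X):
    x = [sum(p * c for p, c in zip(row, X)) for row in P]
    return (x[0] / x[2], x[1] / x[2])

K_Rt = [[500.0, 0.0, 320.0, 100.0],   # an assumed P = K [R | t] with R = I
        [0.0, 500.0, 240.0, -50.0],
        [0.0, 0.0, 1.0, 4.0]]
X = [0.2, -0.1, 2.0, 1.0]

k = 3.0
P_k = [row[:3] + [k * row[3]] for row in K_Rt]   # scale the translation by k
X_k = [k * c for c in X[:3]] + [1.0]             # scale the 3D point by k

x_img = project(K_Rt, X)
x_img_k = project(P_k, X_k)    # identical image point
```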

Structure from Motion (SfM). Given: m images of n fixed 3D points, x_ij = P_i X_j, i = 1, ..., m, j = 1, ..., n. Problem: estimate the m projection matrices P_i and n 3D points X_j from the mn correspondences x_ij. With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X -> QX, P -> PQ^-1. We can solve for structure and motion when 2mn >= 11m + 3n - 15. For two cameras, at least 7 points are needed.

Bundle Adjustment: a non-linear method for refining structure and motion (Levenberg-Marquardt), minimizing the re-projection error

E(P, X) = sum_{i=1..m} sum_{j=1..n} D(x_ij, P_i X_j)^2
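The quantity being minimized can be sketched directly; below, a hypothetical `reprojection_error` helper evaluates E(P, X) on toy data with one camera and two points:

```python
# Sketch of the re-projection error E(P, X) that bundle adjustment minimizes.

def project(P, X):
    x = [sum(p * c for p, c in zip(row, X)) for row in P]
    return (x[0] / x[2], x[1] / x[2])

def reprojection_error(cameras, points, observations):
    """E = sum over observed (i, j) of squared distance D(x_ij, P_i X_j)^2;
    observations maps (camera index, point index) -> observed 2D point."""
    E = 0.0
    for (i, j), (u_obs, v_obs) in observations.items():
        u, v = project(cameras[i], points[j])
        E += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return E

# Toy setup: one camera, two points; observations taken from the exact
# projections, then one perturbed by a pixel.
P0 = [[500.0, 0.0, 320.0, 0.0],
      [0.0, 500.0, 240.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
pts = [[0.0, 0.0, 2.0, 1.0], [1.0, 1.0, 4.0, 1.0]]
obs = {(0, 0): project(P0, pts[0]),
       (0, 1): project(P0, pts[1])}
E0 = reprojection_error([P0], pts, obs)   # 0.0 for perfect observations
obs[(0, 1)] = (obs[(0, 1)][0] + 1.0, obs[(0, 1)][1])
E1 = reprojection_error([P0], pts, obs)   # 1.0 after a 1-pixel perturbation
```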

Self-calibration Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form Pi = K [Ri ti] Can use constraints on the form of the calibration matrix: zero skew

Why is this cool? http://www.youtube.com/watch?v=sqegero5bfo

Why is this still cool? http://www.youtube.com/watch?v=p16frkjlvi0

The SLAM Problem: Simultaneous Localization And Mapping. A robot is exploring an unknown, static environment. Given: the robot's controls and observations of nearby features. Estimate: a map of the features and the path of the robot.

Structure of the landmark-based SLAM Problem

SLAM a hard problem? The robot path and map are both unknown, and robot path errors correlate errors in the map.

SLAM a hard problem? Robot pose uncertainty: in the real world, the mapping between observations and landmarks is unknown, and picking wrong data associations can have catastrophic consequences; pose error correlates data associations.

SLAM. Full SLAM estimates the entire path and map: p(x_1:t, m | z_1:t, u_1:t). Online SLAM estimates only the most recent pose and map: p(x_t, m | z_1:t, u_1:t) = ∫∫...∫ p(x_1:t, m | z_1:t, u_1:t) dx_1 dx_2 ... dx_t-1. The integrations are typically done one at a time.
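The relationship between the two posteriors is just marginalization over the past poses. A toy discrete illustration (the joint distribution below is made up):

```python
# Marginalizing a made-up full-SLAM joint p(x1, x2, m) over the past pose x1
# yields the online-SLAM quantity p(x2, m).

joint = {  # keys are (x1, x2, m); values sum to 1
    (0, 0, 'A'): 0.10, (0, 1, 'A'): 0.20,
    (1, 0, 'A'): 0.05, (1, 1, 'A'): 0.15,
    (0, 0, 'B'): 0.10, (0, 1, 'B'): 0.10,
    (1, 0, 'B'): 0.05, (1, 1, 'B'): 0.25,
}

online = {}  # p(x2, m) = sum over x1 of p(x1, x2, m)
for (x1, x2, m), p in joint.items():
    online[(x2, m)] = online.get((x2, m), 0.0) + p
# e.g. online[(1, 'A')] = 0.20 + 0.15 = 0.35
```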

Graphical Model of Full SLAM: p(x_1:t, m | z_1:t, u_1:t)

Graphical Model of Online SLAM: p(x_t, m | z_1:t, u_1:t) = ∫∫...∫ p(x_1:t, m | z_1:t, u_1:t) dx_1 dx_2 ... dx_t-1

Scan Matching: maximize the likelihood of the t-th pose and map relative to the (t-1)-th pose and map:

x̂_t = argmax_{x_t} { p(z_t | x_t, m̂^[t-1]) · p(x_t | u_t-1, x̂_t-1) }

Here z_t is the current measurement, m̂^[t-1] is the map constructed so far, and the second factor is the robot motion model. Then calculate the map according to mapping with known poses, based on the poses and observations.
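A one-dimensional sketch of this maximization, with Gaussian stand-ins for the measurement likelihood and the motion model (all numbers are toy values):

```python
import math

# 1-D scan-matching sketch: pick the pose x that maximizes
# p(z | x) * p(x | u, x_prev), both modelled as Gaussians here.

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

x_prev, u = 1.0, 0.5            # previous pose estimate and odometry step
z = 1.6                         # a range-style measurement of the new pose

# Grid search over candidate poses around the motion prediction x_prev + u
candidates = [x_prev + u + d * 0.01 for d in range(-50, 51)]
x_hat = max(candidates,
            key=lambda x: gauss(z, x, 0.2) * gauss(x, x_prev + u, 0.1))
# The product of the two Gaussians peaks at the precision-weighted mean 1.52
```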

SLAM approach

PTAM approach

Tracking & Mapping threads

Mapping thread

Stereo Initialization: the five-point pose algorithm (Stewenius et al. '06). Requires a pair of frames and feature correspondences; provides an initial (sparse) 3D point cloud.

Wait for new keyframe. Keyframes are only added if there is sufficient baseline to the other keyframes and tracking quality is good. When a keyframe is added: the mapping thread stops whatever it is doing; all points in the map are measured in the keyframe; new map points are found and added to the map.

Add new map points. Want as many map points as possible. Check all maximal FAST corners in the keyframe: check the Shi-Tomasi score, check if already in the map, do an epipolar search in a neighboring keyframe, then triangulate matches and add them to the map. Repeat at four image pyramid levels.
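The Shi-Tomasi score mentioned above is the smaller eigenvalue of the 2x2 gradient structure tensor over a patch. A hedged sketch on a synthetic 5x5 patch (a real system would work on pyramid levels of the keyframe):

```python
import math

# Shi-Tomasi corner score: smaller eigenvalue of the structure tensor
# [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] over a small window.

def shi_tomasi_score(img, x, y, half=1):
    a = b = c = 0.0
    for v in range(y - half, y + half + 1):
        for u in range(x - half, x + half + 1):
            ix = (img[v][u + 1] - img[v][u - 1]) / 2.0   # central differences
            iy = (img[v + 1][u] - img[v - 1][u]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    # Smaller eigenvalue of the symmetric 2x2 tensor [[a, b], [b, c]]
    return 0.5 * ((a + c) - math.sqrt((a - c) ** 2 + 4.0 * b * b))

# Synthetic patch with a corner: bright quadrant in a dark background
img = [[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 9, 9, 9],
       [0, 0, 9, 9, 9],
       [0, 0, 9, 9, 9]]
corner_score = shi_tomasi_score(img, 2, 2)    # high: both gradients strong
flat = [[5.0] * 5 for _ in range(5)]
flat_score = shi_tomasi_score(flat, 2, 2)     # 0.0: no gradients at all
```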

Optimize map. Use a batch SfM method: bundle adjustment. Adjusts map point positions and keyframe poses; minimizes the re-projection error of all points in all keyframes (or use only the last N keyframes). Cubic complexity in keyframes, linear in map points. Compatible with M-estimators (we use Tukey).

Map maintenance. When the camera is not exploring, the mapping thread has idle time; use this to improve the map. Data association in bundle adjustment is reversible: re-attempt outlier measurements, and try to measure new map features in all old keyframes.

Tracking thread

Pre-process frame: make mono and RGB versions of the image; make 4 pyramid levels; detect FAST corners.

Project Points: use the motion model to update the camera pose; project all map points into the image to see which are visible, and at what pyramid level; choose a subset to measure (~50 biggest features for the coarse stage, 1000 randomly selected for the fine stage).

Measure Points. Generate an 8x8 matching template (warped from the source keyframe). Search a fixed radius around the projected position using zero-mean SSD, only at FAST corner points. Up to 10 inverse compositional iterations for subpixel position (for some patches). Typically finds 60-70% of patches.
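Zero-mean SSD discards the mean of each patch before comparing, making the match invariant to a brightness offset. A toy sketch with 1-D lists standing in for 8x8 patches:

```python
# Zero-mean sum of squared differences between two patches.

def zero_mean_ssd(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum(((x - ma) - (y - mb)) ** 2 for x, y in zip(a, b))

template = [10, 20, 30, 40]
bright = [110, 120, 130, 140]   # same pattern under a +100 brightness offset
other = [40, 30, 20, 10]        # a genuinely different pattern

s_bright = zero_mean_ssd(template, bright)   # 0.0: the offset is removed
s_other = zero_mean_ssd(template, other)     # large: patterns differ
```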

Update camera pose: a 6-DOF problem, solved with 10 iterations of a Tukey M-estimator, minimizing a robust objective function of the re-projection error vectors e_j.

Bundle adjustment: global bundle adjustment vs local bundle adjustment. For local bundle adjustment: X = the newest 5 keyframes in the keyframe chain; Z = all of the map points visible in any of these keyframes; Y = any keyframe for which a measurement of any point in Z has been made. That is, local bundle adjustment optimizes the pose of the most recent keyframe and its closest neighbors, and all of the map points seen by these, using all of the measurements ever made of these points.

Video http://www.youtube.com/watch?v=y9hmn6bd-v8 http://www.youtube.com/watch?v=pbi5hwitbx4

Capabilities

Capabilities: bundle-adjusted point cloud with PTAM; multi-scale compactly supported basis functions.

Video http://www.youtube.com/watch?v=czisk7omanw

RGB-D Sensor. Principle: structured light (IR projector + IR camera), plus an RGB camera. Produces dense depth images.

Kinect-based mapping

System Overview Frame-to-frame alignment Global optimization (SBA for loop closure)

Feature matching

RANSAC: feature correspondences are established and outliers are robustly removed; the homography (transformation) between the two keyframes can now be estimated.
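A hedged sketch of the RANSAC loop described above, fitting a simple 1-D line model instead of a full homography (data and thresholds are made up):

```python
import random

# RANSAC sketch: repeatedly fit a minimal model to a random sample and keep
# the model with the largest inlier set. Here the model is a line y = a*x + b.

def fit_line(p, q):
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def ransac_line(points, iters=200, thresh=0.5):
    random.seed(0)  # deterministic for the example
    best_inliers = []
    for _ in range(iters):
        p, q = random.sample(points, 2)
        if p[0] == q[0]:
            continue                     # degenerate (vertical) sample
        a, b = fit_line(p, q)
        inliers = [(x, y) for x, y in points if abs(a * x + b - y) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# 10 points on y = 2x + 1 plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -5)]
inliers = ransac_line(pts)   # the two outliers are rejected
```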

Global Optimization (RGBD-ICP)

Benefits: visual and depth information are used jointly for a real-time mapping application; reconstructs a dense map of the environment. Avoids dense stereo for every pair of keyframes by optimizing over a sparse set of feature points, which results in dramatic speed improvements and allows other valuable algorithms to run simultaneously (e.g. navigation, obstacle avoidance, scene understanding).

Video http://www.cs.washington.edu/ai/mobile_robotics/projects/rgbd-3d-mapping/

Kinect + Real-time reconstruction

Video http://research.microsoft.com/apps/video/dl.aspx?id=152815

Conclusion. So much information is available from a single camera, yet we have yet to truly understand what can be inferred from one. There have been several exciting technologies in the recent past. This is a software problem, not a hardware limitation: monocular vision can be sufficient for a lot of use cases.

Thanks!