
Midterm Examination
CS 766: Computer Vision
November 9, 2006

LAST NAME: SOLUTION        FIRST NAME:

    Problem   Score   Max Score
    1                  10
    2                  14
    3                  12
    4                  13
    5                  16
    6                  15
    Total              80

1. [10] Camera Projection

(a) [2] True or False: The orthographic projection of two parallel lines in the world must be parallel in the image.

True.

(b) [3] Under what conditions will a line viewed with a pinhole camera have its vanishing point at infinity?

The line lies in a plane parallel to the image plane.

(c) [5] A scene point at coordinates (400, 600, 1200) is perspectively projected into an image at coordinates (24, 36), where both coordinates are given in millimeters in the camera coordinate frame and the camera's principal point is at coordinates (0, 0, f) (i.e., u_0 = 0 and v_0 = 0). Assuming the aspect ratio of the pixels in the camera is 1, what is the focal length of the camera? (Note: the aspect ratio is defined as the ratio between the width and the height of a pixel, i.e., k_u / k_v.)

u = fX/Z, so f = uZ/X = 24 * 1200 / 400 = 72 mm.
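
As a sanity check on part (c), here is a minimal Python sketch of the perspective-projection relation used in the solution; the function name and the consistency check are illustrative additions, not part of the exam.

    # Perspective projection with the principal point at the origin: u = f*X/Z, v = f*Y/Z.
    def focal_from_projection(X, Y, Z, u, v):
        """Recover the focal length (same units as u, v) from one scene point and its image,
        assuming unit aspect ratio so both image coordinates give the same answer."""
        f_from_u = u * Z / X
        f_from_v = v * Z / Y
        assert abs(f_from_u - f_from_v) < 1e-9, "measurements inconsistent with unit aspect ratio"
        return f_from_u

    print(focal_from_projection(400, 600, 1200, 24, 36))   # prints 72.0 (mm)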

2. [14] Camera Calibration

(a) [5] Show how the projection of a point in a planar scene at world coordinates (X, Y) to pixel coordinates (u, v) in an image plane can be represented using a planar affine camera model.

    [ su ]   [ p11  p12  p13 ] [ X ]
    [ sv ] = [ p21  p22  p23 ] [ Y ]
    [ s  ]   [  0    0   p33 ] [ 1 ]

or, equivalently, since the overall scale of the above matrix does not matter,

    [ u ]   [ p11  p12  p13 ] [ X ]
    [ v ] = [ p21  p22  p23 ] [ Y ]
                              [ 1 ]

(b) [3] Under what conditions is the use of an affine transformation appropriate when viewing a planar scene?

When the field of view of the scene plane is such that all visible points on the world plane are at approximately the same depth, i.e., the depth variation across the plane is small compared to the distance of the camera from the plane.

(c) [3] How many degrees of freedom are there to solve for in (a), and what is the minimum number of calibration points needed to estimate the calibration parameters?

There are 6 degrees of freedom, since the overall scale does not matter. Each calibration point provides two equations (one for u and one for v), so we need at least 3 points.

(d) [3] What effects can a planar affine transformation have on parallel lines?

Planar affine transformations preserve parallelism: parallel lines remain parallel.
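
The six parameters in (a) can be estimated by linear least squares. The sketch below (illustrative names, assuming NumPy) fits the 2 x 3 affine camera matrix from N >= 3 world-to-image correspondences, consistent with part (c).

    import numpy as np

    def fit_planar_affine(world_xy, image_uv):
        """Fit u = p11*X + p12*Y + p13 and v = p21*X + p22*Y + p23 from N >= 3 correspondences."""
        W = np.asarray(world_xy, dtype=float)       # (N, 2) planar world coordinates (X, Y)
        UV = np.asarray(image_uv, dtype=float)      # (N, 2) pixel coordinates (u, v)
        A = np.hstack([W, np.ones((len(W), 1))])    # (N, 3) rows of [X, Y, 1]
        # Solve the two rows of the affine matrix independently in a least-squares sense.
        row_u, *_ = np.linalg.lstsq(A, UV[:, 0], rcond=None)
        row_v, *_ = np.linalg.lstsq(A, UV[:, 1], rcond=None)
        return np.vstack([row_u, row_v])            # 2 x 3 planar affine camera matrix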

3. [12] Edge Detection

Compare the Canny edge detector and the Laplacian-of-Gaussian (LoG) edge detector for each of the following questions.

(a) [3] Which of these operators is/are isotropic and which is/are non-isotropic?

LoG is isotropic; Canny is non-isotropic.

(b) [3] Describe each operator in terms of the order of the derivatives that it computes.

LoG computes 2nd derivatives; Canny computes 1st derivatives.

(c) [3] What parameters must be defined by the user for each operator?

LoG requires σ, defining the scale of the Gaussian blurring. Canny requires σ plus two thresholds for hysteresis.

(d) [3] Which detector is more likely to produce long, thin contours? Briefly explain.

Canny, because non-maximum suppression thins edges and hysteresis thresholding can fill in gaps of weak edge pixels along a contour.
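
A minimal sketch of the two detectors and the user-set parameters from (c), assuming SciPy and scikit-image are available; the test image, σ, and thresholds are illustrative choices, not part of the exam.

    import numpy as np
    from scipy import ndimage
    from skimage import data, feature

    img = data.camera().astype(float)

    # LoG: a single user parameter, sigma; edges are the zero-crossings of the filter response.
    log_response = ndimage.gaussian_laplace(img, sigma=2.0)
    zero_crossings = np.sign(log_response[:, :-1]) != np.sign(log_response[:, 1:])  # crude horizontal test

    # Canny: sigma plus the low and high hysteresis thresholds.
    canny_edges = feature.canny(img, sigma=2.0, low_threshold=5, high_threshold=20)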

4. [13] Feature Detection and Description

(a) [5] We want a method for corner detection for use with 3D images, i.e., there is an intensity value for each (x, y, z) voxel. Describe a generalization of either the Harris corner detector or the Tomasi-Kanade corner detector by giving the main steps of an algorithm, including a test to decide when a voxel is a corner point.

As in the 2D case, we can estimate the average change in intensity for a shift (u, v, w) around a given voxel using a bilinear approximation

    E(u, v, w) = [u  v  w] M [u  v  w]^T

where M is a 3 x 3 matrix computed from the partial derivatives of the intensity in the 3 directions. Compute the 3 eigenvalues λ1, λ2, and λ3 of M, which specify an ellipsoid at the voxel measuring the variation in intensity in 3 orthogonal directions. Using the test in the Tomasi-Kanade detector, mark the voxel as a corner point if the smallest eigenvalue, λ3, is greater than a threshold. Using the test in the Harris operator, mark the voxel as a corner point if λ1 λ2 λ3 - k(λ1 + λ2 + λ3)^3 is greater than a threshold.

(b) [8] The SIFT descriptor is a popular method for describing selected feature points based on local neighborhood properties so that they can be matched reliably across images. Assuming feature points have been previously detected using the SIFT feature detector, (i) briefly describe the main steps of creating the SIFT feature descriptor at a given feature point, and (ii) name three (3) scene or image changes that the SIFT descriptor is invariant to (i.e., relatively insensitive to).

(i) At each point where a SIFT "keypoint" is detected, the descriptor is constructed by computing a set of 16 orientation histograms over the 4 x 4 windows within a 16 x 16 pixel neighborhood centered on the keypoint. At each pixel in the neighborhood, the gradient direction (quantized to 8 directions) is computed, using a Gaussian with σ equal to 0.5 times the scale of the keypoint. The orientation histograms are computed relative to the orientation at the keypoint, with entries weighted by the gradient magnitude of each pixel in the window. This results in a vector of 128 (= 16 x 8) feature values in the SIFT descriptor. (The values in the vector are also normalized to enhance invariance to illumination changes.)

(ii) Because the SIFT descriptor is based on edge orientation histograms, which are robust to contrast and brightness changes and are computed at the keypoint's detected scale and orientation, the descriptor is invariant to translation, rotation, scale, and illumination change (both intensity change by adding a constant and intensity change by contrast stretching). It is not invariant to significant viewpoint changes.
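
A minimal NumPy/SciPy sketch of the Tomasi-Kanade style 3D test from (a): build the 3 x 3 matrix M of smoothed gradient products at every voxel and threshold its smallest eigenvalue. The function name, σ, and threshold are illustrative assumptions, not part of the exam solution.

    import numpy as np
    from scipy import ndimage

    def corner_voxels(volume, sigma=1.0, threshold=1e-3):
        """Boolean mask of voxels whose smallest structure-tensor eigenvalue exceeds the threshold."""
        vol = volume.astype(float)
        # Partial derivatives of intensity along x, y, z via derivative-of-Gaussian filtering.
        grads = [ndimage.gaussian_filter1d(vol, sigma, axis=a, order=1) for a in range(3)]
        # M: smoothed products of the partial derivatives, one 3 x 3 matrix per voxel.
        M = np.empty(vol.shape + (3, 3))
        for i in range(3):
            for j in range(3):
                M[..., i, j] = ndimage.gaussian_filter(grads[i] * grads[j], sigma)
        lambdas = np.linalg.eigvalsh(M)        # eigenvalues in ascending order per voxel
        return lambdas[..., 0] > threshold     # Tomasi-Kanade test: smallest eigenvalue large enough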

5. [16] Hough Transform and RANSAC

Assume that, after running your favorite stereo algorithm, you have produced a dense depth map such that for each pixel in the input image you have its associated scene point's (X, Y, Z) coordinates in the camera coordinate frame. Assume the image is of a scene that contains a single dominant plane (e.g., the front wall of a building) at unknown orientation, plus smaller numbers of other scene points (e.g., from trees, poles, and a street) that are not part of this plane. As you know, the plane equation is given by aX + bY + cZ + d = 0.

(a) [8] Define a Hough transform based algorithm for detecting the orientation of the plane in the scene. That is, define the dimensions of your Hough space, a procedure for mapping the scene points (i.e., the (X, Y, Z) coordinates for each pixel) into this space, and how the plane's orientation is determined.

Assuming the plane is not allowed to pass through the camera coordinate frame origin, we can divide by d, resulting in three parameters, A = a/d, B = b/d, and C = c/d, that define a plane. Therefore the Hough parameter space is three-dimensional, corresponding to possible values of A, B, and C. Assuming we can bound the range of possible values of these three parameters, we then take each pixel's (X, Y, Z) coordinates and increment all cells H(p, q, r) in Hough space that satisfy pX + qY + rZ + 1 = 0. The point (or small region) in H that has the maximum number of votes determines the desired scene plane.

(b) [8] Describe how the RANSAC algorithm could be used to detect the orientation of the plane in the scene from the scene points.

Step 1: Randomly pick 3 pixels in the image and, using their (X, Y, Z) coordinates, compute the plane defined by these points.

Step 2: For each of the remaining pixels in the image, compute the distance from its (X, Y, Z) position to the computed plane and, if it is within a threshold distance, increment a counter of the number of points (the "inliers") that agree with the hypothesized plane.

Step 3: Repeat Steps 1 and 2 many times, and then select the triple of points that has the largest count associated with it.

Step 4: Using the triple of points selected in Step 3 plus all of the other inlier points that contributed to its count, recompute the best planar fit to all of these points.
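
A minimal NumPy sketch of the RANSAC procedure in (b); the function name, iteration count, and distance threshold are illustrative, and points is assumed to be an N x 3 array of (X, Y, Z) coordinates.

    import numpy as np

    def ransac_plane(points, n_iters=500, dist_thresh=0.05, seed=0):
        """Return (normal, d, inlier_mask) for the dominant plane n . p + d = 0."""
        rng = np.random.default_rng(seed)
        best_inliers = None
        for _ in range(n_iters):
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]   # Step 1: random triple
            n = np.cross(p1 - p0, p2 - p0)                                   # plane normal from the sample
            if np.linalg.norm(n) < 1e-12:
                continue                                                     # degenerate (collinear) sample
            n = n / np.linalg.norm(n)
            d = -n @ p0
            inliers = np.abs(points @ n + d) < dist_thresh                   # Step 2: count points near the plane
            if best_inliers is None or inliers.sum() > best_inliers.sum():   # Step 3: keep the best hypothesis
                best_inliers = inliers
        # Step 4: refit a least-squares plane to all inliers (smallest singular vector of centered points).
        P = points[best_inliers]
        centroid = P.mean(axis=0)
        _, _, Vt = np.linalg.svd(P - centroid)
        n = Vt[-1]
        return n, -n @ centroid, best_inliers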

6. [15] Epipolar Geometry and Stereo

(a) [5] Consider the following top view of a stereo rig:

[Figure: top view of a stereo rig, showing the left image plane and left optical center and the right image plane and right optical center, each image plane at focal length f from its center, with the right camera rotated by 45 degrees.]

Draw the front views of the two 2D images and show (and clearly label) the approximate positions of the epipoles and epipolar lines for this configuration of cameras.

(b) [3] Given a conjugate pair of points, at pixel coordinates p = (x_l, y_l) in the left image and q = (x_r, y_r) in the right image of a stereo pair, give the equation that describes the relationship between these points when neither the intrinsic nor extrinsic parameters of the cameras are known. Also specify how many conjugate pairs of points are needed to solve for all of the unknowns in your equation.

q^T F p = 0, where F is the 3 x 3 fundamental matrix, containing 8 degrees of freedom. Each correspondence generates one linear constraint on the elements of F; hence at least 8 conjugate pairs (and no noise) are needed to compute it using a linear algorithm such as the 8-point algorithm. (Note that by adding other constraints on the form of F, nonlinear techniques can be used to estimate F from 7 point correspondences.)

(c) [3] What is the disparity gradient constraint that is often used in solving the stereo correspondence problem?

Nearby image points should have correspondences that have similar disparities.

(d) [4] Yes or No: Can scene depth be recovered from a stereo pair of images taken under each of the following circumstances?

(i) [2] Two images produced by weak perspective projection. No.

(ii) [2] Two images taken by a single perspective projection camera that has been rotated about its optical center. No.
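
A minimal NumPy sketch of the linear 8-point algorithm mentioned in (b); pts_l and pts_r are hypothetical N x 2 arrays (N >= 8) of corresponding pixel coordinates, and in practice the coordinates should first be normalized for numerical stability.

    import numpy as np

    def eight_point_fundamental(pts_l, pts_r):
        """Estimate F such that [x_r, y_r, 1] @ F @ [x_l, y_l, 1] = 0 for every correspondence."""
        xl, yl = pts_l[:, 0], pts_l[:, 1]
        xr, yr = pts_r[:, 0], pts_r[:, 1]
        # Each conjugate pair contributes one row of the homogeneous linear system A f = 0.
        A = np.column_stack([xr * xl, xr * yl, xr,
                             yr * xl, yr * yl, yr,
                             xl, yl, np.ones_like(xl)])
        _, _, Vt = np.linalg.svd(A)
        F = Vt[-1].reshape(3, 3)             # null-space vector of A, reshaped to 3 x 3
        # Enforce the rank-2 constraint on F (one of the extra constraints on the form of F).
        U, S, Vt2 = np.linalg.svd(F)
        S[2] = 0.0
        return U @ np.diag(S) @ Vt2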
