Computer Vision - Part II
Review of the main parts of Section B of the course
School of Computer Science & Statistics, Trinity College Dublin, Dublin 2, Ireland
www.scss.tcd.ie
2nd Half of the Vision Course - On One Page
- 3D vision: camera calibration; stereo; single moving camera; photometric stereo; radiometry; structure from motion; VSLAM
- Feature extraction: dense - HOG; sparse - SIFT, SURF
- Classification: training & evaluation (ROC); feature selection; high-dimensional data
- Applications: recognition, Photosynth, CBIR
3D Vision - Camera Calibration
- Pinhole camera model
- Extrinsics and intrinsics
- Zhang's method
  - Know the main mathematical structure of the method
  - Know how it is applied practically
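The pinhole model with extrinsics and intrinsics can be sketched in a few lines of numpy; the focal length and principal point below are arbitrary illustrative values, not from the course:

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X through a pinhole camera.

    K: 3x3 intrinsic matrix; R (3x3 rotation) and t (3-vector
    translation) are the extrinsics. Returns pixel coords (u, v).
    """
    x_cam = R @ X + t            # world frame -> camera frame (extrinsics)
    x_img = K @ x_cam            # camera frame -> image plane (intrinsics)
    return x_img[:2] / x_img[2]  # perspective divide

# Identity pose, focal length 800 px, principal point (320, 240)
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
R = np.eye(3)
t = np.zeros(3)
u, v = project_point(K, R, t, np.array([0.0, 0.0, 2.0]))
# a point on the optical axis projects to the principal point
```

Zhang's method estimates K, R and t from several views of a planar checkerboard; the sketch above only shows the forward model those estimates are fitted to.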
Stereo Vision
- Epipolar geometry
  - Canonical configuration
  - Calculation of depth
  - Assumptions & limitations
- Solving the correspondence problem
  - Constraints to apply
  - Bottom-up (regions to features) vs top-down (features to regions)
  - Disparity - PMF algorithm
- Middlebury stereo vision page: vision.middlebury.edu/stereo/
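In the canonical (rectified) configuration, depth follows directly from disparity as Z = fB/d. A minimal sketch, with illustrative numbers:

```python
# Depth from disparity in the canonical configuration:
# Z = f * B / d, with f the focal length in pixels, B the baseline
# in metres and d the disparity in pixels.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# f = 700 px, B = 0.1 m, d = 14 px  ->  Z = 5 m
z = depth_from_disparity(700.0, 0.1, 14.0)
```

Note the limitation this formula makes explicit: depth resolution degrades quadratically with distance, since a one-pixel disparity error at small d corresponds to a large change in Z.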
VSLAM - Davison
- Real time - optimise for speed
- EKF-based approach
- Shi-Tomasi feature extraction
  - Patch around keypoint
  - Orientation and warp function assessment
- Building a sparse 3D map
- EKF update
- Limit the search space - limit the number of keypoints
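Davison's system uses a full nonlinear, multi-dimensional EKF over camera pose and map points; as an illustrative sketch only, a one-dimensional linear Kalman update shows how a feature observation corrects the state and shrinks its uncertainty (all numbers below are made up):

```python
# Minimal 1D Kalman update - the core idea behind the EKF update step.
# Real VSLAM uses a joint state of camera pose + 3D points with
# linearised (Jacobian-based) measurement models.

def kalman_update(x, P, z, R):
    """x: state estimate, P: its variance, z: measurement, R: its
    variance. Returns the fused estimate and its reduced variance."""
    K = P / (P + R)             # Kalman gain
    x_new = x + K * (z - x)     # correct the state towards the measurement
    P_new = (1 - K) * P         # uncertainty always decreases
    return x_new, P_new

x, P = kalman_update(0.0, 4.0, 1.0, 1.0)
# gain K = 0.8 -> estimate 0.8, variance 0.8
```

The shrinking variance is also what limits the search space: features are only searched for inside the (small) uncertainty ellipse predicted by the filter.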
SFM - Pollefeys
- Offline process - optimise for accuracy
- Locate keypoints - Shi & Tomasi
- Solve the F matrix - know the steps
  - Use only key frames
  - Solve for close views
- Find the calibration matrix
- Dense surface estimation
- Multi-view linking
- 3D surface reconstruction & texture
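The core linear step of solving the F matrix can be sketched with an unnormalised eight-point estimate on synthetic correspondences; a real pipeline such as Pollefeys' adds coordinate normalisation, key-frame selection and robust (RANSAC) estimation on top of this:

```python
import numpy as np

def fundamental_eight_point(pts1, pts2):
    """Unnormalised eight-point estimate of the fundamental matrix.

    pts1, pts2: (N, 2) arrays of corresponding points, N >= 8.
    Solves A f = 0 by SVD, then enforces the rank-2 constraint.
    """
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    A = np.stack([x2*x1, x2*y1, x2, y2*x1, y2*y1, y2, x1, y1,
                  np.ones_like(x1)], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # nullspace vector -> 3x3 matrix
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                        # enforce rank 2
    return U @ np.diag(S) @ Vt

# Synthetic data: two identity-intrinsics cameras, pure x-translation
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (10, 3)) + np.array([0, 0, 5.0])
t = np.array([0.5, 0.0, 0.0])
p1 = X[:, :2] / X[:, 2:3]             # first camera at the origin
Xc = X + t                            # second camera: R = I, translate
p2 = Xc[:, :2] / Xc[:, 2:3]
F = fundamental_eight_point(p1, p2)
# epipolar constraint x2^T F x1 ~ 0 for every correspondence
h1 = np.column_stack([p1, np.ones(10)])
h2 = np.column_stack([p2, np.ones(10)])
residuals = np.abs(np.sum((h2 @ F) * h1, axis=1))
```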
People Tracking with HOG
- What is a HOG?
  - Human Detection, Dalal and Triggs, CVPR 2005
- Break the image into cells
- Calculate the HOG per cell
- Normalise over overlapping blocks
- HOG data used in classification
  - Support Vector Machine or other classifiers
- Practical issues (effect of smoothing, sampling scales, etc.)
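The per-cell histogram step can be sketched in numpy (unsigned gradients over 0-180 degrees, 9 bins; block normalisation and the sliding detection window are omitted):

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Orientation histogram for one image cell - the core step of
    Dalal & Triggs HOG. Each pixel votes into an orientation bin,
    weighted by its gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())     # magnitude-weighted votes
    return hist

# A vertical step edge: all gradient energy lands in the 0-degree bin
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
hist = cell_hog(cell)
```

The full descriptor concatenates such histograms over overlapping blocks after L2 normalisation, which is what gives HOG its illumination robustness.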
People Tracking - HOF
- Improvement when combined with HOG
- Motion used in activity recognition
- Differential optical flow
- Boundary Motion Histogram
- Internal Motion Histogram
SIFT
- Recognise 3D objects in 2D images
- Challenges that must be overcome
  - Scale, viewpoint, lighting, occlusion, noise
- SIFT - sparse features
- Detecting features in scale space
  - Difference of Gaussians at different scales
  - Find features approximately in scale space
  - Select and precisely fit using principal curvatures
- HOG computed around the feature point
- Rotational invariance by measuring relative to the primary direction
- Illumination invariance through normalisation
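The Difference-of-Gaussians response SIFT uses for keypoint detection can be sketched in numpy; the separable blur and the sigma values here are illustrative, not Lowe's exact parameters:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur via 1D convolutions along rows/columns."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)
    return out

def dog(img, sigma, k=1.6):
    """Difference of Gaussians: approximates the scale-normalised
    Laplacian that SIFT uses to find blob-like keypoints."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)

# A bright spot gives its strongest (negative) DoG response at its centre
img = np.zeros((21, 21))
img[10, 10] = 1.0
resp = dog(img, 1.0)
```

SIFT then finds extrema of this response across position and scale, refines them with a quadratic fit, and rejects edge-like points via the principal-curvature ratio.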
SURF
- Keypoint detection
  - Approximation of the Hessian: filters for Dxx, Dyy and Dxy
  - Use of the integral image
  - Scale space through scaling the filters
  - Non-maximal suppression in a 3×3×3 region
- Descriptor
  - Haar wavelet responses in a rotating window
- Accelerated matching due to the contrast measure
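The integral image and a constant-time box sum - the trick that makes SURF's scaled filter responses cheap - can be sketched as:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum over the inclusive rectangle [y0..y1, x0..x1] using four
    lookups - cost independent of box size, which is why scaling up
    SURF's binary masks costs nothing extra."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
s = box_sum(ii, 1, 1, 2, 2)   # sum of img[1:3, 1:3] = 5+6+9+10 = 30
```

Both the Hessian approximation masks and the Haar wavelet responses in the descriptor are built from a handful of such box sums.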
Classification
- Bayesian classifiers
  - Model the PDF of the classes
- KNN
- Maximum likelihood
- Mahalanobis distance
- Performance evaluation
  - Overfitting / selection bias
  - ROC curve analysis
  - Cross validation / bootstrapping
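A minimal Mahalanobis-distance classifier, assuming the class means and covariances have already been estimated from training data (the toy classes below are invented for illustration):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance: sqrt((x - mu)^T Sigma^-1 (x - mu)).
    Unlike Euclidean distance it accounts for the spread and
    correlation of each class."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def classify(x, classes):
    """Assign x to the class with the smallest Mahalanobis distance."""
    return min(classes, key=lambda c: mahalanobis(x, *classes[c]))

classes = {
    "A": (np.array([0.0, 0.0]), np.eye(2)),   # (mean, covariance)
    "B": (np.array([4.0, 4.0]), np.eye(2)),
}
label = classify(np.array([0.5, 0.5]), classes)
```

With equal priors and Gaussian class models this nearest-Mahalanobis rule coincides with maximum-likelihood classification.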
Feature Selection
- The problem: high-dimensional data, data set imbalance
- PCA: find the features with the most variance
- LDA: find the features that are most separable
- Advanced methods to achieve classification: SVM, manifolds
- All about getting good features
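PCA's "features with the most variance" can be sketched via eigendecomposition of the covariance matrix; the synthetic data below is illustrative:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.
    Returns the top components (as rows) and the projected data."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)         # eigh: ascending order
    order = np.argsort(vals)[::-1][:n_components]
    components = vecs[:, order].T            # highest-variance directions
    return components, Xc @ components.T

# Points spread mainly along y = x: the first principal direction
# should be close to (1, 1) / sqrt(2)
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, t + 0.01 * rng.normal(size=100)])
comps, proj = pca(X, 1)
```

LDA differs in that it maximises between-class over within-class scatter rather than raw variance, so it needs class labels where PCA does not.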
Applications: Recognition
- Face recognition
  - Individual recognition
  - Face class recognition
  - Eigenfaces
  - Restrictions on data format
- 3D object recognition
  - SIFT and SURF
  - Strong features
  - Geometric relationships as matching criterion
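A common matching criterion for SIFT/SURF descriptors is Lowe's nearest-neighbour ratio test: accept a match only if the best distance is clearly smaller than the second best. A minimal numpy sketch with toy 2-D descriptors (real descriptors are 128-D for SIFT, 64-D for SURF):

```python
import numpy as np

def match_ratio(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with the ratio test, which rejects
    ambiguous features whose two best matches are nearly as good."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]          # best and second best
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

desc1 = np.array([[1.0, 0.0], [0.0, 1.0]])
desc2 = np.array([[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]])
m = match_ratio(desc1, desc2)
```

Geometric relationships (e.g. consistent pose or epipolar geometry across the surviving matches) then provide the stronger, second-stage matching criterion.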
Applications: Photosynth
- Feature extraction: SIFT
- Camera calibration: PTLens
- F-matrix calculation: RANSAC
- 3D point cloud
- Hyperlinks between images
- Image selection based on view angle and scale
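The RANSAC scheme used for the F-matrix step can be illustrated on the simpler problem of fitting a 2D line; the consensus loop (sample a minimal set, fit, count inliers, keep the best) is identical, only the model and the minimal sample size differ:

```python
import numpy as np

def ransac_line(pts, n_iters=200, thresh=0.1, rng=None):
    """RANSAC for a 2D line: repeatedly fit from a 2-point minimal
    sample and keep the hypothesis with the largest inlier set."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        p, q = pts[i], pts[j]
        d = q - p
        n = np.array([-d[1], d[0]])          # line normal
        norm = np.linalg.norm(n)
        if norm == 0:
            continue
        dist = np.abs((pts - p) @ (n / norm))  # point-line distances
        inliers = dist < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# 20 points on y = x plus 5 gross outliers
t = np.linspace(0, 1, 20)
pts = np.column_stack([t, t])
outliers = np.array([[0.0, 5.0], [1.0, -3.0], [2.0, 9.0],
                     [0.5, 4.0], [0.3, -2.0]])
pts = np.vstack([pts, outliers])
mask = ransac_line(pts)
```

For the F matrix the minimal sample is 7 or 8 correspondences and the inlier test is distance to the epipolar line, but the loop is the same.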
Applications: CBIR
- Search for images based on content
- Feature extraction
  - Global vs local
  - Feature vector
- Fast comparison
  - Histograms
  - Earth Mover's Distance
- Relevance feedback
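Fast histogram comparison can be sketched with histogram intersection (the Earth Mover's Distance is more involved; this shows only the simple measure, on made-up 3-bin histograms):

```python
import numpy as np

def hist_intersection(h1, h2):
    """Histogram intersection, normalised by the second histogram:
    1.0 for identical histograms, 0.0 for fully disjoint ones - a
    cheap global similarity measure for CBIR."""
    return np.minimum(h1, h2).sum() / h2.sum()

h_query = np.array([4.0, 2.0, 2.0])
h_same = np.array([4.0, 2.0, 2.0])
h_diff = np.array([0.0, 0.0, 8.0])
s1 = hist_intersection(h_query, h_same)   # identical -> 1.0
s2 = hist_intersection(h_query, h_diff)   # mostly disjoint -> 0.25
```

Intersection ignores which bins the mismatched mass moved to; EMD fixes exactly that, at the cost of solving a small transportation problem per comparison.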
Exam
- Section A: Dr Pitie - 3 questions
- Section B: Dr Lacey - 3 questions
- Do 4 questions from 6, 2 questions from each section
- All questions have the same structure:
  - Theory / knowledge part [7 marks]
  - Practical problem-solving part [18 marks]
Previous Question

Part A
Analyse the differences between the SIFT and SURF feature detectors, comparing feature keypoint identification methods and methods for achieving scale invariance and orientation invariance.

Part B
Your new employer, the Dublin Virtual Tourist Board, wants to create an interactive web site that allows users to explore major landmarks in Dublin online. Given recent budget cuts, the only equipment they can give you is a good-quality SLR camera and a reasonably powerful computer. You need to propose a design that will achieve their objective. Your design should include a detailed description of the steps required to process the images captured and an analysis of any limitations in the performance of the system. Please clearly state any assumptions that you make in the design of the system.
SIFT multi-scale feature point detection: A scale-space pyramid is created by subsampling the image to produce images of different sizes... explain. Low-contrast or poor edge responses are rejected if below a threshold. Long edges are removed by examining the principal curvatures at the point: if the response is strong in one direction and weak in the perpendicular direction the key point is rejected. [2 marks]

SURF feature point detection: SURF uses an approximation to the Hessian to find key points in the image. The Hessian is the matrix of second-order partial derivatives. SURF approximates these partial derivatives using binary masks. The binary masks are convolved with the integral image to find the features. The integral image is constructed by summing all pixels above and to the left of the current pixel. [1 mark]

SURF multi-scale feature point detection: Multi-scale detection is achieved by scaling up the size of the binary masks. Key points are detected in a 3×3 neighbourhood; if they are also present in the scale above and below they are marked as potential key points. [1 mark]

SIFT orientation invariance: SIFT calculates the histogram of orientation gradients in a window around the key point. The gradient strength and the distance from the key point weight the values in the histogram. The histogram is smoothed and thresholded. If there is more than one dominant direction in the histogram a second key point is generated with that orientation. [1 mark]

SURF orientation invariance: SURF calculates the response of vertical and horizontal Haar wavelets in a sliding window around the keypoint. The angle of the sliding window is a configurable parameter. [1 mark]

SURF is faster than SIFT, and the SURF feature orientation vector is less prone to corruption by noise because it is calculated over the area of the Haar wavelet rather than using a single-pixel edge direction. [1 mark]
Students should highlight two main solutions:
1. A solution based on extracting the 3D surfaces from the images and allowing users to browse the database of photographs based on their location within the 3D model.
2. A solution based on extracting 3D surfaces from the images and also extracting textures from the images, building a fully textured 3D model that can be explored by the user. [2 marks]

In both solutions students should cover the following key issues:

Camera calibration: Performing a calibration of the camera using a checkerboard pattern and the approach of Zhang as implemented in OpenCV (this would limit the camera to one focal length). An alternative, and preferred, approach would be to exploit the information contained in the JPEG header of the image file and use PTLens to determine the camera intrinsic parameters. A third approach (less favoured) would be to perform self-calibration; again this would lead to the limitation of a single camera focal length. For the calibration approaches with a single-focal-length limitation, this could be counteracted by taking several different sequences using different focal lengths / lenses and combining the separately calculated 3D models. [2 marks]

Feature extraction: Using a feature extraction method such as Shi-Tomasi, SIFT, SURF, etc. to identify key points between the images. SIFT and SURF would be preferable as their features can be matched at multiple scales and are more distinctive. [2 marks]

Stereo view set-up: Calculating the F matrix between the images using RANSAC and counting the number of feature points in the image that are inliers; iterate until high confidence has been achieved. The validity of the stereo calculation needs to be assessed: if the baseline between the two views is small (the angle between the views is less than 10°) then the calculation of the F matrix will be ill-conditioned. This can be verified in two ways:
1. Where the estimate of the camera positions is very close, reject the match.
2. Following Pollefeys, if the stereo match estimated using the epipolar lines from the F matrix is better than using a simple 2D planar homography then this is a good stereo pair; otherwise reject it. [4 marks]
Dense 3D point matching: Having identified the good stereo pairs, dense stereo matching should be performed along the epipolar lines. Images may be rectified into the canonical configuration in order to speed up the matching process. Constraints such as the disparity limit, the ordering constraint and other constraints - describe these. [3 marks]

Multi-view linking: The points generated from multiple 3D image pairs must be merged. Noise and camera calibration errors mean that the same physical point may be recorded in different 3D positions in different views. Describe how this works. [3 marks]

Depending on the approach taken by the student:
1. 3D model approach: One approach to displaying the images is to build a 3D surface from the 3D point cloud and texture it using the texture information from the 2D images. In order to achieve this the 3D depth map would have to be smoothed to remove the impact of noise. Then a 3D polygonal mesh would have to be built using Delaunay triangulation or similar. The texture for the polygons... explain how to build a model. [2 marks]
2. 3D browsing of the photo database: If we take the Photosynth approach the 3D model is used to explore the database of original images. The user sees the image from the database that is closest to the view direction and scale of the current view of the 3D model. If the user changes their view position or zooms... Explain how the Photosynth approach works. [2 marks]
- Use diagrams where appropriate
- Use bullet points where appropriate
- Use flowcharts where appropriate
- Long rambling answers tend not to pick up marks - be concise and to the point
- Answer all questions in a separate answer book
Best of Luck!