Video Google: A Text Retrieval Approach to Object Matching in Videos, J. Sivic and A. Zisserman (ICCV 2003)


... for Video Shots, J. Sivic, F. Schaffalitzky, and A. Zisserman (ECCV 2004)

Robin Hewitt

Video Google

Goal: fast, accurate googling on video files (movies).
- Uses document-retrieval methods.
- A video frame is analogous to a document.
- A visual interest point is analogous to a word.
- Each movie is treated as a separate database.
- Video data is analyzed and indexed once.
- Arbitrary visual-content queries execute very quickly.

Video Google indexing

Document indexing:
- Document → words
- Word → word root (walking → walk)
- Update inverted file
- Optional: stop list identifies very common words (a, the, and).

Video indexing:
- Frame → salient-point feature vectors (~1000/frame)
- Individual feature vectors → cluster-centroid vectors
- Create inverted file
- Optional: stop list identifies very common and very rare feature vectors.

Inverted File Format: text

Word lookup table:
- great → document list
- grebe → document list
- greed → document list
- greek → document list
- green → document list

Each document-list entry contains:
- document ID
- word-position list
(Yes, this is a lot of data.)

Inverted File Format: video

Feature lookup table:
- Cluster 401 → frame list
- Cluster 402 → frame list
- Cluster 403 → frame list

Each frame-list entry contains:
- frame number
- feature-location list

Feature Types

1. Corners: affine invariant interest points.
2. Blobs: Maximally Stable Extremal Regions (MSER).
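The video inverted file above can be sketched as a nested dictionary. This is only an illustration of the data layout, not the authors' implementation: the function names, the `(word_id, (x, y))` input format, and the toy frames are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_file(frames):
    """Build {cluster_id: {frame_number: [feature locations]}}.

    frames maps a frame number to a list of (cluster_id, (x, y)) pairs,
    i.e. each detected feature already quantized to its visual word.
    """
    inverted = defaultdict(dict)
    for frame_number, features in frames.items():
        for cluster_id, location in features:
            inverted[cluster_id].setdefault(frame_number, []).append(location)
    return inverted


def frames_containing(inverted, cluster_ids):
    """Return the sorted frame numbers that contain any of the given words."""
    hits = set()
    for cluster_id in cluster_ids:
        hits.update(inverted.get(cluster_id, {}))
    return sorted(hits)


# Toy data (hypothetical): three frames, three visual words.
frames = {
    0: [(401, (10.0, 12.0)), (402, (40.0, 8.0))],
    1: [(401, (11.0, 13.0)), (401, (90.0, 55.0))],
    2: [(403, (5.0, 5.0))],
}
inverted = build_inverted_file(frames)
print(frames_containing(inverted, [401]))
```

Because the lookup goes from word to frames, a query only touches the frame lists of the words it contains, which is what makes retrieval fast once indexing is done.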

Affine Invariant Interest Point Detector

1. Detect corners with the Harris/Förstner method:

\[ \mu = \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}, \qquad \mathrm{Resp} = \det(\mu) - \alpha \, \mathrm{tr}(\mu)^2 \]

2. Select the scale maximum of the (scale-normalized) Laplacian:

\[ L = \sigma^2 \left( I_{xx}(x, y, \sigma) + I_{yy}(x, y, \sigma) \right) \]

(In practice, use a difference of Gaussians at σ and kσ.)

Affine Invariant Interest Point Detector

Isotropic corners:
- Have no privileged axis.
- A circle is a good bounding shape.
- The Harris corner detector handles these well.

Anisotropic corners:
- An ellipse is a better bounding shape.
- Scale has two dimensions.
- The orientation axis is the principal eigenvector.
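To make the corner response in step 1 concrete, here is a minimal sketch that accumulates the second-moments matrix from gradient samples in a window and evaluates det(μ) − α·tr(μ)². The unweighted sum over the window and the value α = 0.04 are assumptions for illustration; practical detectors usually apply Gaussian weighting.

```python
def harris_response(gradients, alpha=0.04):
    """Corner response det(mu) - alpha * trace(mu)^2 for one window.

    gradients is a list of (Ix, Iy) image-gradient samples taken inside
    the window; mu is their (unweighted) second-moments matrix.
    """
    sxx = sum(ix * ix for ix, _ in gradients)
    syy = sum(iy * iy for _, iy in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - alpha * trace * trace


# Toy windows (hypothetical gradient samples).
corner = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]  # gradients in two directions
edge = [(1.0, 0.0)] * 4                                      # gradients in one direction
flat = [(0.0, 0.0)] * 4                                      # no gradient at all

print(harris_response(corner), harris_response(edge), harris_response(flat))
```

The response is positive only when both eigenvalues of μ are large, so the corner window scores above zero, the edge window scores below zero, and the flat window scores zero.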

Affine Invariant Interest Point Detector

3. An iterative procedure estimates anisotropy, orientation, and location.
- Initialize the bounding ellipse as a circle. U is the local window, μ is the second-moments matrix.
- Iterate until convergence (no further change):

\[ U^{(k)} = (\mu^{-1/2})^{(k)} \cdots (\mu^{-1/2})^{(1)} \, U^{(0)} \]

- Each iteration rescales U by the eigenvalues of μ and orients it to the eigenvectors. μ is recomputed each iteration, using the new window, U.

4. The interest region, U, is warped into a circle to create the affine-invariant pre-image.
5. The normalized interest point is represented by gradient histograms from 16 subwindows (SIFT).
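The role of the μ^(-1/2) factor in step 3 can be checked with a small worked example: applying μ^(-1/2) on both sides of μ yields the identity, which corresponds to an isotropic (circular) region. The sketch below only verifies this algebra for one 2x2 matrix; it does not re-measure μ from image data as the real iteration does, and the helper names and the sample matrix are assumptions.

```python
import math


def inv_sqrt_2x2(m):
    """Inverse square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, c) = m
    s = math.sqrt(a * c - b * b)      # square root of the determinant
    t = math.sqrt(a + c + 2.0 * s)    # normalizer for the matrix square root
    # Closed form for 2x2 matrices: sqrt(M) = (M + s*I) / t
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t
    det = ra * rc - rb * rb
    # Invert the 2x2 square root explicitly.
    return ((rc / det, -rb / det), (-rb / det, ra / det))


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


mu = ((4.0, 1.0), (1.0, 2.0))          # hypothetical anisotropic second-moments matrix
w = inv_sqrt_2x2(mu)
normalized = matmul(matmul(w, mu), w)  # should be close to the identity
print(normalized)
```

In the full procedure this normalization is repeated, with μ re-estimated in the warped window each time, until the measured μ stops changing.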

Maximally Stable Extremal Regions - MSER

Watershed method for blob detection:
- Apply a series of thresholds, one for each grayscale level.
- Threshold the image at each level to create a series of black-and-white images.
- One extreme will be all white, the other all black. In between, blobs grow and merge.
- Blobs that remain stable over a large threshold range are MSERs.
- Criterion: dA/dt, where A = area and t = threshold.
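The stability criterion can be illustrated on the area of a single blob as the threshold sweeps through the gray levels: the blob is most stable where dA/dt is smallest. This is a sketch of that criterion only, using a central finite difference; region extraction is not shown, the function names and the area values are made up, and published MSER variants typically normalize the rate by the area itself.

```python
def growth_rates(areas):
    """Central-difference estimate of dA/dt at each interior threshold.

    areas[t] is the area of one nested blob at threshold t.
    """
    return {
        t: (areas[t + 1] - areas[t - 1]) / 2.0
        for t in range(1, len(areas) - 1)
    }


def most_stable_threshold(areas):
    """Threshold at which the blob's area changes most slowly."""
    rates = growth_rates(areas)
    return min(rates, key=rates.get)


# Hypothetical area curve: rapid growth, a plateau, then a merge.
areas = [5, 20, 48, 50, 51, 52, 80, 150]
print(most_stable_threshold(areas))
```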

Maximally Stable Extremal Regions - MSER

- MSERs are located with an ellipse.
- These are also warped into circles and represented with SIFT descriptors.

Features

[Figure: detected features. Corners: cyan. Blobs: yellow. Labeled examples: sawtooth skyline, dashed line on street.]

Features

[Figure: example video frame.]

[Figure: features, ~1600 per frame. Cyan: corners. Yellow: blobs.]

From Features to Words

- Track each feature over several frames.
- Skip features that don't persist for 3 or more frames.
- Average the features that do.
- Remove the 10% with the largest covariance.
- Feature count is now ~1000/frame.

From Features to Words

- Cluster the averaged feature vectors (K-Means).
- Used 164 frames, hand-selected to include 19 locations, 4-9 frames each, with wide variation in viewpoint.
- Clustered each feature type separately.
- Cluster centroids become words in the inverted file.
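The clustering step can be sketched with a small pure-Python K-Means. It is a toy under stated assumptions: plain Euclidean distance, centroids initialized from the first k points so the result is repeatable, and 2-D points standing in for the averaged SIFT descriptors. The slides do not specify the distance measure or initialization that was actually used.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to point (its visual word id)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=20):
    """Plain K-Means; returns the list of k centroids."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialization
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


# Hypothetical 2-D "descriptors" forming two obvious groups.
descriptors = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
               (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centroids = kmeans(descriptors, k=2)
words = [nearest(d, centroids) for d in descriptors]
print(words)
```

Once the centroids are fixed, quantizing any new descriptor to a word is a nearest-centroid lookup, which is the same `nearest` call used during clustering.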

Representing Frames as Documents

- Create frame vectors: k-dimensional vectors, V, where k = total number of words.
- Each component of V is a weighted word-occurrence count:

\[ V_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i} \]

where
- n_id = occurrences of word i in frame d
- n_d = number of words in frame d
- N = number of frames in movie
- n_i = occurrences of word i in movie

Video Google search

- Specify the query with a bounding box.
- Features in query → words (cluster centroids)
- Each word → frame list
- Frame → weighted word vector
- The angle between the frame vector and the query vector gives relevance.
- Finally, filter and re-rank frames based on spatial consistency.
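The weighting and the angle-based ranking can be sketched as follows. Vectors are stored sparsely as dictionaries, relevance is the cosine of the angle between query and frame vectors, and the spatial-consistency re-ranking is left out. The function names, frame numbers, and counts are hypothetical.

```python
import math


def tfidf_vector(word_counts, n_words_in_frame, n_frames, movie_counts):
    """Sparse frame vector with V_i = (n_id / n_d) * log(N / n_i)."""
    return {
        word: (count / n_words_in_frame) * math.log(n_frames / movie_counts[word])
        for word, count in word_counts.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_frames(query, frame_vectors):
    """Frame numbers ordered from most to least relevant."""
    scores = {frame: cosine(query, vec) for frame, vec in frame_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical movie: 100 frames, three visual words.
N = 100
movie_counts = {1: 10, 2: 50, 3: 5}
frame_vectors = {
    10: tfidf_vector({1: 2, 2: 2}, 4, N, movie_counts),
    20: tfidf_vector({2: 4}, 4, N, movie_counts),
}
query = tfidf_vector({1: 1}, 1, N, movie_counts)
ranking = rank_frames(query, frame_vectors)
print(ranking)
```

The log term down-weights words that occur throughout the movie, so a match on a rare word moves a frame up the ranking more than a match on a common one.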

Goal: detect and recognize objects in all frames of a video.
- Extends the Video Google work.
- Uses motion and continuity between frames.
- Infers an object's significance from its occurrence frequency.
- Leverages artistic effects: tracking, selective focus, etc.
- Uses the same features as in Video Google.

Algorithm Overview

1. Detect features in each frame.
2. Link features between consecutive frame pairs.
3. Short-range track repair: interpolate tracks for missing features through 2-5 frames.

Algorithm Overview, cont.

4. Cluster feature tracks into oversegmented, but safe groupings.
5. Merge the track clusters with consistent 3D motion to extract objects.
6. Long-range track repair using wide-baseline stereo.

2. Linking features between frame pairs:
- Same features as in Video Google.
- Match between each frame pair, within 50 pixels.
- Validate by cross-correlation.
- Delete ambiguities: anything that matches more than once between frames.
- Eliminate additional outliers by loosely enforcing epipolar geometry (RANSAC with a 3 pixel inlier threshold).
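The first four bullets of the linking step can be sketched as a candidate search followed by an ambiguity filter. This is a simplified illustration: each feature is a position plus a short intensity patch, the cross-correlation cutoff `min_ncc=0.8` is an assumed value, and the epipolar RANSAC stage is not included.

```python
import math
from collections import Counter


def ncc(p, q):
    """Normalized cross-correlation of two equal-length patches."""
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_q) ** 2 for b in q))
    return num / den if den > 0 else 0.0


def link_features(frame_a, frame_b, radius=50.0, min_ncc=0.8):
    """Return unambiguous (index_a, index_b) links between two frames.

    Each frame is a list of ((x, y), patch). A pair is a candidate when the
    features lie within radius pixels and their patches correlate well.
    Any feature that appears in more than one candidate pair is dropped.
    """
    candidates = [
        (i, j)
        for i, (pos_a, patch_a) in enumerate(frame_a)
        for j, (pos_b, patch_b) in enumerate(frame_b)
        if math.dist(pos_a, pos_b) <= radius and ncc(patch_a, patch_b) >= min_ncc
    ]
    uses_a = Counter(i for i, _ in candidates)
    uses_b = Counter(j for _, j in candidates)
    return [(i, j) for i, j in candidates if uses_a[i] == 1 and uses_b[j] == 1]


# Hypothetical frames: the third feature of frame_a has two plausible matches.
frame_a = [((10, 10), [1, 2, 3, 4]), ((100, 100), [4, 1, 3, 2]),
           ((200, 200), [1, 1, 5, 5])]
frame_b = [((15, 12), [2, 4, 6, 8]), ((105, 98), [8, 2, 6, 4]),
           ((210, 205), [1, 1, 5, 5]), ((205, 210), [2, 2, 10, 10])]
links = link_features(frame_a, frame_b)
print(links)
```

Dropping every ambiguous feature loses some correct matches, but it keeps wrong links out of the tracks that the later grouping stages depend on.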

3. Short-range track repair:
- Estimate motion from the momentum of the previous n frames (n 5).
- In a frame with a missing feature, look for a match in the predicted area.
- If not found, keep looking for a match in a few (2-5) more frames.
- If a match is found, extend the track.
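Short-range track repair can be sketched with a constant-velocity prediction. The sketch makes several assumptions not stated in the slides: velocity is averaged over at most the last n = 5 positions, the "predicted area" is a circle of `search_radius` pixels, and candidates are compared by position only (a real system would also compare appearance).

```python
import math


def predict(track, steps_ahead, n=5):
    """Extrapolate a track using the mean velocity of its last n positions."""
    recent = track[-n:]
    if len(recent) < 2:
        return recent[-1]
    span = len(recent) - 1
    vx = (recent[-1][0] - recent[0][0]) / span
    vy = (recent[-1][1] - recent[0][1]) / span
    return (recent[-1][0] + vx * steps_ahead, recent[-1][1] + vy * steps_ahead)


def repair_track(track, future_detections, search_radius=10.0, max_gap=5):
    """Look for the track's feature in the frames after it was lost.

    future_detections[g] holds the unmatched feature positions g + 1 frames
    after the end of the track. Returns (gap, position) for the first frame
    with a detection inside the predicted area, or None if nothing is found
    within max_gap frames.
    """
    for gap, detections in enumerate(future_detections[:max_gap], start=1):
        predicted = predict(track, gap)
        nearby = [d for d in detections if math.dist(d, predicted) <= search_radius]
        if nearby:
            return gap, min(nearby, key=lambda d: math.dist(d, predicted))
    return None


# Hypothetical track moving (+2, +1) pixels per frame, then lost for two frames.
track = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
future = [[], [(50.0, 50.0)], [(12.5, 6.0)]]
print(repair_track(track, future))
```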

Effect of short-range track repair

Left: feature matching between frame pairs. Right: tracks after short-range track repair.

4. Cluster tracks into safe (i.e., conservative) groupings:
- Group tracks that match the same projective homography in all three frame pairs of three consecutive frames.
- Move ahead one frame and repeat for the next (overlapping) triplet.
- This over-segments the tracks into small, short motion groups, each centered on one frame.

4. Cluster tracks into safe groupings, cont.:
- Count co-occurrences of each track pair over n (~10) consecutive frames.
- The co-occurrence matrix, W, accumulates votes. W is similar to a correlation matrix.
- w_ij accumulates a vote each time tracks i and j are in the same short-term motion group.

[Figure: short-motion co-occurrence matrix.]

4. Cluster tracks into safe groupings, cont.:
- Threshold W, s.t. threshold > n/2, to ensure that no track is assigned to more than one cluster.
- Find the connected components of the graph corresponding to the thresholded W.
- This is equivalent to re-ordering the rows of W s.t. the white regions form bands that continue to the diagonal.

[Figure: matrix W after re-ordering rows.]
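The voting, thresholding, and connected-components steps can be sketched together. The short-term motion groups are taken as given (the homography fitting that produces them is not shown), and the function names, track ids, and groupings are invented for the example.

```python
def cooccurrence(groupings, n_tracks):
    """Vote matrix W: w[i][j] counts frames where tracks i and j share a group.

    groupings has one entry per frame; each entry is a list of sets of
    track ids (the short-term motion groups centered on that frame).
    """
    w = [[0] * n_tracks for _ in range(n_tracks)]
    for groups in groupings:
        for group in groups:
            for i in group:
                for j in group:
                    if i != j:
                        w[i][j] += 1
    return w


def clusters(w, threshold):
    """Connected components of the graph with an edge where w[i][j] > threshold."""
    n = len(w)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            component.add(i)
            stack.extend(j for j in range(n)
                         if w[i][j] > threshold and j not in seen)
        components.append(sorted(component))
    return components


# Hypothetical groupings for 5 tracks over n = 4 frames.
groupings = [
    [{0, 1, 2}, {3, 4}],
    [{0, 1, 2}, {3, 4}],
    [{0, 1}, {2, 3, 4}],   # track 2 is briefly grouped with the wrong tracks
    [{0, 1, 2}, {3}],      # track 4 is missing in this frame
]
w = cooccurrence(groupings, n_tracks=5)
print(clusters(w, threshold=len(groupings) / 2))
```

In this toy data track 2 spends one frame with tracks 3 and 4, but that single vote stays below the n/2 threshold, so it is still assigned only to the cluster it shares most frames with.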

[Figure: two clusters of correlated motion, with outliers.]

5. Merge clusters to extract objects:
- For each pair of clusters:
  - Fit a full, 3D affine transformation over >20 frames using RANSAC on 4 tracks.
  - Validate by projecting the remaining tracks.
  - If 90% of tracks project consistently, merge the clusters.
- All agglomerated groupings are presumed to be objects.

[Figure: trajectories of 5 feature ellipses, all in the same object.]

Tracking the B&B owner

- Each 3D projection is over any set of 20 or more frames, but the track may continue further.
- This allows flexibility to accommodate slowly deforming objects.

6. Long-range track repair:
- If an object's grouped tracks all disappear, that may be due to occlusion.
- The same tracks should then reappear later on.
- Track sets are matched with wide-baseline stereo.
- Only features within grouped tracks are matched.

[Figure: long-range track repair.]