Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition
Topics that we will try to cover:
- Indexing for fast retrieval (we still owe this one)
- History of recognition techniques
- Object classification: Bag-of-words, Spatial pyramids, Neural Networks
- Object class detection: Hough-voting techniques, Support Vector Machine (SVM) detector on HOG features, Deformable part-based model (DPM), R-CNN (detector with Neural Networks)
- Object (class) segmentation: Unsupervised segmentation (bottom-up techniques), Supervised segmentation (top-down techniques)
Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition: Indexing for Fast Retrieval Sanja Fidler CSC420: Intro to Image Understanding 2 / 28

Recognizing or Retrieving Specific Objects Example: Visual search in feature films [Source: J. Sivic, slide credit: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 3 / 28

Recognizing or Retrieving Specific Objects Example: Search photos on the web for particular places [Source: J. Sivic, slide credit: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 4 / 28

Why is it Difficult? Objects can undergo possibly large changes in scale, viewpoint, lighting, and partial occlusion. [Source: J. Sivic, slide credit: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 6 / 28

Why is it Difficult? There are tons of data. Sanja Fidler CSC420: Intro to Image Understanding 7 / 28

Our Case: Matching with Local Features For each image in our database we extracted local descriptors (e.g., SIFT) Sanja Fidler CSC420: Intro to Image Understanding 8 / 28

Our Case: Matching with Local Features Let's focus on the descriptors only (vectors of, e.g., 128 dimensions for SIFT) Sanja Fidler CSC420: Intro to Image Understanding 8 / 28
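
As a concrete illustration of this step (not from the slides), here is a minimal MATLAB sketch, assuming the VLFeat toolbox provides vl_sift and that the database images live in a hypothetical database/ folder:

% Minimal sketch (assumptions: VLFeat on the path, images in database/*.jpg).
imageFiles = dir(fullfile('database', '*.jpg'));
descriptors = cell(numel(imageFiles), 1);
for i = 1:numel(imageFiles)
    I = imread(fullfile('database', imageFiles(i).name));
    if size(I, 3) == 3, I = rgb2gray(I); end
    [~, d] = vl_sift(single(I));          % d is 128 x (number of keypoints)
    descriptors{i} = single(d)';          % store as (number of keypoints) x 128
end
X = cat(1, descriptors{:});               % all database descriptors, stacked row-wise

Each row of X is one 128-dimensional SIFT descriptor; X is what gets clustered into visual words later in the lecture.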

Indexing! Sanja Fidler CSC420: Intro to Image Understanding 9 / 28

Indexing Local Features: Inverted File Index For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs. To use this idea, we'll need to map our features to visual words. Why? [Source: K. Grauman, slide credit: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 10 / 28
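
A minimal sketch of such an inverted file index in MATLAB (the variable names wordIDs and queryWordIDs are assumptions: wordIDs{d} is taken to hold the visual-word IDs occurring in database image d):

% Build the inverted file index: for each visual word, the list of images containing it.
invIndex = containers.Map('KeyType', 'double', 'ValueType', 'any');
for d = 1:numel(wordIDs)
    for w = reshape(unique(wordIDs{d}), 1, [])
        if isKey(invIndex, w)
            invIndex(w) = [invIndex(w), d];
        else
            invIndex(w) = d;
        end
    end
end

% Query: collect every database image that shares at least one visual word with the query.
candidates = [];
for w = reshape(unique(queryWordIDs), 1, [])
    if isKey(invIndex, w)
        candidates = [candidates, invIndex(w)]; %#ok<AGROW>
    end
end
candidates = unique(candidates);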

What Are Our Visual Words? Sanja Fidler CSC420: Intro to Image Understanding 11 / 28

Visual Words All example patches on the right belong to the same visual word. [Source: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 12 / 28
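
In practice the visual words are obtained by clustering the local descriptors. A minimal sketch of this step in MATLAB (k is an illustrative choice; the practical speed-ups appear on the summary slides later):

% Cluster the stacked descriptors X (rows = descriptors) into k visual words.
k = 1000;
[IDX, W] = kmeans(X, k, 'MaxIter', 200);   % rows of W are the visual words;
                                           % IDX(i) is the word assigned to descriptor i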

What Are Our Visual Words? Sanja Fidler CSC420: Intro to Image Understanding 13 / 28

Inverted File Index Now we have found all images in the database that have at least one visual word in common with the query image. But this can still give us lots of images... What can we do? Idea: Compute the similarity between the query image and the retrieved images. How can we do this fast? Sanja Fidler CSC420: Intro to Image Understanding 14 / 28

Relation to Documents [Slide credit: R. Urtasun] Sanja Fidler CSC420: Intro to Image Understanding 15 / 28

Bags of Visual Words [Slide credit: R. Urtasun] Summarize the entire image by its distribution (histogram) of visual-word occurrences. Analogous to the bag-of-words representation commonly used for documents. Sanja Fidler CSC420: Intro to Image Understanding 16 / 28
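
A minimal sketch of building such a histogram for one image (IDX_d is assumed to hold the visual-word assignments of that image's descriptors, and W is the vocabulary from before):

% Bag-of-words histogram: count how many descriptors fall into each visual word.
k = size(W, 1);
bow = histcounts(IDX_d, 1:(k + 1));   % bow(i) = number of occurrences of word i in the image
bow = bow / max(sum(bow), 1);         % optional: normalize to word frequencies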

Compute a Bag-of-Words Description
Instead of a raw histogram, for retrieval it's better to re-weight the image description vector t = [t_1, t_2, ..., t_i, ...] with term frequency-inverse document frequency (tf-idf), a standard trick in document retrieval:

t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}

where:
- n_{id} is the number of occurrences of word i in image d
- n_d is the total number of words in image d
- n_i is the number of occurrences of word i in the whole database
- N is the number of documents (images) in the whole database

The weighting is a product of two terms: the word frequency n_{id}/n_d and the inverse document frequency \log(N/n_i). Intuition behind this: the word frequency upweights words that occur often in a particular document, and thus describe it well, while the inverse document frequency downweights words that occur often in the full dataset.
Sanja Fidler CSC420: Intro to Image Understanding 17 / 28
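
A minimal sketch of this re-weighting, assuming a hypothetical counts matrix with one row per database image and one column per visual word (raw occurrence counts):

% tf-idf re-weighting, following the formula above.
n_d = sum(counts, 2);                              % total number of words in each image
n_i = sum(counts, 1);                              % occurrences of each word in the whole database
N   = size(counts, 1);                             % number of images in the database
T   = (counts ./ n_d) .* log(N ./ max(n_i, 1));    % row d of T is the tf-idf vector t for image d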

Comparing Images
Compute the similarity as the normalized dot product between their (tf-idf weighted) occurrence vectors:

sim(t_j, q) = \frac{\langle t_j, q \rangle}{\| t_j \| \, \| q \|}

Rank the images in the database by this similarity score (the higher the better). Are we done? No. Our similarity doesn't take the geometric relations between features into account. Take the top K best-ranked images and do spatial verification (compute a transformation and count inliers).
Sanja Fidler CSC420: Intro to Image Understanding 18 / 28
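
A minimal sketch of this ranking step, reusing the (hypothetical) T, candidates, and tf-idf query vector q from the earlier sketches:

% Cosine similarity between the query and each candidate image, then rank.
Tc = T(candidates, :);
scores = (Tc * q') ./ (vecnorm(Tc, 2, 2) * norm(q) + eps);
[~, order] = sort(scores, 'descend');
topK = candidates(order(1:min(100, numel(order))));   % e.g., keep the top 100 for spatial verification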

Spatial Verification Both image pairs have many visual words in common, but only some of the matches are mutually consistent. [Source: O. Chum] Sanja Fidler CSC420: Intro to Image Understanding 19 / 28
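
A minimal sketch of such a spatial verification with an affine model and a hand-rolled RANSAC loop (p1 and p2 are assumed to be M x 2 matrices of matched keypoint locations; the iteration count and inlier threshold are illustrative):

% Count the largest set of mutually consistent matches under an affine transformation.
function numInliers = verify_affine(p1, p2, iters, thresh)
    M = size(p1, 1); numInliers = 0;
    for it = 1:iters
        s = randperm(M, 3);                        % 3 correspondences determine an affine map
        A = [p1(s, :), ones(3, 1)] \ p2(s, :);     % solve [x y 1] * A = [x' y']
        pred = [p1, ones(M, 1)] * A;               % apply the candidate model to all matches
        inl = sum(sqrt(sum((pred - p2).^2, 2)) < thresh);
        numInliers = max(numInliers, inl);         % keep the best consensus so far
    end
end
% Usage: keep a retrieved image only if verify_affine(p1, p2, 500, 5) is large enough.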

Summary: Stuff You Need To Know
Fast image retrieval:
- Compute features in all images from the database, and in the query image.
- Cluster the descriptors from the database images (e.g., with k-means) to get k clusters. These cluster centers are vectors that live in the same-dimensional space as the descriptors. We call them visual words.
- Assign each descriptor in the database and in the query image to its closest cluster.
- Build an inverted file index.
- For a query image, look up all of its visual words in the inverted file index to get a list of images that share at least one visual word with the query.
- Compute a bag-of-words (BoW) vector for each retrieved image and for the query. This vector simply counts the number of occurrences of each word; it has as many dimensions as there are visual words. Weight the vector with tf-idf.
- Compute the similarity between the query BoW vector and all retrieved image BoW vectors. Sort (highest to lowest). Take the top K most similar images (e.g., 100).
- Do spatial verification on all top K retrieved images (RANSAC + affine or homography, then remove images with too few inliers).
Sanja Fidler CSC420: Intro to Image Understanding 20 / 28

Summary: Stuff You Need To Know
Matlab function: [IDX, W] = kmeans(X, k); where rows of X are descriptors, rows of W are the visual-word vectors, and IDX are the assignments of rows of X to visual words.
Once you have W, you can quickly compute IDX via the dist2 function (Assignment 2): D = dist2(X, W); [~, IDX] = min(D, [], 2);
A much faster way of computing the closest cluster (IDX) is via the FLANN library: http://www.cs.ubc.ca/research/flann/
Since X is typically super large, kmeans will run for days... A solution is to randomly sample a few descriptors from X and cluster those. Another great possibility is to use this: http://www.robots.ox.ac.uk/~vgg/software/fastanncluster/
Sanja Fidler CSC420: Intro to Image Understanding 21 / 28
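
A minimal sketch of that subsampling advice, plus a toolbox-based alternative to dist2 (knnsearch is an assumption, it requires the Statistics and Machine Learning Toolbox; the sample size is illustrative):

% Cluster only a random subsample of X, then assign every descriptor to the result.
nSample = min(200000, size(X, 1));
Xs = X(randperm(size(X, 1), nSample), :);
[~, W] = kmeans(Xs, k, 'MaxIter', 200);
IDX = knnsearch(W, X);     % nearest visual word for each row of X, without a full distance matrix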

Even Faster? Can we make the retrieval process even more efficient? Sanja Fidler CSC420: Intro to Image Understanding 22 / 28

Vocabulary Trees Hierarchical clustering for large vocabularies [Nister et al., 06]. k defines the branch factor (number of children of each node) of the tree. First, an initial k-means process is run on the training data, defining k cluster centers. The same process is then recursively applied to each group. The tree is determined level by level, up to some maximum number of levels L. Each division into k parts is only defined by the distribution of the descriptor vectors that belong to the parent quantization cell. Sanja Fidler CSC420: Intro to Image Understanding 23 / 28
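
A minimal sketch of this hierarchical clustering (not the authors' implementation; the struct layout and parameter names are illustrative):

% Recursive k-means: branch factor k, at most L levels.
function node = build_vocab_tree(X, k, L)
    node.centers = [];
    node.children = {};
    if L == 0 || size(X, 1) < k
        return;                                              % leaf: max depth or too few points
    end
    [idx, node.centers] = kmeans(X, k, 'MaxIter', 100);      % split this quantization cell into k
    node.children = cell(1, k);
    for c = 1:k
        node.children{c} = build_vocab_tree(X(idx == c, :), k, L - 1);
    end
end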

Constructing the tree Offline phase: hierarchical clustering. Vocabulary Tree Sanja Fidler CSC420: Intro to Image Understanding 24 / 28

Parsing the tree Online phase: each descriptor vector is propagated down the tree by comparing it, at each level, to the k candidate cluster centers (represented by the k children in the tree) and choosing the closest one. The tree directly defines the visual vocabulary and an efficient search procedure in an integrated manner. Every node in the vocabulary tree is associated with an inverted file. The inverted files of inner nodes are the (virtual) concatenation of the inverted files of their leaf nodes. Sanja Fidler CSC420: Intro to Image Understanding 25 / 28
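
A minimal sketch of this online propagation, using the tree structure from the previous sketch (x is one descriptor as a row vector):

% Walk one descriptor down the vocabulary tree, choosing the closest child at each level.
function path = quantize_descriptor(node, x)
    path = [];
    while ~isempty(node.centers)
        [~, c] = min(sum((node.centers - x).^2, 2));   % closest of the k child centers
        path(end + 1) = c;                             %#ok<AGROW> record the branch taken
        node = node.children{c};
    end
end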

Vocabulary size Complexity: branching factor and number of levels. Most important for the retrieval quality is to have a large vocabulary. Sanja Fidler CSC420: Intro to Image Understanding 26 / 28

Visual words / bags of words
Good:
- flexible to geometry / deformations / viewpoint
- compact summary of image content
- provides a vector representation for sets
- very good results in practice
Bad:
- background and foreground mixed when the bag covers the whole image
- optimal vocabulary formation remains unclear
- basic model ignores geometry; must verify afterwards, or encode it via features
Sanja Fidler CSC420: Intro to Image Understanding 27 / 28

Next Time Recognition Techniques in 20 min Sanja Fidler CSC420: Intro to Image Understanding 28 / 28