Content Based Image Search. Mark Desnoyer

Similar documents
Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

TouchPaper - An Augmented Reality Application with Cloud-Based Image Recognition Service

Building Rome in a Day

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Fast Matching of Binary Features

FAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION

GPS-aided Recognition-based User Tracking System with Augmented Reality in Extreme Large-scale Areas

Distributed Kd-Trees for Retrieval from Very Large Image Collections

Image Retrieval for Image-Based Localization Revisited

Local features and matching. Image classification & object localization

The Delicate Art of Flower Classification

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Virtual Zoom: Augmented Reality on a Mobile Device

System Behavior Analysis by Machine Learning

State of the Art and Challenges in Crowd Sourced Modeling

Canonical Image Selection for Large-scale Flickr Photos using Hadoop

Avoiding confusing features in place recognition

Probabilistic Latent Semantic Analysis (plsa)

3D Model based Object Class Detection in An Arbitrary View

PHOTO BASED INTERACTION AND QUERY FOR END-USER APPLICATIONS - THE GEOPILOT PROJECT

Key Pain Points Addressed

Keyframe-Based Real-Time Camera Tracking

Digital Asset Management and Controlled Vocabulary

Classifying Manipulation Primitives from Visual Data

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

Speaker: Prof. Mubarak Shah, University of Central Florida. Title: Representing Human Actions as Motion Patterns

The Stanford Mobile Visual Search Data Set

Similarity Search in a Very Large Scale Using Hadoop and HBase

Machine Learning for Data Science (CS4786) Lecture 1

Teaching Big Data by Three Levels of Projects. Abstract

Towards Linear-time Incremental Structure from Motion

Consumer video dataset with marked head trajectories

Data Mining With Big Data Image De-Duplication In Social Networking Websites

Building Rome on a Cloudless Day

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

Tracking in flussi video 3D. Ing. Samuele Salti

Human behavior analysis from videos using optical flow

First-Person Activity Recognition: What Are They Doing to Me?

Get Out of my Picture! Internet-based Inpainting

Big Data: Image & Video Analytics

Clustering Technique in Data Mining for Text Documents

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER

Cloud and Mobile Computing

A HYBRID APPROACH FOR RETRIEVING DIVERSE SOCIAL IMAGES OF LANDMARKS

Image Classification for Dogs and Cats

Image Search by MapReduce

Group Sparse Coding. Fernando Pereira Google Mountain View, CA Dennis Strelow Google Mountain View, CA

What Makes a Great Picture?

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES

Introduction. Selim Aksoy. Bilkent University

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

How To Understand The Value Of Big Data

Object Categorization using Co-Occurrence, Location and Appearance

View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems

YouTube Scale, Large Vocabulary Video Annotation

The use of computer vision technologies to augment human monitoring of secure computing facilities

Discovering objects and their location in images

Semantic Image Segmentation and Web-Supervised Visual Learning

MusicGuide: Album Reviews on the Go Serdar Sali

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Who are you? Learning person specific classifiers from video

Digital Asset Management. Content Control for Valuable Media Assets

Bags of Binary Words for Fast Place Recognition in Image Sequences

MetropoGIS: A City Modeling System DI Dr. Konrad KARNER, DI Andreas KLAUS, DI Joachim BAUER, DI Christopher ZACH

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

Introduction to Data Mining

Cloud-Based Image Coding for Mobile Devices Toward Thousands to One Compression

High Level Describable Attributes for Predicting Aesthetics and Interestingness

Image Segmentation and Registration

Outdoors Augmented Reality on Mobile Phone using Loxel-Based Visual Feature Organization

arxiv:submit/ [cs.cv] 13 Apr 2016

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

Content-Based Image Retrieval

ACCURACY ASSESSMENT OF BUILDING POINT CLOUDS AUTOMATICALLY GENERATED FROM IPHONE IMAGES

The Scientific Data Mining Process

Object Recognition. Selim Aksoy. Bilkent University

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Movie Classification Using k-means and Hierarchical Clustering

SWIGS: A Swift Guided Sampling Method

Finding Groups of Duplicate Images in Very Large Datasets

Camera geometry and image alignment

Newton s Laws of Motion Project

Footwear Print Retrieval System for Real Crime Scene Marks

Open Source Tools for 3D Forensic Reconstructions Part 3 Eugene Liscio, P. Eng. November 2011

Build Panoramas on Android Phones

COMP : 096: Computational Photography

Flow Separation for Fast and Robust Stereo Odometry

PCL Tutorial: The Point Cloud Library By Example. Jeff Delmerico. Vision and Perceptual Machines Lab 106 Davis Hall UB North Campus.

CS Data Science and Visualization Spring 2016

CODING MODE DECISION ALGORITHM FOR BINARY DESCRIPTOR CODING

A Bayesian Framework for Unsupervised One-Shot Learning of Object Categories

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

MACHINE LEARNING IN HIGH ENERGY PHYSICS

A Review of Data Mining Techniques

Visual and mobile Smart Data

A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK

Transcription:

Content Based Image Search Mark Desnoyer

Text Search - Bag of Words Gnome walks him walked house internet Gnome moon grass caught him shuttle walks PC Nord

Text Search - TF-IDF

Text Search - Query

Current Image Search On the web uses context around image Words around it Words in the alt tag Those words are treated as a document Same as normal text search But we want pictures, not text! Query: Horse

Searching With Pictures How about searching with pictures instead

Using Visual Words For Search Use visual words paradigm we've seen before Can use all the text search machinery we already have But, we're searching with pictures now

The Players O. Chum et al., Total recall: Automatic query expansion with a generative feature model for object retrieval, in Proc. ICCV, vol. 2, 2007. N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: exploring photo collections in 3D, in International Conference on Computer Graphics and Interactive Techniques (ACM New York, NY, USA, 2006), 835-846. D. Nister and H. Stewenius, Scalable recognition with a vocabulary tree, in Proc. CVPR, vol. 5, 2006.

SIFT Features Succinct descriptors Scale invariant Robust to changes in lighting, viewpoint, blur etc. Therefore, normally used in this search context

Bag of Words First Step - Build a Dictionary Must be big to be expressive enough to differentiate objects So, cluster SIFT features Each cluster is a word in the dictionary But K-means clustering 10M+ descriptors is O(NK) Hierarchical K-means (Nister) Approximate K-means (Chum)

Vocabulary Tree (Nister) Vocabulary Tree Copyright David Nistér, Henrik Stewénius

Vocabulary Tree (Nister) Vocabulary Tree Copyright David Nistér, Henrik Stewénius

Vocabulary Tree (Nister) Vocabulary Tree Copyright David Nistér, Henrik Stewénius

Vocabulary Tree (Nister) Copyright David Nistér, Henrik Stewénius

Vocabulary Tree (Nister) Copyright David Nistér, Henrik Stewénius

Approximate Nearest Neighbour (Chum) Most of the time in K-Means is spent doing Nearest Neighbour Nearest Neighbour can be approximated using kd-trees O(N logk) vs. O(NK)

Another Problem - Synonyms Visual Polysemy. Single visual word occurring on different (but locally similar) parts on different object categories. Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike).

Video Google Search for recurring objects in a movie Synonyms suppressed by enforcing consistency in time Stop list used to throw out words that are too common

Photo Tourism Overview Scene reconstruction Photo Explorer Input photographs Relative camera positions and orientations Point cloud Sparse correspondence Copyright Noah Snavely

Photo Tourism Scene Recontruction Automatically estimate position, orientation, and focal length of cameras 3D positions of feature points Feature detection Pairwise feature matching Correspondence estimation Copyright Noah Snavely Incremental structure from motion

Photo Tourism Demo

Photo Tourism Limitations Matching is only performed between pairs of images Does not scale to large datasets

Chum et al. Experiment 5K labeled images of sights at Oxford 1M Flickr images from popular tags (distractors) Dictionary built from Oxford Images 16M descriptions -> 1M word dictionary Query for landmarks and calculate PR curve using different forms of query expansion

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Copyright James Philbin

Text Query Expansion In text search, some words are similar, but they are different in the dictionary e.g. gray and grey Improve results by expanding the query to include similar words e.g. "grey goose" -> "grey goose gray" Similar words are found by clustering on document data At query time, relevant clusters are found and pulled in False positives add a lot of noise to the results

Image Query Expansion - Baseline

Transitive Closure

Average Query Expansion

Recursive Average Query Expansion

Multiple Image Resolution Expansion

Copyright James Philbin

Demo http://arthur.robots.ox.ac.uk:8080/search/? id=oxc1_hertford_000011

Results - PR Curves Before & After Expansion

Results - Effect Of Distractors My distractors: 9K images from searches like "building", "cathedral", "library", "historic", "spire" etc.