1. Bag of visual words model: recognizing object categories

Similar documents
Local features and matching. Image classification & object localization

Probabilistic Latent Semantic Analysis (plsa)

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Object class recognition using unsupervised scale-invariant learning

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Discovering objects and their location in images

Visual Categorization with Bags of Keypoints

Discovering objects and their location in images

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

The Delicate Art of Flower Classification

A Simple Introduction to Support Vector Machines

A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK

Lecture 2: The SVM classifier

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER

View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Support Vector Machine (SVM)

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Review of Codebook Models in Patch-Based Visual Object Recognition

Group Sparse Coding. Fernando Pereira Google Mountain View, CA Dennis Strelow Google Mountain View, CA

Image Classification for Dogs and Cats

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Jiří Matas. Hough Transform

Semantic Image Segmentation and Web-Supervised Visual Learning

Classifying Manipulation Primitives from Visual Data

ImageCLEF 2011

Object Recognition. Selim Aksoy. Bilkent University

Bag-of-Features Model Application to Medical Image Classification

Several Views of Support Vector Machines

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

Big Data Analytics CSCI 4030

Blocks that Shout: Distinctive Parts for Scene Classification

Environmental Remote Sensing GEOG 2021

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Neural Networks and Support Vector Machines

Multi-View Object Class Detection with a 3D Geometric Model

RUN-LENGTH ENCODING FOR VOLUMETRIC TEXTURE

Randomized Trees for Real-Time Keypoint Recognition

Image Segmentation and Registration

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration

Driver Cell Phone Usage Detection From HOV/HOT NIR Images

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

Support Vector Machines Explained

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Bildverarbeitung und Mustererkennung Image Processing and Pattern Recognition

Class-specific Sparse Coding for Learning of Object Representations

Machine Learning in Computer Vision A Tutorial. Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN

A Weakly Supervised Approach for Semantic Image Indexing and Retrieval

Mining Mid-level Features for Image Classification

Finding people in repeated shots of the same scene

BRIEF: Binary Robust Independent Elementary Features

Statistical Models in Data Mining

Weak Hypotheses and Boosting for Generic Object Detection and Recognition

3D Model based Object Class Detection in An Arbitrary View

An Automatic and Accurate Segmentation for High Resolution Satellite Image S.Saumya 1, D.V.Jiji Thanka Ligoshia 2

Who are you? Learning person specific classifiers from video

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Feature description based on Center-Symmetric Local Mapped Patterns

Point Matching as a Classification Problem for Fast and Robust Object Pose Estimation

Tracking in flussi video 3D. Ing. Samuele Salti

Semantic Description of Humans in Images

Distinctive Image Features from Scale-Invariant Keypoints

TouchPaper - An Augmented Reality Application with Cloud-Based Image Recognition Service

AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION

Knowledge Discovery from patents using KMX Text Analytics

Object Class Recognition by Unsupervised Scale-Invariant Learning

FPGA-BASED PARALLEL HARDWARE ARCHITECTURE FOR REAL-TIME OBJECT CLASSIFICATION. by Murad Mohammad Qasaimeh

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

KEITH LEHNERT AND ERIC FRIEDRICH

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

Figure 1: Saving Project

What is Data mining?

FastKeypointRecognitioninTenLinesofCode

The Visual Internet of Things System Based on Depth Camera

Classification using Intersection Kernel Support Vector Machines is Efficient

A Bayesian Framework for Unsupervised One-Shot Learning of Object Categories

Android Ros Application

UNIVERSITY OF OSLO. Faculty of Mathematics and Natural Sciences

The Artificial Prediction Market

Support Vector Machines

Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability

Support Vector Machines with Clustering for Training with Very Large Datasets

The Data Mining Process

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Learning Mid-Level Features For Recognition

Distributed Kd-Trees for Retrieval from Very Large Image Collections

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

Data Mining and Visualization

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

Signature Segmentation from Machine Printed Documents using Conditional Random Field

Hierarchical Learning of Dominant Constellations for Object Class Recognition

Traffic Flow Monitoring in Crowded Cities

Transcription:

1. Bag of visual words model: recognizing object categories 1 1 1

Problem: Image Classification Given: positive training images containing an object class, and negative training images that don t Classify: a test image as to whether it contains the object class or not? 2 2 2

Weakly-supervised learning Learn model from a set of training images containing object instances Know if image contains object or not But no segmentation of object or manual selection of features 3 3 3

Three stages: 1. Represent each training image by a vector Use a bag of visual words representation 2. Train a classify to discriminate vectors corresponding to positive and negative training images Use a Support Vector Machine (SVM) classifier 3. Apply the trained classifier to the test image 4

Representation: Bag of visual words Visual words are iconic image patches or fragments represent the frequency of word occurrence but not their position Image Collection of visual words 5

Example: Learn visual words by clustering 10 10 Interest point features: textured neighborhoods are selected produces 100-1000 regions per image Weber, Welling & Perona 2000 6

Learning visual words by clustering ctd Pattern Space (100+ dimensions) 7

Example of visual words learnt by clustering faces 100-1000 images ~100 visual words 8

Image representation normalized histogram detect interest point features find closest visual word to region around detected points record number of occurrences, but not position frequency.. codewords 9

Example Image collection: four object classes + background Faces 435 Motorbikes 800 Airplanes 800 Cars (rear) 1155 Background 900 Total: 4090 The Caltech 5 10

Represent an image as a histogram of visual words iconic image fragments 0 1 0 2... Detect affine covariant regions Bag of words model Represent each region by a SIFT descriptor Build visual vocabulary by k-means clustering (K~1,000) Assign each region to the nearest cluster centre 11

Visual vocabulary for affine covariant patches Detect patches [Mikolajczyk and Schmid 02] [Matas et al. 02] Normalize patch Compute SIFT descriptor [Lowe 99] Vector quantize descriptors from a set of training images using k-means + + 12

Descriptors SIFT [Lowe 99] distribution of the gradient over an image patch image patch gradient x 3D histogram y 4x4 location grid and 8 orientations (128 dimensions) very good performance in image matching [Mikolaczyk and Schmid 03] 13

Vector quantize the descriptor space (SIFT) The same visual word 14

Each image: assign all detections to their visual words gives bag of visual word representation normalized histogram of word frequencies also called bag of key points 15

Visual words from affine covariant patches Vector quantize SIFT descriptors to a vocabulary of iconic visual words. Design of descriptors makes these words invariant to: illumination affine transformations (viewpoint) Size (granularity) of vocabulary is an important parameter fine grained represent model instances coarse grained represent object categories 16

Examples of visual words 17

More visual words 18

Visual synonyms and polysemy Visual Polysemy: Single visual word occurring on different (but locally similar) parts on different object categories. Visual Synonyms: Two different visual words representing a similar part of an object (wheel of a motorbike). 19

Training data: vectors are histograms, one from each training image positive negative Train classifier,e.g.svm 20

21

22

23

24

SVM classifier with kernels f(x) = NX i α i k(x i, x)+b N = size of training data weight (may be zero) support vector f(x) ( 0 positive class < 0 negative class 25

Linear separability linearly separable linear kernel sufficient not linearly separable use non-linear kernel 26

Some popular kernels Linear: K(x, y) =x > y Polynomial: K(x, y) = ³ x > y + c n Radial basis function: K(x, y) =e γ x y 2 Chi-squared: K(x, y) =e γχ2 (x,y) where χ 2 (x, y) = P j ( x j y j ) 2 x j +y j 27

Advantage of linear kernels at test time f(x) = NX i α i k(x i, x)+b N = size of training data f(x) = NX i α i x i > x + b = w > x + b Independent of size of training data 28

Manually gathered training images Current Paradigm for learning an object category model Test images Visual words Learn a visual category model Evaluate classifier / detector.......... 29

Example: weak supervision Training 50% images No identifcation of object within image Motorbikes Airplanes Frontal Faces Testing 50% images Simple object present/absent test Cars (Rear) Background Learning SVM classifier 2 Gaussian kernel using χ as distance between histograms Result Between 98.3 100% correct, depending on class Csurka et al 2004 Zhang et al 2005 30

Localization according to visual word probability sparse segmentation 20 40 60 80 100 120 20 40 60 80 100 120 50 100 150 200 50 100 150 200 foreground word more probable background word more probable 31

Why does SVM learning work? Learns foreground and background visual words w foreground words positive weight background words negative weight 32

Bag of visual words summary Advantages: largely unaffected by position and orientation of object in image fixed length vector irrespective of number of detections Very successful in classifying images according to the objects they contain Still requires further testing for large changes in scale and viewpoint Disadvantages: No explicit use of configuration of visual word positions Poor at localizing objects within an image 33