Overview
- Instance-level search
- Local features and matching
- Efficient visual recognition
- Image classification & object localization
Category recognition: tasks
- Image classification: assigning a class label to the image (e.g. car: present, cow: present, bike: not present, horse: not present)
- Object localization: define the location and the category (e.g. the bounding boxes of the car and the cow)
Image classification
- Given: positive training images containing an object class, and negative training images that don't
- Classify: a test image as to whether it contains the object class or not
Bag-of-features for image classification
- Origin: texture recognition
- Texture is characterized by the repetition of basic elements, or textons
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Texture recognition
- Represent a texture by a histogram over a universal texton dictionary
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Bag-of-features
- Origin: bag-of-words (text)
- Orderless document representation: frequencies of words from a dictionary
- Classification to determine document categories
- Example bag-of-words counts per document: common: 2, 0, 1, 3; people: 3, 0, 0, 2; sculpture: 0, 1, 3, 0
Bag-of-features for image classification
- Step 1: extract regions and compute descriptors
- Step 2: find clusters and frequencies
- Step 3: compute distance matrix and classify (SVM)
[Nowak, Jurie & Triggs, ECCV 06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV 07]
Step 1: feature extraction
- Scale-invariant image regions + SIFT
- Rotation invariance: for many realistic collections, too much invariance
- Dense descriptors: improve results in the context of categories (for most categories); interest points do not necessarily capture all features
- Color-based descriptors
- Shape-based descriptors
Dense features
- Multi-scale dense grid: extraction of small overlapping patches at multiple scales
- Computation of the SIFT descriptor for each grid cell
- e.g. horizontal/vertical step size of 6 pixels, scaling factor of 1.2 per level (see the sketch below)
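A minimal sketch of dense multi-scale SIFT extraction with OpenCV, assuming a grayscale uint8 image; the 6-pixel step and 1.2 scale factor are the example values quoted above, and the patch size per level is an assumption.

```python
import cv2

def dense_sift(gray, step=6, n_levels=3, scale_factor=1.2, base_size=16.0):
    """Compute SIFT descriptors on a dense, multi-scale grid of patch centres."""
    sift = cv2.SIFT_create()
    keypoints = []
    h, w = gray.shape
    for level in range(n_levels):
        size = base_size * (scale_factor ** level)  # patch size grows per level
        for y in range(0, h, step):
            for x in range(0, w, step):
                keypoints.append(cv2.KeyPoint(float(x), float(y), size))
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors  # one 128-D SIFT vector per grid point and scale
```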
Step 2: Quantization
- Cluster the descriptors to build a visual vocabulary
Examples of visual words: airplanes, motorbikes, faces, wild cats, leaves, people, bikes
Step 2: Quantization
- Cluster descriptors: K-means or Gaussian mixture model
- Assign each descriptor to a cluster (visual word): hard or soft assignment
- Build a frequency histogram
Hard or soft assignment
- K-means: hard assignment; assign each descriptor to the closest cluster center and count the number of descriptors assigned to each center
- Gaussian mixture model: soft assignment; estimate the distance to all centers and sum over the number of descriptors
- Represent the image by a frequency histogram (a minimal sketch follows below)
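A minimal sketch of Step 2 with scikit-learn, assuming the descriptors have already been extracted; K-means gives the hard-assignment variant, and the histogram is L2-normalized as described on the next slide.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(training_descriptors, k=1000):
    """Cluster descriptors pooled from all training images into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(training_descriptors)

def bow_histogram(image_descriptors, vocabulary):
    """Hard assignment: each descriptor votes for its closest cluster centre."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
    return hist / (np.linalg.norm(hist) + 1e-12)  # L2 normalization
```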
Image representation: a histogram of visual word frequencies over the codewords
- Each image is represented by a vector, typically 1000-4000 dimensional, normalized with the L2 norm
- Fine-grained vocabulary: represents model instances; coarse-grained vocabulary: represents object categories
Step 3: Classification
- Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes (e.g. a decision boundary between zebra and non-zebra)
Training data
- Vectors are histograms, one from each training image, labelled positive or negative
- Train a classifier, e.g. an SVM (a minimal sketch follows below)
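A minimal sketch of the training step, assuming X_train holds one BoW histogram per image and y_train the corresponding positive/negative labels (both names are placeholders).

```python
from sklearn.svm import SVC

# X_train: (n_images, vocabulary_size) array of BoW histograms
# y_train: +1 for images containing the class, -1 otherwise
classifier = SVC(kernel="linear", C=1.0)
classifier.fit(X_train, y_train)
scores = classifier.decision_function(X_test)  # signed distance to the boundary
```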
Classifiers
- K-nearest neighbor classifier
- Linear classifier: Support Vector Machine
- Non-linear classifier: kernel trick or explicit lifting
Classification
- Assign an input vector to one of two or more classes
- Any decision rule divides the input space into decision regions separated by decision boundaries
Nearest neighbor classifier
- Assign the label of the nearest training data point to each test data point
- (Figure from Duda et al.: Voronoi partitioning of the feature space for 2-category 2-D and 3-D data)
K-nearest neighbors
- For a new point, find the k closest points from the training data
- The labels of the k points vote to classify (e.g. k = 5)
- Works well provided there is lots of data and the distance function is good (a brief example follows below)
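For comparison, a k-nearest-neighbor baseline on the same histograms (placeholder array names, k = 5 as in the example above).

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)  # Euclidean distance by default
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)               # majority vote among the 5 nearest neighbors
```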
Linear classifiers
- Find a linear function (hyperplane) to separate positive and negative examples: x_i positive: x_i · w + b ≥ 0; x_i negative: x_i · w + b < 0
- Which hyperplane is best?
Linear classifiers: margin
- Generalization is not good if the hyperplane passes close to the training points
- Better if a margin is introduced (figure axes: x1 = roundness, x2 = color)
Support vector machines
- Find the hyperplane that maximizes the margin between the positive and negative examples:
  positive (y_i = 1): x_i · w + b ≥ 1; negative (y_i = -1): x_i · w + b ≤ -1
- For support vectors, x_i · w + b = ±1
- The margin is 2 / ||w||
Nonlinear SVMs
- Datasets that are linearly separable work out great
- But what if the dataset is just too hard?
- We can map it to a higher-dimensional space, e.g. x → (x, x²)
Nonlinear SVMs
- General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)
Nonlinear SVMs
- The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(x_i, x_j) = φ(x_i) · φ(x_j)
- This gives a nonlinear decision boundary in the original feature space: f(x) = Σ_i α_i y_i K(x_i, x) + b
Kernels for bags of features
- Hellinger kernel: K(h1, h2) = Σ_{i=1..N} sqrt(h1(i) · h2(i))
- Histogram intersection kernel: I(h1, h2) = Σ_{i=1..N} min(h1(i), h2(i))
- Generalized Gaussian kernel: K(h1, h2) = exp(-(1/A) · D(h1, h2)), where D can be the Euclidean distance, the χ² distance, etc.
- χ² distance: D(h1, h2) = Σ_{i=1..N} (h1(i) - h2(i))² / (h1(i) + h2(i))
(a sketch of the χ² variant follows below)
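A sketch of the generalized Gaussian kernel with the χ² distance; the resulting kernel matrix can be handed to a precomputed-kernel SVM. The scaling constant A is an assumption here (it is often set to the mean pairwise distance over the training set).

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """D(h1, h2) = sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)."""
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def chi2_kernel_matrix(H1, H2, A):
    """K(h1, h2) = exp(-D(h1, h2) / A) for every pair of histograms."""
    K = np.zeros((len(H1), len(H2)))
    for i, h1 in enumerate(H1):
        for j, h2 in enumerate(H2):
            K[i, j] = np.exp(-chi2_distance(h1, h2) / A)
    return K

# Usage with a precomputed-kernel SVM (placeholder variable names):
# from sklearn.svm import SVC
# svm = SVC(kernel="precomputed").fit(chi2_kernel_matrix(H_train, H_train, A), y_train)
```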
Multi-class SVMs
- Various direct formulations exist, but they are not widely used in practice. It is more common to obtain multi-class SVMs by combining two-class SVMs in various ways (see the sketch below).
- One versus all: Training: learn an SVM for each class versus the others. Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value.
- One versus one: Training: learn an SVM for each pair of classes. Testing: each learned SVM votes for a class to assign to the test example.
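A minimal sketch of both strategies with scikit-learn's wrappers (placeholder data names).

```python
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

one_vs_all = OneVsRestClassifier(LinearSVC()).fit(X_train, y_train)  # one SVM per class
one_vs_one = OneVsOneClassifier(LinearSVC()).fit(X_train, y_train)   # one SVM per pair
labels = one_vs_all.predict(X_test)  # class of the SVM with the highest decision value
```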
Why does SVM learning work?
- It learns to separate foreground and background visual words: foreground words receive high weight, background words low weight
Illustration: localization according to visual word probability (figure: example images indicating where foreground words are more probable vs. where background words are more probable)
Illustration
- A linear SVM trained from positive and negative window descriptors
- A few of the highest-weighted descriptor vector dimensions (= 'PAS + tile') lie on the object boundary (= local shape structures common to many training exemplars)
Bag-of-features for image classification
- Excellent results in the presence of background clutter (example classes: bikes, books, buildings, cars, people, phones, trees)
Examples of misclassified images
- Books misclassified as faces, faces, buildings
- Buildings misclassified as faces, trees, trees
- Cars misclassified as buildings, phones, phones
Bag of visual words: summary
Advantages:
- largely unaffected by position and orientation of the object in the image
- fixed-length vector irrespective of the number of detections
- very successful in classifying images according to the objects they contain
Disadvantages:
- no explicit use of the configuration of visual word positions
- poor at localizing objects within an image
Evaluation of image classification: PASCAL VOC [05-10] datasets
PASCAL VOC 2007:
- Training and test datasets available; used to report state-of-the-art results
- Collected in January 2007 from Flickr: 500,000 images downloaded and a random subset selected
- 20 classes; class labels per image + bounding boxes
- 5011 training images, 4952 test images
- Evaluation measure: average precision (see the sketch below)
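A sketch of the evaluation measure; note that scikit-learn's average_precision_score is not the exact 11-point interpolated AP used for VOC 2007, but it illustrates the idea (placeholder variable names).

```python
from sklearn.metrics import average_precision_score

# y_true: 1 if the class is present in a test image, 0 otherwise
# scores: the classifier's decision values for that class
ap = average_precision_score(y_true, scores)

# per_class_ap: list of AP values, one per class (placeholder)
mean_ap = sum(per_class_ap) / len(per_class_ap)  # mAP over the 20 VOC classes
```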
PASCAL 2007 dataset
Evaluation
Overview
- Instance-level search
- Local features and matching
- Efficient visual recognition
- Image classification & object localization
Recognition
- Classification: object present/absent in an image; often a significant amount of background clutter is present
- Localization / detection: localize the object within the frame, as a bounding box or a pixel-level segmentation
Pixel-level object classification
Difficulties
- Intra-class variations
- Scale and viewpoint change
- Multiple aspects of categories
Approaches
- Intra-class variation => modeling of the variations, mainly by learning from a large dataset, for example with SVMs
- Scale + limited viewpoint changes => multi-scale approach or invariant local features
- Multiple aspects of categories => separate detectors for each aspect (e.g. frontal/profile face), or build an approximate 3D category model
Approaches
- Localization (bounding box): sliding window approach
- Localization (segmentation): pixel-based classification + MRF
Localization with a sliding window: training
- Positive examples and negative examples
- Compute a description + learn a classifier
Localization with a sliding window: testing
- Test at multiple locations and scales
- Find local maxima, then apply non-maxima suppression (see the sketch below)
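A minimal sketch of multi-scale sliding-window testing, assuming score_fn is any trained window classifier (a placeholder); non-maxima suppression over the returned boxes is left out for brevity.

```python
import cv2

def sliding_window_detect(image, score_fn, win=(64, 128), step=8, scale=1.2, thresh=0.0):
    """Scan the classifier over all locations of an image pyramid."""
    detections, s, img = [], 1.0, image
    while img.shape[0] >= win[1] and img.shape[1] >= win[0]:
        for y in range(0, img.shape[0] - win[1] + 1, step):
            for x in range(0, img.shape[1] - win[0] + 1, step):
                score = score_fn(img[y:y + win[1], x:x + win[0]])
                if score > thresh:
                    # map the box back to the original image coordinates
                    detections.append((x * s, y * s, win[0] * s, win[1] * s, score))
        s *= scale
        img = cv2.resize(image, (int(image.shape[1] / s), int(image.shape[0] / s)))
    return detections  # then keep local maxima via non-maxima suppression
```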
Haar wavelet / SVM human detector [Papageorgiou & Poggio, 1998]
- Haar wavelet descriptors (1326-D descriptor per window)
- Training set: 2k positive / 10k negative examples
- Support vector machine classifier
- Multi-scale search over the test image
Which descriptors are important?
- 32x32 and 16x16 descriptors: mean response difference between positive & negative training examples
- Essentially just a coarse-scale human silhouette template!
Some Detection Results
The Viola/Jones face detector
- A seminal approach to real-time object detection
- Training is slow, but detection is very fast
- Key ideas: integral images for fast feature evaluation; boosting for feature selection; attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Image features: rectangle filters
Value = Σ(pixels in white area) − Σ(pixels in black area)
Fast computation with integral images
- The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive
- This can quickly be computed in one pass through the image (a minimal sketch follows below)
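A minimal sketch of the integral image and of a rectangle sum obtained with four array lookups (assuming a grayscale numpy array).

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of all pixels above and to the left of (x, y), inclusive."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): 4 lookups."""
    a = ii[y - 1, x - 1] if x > 0 and y > 0 else 0  # above-left of the rectangle
    b = ii[y - 1, x + w - 1] if y > 0 else 0        # above the rectangle
    c = ii[y + h - 1, x - 1] if x > 0 else 0        # left of the rectangle
    d = ii[y + h - 1, x + w - 1]                    # bottom-right corner
    return d - b - c + a
```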
Feature selection
- For a 24x24 detection region, the number of possible rectangle features is ~160,000!
- At test time, it is impractical to evaluate the entire feature set
- Can we create a good classifier using just a small subset of all possible features? How do we select such a subset?
Boosting
- Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier
- Training consists of multiple boosting rounds
- During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners
- "Hardness" is captured by weights attached to training examples (see the sketch below)
Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.
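A sketch of the idea with scikit-learn's AdaBoost; the default weak learner is a depth-1 decision tree (a "stump"), which plays the role of a single-threshold rectangle-feature classifier. X_train and y_train are placeholder names for per-window feature responses and face/non-face labels.

```python
from sklearn.ensemble import AdaBoostClassifier

# X_train: rectangle-filter responses per training window, y_train: face / non-face
boost = AdaBoostClassifier(n_estimators=200)
boost.fit(X_train, y_train)
selected = boost.feature_importances_.argsort()[::-1][:10]  # most useful features
```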
Boosting for face detection
- First two features selected by boosting: this feature combination can yield a 100% detection rate at a 50% false positive rate
Attentional cascade
- We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
- A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
- A negative outcome at any point leads to the immediate rejection of the sub-window
(Image sub-window → Classifier 1 → Classifier 2 → Classifier 3 → face; a negative response at any stage → non-face; see the sketch below)
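A sketch of the cascade logic, assuming stages is a list of (classifier, threshold) pairs ordered from simple to complex (all names are hypothetical).

```python
def cascade_classify(window_features, stages):
    """Return True only if the window passes every stage of the cascade."""
    for classifier, threshold in stages:
        if classifier.decision_function([window_features])[0] < threshold:
            return False  # rejected early; later, more expensive stages are skipped
    return True           # survived all stages: declared a face
```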
The implemented system: training data
- 5000 faces: all frontal, rescaled to 24x24 pixels
- 300 million non-face sub-windows from 9500 non-face images
- Faces are normalized for scale and translation
- Many variations: across individuals, illumination, pose
Result of Face Detector on Test Images
Profile Detection
Profile Features
Summary: Viola/Jones detector
- Rectangle features
- Integral images for fast computation
- Boosting for feature selection
- Attentional cascade for fast rejection of negative windows
- Available in OpenCV
Histogram of Oriented Gradients (HOG) human detector [Dalal & Triggs, CVPR 2005]
- Descriptors are a grid of local Histograms of Oriented Gradients (HOG)
- Linear SVM for runtime efficiency
- Tolerates different poses, clothing, lighting and background
- Assumes upright, fully visible people
- Importance-weighted responses (see the sketch below)
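OpenCV ships a HOG person detector trained in this style; a minimal usage sketch, assuming image is a loaded frame.

```python
import cv2

hog = cv2.HOGDescriptor()  # default 64x128 detection window, 9 orientation bins
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)  # draw detections
```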
Human detection