Image Processing Technology and Applications: Object Recognition and Detection in Natural Images


Katherine Bouman, MIT. November 26, 2012

Goal: to be able to automatically understand the content of an image. Input image → black box → output labels (chameleon, hand, grass, road, sand).

Motivation: object detection in natural images has recently started to see a lot of commercial success, e.g. automatic focus, MobilEye, Google Goggles.

Motivation: the same black-box labels (chameleon, hand, grass, road, sand) can be used for indexing images into a database and for retrieval, e.g. answering a query with the result "This image contains a chameleon."

Is Object Detection Really That Hard? Find the chair in this image: the output of normalized correlation with a chair template finds it, "This is a chair." Slide Credit: A. Torralba

Is Object Detection Really That Hard? Find the chairs in this cluttered image: the output of normalized correlation is garbage! Slide Credit: A. Torralba

Challenges of Object Detection: viewpoint variation, illumination, occlusion, scale, deformation/articulation, intra-class variation, and background clutter. Image Credits: S. Ullman, A. Torralba

General Approach, Pipeline for Learning a Model: take positive examples and negative examples, extract a feature vector from each, and learn a classifier.

General Approach, Pipeline for Detecting Objects: given an input image, extract feature vectors and classify them to answer "Is there a squirrel here?"

Questions to Answer: What features should we use (intensity, color, gradient information, etc.)? What models should we use? How can we realistically implement an algorithm?

What features should we use?

Common Image Descriptors for Detection: descriptors encode a local neighborhood window around keypoints. Descriptors used in object detection commonly try to capture gradient information: human perception is sensitive to gradient orientation, and gradients are invariant to changes in lighting and to small deformations.

Common Image Descriptors for Detection: the most common image descriptors currently used in object detection are SIFT (Scale Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and many variants of these.

SIFT (Scale Invariant Feature Transform): input an image, extract keypoints (finds corners and determines the scale and orientation of each keypoint), then compute a descriptor for each keypoint (a histogram of gradients in a Gaussian window around the keypoint). Pipeline: input image → extract keypoints → compute descriptors. Image Credit: VLFeat

SIFT (Scale Invariant Feature Transform): compute the gradient for each pixel in a local neighborhood window (the window size is determined by the scale of the keypoint), quantize gradients into typically 8 orientation directions, pool them into a 4x4 grid of histograms, and weight each gradient magnitude by a Gaussian window centered on the keypoint. The 8x4x4 = 128 dimensional output vector is normalized to unit length. (The figure shows only a 2x2 grid, but 4x4 is generally used.) Image Credit: D. Lowe
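
To make the pooling step concrete, here is a minimal sketch, assuming NumPy and a grayscale patch already normalized to the keypoint's scale and orientation; `sift_like_descriptor` and its parameters are illustrative names, not the Lowe/VLFeat implementation (trilinear interpolation and the second clipped renormalization are omitted).

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Minimal SIFT-style descriptor for a square grayscale patch."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                     # image gradients
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # orientation in [0, 2*pi)

    # Gaussian weighting centered on the keypoint (the patch center).
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sigma = 0.5 * h
    gauss = np.exp(-((xs - w / 2) ** 2 + (ys - h / 2) ** 2) / (2 * sigma ** 2))
    mag = mag * gauss

    # Pool weighted magnitudes into a 4x4 grid of 8-bin orientation histograms.
    desc = np.zeros((n_cells, n_cells, n_bins))
    cell_h, cell_w = h // n_cells, w // n_cells
    bins = np.minimum((ori / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    for i in range(n_cells):
        for j in range(n_cells):
            sl = (slice(i * cell_h, (i + 1) * cell_h),
                  slice(j * cell_w, (j + 1) * cell_w))
            desc[i, j] = np.bincount(bins[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=n_bins)

    desc = desc.ravel()                             # 4*4*8 = 128 dimensions
    return desc / (np.linalg.norm(desc) + 1e-12)    # normalize to unit length
```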

SIFT (Scale Invariant Feature Transform): match groups of keypoints across images. The descriptor is invariant to scale and to some changes in lighting and orientation, which makes it great for finding the same instance of an object! Image Credit: D. Lowe

SIFT (Scale Invariant Feature Transform): it is not good at finding different instances of an object. Image Credit: D. Lowe

DSIFT (Dense Scale Invariant Feature Transform): input an image and compute a descriptor every k pixels on a regular grid, using a fixed scale for each descriptor; the result is no longer scale invariant.
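
A dense variant can reuse the descriptor sketch above, sampling patches on a regular grid at a single fixed scale; `step` and `patch_size` are illustrative parameters.

```python
def dense_descriptors(image, step=8, patch_size=16):
    """Compute a descriptor every `step` pixels at one fixed scale.

    Reuses sift_like_descriptor from the previous sketch; since there is no
    keypoint detection, scale invariance is lost.
    """
    h, w = image.shape
    locations, descriptors = [], []
    for y in range(0, h - patch_size + 1, step):
        for x in range(0, w - patch_size + 1, step):
            patch = image[y:y + patch_size, x:x + patch_size]
            locations.append((y, x))
            descriptors.append(sift_like_descriptor(patch))
    return locations, descriptors
```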

HOG (Histogram of Oriented Gradients): input an image; normalize gamma and color; compute gradients; accumulate weighted votes for gradient orientation over spatial bins (cells); normalize contrast within overlapping blocks of cells; collect the HOGs for all blocks over the image.
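
For reference, a minimal sketch of computing such a descriptor with scikit-image's `hog` function, assuming scikit-image is available; the parameter values are illustrative choices, not the ones used in the lecture.

```python
from skimage import color, data, feature

image = color.rgb2gray(data.astronaut())   # any grayscale image works here

hog_vector = feature.hog(
    image,
    orientations=9,              # gradient orientation bins per cell
    pixels_per_cell=(8, 8),      # spatial bins ("cells")
    cells_per_block=(2, 2),      # contrast-normalized overlapping blocks
    block_norm="L2-Hys",
)
print(hog_vector.shape)          # one long vector collecting all blocks
```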


What models should we use?

Families of Detection/Recognition Models: Bag of Words Models, Rigid Template Models, and Structure Models (part and voting models). Representative references: Csurka, Dance, Fan, Willamowski, and Bray 2004; Sivic, Russell, Freeman, and Zisserman, ICCV 2005; Sirovich and Kirby 1987; Turk and Pentland 1991; Dalal and Triggs 2006; Fischler and Elschlager 1973; Burl, Leung, and Perona 1995; Weber, Welling, and Perona 2000; Fergus, Perona, and Zisserman, CVPR 2003; Viola and Jones, ICCV 2001; Heisele, Poggio, et al., NIPS 2001; Schneiderman and Kanade 2004; Vidal-Naquet and Ullman 2003. Image Credit: A. Torralba

Families of Detection/Recognition Models: the models capture varying degrees of spatial relationships between features, from bag of words (none) through structure models to rigid template models (a fixed spatial layout).

Bag of Words. Image Credit: L. Fei-Fei

Bag of Words: extract local descriptors from the image; learn a visual vocabulary (codebook) of local descriptors; quantize the local descriptors using the codebook; represent images by the frequencies of visual words. Slide Credit: J. Hays

Bag of Words, Learning a Codebook: extract features (SIFT, DSIFT) from all training images and then cluster them, e.g. with k-means. How to choose the vocabulary size? Too small: visual words are not representative of all patches. Too large: quantization artifacts and overfitting. Image Credit: J. Hays
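
A minimal codebook-learning sketch, assuming scikit-learn is available and that `descriptors` stacks the local descriptors of all training images; the names and the vocabulary size are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# descriptors: (N, 128) array stacking local descriptors from all training images.
descriptors = np.random.rand(10000, 128)   # placeholder data for the sketch

vocab_size = 500                           # vocabulary size is a design choice
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(descriptors)

codebook = kmeans.cluster_centers_         # (vocab_size, 128) visual words
```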

Bag of Words, Coding Local Descriptors: Vector Quantization coding maps each feature to the index of the nearest visual word in the codebook; Locality-Constrained Linear Coding writes each feature as a linear combination of the visual words. Image Credit: J. Hays
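
A sketch of the simpler of the two coding schemes, vector quantization followed by a word-frequency histogram, reusing `codebook` from the previous sketch; the distance computation is the straightforward brute-force version.

```python
import numpy as np

def vq_encode(descriptors, codebook):
    """Map each descriptor to the index of its nearest visual word."""
    # Squared Euclidean distances between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """Represent an image by the normalized frequencies of its visual words."""
    words = vq_encode(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-12)
```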

Bag of Words, Coding Local Descriptors: the slide compares an original image with images generated from the average patch associated with each visual word, under Vector Quantization coding and Locality-Constrained Linear Coding.

Bag of Words, Pooling: sum pooling versus max pooling of the coded descriptors.
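
Sum versus max pooling over a set of coded descriptors, as a small sketch; here `codes` is assumed to be an (N, vocab_size) matrix of per-descriptor codes, e.g. one-hot vectors for VQ or LLC coefficients.

```python
def sum_pool(codes):
    """Sum (or average) each code dimension over all descriptors in the image."""
    return codes.sum(axis=0)

def max_pool(codes):
    """Keep the strongest response of each code dimension over the image."""
    return codes.max(axis=0)
```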

Learning Detection Classifiers: extract regions from images, both regions containing the desired object and regions containing everything other than the desired object; compute a feature vector for each region; train an SVM (linear vs. non-linear kernel) on thousands of positive examples and millions of negative examples.
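
A minimal classifier-training sketch with scikit-learn, assuming `pos_features` and `neg_features` hold the pooled feature vectors of positive and negative regions (illustrative names and placeholder data); a linear kernel is shown, and a kernel SVM would use `sklearn.svm.SVC` instead.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder features: rows are regions, columns are feature dimensions.
pos_features = np.random.rand(200, 500)    # regions containing the object
neg_features = np.random.rand(2000, 500)   # regions containing anything else

X = np.vstack([pos_features, neg_features])
y = np.concatenate([np.ones(len(pos_features)), -np.ones(len(neg_features))])

clf = LinearSVC(C=1.0)              # linear kernel; C controls regularization
clf.fit(X, y)

scores = clf.decision_function(X)   # signed distance to the separating hyperplane
```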

Bag of Words, Where It Works: texture recognition. Texture is characterized by the repetition of basic elements, or textons; for stochastic textures, it is the identity of the textons and not their spatial arrangement that matters. Image Credit: J. Malik

Bag of Words, Where It Fails: because the representation discards all spatial layout, very different images can map to similar histograms. Slide Credit: J. Hays

Spatial Pyramids. Slide Credit: J. Hays

Spatial Pyramids: an extension of the bag of words model; a locally orderless representation at several levels of resolution that retains some spatial information. Image Credit: S. Lazebnik
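
A sketch of a two-level spatial pyramid built on top of the bag-of-words histogram above: one histogram for the whole image plus one per cell of a 2x2 grid, concatenated. The level weighting of Lazebnik et al. is omitted, and `keypoint_xy` and `words` are assumed per-descriptor locations and visual-word indices.

```python
import numpy as np

def spatial_pyramid(keypoint_xy, words, image_size, vocab_size, levels=(1, 2)):
    """Concatenate per-cell bag-of-words histograms over 1x1 and 2x2 grids."""
    h, w = image_size
    hists = []
    for n in levels:                           # n x n grid at this level
        cell_h, cell_w = h / n, w / n
        for i in range(n):
            for j in range(n):
                in_cell = [word for (x, y), word in zip(keypoint_xy, words)
                           if i * cell_h <= y < (i + 1) * cell_h
                           and j * cell_w <= x < (j + 1) * cell_w]
                hist = np.bincount(np.array(in_cell, dtype=int),
                                   minlength=vocab_size).astype(float)
                hists.append(hist)
    full = np.concatenate(hists)
    return full / (full.sum() + 1e-12)
```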

Spatial Pyramids, Learning Receptive Fields: recent work learns the pooling receptive fields rather than using a regular grid. Example motivation: sunset and highway images; the slide compares traditional receptive fields with learned receptive fields. Image Credit: Y. Jia

Rigid Template Models: find the HOG feature vector for each image; originally developed for pedestrian detection by Dalal and Triggs. Image Credit: Dalal

Rigid Template Models, Detecting Objects: convolve the low resolution HOG of the test image with the rigid template (e.g. a pedestrian model) to obtain a response map. Image Credit: P. Felzenszwalb
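
A sketch of scoring a rigid template against a HOG feature map with a sliding dot product, written with explicit loops for clarity; `feature_map` is assumed to be H x W x D HOG cells and `template` h x w x D.

```python
import numpy as np

def template_response(feature_map, template):
    """Score of the template at every valid cell position (sliding dot product)."""
    H, W, D = feature_map.shape
    h, w, _ = template.shape
    response = np.full((H - h + 1, W - w + 1), -np.inf)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = feature_map[y:y + h, x:x + w, :]
            response[y, x] = np.sum(window * template)
    return response

# Detections are the positions where the response exceeds a threshold.
```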

Rigid Template Models, Mixture Models: objects take on different appearances (pose, multiple types of the same object), so for each object we create a mixture model to capture the various appearances of the same object. Image Credit: P. Felzenszwalb

Structure Models: part models (a few parts, ~6), voting models (many patches, >100), and deformable models (no parts). Slide Credit: A. Torralba

Part Based Models. Image Credit: Fischler & Elschlager

Part Based Model: models an object as a number of smaller parts that are allowed to deviate slightly from an average appearance. Star model: a coarse root filter plus higher resolution part filters, with a spatial model for the location of each part relative to the root; the parts are not positioned the same in each detection. Image Credit: P. Felzenszwalb

Part Based Model, Training Using Latent SVM: starting from an initialization of the part locations, use an SVM to find a classifier, then use the classifier to estimate the latent variables (the part locations), and iterate.

Part Based Model, Detecting Objects: for a test image and a model (e.g. a pedestrian model), extract HOG features at multiple scales; convolve the low resolution HOG with the root filter and the high resolution HOG with the part filters; transform the part responses to point to where the star's root should be located; and add the responses to find the score for each root location. Image Credit: P. Felzenszwalb
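
A much-simplified scoring sketch under stated assumptions: one scale, precomputed root and part response maps (e.g. from `template_response` above), each part anchored at an offset from the root, and a max over a small search window with a quadratic deformation cost. The real model uses a finer scale for parts and a generalized distance transform; this only shows the shape of the computation, not the Felzenszwalb et al. implementation.

```python
import numpy as np

def star_model_score(root_resp, part_resps, anchors, search=4, defo=0.1):
    """Combine root and part responses into a score map over root locations.

    root_resp:  (H, W) response of the root filter
    part_resps: list of (H, W) part filter responses (same resolution here)
    anchors:    list of (dy, dx) ideal part offsets relative to the root
    """
    H, W = root_resp.shape
    score = root_resp.copy()
    for resp, (ay, ax) in zip(part_resps, anchors):
        for y in range(H):
            for x in range(W):
                best = -np.inf
                # Let the part move around its anchor, paying a quadratic
                # deformation cost for the displacement.
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        py, px = y + ay + dy, x + ax + dx
                        if 0 <= py < H and 0 <= px < W:
                            val = resp[py, px] - defo * (dy * dy + dx * dx)
                            best = max(best, val)
                score[y, x] += best
    return score
```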

Voting Models: create weak detectors by using parts and voting for the object's center location; examples shown for a car model and a screen model. Slide Credit: A. Torralba

Voting Models, Collecting Parts: first we collect a set of part templates from a set of training objects. Slide Credit: A. Torralba

Voting Models, Weak Part Detectors: we now define a family of weak detectors as the correlation of a part template with the image (shown as a pictorial equation on the slide); the result is better than chance. Slide Credit: A. Torralba

Voting Models, Weak Part Detectors: we can do a better job by applying the part templates to filtered images (pictorial equation on the slide); the result is still a weak detector, but better than before. Slide Credit: A. Torralba
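
A toy sketch of one such weak detector: correlate a part template with the image (or with filtered image channels), threshold the response, and let each firing location vote for the object center through the part's offset. All names, the use of plain correlation, and the one-vote scheme are assumptions for illustration; `template_response` is reused from the earlier sketch.

```python
import numpy as np

def weak_part_votes(image_feat, part_template, center_offset, threshold):
    """Vote map for the object center from a single thresholded part response.

    image_feat:    (H, W, D) feature map (e.g. filtered image channels)
    part_template: (h, w, D) part template
    center_offset: (dy, dx) from the part's location to the object center
    """
    resp = template_response(image_feat, part_template)   # earlier sketch
    votes = np.zeros_like(resp)
    dy, dx = center_offset
    ys, xs = np.nonzero(resp > threshold)
    for y, x in zip(ys, xs):
        cy, cx = y + dy, x + dx
        if 0 <= cy < votes.shape[0] and 0 <= cx < votes.shape[1]:
            votes[cy, cx] += 1                             # one vote per firing part
    return votes

# Summing (or boosting-weighting) the vote maps of many weak detectors
# gives a stronger detector for the object center.
```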

Voting Model Example, Screen Detection: the slide shows the feature output, the thresholded output, the strong classifier response as features are added, and the strong classifier at iteration 200. Slide Credit: A. Torralba

Boosting: defines a classifier using an additive model, H(x) = Σ_t α_t h_t(x), where H is the strong classifier, x the feature vector, α_t a weight, and h_t a weak classifier. Slide Credit: A. Torralba
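
A compact sketch of this additive construction with axis-aligned decision stumps as the weak classifiers (discrete AdaBoost, labels in {-1, +1}); the weight update matches the w_t ← w_t exp{-y_t H_t} step on the following slides, with H_t read here as the weighted weak response α_t h_t(x_t). This is a simplified stand-in, not the exact variant used in the lecture.

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps; y in {-1, +1}."""
    n, d = X.shape
    w = np.ones(n) / n                      # one weight per data point
    stumps = []                             # (feature, threshold, sign, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):                  # stump with the lowest weighted error
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)      # re-weight: misclassified points grow,
        w /= w.sum()                        # so the previous stump is at chance again
        stumps.append((j, thr, sign, alpha))
    return stumps

def adaboost_predict(stumps, X):
    """Strong classifier H(x) = sign(sum_t alpha_t h_t(x))."""
    H = np.zeros(len(X))
    for j, thr, sign, alpha in stumps:
        H += alpha * sign * np.where(X[:, j] > thr, 1, -1)
    return np.sign(H)
```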

Boosting Example: boosting is a sequential procedure. Each data point has a class label, y_t = +1 or -1 (shown as two kinds of markers), and a weight, w_t = 1. Slide Credit: A. Torralba

Boosting Example: with each data point labeled y_t = ±1 and weighted w_t = 1, we pick the weak classifier that seems to be the best. Slide Credit: A. Torralba

Boosting Example: we then update the weights, w_t ← w_t exp{-y_t H_t}, which sets a new problem for which the previous weak classifier performs at chance again; this pick-a-weak-classifier-and-reweight step is repeated over several iterations on the slides. Slide Credit: A. Torralba

Boosting Example: the strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f_1, f_2, f_3, f_4. Slide Credit: A. Torralba

Learning Detection Classifiers, Exemplar SVMs: learn a separate classifier for every positive example (against millions of negative examples); at test time each classifier is applied to the image. Image Credit: A. Malisiewicz

Learning Detection Classifiers, Exemplar SVMs: allows for more accurate correspondence and information transfer. Image Credit: A. Malisiewicz
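
A sketch of the exemplar idea: one linear SVM per positive example, each trained against the shared negative set. scikit-learn is assumed, and the class_weight boosting of the single positive is an illustrative choice, not the exact recipe from the exemplar-SVM work.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svms(pos_features, neg_features):
    """One linear classifier per positive exemplar vs. all negatives."""
    classifiers = []
    y = np.concatenate([[1], -np.ones(len(neg_features))])
    for exemplar in pos_features:
        X = np.vstack([exemplar[None, :], neg_features])
        clf = LinearSVC(C=1.0, class_weight={1: 50.0, -1: 1.0})
        clf.fit(X, y)
        classifiers.append(clf)
    return classifiers

def exemplar_scores(classifiers, features):
    """At test time every exemplar classifier is applied to the candidates."""
    return np.stack([clf.decision_function(features) for clf in classifiers], axis=1)
```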

How can we realistically implement an algorithm?

Learning Detection Classifiers (recap): extract positive and negative regions, compute a feature vector for each, and train an SVM on thousands of positive and millions of negative examples.

Training SVMs Using Hard Negative Mining: there are too many negative examples to train against all of them (time and memory constraints), so be smart about how you choose which negatives to train against. The slides step through the idea with positive and negative points: train an initial classifier on a subset of the negatives, find the negatives it classifies incorrectly (the hard negatives), add them to the training set, and retrain.
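
A sketch of that mining loop under stated assumptions: LinearSVC as the classifier, features precomputed, and `initial_neg`, `neg_pool`, and the cache size as illustrative names and values.

```python
import numpy as np
from sklearn.svm import LinearSVC

def hard_negative_mining(pos, initial_neg, neg_pool, rounds=3, cache_size=5000):
    """Iteratively add the highest-scoring (hardest) negatives to the training set."""
    neg = initial_neg
    clf = None
    for _ in range(rounds):
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
        clf = LinearSVC(C=1.0)
        clf.fit(X, y)

        # Score the big pool of unused negatives; the ones the classifier
        # gets wrong (positive score) are the hard negatives.
        scores = clf.decision_function(neg_pool)
        hard = neg_pool[scores > 0]
        if len(hard) == 0:
            break
        # Keep the negative cache bounded for time/memory reasons.
        neg = np.vstack([neg, hard])[-cache_size:]
    return clf
```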

Detecting Objects: we must run the classifier at every position and every scale in order to detect objects.
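
A sliding-window sketch over an image pyramid, written with plain nested loops; `classifier.decision_function`, the window size, the stride, and the scale factor are assumptions for illustration, `extract_features` stands in for whichever descriptor is used, and scikit-image is assumed for resizing.

```python
from skimage.transform import rescale   # assumed available for image resizing

def sliding_window_detect(image, classifier, extract_features,
                          window=(64, 128), scale_step=0.8, threshold=0.0):
    """Run the classifier at every position of every scale of an image pyramid."""
    detections = []
    scale = 1.0
    while True:
        scaled = rescale(image, scale)
        h, w = scaled.shape[:2]
        if h < window[1] or w < window[0]:
            break                                    # pyramid level too small
        for y in range(0, h - window[1] + 1, 8):     # stride of 8 pixels
            for x in range(0, w - window[0] + 1, 8):
                patch = scaled[y:y + window[1], x:x + window[0]]
                score = classifier.decision_function(
                    extract_features(patch).reshape(1, -1))[0]
                if score > threshold:
                    # Map the window back to original-image coordinates.
                    detections.append((x / scale, y / scale,
                                       window[0] / scale, window[1] / scale, score))
        scale *= scale_step
    return detections
```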


Generating Potential Object Regions: most algorithms rely on an exhaustive search to find object detections (number of pixels x number of scales windows). Instead, quickly find a smaller number of potential object bounding boxes to search in, reducing the search space from exhaustive search to selective search. Image Credit: K. Van de Sande

Generating Potential Object Regions, Segmentation Algorithm: start with an oversegmentation in a variety of color spaces; for each color space, group regions in a greedy fashion (based on size and texture) until the image is a single region; the regions produced along the way provide the candidate boxes. Image Credit: K. Van de Sande
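
A heavily simplified sketch of the greedy grouping step for a single color space, with placeholder `similarity` and `merge_regions` helpers (both hypothetical) and adjacency given as pairs of integer region ids; the real selective search algorithm of van de Sande and Uijlings is considerably more involved.

```python
def greedy_grouping(regions, adjacency, similarity):
    """Merge the most similar neighboring regions until one region remains.

    regions:    dict of integer region_id -> region data (pixels, bounding box, ...)
    adjacency:  set of frozenset({id_a, id_b}) pairs of neighboring regions
    similarity: function (region_a, region_b) -> float, e.g. size + texture terms
    Returns every region created along the way; their bounding boxes are the
    candidate object locations.
    """
    proposals = list(regions.values())
    next_id = max(regions) + 1
    while adjacency:
        # Pick the most similar pair of neighboring regions.
        pair = max(adjacency, key=lambda p: similarity(*(regions[i] for i in p)))
        a, b = tuple(pair)
        merged = merge_regions(regions[a], regions[b])   # hypothetical helper
        proposals.append(merged)

        # Replace a and b by the merged region and rewire the adjacency.
        neighbors = {i for p in adjacency if p & pair for i in p} - pair
        adjacency = {p for p in adjacency if not (p & pair)}
        regions.pop(a); regions.pop(b)
        regions[next_id] = merged
        adjacency |= {frozenset({next_id, n}) for n in neighbors}
        next_id += 1
    return proposals
```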


Datasets for Object Classification/Detection: Caltech101, Caltech256, PASCAL, ImageNet, LabelMe.

Questions?