Object Recognition. Selim Aksoy. Bilkent University saksoy@cs.bilkent.edu.tr



Similar documents
Introduction. Selim Aksoy. Bilkent University

3D Model based Object Class Detection in An Arbitrary View

Introduction to Pattern Recognition

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

Semantic Recognition: Object Detection and Scene Segmentation

Probabilistic Latent Semantic Analysis (plsa)

Content-Based Image Retrieval

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Object Categorization using Co-Occurrence, Location and Appearance

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Object class recognition using unsupervised scale-invariant learning

Discovering objects and their location in images

Discovering objects and their location in images

High Level Describable Attributes for Predicting Aesthetics and Interestingness

Local features and matching. Image classification & object localization

Open issues and research trends in Content-based Image Retrieval

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Big Data: Image & Video Analytics

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Learning Motion Categories using both Semantic and Structural Information

Jiří Matas. Hough Transform

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

CS 534: Computer Vision 3D Model-based recognition

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Colorado School of Mines Computer Vision Professor William Hoff

MusicGuide: Album Reviews on the Go Serdar Sali

Principled Hybrids of Generative and Discriminative Models

Environmental Remote Sensing GEOG 2021

Tracking in flussi video 3D. Ing. Samuele Salti

LabelMe: Online Image Annotation and Applications

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Solving Big Data Problems in Computer Vision with MATLAB Loren Shure

Semantic Image Segmentation and Web-Supervised Visual Learning

Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

The Visual Internet of Things System Based on Depth Camera

A Learning Based Method for Super-Resolution of Low Resolution Images

A Bayesian Framework for Unsupervised One-Shot Learning of Object Categories

Part-Based Recognition

Multi-View Object Class Detection with a 3D Geometric Model

T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Mean-Shift Tracking with Random Sampling

Image Classification for Dogs and Cats

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

Feature Tracking and Optical Flow

Object Class Recognition by Unsupervised Scale-Invariant Learning

Character Image Patterns as Big Data

Fast Matching of Binary Features

Applications of Deep Learning to the GEOINT mission. June 2015

An Automatic and Accurate Segmentation for High Resolution Satellite Image S.Saumya 1, D.V.Jiji Thanka Ligoshia 2

Introduction. A. Bellaachia Page: 1

Tracking Groups of Pedestrians in Video Sequences

Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance

Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks

MULTI-LEVEL SEMANTIC LABELING OF SKY/CLOUD IMAGES

Clustering Connectionist and Statistical Language Processing

Dissertation TOPIC MODELS FOR IMAGE RETRIEVAL ON LARGE-SCALE DATABASES. Eva Hörster

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Big Data: Rethinking Text Visualization

Robert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

Perception-based Design for Tele-presence

Scale-Invariant Object Categorization using a Scale-Adaptive Mean-Shift Search

Creating a Big Data Resource from the Faces of Wikipedia

Finding people in repeated shots of the same scene

Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?

Neural Network based Vehicle Classification for Intelligent Traffic Control

Digital Image Fundamentals. Selim Aksoy Department of Computer Engineering Bilkent University

FastKeypointRecognitioninTenLinesofCode

9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

Combining Local Recognition Methods for Better Image Recognition

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

Information Management course

Classifying Manipulation Primitives from Visual Data

Distributed forests for MapReduce-based machine learning

Detection and Recognition of Mixed Traffic for Driver Assistance System

Localizing 3D cuboids in single-view images

Exploration and Visualization of Post-Market Data

Taking Inverse Graphics Seriously

AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION

Determining optimal window size for texture feature extraction methods

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Transcription:

Image Classification and Object Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr

Image classification Image (scene) classification is a fundamental problem in image understanding. Automatic techniques for associating scenes with semantic labels have a high potential for improving the performance of other computer vision applications such as browsing (natural grouping of images instead of clusters based only on low-level features), retrieval (filtering images in archives based on content), and object recognition (the probability of an unknown object/region that exhibits several local features of a ship actually being a ship can be increased if the scene context is known to be a coast with high confidence but can be decreased if no water related context is dominant in that scene). CS 484, Fall 2012 2012, Selim Aksoy 2

Image classification The image classification problem has two critical components: representing images and learning models for semantic categories using these representations. Early work used low-level global features extracted from the whole image or from a fixed spatial layout. More recent approaches exploit local statistics in images using patches extracted by interest point detectors. Other configurations that use regions and their spatial relationships are also proposed. CS 484, Fall 2012 2012, Selim Aksoy 3

Hierarchical image classification Images Others (Face/close-up) Indoor Outdoor Others (Face/close-up) City Landscape Others Sunset Mountain/forest Mountain Forest Hierarchy of 11 scene categories (Vailaya et al., Image classification for content-based indexing, IEEE Trans. Image Processing, 2001). CS 484, Fall 2012 2012, Selim Aksoy 4

Hierarchical image classification Image representation: Mean and std. dev. of LUV values in 10x10 blocks for indoor/outdoor classification. Edge direction histograms for city/landscape classification. Histograms of HSV and LUV values for sunset/mountain/forest classification. Classification: Class-conditional density estimation using vector quantization. Bayesian classification. CS 484, Fall 2012 2012, Selim Aksoy 5

Hierarchical image classification CS 484, Fall 2012 2012, Selim Aksoy 6

Hierarchical image classification CS 484, Fall 2012 2012, Selim Aksoy 7

Image classification using bag-of-words Caltech data set: 13 natural scene categories. IDIAP data set (left to right): mountain, forest, indoor, city-panorama, city-street. CS 484, Fall 2012 2012, Selim Aksoy 8

Image classification using bag-of-words Flowchart from Fei-Fei Li, Pietro Perona, A Bayesian hierarchical model for learning natural scene categories, IEEE CVPR, 2005. CS 484, Fall 2012 2012, Selim Aksoy 9

Image classification using bag-of-words A codebook obtained from 650 training examples from 13 categories. Image patches are detected by a sliding grid and random sampling of scales. CS 484, Fall 2012 2012, Selim Aksoy 10

CS 484, Fall 2012 2012, Selim Aksoy 11

CS 484, Fall 2012 2012, Selim Aksoy 12

Image classification using bag-of-words CS 484, Fall 2012 2012, Selim Aksoy 13

Image classification using bag-of-words Flowchart from Quelhas et al., A thousand words in a scene, IEEE Trans. PAMI, 2007. Probabilistic Latent Semantic Analysis (PLSA) is used to learn aspect models to capture co-occurrences of visterms (visual terms). Bag-of-visterms representation or the aspect parameters are given as input to Support Vector Machines for classification. CS 484, Fall 2012 2012, Selim Aksoy 14

Image classification using bag-of-words CS 484, Fall 2012 2012, Selim Aksoy 15

CS 484, Fall 2012 2012, Selim Aksoy 16

Image classification using bag-of-regions D. Gökalp, S. Aksoy, Scene classification using bag-of-regions representations, IEEE CVPR, Beyond Patches Workshop, 2007. Region segmentation Region clustering region codebook Above-below spatial relationships region pairs Statistical region selection: identify region types that are frequently found in a particular class of scenes but rarely exist in other classes, and consistently occur together in the same class of scenes. Bayesian scene classification using bag of individual regions, bag of region pairs. CS 484, Fall 2012 2012, Selim Aksoy 17

Image classification using bag-of-regions Examples for region clusters. Each row represents a different cluster. CS 484, Fall 2012 2012, Selim Aksoy 18

Image classification using bag-of-regions CS 484, Fall 2012 2012, Selim Aksoy 19

Image classification using bag-of-regions Examples for correctly classified scenes. Examples for wrongly classified scenes. CS 484, Fall 2012 2012, Selim Aksoy 20

Image classification using factor graphs Boutell et al., Scene Parsing Using Region-Based Generative Models, IEEE Trans. Multimedia, 2007. CS 484, Fall 2012 2012, Selim Aksoy 21

ICCV 2005 Beijing, Short Course, Oct 15 Recognizing and Learning Object Categories Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT

Agenda Introduction Bag of words models Part-based models Discriminative methods Segmentation and recognition Conclusions CS 484, Fall 2012 2012, Selim Aksoy 23

perceptible vision material thing CS 484, Fall 2012 2012, Selim Aksoy 24

How many object categories are there? Biederman 1987 CS 484, Fall 2012 2012, Selim Aksoy 25

So what does object recognition involve? CS 484, Fall 2012 2012, Selim Aksoy 26

Verification: is that a bus? CS 484, Fall 2012 2012, Selim Aksoy 27

Detection: are there cars? CS 484, Fall 2012 2012, Selim Aksoy 28

Identification: is that a picture of Mao? CS 484, Fall 2012 2012, Selim Aksoy 29

Object categorization sky building flag banner bus face street lamp bus wall cars CS 484, Fall 2012 2012, Selim Aksoy 30

Scene and context categorization outdoor city traffic CS 484, Fall 2012 2012, Selim Aksoy 31

Challenges 1: view point variation Michelangelo 1475-1564 CS 484, Fall 2012 2012, Selim Aksoy 32

Challenges 2: illumination slide credit: S. Ullman CS 484, Fall 2012 2012, Selim Aksoy 33

Challenges 3: occlusion Magritte, 1957 CS 484, Fall 2012 2012, Selim Aksoy 34

Challenges 4: scale CS 484, Fall 2012 2012, Selim Aksoy 35

Challenges 5: deformation Xu, Beihong 1943 CS 484, Fall 2012 2012, Selim Aksoy 36

Challenges 6: background clutter Klimt, 1913 CS 484, Fall 2012 2012, Selim Aksoy 37

History: single object recognition CS 484, Fall 2012 2012, Selim Aksoy 38

Challenges 7: intra-class variation CS 484, Fall 2012 2012, Selim Aksoy 39

History: early object categorization CS 484, Fall 2012 2012, Selim Aksoy 40

CS 484, Fall 2012 2012, Selim Aksoy 41

ANIMALS PLANTS OBJECTS INANIMATE.. VERTEBRATE NATURAL MAN-MADE MAMMALS BIRDS TAPIR BOAR GROUSE CAMERA CS 484, Fall 2012 2012, Selim Aksoy 42

CS 484, Fall 2012 2012, Selim Aksoy 43

Scenes, Objects, and Parts Scene Objects Parts E. Sudderth, A. Torralba, W. Freeman, A. Willsky. ICCV 2005. Features CS 484, Fall 2012 2012, Selim Aksoy 44

Object categorization: the statistical viewpoint p( zebra image ) vs. p( no zebra image) Bayes rule: p ( zebra image ) p ( image zebra ) p ( zebra ) = p( no zebra image) p( image no zebra) p( no zebra) posterior ratio likelihood ratio prior ratio CS 484, Fall 2012 2012, Selim Aksoy 45

Object categorization: the statistical viewpoint p( zebra image) p( image zebra) p( zebra) = p ( no zebra image ) p ( image no zebra ) p ( no zebra ) posterior ratio likelihood ratio prior ratio Discriminative methods model posterior Generative methods model likelihood and prior CS 484, Fall 2012 2012, Selim Aksoy 46

Discriminative Direct modeling of p( zebra image) p( no zebra image) Decision boundary Zebra Non-zebra CS 484, Fall 2012 2012, Selim Aksoy 47

Generative Model p( image zebra) and p( image no zebra) p ( image zebra ) Low High p( image no zebra) Middle MiddleLow CS 484, Fall 2012 2012, Selim Aksoy 48

Three main issues Representation How to represent an object category Learning How to form the classifier, given training data Recognition How the classifier is to be used on novel data CS 484, Fall 2012 2012, Selim Aksoy 49

Representation Generative / discriminative / hybrid CS 484, Fall 2012 2012, Selim Aksoy 50

Representation Generative / discriminative / hybrid Appearance only or location and appearance CS 484, Fall 2012 2012, Selim Aksoy 51

Representation Generative / discriminative / hybrid Appearance only or location and appearance Invariances View point Illumination Occlusion Scale Deformation Clutter etc. CS 484, Fall 2012 2012, Selim Aksoy 52

Representation Generative / discriminative / hybrid Appearance only or location and appearance invariances Part-based or global w/subwindow CS 484, Fall 2012 2012, Selim Aksoy 53

Representation Generative / discriminative / hybrid Appearance only or location and appearance invariances Parts or global w/subwindow Use set of features or each pixel in image CS 484, Fall 2012 2012, Selim Aksoy 54

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning CS 484, Fall 2012 2012, Selim Aksoy 55

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) Methods of training: generative vs. discriminative class densities 5 4 3 2 1 p(x C 1 ) p(x C 2 ) posterior probabilities 1 0.8 0.6 0.4 0.2 p(c 1 x) p(c 2 x) 0 0 0.2 0.4 0.6 0.8 1 x 0 0 0.2 0.4 0.6 0.8 1 x CS 484, Fall 2012 2012, Selim Aksoy 56

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Contains a motorbike CS 484, Fall 2012 2012, Selim Aksoy 57

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) CS 484, Fall 2012 2012, Selim Aksoy 58

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) Training images: Issue of overfitting Negative images for discriminative methods CS 484, Fall 2012 2012, Selim Aksoy 59

Learning Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) Level of supervision Manual segmentation; bounding box; image labels; noisy labels Batch/incremental (on category and image level; user-feedback ) Training images: Priors Issue of overfitting Negative images for discriminative methods CS 484, Fall 2012 2012, Selim Aksoy 60

Recognition Scale / orientation range to search over Speed CS 484, Fall 2012 2012, Selim Aksoy 61

Object recognition using context The relationships between objects and the scene setting can be characterized in five ways: Interposition: objects interrupt their background Support: objects tend to rest on surfaces Probability: objects tend to be found in some contexts but not others Position: given an object is probable in a scene, it is often found in some positions and not others Familiar size: objects have a limited set of size relations with other objects CS 484, Fall 2012 2012, Selim Aksoy 62

Object recognition using context An ideal setting where the objects are perfectly segmented, the regions are classified, and the objects labels are refined with respect to semantic context in the image. (Rabinovich et al. Objects in Context, ICCV, 2007) CS 484, Fall 2012 2012, Selim Aksoy 63

Object recognition using context First column: segmentation results; second column: classification without contextual constraints; third column: classification with co-occurrence contextual constraints implemented using a conditional random field model. (Rabinovich et al. Objects in Context, ICCV, 2007) CS 484, Fall 2012 2012, Selim Aksoy 64

Object recognition using context better Using above below inside around relationships as spatial constraints better better worse First column: input image; second column: classification with co-occurrence contextual constraints; third column: classification with spatial and co-occurrence contextual constraints. (Galleguillos et al. Object Categorization using Co-Occurrence, Location and Appearance, CVPR, 2008) CS 484, Fall 2012 2012, Selim Aksoy 65

Object recognition using context Geometric context: learn labeling of the image into geometric classes such as support/ground (green), vertical planar (green) and sky (blue). The arrows show the direction that the vertical planar surface is facing. (Hoiem et al., Geometric Context from a Single Image, ICCV, 2005) CS 484, Fall 2012 2012, Selim Aksoy 66

Object recognition using context (a) (b) (c) (d) Putting objects in perspective: Car (a) and pedestrian (b) detection by using only local information. Car (c) and pedestrian (d) detection by using context. (Hoiem et al. Putting Objects in Perspective, CVPR, 2006) CS 484, Fall 2012 2012, Selim Aksoy 67

Object recognition using context Contextual recognition by maximizing a scene probability function that incorporates outputs of individual object detectors (confidence values for assigned object labels) and pairwise interactions between objects (likelihood of pairwise spatial relationships). Left column: without using context; right column: with using context. (Firat Kalaycilar, An object recognition framework using contextual interactions among objects, M.S. Thesis, Bilkent University, 2009) CS 484, Fall 2012 2012, Selim Aksoy 68