Learning to Localize Objects with Structured Output Regression


Learning to Localize Objects with Structured Output Regression Matthew B. Blaschko and Christoph H. Lampert Max Planck Institute for Biological Cybernetics Tübingen, Germany October 13th, 2008

Object (Category) Recognition Is there a cow in this picture? Binary classification problem: classifiers do the job.

Object (Category) Localization Where in the picture is the cow? Possible answers: center point, segmentation, outline, bounding box. There are many possible answers, and none is binary. How can we build a trainable system to localize objects?

From Classification to Localization: Sliding Window Most common approach: binary training + sliding window. Train a binary classifier f : {all images} → ℝ. Training images: positives show the object class, negatives show background regions. Train a classifier, e.g. a support vector machine or a Viola-Jones cascade. The decision function f : {images} → ℝ satisfies f > 0 if the image shows the object and f < 0 if it does not. Then use f in a sliding-window procedure: apply f to many subimages; the subimage with the largest score is taken as the object location (a minimal sketch follows below).
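
The following sketch illustrates the binary-training + sliding-window baseline described above. It is a minimal illustration, not the authors' implementation: the scoring function, box size, and stride are placeholder assumptions.

```python
import numpy as np

def sliding_window_localize(image, score, box_size=(64, 64), stride=16):
    """Return the box (left, top, right, bottom) whose subimage scores
    highest under a trained binary decision function f (here: `score`)."""
    h, w = image.shape[:2]
    bw, bh = box_size
    best_score, best_box = -np.inf, None
    for top in range(0, h - bh + 1, stride):
        for left in range(0, w - bw + 1, stride):
            s = score(image[top:top + bh, left:left + bw])
            if s > best_score:
                best_score, best_box = s, (left, top, left + bw, top + bh)
    return best_box, best_score

# Toy usage: a stand-in "classifier" that prefers bright regions.
image = np.random.rand(128, 128)
box, s = sliding_window_localize(image, score=lambda patch: patch.mean())
```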

Problems of Sliding Window Localization Binary training + sliding window is inconsistent: 1) We learn for the wrong task: during training, only the sign of f matters; at test time, f is used as a real-valued quality measure. 2) The training samples do not reflect the test situation: training samples show either a complete object or no object, while at test time many subregions show object parts.

Learning to Localize Objects Ideal setup: learn a true localization function g : {all images} → {all boxes} that predicts the object location from the image. Train in a consistent end-to-end way, so that the training distribution reflects the test distribution.

Object Localization as Structured Output Regression Regression task: training pairs (x_1, y_1), ..., (x_n, y_n) ∈ X × Y, where the x_i are images and the y_i are bounding boxes. Learn a mapping g : X → Y that generalizes from the given examples: g(x_i) ≈ y_i for i = 1, ..., n. Prefer smooth mappings to avoid overfitting. The regression is not ℝ → ℝ; input and output are structured spaces: inputs are 2D images, outputs are 4-tuples y = [left, top, right, bottom] ∈ ℝ⁴ that must be predicted jointly.

Alternatives: Predicting with Structured Outputs Independent training? Learn independent functions g_left, g_top, g_right, g_bottom : X → ℝ. Unable to integrate constraints and correlations. Nearest neighbor? Store all examples (x_i, y_i) as prototypes; for a new image x, predict the box y_k where k = argmin_{i=1,...,n} dist(x, x_i). No invariance, e.g. against translation, and requires a lot of training data. Probabilistic modeling? Build a model of p(x, y) or p(y|x), e.g. a Gaussian mixture. Difficult to integrate invariances, e.g. against scale changes, and requires a lot of training data to cover the 4D output space.

Background: Structured Output Support Vector Machine Structured output support vector machines [Tsochantaridis2005] generalize SVMs to arbitrary output domains. Goal: a prediction function g : X → Y. Learn a compatibility function F : X × Y → ℝ and define g(x) := argmax_{y ∈ Y} F(x, y). g(x) is learned discriminatively; non-linearity and domain knowledge are included by kernelizing F. I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large Margin Methods for Structured and Interdependent Output Variables, JMLR, 2005.

Structured Output Support Vector Machine Setup: define a positive definite kernel k : (X × Y) × (X × Y) → ℝ; k(·,·) induces a reproducing kernel Hilbert space H and an implicit feature map φ : X × Y → H. Define a loss function Δ : Y × Y → ℝ. SO-SVM training: solve the convex optimization problem min_{w,ξ} ½‖w‖² + C Σ_{i=1}^{n} ξ_i subject to the margin constraints Δ(y_i, y) + ⟨w, φ(x_i, y)⟩ − ⟨w, φ(x_i, y_i)⟩ ≤ ξ_i for all y ∈ Y \ {y_i} and i = 1, ..., n (a training sketch follows below).
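
As a concrete illustration of this training problem, the sketch below minimizes the equivalent margin-rescaled hinge loss by stochastic subgradient descent over a finite candidate set. This is a simplification under stated assumptions (the paper trains via constraint generation in SVMstruct); `phi`, `delta`, and `candidates` are placeholders the reader must supply.

```python
import numpy as np

def train_sosvm(data, phi, delta, candidates, dim, C=1.0, lr=0.01, epochs=50):
    """data: list of (x_i, y_i) pairs; phi(x, y) -> np.array of length dim;
    delta(y, y') -> float; candidates: finite output set (should contain the y_i)."""
    w = np.zeros(dim)
    n = len(data)
    for _ in range(epochs):
        for x, y_true in data:
            # loss-augmented inference: find the most violated output
            y_hat = max(candidates, key=lambda y: delta(y_true, y) + w @ phi(x, y))
            violation = delta(y_true, y_hat) + w @ phi(x, y_hat) - w @ phi(x, y_true)
            grad = w / n  # subgradient of the regularizer, spread over examples
            if violation > 0:
                grad = grad + C * (phi(x, y_hat) - phi(x, y_true))
            w -= lr * grad
    return w
```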

Structured Output Support Vector Machine The unique solution w ∈ H defines the compatibility function F(x, y) = ⟨w, φ(x, y)⟩, which is linear in w ∈ H but nonlinear in X and Y. F(x, y) measures how well the output y fits the input x. The analogue in a probabilistic model would be F(x, y) ≙ log p(y|x), but F(x, y) is max-margin trained, not probabilistic! The best prediction for a new x is the most compatible y: g(x) := argmax_{y ∈ Y} F(x, y). Evaluating g : X → Y is like a generalized sliding window: for fixed x, we evaluate a quality function for every box y ∈ Y. Approximate maximization: sliding window; exact: branch-and-bound [Lampert & Blaschko, CVPR 2008]; other parameterizations: min-cut, (loopy) BP, ... (a prediction sketch follows below).
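
A minimal sketch of the argmax prediction over a finite set of candidate boxes. Exhaustive grid search stands in for the exact branch-and-bound maximization; `grid_boxes` and `phi` are illustrative assumptions.

```python
import numpy as np

def grid_boxes(width, height, step=16, sizes=((64, 64), (96, 96))):
    """Enumerate a coarse grid of candidate boxes (left, top, right, bottom)."""
    for bw, bh in sizes:
        for top in range(0, height - bh + 1, step):
            for left in range(0, width - bw + 1, step):
                yield (left, top, left + bw, top + bh)

def predict_box(x, w, phi, candidate_boxes):
    """g(x) = argmax_y F(x, y) with F(x, y) = <w, phi(x, y)>."""
    return max(candidate_boxes, key=lambda y: float(w @ phi(x, y)))
```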

SO-SVM Training: Interpretation SO-SVM training revisited, hard-margin case: solve the convex optimization problem min_w ½‖w‖² subject to the margin constraints ⟨w, φ(x_i, y_i)⟩ − ⟨w, φ(x_i, y)⟩ ≥ Δ(y_i, y) for all y ∈ Y \ {y_i} and i = 1, ..., n, where the left-hand side is F(x_i, y_i) − F(x_i, y). This implies F(x_i, y_i) ≥ F(x_i, y) + Δ(y_i, y) > F(x_i, y) for all y ∈ Y \ {y_i}. Because g(x) := argmax_y F(x, y), this means g(x_i) = y_i for all training pairs (x_i, y_i).

SO-SVM Training: Interpretation SO-SVM training revisited, general case: solve the convex optimization problem min_{w,ξ} ½‖w‖² + C Σ_{i=1}^{n} ξ_i subject to the margin constraints ⟨w, φ(x_i, y_i)⟩ − ⟨w, φ(x_i, y)⟩ ≥ Δ(y_i, y) − ξ_i for all y ∈ Y \ {y_i} and i = 1, ..., n, where the left-hand side is F(x_i, y_i) − F(x_i, y). This implies F(x_i, y_i) − F(x_i, y) ≥ Δ(y_i, y) − ξ_i for y ∈ Y \ {y_i}. Since Δ(y_i, y) is small if y ≈ y_i and large if y differs strongly from y_i, we obtain g(x_i) ≈ y_i for most (x_i, y_i), because the penalization increases the more g(x_i) differs from y_i.

Joint Kernel for (Image, Box)-Pairs SO-SVM for object localization: what is a good joint kernel function k_joint((x, y), (x', y')) for images x, x' and boxes y, y'? Observation: x|_y (the image restricted to the box region) is again an image. So we compare two images-with-boxes by comparing the images inside the box regions: k_joint((x, y), (x', y')) := k_image(x|_y, x'|_y'). Properties: automatic translation invariance; other invariances are inherited from the image kernel. There is a wide range of choices for the image kernel k_image: linear, χ²-kernel, spatial pyramids, pyramid match kernel, ... (a sketch follows below).
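
A minimal sketch of this restriction kernel, assuming axis-aligned boxes on array images and a placeholder feature map `histogram` (e.g. the bag-of-visual-words histogram used later). With a linear k_image, the joint kernel is just a dot product of the cropped regions' features.

```python
import numpy as np

def restrict(x, y):
    """Crop image x (an H x W array) to the box y = (left, top, right, bottom)."""
    left, top, right, bottom = y
    return x[top:bottom, left:right]

def k_joint(x, y, x2, y2, histogram):
    """Restriction kernel with a linear image kernel on region features."""
    return float(np.dot(histogram(restrict(x, y)), histogram(restrict(x2, y2))))
```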

Restriction Kernel: Examples [Figure: pairs of (image, box) examples. k_joint is large when the two box regions show similar content, small when they differ, and can also be large for boxes from different images that show similar content.]

Loss Function for Image Boxes What is a good loss function Δ(y, y')? Δ(y_i, y) plays the role that the margin has in SVMs: it measures how far a prediction y is from a target y_i. We use area overlap: Δ(y, y') := 1 − (area overlap between y and y') = 1 − area(y ∩ y') / area(y ∪ y'). Thus Δ(y, y') = 0 iff y = y', and Δ(y, y') = 1 iff y and y' are disjoint (a sketch follows below).
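
A small sketch of this loss (one minus intersection-over-union) for axis-aligned boxes; the tuple layout is an assumption carried over from the earlier sketches.

```python
def box_loss(y, y2):
    """Delta(y, y') = 1 - IoU(y, y') for boxes (left, top, right, bottom)."""
    ix = max(0, min(y[2], y2[2]) - max(y[0], y2[0]))  # intersection width
    iy = max(0, min(y[3], y2[3]) - max(y[1], y2[1]))  # intersection height
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(y) + area(y2) - inter
    return 1.0 - inter / union

assert box_loss((0, 0, 10, 10), (0, 0, 10, 10)) == 0.0    # identical boxes
assert box_loss((0, 0, 10, 10), (20, 20, 30, 30)) == 1.0  # disjoint boxes
```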

Results: PASCAL VOC 2006 dataset Natural images (from Microsoft Research Cambridge and Flickr): 5,000 images, split into 2,500 train/val and 2,500 test; 9,500 human-labeled objects in 10 predefined classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep. Task: predict locations and confidence scores for each class. Evaluation: precision-recall curves.

Results: PASCAL VOC 2006 dataset Experiments on the PASCAL VOC 2006 dataset, in the simplest possible setup: SURF local features, 3,000-bin bag-of-visual-word histograms, and a linear kernel function. We predict exactly one object per image; for the PR curves, images are ranked by the score of a χ²-SVM. Example detections for VOC 2006 bicycle, bus and cat. (A feature-pipeline sketch follows below.)
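
The bag-of-visual-words features mentioned above can be sketched as follows. The codebook (3,000 visual words) and the SURF descriptors are assumed to be given, and nearest-neighbor quantization is a standard choice rather than a detail confirmed by the slides.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """descriptors: (n, d) array of local features (e.g. SURF);
    codebook: (3000, d) array of visual words. Returns a normalized histogram."""
    # assign each descriptor to its nearest visual word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```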

Results: PASCAL VOC 2006 dataset Same setup as above: SURF local features, 3,000-bin bag-of-visual-word histograms, linear kernel function; exactly one object predicted per image; images ranked by the score of a χ²-SVM. Precision-recall curves for VOC 2006 bicycle, bus and cat: structured regression clearly improves detection performance.

Results: PASCAL VOC 2006 dataset We improved the best previously published scores in 6 of 10 classes. Average Precision (AP) scores on the 10 categories of PASCAL VOC 2006:

class       proposed   best VOC2006   best post-VOC2006
bicycle     .472       .440 [1]       .498 [5]
bus         .342       .169 [2]       .249 [6]
car         .336       .444 [3]       .458 [5]
cat         .300       .160 [2]       .223 [7]
cow         .275       .252 [2]
dog         .150       .118 [4]       .148 [7]
horse       .211       .140 [1]
motorbike   .397       .390 [3]
person      .107       .164 [3]       .340 [8]
sheep       .204       .251 [3]

[1] I. Laptev, VOC2006; [2] V. Viitaniemi, VOC2006; [3] M. Douze, VOC2006; [4] J. Shotton, VOC2006; [5] Crandall and Huttenlocher, CVPR07; [6] Chum and Zisserman, CVPR07; [7] Lampert et al., CVPR08; [8] Felzenszwalb et al., CVPR08.

Results: TU Darmstadt cow dataset Why does it work better? Experiment on the TU Darmstadt cow dataset: relatively easy, side views of cows in front of different backgrounds; 111 training images, 557 test images; same setup as before (bag-of-visual-words histograms, linear kernel). [Figure: learned distribution of local weights on an example test image, binary training vs. structured training.] Binary training: the weights concentrate on a few discriminative regions, so all boxes containing these hot spots get similarly high scores. Structured training: the whole box interior is weighted positive and the whole exterior negative, so the correct box is enforced to be the best of all possible ones.

Summary How to build a trainable system for object localization? Conventional approaches use a binary classifier function: a two-step procedure of binary training plus sliding-window evaluation, which is inconsistent because the problem solved during training differs from the one at test time. We instead formulate localization as regression with a structured output, so that training directly reflects the procedure at test time. The proposed system is realized as a structured-output support vector machine: discriminative max-margin training via a convex optimization problem. Empirical results: structured training improves over binary training, and the uncertainty of the optimal box position is reduced. Source code: SVMstruct: svmlight.joachims.org/svm_struct.html; module for object localization: www.christoph-lampert.org