All You Should Know on Visual Recognition Pipelines: from theory to icub. Sean Ryan Fanello icub Facility Istituto Italiano di Tecnologia

Similar documents
Local features and matching. Image classification & object localization

Image Classification for Dogs and Cats

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Probabilistic Latent Semantic Analysis (plsa)

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Tracking in flussi video 3D. Ing. Samuele Salti

Classifying Manipulation Primitives from Visual Data

3D Model based Object Class Detection in An Arbitrary View

Convolutional Feature Maps

Blocks that Shout: Distinctive Parts for Scene Classification

Object Recognition. Selim Aksoy. Bilkent University

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

The Delicate Art of Flower Classification

Group Sparse Coding. Fernando Pereira Google Mountain View, CA Dennis Strelow Google Mountain View, CA

Semantic Recognition: Object Detection and Scene Segmentation

Driver Cell Phone Usage Detection From HOV/HOT NIR Images

High Level Describable Attributes for Predicting Aesthetics and Interestingness

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Introduction to Computer Graphics

A Learning Based Method for Super-Resolution of Low Resolution Images

MusicGuide: Album Reviews on the Go Serdar Sali

ImageCLEF 2011

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

Face Recognition in Low-resolution Images by Using Local Zernike Moments

Taking Inverse Graphics Seriously

A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK

The use of computer vision technologies to augment human monitoring of secure computing facilities

CATEGORIZATION OF SIMILAR OBJECTS USING BAG OF VISUAL WORDS AND k NEAREST NEIGHBOUR CLASSIFIER

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

Android Ros Application

Colorado School of Mines Computer Vision Professor William Hoff

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

TouchPaper - An Augmented Reality Application with Cloud-Based Image Recognition Service

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

BRIEF: Binary Robust Independent Elementary Features

A Genetic Algorithm-Evolved 3D Point Cloud Descriptor

Randomized Trees for Real-Time Keypoint Recognition

Online Learning for Offroad Robots: Using Spatial Label Propagation to Learn Long Range Traversability

Lecture 2: The SVM classifier

VideoStory A New Mul1media Embedding for Few- Example Recogni1on and Transla1on of Events

Open issues and research trends in Content-based Image Retrieval

Fast Matching of Binary Features

Feature Tracking and Optical Flow

Image Segmentation and Registration

First-Person Activity Recognition: What Are They Doing to Me?

Who are you? Learning person specific classifiers from video

A Comparative Study between SIFT- Particle and SURF-Particle Video Tracking Algorithms

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

View-Invariant Dynamic Texture Recognition using a Bag of Dynamical Systems

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

Segmentation & Clustering

Environmental Remote Sensing GEOG 2021

Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:

Mean-Shift Tracking with Random Sampling

Steven C.H. Hoi School of Information Systems Singapore Management University

Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects

MIFT: A Mirror Reflection Invariant Feature Descriptor

Jiří Matas. Hough Transform

GPS-aided Recognition-based User Tracking System with Augmented Reality in Extreme Large-scale Areas

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

Machine Learning in Computer Vision A Tutorial. Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN

MetropoGIS: A City Modeling System DI Dr. Konrad KARNER, DI Andreas KLAUS, DI Joachim BAUER, DI Christopher ZACH

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Professor, D.Sc. (Tech.) Eugene Kovshov MSTU «STANKIN», Moscow, Russia

Analecta Vol. 8, No. 2 ISSN

G E N E R A L A P P R O A CH: LO O K I N G F O R D O M I N A N T O R I E N T A T I O N I N I M A G E P A T C H E S

Classification using Intersection Kernel Support Vector Machines is Efficient

Digital Image Increase

Point Matching as a Classification Problem for Fast and Robust Object Pose Estimation

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Data, Measurements, Features

Optical Flow. Shenlong Wang CSC2541 Course Presentation Feb 2, 2016

AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION

Object Categorization using Co-Occurrence, Location and Appearance

Online Place Recognition for Mobile Robots

The Visual Internet of Things System Based on Depth Camera

The Scientific Data Mining Process

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

Part-Based Recognition

Learning Motion Categories using both Semantic and Structural Information

IMAGE PROCESSING BASED APPROACH TO FOOD BALANCE ANALYSIS FOR PERSONAL FOOD LOGGING

FastKeypointRecognitioninTenLinesofCode

VECTORAL IMAGING THE NEW DIRECTION IN AUTOMATED OPTICAL INSPECTION

Mining Mid-level Features for Image Classification

GURLS: A Least Squares Library for Supervised Learning

ENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES

Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration

Chapter 9 Action Recognition in Realistic Sports Videos

EXPLORING IMAGE-BASED CLASSIFICATION TO DETECT VEHICLE MAKE AND MODEL FINAL REPORT

Learning Invariant Visual Shape Representations from Physics

Scalable Object Detection by Filter Compression with Regularized Sparse Coding

THE development of methods for automatic detection

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

FACE RECOGNITION BASED ATTENDANCE MARKING SYSTEM

Blessing of Dimensionality: High-dimensional Feature and Its Efficient Compression for Face Verification

Transcription:

All You Should Know on Visual Recognition Pipelines: from theory to icub Sean Ryan Fanello icub Facility Istituto Italiano di Tecnologia

Motivations Self-Supervised Strategies Kinematics Motion Human Robot Interaction is a new and natural application for object recognition In robotics settings strong cues are often available, therefore object detectors can be avoided Recognition as tool for complex tasks: grasp, manipulation, affordances, pose S.R. Fanello, C. Ciliberto, L. Natale, G. Metta Weakly Supervised Strategies for Natural Object Recognition in Robotics, ICRA 2013 C. Ciliberto, S.R. Fanello, M. Santoro, L. Natale, G. Metta, L. Rosasco - On the Impact of Learning Hierarchical Representations for Visual Recognition in Robotics, IROS 2013

Introduction Single Instance Recognition Recognition System Pattern Recognition & Machine Learning By C. Bishop Object Categorization Recognition System A book Same Representations Retrieval Recognition System All You Should Know on Visual Recognition Pipelines: from theory to icub - S.R. Fanello

Geometric Information Invariance to Transformation Discriminative Power Real-Time Still an open problem Introduction

Description We want to convert an image into a compact feature vector It must be robust to viewpoint, illumination, occlusion etc. Ideally an Invariant Representation Example: Color Histogram Invariant to: scale, occlusions, lighting, view-point Nice invariance properties, but not informative Trade-off between invariance and information More Data Invariance is not needed

Global Descriptors Color Histogram Swain, Ballard, Color indexing, IJCV 91. GIST of a Scene Oliva, Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV 01. Douze, Jegou, Sandhawalia, Amsaleg, Schmid, Evaluation of GIST descriptors for web-scale image search, CIVR 09. CENTRIST: Census Transform histogram Wu, Rehg, CENTRIST: a visual descriptor for scene categorization, TPAMI 11. Highly efficient to compute and to match, but they lack of description power. Not suitable for discriminative tasks

Local Descriptors A set of local descriptors is extracted. These features are invariant to geometric transformations. Most widely used: SIFTs Lowe, Distinctive image features from scale-invariant keypoints, IJCV 04. 8 orientations of the gradient in 4x4 spatial grid. 128 dimensions Notice: Local features are invariant when both keypoint detector and descriptor are used. In object categorization detectors do not perform well due to intraclass variability

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. MRF, BN Current state-the-art recognition pipelines use hierarchical image representations. Main Idea: Starting from low-level descriptors (e.g. SIFT) we compute higher order statistics with a sequence of coding-pooling operators. BOW-like systems have the first layer hand crafted (e.g. SIFT) Pro: Efficient, easy to implement Cons: No invariance Deep Architectures (e.g. HMAX) try to learn also the first layer Pro: Invariant to small transformation, Higher (?) accuracy Cons: TOO slow

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. MRF, BN Local Descriptors: are extracted from the image Keypoint detectors are not suitable, a dense grid better catches image statistics. These descriptors are not invariant to scale, rotation and translation etc. Main Assumption: Enough Data is available

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. MRF, BN Dictionary Learning: We want to relate local descriptors to a common basis, called dictionary. Intuitively a dictionary represents all the relevant descriptors and it is able to describe the content of any image. The dictionary is learned using K-Means algorithm with K=dictionary size on a large set of SIFTs (~1M)

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. MRF, BN Coding Operators The coding stage maps the input features into an overcomplete space. Based on the Reconstruction Error Higher Order Statistics

Coding Operators Vector Quantization (VQ) Sparse Coding (SC) They all minimize the reconstruction error: Locality-constrained Linear Coding (LLC)

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. MRF, BN Spatial Pyramid Representation partitioned in segments Pooling Operators They aggregates local descriptors into a single one. l=0 l=1 l=2 21 Spatial Bins Example: Max Pooling

Visual Recognition Pipelines Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. RLS, MRF, BN Learning After the pooling stage we get a descriptor size and S the number of cells of the spatial pyramid where K is the dictionary Multi-class classification problem that can be solved with a standard One-vs-All paradigm. Linear Classifiers are more suitable for Real-Time tasks Non-linear feature mapping can be used to approximate non-linear Kernels A. Vedaldi, A. Zisserman Efficient additive kernels via explicit feature maps. CVPR 2010

Recap Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. RLS, MRF, BN Local Descriptors Dense Grid of SIFT descriptors with fixed scale and grid spacing Dictionary Learning A dictionary is learned with K-Means (K=dictionary size) on a large set of SIFTs Coding-Pooling Features are mapped into an overcomplete space and then aggregated together. The image is transformed into a single vector Linear Classification Use your favorite classification method (GURLS!) with One-vs-All paradigm

Generalizing the Pipeline Local Coding-Pooling Coding Pooling Classifier i-th Object Class e.g. SIFT, HOG, Patch e.g. VQ, SC, LLC, VLAD, SV, FK e.g. SPR e.g. SVM. RLS, MRF, BN Action/Gesture Recognition Main Modifications Representation from stereo depth Motion and Appearance Temporal Pooling Video Segmentation

In The Real World... icubworld Single Instance Recognition C. Ciliberto, S.R. Fanello, M. Santoro, L. Natale, G. Metta, L. Rosasco - On the Impact of Learning Hierarchical Representations for Visual Recognition in Robotics, IROS 2013

Self-Supervised Strategies icubworld Data-Set Kinematics Motion icubworld Categorization Data-Set 10 Categories, 40 Objects Object Categorization between Computer Vision & Robotics Main Difficulty: structured clutter, therefore the context cannot be exploited. icubworld available at: http://www.iit.it/it/projects/data-sets.html S.R. Fanello, C. Ciliberto, M. Santoro, L. Natale, G. Metta, L. Rosasco, F. Odone icubworld: Friendly Robots Help Building Good Vision Data-Sets, CVPRW 2013

Build your own app Example of application with visual recognition capabilities Class Classifier (GURLS!) Your Yarp Module SparseCoder

Where? $ICUB_ROOT/contrib/src/sparseCoder Dependencies SIFT GPU, OpenCV 2.X, Yarp SparseCoder Module Docs: http://wiki.icub.org/icub/contrib/dox/html/group icub image representation.html Coding Methods: Best Code Entries, Sparse Coding, Bag of Words Spatial Pyramid Matching: Yes, customizable pyramid levels Dictionary Learning: With K-Means More Functionalities: PCA on SIFTs, Power Normalization, Sparse & Dense SIFTs

That s it! Thank You! Any Question?