Object Detectors Emerge in Deep Scene CNNs

Similar documents
arxiv: v2 [cs.cv] 15 Apr 2015

CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015

Convolutional Feature Maps

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

Deformable Part Models with CNN Features

Compacting ConvNets for end to end Learning

CNN Based Object Detection in Large Video Images. WangTao, IQIYI ltd

Semantic Recognition: Object Detection and Scene Segmentation

Bert Huang Department of Computer Science Virginia Tech

Image and Video Understanding

arxiv: v2 [cs.cv] 19 Jun 2015

Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks

Image Classification for Dogs and Cats

Administrivia. Traditional Recognition Approach. Overview. CMPSCI 370: Intro. to Computer Vision Deep learning

Pedestrian Detection with RCNN

Scalable Object Detection by Filter Compression with Regularized Sparse Coding

Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:

arxiv: v2 [cs.cv] 19 Apr 2014

Do Convnets Learn Correspondence?

Steven C.H. Hoi School of Information Systems Singapore Management University

Learning and transferring mid-level image representions using convolutional neural networks

Task-driven Progressive Part Localization for Fine-grained Recognition

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers

Lecture 6: Classification & Localization. boris. ginzburg@intel.com

Studying Relationships Between Human Gaze, Description, and Computer Vision

Object Recognition. Selim Aksoy. Bilkent University

Multi-view Face Detection Using Deep Convolutional Neural Networks

arxiv: v1 [cs.cv] 29 Apr 2016

Object Detection in Video using Faster R-CNN

Transform-based Domain Adaptation for Big Data

CAP 6412 Advanced Computer Vision

Simultaneous Deep Transfer Across Domains and Tasks

arxiv: v6 [cs.cv] 10 Apr 2015

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

What makes an image memorable?

Fast R-CNN Object detection with Caffe

Understanding Deep Image Representations by Inverting Them

R-CNN minus R. 1 Introduction. Karel Lenc Department of Engineering Science, University of Oxford, Oxford, UK.

Finding people in repeated shots of the same scene

A Convolutional Neural Network Cascade for Face Detection

SSD: Single Shot MultiBox Detector

Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features

Pedestrian Detection using R-CNN

InstaNet: Object Classification Applied to Instagram Image Streams

Automatic Attribute Discovery and Characterization from Noisy Web Data

A Bayesian Framework for Unsupervised One-Shot Learning of Object Categories

Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance

Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?

Object Detection from Video Tubelets with Convolutional Neural Networks

arxiv: v2 [cs.cv] 27 Sep 2015

High Level Describable Attributes for Predicting Aesthetics and Interestingness

PANDA: Pose Aligned Networks for Deep Attribute Modeling

MulticoreWare. Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan

Bringing Semantics Into Focus Using Visual Abstraction

Segmentation as Selective Search for Object Recognition

Semantic Image Segmentation and Web-Supervised Visual Learning

Transfer Learning by Borrowing Examples for Multiclass Object Detection

What Makes an Image Popular?

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

arxiv:submit/ [cs.cv] 13 Apr 2016

Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer

Getting Started with Caffe Julien Demouth, Senior Engineer

Localizing 3D cuboids in single-view images

The Visual Internet of Things System Based on Depth Camera

Unsupervised Discovery of Mid-Level Discriminative Patches

arxiv: v1 [cs.cv] 18 May 2015

Large-scale Knowledge Transfer for Object Localization in ImageNet

The Delicate Art of Flower Classification

Object class recognition using unsupervised scale-invariant learning

SEMANTIC CONTEXT AND DEPTH-AWARE OBJECT PROPOSAL GENERATION

Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling

Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

A Dynamic Convolutional Layer for Short Range Weather Prediction

arxiv: v2 [cs.cv] 30 Jan 2015

Applications of Deep Learning to the GEOINT mission. June 2015

3D Model based Object Class Detection in An Arbitrary View

Local features and matching. Image classification & object localization

Multi-fold MIL Training for Weakly Supervised Object Localization

A Century of Portraits: A Visual Historical Record of American High School Yearbooks

Two-Stream Convolutional Networks for Action Recognition in Videos

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

Transcription:

Object Detectors Emerge in Deep Scene CNNs Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Massachusetts Institute of Technology

CNN for Object Recognition Large-scale image classification result on ImageNet CNN methods Figure from Olga Russakovsky ECCV'14 workshop

How Objects are Represented in CNN? Conv1 Conv2 Conv3 Conv4 Pool5 DrawCNN: visualizing the units' connections

How Objects are Represented in CNN? Deconvolution Zeiler, M. et al. Visualizing and Understanding Convolutional Networks,ECCV 2014. Strong activation image Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accu-rate object detection and semantic segmentation. CVPR 2014 Back-propagation Simonyan, K. et al. Deep inside convolutional networks: Visualising image classification models and saliency maps. ICLR workshop, 2014

Object Representations in Computer Vision Part-based models are used to represent objects and visual patterns. -Object as a set of parts -Relative locations between parts Figure from Fischler & Elschlager (1973)

Object Representations in Computer Vision Constellation model Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003) Bag-of-word model Lazebnik, Schmid & Ponce(2003), Fei-Fei Perona (2005) Deformable Part model P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan (2010) Class-specific graph model Kumar, Torr and Zisserman (2005), Felzenszwalb & Huttenlocher (2005)

Learning to Recognize Objects brambling terrier Possible internal representations: - Object parts - Textures - Attributes

How Objects are Represented in CNN? CNN uses distributed code to represent objects. Conv1 Conv2 Conv3 Conv4 Pool5 Agrawal, et al. Analyzing the performance of multilayer neural networks for object recognition. ECCV, 2014 Szegedy, et al. Intriguing properties of neural networks.arxiv preprint arxiv:1312.6199, 2013. Zeiler, M. et al. Visualizing and Understanding Convolutional Networks, ECCV 2014.

Scene Recognition Given an image, predict which place we are in. Bedroom Harbor

Learning to Recognize Scenes bedroom mountain Possible internal representations: - Objects (scene parts?) - Scene attributes - Object parts - Textures

CNN for Scene Recognition s ega m i # Places Database: 7 million images from 400 scene categories scene category Places-CNN: AlexNet CNN on 2.5 million images from 205 scene categories. Scene Recognition Demo: 78% top-5 recognition accuracy in the wild http://places.csail.mit.edu Zhou, et al. NIPS, 2014.

ImageNet CNN and Places CNN ImageNet CNN for Object Classification Same architecture: AlexNet Places CNN for Scene Classification Places

Data-Driven Approach to Study CNN Neuroscientists study brain d ImageNet CNN Places CNN 200,000 image stimuli of objects and scene categories (ImageNet TestSet+SUN database)

Estimating the Receptive Fields Estimated receptive fields Actual size of RF is much smaller than the theoretic size pool1 conv3 pool5 Segmentation using the RF of Units More semantically meaningful

Annotating the Semantics of Units Top ranked segmented images are cropped and sent to Amazon Turk for annotation.

Annotating the Semantics of Units Pool5, unit 76; Label: ocean; Type: scene; Precision: 93%

Annotating the Semantics of Units Pool5, unit 13; Label: Lamps; Type: object; Precision: 84%

Annotating the Semantics of Units Pool5, unit 77; Label:legs; Type: object part; Precision: 96%

Annotating the Semantics of Units Pool5, unit 112; Label: pool table; Type: object; Precision: 70%

Annotating the Semantics of Units Pool5, unit 22; Label: dinner table; Type: scene; Precision: 60%

Distribution of Semantic Types at Each Layer Object detectors emerge within CNN trained to classify scenes, without any object supervision!

Histogram of Emerged Objects in Pool5 ImageNet-CNN (59/256)

Histogram of Emerged Objects in Pool5 Places-CNN (151/256)

Evaluation on SUN Database Evaluate the performance of the emerged object detectors

Evaluation on SUN Database Correlation:0.53 Correlation:0.84

Conclusion We show that object detectors emerge inside a CNN trained to classify scenes, without any object supervision. Object detectors for free! Places database, Places CNN, and unit annotations could be downloaded at http://places.csail.mit.edu