Fast R-CNN: Object detection with Caffe. Ross Girshick, Microsoft Research. (arXiv, code.) Latest roasts.
Goals for this section:
- Super quick intro to object detection
- Show one way to tackle obj. det. with ConvNets
- Highlight some more sophisticated uses of Caffe:
  - Python layers
  - Multi-task training with multiple losses
  - Batch sizes that change dynamically during Net::Forward()
- Pointers to open source code so you can explore, try, and understand!
Image classification (mostly what you've seen). K classes. Task: assign the correct class label to the whole image. Examples: digit classification (MNIST), object recognition (Caltech-101, ImageNet, etc.).
Classification vs. Detection. [Figure: classification labels a whole image ("Dog"); detection localizes each instance ("Dog", "Bridge") with a box.] Classification: easyish, these days. Detection: still quite a lot harder.
Problem formulation. The visual world; K object classes {airplane, bird, motorbike, person, sofa, bg}. Input: an image. Desired output: scored boxes on each object instance (e.g., "Person 0.7", "Motorbike 0.9"), produced here by YODA: Yet Another Object Detection Algorithm. *Actual results may vary.
mean Average Precision (mAP) on PASCAL VOC object detection, 2006-2015. [Chart: mAP by year. Before the successful application of ConvNets, progress accumulated over ~5 years; after, mAP improved 1.8x in under 2 years.] mAP: higher is better.
Fast R-CNN (Region-based Convolutional Networks)
A fast object detector implemented with Caffe:
- Caffe fork on GitHub that adds two new layers (ROIPoolingLayer and SmoothL1LossLayer)
- Python (using pycaffe) / more advanced Caffe usage
Inputs: an image and N object box proposals (e.g., from selective search), fed into a Fast R-CNN network (e.g., VGG_CNN_M_1024).
Two outputs:
1. NK regressed object boxes
2. P(cls = k | box = n, image) for each of the NK boxes
A type of Region-based Convolutional Network (R-CNN). Let's see how it works!
Quick background: Region-based Convolutional Networks (R-CNNs)
1. Input image
2. Extract region proposals (~2k / image), e.g., selective search [van de Sande, Uijlings et al.]
3. Compute CNN features on regions
4. Classify and refine regions
[Girshick et al., CVPR '14]
Fast R-CNN (test-time detection)
Inputs: an image and N object box proposals (e.g., selective search), fed into a Fast R-CNN network (VGG_CNN_M_1024).
Two output types:
1. NK regressed object boxes
2. P(cls = k | box = n, image) for each of the NK boxes
Given an image and object proposals, detection happens with a single call to Net::Forward(), which takes 60 to 330 ms.
Fast R-CNN (test-time detection), continued
Minimal post-processing: non-maximum suppression (NMS).
Object proposals come from:
- Selective Search (2 s / image) [van de Sande/Uijlings et al.]
- EdgeBoxes (0.2 s / image) [Zitnick & Dollar]
- MCG (30 s / image) [Arbelaez et al.]
- Etc.
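The NMS step can be sketched in a few lines of NumPy. This is a generic greedy IoU-based NMS for illustration; the function name, the toy boxes, and the 0.3 overlap threshold are assumptions, not taken from the Fast R-CNN code.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression (illustrative sketch).

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,).
    Returns indices of the boxes to keep, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top-scoring box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 100, 100], [10, 10, 110, 110], [200, 200, 300, 300]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```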
Zooming into the net
- The image comes in as a blob of size S x 3 x H x W (e.g., S = 1 or 5, H = 600, W = 1000).
- After a bunch of conv layers and whatnot, the conv5 feature map blob is S x 512 x H/16 x W/16.
- 2000 image regions come in as a blob of size 2000 x 5.
- The pool5 blob is 2000 x 512 x 6 x 6.
- The two output blobs are 2000 x 21 (class scores) and 2000 x (4 * 21) (box regressions).
Zooming into the net, continued
RoI Pooling Layer:
- an adaptive max pooling layer
- dynamically expands the batch from S (images) to R (regions, e.g., 2000)
Blob sizes as before: image S x 3 x H x W; conv5 feature map S x 512 x H/16 x W/16; regions 2000 x 5; pool5 2000 x 512 x 6 x 6; outputs 2000 x 21 and 2000 x (4 * 21).
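The blob-shape bookkeeping above can be sanity-checked with a tiny sketch. `blob_shapes` is purely illustrative; the stride of 16, the 512 conv5 channels, the 6x6 pooled size, and the 21 classes come from the slides, everything else is plumbing.

```python
# Shape bookkeeping for the blobs named above (VGG_CNN_M_1024 conv stack
# with effective stride 16, 512 conv5 channels, 6x6 RoI pooling output).
def blob_shapes(S, H, W, R=2000, C=512, K=21, pool=6):
    image = (S, 3, H, W)                # input images
    conv5 = (S, C, H // 16, W // 16)    # feature map, stride-16 subsampled
    rois = (R, 5)                       # [batch_idx, x1, y1, x2, y2] per RoI
    pool5 = (R, C, pool, pool)          # batch expanded from S to R here
    cls = (R, K)                        # per-class scores
    bbox = (R, 4 * K)                   # per-class box regressions
    return image, conv5, rois, pool5, cls, bbox

shapes = blob_shapes(S=1, H=600, W=1000)
print(shapes[1])  # conv5: (1, 512, 37, 62)
```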
Another view of the same thing: a deep ConvNet computes the conv5 feature map; each RoI is projected onto that map; the RoI pooling layer extracts a fixed-length RoI feature vector, which passes through FCs to two sibling outputs per RoI: a softmax classifier and a bbox regressor. (The top and bottom diagrams are the same network.)
RoI Pooling Layer
A special case of SPPnet's SPP layer [He et al., ECCV '14].
Two inputs ("bottoms"):
1. Conv feature map: S x 512 x H x W
2. Regions of Interest: R x 5. The 5 comes from [r, x1, y1, x2, y2], where r in [0, S - 1] specifies the image batch index.
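A simplified NumPy sketch of what the forward pass of such a layer does: project each RoI onto the stride-16 feature map, split it into a fixed 6x6 grid, and max-pool each cell. This illustrates the adaptive max pooling idea only; it is not the actual C++/CUDA implementation, and rounding details and the backward pass are omitted.

```python
import numpy as np

def roi_pool(feat, rois, pooled=6, stride=16):
    """feat: (S, C, H, W) conv feature map; rois: (R, 5) rows of
    [batch_idx, x1, y1, x2, y2] in input-image coordinates.
    Returns an (R, C, pooled, pooled) blob: batch expanded from S to R."""
    R, C = rois.shape[0], feat.shape[1]
    out = np.zeros((R, C, pooled, pooled), feat.dtype)
    for r, (b, x1, y1, x2, y2) in enumerate(rois):
        # Project the RoI onto the stride-16 feature map.
        x1, y1 = int(x1 // stride), int(y1 // stride)
        x2, y2 = int(np.ceil(x2 / stride)), int(np.ceil(y2 / stride))
        # Split the projected RoI into a fixed pooled x pooled grid.
        xs = np.linspace(x1, max(x2, x1 + 1), pooled + 1).astype(int)
        ys = np.linspace(y1, max(y2, y1 + 1), pooled + 1).astype(int)
        for i in range(pooled):
            for j in range(pooled):
                # Max-pool each grid cell (guard against empty cells).
                cell = feat[int(b), :,
                            ys[i]:max(ys[i + 1], ys[i] + 1),
                            xs[j]:max(xs[j + 1], xs[j] + 1)]
                out[r, :, i, j] = cell.max(axis=(1, 2))
    return out

feat = np.random.rand(1, 512, 37, 62).astype(np.float32)
rois = np.array([[0, 0, 0, 400, 300], [0, 100, 50, 500, 550]], np.float32)
print(roi_pool(feat, rois).shape)  # (2, 512, 6, 6)
```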
The train-time net. A single fine-tuning operation, all in Caffe. Even more boxes and arrows. Let's look at them.
The train-time net (inputs)
- B full images: B x 3 x H x W (e.g., B = 2, H = 600, W = 1000)
- RoIs: 128 x 5 (75% background)
- Class labels: 128 x 21
- Bounding-box regression targets: 128 x 84
- Bounding-box regression loss weights: 128 x 84
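The 84 is 4 coordinates x 21 classes: each RoI's regression targets occupy only the 4 slots belonging to its own class, and background RoIs get all-zero loss weights so they contribute no regression loss. A sketch of that expansion, assuming this layout; the helper name and random data are illustrative, not from the repository.

```python
import numpy as np

def expand_bbox_targets(labels, targets, num_classes=21):
    """labels: (128,) class index per RoI (0 = background);
    targets: (128, 4) regression targets for each RoI's own class.
    Returns (128, 84) class-specific targets and loss weights."""
    n = labels.shape[0]
    bbox_targets = np.zeros((n, 4 * num_classes), np.float32)
    bbox_loss_weights = np.zeros_like(bbox_targets)
    for i, cls in enumerate(labels):
        if cls > 0:  # background RoIs keep zero weights: no box loss
            s = 4 * cls
            bbox_targets[i, s:s + 4] = targets[i]
            bbox_loss_weights[i, s:s + 4] = 1.0
    return bbox_targets, bbox_loss_weights

labels = np.zeros(128, int)
labels[:32] = np.random.randint(1, 21, 32)  # 25% foreground, as in training
t, w = expand_bbox_targets(labels, np.random.randn(128, 4))
print(t.shape, int(w.sum()))  # (128, 84) 128 -> 32 foreground RoIs x 4 coords
```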
The train-time net (exotic data layers)
A custom Python data layer:
- samples 2 images
- from each sampled image, takes 64 RoIs
- the input batch is initially 2 elements; it gets expanded to 128 elements by the RoI Pooling Layer
- outputs 5 tops: data [images], rois [regions of interest], labels [class labels for the rois], bbox_targets [box regression targets], bbox_loss_weights [details to follow]
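The sampling scheme can be sketched as follows. The IoU thresholds (foreground at >= 0.5, background in [0.1, 0.5)) and the 25% foreground fraction are the paper's defaults; `sample_rois` is an illustrative stand-in for the data layer's internals, not the actual code.

```python
import numpy as np

def sample_rois(overlaps, rois_per_image=64, fg_fraction=0.25,
                fg_thresh=0.5, bg_range=(0.1, 0.5), rng=np.random):
    """Pick 64 RoIs from one image: ~25% foreground (IoU >= 0.5 with a
    ground-truth box), the rest background (IoU in [0.1, 0.5)).
    overlaps: (N,) max IoU of each candidate RoI with any ground truth."""
    fg = np.where(overlaps >= fg_thresh)[0]
    bg = np.where((overlaps >= bg_range[0]) & (overlaps < bg_range[1]))[0]
    n_fg = min(int(rois_per_image * fg_fraction), fg.size)
    n_bg = rois_per_image - n_fg
    fg = rng.choice(fg, n_fg, replace=False)
    bg = rng.choice(bg, n_bg, replace=bg.size < n_bg)  # pad if too few bg
    return np.concatenate([fg, bg])

overlaps = np.random.rand(2000)  # stand-in IoUs for 2000 proposals
keep = sample_rois(overlaps)
print(keep.shape)  # (64,) -> two such images give the 128-RoI batch
```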
The train-time net (multi-task losses)
Classification loss (cross-entropy) + bounding-box regression loss ("smooth L1")
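The total objective sums the two terms, L = L_cls + lambda * L_loc, with the regression term gated by the per-dimension loss weights (lambda = 1 in the paper). A NumPy sketch; the function names and random data are illustrative.

```python
import numpy as np

def smooth_l1(x):
    # 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise.
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def multitask_loss(cls_scores, labels, bbox_pred, bbox_targets,
                   bbox_loss_weights, lam=1.0):
    """cls_scores: (R, 21); labels: (R,); bbox_*: (R, 84)."""
    # Classification: softmax cross-entropy over the 21 classes.
    z = cls_scores - cls_scores.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss_cls = -log_p[np.arange(labels.size), labels].mean()
    # Regression: smooth L1 on the active (foreground) coordinates only.
    loss_box = (bbox_loss_weights *
                smooth_l1(bbox_pred - bbox_targets)).sum(axis=1).mean()
    return loss_cls + lam * loss_box

R = 128
scores = np.random.randn(R, 21)
labels = np.random.randint(0, 21, R)
pred, tgt = np.random.randn(R, 84), np.random.randn(R, 84)
w = np.zeros((R, 84))
w[:32, :4] = 1.0  # toy weights: 32 foreground RoIs, 4 active coords each
loss = multitask_loss(scores, labels, pred, tgt, w)
print(loss > 0)  # True: cross-entropy is positive, box term non-negative
```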
Code is on GitHub (MIT License, Runs on Linux)
Caffe fork: a brief tour of some of the code (Caffe bits, Python modules, train/test tools).
A brief tour of some of the code: the Caffe bits.
Region of Interest (RoI) Pooling Layer Expands a small batch into a big batch
Smooth L1 Loss Layer
- Robust to outliers
- Optimizer friendly
- Per-dimension loss weights
[Plot: smooth L1 loss compared with the L1 and L2 losses.]
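The plot's comparison can be reproduced numerically: smooth L1 matches L2's quadratic behavior for |x| < 1 (so the gradient is smooth near zero) and L1's unit slope beyond it (so a single outlier cannot produce an exploding gradient), which is what "robust" and "optimizer friendly" mean here.

```python
import numpy as np

x = np.array([0.5, 1.0, 3.0, 10.0])
l2 = 0.5 * x ** 2                                          # grows without bound
l1 = np.abs(x)                                             # kinked at zero
smooth = np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)
# For x = 10, L2 gives 50 but smooth L1 gives 9.5: the tail is linear,
# so its gradient magnitude never exceeds 1.
print(smooth)  # [0.125 0.5 2.5 9.5]
```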
A brief tour of some of the code: the Python bits.
Python data layer for Fast R-CNN Reshapes blobs on-the-fly
Python training code Custom solver loop with custom snapshot method
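The loop pattern looks roughly like this. `ToySolver` is a stand-in for `caffe.SGDSolver` (which exposes `step(n)` and `iter`) so the sketch runs without Caffe; the snapshot hook is where custom logic, such as un-scaling learned parameters before saving, can be wrapped around Caffe's own snapshotting.

```python
class ToySolver:
    """Stand-in for caffe.SGDSolver: step(n) advances, iter counts."""
    def __init__(self):
        self.iter = 0

    def step(self, n):
        self.iter += n

def train(solver, max_iters=100, snapshot_iters=40):
    """Custom solver loop: drive the solver one step at a time so we can
    interleave our own snapshot logic (instead of `caffe train` from the CLI)."""
    snapshots = []
    while solver.iter < max_iters:
        solver.step(1)
        if solver.iter % snapshot_iters == 0:
            # Custom snapshot hook: a real implementation would adjust net
            # parameters here before writing the .caffemodel to disk.
            snapshots.append(solver.iter)
    return snapshots

print(train(ToySolver()))  # [40, 80]
```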
A brief tour of some of the code: the CLI tools.
Teaser: Faster R-CNN
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research)
- A Region Proposal Network shares conv layers with the Fast R-CNN object detection network, so the detection network also proposes objects
- Marginal cost of proposals: 10 ms
- VGG16 runtime ~200 ms including all steps
- Higher mAP, faster
- Open-source Caffe code coming later this summer