Lecture 6: CNNs for Detection, Tracking, and Segmentation




CSED703R: Deep Learning for Visual Recognition (2016S)
Bohyung Han, Computer Vision Lab., bhhan@postech.ac.kr

Object Detection

Region-based CNN (R-CNN)
Pipeline: input image, extract region proposals, compute CNN features, classification.
- Region proposals: any proposal method (e.g., selective search, EdgeBoxes)
- CNN features: any architecture
- Classification: softmax or SVM
- Independent evaluation of each proposal
- Bounding box regression improves detection accuracy.
- Mean average precision (mAP): 53.7% with bounding box regression on the VOC 2010 test set
[Girshick14] R. Girshick, J. Donahue, S. Guadarrama, T. Darrell, J. Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014

Selective Search
Motivation
- The sliding window approach is not feasible for object detection with convolutional neural networks.
- We need a faster method to identify object candidates.
Finding object proposals
- Greedy hierarchical superpixel segmentation
- Diversification of superpixel construction and merging:
  - Using a variety of color spaces
  - Using different similarity measures
  - Varying starting regions
[Uijlings13] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders: Selective Search for Object Recognition. IJCV 2013
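The greedy hierarchical grouping behind selective search can be sketched in a few lines. This is a simplified illustration, not the method of [Uijlings13]: it assumes each initial region is described only by a pixel count, a normalized color histogram, and a bounding box, and it uses histogram intersection as the sole similarity measure, whereas the original combines several measures and color spaces. The names `greedy_grouping`, `merge_regions`, and the dictionary layout are hypothetical.

```python
def hist_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def merge_regions(r1, r2):
    """Merge two regions: sizes add, histograms are size-weighted, boxes are unioned."""
    size = r1["size"] + r2["size"]
    hist = [(r1["size"] * a + r2["size"] * b) / size
            for a, b in zip(r1["hist"], r2["hist"])]
    b1, b2 = r1["box"], r2["box"]
    box = (min(b1[0], b2[0]), min(b1[1], b2[1]),
           max(b1[2], b2[2]), max(b1[3], b2[3]))
    return {"size": size, "hist": hist, "box": box}


def greedy_grouping(regions, neighbors):
    """Repeatedly merge the most similar adjacent pair of regions.

    regions:   dict mapping integer id -> region dict
    neighbors: iterable of frozenset({i, j}) adjacency pairs
    Returns the boxes of all initial and merged regions as proposals.
    """
    regions = dict(regions)
    neighbors = set(neighbors)
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1
    while neighbors:
        # Pick the adjacent pair with the highest similarity.
        pair = max(neighbors, key=lambda p: hist_intersection(
            *(regions[k]["hist"] for k in p)))
        i, j = tuple(pair)
        merged = merge_regions(regions[i], regions[j])
        # Rewire adjacency: anything touching i or j now touches the merged region.
        rewired = set()
        for p in neighbors:
            if p == pair:
                continue
            if i in p or j in p:
                (other,) = p - {i, j}
                rewired.add(frozenset({other, next_id}))
            else:
                rewired.add(p)
        neighbors = rewired
        del regions[i], regions[j]
        regions[next_id] = merged
        proposals.append(merged["box"])
        next_id += 1
    return proposals


# Three regions in a row; the first two have similar colors.
example_regions = {
    0: {"size": 100, "hist": [1.0, 0.0], "box": (0, 0, 10, 10)},
    1: {"size": 100, "hist": [0.9, 0.1], "box": (10, 0, 20, 10)},
    2: {"size": 100, "hist": [0.0, 1.0], "box": (20, 0, 30, 10)},
}
example_neighbors = {frozenset({0, 1}), frozenset({1, 2})}
example_proposals = greedy_grouping(example_regions, example_neighbors)
```

Every intermediate merge contributes a box, so the proposals cover several scales, from the initial regions up to the whole image.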

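Bounding Box Regression
Learning a transformation of bounding box
- Region proposal: $P = (P_x, P_y, P_w, P_h)$
- Ground truth: $G = (G_x, G_y, G_w, G_h)$
- Transformation: $d_x(P), d_y(P), d_w(P), d_h(P)$

$\hat{G}_x = P_w d_x(P) + P_x$
$\hat{G}_y = P_h d_y(P) + P_y$
$\hat{G}_w = P_w \exp(d_w(P))$
$\hat{G}_h = P_h \exp(d_h(P))$

$\mathbf{w}_\star = \arg\min_{\hat{\mathbf{w}}_\star} \sum_i \left( t_\star^i - \hat{\mathbf{w}}_\star^\top \phi_5(P^i) \right)^2 + \lambda \lVert \hat{\mathbf{w}}_\star \rVert^2$, where $\phi_5(P)$ is the CNN pool5 feature of proposal $P$, $d_\star(P) = \mathbf{w}_\star^\top \phi_5(P)$, and $\star \in \{x, y, w, h\}$.

Detection Results
[Table: results on the VOC 2010 test set]
[Table: feature analysis on the VOC 2007 test set]

Fast R-CNN
Fast version of R-CNN
- 9x faster in training and 213x faster in testing than R-CNN
- A single feature computation and RoI pooling using object proposals
- Bounding box regression into network
- Single-stage training using multi-task loss
[Girshick15] R. Girshick: Fast R-CNN. ICCV 2015

Faster R-CNN
Fast R-CNN + RPN
- Proposal computation into network
- Marginal cost of proposals: 10ms
[Ren15] S. Ren, K. He, R. Girshick, J. Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015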
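As a worked illustration of the bounding box transformation on the regression slide above, the sketch below computes regression targets from a proposal and a ground-truth box and then applies a transformation back to the proposal. It assumes the R-CNN parameterization of [Girshick14], with boxes given as (center x, center y, width, height); the function names are hypothetical, and the learned regressor is replaced by the exact targets so the round trip can be checked.

```python
import math


def regression_targets(proposal, ground_truth):
    """Targets (t_x, t_y, t_w, t_h) that map the proposal onto the ground truth.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,          # shift in x, scaled by proposal width
            (gy - py) / ph,          # shift in y, scaled by proposal height
            math.log(gw / pw),       # log-space width scaling
            math.log(gh / ph))       # log-space height scaling


def apply_transform(proposal, d):
    """Apply a transformation d = (d_x, d_y, d_w, d_h) to the proposal."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = d
    return (pw * dx + px,
            ph * dy + py,
            pw * math.exp(dw),
            ph * math.exp(dh))


# A proposal that is slightly off and too small.
example_proposal = (50.0, 40.0, 20.0, 10.0)
example_ground_truth = (54.0, 42.0, 30.0, 20.0)

example_targets = regression_targets(example_proposal, example_ground_truth)
example_refined = apply_transform(example_proposal, example_targets)
```

Scaling the shifts by the proposal size and predicting width and height in log space makes the targets independent of the absolute box scale.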

Object Detection Performance
The R-CNN family achieves state-of-the-art performance in object detection.
[Chart: Pascal VOC 2007 object detection mAP (%)]
[Figure: Faster R-CNN with ResNet]

Visual Tracking with Convolutional Neural Networks

Visual Tracking: MDNet (Multi-Domain Network)
Main idea: training shared features and domain-specific classifiers jointly.
- Multi-domain learning
- Separating shared and domain-specific layers
[Figure: a shared feature representation feeds domain-specific classifiers for Domain 1, Domain 2, Domain 3, and Domain 4, and is transferred to a new domain]
[Nam15] Hyeonseob Nam, Bohyung Han: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. CVPR 2016
The winner of the Visual Object Tracking Challenge 2015

Multi-Domain Learning
[Figure: training iterations #nk+1, #nk+2]

Online Tracking using MDNet Features
[Figure: transfer shared features to a new sequence]

Online Tracking using MDNet Features
[Figure: shared features are transferred to a new sequence; in Frame 2 the target state is $x^* = \arg\max_x f^+(x)$, where $f^+(x)$ is the positive score]

Online Tracking: Overview
- Draw target candidates
- Find the optimal state
- Collect training samples
- Update the CNN if needed (fine-tuning)
- Repeat for the next frame

Online Network Update
Long-term update
- Performed at regular intervals
- Using long-term training samples
- For robustness
Short-term update
- Performed at abrupt appearance changes (positive score below 0.5)
- Using short-term training samples
- For adaptiveness
[Figure: positive score per frame, with long-term and short-term updates marked]

Hard Negative Mining
Provide a hard minibatch in each training iteration.
- Pool of positive samples: randomly draw samples
- Pool of negative samples: randomly draw samples, then select the samples with the highest scores (hard negative mining)
- The selected positives and hard negatives form a minibatch for training the CNN
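The minibatch construction described on the hard negative mining slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `build_hard_minibatch` and its parameters are hypothetical names, the default sizes are placeholders, and `score_fn` stands in for the network's positive score.

```python
import random


def build_hard_minibatch(pos_pool, neg_pool, score_fn,
                         n_pos=32, n_neg_candidates=1024, n_hard_neg=96,
                         rng=random):
    """Build one training minibatch with hard negative mining.

    pos_pool, neg_pool: lists of samples
    score_fn:           maps a sample to its current positive score
    Returns (positives, hard_negatives).
    """
    # Positives are drawn at random from the positive pool.
    positives = rng.sample(pos_pool, min(n_pos, len(pos_pool)))
    # Negatives: draw random candidates, then keep those the current
    # network scores highest, i.e. the ones it confuses with the target.
    candidates = rng.sample(neg_pool, min(n_neg_candidates, len(neg_pool)))
    hard_negatives = sorted(candidates, key=score_fn, reverse=True)[:n_hard_neg]
    return positives, hard_negatives


# Toy example: samples are already their own scores.
example_pos_pool = [0.95, 0.90, 0.85]
example_neg_pool = [0.1, 0.9, 0.4, 0.8, 0.2]
example_pos, example_hard_neg = build_hard_minibatch(
    example_pos_pool, example_neg_pool, score_fn=lambda s: s,
    n_pos=2, n_neg_candidates=5, n_hard_neg=2)
```

Because the scores come from the current network, the selected negatives change as training proceeds, which is why the minibatches shown for later iterations contain increasingly difficult distractors.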

Hard Negative Mining
[Figure: positive and negative samples in the 1st, 5th, and 30th minibatch, ordered by training iteration]

Bounding Box Regression
Improve the localization quality.
- DPM [Felzenszwalb et al. PAMI'10], R-CNN [Girshick et al. CVPR'14]
- Frame 1 (ground truth, positive samples): train a bounding box regression model.
- Subsequent frame (tracking result): adjust the tracking result by bounding box regression.

Results on OTB100 [Wu15]
- Protocol: MDNet is trained with 58 sequences from {VOT2013, VOT2014, VOT2015}, excluding {OTB100}.
- Distance precision and overlap success rate by One-Pass Evaluation (OPE)
[Wu15] Y. Wu, J. Lim, M.-H. Yang: Object Tracking Benchmark. TPAMI 2015

Results on VOT2015
[Figure: ground truth compared with our results over repeated runs]

Semantic Segmentation by Fully Convolutional Network

Semantic Segmentation
Segmenting images based on their semantic notion.

Semantic Segmentation using CNN
- Image classification: query image to class label
- Semantic segmentation: given an input image, obtain a pixel-wise segmentation mask using a deep Convolutional Neural Network (CNN)

Fully Convolutional Network (FCN)
Interpreting fully connected layers as convolution layers
- Each fully connected layer is identical to a convolution layer with a large spatial filter that covers the entire input field.
[Figure: fully connected layers (pool5 7x7x512, fc6, fc7) compared with convolution layers; for a larger input field, pool5 is 22x22x512 and fc6 and fc7 become 16x16 maps]
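The equivalence between a fully connected layer and a convolution whose filter covers the entire input field can be checked on a toy example. The sketch below is a minimal pure-Python illustration with tiny made-up shapes (1 channel, 2x2 filters); `fc_layer` and `conv_valid` are hypothetical helper names, and no bias or nonlinearity is included.

```python
def fc_layer(feature, weights):
    """Fully connected layer on a C x H x W feature map.

    weights: one C x H x W weight block per output unit.
    Returns one scalar per output unit.
    """
    return [sum(w[c][y][x] * feature[c][y][x]
                for c in range(len(feature))
                for y in range(len(feature[0]))
                for x in range(len(feature[0][0])))
            for w in weights]


def conv_valid(feature, weights):
    """The same weights applied as convolution filters (no padding, stride 1).

    Returns one 2-D score map per output unit.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    maps = []
    for w in weights:
        plane = []
        for oy in range(height - kh + 1):
            row = []
            for ox in range(width - kw + 1):
                row.append(sum(w[c][y][x] * feature[c][oy + y][ox + x]
                               for c in range(channels)
                               for y in range(kh)
                               for x in range(kw)))
            plane.append(row)
        maps.append(plane)
    return maps


# Two output units, each with a 1 x 2 x 2 weight block.
example_weights = [[[[1, 0], [0, 1]]],
                   [[[1, 1], [1, 1]]]]

# Input of exactly the filter size: the convolution yields 1x1 maps
# whose values equal the fully connected outputs.
small_input = [[[1, 2], [3, 4]]]
fc_out = fc_layer(small_input, example_weights)
conv_out_small = conv_valid(small_input, example_weights)

# Larger input: the same weights now yield a 2x2 score map per unit.
large_input = [[[1, 2, 0], [3, 4, 0], [0, 0, 0]]]
conv_out_large = conv_valid(large_input, example_weights)
```

The top-left entry of each larger score map equals the fully connected output for the top-left window, so the convolutional form evaluates the classifier at every window position in a single pass.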

FCN for Semantic Segmentation
Network architecture [Long15]
- End-to-end CNN architecture for semantic segmentation
- Interpret fully connected layers as convolutional layers
[Figure: 500x500x3 input, 16x16x21 score map, deconvolution, segmentation output]

Deconvolution Filter
- Bilinear interpolation filter (64x64 bilinear interpolation)
- Same filter for every class
- No filter learning!
How does this deconvolution work?
- The deconvolution layer is fixed.
- Fine-tuning the convolutional layers of the network with segmentation ground truth.
[Figure: convolutional layers pretrained on ImageNet and fine-tuned for segmentation; deconvolution fixed]
[Long15] J. Long, E. Shelhamer, T. Darrell: Fully Convolutional Networks for Semantic Segmentation. CVPR 2015

Skip Architecture
- Ensemble of three different scales
- Combining complementary features
[Figure: deeper layers are more semantic, shallower layers are more detailed]

Limitations of FCN-based Semantic Segmentation
- Coarse output score map
  - A single bilinear filter should handle the variations in all kinds of object classes.
  - Difficult to capture detailed structure of objects in the image
- Fixed-size receptive field
  - Unable to handle multiple scales
  - Difficult to delineate objects that are too small or too large compared to the size of the receptive field
- Noisy predictions due to skip architecture
  - Trade-off between details and noise
  - Minor quantitative performance improvement
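To make the fixed bilinear deconvolution filter concrete, the sketch below builds a bilinear interpolation kernel and applies it as a 1-D transposed convolution. It uses the common construction for initializing upsampling layers, which is an assumption here rather than something stated on the slides; the function names are hypothetical, and a small 4-tap kernel with stride 2 stands in for the 64x64 filter.

```python
def bilinear_kernel_1d(size):
    """1-D bilinear interpolation weights for a kernel of the given size."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def bilinear_kernel_2d(size):
    """2-D bilinear kernel as the outer product of the 1-D weights."""
    k = bilinear_kernel_1d(size)
    return [[a * b for b in k] for a in k]


def transposed_conv_1d(signal, kernel, stride):
    """Transposed convolution: each input value stamps a scaled kernel copy."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


# 2x upsampling with a 4-tap kernel; the interior values form a linear ramp.
example_kernel = bilinear_kernel_1d(4)
example_upsampled = transposed_conv_1d([0.0, 1.0, 2.0], example_kernel, stride=2)

# The filter size named on the slide: the same 64x64 kernel is used for every class.
example_kernel_64 = bilinear_kernel_2d(64)
```

Since the weights depend only on the kernel size, nothing is learned in this layer, which matches the "no filter learning" remark and explains why a single filter must serve all object classes.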

Results and Limitations
[Figures: input image, ground truth (GT), FCN-32s, FCN-16s, FCN-8s]