CNN Based Object Detection in Large Video Images. WangTao, wtao@qiyi.com IQIYI ltd. 2016.4





Outline
- Introduction: background, challenge
- Our approach: system framework, object detection, scene recognition, body segmentation, same style matching
- Experiments
- Conclusion

Background
Video out applications:
- Image retrieval
- Video advertising

Challenge
Real video data vs. image datasets:
- Cluttered background
- Multiple objects
- Small objects
- Varying pose/position
- Partial occlusion

Our task
Problems:
- Content-based object retrieval in large video images
- High accuracy for same style matching
- High speed on a large video database
Solution:
- Accurate object detection + scene classification
- Discriminative DNN features and PCA/LDA transformation
- Speed-up by parallel indexing and hierarchical filtering

System framework
- Database indexing: video key frame -> scene classification -> object detection -> body segmentation -> CNN feature -> indexing
- Query: query image -> scene classification -> Faster-RCNN rect -> body segmentation -> CNN feature -> match query -> distance sort -> result

Object detection (I)
Object detection by Faster R-CNN: region proposals + object scores [Ren, Shaoqing, et al. NIPS2015]
- Trained on MS COCO (300k images) + video images (10k images)
- More general, suited to images that contain multiple objects
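Faster R-CNN scores many overlapping region proposals, and duplicates are usually pruned with greedy non-maximum suppression (NMS). The sketch below is a generic NumPy version of greedy NMS, not code from the authors' system; boxes are assumed to be (x1, y1, x2, y2) in pixels, and the 0.7 IoU threshold is only an example default.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it
    by more than iou_thresh, and repeat on what is left."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep
```

With the default threshold, two heavily overlapping detections of the same handbag collapse to the higher-scoring one, while a detection elsewhere in the frame survives.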

Multi-class object detection, including:
- Clothes (skirt, jacket, trousers)
- Bags (handbag, backpack, draw-bar box)
- Electronics (mobile, laptop, TV, keyboard, mouse, microwave oven, oven, refrigerator)
- Glasses, necklace, hat
- Shoes

Object detection (II)
Object detection by CNN regression: input an image, output the coordinates of the object rectangle [Erhan, Dumitru, et al. CVPR2014]
- Efficient for images containing a single object that Faster R-CNN fails to detect
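A regression network outputs raw numbers that still have to be turned into a pixel rectangle. The sketch below assumes (this is not stated on the slide) that the network emits (x_min, y_min, x_max, y_max) normalized to [0, 1] relative to the image size; `decode_box` is a hypothetical helper name.

```python
def decode_box(pred, width, height):
    """Map a normalized (x_min, y_min, x_max, y_max) prediction to integer pixels.

    Assumption: coordinates are relative to the image size. Values are clamped
    to [0, 1] because a raw regressor can overshoot that range.
    """
    x1, y1, x2, y2 = (min(max(float(v), 0.0), 1.0) for v in pred)
    # Reorder corners in case the regressor produced an inverted box.
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))
```

For example, `decode_box((0.25, 0.5, 0.75, 1.2), 1920, 1080)` gives `(480, 540, 1440, 1080)`: the out-of-range 1.2 is clamped to the bottom edge of a 1920x1080 frame.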

Body segmentation
CNN-based body segmentation, constrained by human body parts [Jonathan Long, CVPR2015]
- Outputs: bounding box, body mask, body parsing
(Figure: original image and segmentation image)
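A fully convolutional network produces per-pixel class scores, from which a body mask and its bounding box can be derived. The sketch below is one possible way to do this with NumPy; the `(num_classes, H, W)` score layout and the `person_label` index are assumptions for illustration, not details from the slides.

```python
import numpy as np

def person_mask(scores, person_label):
    """Per-pixel argmax over class scores of shape (num_classes, H, W),
    then keep pixels assigned to the (hypothetical) person label."""
    return np.argmax(scores, axis=0) == person_label

def mask_to_bbox(mask):
    """Tight (x1, y1, x2, y2) box around the True pixels of a 2-D mask.

    x2 and y2 are exclusive, so image[y1:y2, x1:x2] crops the region.
    Returns None when the mask is empty.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return (int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1)
```

The resulting box can be used to crop the frame before CNN feature extraction, which matches the "body segmentation -> CNN feature" order in the system framework.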

Scene classification
CNN-based scene classification [Bolei Zhou, NIPS2014]
Pipeline: video key frame -> is scene? (yes/no) -> CNN-based scene classification -> multi-frame fusion -> tags
(Figure: non-scene images; scene images of kitchen, office, living room, and bedroom)
- Scene classification: precision 65.8%, recall 74%
- Threshold@0.7: precision 83.8%, recall 56.7%
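The slide names multi-frame fusion and a 0.7 threshold but does not say how frames are combined. The sketch below assumes, purely for illustration, that per-frame softmax outputs are averaged and a tag is emitted only when the fused top probability reaches the threshold. `fuse_scene_tags` is a hypothetical name.

```python
import numpy as np

def fuse_scene_tags(frame_probs, class_names, threshold=0.7):
    """Fuse per-frame scene probabilities by averaging (an assumption).

    frame_probs: array of shape (num_frames, num_classes), one probability
    vector per key frame. Returns the top class name, or None when the fused
    top probability is below `threshold` (treated as "not a scene").
    """
    fused = np.mean(np.asarray(frame_probs, dtype=float), axis=0)
    best = int(np.argmax(fused))
    return class_names[best] if fused[best] >= threshold else None
```

Raising `threshold` suppresses uncertain tags, which is consistent with the trade-off reported on the slide: precision rises from 65.8% to 83.8% while recall drops from 74% to 56.7%.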

Scene classes (43 classes, indices 0-42):
0 kitchen, 1 dining, 2 bakery, 3 ice_cream_parlor, 4 bathroom, 5 washing_room, 6 bedroom, 7 living_room,
8 office, 9 children_room, 10 nursery, 11 toyshop, 12 shoe_shop, 13 jewelry_shop, 14 outdoor_ice_world,
15 indoor_ice_skating_rink, 16 baseball, 17 football, 18 basketball_court, 19 swimming_pool, 20 track,
21 bowling_alley, 22 billiards, 23 tennis, 24 volleyball, 25 gymnasium, 26 pleasure_ground, 27 hospital_room,
28 dentists, 29 drugstore, 30 music_studio, 31 music_store, 32 sandbeach, 33 hairsalon, 34 bar, 35 pagoda,
36 bamboo_forest, 37 mountain, 38 coast, 39 creek, 40 waterfall, 41 grass, 42 other

Same style matching
- SIFT feature matching (normalized SIFT): dimension 128 dim x 400 pts; MAP 22%
- CNN feature of ImageNet 1k classifier: model VGG19, layer fc7, dimension 4096 / 600; MAP 28%
- CNN feature of same style classifier: model VGG19, layer fc7, dimension 4096 / 600; MAP 34%
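All comparisons in this deck are reported as MAP (mean average precision), but the exact evaluation protocol (cutoff, how relevant items are counted) is not given. The sketch below uses the standard definition: for each query, average the precision at every rank where a relevant item appears, then average over queries. The function names are illustrative.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """AP for one query.

    ranked_relevance: 0/1 flags in retrieval order (1 = same style as query).
    num_relevant: total relevant items in the database for this query;
    defaults to the number of relevant items actually retrieved.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at rank k
    denom = num_relevant if num_relevant is not None else hits
    return sum(precisions) / denom if denom else 0.0

def mean_average_precision(all_rankings):
    """MAP: mean of per-query AP values."""
    aps = [average_precision(r) for r in all_rankings]
    return sum(aps) / len(aps) if aps else 0.0
```

For a ranking `[1, 0, 1, 0]` the relevant items sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6.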

Multi-feature fusion
- Same class matching classifier trained on ImageNet 21k classes (15M images)
- Same style matching classifier trained on 1239 queries (1M images)
CNN model: feature dim, MAP
- Inception_bn1k: 1024, 24%
- Inception_21k: 1024, 34%
- Vgg19_caffe: 4096, 34%
- Inception_21k + vgg19_caffe: 5120, 43%
Speed
- Nvidia K40 GPU: 10x faster than an i7 CPU
- Faster R-CNN: 200 ms/frame, image size 1920x1080
- VGG19 feature: 60 ms/frame, image size 256x256
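The fused dimension (1024 + 4096 = 5120) indicates that the Inception_21k and VGG19 features are concatenated. The slides do not say whether they are normalized first; the sketch below assumes per-descriptor L2 normalization so the longer 4096-d VGG19 vector does not dominate distances. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against all-zero input)."""
    x = np.asarray(x, dtype=float)
    return x / max(np.linalg.norm(x), eps)

def fuse_features(inception_feat, vgg_feat):
    """Concatenate two L2-normalized descriptors and re-normalize.

    After the per-part normalization, both parts contribute equally to the
    Euclidean distance between fused vectors.
    """
    fused = np.concatenate([l2_normalize(inception_feat), l2_normalize(vgg_feat)])
    return l2_normalize(fused)
```

A 1024-d Inception_21k vector and a 4096-d VGG19 fc7 vector yield one 5120-d descriptor per image.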

Experiments
MAP on 3M testing images (trained on 1M images); settings and values as listed on the slide:
- VGG19 model; full image; object rectangle; PCA+LDA; Inception-21k
- MAP: 27.8%, 34.2%, 37.3%, 43.1%, 46.1%
Speed-up
- Parallel FLANN tree indexing
- Hierarchical filtering by object classes: 10x faster
- Query speed: 1 s/image on 5000 teleplays with 2M images
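Hierarchical filtering by object class means a query is compared only with database entries whose detector label matches, followed by a distance sort. The toy index below illustrates that idea under stated assumptions: brute-force Euclidean search stands in for the parallel FLANN trees used in the actual system, and the class and method names are hypothetical.

```python
from collections import defaultdict
import numpy as np

class ClassFilteredIndex:
    """Toy index that buckets database features by detected object class.

    Filtering step: a query is only compared with its own class bucket.
    Distance-sort step: survivors are ranked by Euclidean distance.
    (Brute force here; the slides use parallel FLANN tree indexing.)
    """

    def __init__(self):
        self._feats = defaultdict(list)
        self._ids = defaultdict(list)

    def add(self, item_id, object_class, feature):
        self._feats[object_class].append(np.asarray(feature, dtype=float))
        self._ids[object_class].append(item_id)

    def query(self, object_class, feature, top_k=10):
        if object_class not in self._feats:  # no entries of this class
            return []
        db = np.stack(self._feats[object_class])
        dists = np.linalg.norm(db - np.asarray(feature, dtype=float), axis=1)
        order = np.argsort(dists)[:top_k]
        return [(self._ids[object_class][i], float(dists[i])) for i in order]
```

Because each query only scans one class bucket, the work per query shrinks roughly in proportion to how the database is split across classes; the 10x figure on the slide is the authors' measurement, not something this sketch reproduces.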

Query system GUI

Query examples on image dataset

Query examples on video dataset

Conclusion
- The bounding box is important for recognizing objects
- Fusing same style matching features with same class matching features gives higher accuracy
- PCA and LDA further improve accuracy and speed
- A GPU is faster for CNN feature extraction
- Parallel indexing and hierarchical filtering speed up queries

References
- Erhan, Dumitru, et al., "Scalable object detection using deep neural networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
- Ren, Shaoqing, et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems (NIPS), 2015.
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems (NIPS), 2012.
- Arandjelović, Relja, and Andrew Zisserman, "Three things everyone should know to improve object retrieval," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
- Long, Jonathan, Evan Shelhamer, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," CVPR, 2015, arXiv:1411.4038.
- Zheng, S., S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr, "Conditional random fields as recurrent neural networks," ICCV, 2015.
- Shen, Li, Zhouchen Lin, and Qingming Huang, "Learning deep convolutional neural networks for Places2 scene recognition," CoRR, 2015.
- Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva, "Learning deep features for scene recognition using Places database," NIPS, 2014.
- Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba, "Object detectors emerge in deep scene CNNs," ICLR, 2015.
- Wu, Ruobing, Baoyuan Wang, Wenping Wang, and Yizhou Yu, "Harvesting discriminative meta objects with deep CNN features for scene classification," ICCV, 2015.
- Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna, "Rethinking the Inception architecture for computer vision," arXiv:1512.00567, 2015.