Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

1 Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

2 Previously, end-to-end.. Dog. Slide credit: Jose M

3 Previously, end-to-end.. Dog. Learned Representation. Slide credit: Jose M

4 Previously, end-to-end.. Dog. Learned Representation. Part I: End-to-end learning (E2E)

5 Previously, end-to-end.. Learned Representation. Task A (e.g. image classification). Part I: End-to-end learning (E2E)

6 Previously, fine-tuning.. Part I: End-to-end learning (E2E): Learned Representation, Domain A. Transfer. Part I: End-to-End Fine-Tuning (FT): Fine-tuned Learned Representation, Domain B. Slide credit: X. Giro

7 Previously, fine-tuning.. Fine-tuning a pre-trained network. Slide credit: Victor Campos, Layer-wise CNN surgery for Visual Sentiment Prediction (ETSETB 2015)

8 Previously, fine-tuning.. Fine-tuning a pre-trained network. Fine-tuning: high learning rate in the new layer, and low learning rate in all other layers. Slide credit: Victor Campos, Layer-wise CNN surgery for Visual Sentiment Prediction (ETSETB 2015)

9 Previously, off-the-shelf features.. Learned Representation. Task A (e.g. image classification), Part I: End-to-end learning (E2E). Task B (e.g. image retrieval), Part II: Off-the-shelf features. Slide credit: X. Giro

10 Previously, off-the-shelf features.. Image classification: image as an input, label as output (e.g. Orange). Image representations: spatially coded image representations (like spatial pyramids) or orderless image representations (like BOW).

11 Two deep lectures in M5: Deep ConvNets for Recognition at... Global Scale (today's lecture), Local Scale (next lecture)

12 Image Classification. Image classification: image as an input, label as output (e.g. Orange). How to process non-square images? Options: resize, zero padding, or largest centred square.
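
Two of the options above (zero padding and the largest centred square) come down to simple index arithmetic. The following is a minimal sketch, not taken from the lecture; the function names and the (top, left, side) and (top, bottom, left, right) return conventions are assumptions made for illustration.

```python
def largest_centred_square(height, width):
    """Return (top, left, side) of the largest centred square crop."""
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


def zero_padding(height, width):
    """Return (pad_top, pad_bottom, pad_left, pad_right) that makes the image square."""
    side = max(height, width)
    pad_h, pad_w = side - height, side - width
    # Split the padding as evenly as possible between the two sides.
    return pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2


# A 480x640 (height x width) image, used here only as an example size.
crop = largest_centred_square(480, 640)
pad = zero_padding(480, 640)
```

Cropping discards image content at the borders, while padding keeps everything but adds uninformative pixels; plain resizing keeps everything but distorts the aspect ratio.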

13 Local object recognition: object localization (single object), object detection, semantic segmentation

14 Classification+LOCALIZATION slide credit: Li, Karpathy, Johnson

15 Localization as regression slide credit: Li, Karpathy, Johnson

16 Localization as regression slide credit: Li, Karpathy, Johnson

17 Localization as regression: classification head and regression head. Slide credit: Li, Karpathy, Johnson

18 Localization as regression: classification head and regression head. Slide credit: Li, Karpathy, Johnson

19 Localization as regression slide credit: Li, Karpathy, Johnson

20 Localization as regression. Problem: multiple classes. Classification head: C class scores. Regression head: C x 4 numbers (one box per class). Slide credit: Li, Karpathy, Johnson
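
To make the two heads concrete: with C classes, the classification head gives C scores and the regression head gives C x 4 numbers, so the predicted box is read from the slot of the winning class. This is a minimal sketch under assumptions: the function name `localize` and the flat (x, y, w, h)-per-class layout are hypothetical, not from the slides.

```python
def localize(class_scores, box_outputs):
    """Pick the top-scoring class and its class-specific box.

    class_scores: list of C scores from the classification head.
    box_outputs:  flat list of C*4 numbers from the regression head,
                  laid out as (x, y, w, h) per class (an assumed layout).
    """
    num_classes = len(class_scores)
    assert len(box_outputs) == 4 * num_classes
    best = max(range(num_classes), key=lambda c: class_scores[c])
    box = box_outputs[4 * best: 4 * best + 4]
    return best, box


# Example with C = 3 classes and dummy regression outputs 0..11.
best_class, best_box = localize([0.1, 0.7, 0.2], list(range(12)))
```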

21 Localization as regression slide credit: Li, Karpathy, Johnson

22 Localization as regression (example). Example of localization of clothes. Regression is done in two steps: first the person bounding box and then the clothing bounding boxes. Esteve Cervantes: Evaluating deep features for Fashion Recognition (master project 2015)

23 Local object recognition: object localization (single object), object detection (any ideas?), semantic segmentation

24 Sliding window: classification + regression. Compute a new regressed bounding box and classification score for all sliding window positions.

25 Sliding window. Repeat for different scales and combine all results (e.g. with non-maximum suppression).
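
Non-maximum suppression, mentioned above as one way to combine the results, can be sketched as a greedy procedure: keep the highest-scoring box, discard boxes that overlap it too much, and repeat. This is a minimal illustration, not the lecture's implementation; the (x1, y1, x2, y2) box format and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes that survive, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_maximum_suppression(boxes, scores)
```

In this example the second box overlaps the first with an IoU of 81/119 and is suppressed, while the third box does not overlap anything and is kept.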

26 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2) that classifies a 10x10 window as car/not car, applied to a 12x17 image. What are the spatial coordinates of conv1? Part of the convolutional features are the same for overlapping windows and do not need recomputation!

27 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). How many 10x10 windows are there in this 12x17 image?

28 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). The convolutions can be computed in a single pass over the 12x17 image.

29 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). For a single 10x10 window, conv1 gives a 6x6x5 feature map and the following fully connected layer gives a 1x1x10 output.

30 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). conv1 uses filters of size 5x5x3; the fully connected layer can be written as a convolution, fc2 = conv2, with filters of size 6x6x5.

31 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). conv1: filters of size 5x5x3; fc2 = conv2: filters of size 6x6x5; fc3 gives a 1x1x2 output per window.

32 Sliding window (efficient computation). Let us for simplicity consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). We have the 8x3=24 classification scores sharing computation of the convolutional features. conv1: 5 filters of 5x5x3; fc2 = conv2: 10 filters of 6x6x5; fc3 = conv3: 2 filters of 1x1x10.
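
The equivalence in this example can be checked numerically. The sketch below is an illustration under assumptions, not the lecture's code: it uses random weights, ReLU between layers, and plain nested lists indexed [row][col][channel]. It runs the three layers once over the 12x17 image and compares the result with cropping each of the 3x8 = 24 windows of size 10x10 and running the same network on every crop.

```python
import random


def conv_valid(x, filters, relu=True):
    """'Valid' convolution (no padding, stride 1), optionally followed by ReLU.

    x:       feature map, indexed [row][col][channel].
    filters: list of filters, each indexed [dy][dx][channel].
    """
    kh, kw = len(filters[0]), len(filters[0][0])
    out = []
    for r in range(len(x) - kh + 1):
        row = []
        for c in range(len(x[0]) - kw + 1):
            pixel = []
            for f in filters:
                s = 0.0
                for dy in range(kh):
                    for dx in range(kw):
                        s += sum(a * b for a, b in zip(x[r + dy][c + dx], f[dy][dx]))
                pixel.append(max(0.0, s) if relu else s)
            row.append(pixel)
        out.append(row)
    return out


def random_filters(n, kh, kw, channels, rng):
    return [[[[rng.uniform(-1, 1) for _ in range(channels)]
              for _ in range(kw)] for _ in range(kh)] for _ in range(n)]


rng = random.Random(0)
conv1 = random_filters(5, 5, 5, 3, rng)    # 5 filters of 5x5x3
conv2 = random_filters(10, 6, 6, 5, rng)   # fully connected layer as 10 filters of 6x6x5
conv3 = random_filters(2, 1, 1, 10, rng)   # last layer as 2 filters of 1x1x10


def network(x):
    h = conv_valid(x, conv1)
    h = conv_valid(h, conv2)
    return conv_valid(h, conv3, relu=False)


def crop(x, top, left, size=10):
    return [row[left:left + size] for row in x[top:top + size]]


image = [[[rng.uniform(0, 1) for _ in range(3)] for _ in range(17)] for _ in range(12)]

# One pass over the whole image: a 3x8 map with 2 scores per position.
dense = network(image)

# Explicit sliding window: one full forward pass per 10x10 crop.
windowed = [[network(crop(image, r, c))[0][0] for c in range(8)] for r in range(3)]

max_diff = max(abs(dense[r][c][k] - windowed[r][c][k])
               for r in range(3) for c in range(8) for k in range(2))
```

Both routes give the same 24 pairs of scores, but the single pass evaluates conv1 once on the image (an 8x13x5 map) instead of 24 times on overlapping crops.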

33 Sliding window (efficient computation). Networks can be written as fully convolutional networks to speed up computation at test time. Example of bear and fish detection on multiple scales. Sermanet et al., Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR 2014

34 object proposals. Object proposal methods compute boxes which potentially contain an object. Features for each box are extracted and a classifier is applied. Typically thousands of boxes (but far fewer than with a sliding window). Many different approaches: selective search, edge boxes, GOP, etc. Selective search: K. Van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.

35 object proposals (RCNN): 1. compute object proposals (~2k) 2. warp dilated bounding box 3. compute CNN features 4. classify regions (car: yes, person: no) and apply bounding box regression. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.

36 object proposals (RCNN) Alex Net Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.

37 object proposals (RCNN). Alex Net: remove the last layer and fine-tune for the 20 PASCAL classes. Use the fc-layer feature vector as the description of the bounding box. Train an SVM on this representation for classification. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.

38 object proposals (RCNN) slide credit: Girshick

39 object proposals (RCNN)

40 object proposals (RCNN) slide credit: Li, Karpathy, Johnson

41 object proposals (RCNN): 1. compute object proposals (~2k) 2. warp dilated bounding box 3. compute CNN features 4. classify regions (car: yes, person: no) and compute an improved bounding box. Drawbacks: not end-to-end; warping of boxes; lots of double computation (overlap of bounding boxes). Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.

42 object proposals (Fast R-CNN)

43 object proposals (Fast R-CNN): shared computation (conv1-conv5). Compute the convolutional features once per image. He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015

44 object proposals (Fast R-CNN): shared computation. Compute the convolutional features once, then extract features from conv5 for all bounding boxes. This was first proposed by: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015

45 object proposals (Fast R-CNN): shared computation. For all bounding boxes: Region of Interest pooling (ROI pooling), i.e. pool the features in a spatial grid.
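
ROI pooling turns a box of arbitrary size on the feature map into a fixed-size grid by max-pooling inside each grid cell. The following is a minimal single-channel sketch under assumptions: the function name, the end-exclusive (top, left, bottom, right) box format in feature-map cells, and the integer bin boundaries are chosen for illustration and may differ from the Fast R-CNN implementation.

```python
def roi_max_pool(feature_map, roi, grid=(2, 2)):
    """Max-pool the features inside `roi` into a fixed grid.

    feature_map: 2-D list (one channel), indexed [row][col].
    roi:         (top, left, bottom, right) in feature-map cells, end-exclusive.
    grid:        (rows, cols) of the fixed-size output.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    rows, cols = grid
    pooled = []
    for i in range(rows):
        # Bin boundaries along the rows; every bin covers at least one cell.
        r0 = top + (i * height) // rows
        r1 = max(r0 + 1, top + ((i + 1) * height) // rows)
        out_row = []
        for j in range(cols):
            c0 = left + (j * width) // cols
            c1 = max(c0 + 1, left + ((j + 1) * width) // cols)
            out_row.append(max(feature_map[r][c]
                               for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(out_row)
    return pooled


# A 4x6 feature map whose value at (r, c) is r*6 + c, and a 4x4 box inside it.
feature_map = [[r * 6 + c for c in range(6)] for r in range(4)]
pooled = roi_max_pool(feature_map, roi=(0, 1, 4, 5), grid=(2, 2))
```

Whatever the size of the box, the output always has the same grid shape, which is what lets the fully connected layers that follow accept every proposal.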

46 object proposals (Fast R-CNN): shared computation. ROI pooling: pool the features in a spatial grid, followed by FCs. Classification: log loss. Regression: smooth L1 loss. End-to-end training.
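
The two losses named on this slide can be written down directly. This is a minimal sketch: the smooth L1 definition (quadratic for |x| < 1, linear otherwise) is the one from the Fast R-CNN paper, while the function names and the unweighted sum of the two terms are assumptions made for illustration.

```python
import math


def log_loss(class_probabilities, true_class):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(class_probabilities[true_class])


def smooth_l1(x):
    """Quadratic near zero, linear for large errors (less sensitive to outliers)."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def box_regression_loss(predicted, target):
    """Sum of smooth L1 over the four box coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))


def multi_task_loss(class_probabilities, true_class, predicted_box, target_box):
    # Assumed equal weighting of the two tasks.
    return (log_loss(class_probabilities, true_class)
            + box_regression_loss(predicted_box, target_box))


example = box_regression_loss([0.0, 0.0, 0.0, 0.0], [0.5, 2.0, 0.0, 0.0])
```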

47 object proposals (Fast R-CNN). Multi-task training also improves classification performance. End-to-end training improves results.

                     Fast R-CNN   R-CNN
Train time speedup   8.8x         -
Test time/image      0.32s        47s
Test speedup         146x         -
mAP                  66.9%        66.0%

Test time does not include object proposal computation (which is now the bottleneck).

48 object proposals (Faster R-CNN): shared computation (conv5), Region Proposal Network (RPN), ROI pooling, FCs. Compute the object proposals directly in the network.

49 object proposals (Faster R-CNN). Slide a window over the feature map. Add a network which classifies and regresses the bounding boxes. The classification score provides the confidence of the presence of an object. Slide credit: Kaiming He

50 object proposals (Faster R-CNN). Slide a window over the feature map. Add a network which classifies and regresses the bounding boxes. The classification score provides the confidence of the presence of an object. Use N anchors for proposals of varying aspect ratios. Slide credit: Kaiming He
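
The anchors at one sliding-window position can be sketched as boxes that share a centre and an area but differ in aspect ratio. This is an illustration under assumptions: the function name, the three ratios, the 128x128 area and the (x1, y1, x2, y2) format are hypothetical choices, not values stated on the slides.

```python
import math


def make_anchors(cx, cy, area, aspect_ratios=(0.5, 1.0, 2.0)):
    """Return one (x1, y1, x2, y2) box per aspect ratio, all centred at (cx, cy).

    The aspect ratio is width / height; every anchor has the same area.
    """
    anchors = []
    for ratio in aspect_ratios:
        w = math.sqrt(area * ratio)
        h = math.sqrt(area / ratio)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(100, 100, 128 * 128)
```

The network then predicts, for each anchor, an objectness score and a correction to the anchor box, so N anchors give N proposals per position.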

51 object proposals (Faster R-CNN). Computation for 1000 boxes:

Model                      Time
Edge boxes + R-CNN         0.25 sec *ConvTime *FcTime
Edge boxes + fast R-CNN    0.25 sec + 1*ConvTime *FcTime
faster R-CNN               1*ConvTime *FcTime

Slide credit: Kaiming He

52 object proposals (Faster R-CNN). Slide credit: Li, Karpathy, Johnson

53 object proposals (Faster R-CNN). Slide credit: Li, Karpathy, Johnson

54 object localization: winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), using residual networks and Faster R-CNN.

55 object localization: winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, using residual networks and Faster R-CNN.

56 summary object detection. Object localization: when there is one or a known number of objects/classes, you can localize objects by adding a regression head to your network. Sliding window + CNN can be computed efficiently by writing the network as a fully convolutional network. Object proposal methods are straightforwardly combined with CNNs, but for fast/good results consider: adding a regression head to improve bounding box estimation; sharing computation of the convolutional features (SPP); end-to-end training of the network (Fast R-CNN); including a Region Proposal Network for fast object proposals within the network (Faster R-CNN). Slide credit: Li, Karpathy, Johnson

57 Local object recognition: object localization (single object), object detection, semantic segmentation

58 semantic segmentation. Semantic segmentation: assign a class to all pixels. Instance segmentation: assign pixels to a particular instance of a class (chair1, etc.).

59 semantic segmentation. A ConvNet predicts the class of the center pixel. Write the network as a fully convolutional network and apply it to the image. Because of the convolutions the resolution is smaller and upsampling is required.

60 semantic segmentation pixelwise loss Long et al., Fully Convolutional Networks for Semantic Segmentation, ICCV 2015

61 input semantic segmentation Convolution (3x3) padding [ ] stride [1 1] Long et al., Fully Convolutional Networks for Semantic Segmentation, ICCV 2015

62 input semantic segmentation Convolution (3x3) padding [ ] stride [1 1]

63 input input semantic segmentation Convolution (3x3) padding [ ] stride [1 1] Convolution (3x3) padding [ ] stride [2 2]

64 input input semantic segmentation Convolution (3x3) padding [ ] stride [1 1] Convolution (3x3) padding [ ] stride [2 2]

65 input semantic segmentation deconvolution (3x3) padding [ ] stride [2 2]

66 semantic segmentation: deconvolution (3x3), padding [ ], stride [2 2]. Deconvolutions are also called fractionally strided convolutions or transposed convolutions.
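
A 1-D version makes the name "convolution transpose" concrete: every input value scatters a scaled copy of the kernel into the output, stride positions apart, which is exactly the transpose of a strided convolution. This is a minimal sketch without padding; the function names and the example values are assumptions made for illustration.

```python
def conv_1d(y, kernel, stride=2):
    """Ordinary strided convolution (correlation form), no padding."""
    n_out = (len(y) - len(kernel)) // stride + 1
    return [sum(y[i * stride + k] * kernel[k] for k in range(len(kernel)))
            for i in range(n_out)]


def conv_transpose_1d(x, kernel, stride=2):
    """Transposed (fractionally strided) convolution, no padding.

    The output has (len(x) - 1) * stride + len(kernel) samples.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


upsampled = conv_transpose_1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2)

# Transpose check: <conv(y), x> equals <y, conv_transpose(x)> for any x, y.
kernel = [1.0, 2.0, 3.0]
x = [1.0, 2.0]
y = [1.0, 0.0, 2.0, 1.0, 3.0]
lhs = sum(a * b for a, b in zip(conv_1d(y, kernel), x))
rhs = sum(a * b for a, b in zip(y, conv_transpose_1d(x, kernel)))
```

With stride 2 the two input samples become five output samples, which is the upsampling behaviour needed to get back to image resolution.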

67 semantic segmentation Noh et al. ICCV 2015

68 semantic segmentation Noh et al. ICCV 2015

69 semantic segmentation combine where (local, shallow) with what (global, deep) Long et al., Fully Convolutional Networks for Semantic Segmentation, ICCV 2015

70 semantic segmentation skip layers interp + sum interp + sum dense output Long et al., Fully Convolutional Networks for Semantic Segmentation, ICCV 2015

71 semantic segmentation. Panels: input image; stride 32 (no skips); stride 16 (1 skip); stride 8 (2 skips); ground truth. Long et al., Fully Convolutional Networks for Semantic Segmentation, ICCV 2015

72 semantic segmentation Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015

73 semantic segmentation Surface normals results Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015

74 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.

75 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.

76 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.

77 instance segmentation: results and ground truth. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.

78 Generative Adversarial Networks. Fractionally strided convolutions (deconvolutions) can be used to generate images from noise. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.

79 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, I can train a discriminative network D to distinguish real horse images x from generated horse images G(z): $\max_D \; \log D(x) + \log(1 - D(G(z)))$

80 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, I can then optimize my generative network G to fool the discriminative network D: $\min_G \max_D \; \log D(x) + \log(1 - D(G(z)))$

81 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. You can then re-optimize the discriminative network D, etc.: $\min_G \max_D \; \log D(x) + \log(1 - D(G(z)))$

82 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. You can then re-optimize the discriminative network D, etc., until D gives in: $\min_G \max_D \; \log D(x) + \log(1 - D(G(z)))$ Goodfellow et al., Generative Adversarial Nets, NIPS 2014
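
The objective log D(x) + log(1 - D(G(z))) can be evaluated for a few hand-picked discriminator outputs to see which direction each player pushes it. This is a minimal sketch, not a training loop: the function name is hypothetical, D's outputs are given directly as probabilities, and averaging over a batch is an assumption standing in for the expectations in the original formulation.

```python
import math


def gan_value(d_real, d_fake):
    """Average of log D(x) + log(1 - D(G(z))) over a batch.

    d_real: D's outputs on real images x (probabilities of "real").
    d_fake: D's outputs on generated images G(z).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


undecided = gan_value([0.5, 0.5], [0.5, 0.5])   # D cannot tell real from generated
confident = gan_value([0.9, 0.9], [0.1, 0.1])   # D separates them well
fooled = gan_value([0.9, 0.9], [0.9, 0.9])      # G fools D on the generated images
```

D wants the value high (confident > undecided), G wants it low (fooled < confident), and the undecided case, where D outputs 0.5 everywhere, gives 2 log 0.5.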

83 Generative Adversarial Networks. Examples of generated bedrooms. Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016

84 Generative Adversarial Networks. Interpolation between points in z. Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016

85 summary semantic segmentation. Fully convolutional networks can be applied for efficient classification of all pixels. To get high-quality segmentations, deep features of multiple scales need to be combined (e.g. with skip layers). Upsampling can be done by deconvolution and unpooling operations. Instance segmentation can be performed by combining object detection and semantic segmentation pipelines. Slide credit: Li, Karpathy, Johnson


More information

The Visual Internet of Things System Based on Depth Camera

The Visual Internet of Things System Based on Depth Camera The Visual Internet of Things System Based on Depth Camera Xucong Zhang 1, Xiaoyun Wang and Yingmin Jia Abstract The Visual Internet of Things is an important part of information technology. It is proposed

More information

arxiv:1312.6034v2 [cs.cv] 19 Apr 2014

arxiv:1312.6034v2 [cs.cv] 19 Apr 2014 Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps arxiv:1312.6034v2 [cs.cv] 19 Apr 2014 Karen Simonyan Andrea Vedaldi Andrew Zisserman Visual Geometry Group,

More information

Denoising Convolutional Autoencoders for Noisy Speech Recognition

Denoising Convolutional Autoencoders for Noisy Speech Recognition Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University [email protected] Victor Zhong Stanford University [email protected] Abstract We propose the use of

More information

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin

More information

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information

More information

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers Fan Yang 1,2, Wongun Choi 2, and Yuanqing Lin 2 1 Department of Computer Science,

More information

SSD: Single Shot MultiBox Detector

SSD: Single Shot MultiBox Detector SSD: Single Shot MultiBox Detector Wei Liu 1, Dragomir Anguelov 2, Dumitru Erhan 3, Christian Szegedy 3, Scott Reed 4, Cheng-Yang Fu 1, Alexander C. Berg 1 1 UNC Chapel Hill 2 Zoox Inc. 3 Google Inc. 4

More information

Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation

Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation ACCEPTED BY IEEE TIP 1 Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Member, IEEE, Jianfei Cai, Senior Member, IEEE, Jiangbo Lu,

More information

Object Detection from Video Tubelets with Convolutional Neural Networks

Object Detection from Video Tubelets with Convolutional Neural Networks Object Detection from Video Tubelets with Convolutional Neural Networks Kai Kang Wanli Ouyang Hongsheng Li Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong {kkang,wlouyang,hsli,xgwang}@ee.cuhk.edu.hk

More information

Bildverarbeitung und Mustererkennung Image Processing and Pattern Recognition

Bildverarbeitung und Mustererkennung Image Processing and Pattern Recognition Bildverarbeitung und Mustererkennung Image Processing and Pattern Recognition 1. Image Pre-Processing - Pixel Brightness Transformation - Geometric Transformation - Image Denoising 1 1. Image Pre-Processing

More information

Two-Stream Convolutional Networks for Action Recognition in Videos

Two-Stream Convolutional Networks for Action Recognition in Videos Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman Visual Geometry Group, University of Oxford {karen,az}@robots.ox.ac.uk Abstract We investigate architectures

More information

Image Super-Resolution Using Deep Convolutional Networks

Image Super-Resolution Using Deep Convolutional Networks 1 Image Super-Resolution Using Deep Convolutional Networks Chao Dong, Chen Change Loy, Member, IEEE, Kaiming He, Member, IEEE, and Xiaoou Tang, Fellow, IEEE arxiv:1501.00092v3 [cs.cv] 31 Jul 2015 Abstract

More information

EdVidParse: Detecting People and Content in Educational Videos

EdVidParse: Detecting People and Content in Educational Videos EdVidParse: Detecting People and Content in Educational Videos by Michele Pratusevich S.B., Massachusetts Institute of Technology (2013) Submitted to the Department of Electrical Engineering and Computer

More information

Going Deeper with Convolutional Neural Network for Intelligent Transportation

Going Deeper with Convolutional Neural Network for Intelligent Transportation Going Deeper with Convolutional Neural Network for Intelligent Transportation by Tairui Chen A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements

More information

Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling

Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling Ming Liang Xiaolin Hu Bo Zhang Tsinghua National Laboratory for Information Science and Technology (TNList) Department

More information

Part-Based Recognition

Part-Based Recognition Part-Based Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, Part-Based Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple

More information

3D Model based Object Class Detection in An Arbitrary View

3D Model based Object Class Detection in An Arbitrary View 3D Model based Object Class Detection in An Arbitrary View Pingkun Yan, Saad M. Khan, Mubarak Shah School of Electrical Engineering and Computer Science University of Central Florida http://www.eecs.ucf.edu/

More information

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite Philip Lenz 1 Andreas Geiger 2 Christoph Stiller 1 Raquel Urtasun 3 1 KARLSRUHE INSTITUTE OF TECHNOLOGY 2 MAX-PLANCK-INSTITUTE IS 3

More information

arxiv:1502.01852v1 [cs.cv] 6 Feb 2015

arxiv:1502.01852v1 [cs.cv] 6 Feb 2015 Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun arxiv:1502.01852v1 [cs.cv] 6 Feb 2015 Abstract Rectified activation

More information

Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks

Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks 1 Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks Tomislav Hrkać, Karla Brkić, Zoran Kalafatić Faculty of Electrical Engineering and Computing University of

More information

Task-driven Progressive Part Localization for Fine-grained Recognition

Task-driven Progressive Part Localization for Fine-grained Recognition Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He [email protected] University of Missouri [email protected] Abstract In this paper we propose a task-driven

More information

Determining optimal window size for texture feature extraction methods

Determining optimal window size for texture feature extraction methods IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

More information

Digital image processing

Digital image processing 746A27 Remote Sensing and GIS Lecture 4 Digital image processing Chandan Roy Guest Lecturer Department of Computer and Information Science Linköping University Digital Image Processing Most of the common

More information

The Relationship between Artificial Intelligence and Finance

The Relationship between Artificial Intelligence and Finance Material 1 The Relationship between Artificial Intelligence and Finance University of Tokyo, Yutaka Matsuo Provisional Translation by the Secretariat Please refer to the original material in Japanese 1

More information

arxiv:1511.02300v2 [cs.cv] 9 Mar 2016

arxiv:1511.02300v2 [cs.cv] 9 Mar 2016 Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images Shuran Song Jianxiong Xiao Princeton University http://dss.cs.princeton.edu arxiv:1511.02300v2 [cs.cv] 9 Mar 2016 Abstract We focus on

More information

Computational Foundations of Cognitive Science

Computational Foundations of Cognitive Science Computational Foundations of Cognitive Science Lecture 15: Convolutions and Kernels Frank Keller School of Informatics University of Edinburgh [email protected] February 23, 2010 Frank Keller Computational

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg [email protected] www.multimedia-computing.{de,org} References

More information

Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?

Limitations of Human Vision. What is computer vision? What is computer vision (cont d)? What is computer vision? Limitations of Human Vision Slide 1 Computer vision (image understanding) is a discipline that studies how to reconstruct, interpret and understand a 3D scene from its 2D images

More information

Edge Boxes: Locating Object Proposals from Edges

Edge Boxes: Locating Object Proposals from Edges Edge Boxes: Locating Object Proposals from Edges C. Lawrence Zitnick and Piotr Dollár Microsoft Research Abstract. The use of object proposals is an effective recent approach for increasing the computational

More information

arxiv:1506.03365v2 [cs.cv] 19 Jun 2015

arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop Fisher Yu Yinda Zhang Shuran Song Ari Seff Jianxiong Xiao arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 Princeton

More information

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

More information

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected]

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected] Department of Applied Mathematics Ecole Centrale Paris Galen

More information

Transform-based Domain Adaptation for Big Data

Transform-based Domain Adaptation for Big Data Transform-based Domain Adaptation for Big Data Erik Rodner University of Jena Judy Hoffman Jeff Donahue Trevor Darrell Kate Saenko UMass Lowell Abstract Images seen during test time are often not from

More information

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.

More information

InstaNet: Object Classification Applied to Instagram Image Streams

InstaNet: Object Classification Applied to Instagram Image Streams InstaNet: Object Classification Applied to Instagram Image Streams Clifford Huang Stanford University [email protected] Mikhail Sushkov Stanford University [email protected] Abstract The growing

More information

Water Flow in. Alex Vlachos, Valve July 28, 2010

Water Flow in. Alex Vlachos, Valve July 28, 2010 Water Flow in Alex Vlachos, Valve July 28, 2010 Outline Goals & Technical Constraints How Artists Create Flow Maps Flowing Normal Maps in Left 4 Dead 2 Flowing Color Maps in Portal 2 Left 4 Dead 2 Goals

More information

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS Alexander Velizhev 1 (presenter) Roman Shapovalov 2 Konrad Schindler 3 1 Hexagon Technology Center, Heerbrugg, Switzerland 2 Graphics & Media

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006 Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

More information

Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics

Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Dimitri Van De Ville Ecole Polytechnique Fédérale de Lausanne Biomedical Imaging Group [email protected]

More information

HE Shuncheng [email protected]. March 20, 2016

HE Shuncheng hsc12@outlook.com. March 20, 2016 Department of Automation Association of Science and Technology of Automation March 20, 2016 Contents Binary Figure 1: a cat? Figure 2: a dog? Binary : Given input data x (e.g. a picture), the output of

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

RECOGNIZING objects and localizing them in images is

RECOGNIZING objects and localizing them in images is 1 Region-based Convolutional Networks for Accurate Object Detection and Segmentation Ross Girshick, Jeff Donahue, Student Member, IEEE, Trevor Darrell, Member, IEEE, and Jitendra Malik, Fellow, IEEE Abstract

More information

Topological Data Analysis Applications to Computer Vision

Topological Data Analysis Applications to Computer Vision Topological Data Analysis Applications to Computer Vision Vitaliy Kurlin, http://kurlin.org Microsoft Research Cambridge and Durham University, UK Topological Data Analysis quantifies topological structures

More information