Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016
Transcription
1 Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016
2 Previously, end-to-end... Dog. Slide credit: Jose M
3 Previously, end-to-end... Dog. Learned Representation. Slide credit: Jose M
4 Previously, end-to-end... Dog. Learned Representation. Part I: End-to-end learning (E2E)
5 Previously, end-to-end... Learned Representation. Task A (e.g. image classification). Part I: End-to-end learning (E2E)
6 Previously, fine-tuning... Learned Representation, Domain A, Part I: End-to-end learning (E2E). Transfer. Fine-tuned Learned Representation, Domain B, Part I: End-to-End Fine-Tuning (FT). Slide credit: X. Giro
7 Previously, fine-tuning... Fine-tuning a pre-trained network. Slide credit: Victor Campos, Layer-wise CNN surgery for Visual Sentiment Prediction (ETSETB 2015)
8 Previously, fine-tuning... Fine-tuning a pre-trained network. Fine-tuning: high learning rate in the new layer, and low learning rate in all other layers. Slide credit: Victor Campos, Layer-wise CNN surgery for Visual Sentiment Prediction (ETSETB 2015)
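The learning-rate rule on this slide can be sketched as a plain SGD step with one learning rate per layer. This is only an illustration, not the setup of the cited work: the layer names, the rates and the toy gradients are all assumptions.

```python
# Minimal sketch: SGD with a high learning rate for the new layer and a
# low learning rate for the pre-trained layers. Layer names, rates and
# gradients are hypothetical.

def sgd_step(params, grads, lr_per_layer):
    """Return updated parameters; each layer uses its own learning rate."""
    updated = {}
    for name, values in params.items():
        lr = lr_per_layer[name]
        updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
    return updated

# Toy parameters: two pre-trained layers and one freshly initialised layer.
params = {"conv1": [0.5, -0.2], "fc7": [0.1, 0.3], "fc8_new": [0.0, 0.0]}
grads = {"conv1": [1.0, 1.0], "fc7": [1.0, 1.0], "fc8_new": [1.0, 1.0]}

base_lr = 0.001  # assumed low rate for the pre-trained layers
lr_per_layer = {"conv1": base_lr, "fc7": base_lr, "fc8_new": 10 * base_lr}

new_params = sgd_step(params, grads, lr_per_layer)
```

With identical gradients, the new layer moves ten times further than the pre-trained layers, which keeps the transferred representation close to its starting point.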
9 Previously, off-the-shelf features... Learned Representation. Task A (e.g. image classification), Part I: End-to-end learning (E2E). Task B (e.g. image retrieval), Part II: Off-the-shelf features. Slide credit: X. Giro
10 Previously, off-the-shelf features... Image classification: image as an input, label as output (e.g. "Orange"). Spatially coded image representations (like spatial pyramids); orderless image representations (like BOW).
11 Two deep lectures in M5: Deep ConvNets for Recognition at... Global Scale (today's lecture), Local Scale (next lecture)
12 Image Classification. Image classification: image as an input, label as output (e.g. "Orange"). How to process non-square images? Resize, zero padding, or largest centred square.
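Two of the options for non-square images, zero padding and the largest centred square, come down to simple box arithmetic. The sketch below only computes the crop box and the padding amounts; the function names and the 640x480 example size are assumptions, and the actual pixel operations are left to whatever image library is in use.

```python
def largest_centred_square(width, height):
    """Crop box (left, top, right, bottom) of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

def zero_padding(width, height):
    """Padding (left, top, right, bottom) that makes the image square."""
    side = max(width, height)
    pad_w = side - width
    pad_h = side - height
    left, top = pad_w // 2, pad_h // 2
    return (left, top, pad_w - left, pad_h - top)

# Hypothetical 640x480 input image.
crop = largest_centred_square(640, 480)   # discards the left/right borders
pad = zero_padding(640, 480)              # adds zeros at the top and bottom
```

Cropping loses content near the borders, padding keeps everything but spends part of the input on zeros, and plain resizing distorts the aspect ratio.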
13 Local object recognition: object localization (single object), object detection, semantic segmentation
14 Classification+LOCALIZATION slide credit: Li, Karpathy, Johnson
15 Localization as regression slide credit: Li, Karpathy, Johnson
16 Localization as regression slide credit: Li, Karpathy, Johnson
17 Localization as regression: classification head and regression head. Slide credit: Li, Karpathy, Johnson
18 Localization as regression: classification head and regression head. Slide credit: Li, Karpathy, Johnson
19 Localization as regression. Slide credit: Li, Karpathy, Johnson
20 Localization as regression. Problem: multiple classes. Classification head: C class scores. Regression head: C x 4 numbers. Slide credit: Li, Karpathy, Johnson
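One way to read the two heads at test time: the classification head picks the class, and the box is taken from the row of the C x 4 regression output that belongs to that class. The sketch below assumes a box is encoded as (x, y, width, height) and uses made-up numbers.

```python
def localize(class_scores, boxes_per_class):
    """Pick the top-scoring class and return its class-specific box."""
    best = max(range(len(class_scores)), key=lambda c: class_scores[c])
    return best, boxes_per_class[best]

# C = 3 classes: C scores from the classification head and C x 4 numbers
# from the regression head. All values are made up for illustration.
scores = [0.1, 0.7, 0.2]
boxes = [[10, 10, 50, 50], [30, 40, 120, 80], [0, 0, 20, 20]]

label, box = localize(scores, boxes)
```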
21 Localization as regression slide credit: Li, Karpathy, Johnson
22 Localization as regression (example). Example of localization of clothes. Regression is done in two steps: first the person bounding box and then the clothes bounding boxes (master project 2015). Esteve Cervantes: Evaluating deep features for Fashion Recognition
23 Local object recognition: object localization (single object), object detection (any ideas?), semantic segmentation
24 Sliding window: classification + regression. Compute a new regressed bounding box and classification score for all sliding window positions. (Figure labels: 227; 0.83.)
25 Sliding window. Repeat for different scales and combine all results (e.g. with non-maximum suppression).
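Combining overlapping detections with non-maximum suppression can be sketched as a greedy loop: keep the highest-scoring box, drop every box that overlaps it too much, and repeat. The boxes, scores and the 0.5 overlap threshold below are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.83, 0.60, 0.75]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 and is suppressed, while the third box is far away and survives.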
26 Sliding window (efficient computation). For simplicity, consider a simple three-layer network (conv1 with 5x5 filters, fc1, fc2) that classifies a 10x10 window as car/not car, applied to a 12x17 image. What are the spatial coordinates of conv1? Part of the convolutional features are the same for overlapping windows and do not need recomputation!
27 Sliding window (efficient computation). Same three-layer network (conv1 with 5x5 filters, fc1, fc2; car/not car). How many 10x10 windows are there in this 12x17 image?
28 Sliding window (efficient computation). Same three-layer network. The convolutions of conv1 can be computed in a single pass over the 12x17 image.
29 Sliding window (efficient computation). Same three-layer network. (Figure labels: 6x6x5 and 1x1x10 for a single window, next to the conv1 map of the 12x17 image.)
30 Sliding window (efficient computation). Same three-layer network. conv1: filters of (5x5x3). fc2 = conv2: filters of (6x6x5).
31 Sliding window (efficient computation). Same three-layer network. conv1: filters of (5x5x3). fc2 = conv2: filters of (6x6x5). fc3: output 1x1x2.
32 Sliding window (efficient computation). Same three-layer network. We have the 8x3=24 classification scores, sharing computation of the convolutional features. conv1: 5 filters of (5x5x3). fc2 = conv2: 10 filters of (6x6x5). fc3 = conv3: 2 filters of (1x1x10).
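The 8x3=24 count can be checked with a few lines of shape arithmetic. The sketch assumes convolutions without padding and with stride 1, and treats 12 as the image height and 17 as the width; neither choice is stated on the slides.

```python
def valid_conv_size(size, kernel, stride=1):
    """Output length of a convolution without padding."""
    return (size - kernel) // stride + 1

def output_shape(height, width, kernels):
    """Spatial output size after a stack of square, unpadded convolutions."""
    for k in kernels:
        height = valid_conv_size(height, k)
        width = valid_conv_size(width, k)
    return height, width

# conv1 (5x5), fc2 written as conv2 (6x6), fc3 written as conv3 (1x1).
kernels = [5, 6, 1]

window_out = output_shape(10, 10, kernels)   # one 10x10 window
image_out = output_shape(12, 17, kernels)    # the whole 12x17 image

# Number of 10x10 windows that fit in a 12x17 image at stride 1.
num_windows = (12 - 10 + 1) * (17 - 10 + 1)
```

A single window collapses to one output position, while the full image yields a 3x8 score map: one car/not car score pair per window, obtained in one forward pass.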
33 Sliding window (efficient computation). Networks can be written as fully convolutional networks to speed up computation at test time. Example of bear and fish detection at multiple scales. Sermanet et al., Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR 2014
34 Object proposals. Object proposal methods compute boxes which potentially contain an object. Features for each box are extracted and a classifier is applied. Typically thousands of boxes (but far fewer than with a sliding window). Many different approaches: selective search, edge boxes, GOP, etc. Selective search: K. van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.
35 object proposals (RCNN) 1. Compute object proposals (~2k). 2. Warp dilated bounding box. 3. Compute CNN features. 4. Classify regions (car: yes, person: no), followed by bounding box regression. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
36 object proposals (RCNN) AlexNet. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
37 object proposals (RCNN) AlexNet: remove the last layer and fine-tune for the 20 PASCAL classes. Use the fc-layer feature vector as the description of the bounding box. Train an SVM on this representation for classification. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
38 object proposals (RCNN) slide credit: Girshick
39 object proposals (RCNN)
40 object proposals (RCNN) slide credit: Li, Karpathy, Johnson
41 object proposals (RCNN) Drawbacks: not end-to-end; warping of boxes; lots of double computation (overlap of bounding boxes). 1. Compute object proposals (~2k). 2. Warp dilated bounding box. 3. Compute CNN features. 4. Classify regions (car: yes, person: no), with an improved bounding box. Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
42 object proposals (Fast R-CNN)
43 object proposals (Fast R-CNN) Shared computation (conv1-conv5): compute the convolutional features once per image. He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015
44 object proposals (Fast R-CNN) Shared computation: compute the convolutional features once, then extract features from conv5 for all bounding boxes. This was first proposed by: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015
45 object proposals (Fast R-CNN) Shared computation. For all bounding boxes: Region of Interest pooling (ROI pooling), which pools the features in a spatial grid.
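Pooling the features of one bounding box in a spatial grid can be sketched on a single-channel feature map as follows. The bin boundaries (floor for the start, ceiling for the end) are one possible choice and not necessarily the one used in Fast R-CNN; the toy feature map and the box are made up.

```python
def roi_max_pool(feature_map, roi, grid=(2, 2)):
    """Max-pool a region of a 2-D feature map into a fixed-size grid.

    roi = (top, left, bottom, right) in feature-map cells, with bottom and
    right exclusive; the region must be at least one cell high and wide.
    """
    top, left, bottom, right = roi
    rows, cols = grid
    height, width = bottom - top, right - left
    pooled = []
    for r in range(rows):
        r0 = top + (r * height) // rows
        r1 = max(top + ((r + 1) * height + rows - 1) // rows, r0 + 1)
        row = []
        for c in range(cols):
            c0 = left + (c * width) // cols
            c1 = max(left + ((c + 1) * width + cols - 1) // cols, c0 + 1)
            row.append(max(feature_map[i][j]
                           for i in range(r0, r1) for j in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 4x6 feature map (a single channel of conv5, values made up).
fmap = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
pooled = roi_max_pool(fmap, roi=(0, 1, 3, 6), grid=(2, 2))
```

Whatever the size of the box, the output has the fixed grid size, so the fully connected layers that follow always receive an input of the same shape.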
46 object proposals (Fast R-CNN) Shared computation. ROI pooling: pool the features in a spatial grid. FCs. Classification: log loss. Regression: smooth L1 loss. End-to-end training.
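The two losses named on this slide can be written down directly. The sketch assumes that class 0 is the background class (for which no box loss is applied) and that both losses are simply added with a weight of 1; the probabilities and box offsets are made up.

```python
import math

def log_loss(class_probs, true_class):
    """Negative log-probability of the true class."""
    return -math.log(class_probs[true_class])

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def box_loss(pred, target):
    """Smooth L1 loss summed over the four box coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

def multi_task_loss(class_probs, true_class, pred_box, target_box, weight=1.0):
    """Classification loss plus box regression loss for non-background ROIs."""
    loss = log_loss(class_probs, true_class)
    if true_class != 0:  # assumption: class 0 is background
        loss += weight * box_loss(pred_box, target_box)
    return loss

total = multi_task_loss(
    class_probs=[0.1, 0.8, 0.1], true_class=1,
    pred_box=[0.5, 0.0, 2.0, 0.0], target_box=[0.0, 0.0, 0.0, 0.0],
)
```

Compared with a squared error, the smooth L1 loss grows only linearly for large errors, which makes the regression less sensitive to outlier boxes.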
47 object proposals (Fast R-CNN) Multi-task training also improves classification performance. End-to-end training improves results. Fast R-CNN vs. R-CNN: train time speedup 8.8x vs. -; test time/image 0.32 s vs. 47 s; test speedup 146x vs. -; mAP 66.9% vs. 66.0%. Test time does not include object proposal computation (which is now the bottleneck).
48 object proposals (Faster R-CNN) Shared computation. Region Proposal Network (RPN) on conv5: compute the object proposals directly in the network. ROI pooling, FCs.
49 object proposals (Faster R-CNN) Slide a window over the feature map. Add a network which classifies and regresses the bounding boxes. The classification score provides the confidence of the presence of an object. Slide credit: Kaiming He
50 object proposals (Faster R-CNN) Slide a window over the feature map. Add a network which classifies and regresses the bounding boxes. The classification score provides the confidence of the presence of an object. Use N anchors for proposals of varying aspect ratios. Slide credit: Kaiming He
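A set of N anchors at one sliding-window position can be generated from a list of scales and aspect ratios. The scales, ratios and centre below are hypothetical values; each anchor keeps the area scale x scale while its width/height ratio varies.

```python
def make_anchors(cx, cy, scales, aspect_ratios):
    """Anchor boxes (x1, y1, x2, y2) centred at (cx, cy).

    Each anchor has area scale * scale and width / height = aspect ratio.
    """
    anchors = []
    for s in scales:
        for r in aspect_ratios:
            w = s * r ** 0.5
            h = s / r ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# N = 2 scales x 3 aspect ratios = 6 anchors (hypothetical values).
anchors = make_anchors(100, 100, scales=[64, 128], aspect_ratios=[0.5, 1.0, 2.0])
```

The network then predicts, for every anchor, an objectness score and a correction to the anchor box.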
51 object proposals (Faster R-CNN) Model and time, computed for 1000 boxes. Edge boxes + R-CNN: 0.25 sec *ConvTime *FcTime. Edge boxes + Fast R-CNN: 0.25 sec + 1*ConvTime *FcTime. Faster R-CNN: 1*ConvTime *FcTime. Slide credit: Kaiming He
52 object proposals (Faster R-CNN) Slide credit: Li, Karpathy, Johnson
53 object proposals (Faster R-CNN) Slide credit: Li, Karpathy, Johnson
54 object localization. Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with residual networks and Faster R-CNN.
55 object localization. Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 with residual networks and Faster R-CNN.
56 Summary object detection. Object localization: when there is one or a known number of objects/classes, you can do object localization by adding a regression head to your network. Sliding window + CNN can be computed efficiently by writing the network as a fully convolutional network. Object proposal methods are straightforwardly combined with CNNs, but for fast/good results consider: adding a regression head to improve bounding box estimation; sharing computation of the convolutional features (SPP); end-to-end training of the network (Fast R-CNN); including a Region Proposal Network for fast object proposals within the network (Faster R-CNN). Slide credit: Li, Karpathy, Johnson
57 Local object recognition: object localization (single object), object detection, semantic segmentation
58 Semantic segmentation: assign a class to all pixels. Instance segmentation: assign pixels to a particular instance of a class (chair1, etc.).
59 semantic segmentation. ConvNet: predict the center pixel. Write the network as a fully convolutional network and apply it to the image. Because of the convolutions the resolution is smaller and upsampling is required.
60 semantic segmentation: pixelwise loss. Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
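One common reading of a pixelwise loss is a softmax cross-entropy computed at every pixel and averaged over the image. The sketch below takes that reading as an assumption; the data layout (nested lists of per-pixel class scores) and the toy values are chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixelwise_loss(score_map, label_map):
    """Mean cross-entropy over all pixels.

    score_map[y][x] is a list of class scores for that pixel,
    label_map[y][x] is the ground-truth class index.
    """
    losses = []
    for score_row, label_row in zip(score_map, label_map):
        for scores, label in zip(score_row, label_row):
            losses.append(-math.log(softmax(scores)[label]))
    return sum(losses) / len(losses)

# Toy 1x2 "image" with two classes and uninformative scores.
score_map = [[[0.0, 0.0], [0.0, 0.0]]]
label_map = [[0, 1]]
loss = pixelwise_loss(score_map, label_map)
```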
61 semantic segmentation: input, convolution (3x3), padding [ ], stride [1 1]. Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
62 semantic segmentation: input, convolution (3x3), padding [ ], stride [1 1]
63 semantic segmentation: two inputs. Convolution (3x3), padding [ ], stride [1 1], and convolution (3x3), padding [ ], stride [2 2]
64 semantic segmentation: two inputs. Convolution (3x3), padding [ ], stride [1 1], and convolution (3x3), padding [ ], stride [2 2]
65 semantic segmentation: input, deconvolution (3x3), padding [ ], stride [2 2]
66 semantic segmentation: input, deconvolution (3x3), padding [ ], stride [2 2]. Deconvolutions are also called fractionally strided convolutions or transposed convolutions.
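The relation between a strided convolution and a deconvolution is easiest to see in one dimension. The sketch below assumes no padding (the padding values on the slides are not recoverable) and uses a made-up kernel and signal.

```python
def conv1d(signal, kernel, stride=1):
    """Unpadded 1-D convolution (correlation form) with the given stride."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def conv_transpose1d(signal, kernel, stride=1):
    """Transposed convolution: every input value adds a scaled copy of the
    kernel to the output, and the copies are placed `stride` samples apart."""
    k = len(kernel)
    out = [0.0] * ((len(signal) - 1) * stride + k)
    for i, value in enumerate(signal):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out

kernel = [1.0, 2.0, 1.0]
down = conv1d([1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 3.0], kernel, stride=2)
up = conv_transpose1d(down, kernel, stride=2)
```

The stride-2 convolution maps 7 samples to 3, and the stride-2 transposed convolution maps 3 samples back to 7. It restores the resolution, not the original values.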
67 semantic segmentation Noh et al. ICCV 2015
68 semantic segmentation Noh et al. ICCV 2015
69 semantic segmentation: combine where (local, shallow) with what (global, deep). Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
70 semantic segmentation: skip layers, interp + sum, interp + sum, dense output. Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
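The "interp + sum" step of a skip layer can be sketched as: upsample the coarse, deep score map to the resolution of the finer, shallower one and add them element-wise. For brevity the sketch uses nearest-neighbour upsampling, which is a simplification; the score maps are made-up single-class examples.

```python
def upsample2x(score_map):
    """Nearest-neighbour 2x upsampling of a 2-D list (a simplification)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse(coarse, fine):
    """Upsample the coarse map and add it to the fine map element-wise."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]

coarse = [[1.0, 2.0]]                      # deep layer: "what", low resolution
fine = [[0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8]]              # shallow layer: "where", high resolution
fused = fuse(coarse, fine)
```

Applying the same step twice, each time with a finer layer, corresponds to the two "interp + sum" stages on the slide.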
71 semantic segmentation: input image; stride 32 (no skips), stride 16 (1 skip), stride 8 (2 skips); ground truth. Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
72 semantic segmentation Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015
73 semantic segmentation Surface normals results Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015
74 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.
75 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.
76 instance segmentation. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.
77 instance segmentation results and ground truth. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.
78 Generative Adversarial Networks. Starting from noise, fractionally strided convolutions (deconvolutions) can be used to generate images. Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades, arXiv 2015.
79 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, I can train a discriminative network D to distinguish real horse images x from generated horse images G(z): $\max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$
80 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, I can then optimize my generative network to fool the discriminative network D: $\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$
81 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, you can re-optimize the discriminative network D, etc.: $\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$
82 Generative Adversarial Networks. Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z. Given generated horses G(z) and real horses x, you can re-optimize the discriminative network D, etc., until D gives in: $\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$. Goodfellow et al., Generative Adversarial Nets, NIPS 2014
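The objective on these slides can be evaluated numerically once the discriminator outputs are known. The sketch below averages log D(x) + log(1 - D(G(z))) over a few samples; the discriminator outputs are made-up numbers, not the result of training any network.

```python
import math

def gan_value(d_real, d_fake):
    """Sample average of log D(x) + log(1 - D(G(z))).

    d_real: discriminator outputs D(x) on real images, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# D separates real from generated images well: the value is high (near 0).
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])

# D cannot tell them apart and outputs 0.5 everywhere: the value is -2 log 2.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

D is trained to push this value up, G is trained to push it down. The point where D outputs 0.5 for every image is what "until D gives in" refers to.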
83 Generative Adversarial Networks. Examples of generated bedrooms. Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
84 Generative Adversarial Networks. Interpolation between points in z. Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
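Interpolating between two points in z can be sketched as a straight line in the noise space; each intermediate vector would then be passed through the generator G. Linear interpolation and the two-dimensional toy vectors are assumptions made for illustration.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` points on the line from z_start to z_end (steps >= 2)."""
    points = []
    for k in range(steps):
        t = k / (steps - 1)
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points

# Two hypothetical noise vectors; real models use far more dimensions.
path = interpolate([0.0, 1.0], [1.0, -1.0], steps=5)
```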
85 Summary semantic segmentation. Fully convolutional networks can be applied for efficient classification of all pixels. To get high-quality segmentations, deep features at multiple scales need to be combined (e.g. with skip layers). Upsampling can be done by deconvolution and de-pooling (unpooling) operations. Instance segmentation can be performed by combining object detection and semantic segmentation pipelines. Slide credit: Li, Karpathy, Johnson
arxiv:1502.01852v1 [cs.cv] 6 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun arxiv:1502.01852v1 [cs.cv] 6 Feb 2015 Abstract Rectified activation
Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks
1 Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks Tomislav Hrkać, Karla Brkić, Zoran Kalafatić Faculty of Electrical Engineering and Computing University of
Task-driven Progressive Part Localization for Fine-grained Recognition
Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He [email protected] University of Missouri [email protected] Abstract In this paper we propose a task-driven
Determining optimal window size for texture feature extraction methods
IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec
Digital image processing
746A27 Remote Sensing and GIS Lecture 4 Digital image processing Chandan Roy Guest Lecturer Department of Computer and Information Science Linköping University Digital Image Processing Most of the common
The Relationship between Artificial Intelligence and Finance
Material 1 The Relationship between Artificial Intelligence and Finance University of Tokyo, Yutaka Matsuo Provisional Translation by the Secretariat Please refer to the original material in Japanese 1
arxiv:1511.02300v2 [cs.cv] 9 Mar 2016
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images Shuran Song Jianxiong Xiao Princeton University http://dss.cs.princeton.edu arxiv:1511.02300v2 [cs.cv] 9 Mar 2016 Abstract We focus on
Computational Foundations of Cognitive Science
Computational Foundations of Cognitive Science Lecture 15: Convolutions and Kernels Frank Keller School of Informatics University of Edinburgh [email protected] February 23, 2010 Frank Keller Computational
Probabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg [email protected] www.multimedia-computing.{de,org} References
Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?
What is computer vision? Limitations of Human Vision Slide 1 Computer vision (image understanding) is a discipline that studies how to reconstruct, interpret and understand a 3D scene from its 2D images
Edge Boxes: Locating Object Proposals from Edges
Edge Boxes: Locating Object Proposals from Edges C. Lawrence Zitnick and Piotr Dollár Microsoft Research Abstract. The use of object proposals is an effective recent approach for increasing the computational
arxiv:1506.03365v2 [cs.cv] 19 Jun 2015
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop Fisher Yu Yinda Zhang Shuran Song Ari Seff Jianxiong Xiao arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 Princeton
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected]
Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected] Department of Applied Mathematics Ecole Centrale Paris Galen
Transform-based Domain Adaptation for Big Data
Transform-based Domain Adaptation for Big Data Erik Rodner University of Jena Judy Hoffman Jeff Donahue Trevor Darrell Kate Saenko UMass Lowell Abstract Images seen during test time are often not from
Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence
Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.
InstaNet: Object Classification Applied to Instagram Image Streams
InstaNet: Object Classification Applied to Instagram Image Streams Clifford Huang Stanford University [email protected] Mikhail Sushkov Stanford University [email protected] Abstract The growing
Water Flow in. Alex Vlachos, Valve July 28, 2010
Water Flow in Alex Vlachos, Valve July 28, 2010 Outline Goals & Technical Constraints How Artists Create Flow Maps Flowing Normal Maps in Left 4 Dead 2 Flowing Color Maps in Portal 2 Left 4 Dead 2 Goals
IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS
IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS Alexander Velizhev 1 (presenter) Roman Shapovalov 2 Konrad Schindler 3 1 Hexagon Technology Center, Heerbrugg, Switzerland 2 Graphics & Media
Programming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006
Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics
Point Lattices in Computer Graphics and Visualization how signal processing may help computer graphics Dimitri Van De Ville Ecole Polytechnique Fédérale de Lausanne Biomedical Imaging Group [email protected]
HE Shuncheng [email protected]. March 20, 2016
Department of Automation Association of Science and Technology of Automation March 20, 2016 Contents Binary Figure 1: a cat? Figure 2: a dog? Binary : Given input data x (e.g. a picture), the output of
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
RECOGNIZING objects and localizing them in images is
1 Region-based Convolutional Networks for Accurate Object Detection and Segmentation Ross Girshick, Jeff Donahue, Student Member, IEEE, Trevor Darrell, Member, IEEE, and Jitendra Malik, Fellow, IEEE Abstract
Topological Data Analysis Applications to Computer Vision
Topological Data Analysis Applications to Computer Vision Vitaliy Kurlin, http://kurlin.org Microsoft Research Cambridge and Durham University, UK Topological Data Analysis quantifies topological structures
