Bored by Classification ConvNets? End-to-end Learning of other Computer Vision Tasks

Thomas Brox, University of Freiburg, Germany
Research funded by ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung

Outline
Generative networks
U-Net: Multi-instance segmentation
FlowNet: Estimating optical flow

Typical ConvNet architecture
Classification network (figure label: cat)

Up-convolutional network (Alexey Dosovitskiy, CVPR 2015)
New: expanding network architecture for image generation
Figure labels: "Small gray office chair, side view"; "cat"
Related work:
Eigen et al., NIPS 2014: network for depth map prediction
Long et al., CVPR 2015: network for semantic segmentation
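
The slide does not spell out how the expanding architecture upsamples its feature maps. A minimal sketch, assuming that an "up-convolution" means 2x unpooling (each value is written to the top-left corner of a 2x2 block, zeros elsewhere) followed by an ordinary convolution. The function names and the single-channel, pure-Python representation are illustrative choices, not taken from the talk.

```python
def unpool2x(fmap):
    """Upsample by 2: each value goes to the top-left of a 2x2 block, zeros elsewhere."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            out[2 * y][2 * x] = fmap[y][x]
    return out


def conv2d_same(fmap, kernel):
    """Zero-padded 'same' cross-correlation with an odd-sized kernel."""
    h, w = len(fmap), len(fmap[0])
    kh, kw = len(kernel), len(kernel[0])
    py, px = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - py, x + dx - px
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += fmap[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def up_convolve(fmap, kernel):
    """Assumed up-convolution: unpooling followed by convolution."""
    return conv2d_same(unpool2x(fmap), kernel)


# A 2x2 map becomes a 4x4 map; the (in practice learned) kernel fills in the zeros.
ones = [[1.0] * 3 for _ in range(3)]
upsampled = up_convolve([[1.0, 2.0], [3.0, 4.0]], ones)
```

In a real network the kernel weights are learned and there are many input and output channels; the sketch only shows how spatial resolution doubles per layer.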

Generating chair images with a network (Dosovitskiy et al., CVPR 2015)

Training set
3D chair dataset (Aubry et al., CVPR 2014)
Rendering 809 chair styles from 62 viewpoints
Source: https://github.com/dimatura/seeing3d
[Figure: some of the rendered chairs]

Generating images of unseen views
Training set split into two subsets:
Source set: 62 viewpoints available (90% of all chair models)
Target set: fewer viewpoints available (10% of all models)

Generating images of unseen views
[Figure panels: 8 azimuths available, 4 azimuths available, 2 azimuths available, 1 azimuth available]
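
The 90% / 10% split of chair models into a source and a target set could be set up roughly as follows. This is only a sketch: the slides do not say how models were assigned to the two sets, so the seeded random shuffle and the name `split_models` are assumptions.

```python
import random


def split_models(model_ids, source_fraction=0.9, seed=0):
    """Split model ids into a source set (all viewpoints) and a target set (few viewpoints)."""
    ids = list(model_ids)
    random.Random(seed).shuffle(ids)  # assumed: random assignment, fixed seed for repeatability
    n_source = int(source_fraction * len(ids))
    return ids[:n_source], ids[n_source:]


# 809 chair styles, as stated on the training-set slide.
source_models, target_models = split_models(range(809))
```

During training, the target models would then contribute only the reduced number of azimuths (8, 4, 2 or 1), while the source models contribute all 62 viewpoints.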

Comparison to baselines (Alexey Dosovitskiy, CVPR 2015)

Interpolation of chair styles (Alexey Dosovitskiy, CVPR 2015)

Correspondences between chair instances (Alexey Dosovitskiy, CVPR 2015)

Correspondences between chair instances
Generate intermediate images with the network
Track points with optical flow (LDOF) along the sequence

Method                                                   All   Easy  Difficult
Deformable Spatial Pyramid Matching (Kim et al. 2013)    5.2   3.3   6.3
SIFT Flow (Liu et al. 2008)                              4.0   2.8   4.8
Ours                                                     3.9   3.9   3.9
Human performance                                        1.1   1.1   1.1
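
The tracking step can be sketched as chaining per-frame displacements along the generated sequence. The sketch assumes the flow fields between consecutive intermediate images are already available (LDOF itself is not implemented here), stores them as `flow[y][x] = (dx, dy)`, and uses nearest-neighbour lookup; the name `track_point` is hypothetical.

```python
def track_point(point, flows):
    """Follow one point through a list of flow fields, one per consecutive image pair."""
    x, y = point
    for flow in flows:
        h, w = len(flow), len(flow[0])
        # Nearest-neighbour lookup, clamped to the image bounds.
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
    return (x, y)


# Two constant 4x4 flow fields: one pixel to the right, then two pixels down.
right = [[(1.0, 0.0)] * 4 for _ in range(4)]
down = [[(0.0, 2.0)] * 4 for _ in range(4)]
end_position = track_point((0.0, 0.0), [right, down])
```

Because neighbouring images in the generated sequence differ only slightly, each individual flow estimate is an easy small-displacement problem, which is the point of generating intermediate images first.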

Preview: Inverting ConvNets with ConvNets (Alexey Dosovitskiy, arXiv 2015)
Up-convolutional network applied to image features, e.g. from AlexNet
Learn to re-generate the input image from its feature representation
Related work: Mahendran & Vedaldi, CVPR 2015; Zeiler & Fergus, ECCV 2014

Reconstruction results
[Figure columns: Up-Conv., Mahendran & Vedaldi, Autoencoder]
[Figure: more reconstructions with the up-convolutional network]

Color and position are preserved in high layers
[Figure panels: input, All FC8, Top 5 FC8, All but Top 5 FC8; color experiment, position experiment]

Outline
A generative network
U-Net: Multi-instance segmentation
FlowNet: Estimating optical flow

U-Net: Image segmentation with a ConvNet (Olaf Ronneberger, Philipp Fischer; MICCAI 2015)
Similar to Fully Convolutional Network [Long et al., CVPR 2015]
Original inspiration: Depth map prediction [Eigen et al., NIPS 2014]

Binary segmentation
Electron microscopy: ISBI 2012 Challenge, Rank 1
Light microscopy cell tracking: ISBI 2015 Challenge, Rank 1

Multi-class semantic segmentation
X-ray dental segmentation, ISBI 2015 Challenge, Rank 1
Intersection over union: 77.5% (second best: 46%)
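
For reference, intersection over union for one class of a label map can be computed as below. This is a generic sketch of the metric on nested lists of integer labels; how the challenge averaged over classes and images is not stated on the slide and is not modelled here.

```python
def intersection_over_union(pred, truth, label):
    """IoU of one class: |pred AND truth| / |pred OR truth| over all pixels."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = (p == label), (t == label)
            if in_pred and in_truth:
                inter += 1
            if in_pred or in_truth:
                union += 1
    # The class is absent from both maps: the ratio is undefined.
    return inter / union if union else float("nan")


pred = [[1, 1],
        [0, 0]]
truth = [[1, 0],
         [0, 0]]
iou_class_1 = intersection_over_union(pred, truth, label=1)  # 1 shared pixel of 2
```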

Multi-instance segmentation
Light microscopy, DIC-HeLa cell tracking, ISBI 2015 Challenge: Rank 1

Outline
A generative network
U-Net: Multi-instance segmentation
FlowNet: Estimating optical flow

FlowNet: Estimating optical flow with a ConvNet
Refinement: expanding architecture

Helping the network with a correlation layer
Joint work with the group of Daniel Cremers
Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazirbas, Vladimir Golkov
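
A correlation layer compares the feature vectors of the two images at different relative displacements. The slide gives no details, so the following is a minimal sketch under stated assumptions: 1x1 patches (a plain dot product of feature vectors), a square search window of radius `max_disp`, and zero correlation outside the image. Feature maps are stored as `f[y][x] = [channel values]`; all names are illustrative.

```python
def correlation(f1, f2, max_disp):
    """For each location in f1, dot products with f2 at every displacement in the window."""
    h, w = len(f1), len(f1[0])
    out = [[{} for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        value = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
                    else:
                        value = 0.0  # assumed zero padding outside the image
                    out[y][x][(dx, dy)] = value
    return out


# One-row feature maps with 2 channels; f2 is f1 shifted one pixel to the right.
f1 = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]]
f2 = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
corr = correlation(f1, f2, max_disp=1)
```

In this toy case the strongest response at each location is at displacement (1, 0), which is the true shift. The layer has no learned weights; it hands explicit matching scores to the subsequent convolutional layers.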

Enough data to train such a network?
Getting ground truth optical flow for realistic videos is hard
Existing datasets are small:

Dataset      Frames with ground truth
Middlebury   8
KITTI        194
Sintel       1041
Needed       >10000

Realism is overrated: the flying chair dataset
[Figure: rendered image and its optical flow]
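
Synthetic data makes ground truth cheap because the motion is known by construction. A minimal sketch, assuming each object or background layer moves under a known 2D affine transform x' = A x + t, so that the flow at a pixel is x' - x. The slide does not describe the rendering pipeline; the function name and parameterization are hypothetical.

```python
def affine_flow(width, height, a, t):
    """Ground-truth flow (dx, dy) per pixel for the motion x' = a @ x + t."""
    (a00, a01), (a10, a11) = a
    tx, ty = t
    return [
        [(a00 * x + a01 * y + tx - x, a10 * x + a11 * y + ty - y)
         for x in range(width)]
        for y in range(height)
    ]


# Pure translation by (2, -1): every pixel gets the same flow vector.
identity = ((1.0, 0.0), (0.0, 1.0))
flow = affine_flow(3, 2, identity, (2.0, -1.0))
```

With rotation or scaling in `a`, the flow varies across the image, which is how a simple generator can still produce large and diverse displacements.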

It works!
[Figure columns: input images, ground truth, FlowNetSimple, FlowNetCorr]
Although the network has only seen flying chairs for training, it predicts good optical flow on Sintel.

Results on various datasets

Method          Middlebury  KITTI  Sintel Clean  Sintel Final  Flying Chairs
EpicFlow        0.39        3.8    4.1           6.3           2.9
DeepFlow        0.42        5.8    5.4           7.2           3.5
LDOF            0.56        12.4   7.6           9.1           3.5
FlowNetS        -           -      7.4           8.4           2.7
FlowNetS+v      -           -      6.5           7.7           2.9
FlowNetS+ft     -           9.1    7.0           7.8           3.0
FlowNetS+ft+v   0.47        7.6    6.2           7.2           3.0
FlowNetC        -           -      7.3           8.8           2.2
FlowNetC+v      -           -      6.3           8.0           2.6
FlowNetC+ft     -           -      6.9           8.5           2.3
FlowNetC+ft+v   0.5         -      6.1           7.9           2.7

Networks can compete with state-of-the-art conventional optical flow estimation methods.
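
The slide does not name the error measure behind these numbers. Assuming they are average endpoint errors in pixels, as is customary for these benchmarks, the metric can be sketched as follows, with flow fields stored as `flow[y][x] = (u, v)`.

```python
import math


def average_endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        for (pu, pv), (tu, tv) in zip(pred_row, truth_row):
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    return total / count


# One pixel is off by a (3, 4) vector (error 5), the other is exact.
pred = [[(3.0, 4.0), (0.0, 0.0)]]
truth = [[(0.0, 0.0), (0.0, 0.0)]]
aee = average_endpoint_error(pred, truth)
```

Under this assumption, lower values are better in every column of the table.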

Can handle large displacements
[Figure columns: input images, ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al., ICCV 2013), EpicFlow (Revaud et al., CVPR 2015)]

Sometimes wrong direction
[Figure columns: input images, ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al., ICCV 2013), EpicFlow (Revaud et al., CVPR 2015)]

Often captures fine details
[Figure columns: input images, ground truth, FlowNetSimple, FlowNetCorr, DeepFlow (Weinzaepfel et al., ICCV 2013), EpicFlow (Revaud et al., CVPR 2015)]

Results on Flying chairs test set
[Figure columns: input images, ground truth, EpicFlow (Revaud et al., CVPR 2015), FlowNetCorr]

Runs at 10 fps on the GPU

Summary
A generative network
U-Net: Multi-instance segmentation
FlowNet: Estimating optical flow

Tip of the day