Using geometry and related things

Similar documents
Object Recognition. Selim Aksoy. Bilkent University

Semantic Recognition: Object Detection and Scene Segmentation

Localizing 3D cuboids in single-view images

Semantic Visual Understanding of Indoor Environments: from Structures to Opportunities for Action

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

3D Model based Object Class Detection in An Arbitrary View

The goal is multiply object tracking by detection with application on pedestrians.

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

Colorado School of Mines Computer Vision Professor William Hoff

Mean-Shift Tracking with Random Sampling

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Segmentation of building models from dense 3D point-clouds

Pedestrian Detection with RCNN

How To Model The Labeling Problem In A Conditional Random Field (Crf) Model

A Learning Based Method for Super-Resolution of Low Resolution Images

Finding people in repeated shots of the same scene

Geometric Context from a Single Image

Learning 3-D Scene Structure from a Single Still Image

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?

Object Categorization using Co-Occurrence, Location and Appearance

Tracking in flussi video 3D. Ing. Samuele Salti

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

CS 534: Computer Vision 3D Model-based recognition

High Level Describable Attributes for Predicting Aesthetics and Interestingness

Segmentation & Clustering

Layout Aware Visual Tracking and Mapping

Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

Taking Inverse Graphics Seriously

Fast Matching of Binary Features

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

How To Use A Near Neighbor To A Detector

Decomposing a Scene into Geometric and Semantically Consistent Regions

Edge tracking for motion segmentation and depth ordering

Automatic Reconstruction of Parametric Building Models from Indoor Point Clouds. CAD/Graphics 2015

Feature Tracking and Optical Flow

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

Part-Based Recognition

What more can we do with videos?

Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation

Density-aware person detection and tracking in crowds

The Visual Internet of Things System Based on Depth Camera

Observing Human Behavior in Image Sequences: the Video Hermeneutics Challenge

Speed Performance Improvement of Vehicle Blob Tracking System

Edge Boxes: Locating Object Proposals from Edges

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Real-Time Tracking of Pedestrians and Vehicles

Tracking performance evaluation on PETS 2015 Challenge datasets

Improving Spatial Support for Objects via Multiple Segmentations

Quality Assessment for Crowdsourced Object Annotations

A feature-based tracking algorithm for vehicles in intersections

An Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network

A Mobile Vision System for Robust Multi-Person Tracking

Character Image Patterns as Big Data

Surviving Deformable Superquadric Models From Range Data

Urban Vehicle Tracking using a Combined 3D Model Detector and Classifier

Convolutional Feature Maps

Deformable Part Models with CNN Features

Local features and matching. Image classification & object localization

Learning Motion Categories using both Semantic and Structural Information

Fast Semantic Segmentation of 3D Point Clouds using a Dense CRF with Learned Parameters

Learning Spatial Context: Using Stuff to Find Things

Classifying Manipulation Primitives from Visual Data

Intelligent Flexible Automation

MetropoGIS: A City Modeling System DI Dr. Konrad KARNER, DI Andreas KLAUS, DI Joachim BAUER, DI Christopher ZACH

Course: Model, Learning, and Inference: Lecture 5

Spatio-Temporally Coherent 3D Animation Reconstruction from Multi-view RGB-D Images using Landmark Sampling

Big Data: Image & Video Analytics

Detection and Recognition of Mixed Traffic for Driver Assistance System

Lighting Estimation in Indoor Environments from Low-Quality Images

Probabilistic Latent Semantic Analysis (plsa)

Transform-based Domain Adaptation for Big Data

Breaking the chain: liberation from the temporal Markov assumption for tracking human poses

Digital Image Increase

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

Extracting Business. Value From CAD. Model Data. Transformation. Sreeram Bhaskara The Boeing Company. Sridhar Natarajan Tata Consultancy Services Ltd.

Segmentation as Selective Search for Object Recognition

Doctor of Philosophy in Computer Science

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Automatic Single-Image 3d Reconstructions of Indoor Manhattan World Scenes

Perception-based Design for Tele-presence

Object Class Recognition by Unsupervised Scale-Invariant Learning

MULTI-LEVEL SEMANTIC LABELING OF SKY/CLOUD IMAGES

Multi-View Object Class Detection with a 3D Geometric Model

Live Feature Clustering in Video Using Appearance and 3D Geometry

How To Learn From Perspective

Multiview Social Behavior Analysis in Work Environments

Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance

Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001


Discovering objects and their location in images

The KITTI-ROAD Evaluation Benchmark. for Road Detection Algorithms

Object class recognition using unsupervised scale-invariant learning

Automatic parameter regulation for a tracking system with an auto-critical function

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

LABELING IN VIDEO DATABASES. School of Electrical and Computer Engineering Electrical Engineering Building. in the video database.

Online Learned Discriminative Part-Based Appearance Models for Multi-Human Tracking

False alarm in outdoor environments

Transcription:

Using geometry and related things

Region labels + Boundaries and objects Stronger geometric constraints from domain knowledge Reasoning on aspects and poses 3D point clouds Qualitative More quantitative more precise Explicit

[D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1):151 172, 2007] What level of representation? How qualitative? What type of training information is available? Assumptions (camera geometry, etc )? Learning from image features to depth + MRF: A. Saxena, et al.. 3-D depth reconstruction from a single still image. IJCV, 76, 2007. Make3D: Learning 3D Scene Structure from a Single Still Image: A. Saxena, et al. TPAMI, 2010. Stage classes: Nedovic, V., Smeulders, A., Redert, A., Geusebroek, J.: Stages as models of scene geometry. In: PAMI (2010)

Next. Can coarse surface labels be used for improving object recognition and scene analysis performance through better geometric reasoning?

Object Detection Surface Estimates Viewpoint Prior Local Car Detector From geometry to objects and back? Distributions versus decisions? Local Ped Detector D. Hoiem, A. Efros, M. Hebert. Putting objects in perspective. IJCV 2009 S.Y. Bao, M. Sun, S.Savarese. Toward Coherent Object Detection And Scene Layout Understanding. CVPR 2010. B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool. Dynamic 3D Scene Analysis from a Moving Vehicle. CVPR07

Is a more precise representation possible? Boundaries, interposition, relative depth ordering. Still low-level: Can we combine reasoning about semantic labels Region labels + More geom. relations and semantic labels Stronger geometric constraints from domain knowledge Reasoning on aspects and poses 3D point clouds Qualitative More quantitative more precise Explicit

Iterative refinement Iter 1 Iter 2 Toward Final true integration of geometric D. Hoiem, A. A. Efros, and M. Hebert. Closing the loop on scene interpretation. In CVPR, 2008 semantic cues: How to beat the intractable Global nature optimization of the problem? What features/cues can be used? F(D I,L) = p 1(d p ) + pqr 2(d p,d q,d r ) + p 3(d p,d g ) F( I,L,S) = p 1( i ) + i 2( i ) + ij 3( i, j) i T x = 1 j T x = 1 B. Liu, S. Gould, D. Koller. Single image depth estimation from predicted semantic labels. CVPR 2010 S. Gould, R. Fulton, D. Koller. Decomposing a scene into geometric and semantically consistent regions. ICCV 2009

Still mostly bottom-up classification approach No use of domain constraints or constraints governing the physical world Region labels + Boundaries and objects Stronger geometric constraints from domain knowledge Reasoning on aspects and poses 3D point clouds Qualitative More quantitative more precise Explicit

Lines Faces Score How to generate and search through hypotheses (in a tractable manner)? How to evaluate score? How to avoid early decisions? Classifiers f(x,y,w) = w T (x,y) D. Lee, T. Kanade, M. Hebert. Geometric Reasoning for Single Image Structure Recovery. CVPR09. V. Hedau, D. Hoiem, D.Forsyth, Recovering the Spatial Layout of Cluttered Rooms, IEEE International Conference on Computer Vision (ICCV), 2009. How to represent constraints in a more general way? H. Wang, S. Gould, D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. ECCV 2010. Learned weight vector Feature vector measuring agreement between lines, faces and labels

Region labels + Boundaries and objects Stronger geometric constraints from domain knowledge + more constraints 3D point clouds Qualitative More quantitative more precise Explicit

Finite volume Spatial exclusion Containment Stability Contact Proximity.

Search through hypothesis space Input image Line segments and Vanishing points Room hypotheses Reject invalid configurations Geometric context Orientation map Object hypotheses Compatibility of image data with geometric configuration Penalty term for incompatible configurations f x, y w T x, y w T y Features from image (surface labels, vanishing points, etc.) Hypothesis: Scene layout+ object hypothesis D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. Advances in Neural Information Processing Systems (NIPS), Vol. 24, 2010.

Surface Layout Density Map Bag of Segments Frontal Front-Right Front-Left Left-Right Left-Occluded Right-Occluded Porous Solid Catalogue

above sky above above above Medium Medium Medium Medium Infront High Infront Original Image Pointsupported Pointsupported supported High Ground 3D Parse Graph supported A. Gupta, A. Efros, and M. Hebert. Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics. ECCV 2010. http://www.cs.cmu.edu/~abhinavg/blocksworld

Direct search through hypothesis space Sampling Sample room boundary L. Del Pero, J. Guan, E. Brau, J. Schlecht, K. Barnard. Sampling Bedrooms. CVPR 2011. Diffusion moves: Sample camera Change r = (x,y,z,w,h,l, ) Change c = f Sample object parameters Sample over a block edge only Change o = (x,y,z,w,h,l) Jump moves:

Direct search through hypothesis space Sampling (Constrained) object detection P(O 1,..,O N,L,H I) = P(H)P(L H,I) i P(O i L,H,I) V. Hedau, D. Hoiem, D. Forsyth, Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry, European Conference on Computer Vision (ECCV), 2010.

Direct search through hypothesis space Sampling Object detection Grammars Truth Max Window Wall Balcony Door Roof = L. Simon, O. Teboul, P. Koutsourakis and N. Paragios. Random Exploration of the Procedural Space for Single-View 3D Modeling of Buildings. International Journal of Computer Vision (IJCV), 2010. O. Teboul, I. Kokkinos, P. Koutsourakis, L. Simon and N. Paragios. Shape Grammar Parsing via Reinforcement Learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2011.

Direct search through hypothesis space Sampling Object detection Grammars What constraints? How to combine low level classifiers with higher-level reasoning? What are the right tools to search through and score hypotheses? How to represent partial interpretations without early commitment?

Region labels Qualitative + Boundaries and objects Stronger geometric constraints from domain knowledge More quantitative more precise + sparse/partial 3D data + more constraints 3D + point other clouds constraints + (large) prior data Explicit G. Tsai et al. Realtime Indoor Scene Understanding using Bayesian Filtering with Motion Cues. ICCV 2011.

What level of representation? Do we need explicit parse of the input or how far can we go with associations? Hierarchical, region labels, associations,... How to incorporate knowledge/context external to the input image? Task, geometry, contextual info (scene type, location), text,... Should we use global models vs. sequences of simpler models? Is the problem too hard as posed, i.e., intractable? What are the right definitions of actions, activities, behaviors? How to combine temporal (actions) and spatial (scenes, objects) information effectively? What should be evaluated and what stage? bounding boxes, pixelwise labels, 3D models, actions/behaviors, predictions?