Semantic Recognition: Object Detection and Scene Segmentation
Transcription
1 Semantic Recognition: Object Detection and Scene Segmentation Xuming He Computer Vision Research Group NICTA Robotic Vision Summer School 2015 Acknowledgement: Slides from Fei-Fei Li, R. Fergus, A. Torralba, K. Grauman.
2 Semantic scene understanding Semantic recognition tasks: Object detection Semantic segmentation
3 Semantic scene understanding Semantic recognition tasks: Object detection Semantic segmentation Activity recognition Scene layout estimation Scene categorization Geo-localization
4 Outline Object (category) detection Basic framework of object detection Case study: Viola-Jones, DPM and R-CNN Discussions Semantic scene segmentation Pixel labeling and CRF Design of CRF models Discussions
5 Kristen Grauman Object detection: why is it so hard? Illumination Object pose Clutter Occlusions Intra-class appearance Viewpoint
6 Kristen Grauman Object detection: why is it so hard? Realistic scenes are crowded and cluttered, and contain overlapping objects.
7 What works reliably today (Semi-)rigid objects, e.g., face, car, pedestrian, license plate / traffic sign
8 Kristen Grauman Generic object detection: basic framework Build/train object model (training stage) Choose an object representation Learn or fit parameters of object model Generate candidates in new image Score the candidates (inference/prediction stage)
9 Case study I: Viola-Jones face detector Overview: A seminal approach to real-time object detection (Viola and Jones, IJCV 2004) Object / feature representation: Global template with Haar wavelet features Object model and scoring: Classifying object candidates into face/non-face Use boosted combination of discriminative features as final classifier
10 Kristen Grauman Viola-Jones detector: Haar wavelets Rectangular filters Feature output is the difference between adjacent regions Efficiently computable with the integral image: any rectangular sum can be computed in constant time. Integral image: the value at (x,y) is the sum of the pixels above and to the left of (x,y)
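The constant-time rectangle sum can be illustrated with a minimal pure-Python sketch. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the zero-padded layout are assumptions made for this illustration; they are not taken from the slides or from any particular library.

```python
def integral_image(img):
    """Return ii with one extra zero row and column, so that
    ii[y][x] is the sum of all img pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four lookups (constant time)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # sum of 6, 7, 10, 11
print(two_rect_feature(ii, 0, 0, 3, 4))  # left two columns minus right two
```

Once the integral image is built in a single pass, every rectangle feature costs a fixed number of lookups regardless of its size, which is what makes evaluating many features per window affordable.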
11 Viola-Jones detector: features Which subset of these features should we use to determine if a window has a face? Considering all possible filter parameters: position, scale, and type: 180,000+ possible features associated with each 24 x 24 window Use AdaBoost both to select the informative features and to build the classifier Kristen Grauman
12 Viola-Jones detector: Boosting Defines a classifier using an additive model: $H(x) = \sum_t \alpha_t h_t(x)$, where $H$ is the strong classifier, $x$ the feature vector, $\alpha_t$ a weight and $h_t$ a weak classifier. Training: incrementally select weak classifiers. During each step, we select a weak learner that does well on examples that were hard for the previous weak learners. Hardness is captured by weights attached to training examples.
13 Kristen Grauman Viola-Jones detector: AdaBoost Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error. Resulting weak classifier: [Figure: outputs of a possible rectangle feature on faces and non-faces.] For the next round, reweight the examples according to errors, then choose another filter/threshold combination.
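One boosting round (pick the threshold with the lowest weighted error, then reweight) can be sketched as follows. This is a sketch of textbook discrete AdaBoost with a decision stump on a single feature, which is an assumption: the original Viola-Jones paper uses a slightly different but related weight update. All names (`stump_predict`, `best_stump`, `adaboost_round`) and the toy data are hypothetical.

```python
import math


def stump_predict(value, threshold, polarity):
    """Weak classifier: +1 (face) if polarity * value < polarity * threshold."""
    return 1 if polarity * value < polarity * threshold else -1


def best_stump(values, labels, weights):
    """Search thresholds and polarities for the lowest weighted error."""
    best = None
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if stump_predict(v, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_round(values, labels, weights):
    """Select one weak classifier, then increase the weight of its mistakes."""
    err, threshold, polarity = best_stump(values, labels, weights)
    err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak classifier
    new_w = [w * math.exp(-alpha * y * stump_predict(v, threshold, polarity))
             for v, y, w in zip(values, labels, weights)]
    total = sum(new_w)
    return (threshold, polarity, alpha), [w / total for w in new_w]


# Toy data: outputs of one rectangle feature on six training windows.
values = [1, 2, 3, 4, 5, 6]
labels = [1, 1, -1, 1, -1, -1]                 # +1 = face, -1 = non-face
weights = [1 / 6] * 6
stump, new_weights = adaboost_round(values, labels, weights)
print(stump, new_weights)
```

After the update, the single misclassified example (the window with feature value 4) carries half of the total weight, so the next round is pushed toward a feature and threshold that handle it correctly.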
14 Viola-Jones detector: Learned model First two features selected
15 Kristen Grauman Viola-Jones detector: Candidate generation Sliding window at multiple scales face/non-face Classifier
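Multi-scale sliding-window candidate generation can be sketched as a simple generator. The parameter names and defaults (`base`, `scale_step`, `stride_frac`) are assumptions chosen for illustration; only the 24 x 24 base window size comes from the slides.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.5):
    """Yield (x, y, size) for square windows at increasing scales.

    base        -- smallest window side in pixels
    scale_step  -- multiplicative growth of the window side per scale
    stride_frac -- step between windows as a fraction of the window side
    """
    size = base
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(round(size * scale_step)))


windows = list(sliding_windows(48, 48, base=24, scale_step=2.0))
print(len(windows), windows[0], windows[-1])
```

Each yielded window would then be passed to the face/non-face classifier. The number of candidates grows quickly with image size and with finer strides, which motivates the cascade on the following slide.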
16 Kristen Grauman Cascading classifiers for detection Form a cascade with low false negative rates early on Apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative
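The early-rejection idea behind the cascade can be sketched in a few lines. The stage scoring functions and thresholds below are toy stand-ins invented for this example; in the actual detector each stage would be a boosted classifier over Haar-like features.

```python
def cascade_classify(window, stages):
    """Run stages in order and reject at the first stage that fails.

    stages is a list of (score_fn, threshold) pairs, cheapest first.
    Returns (accepted, number_of_stages_passed).
    """
    for stage_index, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, stage_index      # discarded early, later stages skipped
    return True, len(stages)


# Hypothetical stages on a window given as a flat list of intensities.
stages = [
    (lambda w: max(w) - min(w), 10),       # stage 0: cheap contrast check
    (lambda w: sum(w) / len(w), 50),       # stage 1: mean-intensity check
]

flat = [5, 5, 5, 5]
dark = [0, 20, 0, 20]
candidate = [40, 80, 60, 90]
for w in (flat, dark, candidate):
    print(cascade_classify(w, stages))
```

Because most windows in an image are clear negatives, most of them leave at stage 0 and only a small fraction reach the more expensive later stages.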
17 Detecting profile faces? Can we use the same detector?
18 Viola-Jones detector: Strengths Sliding window detection and global appearance descriptors: Simple detection protocol to implement Good feature choices critical Past successes for certain classes Implementation: OpenCV Kristen Grauman
19 Viola-Jones detector: Limitations Non-rigid, deformable objects not captured well with representations assuming a fixed 2d structure; or must assume fixed viewpoint Objects with less-regular textures not captured well with holistic appearance-based descriptions Kristen Grauman
20 Case Study II: Deformable Part-based Models Overview: Felzenszwalb et al., PAMI 2010; winner of the PASCAL detection challenge (2008, 2009) Part-based representation: Global (root) template + deformable parts Trained from global bounding boxes only
21 DPMs: Part-based representation Objects are decomposed into parts and spatial relations among parts (Fischler and Elschlager, 1973)
22 DPMs: Object representation Based on HOG features (1 root + 6 parts) Full model is a mixture of deformable part-based models
23 DPMs: Object model Object candidates obtained in a multi-scale fashion
24 DPMs: Object model Score of candidates
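The scoring formula itself did not survive transcription. As a hedged reconstruction, the score of a candidate placement in the formulation of Felzenszwalb et al. (PAMI 2010) can be written as below; the symbol names follow that paper, not the slide, and should be read as an assumption about what the slide showed.

```latex
\[
\operatorname{score}(p_0, \dots, p_n)
  = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
  \;-\; \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)
  \;+\; b,
\qquad
\phi_d(dx, dy) = (dx,\, dy,\, dx^2,\, dy^2)
\]
```

Here $p_0$ is the root location and $p_1, \dots, p_n$ are the part locations ($n = 6$ in the model on the earlier slide), $F_i$ are the HOG filters, $\phi(H, p_i)$ is the HOG feature window at $p_i$ in the feature pyramid $H$, $(dx_i, dy_i)$ is the displacement of part $i$ from its anchor position, $d_i$ are the deformation cost coefficients, and $b$ is a bias term.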
25 DPMs: Learning object models Training data consists of images with labeled bounding boxes Need to learn the model structure, filters and deformation costs: Latent SVM
26 DPMs: Candidate generation and scoring Detection: Defined by a high-scoring root location Relies on an overall score based on the placement of the parts Efficient inference via dynamic programming / belief propagation (BP)
27 Deformable Part-based Models: Results Car detections
28 Deformable Part-based Models: Results Person detections
29 Other part-based representations Tree model Articulated objects
30 Part-based models: Pose estimation Pose estimation in video (Ramanan et al, 2007) Running on street Dancing
31 DPMs: Strengths Part-based representation : Flexible object model with deformation/pose Discriminatively learned with bounding box annotation Past successes in PASCAL detection challenges. Implementation: Kristen Grauman
32 DPMs: Limitations Manually designed feature (HOG) Pre-defined object-part structure Trainable classifier is often generic (e.g. SVM) Where next? Better classifiers? Or keep building more features? Object candidates Hand-designed feature extraction Trainable classifier Object Class Kristen Grauman
33 Case Study III: Regions with CNN features Overview: Girshick et al., CVPR 2014; significant improvement on PASCAL VOC Learned object representation based on a Convolutional Neural Network. Candidate generation by region proposal (objectness)
34 RCNN: Object representation Learn a feature hierarchy all the way from pixels to classifier Each layer extracts features from the output of previous layer Train all layers jointly Object Candidates Layer 1 Layer 2 Layer 3 Simple Classifier
35 Layer 1: Top-9 Patches Patches from validation images that give maximal activation of a given feature map
36 Layer 2: Top-9 Patches
37 Layer 3: Top-9 Patches
38 RCNN: Candidate generation Class-generic object detection, or objectness (e.g., Alexe, Deselaers, and Ferrari, 2010) Saliency Edge Map Segments RCNN uses the Selective Search method (van de Sande, Uijlings, Gevers, Smeulders, 2011)
39 RCNN: Detection pipeline and results Strength: Significant improvement on public benchmarks (mAP = 53.7% on PASCAL VOC 2010 vs. ~35% with DPM). Implementation
40 Outline Object (category) detection Basic framework of object detection Case study: Viola-Jones, DPM and R-CNN Discussions Semantic scene segmentation Pixel labeling and CRF Design of CRF models Discussions
41 Semantic scene understanding Semantic recognition tasks: Object detection Semantic segmentation
42 Pixel labeling problem Problem formulation Assign predefined labels to image elements Multiple label spaces Typical settings Semantic object class Segmentation + recognition Object instance labeling Geometric class labeling etc. (Gould and He, CACM 2014)
43 Pixel labeling problem Surface layout (Hoiem, Efros & Hebert, ICCV 2005; Gupta et al, ECCV 2010) Sky Non-Planar Porous Vertical Non-Planar Solid Support Planar (Left/Center/Right) Geometry + semantic segmentation (Gould et al, ICCV 2009)
44 A local solution Multiclass segmentation Image inputs A pixel-wise classifier energy function
45 Challenges in local approaches Local cues can be ambiguous for scene analysis Objects/regions are correlated in a scene [Figure: locally ambiguous regions, each labeled Sky with P = .5 and Water with P = .5] (He et al, CVPR 2004)
46 Adding contextual information Incorporating spatial context Labels are generally spatially smooth Image inputs Local image cues Contextual information
47 An example: A simple smooth model Same labeling for neighboring pixels unless an intensity gradient exists Unary only + Pairwise (Shotton et al, ECCV2006)
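The effect of such a smoothness term can be sketched by evaluating the energy of a labeling on a tiny image. The contrast-sensitive Potts form used here, with hypothetical parameters `lam` and `beta`, is a common choice and is an assumption; the slide does not give the exact potential.

```python
import math


def energy(labels, intensities, unary, lam=1.0, beta=0.5):
    """Energy of a labeling on a 4-connected grid.

    unary[y][x][k] is the cost of giving label k to pixel (x, y).
    A pairwise cost is paid only where neighboring labels differ, and it
    shrinks when there is a strong intensity gradient between the pixels.
    """
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += unary[y][x][labels[y][x]]
            for ny, nx in ((y, x + 1), (y + 1, x)):   # right and down neighbors
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    diff = intensities[y][x] - intensities[ny][nx]
                    total += lam * math.exp(-beta * diff * diff)
    return total


# One image row with an intensity edge between the 2nd and 3rd pixel.
intensities = [[0, 0, 10, 10]]
unary = [[[0.0, 0.0] for _ in range(4)]]   # uninformative unary costs
on_edge = [[0, 0, 1, 1]]                   # label change at the edge
off_edge = [[0, 1, 1, 1]]                  # label change in a flat region
print(energy(on_edge, intensities, unary))
print(energy(off_edge, intensities, unary))
```

With uninformative unaries, the labeling whose boundary coincides with the intensity edge has nearly zero energy, while a boundary placed in a flat region pays the full smoothness penalty.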
48 Conditional Random Field framework Input Output CRF model Unary potential Pairwise potential Higher-order potential Examples: surface normal, object class, depth, etc.
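The model equation is missing from the transcription. A standard way to write a CRF with the three kinds of potentials named on this slide is sketched below; the notation ($\psi$, $\mathcal{E}$, $\mathcal{C}$) is an assumption and may differ from the original slide.

```latex
\[
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\bigl(-E(\mathbf{y}; \mathbf{x})\bigr),
\qquad
E(\mathbf{y}; \mathbf{x})
  = \sum_{i} \psi_i(y_i; \mathbf{x})
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j; \mathbf{x})
  + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{y}_c; \mathbf{x})
\]
\[
\mathbf{y}^{*} = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})
             = \arg\min_{\mathbf{y}} E(\mathbf{y}; \mathbf{x})
\]
```

Here $\mathbf{x}$ is the input image, $\mathbf{y}$ the output labeling, $\psi_i$ the unary potentials, $\psi_{ij}$ the pairwise potentials over neighboring elements $\mathcal{E}$, and $\psi_c$ the higher-order potentials over cliques $\mathcal{C}$. The second line states that MAP label prediction is equivalent to minimizing the energy.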
49 Conditional Random Field framework Energy minimization perspective Label prediction: MAP estimation Global optimization of combinatorial problems Design choices in scene modeling Feature representation Modeling context Integrating top-down information
50 Image features and unary potentials Manually designed features Stuff class: local features Thing class: + shape cues Global image features Deep network features (Long, Shelhamer and Darrell, arXiv 2014) (Farabet et al, PAMI 2013)
51 Image features and unary potentials
52 Pixel vs superpixel Pixel representation Redundancy Leading to complex models Super-pixel representation Over-segmentation Reduced model size Larger support for feature extraction Fast and regular-shaped e.g. SLIC (Achanta, et al, PAMI2012) implemented in VLFeat. Irregular graph and Inaccurate object boundaries Better to combine both representations.
53 Modeling context Local context Bottom-up grouping Superpixels to labels Regional context Pairwise interaction between neighboring regions Spatial relations between regions (Galleguillos et al., ICCV07, CVPR08)
54 Modeling longer-range context Fully-connected CRFs (Krähenbühl and Koltun, NIPS 2012) Higher-order models (Kohli et al., CVPR 2008; Park & Gould, ECCV 2012)
55 Integrating object-specific cues Previous potentials: smoothing Object shape mask as a top-down cue Integrating scene classification, object detection, etc (Yao, et al. CVPR 2012)
56 Integrating object-specific cues Object shape mask as a top-down cue Integrating object detection with semantic video labeling (Liu, et al. WACV & CVPR 2015) NICTA Copyright 2012 From imagination to impact
57 Datasets and software Datasets Stanford Background Dataset ; Microsoft Research Cambridge Dataset Pascal VOC; Labelme Dataset; MSCOCO Dataset Software packages Darwin software framework ALE (Automatic Labeling Environment)
58 Summary Pixel labeling and CRF framework Design choices in semantic segmentation Image feature representation Modeling context (short-range vs. long-range) Integrating top-down information at object and scene level Ongoing research directions Deep network features and CRF framework Nonparametric label transfer Multiple modality in scene labeling Gould and He, Scene Understanding by Labeling Pixels. Communications of the ACM, 2014
Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection
CSED703R: Deep Learning for Visual Recognition (206S) Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection Bohyung Han Computer Vision Lab. [email protected] 2 3 Object detection
Local features and matching. Image classification & object localization
Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to
Deformable Part Models with CNN Features
Deformable Part Models with CNN Features Pierre-André Savalle 1, Stavros Tsogkas 1,2, George Papandreou 3, Iasonas Kokkinos 1,2 1 Ecole Centrale Paris, 2 INRIA, 3 TTI-Chicago Abstract. In this work we
Object Recognition. Selim Aksoy. Bilkent University [email protected]
Image Classification and Object Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] Image classification Image (scene) classification is a fundamental
Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016
Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016 Previously, end-to-end.. Dog Slide credit: Jose M 2 Previously, end-to-end.. Dog Learned Representation Slide credit: Jose
The Visual Internet of Things System Based on Depth Camera
The Visual Internet of Things System Based on Depth Camera Xucong Zhang 1, Xiaoyun Wang and Yingmin Jia Abstract The Visual Internet of Things is an important part of information technology. It is proposed
Convolutional Feature Maps
Convolutional Feature Maps Elements of efficient (and accurate) CNN-based object detection Kaiming He Microsoft Research Asia (MSRA) ICCV 2015 Tutorial on Tools for Efficient Object Detection Overview
Scalable Object Detection by Filter Compression with Regularized Sparse Coding
Scalable Object Detection by Filter Compression with Regularized Sparse Coding Ting-Hsuan Chao, Yen-Liang Lin, Yin-Hsi Kuo, and Winston H Hsu National Taiwan University, Taipei, Taiwan Abstract For practical
Pedestrian Detection with RCNN
Pedestrian Detection with RCNN Matthew Chen Department of Computer Science Stanford University [email protected] Abstract In this paper we evaluate the effectiveness of using a Region-based Convolutional
Lecture 6: Classification & Localization. boris. [email protected]
Lecture 6: Classification & Localization boris. [email protected] 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014
Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015
CS 1699: Intro to Computer Vision Deep Learning Prof. Adriana Kovashka University of Pittsburgh December 1, 2015 Today: Deep neural networks Background Architectures and basic operations Applications Visualizing
Semantic Image Segmentation and Web-Supervised Visual Learning
Semantic Image Segmentation and Web-Supervised Visual Learning Florian Schroff Andrew Zisserman University of Oxford, UK Antonio Criminisi Microsoft Research Ltd, Cambridge, UK Outline Part I: Semantic
Robust Real-Time Face Detection
Robust Real-Time Face Detection International Journal of Computer Vision 57(2), 137 154, 2004 Paul Viola, Michael Jones 授 課 教 授 : 林 信 志 博 士 報 告 者 : 林 宸 宇 報 告 日 期 :96.12.18 Outline Introduction The Boost
Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance
2012 IEEE International Conference on Multimedia and Expo Learning Detectors from Large Datasets for Object Retrieval in Video Surveillance Rogerio Feris, Sharath Pankanti IBM T. J. Watson Research Center
Edge Boxes: Locating Object Proposals from Edges
Edge Boxes: Locating Object Proposals from Edges C. Lawrence Zitnick and Piotr Dollár Microsoft Research Abstract. The use of object proposals is an effective recent approach for increasing the computational
Finding people in repeated shots of the same scene
Finding people in repeated shots of the same scene Josef Sivic 1 C. Lawrence Zitnick Richard Szeliski 1 University of Oxford Microsoft Research Abstract The goal of this work is to find all occurrences
Multi-view Face Detection Using Deep Convolutional Neural Networks
Multi-view Face Detection Using Deep Convolutional Neural Networks Sachin Sudhakar Farfade Yahoo [email protected] Mohammad Saberian Yahoo [email protected] Li-Jia Li Yahoo [email protected]
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles [email protected] and lunbo
Pedestrian Detection using R-CNN
Pedestrian Detection using R-CNN CS676A: Computer Vision Project Report Advisor: Prof. Vinay P. Namboodiri Deepak Kumar Mohit Singh Solanki (12228) (12419) Group-17 April 15, 2016 Abstract Pedestrian detection
Localizing 3D cuboids in single-view images
Localizing 3D cuboids in single-view images Jianxiong Xiao Bryan C. Russell Antonio Torralba Massachusetts Institute of Technology University of Washington Abstract In this paper we seek to detect rectangular
Administrivia. Traditional Recognition Approach. Overview. CMPSCI 370: Intro. to Computer Vision Deep learning
: Intro. to Computer Vision Deep learning University of Massachusetts, Amherst April 19/21, 2016 Instructor: Subhransu Maji Finals (everyone) Thursday, May 5, 1-3pm, Hasbrouck 113 Final exam Tuesday, May
How To Generate Object Proposals On A Computer With A Large Image Of A Large Picture
Geodesic Object Proposals Philipp Krähenbühl 1 and Vladlen Koltun 2 1 Stanford University 2 Adobe Research Abstract. We present an approach for identifying a set of candidate objects in a given image.
How To Model The Labeling Problem In A Conditional Random Field (Crf) Model
A Dynamic Conditional Random Field Model for Joint Labeling of Object and Scene Classes Christian Wojek and Bernt Schiele {wojek, schiele}@cs.tu-darmstadt.de Computer Science Department TU Darmstadt Abstract.
Bert Huang Department of Computer Science Virginia Tech
This paper was submitted as a final project report for CS6424/ECE6424 Probabilistic Graphical Models and Structured Prediction in the spring semester of 2016. The work presented here is done by students
Latest Advances in Deep Learning. Yao Chou
Latest Advances in Deep Learning Yao Chou Outline Introduction Images Classification Object Detection R-CNN Traditional Feature Descriptor Selective Search Implementation Latest Application Deep Learning
How To Use A Near Neighbor To A Detector
Ensemble of -SVMs for Object Detection and Beyond Tomasz Malisiewicz Carnegie Mellon University Abhinav Gupta Carnegie Mellon University Alexei A. Efros Carnegie Mellon University Abstract This paper proposes
3D Model based Object Class Detection in An Arbitrary View
3D Model based Object Class Detection in An Arbitrary View Pingkun Yan, Saad M. Khan, Mubarak Shah School of Electrical Engineering and Computer Science University of Central Florida http://www.eecs.ucf.edu/
High Level Describable Attributes for Predicting Aesthetics and Interestingness
High Level Describable Attributes for Predicting Aesthetics and Interestingness Sagnik Dhar Vicente Ordonez Tamara L Berg Stony Brook University Stony Brook, NY 11794, USA [email protected] Abstract
What, Where & How Many? Combining Object Detectors and CRFs
What, Where & How Many? Combining Object Detectors and CRFs L ubor Ladický, Paul Sturgess, Karteek Alahari, Chris Russell, and Philip H.S. Torr Oxford Brookes University http://cms.brookes.ac.uk/research/visiongroup
MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected]
Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos [email protected] Department of Applied Mathematics Ecole Centrale Paris Galen
Fast Semantic Segmentation of 3D Point Clouds using a Dense CRF with Learned Parameters
Fast Semantic Segmentation of 3D Point Clouds using a Dense CRF with Learned Parameters Daniel Wolf, Johann Prankl and Markus Vincze Abstract In this paper, we present an efficient semantic segmentation
Segmentation as Selective Search for Object Recognition
Segmentation as Selective Search for Object Recognition Koen E. A. van de Sande Jasper R. R. Uijlings Theo Gevers Arnold W. M. Smeulders University of Amsterdam University of Trento Amsterdam, The Netherlands
Object Categorization using Co-Occurrence, Location and Appearance
Object Categorization using Co-Occurrence, Location and Appearance Carolina Galleguillos Andrew Rabinovich Serge Belongie Department of Computer Science and Engineering University of California, San Diego
Segmentation & Clustering
EECS 442 Computer vision Segmentation & Clustering Segmentation in human vision K-mean clustering Mean-shift Graph-cut Reading: Chapters 14 [FP] Some slides of this lectures are courtesy of prof F. Li,
Decomposing a Scene into Geometric and Semantically Consistent Regions
Decomposing a Scene into Geometric and Semantically Consistent Regions Stephen Gould Dept. of Electrical Engineering Stanford University [email protected] Richard Fulton Dept. of Computer Science Stanford
Fast R-CNN Object detection with Caffe
Fast R-CNN Object detection with Caffe Ross Girshick Microsoft Research arxiv code Latest roasts Goals for this section Super quick intro to object detection Show one way to tackle obj. det. with ConvNets
Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006
Practical Tour of Visual tracking David Fleet and Allan Jepson January, 2006 Designing a Visual Tracker: What is the state? pose and motion (position, velocity, acceleration, ) shape (size, deformation,
Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation
Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation Ricardo Guerrero-Gómez-Olmedo, Roberto López-Sastre, Saturnino Maldonado-Bascón, and Antonio Fernández-Caballero 2 GRAM, Department of
CAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong Jan 26, 2016 Today Administrivia A bigger picture and some common questions Object detection proposals, by Samer
Learning Spatial Context: Using Stuff to Find Things
Learning Spatial Context: Using Stuff to Find Things Geremy Heitz Daphne Koller Department of Computer Science, Stanford University {gaheitz,koller}@cs.stanford.edu Abstract. The sliding window approach
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite Philip Lenz 1 Andreas Geiger 2 Christoph Stiller 1 Raquel Urtasun 3 1 KARLSRUHE INSTITUTE OF TECHNOLOGY 2 MAX-PLANCK-INSTITUTE IS 3
SLIC Superpixels. Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk
SLIC Superpixels Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk School of Computer and Communication Sciences (IC) École Polytechnique Fédrale de Lausanne
Improving Spatial Support for Objects via Multiple Segmentations
Improving Spatial Support for Objects via Multiple Segmentations Tomasz Malisiewicz and Alexei A. Efros Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 Abstract Sliding window scanning
Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.
Visual search: what's next? Cees Snoek University of Amsterdam The Netherlands Euvision Technologies The Netherlands Problem statement US flag Tree Aircraft Humans Dog Smoking Building Basketball Table
Unsupervised Discovery of Mid-Level Discriminative Patches
Unsupervised Discovery of Mid-Level Discriminative Patches Saurabh Singh, Abhinav Gupta, and Alexei A. Efros Carnegie Mellon University, Pittsburgh, PA 15213, USA http://graphics.cs.cmu.edu/projects/discriminativepatches/
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083.
Fast R-CNN Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083. ECS 289G 001 Paper Presentation, Prof. Lee Result 1 67% Accuracy
Image Classification for Dogs and Cats
Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science [email protected] Abstract
Big Data: Image & Video Analytics
Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)
Informed Haar-like Features Improve Pedestrian Detection
Informed Haar-like Features Improve Pedestrian Detection Shanshan Zhang, Christian Bauckhage, Armin B. Cremers University of Bonn, Germany Fraunhofer IAIS, Germany Bonn-Aachen International Center for
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Taking Inverse Graphics Seriously
CSC2535: 2013 Advanced Machine Learning Taking Inverse Graphics Seriously Geoffrey Hinton Department of Computer Science University of Toronto The representation used by the neural nets that work best
Randomized Trees for Real-Time Keypoint Recognition
Randomized Trees for Real-Time Keypoint Recognition Vincent Lepetit Pascal Lagger Pascal Fua Computer Vision Laboratory École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland Email:
Pictorial Structures Revisited: People Detection and Articulated Pose Estimation
Pictorial Structures Revisited: People Detection and Articulated Pose Estimation Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele Department of Computer Science, TU Darmstadt Abstract Non-rigid object
Part-Based Recognition
Part-Based Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, Part-Based Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple
SEMANTIC CONTEXT AND DEPTH-AWARE OBJECT PROPOSAL GENERATION
SEMANTIC TEXT AND DEPTH-AWARE OBJECT PROPOSAL GENERATION Haoyang Zhang,, Xuming He,, Fatih Porikli,, Laurent Kneip NICTA, Canberra; Australian National University, Canberra ABSTRACT This paper presents
Interactive Offline Tracking for Color Objects
Interactive Offline Tracking for Color Objects Yichen Wei Jian Sun Xiaoou Tang Heung-Yeung Shum Microsoft Research Asia, Beijing, China {yichenw,jiansun,xitang,hshum}@microsoft.com Abstract In this paper,
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
Human Pose Estimation from RGB Input Using Synthetic Training Data
Human Pose Estimation from RGB Input Using Synthetic Training Data Oscar Danielsson and Omid Aghazadeh School of Computer Science and Communication KTH, Stockholm, Sweden {osda02, omida}@kth.se arxiv:1405.1213v2
Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006
Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
Mean-Shift Tracking with Random Sampling
1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of
Geometric Context from a Single Image
Geometric Context from a Single Image Derek Hoiem Alexei A. Efros Martial Hebert Carnegie Mellon University {dhoiem,efros,hebert}@cs.cmu.edu Abstract Many computer vision algorithms limit their performance
R-CNN minus R. 1 Introduction. Karel Lenc http://www.robots.ox.ac.uk/~karel. Department of Engineering Science, University of Oxford, Oxford, UK.
LENC, VEDALDI: R-CNN MINUS R 1 R-CNN minus R Karel Lenc http://www.robots.ox.ac.uk/~karel Andrea Vedaldi http://www.robots.ox.ac.uk/~vedaldi Department of Engineering Science, University of Oxford, Oxford,
Learning and transferring mid-level image representions using convolutional neural networks
Willow project-team Learning and transferring mid-level image representions using convolutional neural networks Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic 1 Image classification (easy) Is there
Bringing Semantics Into Focus Using Visual Abstraction
Bringing Semantics Into Focus Using Visual Abstraction C. Lawrence Zitnick Microsoft Research, Redmond [email protected] Devi Parikh Virginia Tech [email protected] Abstract Relating visual information
Bottom-up Segmentation for Top-down Detection
Bottom-up Segmentation for Top-down Detection Sanja Fidler Roozbeh Mottaghi 2 Alan Yuille 2 Raquel Urtasun TTI Chicago, 2 UCLA {fidler, rurtasun}@ttic.edu, {roozbehm@cs, yuille@stat}.ucla.edu Abstract
CS231M Project Report - Automated Real-Time Face Tracking and Blending
CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, [email protected] June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android
T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN
T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN Goal is to process 360 degree images and detect two object categories 1. Pedestrians,
Machine Learning in Computer Vision A Tutorial. Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN
Machine Learning in Computer Vision A Tutorial Ajay Joshi, Anoop Cherian and Ravishankar Shivalingam Dept. of Computer Science, UMN Outline Introduction Supervised Learning Unsupervised Learning Semi-Supervised
Multi-View Object Class Detection with a 3D Geometric Model
Multi-View Object Class Detection with a 3D Geometric Model Joerg Liebelt IW-SI, EADS Innovation Works D-81663 Munich, Germany [email protected] Cordelia Schmid LEAR, INRIA Grenoble F-38330 Montbonnot,
Introduction. Selim Aksoy. Bilkent University [email protected]
Introduction Selim Aksoy Department of Computer Engineering Bilkent University [email protected] What is computer vision? What does it mean, to see? The plain man's answer (and Aristotle's, too)
Image and Video Understanding
Image and Video Understanding 2VO 710.095 WS Christoph Feichtenhofer, Axel Pinz Slide credits: Many thanks to all the great computer vision researchers on which this presentation relies on. Most material
Task-driven Progressive Part Localization for Fine-grained Recognition
Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He [email protected] University of Missouri [email protected] Abstract In this paper we propose a task-driven
Color Segmentation Based Depth Image Filtering
Color Segmentation Based Depth Image Filtering Michael Schmeing and Xiaoyi Jiang Department of Computer Science, University of Münster Einsteinstraße 62, 48149 Münster, Germany, {m.schmeing xjiang}@uni-muenster.de
LabelMe: Online Image Annotation and Applications
INVITED PAPER LabelMe: Online Image Annotation and Applications By developing a publicly available tool that allows users to use the Internet to quickly and easily annotate images, the authors were able
Jiří Matas. Hough Transform
Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian
A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow
A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow, pp. 233-237, http://dx.doi.org/10.14257/astl.2014.51.53 Giwoo Kim 1, Hye-Youn Lim 1 and Dae-Seong Kang 1, 1 Department of Electronics
A Convolutional Neural Network Cascade for Face Detection
A Convolutional Neural Network Cascade for Face Detection Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua Stevens Institute of Technology Hoboken, NJ 07030 {hli18, ghua}@stevens.edu Adobe Research San
Multi-fold MIL Training for Weakly Supervised Object Localization
Multi-fold MIL Training for Weakly Supervised Object Localization Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid To cite this version: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid. Multi-fold
Behavior Analysis in Crowded Environments. Xiaogang Wang, Department of Electronic Engineering, The Chinese University of Hong Kong, June 25, 2011
Behavior Analysis in Crowded Environments Xiaogang Wang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011 Behavior Analysis in Sparse Scenes Zelnik-Manor & Irani CVPR
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS Norbert Buch 1, Mark Cracknell 2, James Orwell 1 and Sergio A. Velastin 1 1. Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE,
Do Convnets Learn Correspondence?
Do Convnets Learn Correspondence? Jonathan Long Ning Zhang Trevor Darrell University of California Berkeley {jonlong, nzhang, trevor}@cs.berkeley.edu Abstract Convolutional neural nets (convnets) trained
LIBSVX and Video Segmentation Evaluation
CVPR 14 Tutorial: LIBSVX and Video Segmentation Evaluation Chenliang Xu and Jason J. Corso Computer Science and Engineering, SUNY at Buffalo Electrical Engineering and Computer Science, University
Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models
Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models Mathieu Aubry 1, Daniel Maturana 2 Alexei A. Efros 3, Bryan C. Russell 4 Josef Sivic 1, 1 INRIA 2 Carnegie Mellon
Classroom Monitoring System by Wired Webcams and Attendance Management System
Classroom Monitoring System by Wired Webcams and Attendance Management System Sneha Suhas More, Amani Jamiyan Madki, Priya Ranjit Bade, Upasna Suresh Ahuja, Suhas M. Patil Student, Dept. of Computer, KJCOEMR,
Automatic Maritime Surveillance with Visual Target Detection
Automatic Maritime Surveillance with Visual Target Detection Domenico Bloisi, PhD Maritime Scenario Maritime environment represents a challenging scenario for automatic video surveillance
The use of computer vision technologies to augment human monitoring of secure computing facilities
The use of computer vision technologies to augment human monitoring of secure computing facilities Marius Potgieter School of Information and Communication Technology Nelson Mandela Metropolitan University
A Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 Abstract The main objective of this project is the study of a learning based method
Object class recognition using unsupervised scale-invariant learning
Object class recognition using unsupervised scale-invariant learning Rob Fergus Pietro Perona Andrew Zisserman Oxford University California Institute of Technology Goal Recognition of object categories
Who are you? Learning person specific classifiers from video
Who are you? Learning person specific classifiers from video Josef Sivic, Mark Everingham 2 and Andrew Zisserman 3 INRIA, WILLOW Project, Laboratoire d'Informatique de l'École Normale Supérieure, Paris,
Feature Tracking and Optical Flow
02/09/12 Feature Tracking and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Many slides adapted from Lana Lazebnik, Silvio Savarese, who in turn adapted slides from Steve
Density-aware person detection and tracking in crowds
Density-aware person detection and tracking in crowds Mikel Rodriguez 1,4 Ivan Laptev 2,4 Josef Sivic 2,4 Jean-Yves Audibert 3,4 1 École Normale Supérieure 2 INRIA 3 Imagine, LIGM, Université Paris-Est
