# Decomposing a Scene into Geometric and Semantically Consistent Regions


…variable R_p ∈ {1, …, K}. Let the set of pixels in region r be denoted by P_r = {p : R_p = r}. The size of the region (number of pixels) is then N_r = |P_r| = Σ_p 1{R_p = r}. Each pixel has a local appearance feature vector α_p ∈ R^n (described in Section 3.1 below). Associated with each region are: a semantic class label S_r, currently grass, mountain, water, sky, road, tree, building, and foreground; a geometry G_r, currently horizontal, vertical, and sky; and an appearance A_r that summarizes the appearance of the region as a whole.

The final component in our model is the horizon. We assume that the image was taken by a camera with horizontal axis parallel to the ground. We therefore model the location of the horizon as the row in the image corresponding to it, v^hz ∈ {1, …, height(I)}.

Given an image I and model parameters θ, our unified energy function scores the entire description of the scene: the pixel-to-region associations R; the region semantic class labels S, geometries G, and appearances A; and the location of the horizon v^hz:

E(R, S, G, A, v^hz, K | I, θ) = θ^horizon ψ^horizon(v^hz)   (1)
  + θ^region Σ_r ψ_r^region(S_r, G_r, v^hz; A_r, P_r)   (2)
  + θ^pair Σ_{rs} ψ_{rs}^pair(S_r, G_r, S_s, G_s; A_r, P_r, A_s, P_s)   (3)
  + θ^boundary Σ_{pq} ψ_{pq}^boundary(R_p, R_q; α_p, α_q).   (4)

We now describe each of the components of our model.

### 3.1 Characterizing Individual Region Appearance

For each pixel p in the image, we construct a local appearance descriptor vector α_p comprised of raw image features and discriminatively learned boosted features. Our raw image features, which are computed in a small neighborhood of the pixel, are identical to the 17-dimensional color and texture features described in [17]. We augment these raw features with more processed summaries that represent the match between the pixel's local neighborhood and each of the region labels.
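As a toy illustration of this notation, the region sets P_r and sizes N_r follow directly from a pixel-to-region map R; the 4×4 labeling below is made up purely for illustration:

```python
from collections import defaultdict

# Hypothetical 4x4 pixel-to-region map R_p in {0, ..., K-1};
# the labeling is invented to illustrate the notation, not taken from the paper.
R = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [2, 2, 1, 1],
     [2, 2, 2, 2]]

# P_r = {p : R_p = r} as sets of (row, col) pixels
P = defaultdict(set)
for y, row in enumerate(R):
    for x, r in enumerate(row):
        P[r].add((y, x))

# N_r = |P_r| = sum_p 1{R_p = r}
N = {r: len(pixels) for r, pixels in P.items()}
print(N)  # {0: 4, 1: 6, 2: 6}
```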
In particular, for each (individual) semantic and geometric label we learn a one-vs-all boosted classifier to predict the label given the raw image features in a small neighborhood around the pixel.¹ We then append the score (log-odds ratio) from each boosted classifier to our pixel appearance feature vector α_p.

In our experiments, we set the region appearance A_r to be the maximum-likelihood Gaussian parameters over the appearance of pixels within the region: A_r = (μ_r^A, Σ_r^A), where μ_r^A ∈ R^n and Σ_r^A ∈ R^{n×n} are the mean and covariance matrix for the appearance vectors α_p of pixels in the r-th region. These summary statistics give us a more robust estimator of the appearance of the region than would be obtained by considering only small neighborhoods of the individual pixels.

¹ In our experiments we append to the pixel's 17 features the average and variance of each feature over a 5×5-pixel window at 9 grid locations around the pixel, plus the image row, to give a total of 324 features. We use the GentleBoost algorithm with 2-split decision stumps and train for 500 rounds. Our results appeared robust to the choice of parameters.

### 3.2 Individual Region Potentials

To define the potentials that help infer the label of individual regions, we extract features φ_r(A_r, P_r) ∈ R^n describing the region appearance and basic shape. Our appearance features include the mean and covariance (μ_r^A, Σ_r^A), the log-determinant of Σ_r^A, and the average contrast at the region boundary and region interior. In addition to relating to semantic class (grass is green), the appearance features provide a measure for the quality of a region: well-formed regions will tend to have strong boundary contrast and (depending on the class) little variation of interior appearance. We also want to capture more global characteristics of our larger regions.
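The maximum-likelihood Gaussian region summary A_r = (μ_r, Σ_r) described above can be sketched in a few lines; the 3-dimensional random appearance vectors here are stand-ins (the paper's α_p has n dimensions of raw and boosted features):

```python
import numpy as np

# Stand-in appearance vectors alpha_p for the pixels of one region;
# the dimensionality and values are made up for illustration.
rng = np.random.default_rng(0)
alpha = rng.normal(size=(50, 3))   # 50 pixels, 3 features each

mu = alpha.mean(axis=0)                         # mu_r in R^n
Sigma = np.cov(alpha, rowvar=False, bias=True)  # ML (1/N) covariance, n x n

print(mu.shape, Sigma.shape)  # (3,) (3, 3)
```

Because the summary pools all pixels in the region, it is far less noisy than any single pixel's local descriptor, which is exactly the robustness argument made in the text.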
For example, we would like to capture the fact that buildings tend to be vertical with many straight lines, trees tend to be green and textured, and grass tends to be green and horizontal. Thus, we incorporate shape features that include normalized region area, perimeter, first and second x- and y-moments, and the residual to a robust line fit along the top and bottom boundary of the region. The latter features capture the fact that buildings tend to have straight boundaries while trees tend to be rough.

We also include the horizon variable in the region-specific potential, allowing us to include features that measure the ratio of pixels in the region above and below the horizon. These features give us a sense of the scale of the object and its global position in the scene. For example, buildings are tall and tend to have more mass above the horizon than below it; foreground objects are often close and will have most of their mass below the horizon. Conversely, these potentials also allow us to capture the strong positional correlation between the horizon and semantic classes such as sky or ground, allowing us to use the same potential to place the horizon within the image.

To put all of these features together, we learn a multi-class logistic classifier for S_r × G_r with a quadratic kernel over φ_r (see Section 4.2). The score for any assignment to the region variables is then

ψ_r^region(S_r, G_r, v^hz; A_r, P_r) = −N_r log σ(S_r, G_r | φ_r(A_r, P_r), v^hz),

where σ(·) is the multi-class logistic function with learned parameters. We scale the potential by the region size N_r so that our score gives more weight to larger regions and is independent of the number of regions in the image.

### 3.3 Inter-Region Potentials

Our model contains two types of inter-region potentials. The first of these is ψ_pq^boundary(R_p, R_q; α_p, α_q), which is the standard contrast-dependent pairwise boundary potential [17]. For two adjacent pixels p and q, we de-

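The size-weighted region potential above can be sketched with a plain softmax; the weights, feature dimension, and label indexing below are all invented for illustration (the paper uses a quadratic kernel over φ_r, which is omitted here):

```python
import math
import random

# Sketch of psi_r = -N_r * log sigma(S_r, G_r | phi_r): a multi-class
# logistic (softmax) over region features with made-up weights.
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

random.seed(1)
phi_r = [random.gauss(0, 1) for _ in range(8)]                   # region features
W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]   # 5 (class, geometry) labels

probs = softmax([sum(w * f for w, f in zip(row, phi_r)) for row in W])

N_r = 1200                   # region size in pixels
label = 2                    # hypothetical (S_r, G_r) assignment
psi_region = -N_r * math.log(probs[label])
print(psi_region > 0)        # True: -log p is positive for p < 1
```

Scaling by N_r means a 1200-pixel region contributes 1200 times the per-pixel cost of its label, so the total energy is invariant to how the same pixels are grouped into regions.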
We then update the horizon v^hz using Iterated Conditional Modes (ICM).³ The new configuration is evaluated relative to our global energy function, and kept if it provides an improvement. The algorithm iterates until convergence. In our experiments (Section 5) inference took between 30 seconds and 10 minutes to converge, depending on image complexity (i.e., the number of segments in Ω).

### 4.2 Learning Algorithm

We train our model using a labeled dataset where each image is segmented into regions that are semantically and geometrically coherent. Thus, our ground truth specifies both the region association for each pixel and the labels for each region. We learn each term ψ^horizon, ψ^region, and ψ^pair in our energy function separately, using our labeled training data. We then cross-validate the weights between the terms using a subset of our training data. Since only the relative weighting between terms matters, we fixed θ^region to one.

For the horizon singleton term, we learn a Gaussian over the location of the horizon in training images and set ψ^horizon(v^hz) to be the log-likelihood of v^hz given this model. We normalize ψ^horizon(v^hz) by the image height to make this model resolution invariant. Our learned Gaussian has a mean of approximately 0.5 and standard deviation of 0.15 (varying slightly across experiment folds). This suggests that the horizon in our dataset is quite well spread around the center of the image.

The within-region term, ψ^region, and the between-region term, ψ^pair, are learned using multi-class logistic regression. However, the training of the within-region term involves an important subtlety. One of the main roles of this term is to help recognize when a given collection of pixels is actually a coherent region, i.e., one corresponding to a single semantic class and a single geometry. Although all of the regions in our training set are coherent, many of the moves proposed during the course of inference are not.
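The horizon model above is easy to sketch: a Gaussian over the horizon's relative image row, normalized by height so the same parameters work at any resolution. The function name is mine; the mean and standard deviation are the values reported in the text:

```python
import math

# Gaussian over the horizon's *relative* row (normalized by image height).
# Mean ~0.5 and std ~0.15 are the values reported in the text.
MU, SIGMA = 0.5, 0.15

def horizon_log_likelihood(v_hz, height):
    t = v_hz / height  # normalize by height for resolution invariance
    return (-0.5 * ((t - MU) / SIGMA) ** 2
            - math.log(SIGMA * math.sqrt(2 * math.pi)))

# A horizon near mid-image is more likely than one near the top edge.
print(horizon_log_likelihood(120, 240) > horizon_log_likelihood(10, 240))  # True
```

Note that the same horizon fraction gives the same value at any resolution, which is the point of normalizing by height.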
For example, our algorithm may propose a move that merges together pixels containing (horizontal) grass and pixels containing (vertical) trees. We want to train our classifier to recognize such invalid moves and penalize them. To penalize such moves, we train our multi-class logistic regression classifier with an additional invalid label. This label cannot be assigned to a candidate region during inference, so if the proposed region r appears incoherent, the invalid label will get high probability, reducing the probability of all (valid) labels in S_r × G_r. This induces a high energy for the newly proposed assignment, making it likely to be rejected.

³ We experimented with including v^hz in the belief propagation inference but found that it changed very little from one iteration to the next, and it was therefore more efficient to infer it conditionally (using ICM) once the other variables were assigned.

To train a discriminative classifier that distinguishes between coherent and incoherent regions, we need to provide it with negative (incoherent) training instances. Here, we cannot simply collect arbitrary subsets of adjacent pixels that do not correspond to coherent regions: most arbitrary subsets of pixels are easily seen to be incoherent, so a discriminative model trained with such subsets as negative examples is unlikely to learn a meaningful decision boundary. Therefore, we use a novel closed-loop learning procedure, in which the algorithm trains on its own mistakes. We begin by training our classifier with negative examples defined by merging pairs of adjacent ground-truth regions (which are not consistent with each other). We then perform inference (on our training set) using this model. During each proposal move we compare the outcome of inference with the ground-truth annotations.
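The effect of the invalid label can be illustrated with made-up classifier outputs: because invalid can never be assigned, any probability mass it absorbs on an incoherent region raises the energy of every valid labeling.

```python
import math

# Made-up class posteriors for a coherent and an incoherent region;
# "invalid" can never be assigned during inference.
coherent   = {"grass": 0.70, "tree": 0.20, "invalid": 0.10}
incoherent = {"grass": 0.10, "tree": 0.10, "invalid": 0.80}

def best_valid_energy(probs, n_pixels=1000):
    # Energy of the best valid label: -N_r * log p(label)
    return min(-n_pixels * math.log(p)
               for label, p in probs.items() if label != "invalid")

# The incoherent region is expensive under every valid label,
# so moves that create it tend to be rejected.
print(best_valid_energy(incoherent) > best_valid_energy(coherent))  # True
```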
We append to our training set moves that were incorrectly accepted or rejected, as well as moves that were accepted (resulted in lower energy) but produced an incorrect labeling of the region variables. In this way, we can target the training of our decision boundary on the more troublesome examples.

### 5. Experimental Results

We conduct experiments on a set of 715 images of urban and rural scenes assembled from a collection of public image datasets: LabelMe [15], MSRC [2], PASCAL [4], and Geometric Context (GC) [10]. Our selection criteria were for the images to have approximately pixels, contain at least one foreground object, and have the horizon positioned within the image (it need not be visible). We perform 5-fold cross-validation with the dataset randomly split into 572 training images and 143 test images for each fold. The quality of our annotations (obtained from Amazon Mechanical Turk) is extremely good and in many cases superior to those provided by the original datasets. Images and labels are available for download from the first author's website.

**Baselines.** To validate our method and provide strong baselines for comparison, we performed experiments on independent multi-class image segmentation and geometry prediction using standard pixelwise CRF models. Here the pixel class S_p (or surface geometry G_p) is predicted separately for each pixel p ∈ I given the pixel's appearance α_p (see Section 3.1). A contrast-dependent pairwise smoothness term is added to encourage adjacent pixels to take the same value. The models have the form

E(S | I) = Σ_p ψ_p(S_p; α_p) + θ Σ_pq ψ_pq(S_p, S_q; α_p, α_q)

and similarly for E(G | I). In this model, each pixel can be thought of as belonging to its own region. The parameters are learned as described above, with ψ_p a multi-class logistic over boosted appearance features and ψ_pq the boundary penalty. The baseline results are shown in Table 1.
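The baseline energy can be sketched on a four-pixel 1-D "image" with a contrast-dependent Potts pairwise term; all weights and feature values below are made up:

```python
import math

def unary(s, a):
    # Stand-in for -log of the per-pixel logistic score psi_p(S_p; alpha_p)
    return (a - s) ** 2

def pairwise(s_p, s_q, a_p, a_q, beta=1.0):
    # Contrast-dependent Potts term: disagreeing labels are cheap across
    # high-contrast (dissimilar-appearance) pixel pairs
    return 0.0 if s_p == s_q else math.exp(-beta * (a_p - a_q) ** 2)

def energy(labels, alphas, theta=2.0):
    u = sum(unary(s, a) for s, a in zip(labels, alphas))
    p = sum(pairwise(labels[i], labels[i + 1], alphas[i], alphas[i + 1])
            for i in range(len(labels) - 1))
    return u + theta * p

alphas = [0.1, 0.15, 0.9, 0.95]   # one high-contrast edge in the middle
# Cutting at the high-contrast edge costs less than at a low-contrast one.
print(energy([0, 0, 1, 1], alphas) < energy([0, 1, 1, 1], alphas))  # True
```

This captures why the smoothness term encourages label changes to align with appearance boundaries.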

Figure 2. Examples of typical scene decompositions produced by our method. Shown for each image are the regions (top right), semantic class overlay (bottom left), and surface geometry with horizon (bottom right). Best viewed in color.

**Region-based Approach.** Multi-class image segmentation and surface orientation results from our region-based approach are shown below the baseline results in Table 1. Our improvement of 2.1% over baseline for multi-class segmentation and 1.9% for surface orientation is significant. In particular, we observed an improvement in each of our five folds. Our horizon prediction was within an average of 6.9% (relative to image height) of the true horizon.

In order to evaluate the quality of our decomposition, we computed the overlap score between our boundary predictions and our hand-annotated boundaries. To make this metric robust, we first dilate both the predicted and ground-truth boundaries by five pixels. We then compute the overlap score by dividing the total number of overlapping pixels by half the total number of (dilated) boundary pixels (ground truth and predicted). A score of one indicates perfect overlap. Averaged across the five folds, our score indicates that on average we get about half of the semantic boundaries correct. For comparison, using the baseline class predictions gives a lower boundary overlap score.

The boundary score result reflects our algorithm's tendency to break regions into multiple segments. For example, it tends to leave windows separated from buildings and people's torsos separated from their legs (as can be seen in Figure 2). This is not surprising given the strong appearance difference between these parts. We hope to extend our model in the future with object-specific appearance and shape models so that we can avoid these types of errors. Figures 3 and 4 show some good and bad examples, respectively.
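The overlap metric described above is straightforward to implement; the one-pixel-wide toy boundaries below are hypothetical:

```python
# Sketch of the boundary overlap score: dilate both boundaries by 5 pixels,
# then divide the overlapping pixel count by half the total number of
# dilated boundary pixels.
def dilate(pixels, radius=5):
    return {(y + dy, x + dx)
            for (y, x) in pixels
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)}

def overlap_score(pred, truth, radius=5):
    dp, dt = dilate(pred, radius), dilate(truth, radius)
    return len(dp & dt) / (0.5 * (len(dp) + len(dt)))

truth = {(10, x) for x in range(50)}   # horizontal boundary at row 10
pred  = {(12, x) for x in range(50)}   # prediction two rows off

print(overlap_score(truth, truth))           # 1.0 (perfect overlap)
print(round(overlap_score(pred, truth), 2))  # 0.82
```

The dilation is what makes the score forgiving of small localization errors: a prediction two pixels off still scores above 0.8.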
Notice the high quality of the class and geometry predictions, particularly at the boundaries between classes, and how our algorithm deals well with both near and far objects. There are still many mistakes that we would like to address in future work. For example, our algorithm is often confused by strong shadows and reflections in water, as can be seen in some of the examples in Figure 4. We hope that with stronger geometric reasoning we can avoid this problem. Also, without knowledge of foreground subclasses, our algorithm sometimes merges a person with a building or confuses boat masts with buildings.

|                      | CLASS       | GEOMETRY    |
|----------------------|-------------|-------------|
| Pixel CRF (baseline) | 74.3 (0.80) | 89.1 (0.73) |
| Region-based energy  | 76.4 (1.22) | 91.0 (0.56) |

Table 1. Multi-class image segmentation and surface orientation (geometry) accuracy. Standard deviation shown in parentheses.

| MSRC             |      | GC                             |      |
|------------------|------|--------------------------------|------|
| TextonBoost [17] | 72.2 | Hoiem et al. [10]: pixel model | 82.1 |
| Yang et al. [24] | 75.1 | Hoiem et al. [10]: full model  | 88.1 |
| Gould et al. [5] | 76.5 | Pixel CRF                      | 86.5 |
| Pixel CRF        | 75.3 | Region-based                   | 86.9 |
| Region-based     | 76.4 |                                |      |

Table 2. Comparison of state-of-the-art MSRC and GC results against our restricted model. Table shows mean pixel accuracy.

**Comparison with Other Methods.** We also compared our method with state-of-the-art techniques on the 21-class MSRC [2] and 3-class GC [10] datasets. To make our results directly comparable with published works, we removed components from our model that are not available in the ground-truth labels for the respective datasets. That is, for MSRC we use only semantic class labels, and for GC we use only (main) geometry labels. Neither model used horizon information. Despite these restrictions, our region-based energy approach is still competitive with the state-of-the-art. Results are shown in Table 2.

### Application to 3D Reconstruction

The output of our model can be used to generate novel 3D views of the scene. Our approach is very simple and obtains its power from our region-based decomposition rather than from sophisticated features tuned for the task.
Nevertheless, the results from our approach are surprisingly good compared to the state-of-the-art (see Figure 5 for some examples). Since our model does not provide true depth estimates, our goal here is to produce planar geometric reconstructions of each region with accurate relative distances rather than absolute distances. Given an estimate of the distance between any two points in the scene, our 3D reconstruction can then be scaled to the appropriate size.

Our rules for reconstruction are simple. Briefly, we assume an ideal camera model with horizontal (x) axis parallel to the ground. We fix the camera origin at 1.8 m above the ground (i.e., y = 1.8). We then estimate the yz-rotation of the camera from the location of the horizon (assumed to be at infinite depth) as θ = tan⁻¹((v^hz − v₀)/f), where v₀ is half the image height and f is the focal length of the camera. Now the 3D location of every pixel p = (u, v) lies along the ray r_p = R(θ)⁻¹ K⁻¹ [u, v, 1]^T, where R(θ) is the rotation matrix and K is the camera model [6]. It remains to scale this ray appropriately.

We process each region in the image depending on its semantic class. For ground-plane regions (road, grass, and water) we scale the ray to make the height zero. We model each vertical region (tree, building, mountain, and foreground) as a planar surface whose orientation and depth with respect to the camera are estimated by fitting a robust line over the pixels along its boundary with the ground plane. This produced good results despite the fact that not all of these pixels are actually adjacent to the ground in 3D (such as the belly of the cow in Figure 5). When a region does not touch the ground (that is, it is occluded by another object), we estimate its orientation using pixels on its bottom-most boundary. We then place the region halfway between the depth of the occluding object and the maximum possible depth (either the horizon or the point at which the ground plane would become visible beyond the occluding object). The 3D location of each pixel p in the region is determined by the intersection of this plane and the ray r_p.

Figure 3. Representative results when our model does well. Each cell shows the original image (left), overlay of semantic class labels (center), and surface geometry (right) for each image. The predicted location of the horizon is shown on the geometry image. Best viewed in color.

Figure 4. Examples of where our algorithm makes mistakes, such as mislabeling road as water (top left) or confusing boat masts with buildings (bottom right). We also have difficulty with shadows and reflections. Best viewed in color.
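The ground-plane rule above can be sketched under the stated assumptions (camera 1.8 m above the ground, tilt recovered from the horizon row). The focal length, principal point, and sign conventions below are illustrative choices, not values from the paper:

```python
import math

# Illustrative pinhole intrinsics; f, u0, v0 are made-up values.
f, u0, v0 = 500.0, 320.0, 240.0
CAM_HEIGHT = 1.8  # camera origin 1.8 m above the ground

def camera_tilt(v_hz):
    # theta = atan((v_hz - v0) / f): the horizon row fixes the camera tilt
    return math.atan((v_hz - v0) / f)

def ground_point(u, v, v_hz):
    # Ray direction K^-1 [u, v, 1]^T (image v grows downward)
    x, y, z = (u - u0) / f, (v - v0) / f, 1.0
    # Rotate to undo the tilt (rotation about the camera x-axis;
    # sign convention is illustrative)
    th = camera_tilt(v_hz)
    y, z = y * math.cos(th) - z * math.sin(th), y * math.sin(th) + z * math.cos(th)
    # Scale the ray so it meets the ground plane (valid below the horizon, y > 0)
    s = CAM_HEIGHT / y
    return s * x, s * z  # lateral offset and depth in metres

# Pixels further below the horizon project to nearer ground points.
near = ground_point(320, 400, 240)
far = ground_point(320, 300, 240)
print(near[1] < far[1])  # True
```

Vertical regions then reuse the same rays but intersect them with a fitted plane instead of the ground.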
Finally, sky regions are placed behind the last vertical region.⁴

### Discussion and Future Work

In this work, we addressed the problem of decomposing a scene into geometrically and semantically coherent regions. Our method combines reasoning over both pixels and regions through a unified energy function. We proposed

⁴ Technically, sky regions should be placed at the horizon, but since the horizon has infinite depth, we choose to render sky regions closer, so as to make them visible in our viewer.


### Determining optimal window size for texture feature extraction methods

IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

### Computer Vision - part II

Computer Vision - part II Review of main parts of Section B of the course School of Computer Science & Statistics Trinity College Dublin Dublin 2 Ireland www.scss.tcd.ie Lecture Name Course Name 1 1 2

### Learning to Segment. Eran Borenstein and Shimon Ullman

Learning to Segment Eran Borenstein and Shimon Ullman Faculty of Mathematics and Computer Science Weizmann Institute of Science Rehovot, Israel 76100 {eran.borenstein,shimon.ullman}@weizmann.ac.il Abstract.

Image Segmentation Preview Segmentation subdivides an image to regions or objects Two basic properties of intensity values Discontinuity Edge detection Similarity Thresholding Region growing/splitting/merging

### Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

### Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Practical Tour of Visual tracking David Fleet and Allan Jepson January, 2006 Designing a Visual Tracker: What is the state? pose and motion (position, velocity, acceleration, ) shape (size, deformation,

### Improving Web-based Image Search via Content Based Clustering

Improving Web-based Image Search via Content Based Clustering Nadav Ben-Haim, Boris Babenko and Serge Belongie Department of Computer Science and Engineering University of California, San Diego {nbenhaim,bbabenko,sjb}@ucsd.edu

### Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform

### Character Image Patterns as Big Data

22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

### ColorCrack: Identifying Cracks in Glass

ColorCrack: Identifying Cracks in Glass James Max Kanter Massachusetts Institute of Technology 77 Massachusetts Ave Cambridge, MA 02139 kanter@mit.edu Figure 1: ColorCrack automatically identifies cracks

### CS229 Project Final Report. Sign Language Gesture Recognition with Unsupervised Feature Learning

CS229 Project Final Report Sign Language Gesture Recognition with Unsupervised Feature Learning Justin K. Chen, Debabrata Sengupta, Rukmani Ravi Sundaram 1. Introduction The problem we are investigating

### Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

### Edge tracking for motion segmentation and depth ordering

Edge tracking for motion segmentation and depth ordering P. Smith, T. Drummond and R. Cipolla Department of Engineering University of Cambridge Cambridge CB2 1PZ,UK {pas1001 twd20 cipolla}@eng.cam.ac.uk

### Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

### Interactive Offline Tracking for Color Objects

Interactive Offline Tracking for Color Objects Yichen Wei Jian Sun Xiaoou Tang Heung-Yeung Shum Microsoft Research Asia, Beijing, China {yichenw,jiansun,xitang,hshum}@microsoft.com Abstract In this paper,

### Introduction to Segmentation

Lecture 2: Introduction to Segmentation Jonathan Krause 1 Goal Goal: Identify groups of pixels that go together image credit: Steve Seitz, Kristen Grauman 2 Types of Segmentation Semantic Segmentation:

### Face Recognition in Low-resolution Images by Using Local Zernike Moments

Proceedings of the International Conference on Machine Vision and Machine Learning Prague, Czech Republic, August14-15, 014 Paper No. 15 Face Recognition in Low-resolution Images by Using Local Zernie

### SLIC Superpixels. Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk

SLIC Superpixels Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk School of Computer and Communication Sciences (IC) École Polytechnique Fédrale de Lausanne

### Stereo Vision (Correspondences)

Stereo Vision (Correspondences) EECS 598-08 Fall 2014! Foundations of Computer Vision!! Instructor: Jason Corso (jjcorso)! web.eecs.umich.edu/~jjcorso/t/598f14!! Readings: FP 7; SZ 11; TV 7! Date: 10/27/14!!

### Neovision2 Performance Evaluation Protocol

Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.

### Object Separation in X-ray Image Sets. Geremy Heitz, Gal Chechik Qylur Security Systems CVPR 2010 June 16, 2010

Object Separation in X-ray Image Sets SATISϕ Geremy Heitz Gal Chechik Qylur Security Systems CVPR 2010 June 16 2010 Motivation Security checkpoints based on x-ray screening are increasingly important Our

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### An Iterative Image Registration Technique with an Application to Stereo Vision

An Iterative Image Registration Technique with an Application to Stereo Vision Bruce D. Lucas Takeo Kanade Computer Science Department Carnegie-Mellon University Pittsburgh, Pennsylvania 15213 Abstract

### Accurate 3D Ground Plane Estimation from a Single Image

Accurate 3D Ground Plane Estimation from a Single Image Anoop Cherian Vassilios Morellas Nikolaos Papanikolopoulos cherian@cs.umn.edu morellas@cs.umn.edu npapas@cs.umn.edu Department of Computer Science

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Build Panoramas on Android Phones

Build Panoramas on Android Phones Tao Chu, Bowen Meng, Zixuan Wang Stanford University, Stanford CA Abstract The purpose of this work is to implement panorama stitching from a sequence of photos taken

### CVChess: Computer Vision Chess Analytics

CVChess: Computer Vision Chess Analytics Jay Hack and Prithvi Ramakrishnan Abstract We present a computer vision application and a set of associated algorithms capable of recording chess game moves fully

### Vision based Vehicle Tracking using a high angle camera

Vision based Vehicle Tracking using a high angle camera Raúl Ignacio Ramos García Dule Shu gramos@clemson.edu dshu@clemson.edu Abstract A vehicle tracking and grouping algorithm is presented in this work

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Multi-Scale Improves Boundary Detection in Natural Images

Multi-Scale Improves Boundary Detection in Natural Images Xiaofeng Ren Toyota Technological Institute at Chicago 1427 E. 60 Street, Chicago, IL 60637, USA xren@tti-c.org Abstract. In this work we empirically

### Perception-based Design for Tele-presence

Perception-based Design for Tele-presence Santanu Chaudhury 1, Shantanu Ghosh 1,2, Amrita Basu 3, Brejesh Lall 1, Sumantra Dutta Roy 1, Lopamudra Choudhury 3, Prashanth R 1, Ashish Singh 1, and Amit Maniyar

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### The Visual Internet of Things System Based on Depth Camera

The Visual Internet of Things System Based on Depth Camera Xucong Zhang 1, Xiaoyun Wang and Yingmin Jia Abstract The Visual Internet of Things is an important part of information technology. It is proposed

### Quality Assessment for Crowdsourced Object Annotations

S. VITTAYAKORN, J. HAYS: CROWDSOURCED OBJECT ANNOTATIONS 1 Quality Assessment for Crowdsourced Object Annotations Sirion Vittayakorn svittayakorn@cs.brown.edu James Hays hays@cs.brown.edu Computer Science

### Wii Remote Calibration Using the Sensor Bar

Wii Remote Calibration Using the Sensor Bar Alparslan Yildiz Abdullah Akay Yusuf Sinan Akgul GIT Vision Lab - http://vision.gyte.edu.tr Gebze Institute of Technology Kocaeli, Turkey {yildiz, akay, akgul}@bilmuh.gyte.edu.tr

### Texture. Chapter 7. 7.1 Introduction

Chapter 7 Texture 7.1 Introduction Texture plays an important role in many machine vision tasks such as surface inspection, scene classification, and surface orientation and shape determination. For example,

### Model-based Chart Image Recognition

Model-based Chart Image Recognition Weihua Huang, Chew Lim Tan and Wee Kheng Leow SOC, National University of Singapore, 3 Science Drive 2, Singapore 117543 E-mail: {huangwh,tancl, leowwk@comp.nus.edu.sg}

### On Multifont Character Classification in Telugu

On Multifont Character Classification in Telugu Venkat Rasagna, K. J. Jinesh, and C. V. Jawahar International Institute of Information Technology, Hyderabad 500032, INDIA. Abstract. A major requirement

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?