Online Place Recognition for Mobile Robots


Autonomous Systems Lab
Prof. Roland Siegwart

Master Thesis

Online Place Recognition for Mobile Robots

Spring Term 2011

Supervised by: Jérôme Maye, Ralf Kästner
Author: Ciril Baselgia


Contents

Abstract
Acknowledgment

1 Introduction
    Outline
2 Related Work
3 Modeling Images
    SIFT Descriptor
    Bag of Visual Words
    CENTRIST Descriptor
    Spatial Weighting
    Color Descriptor
        RGB Color
        Hue Color
        Transformed Color
    Image Descriptor Concatenation
4 Modeling Places
    The Unigram Model
    Mixture of Unigrams
    Probabilistic Latent Semantic Indexing
    Latent Dirichlet Allocation
    Dirichlet Compound Multinomial
5 Online Place Recognition using Bayesian Change-Point Detection
    Model Based Change-Point Detection
        Transition Probability
        Data Likelihood
    Particle Filter
    Change-Point Based Place Labelling
6 Experiments
    Experimental Setup
        VPC Database
        Cosy Localization Database
        ETH Data Set
    Change-Point Detection and Place Labeling
    Supervised Place Recognition
    Supervised Place Categorization

7 Discussion and Further Work
A VPC: Change Point Detection and Labeling
B COLD: Change Point Detection and Labeling
C Ground Truth COLD
D Ground Truth ETH Data Set
Bibliography

Abstract

Although visual place recognition and categorization is one of the most fundamental and natural tasks for humans, it generally remains an unsolved problem in robotics. In this work, we aim to tackle this problem by applying an unsupervised and online approach to place recognition and categorization on image streams obtained from a monocular video camera. Our approach begins with the segmentation of the incoming image streams into coherent segments that correspond to distinctive places. This is accomplished by using a model-based Bayesian change-point detection framework. Change-point detection flags abrupt changes in the generative parameters of a statistical model, which is computed online in a recursive and unsupervised manner. After segmenting the image streams, we assign the segments to the relevant distinctive places using an online unsupervised framework whereby new places are detected by means of hypothesis testing with Bayes factors. To model the places in our system, we use a Dirichlet Compound Multinomial (DCM) model, which is known for modeling word burstiness, the concept that a word is more likely to occur again once it has already appeared. It has been demonstrated that this phenomenon is also likely to occur in an image stream corresponding to a single place. In order to reduce the complexity and the running time of the maximum-likelihood update for the hyperparameters of the DCM model, we develop a new update scheme that is multiple times faster than the originally used gradient descent optimization. To assess the accuracy of our system, we present multiple experiments performed on two existing image data sets and a third data set which we recorded at the Autonomous Systems Lab of the Swiss Federal Institute of Technology. All experiments include a comparison of concatenations of different image features such as SIFT, CENTRIST and color histograms.


Acknowledgment

I would like to thank everyone who encouraged, supported and motivated me throughout my studies, especially during the master thesis. In particular, I am grateful to Jérôme Maye and Ralf Kästner, my supervisors, for their excellent support during this project, as well as for their fruitful feedback and discussions; Prof. Roland Y. Siegwart, for giving me the opportunity to carry out this project at the Autonomous Systems Lab; Yuanshan Lee, for proofreading this thesis; my family, who supported me wholeheartedly throughout my studies; and my friends, for making my life at ETH so enjoyable and colorful.


Chapter 1

Introduction

Where am I? and Have I been here before? are two very different questions, although both relate to being at a certain place. The answer to the former question is often an exact place, such as Zurich Paradeplatz, or a category of places, such as kitchen. For the latter, instead of one answer, there are usually two possible answers: yes or no. Although these are seemingly easy questions for us as humans, they are difficult questions for robots. In many ways, this is still an unsolved problem when dealing with robot localization. Nevertheless, it is sometimes essential for a robot or an intelligent agent to recognize or categorize places in a manner similar to how humans do it. This would, for instance, facilitate human-robot interaction or allow the system to overcome the kidnapped-robot problem. Robot localization is also used in many other applications such as Simultaneous Localization And Mapping (SLAM) algorithms, where a good localization enables proper loop closure. Whereas it is highly important in a SLAM approach that the localization gives the exact position and orientation of the robot, this is of less significance in a framework for topological mapping. A topological map is a graph-based structure of the environment. It consists of nodes and edges, where nodes indicate landmarks or significant places, while edges denote their connectivity. This is in contrast to a metric map, which shows space to scale (Fig. 1.1) [37]. In a topological framework, the robot only has to decide whether the current measurement comes from an already seen place, and if yes, from which one. It is not important to know its exact position within a metric space.

Figure 1.1: (a) Topological map, (b) metric map. The dotted red line is the robot path taken while gathering the measurements [37].

Performing place recognition or place labeling can be divided into

top-down and bottom-up approaches. In top-down approaches, one concludes from the overall room appearance the kind of place label which can be expected, whereas in bottom-up approaches, one concludes from the objects found in the measurement the kind of place category the measurement was taken from. The solutions to the two questions from the beginning of this chapter can be classified into place categorization and place recognition. Place categorization, also known as scene recognition, usually refers to the task of recognizing the semantic label of a scene when asked for a category of places. As already mentioned, semantic labels can have a very wide range, from corridor right up to coffee bar on the first floor of ETH Zurich's main building. Recognizing the scene category is part of the task of understanding the scene. The label of a scene strongly indicates the types of objects which can be found there, or the types of tasks a person in such a place could be doing. For instance, it is much more likely to find a person brewing coffee with a coffee machine in a kitchen than to find a person sleeping on the floor there. There are many other arguments why semantic labeling can be useful; a collection of them can be found in [44]. Most existing place categorization algorithms, which assume a finite set of place categories, require a lot of learning. The labels are commonly learned offline in a supervised manner with a corresponding set of training data. The training data contains manually labeled measurements, which the system uses to learn how measurements are grouped. During runtime, a classifier separates and categorizes input measurements into their corresponding labels using the previously learned groups. While such supervised systems relying on classifiers have the advantage of simplicity, they have several drawbacks [36], [13]:

1. The classifier needs a huge amount of labeled training data due to the large variation in the measurements (e.g. offices are vastly different in many aspects from one another), which implies long hours of manual work to annotate images by hand, which is tedious and expensive.

2. Expert-defined labels are somewhat arbitrary and therefore possibly suboptimal [13].

3. For the classifier to learn the different labels in the best possible manner, it is essential that each training data set contains the main characteristics of the underlying scene. When testing the system on new data, the data must also have the same main characteristics. This means that a human has to supervise the process of recording the data, making the use of continuous measurements almost impossible.

4. The training data assumes a fixed number of different labels, and the system will classify new measurements according to previously learned labels, which makes the recognition of possibly new categories impossible.

5. The system classifies each measurement individually and does not make decisions based on recently seen measurements.

One very important issue when using supervised place categorization is that it needs to be applicable across a wide range of spatial environments. Otherwise, accurate semantic labeling will not be possible for places which the robot has not visited before [44]. Recognizing a place is the ability to consistently label a place as the same when a particular place is revisited [36]. No semantic labeling of the place is

performed with this approach. This implies that the whole scene-understanding part can be omitted. Furthermore, there are two main tasks when talking about place recognition. These are global localization, where the robot's exact pose¹ is determined, and topological place recognition, where just a rough location (e.g. corridor) is determined [45]. In the following sections, we use the term place recognition to mean topological place recognition. While a topological mapping does not have to coincide with the human understanding of rooms or places [10], we will use the term place to indicate a room as humans define it. As in the place categorization task, most existing place recognition approaches require training. This is done by taking some measurements from a specific environment with manually labeled places and testing the system on previously unseen measurements from the same environment. Supervised place recognition is an easier task than place categorization because the learning and testing measurements look quite similar, as they come from the same places. The drawbacks of the existing place recognition approaches are similar to those of place categorization. One also needs labeled training data, although not as much as in place categorization approaches. The system also has the limitation that it simply classifies the current measurement according to the learned labels, making the creation of new labels impossible. No matter what kind of measurement is taken as the system's input, a good place recognition algorithm must be robust to dynamic changes in the environment. These range from low-dynamic changes, such as lighting and/or viewpoint changes (Fig. 1.2) when images are used, to highly dynamic changes, such as a person walking by.

Figure 1.2: Two different views of the same scene.

In this thesis, we devise an unsupervised place recognition approach.
In contrast to most existing approaches, the input of our system consists of image streams or videos instead of stand-alone images. Thus we are able to intrinsically capture time information in our algorithm. When using image streams, one has to overcome the problem that not all images will capture the characteristics of a room or scene. In fact, it is often the case that just a single wall or some other close-up view is visible, such that it is impossible even for a human being to characterize the current room. We extract and combine several descriptors from the images to form a single distinctive image histogram, in the hope that at least one descriptor will overcome the shortcomings described. Our method is based on change-point detection, which detects abrupt variations in the generative parametrization of a statistical model [30]. The change-points indicate place changes, as we assume that each place has its own parametrization and that the parameters within a place do not change drastically. Thus, when a change-point is observed, the robot exits one place and enters another. We use a Bayesian algorithm to infer the change-points by computing, for every new input image, the probability that a change-point occurs. The exact algorithm would keep track of all possibilities that a change-point could occur and make no irrevocable decision. As the computational cost of the exact algorithm would increase linearly with every time step, we use a Rao-Blackwellized particle filter to keep the costs almost constant. While change-points deliver boundaries between places, the place label is assigned based on the probability that the measurement comes from an already seen place. Thus, the place label assignment depends on the distribution of all change-points and the past model assignments. The algorithm then calculates a probability distribution across all place labels seen so far and uses a Bayes factor to test whether the robot is currently in a previously unseen place. It is thus possible for the algorithm to start from scratch and to systematically learn previously unseen places online by assigning them their corresponding parametrization.

¹ pose = position + orientation

1.1 Outline

The remainder of the thesis is structured as follows. In Chapter 2, we summarize previous work related to the topic of this thesis. Chapter 3 describes our method of image representation, and we discuss several descriptors with their advantages and disadvantages. In Chapter 4, we give a theoretical overview of five document modeling techniques which will later be used as place models in this thesis. Based on the evaluation of these techniques and the underlying theory, we will decide on one model to be used in our system. Chapter 5 describes the place recognition algorithm, which is based on visual change-point detection. In Chapter 6, experimental results are provided in which we compare our system to the PLISS [36] and VPC [44] systems. Finally, we conclude in Chapter 7 by summarizing the results and providing insights for future work.

Chapter 2

Related Work

In robotics, many approaches exist to perform visual place recognition. Although place classification methods based on laser and sonar range scans also exist [18], [26], they are not within the scope of this thesis. In this thesis, we focus on image feature-based methods instead. Typically, visual place recognition methods use measures of distinctiveness of image features to determine the location. Examples of such measures include comparing color histograms [41], matching SIFT features [20] obtained from different images [47], retrieving images by means of key-point matching [11], and using classifiers such as SVMs based on manually labeled data [32]. In a recent work by Pronobis et al. [33], sensory data is merged and the system's output for place recognition is obtained from individually trained SVM classifiers. However, as with other existing classifier-based approaches, the method by Pronobis et al. has the disadvantage that it cannot be generalized to learn from previously unseen places. Methods for visual place recognition/classification can be divided into two main types: those that use global features [41], [39], [13], [29], and those that use local features [35] or model distinctive parts of the image [34]. These can be further divided into methods that use omnidirectional cameras [41], [27], [24], [3] and those that use perspective cameras [19], [34]. In this thesis, we adopt the latter approach of using perspective cameras and follow the context-based vision system of Torralba et al. [39] in using global image texture features (Gist), which are related to functional constraints. According to Torralba et al., this method is more robust than using local features, which can occur randomly and are therefore highly variable. Their method is based on a hidden Markov model (HMM) which recursively computes the place label based on past measurements.
This method, however, has the disadvantage that prior training is required to learn the transition function for the HMM and the observation likelihood for each place in order to obtain the probability of the current measurement coming from a specific place. Ullah et al. [40] combine Harris-Laplace detectors [14] with SIFT descriptors for place recognition, as they provide an excellent trade-off between descriptive power (due to the SIFT descriptor) and generalizability, because they capture significant fragments which are very likely to appear again in different settings. In the work by Wu et al. [44], the CENTRIST [45] image descriptor is used as the system input, whereby place recognition and classification are done based on Bayesian filtering. These approaches, which are all classifier-based methods, have the implicit disadvantage that they have to pre-learn labels and are not able to learn new place categories during runtime. More recently, a new approach called PLISS (Place Labeling through Image Sequence Segmentation), which is based on online change-point detection, was introduced by Ranganathan [36].

Figure 2.1: Maximum-likelihood place labeling using PLISS. Thumbnails of the images are shown on top, followed by ground truth, maximum-likelihood place labeling output and change-point detection of the algorithm [36].

Compared to other approaches, PLISS is an algorithm which is able to learn new place labels in an online manner. This is particularly interesting and useful for mobile robots, as they are constantly confronted with new places when exploring the environment. Change-point detection in PLISS is done using the approach proposed by Adams and MacKay [30], whereby a particle filter approach by Fearnhead et al. [12] is used to control computing costs. Place labeling is conditioned on the change-point detection (see Fig. 2.1). In addition, the algorithm performs at each time step a hypothesis test which uses a likelihood ratio to determine the place model to which the current measurement belongs. If all hypotheses are rejected, the algorithm introduces a new place label. To model places, a Dirichlet Compound Multinomial (DCM) framework, which is supposed to model word burstiness [22], is used. PLISS uses a maximum-likelihood parameter update for the DCM model each time a new measurement is available, and is hence able to learn online the statistical model parameters for a specific place by using all measurements which belong to this place. As input, PLISS uses images that are modeled using the bag-of-words approach, where a word is assigned to each SIFT descriptor and SIFT features are calculated on a dense grid over the image. This group of words is then further processed with the spatial pyramid algorithm by Lazebnik et al. [19] to introduce some spatial information. Spatial pyramids are obtained by dividing the image into a grid, as shown in Fig. 2.2. Finally, each cell is represented as a histogram of words which is then weighted and concatenated to form an image descriptor.
There are two main drawbacks of the DCM approach. Firstly, because no analytical maximum-likelihood update in closed form exists for the DCM parameters, iterative algorithms have to be used, which can be quite time-consuming, especially when a lot of data has to be considered. Secondly, the framework will encounter storage problems over time, as the algorithm needs to store every measurement from each place visited in order to compute the maximum-likelihood parameters. In this thesis, we mainly adopt the PLISS approach, and we will show that better results can be achieved when a combination of different global and local descriptors is used. According to [19], some confusion occurs in the classification of indoor images (such as kitchen, bedroom, living room) when using the spatial

pyramid approach. We believe that this confusion results from the fact that indoor scenes are not as spacious as outdoor scenes. This leads to large image variations even when the camera is only slightly moved. This narrow spatiality results in drastic pyramid-cell-content changes in a short time; in particular, words which are close to a cell border can easily move to a different cell, which can lead to a very different overall image descriptor when the single histograms are concatenated. Thus, we found that in image sequences where no emphasis on image content is made (in contrast to [19], where only scene-characteristic images were used), the spatial pyramid approach is not suitable for place recognition. Due to these drawbacks, we will introduce other methods in this thesis to capture some of the image's spatial information. To model places, we also use the DCM model, but instead of using a maximum-likelihood update for its parameters, we will demonstrate a Bayesian approach that is many times faster and hence able to provide real-time responses. As we do not use a maximum-likelihood update, we can perform our hypothesis testing using a Bayes factor [17] instead of likelihood-ratio hypothesis testing. Finally, we will provide a thorough evaluation of our algorithm, testing it on different databases with clearly defined parameters.

Figure 2.2: Spatial pyramid histogram according to Lazebnik et al. [19].
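For reference, the DCM likelihood of a word-count histogram x under hyperparameters α has a closed form and can be evaluated with the standard library alone; the following is a minimal sketch (the helper name is ours, not from PLISS or our implementation) that also illustrates the burstiness property the model is chosen for.

```python
from math import lgamma

def dcm_log_likelihood(x, alpha):
    """Log-likelihood of a word-count histogram x under a Dirichlet
    Compound Multinomial with hyperparameters alpha:
    p(x | alpha) = n!/(prod x_i!) * G(A)/G(A+n) * prod_i G(a_i+x_i)/G(a_i),
    where A = sum(alpha), n = sum(x) and G is the Gamma function."""
    n, A = sum(x), sum(alpha)
    ll = lgamma(n + 1) - sum(lgamma(xi + 1) for xi in x)   # multinomial coeff.
    ll += lgamma(A) - lgamma(A + n)
    ll += sum(lgamma(a + xi) - lgamma(a) for a, xi in zip(alpha, x))
    return ll

# Burstiness: with small alpha, repeating one word is more likely
# than spreading the same number of counts over several words.
alpha = [0.5, 0.5, 0.5]
bursty = dcm_log_likelihood([4, 0, 0], alpha)
spread = dcm_log_likelihood([2, 1, 1], alpha)
print(bursty > spread)  # True
```

Because the likelihood is available in closed form, Bayes-factor tests between place models reduce to differences of such log-likelihoods; only the parameter update requires the iterative or Bayesian schemes discussed above.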


Chapter 3

Modeling Images

In this chapter, we describe three different image descriptors. The first one is the well-known SIFT descriptor, which can be represented by a histogram when using the bag-of-words approach. As most people are familiar with SIFT, we only provide a brief introduction to SIFT and focus on describing the bag-of-words method instead. The second section describes the recently introduced CENTRIST descriptor, and finally we discuss three different color descriptors.

3.1 SIFT Descriptor

SIFT was introduced by David G. Lowe in his well-known paper Object Recognition from Local Scale-Invariant Features [20]. Since its introduction, many researchers worldwide have used the SIFT descriptor for a wide range of tasks. The main reason for its success is its invariance to image translation, scaling and rotation. SIFT is robust to local variations arising from nearby clutter, resulting in a very distinctive local descriptor. Furthermore, it is partially invariant to illumination changes as well as affine or projective transformations [20]. SIFT's scale invariance results from a staged filtering approach in which so-called SIFT key-points or interest points are found. Key-points are extrema of a difference-of-Gaussians function sampled at different scale-space coordinates. To each key-point an orientation is assigned by computing a gradient orientation histogram in the key-point's neighborhood (see bar in Fig. 3.1). Projective, affine and rotational invariance is then achieved because all properties of a key-point are measured relative to the key-point orientation. Once the orientation is set for a key-point, the SIFT descriptor is computed as a set of orientation histograms over 4 × 4 pixel subregions, oriented relative to the key-point orientation. Each histogram contains 8 bins, and each descriptor contains a 4 × 4 array of histograms around the key-point. Hence, the SIFT feature vector has 4 × 4 × 8 = 128 elements.
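The descriptor layout just described (a 4 × 4 grid of cells with 8 orientation bins each) can be sketched in a few lines of NumPy. This toy version (function name ours) only illustrates the geometry; it omits the Gaussian weighting, trilinear interpolation and clipping steps of Lowe's actual implementation.

```python
import numpy as np

def sift_like_descriptor(magnitude, orientation):
    """Assemble a 128-D vector from a 16x16 patch of gradient magnitudes
    and orientations (radians, assumed already rotated relative to the
    key-point orientation). Toy illustration of the 4x4x8 layout only."""
    assert magnitude.shape == orientation.shape == (16, 16)
    cells = []
    for ci in range(4):                      # 4x4 grid of 4x4-pixel cells
        for cj in range(4):
            cell_mag = magnitude[4*ci:4*ci+4, 4*cj:4*cj+4]
            cell_ori = orientation[4*ci:4*ci+4, 4*cj:4*cj+4]
            # 8 orientation bins over [0, 2*pi), weighted by gradient magnitude
            hist, _ = np.histogram(cell_ori, bins=8, range=(0, 2*np.pi),
                                   weights=cell_mag)
            cells.append(hist)
    d = np.concatenate(cells)                # 16 cells * 8 bins = 128
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d       # normalized feature vector

rng = np.random.default_rng(0)
mag = rng.random((16, 16))
ori = rng.random((16, 16)) * 2 * np.pi
d = sift_like_descriptor(mag, ori)
print(d.shape)  # (128,)
```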
Finally, the vector is normalized to enhance invariance to illumination changes.

3.2 Bag of Visual Words

The bag-of-words model originated from document modeling. It makes the simplifying assumption that a document can be represented as an unordered accumulation of words. The same method is applicable to images by stating that an image is a document whose content is built out of visual words. A visual word can be anything; in the simplest case, it is the intensity value of a pixel. Hence, the representation of an image by a bag of visual words, where a word is associated with a pixel's intensity value, is simply a histogram with at most 256 bins (the range for intensity values is [0, 255]).

Figure 3.1: SIFT key-point detection. The size of the ellipse indicates the scale at which the key-point is found, and the bar shows the key-point's orientation.

By representing intensity values with words, they now have their own intrinsic, so-called dictionary. A dictionary is a collection of words which describe the image. In the case of intensity values, a word is a single intensity value, and hence the dictionary, which is the bag, has a size of 256 words. Contrary to intensity values, SIFT descriptors do not come with a given dictionary; instead, a dictionary has to be learnt in advance. To learn a dictionary of SIFT words, SIFT descriptors are extracted from a set of training images. With N the number of SIFT descriptors extracted from all training images, we obtain N feature points within a 128-dimensional feature space. Learning is then accomplished using the well-known k-means clustering algorithm [21]. When k-means is applied to the feature space, it provides W cluster centres. This procedure generates a dictionary of size W, where each word is associated with one of the W cluster centres in the feature space. Note that the dictionary size W can be set to any number required; a typical dictionary size ranges between 200 and 400 [19]. With the dictionary of SIFT words, an image can therefore be represented by a bag of words whereby a word is assigned to every descriptor. Word assignment is accomplished by nearest-neighbour classification, i.e. each computed SIFT descriptor is assigned to the word whose cluster centre is closest in feature space. To be more concrete, let k denote the closest cluster centre for a given SIFT feature vector. The word w_k is then represented by a vector containing only zeros except at the kth position, where a 1 is set, i.e., w_k = (0, ..., 1, ..., 0).
Furthermore, let x_W be the bag of words, where W denotes the size of the dictionary, w_i denotes the ith word and x_i denotes the number of times an individual word i (i.e., w_i) is observed in an image. Summing up all resulting word vectors w_1, ..., w_N yields a one-dimensional histogram called the bag of words, which represents the word frequencies of an input image and is given by x_W = [x_1, x_2, x_3, ..., x_W]. In Fig. 3.2 the process is represented graphically. SIFT descriptors only contain local information around the scale at which they were computed. In contrast, histograms only contain global information. Bringing both together results in a more sophisticated image representation.
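The dictionary learning and word assignment just described can be sketched as follows. This is a minimal NumPy-only illustration: a few Lloyd iterations stand in for the k-means implementation of [21], random vectors stand in for real SIFT descriptors, and all names are ours.

```python
import numpy as np

def learn_dictionary(features, W, iters=10, seed=0):
    """Tiny Lloyd's k-means: learn W visual words (cluster centres)
    from an (N x D) array of training descriptors."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), W, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest centre
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(W):                 # move centres to cluster means
            pts = features[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    return centers

def bag_of_words(features, centers):
    """Nearest-neighbour word assignment; returns the histogram x_W."""
    dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))

rng = np.random.default_rng(1)
train = rng.random((200, 128))             # stand-in for SIFT descriptors
centers = learn_dictionary(train, W=16)
x = bag_of_words(rng.random((50, 128)), centers)
print(x.sum())  # 50: exactly one word per descriptor
```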

Figure 3.2: Graphical representation of the bag-of-words algorithm. The algorithm starts with the extraction of interest points (key-points). For each interest point a descriptor is calculated, which is then quantized into a histogram using the precomputed dictionary (see text for details).

3.3 CENTRIST Descriptor

Recently, Wu et al. [45] introduced a new global descriptor named CENTRIST (CENsus TRansform histogram), which they used for place categorization. CENTRIST is based on the Census Transform (CT) [46], which compares the intensity value of a pixel with its eight neighboring pixels [45]. If the center pixel exceeds (or is equal to) the intensity value of a neighboring pixel, a bit 1 is set at the corresponding location, otherwise a bit 0 is set. The bit stream resulting from the eight comparisons for each pixel is then converted into a base-10 number (Eq. 3.1). Hence, each center pixel is census transformed into a value in the range [0, 255]. Although it is possible to arrange the individual bits arbitrarily, we follow [45] and order the bits from top left to bottom right throughout this thesis. Once all CT values are calculated, one can transform them into a histogram with 256 bins, which results in the so-called CENTRIST descriptor:

    CT = (11010110)_2 = 214    (3.1)

As with other non-parametric local transforms for intensity values, CT is robust to illumination changes, gamma variation, etc. [45]. In addition, the underlying global image structure is retained after an image undergoes a census transformation. This is shown in Fig. 3.3, where each pixel's intensity value is replaced by its CT value. Additionally, the census transformation highlights the discontinuities of an image, which is a very useful property as discontinuities are among the most distinctive features in an image. In general, CT represents the underlying image geometry, as it captures structural properties by modeling the distribution of local structures [45].
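The transform and histogram can be sketched directly from the definition above. This is an illustrative NumPy version (function name ours) that compares each interior pixel with its eight neighbours in top-left-to-bottom-right bit order.

```python
import numpy as np

def centrist(gray):
    """256-bin CENTRIST histogram of a grayscale image. Each interior
    pixel is compared with its 8 neighbours (top-left to bottom-right);
    a bit is 1 where the centre value is >= the neighbour value."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                        # interior (centre) pixels
    ct = np.zeros_like(c)
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    for dy, dx in offsets:
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        ct = (ct << 1) | (c >= nb)           # append one comparison bit
    hist, _ = np.histogram(ct, bins=256, range=(0, 256))
    return hist

img = np.arange(25).reshape(5, 5).astype(np.uint8)
h = centrist(img)
print(h.sum())  # 9: one CT value per interior pixel of a 5x5 image
```

For this monotonically increasing toy image, every interior pixel exceeds its top and left neighbours and is exceeded by its bottom and right ones, so all nine CT values equal (11110000)_2 = 240.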
Many properties can be inferred from the census transform. One of them is that neighboring census-transformed values are highly correlated, because each neighboring pixel is involved in the census transform of the other pixel and vice versa. Hence, bit five of the pixel at (x, y) is strictly complementary to bit four of the pixel at (x + 1, y) (Fig. 3.4). Extending this constraint to the whole image implies that the number of 1s at bit 5 must be at least equal to the number of 0s at bit 4. Furthermore, there are eight other constraints belonging to a single pixel arising from its eight neighbors (strictly speaking, more constraints can be found in [45]). As a result of these constraints, the feature vector, although being

256-dimensional, is located in a much smaller subspace of the feature space, giving rise to PCA for dimensionality reduction.

Figure 3.3: Example of (a) an original and (b) a census-transformed image.

In fact, Wu et al. [45] found that 15, 23, 32, 232 (excluding 0 and 255) are the most frequent CT values. These values correspond to local shapes with horizontal or close-to-diagonal edge structure. It is counter-intuitive that vertical structures are not amongst these values. Wu et al. [45] state that vertical edges are possibly inclined in pictures owing to the perspective nature of cameras.

Figure 3.4: Example demonstrating the correlation of two neighboring pixels when the census transform is applied.

Due to the many constraints that can be derived from a census-transformed image, the single bins of the CENTRIST descriptor are not independent. A CENTRIST descriptor therefore implicitly encodes some of the underlying spatial image structure (note that this is not the same as shown in Fig. 3.3, because we do not look at local image structure; instead we investigate the global structure). The encoding of spatiality is best demonstrated in an image reconstruction experiment. The initial image is shuffled by repeatedly exchanging two randomly chosen pixels. The following reconstruction is done with the constraint that the initial and final image must have the same CENTRIST description. As shown in Fig. 3.5, the probability that the resulting reconstruction shares a similar structure with the input image is very high [45]. We have to note that the images used in this example are black and white and contain just a small number of pixels. Hence, a CENTRIST descriptor alone is insufficient for the reconstruction of larger gray-scale images. Nevertheless, this example shows that the CENTRIST descriptor captures at least some small-scale image structure. CENTRIST descriptors are well suited for use in computer vision because census transform values are very efficient to compute.
In practice, this is done with a sliding window of size 3 × 3. As comparing different pixels only involves integer calculations, it is possible to achieve a frame rate of up to 50 frames per second. Furthermore, the implementation is very easy and there are hardly any parameters that require tuning (so far, tuning is only required if PCA is used). As mentioned, the CENTRIST descriptor is invariant to illumination changes. It is

also invariant to translations and robust against scale changes. However, it is very sensitive to rotations. Although a CENTRIST descriptor implicitly encodes some spatial image structure, as discussed above, it is worth thinking about techniques to improve spatial information.

Figure 3.5: Image reconstruction from CENTRIST. The left image is always the initial image, the middle image is the shuffled image after repeatedly exchanging two randomly chosen pixels, and the right image is the reconstructed one. The first two images are completely reconstructed, the second two images are partially different, and in the last two images the reconstruction fails completely [45].

3.4 Spatial Weighting

Wu et al. [45] proposed one kind of spatial pyramid [19] to represent spatial information. As mentioned, we believe that such a method is not suitable for image sequences, because key-points close to cell boundaries are very likely to change their cell, and this results in a completely different image description. Furthermore, such a spatial pyramid enlarges the dimension of a descriptor several times over. This is due to the concatenation of different histograms arising from different scales and positions. To be more precise, let W denote the size of the dictionary (i.e. the histogram size when no spatial pyramid is used) and let L denote the number of pyramid levels. The resulting descriptor has dimension W · Σ_{l=0}^{L} 4^l = (1/3) W (4^{L+1} − 1) [19]. Thus, choosing W = 256 and L = 2 results in a 5376-dimensional descriptor. Nevertheless, it is obvious that spatial information is useful in both classifying and recognizing rooms. On the other hand, a descriptor of huge dimension can be problematic. Therefore, we propose a new method that keeps spatial information intrinsically in the descriptor without the unnecessary dimension expansion. In many images, the most characteristic part is mainly located in the middle of an image.
When dealing with image streams, e.g., when a robot moves around and records images, it is likely that the robot records one image in which all the characteristic content lies in the centre, while in the next frame the main characteristics are shifted to the side but remain at the same height. With this reasoning, we propose a weighting scheme in which an image is divided into horizontal strips instead of a pyramid. As shown in Fig. 3.6, we divide the image horizontally into three approximately equally sized patches, resulting in a 3 × 1 representation. To avoid artefacts arising from the non-overlapping regions, we introduce two additional patches (dashed lines), which results in a total of 5 blocks. We then extract a CENTRIST descriptor from each block (note that the same scheme can be applied to other descriptors). The descriptors are weighted such that the innermost block is assigned the highest weight and the weights decrease when moving outwards; blocks that are equally far from the horizontal image centre line are assigned the same weight. Note that the assignment of weights can differ in other applications. Finally, the weighted histograms are summed up, which results in a descriptor that retains some rough spatial information without any expansion of its dimension.
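The weighted summation described above can be sketched as follows. The strip boundaries, the descriptor function (a plain grey-level histogram standing in for CENTRIST) and the weights a > b > c are illustrative choices, not the exact values used in our experiments:

```python
import numpy as np

def toy_descriptor(block, bins=8):
    """Stand-in for CENTRIST: a plain grey-level histogram (illustrative)."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist.astype(float)

def spatially_weighted_histogram(image, descriptor_fn=toy_descriptor,
                                 weights=(1.0, 2.0, 4.0)):
    """Weighted sum of descriptors from 5 horizontal blocks: 3 roughly
    equal strips plus 2 blocks straddling the strip boundaries.
    weights = (c, b, a) with a for the centre block and a > b > c."""
    c, b, a = weights
    h = image.shape[0]
    t = h // 3
    blocks = [
        (image[0:t], c),                       # top strip
        (image[t // 2:t + t // 2], b),         # straddles top/centre boundary
        (image[t:2 * t], a),                   # centre strip, highest weight
        (image[t + t // 2:2 * t + t // 2], b), # straddles centre/bottom boundary
        (image[2 * t:h], c),                   # bottom strip
    ]
    out = sum(w * descriptor_fn(block) for block, w in blocks)
    return out / out.sum()  # dimension stays that of a single descriptor

img = np.random.randint(0, 256, size=(90, 120))  # toy grey-level image
desc = spatially_weighted_histogram(img)         # 8 bins, not 5 * 8
```

Note that, unlike a spatial pyramid, the result has the same dimension as a single descriptor.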

Figure 3.6: Image split into 5 patches, where the histogram of each patch is individually weighted with a, b or c, respectively.

3.5 Color Descriptor

In 1986, Biederman wrote [4]: "Surface characteristics such as color and texture will typically have only secondary roles in primal access... we may know that a chair has a particular color and texture simultaneously with its volumetric description, but it is only the volumetric description that provides efficient access to the representation of CHAIR." Biederman implicitly meant that geometrical cues are the most reliable for identifying objects [38]. This might be one of the reasons why color descriptors are not used very often in the computer vision community. As the focus of this thesis lies not in object categorization but in visual place recognition, we found color to be very useful when dealing with room recognition, and especially when dealing with changes in the overall room representation (see Chap. 5 for the change-point algorithm). For instance, the color of a bathroom can look quite different from most other rooms in a house. Furthermore, global color descriptors have some very valuable properties, such as invariance to rotation, translation and scale, and color is largely independent of the viewpoint and its resolution. On the other hand, color can be very sensitive to illumination and other light changes. Therefore, the following subsections discuss three different kinds of color-based histograms together with their invariances. At the end of the section, we summarize the invariances of all discussed color descriptors in a table.

3.5.1 RGB Color

RGB color is what most people understand when they talk about colors. Indeed, it is widely used, for instance, in the television market, where each color is a mixture of three or four base colors. Every color in an RGB image is a mixture of red (R), green (G) and blue (B).
In computer vision, an RGB image is represented as a three-dimensional matrix, where each of the three channels represents one color and each matrix entry denotes the intensity of the corresponding color. Thus, each channel of the RGB matrix represents one dimension in the three-dimensional color space. A color histogram is obtained by discretizing the individual dimensions of the color space and counting the number of times each color intensity occurs in the image array [38]. This results in a three-dimensional histogram where each bin represents a color in the discretized color space. A bin in the three-dimensional space can be understood as a sphere whose center is located at the discretized color position and whose radius is proportional to the bin count of the corresponding color. See Fig. 3.7 for an illustration of the three-dimensional histogram.

Figure 3.7: An image of a baboon with its corresponding color histogram [16].

The RGB color model is very intuitive to handle, but it has absolutely no invariance to illumination changes such as intensity changes, intensity shifts, etc. (Tab. 3.1).

3.5.2 Hue Color

Besides the previously discussed RGB color model, there exists another model named HSV color. In contrast to the cubic RGB model, HSV is a cylindrical model that describes the hue (H), saturation (S) and value (V) of a color (Fig. 3.8). Because the hue becomes unstable near the gray axis [42], van de Weijer et al. [43] applied an error propagation analysis to the hue transform and found that the certainty of the hue is inversely proportional to the saturation. Therefore, the hue histogram becomes more robust when each hue value is weighted with its corresponding saturation value [42]. Hue color histograms are invariant to light intensity changes and shifts, but they are not invariant to light color changes (Tab. 3.1).

3.5.3 Transformed Color

As already mentioned, the RGB color histogram is not invariant to any light changes. Yet, with proper normalization of the individual RGB channels, invariance against scale and shift with respect to light intensity and color changes can be achieved [42]. The color channels are normalized as follows:

    R_t = (R − µ_R) / σ_R,   G_t = (G − µ_G) / σ_G,   B_t = (B − µ_B) / σ_B,   (3.2)

with µ_X the mean and σ_X the standard deviation of the color distribution in channel X [42]. Thus, we have normalized the distribution in each channel and obtain a new color model with µ = 0 and σ = 1.
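A minimal sketch of the transformed-color normalization of Eq. 3.2, where the channel statistics are computed over the whole image; the final lines check the claimed invariance under a per-channel scaling and shift (the image itself is random toy data):

```python
import numpy as np

def transformed_color(rgb):
    """Eq. 3.2: standardise each channel to zero mean and unit standard
    deviation, computed over all pixels of the image."""
    rgb = rgb.astype(float)
    mu = rgb.mean(axis=(0, 1), keepdims=True)    # per-channel mean mu_X
    sigma = rgb.std(axis=(0, 1), keepdims=True)  # per-channel std sigma_X
    return (rgb - mu) / sigma

img = np.random.randint(0, 256, size=(48, 64, 3))
t1 = transformed_color(img)
# Scaling the channels (light colour change) and adding an offset (shift)
# cancels out in the normalisation, so the descriptor is unchanged:
t2 = transformed_color(img * np.array([2.0, 0.5, 1.5]) + 10.0)
assert np.allclose(t1, t2)
```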

Figure 3.8: The cylindrical HSV color model [2].

Table 3.1: Invariances of color descriptors (+ = invariant, − = not invariant).

              Light      Light      Light intensity   Light color   Light color
              intensity  intensity  change and        change        change and
              change     shift      shift                           shift
    RGB Hist.    −          −           −                 −             −
    Hue Hist.    +          +           +                 −             −
    Tr. Col.     +          +           +                 +             +

3.6 Image Descriptor Concatenation

So far, we have briefly presented the theory of three very different image descriptors, each of which has its advantages and disadvantages. In the hope of eliminating most of the disadvantages while retaining the advantages, we use a combination of all three descriptors. We combine the descriptors by concatenating the individual histograms, where either the ordinary RGB, the Hue, the Transformed Color histogram or none of them is used. This results in four slightly different image descriptors. While the SIFT, CENTRIST and Hue color descriptors are represented as one-dimensional histograms, the RGB and Transformed Color descriptors have a three-dimensional representation. In order to concatenate the three-dimensional histograms, we must reduce their dimension to one. The reduction is done by projecting the bins down to the individual dimensions. For the red axis, for instance, this is achieved by summing first along the green and then along the blue axis, which results in a one-dimensional histogram of bin counts. Applying the same procedure to the two other axes yields three different histograms, which we connect in series to form a single color histogram that is then combined with the SIFT and CENTRIST histograms.
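The projection of a three-dimensional color histogram onto three one-dimensional ones, as described above, can be sketched as follows (the bin count and image are toy values):

```python
import numpy as np

def rgb_hist_3d(image, bins=8):
    """Three-dimensional colour histogram over the discretised RGB cube."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist

def project_to_1d(hist3d):
    """Project the 3-D histogram down to its individual dimensions: for
    the red axis, sum first along green, then along blue; likewise for
    the other two axes, then connect the results in series."""
    r = hist3d.sum(axis=1).sum(axis=1)  # sum over green, then blue
    g = hist3d.sum(axis=0).sum(axis=1)  # sum over red, then blue
    b = hist3d.sum(axis=0).sum(axis=0)  # sum over red, then green
    return np.concatenate([r, g, b])

img = np.random.randint(0, 256, size=(32, 32, 3))
h3 = rgb_hist_3d(img)    # 8 x 8 x 8 = 512 bins
h1 = project_to_1d(h3)   # 3 * 8 = 24 bins after projection
```

Note that each pixel appears once in every one of the three projections, so the concatenated histogram sums to three times the pixel count.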

Chapter 4

Modeling Places

As mentioned in Sec. 3.2, we represent images with the widely used bag-of-words approach. It therefore seems natural to also use other models originating from document modeling. Many generative approaches to document modeling exist, some of which use latent variables while others do not. In this chapter, a few such models are discussed together with their mathematical background, and we explain the rationale for the model used in this thesis. Before going into details, it is important to clearly define the notation used in the following sections, as we will use the language of text collections throughout this thesis. Terms such as words, documents and corpus will be used often. The definitions are as follows:

A word is the basic unit of data, i.e., a single measurement. It is defined as one of the cluster centres calculated with k-means; see Sec. 3.2 for more detail. In computer vision, a word is associated with one descriptor.

A topic reflects the latent structure of a document.

A document is a sequence of N words represented as a vector w = (w_1, w_2, ..., w_N), with w_n the nth word in the sequence [8]. This vector w can be binned into a histogram x with W bins, where W denotes the size of the vocabulary (see Sec. 3.2). The count for a particular word w is denoted by x_w. A document can be associated with a single image.

A corpus is a collection of D documents, D = (w_1, w_2, ..., w_D). In the language of computer vision, where we perform place recognition/categorization, this can be associated with a single place.

4.1 The Unigram Model

The simplest statistical model for document modeling is the unigram. With this model, the words of each document are generated as independent samples from a multinomial distribution [8]:

    p(w) = ∏_{n=1}^{N} p(w_n),   (4.1)

where p(w_n) denotes the emission probability of the nth word w_n. The multinomial distribution specifies the probability that a given vector x = (x_1, x_2, ..., x_W) of

word counts is observed [22], where x_{w_n} denotes the number of times the nth word is the outcome, i.e.,

    x_{w_n} = Σ_i δ(w_i, w_n).   (4.2)

If we parametrize the multinomial with θ and denote by θ_w the probability that a specific word w is emitted, subject to the constraints Σ_{w=1}^{W} θ_w = 1 and θ_w ≥ 0, then the probability of a document having word counts x is given by

    p(x | θ) = (N choose x_1 x_2 ... x_W) ∏_{w=1}^{W} θ_w^{x_w} = N! / (x_1! x_2! ... x_W!) · ∏_{w=1}^{W} θ_w^{x_w},   (4.3)

where N = Σ_{w=1}^{W} x_w [5], [22]. Using the multinomial implies that the emission probability of a particular word depends only on the word itself and is not influenced by other factors. Fig. 4.1 shows the graphical representation of the unigram model.

Figure 4.1: Graphical representation of the unigram model. Blue denotes observed variables, red denotes multinomial parameters.

The multinomial distribution depends on the document length N and is therefore different for each N [22]. Nevertheless, this is not a problem because we only want to learn the model parameters θ. Since the maximum likelihood estimate ˆθ depends only on the fraction of times a particular word appears in the entire corpus, the length of a document has no influence (Eq. 4.4). To compute the maximum likelihood estimate ˆθ, consider a training data set D with D independent observations x_1, x_2, ..., x_D, i.e., a corpus with D documents. The maximum likelihood solution for the parameter is then given by

    ˆθ_w = Σ_{d=1}^{D} x_{d,w} / Σ_{w=1}^{W} Σ_{d=1}^{D} x_{d,w}.   (4.4)

As mentioned at the beginning of this section, the unigram model samples each document from the same multinomial parameter θ. This implies that a single word has exactly the same emission probability across all documents within a corpus. In the case of visual places, this implies that each image recorded from a particular place should exhibit approximately the same word frequencies.
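Eq. 4.4 amounts to pooling the word counts of the whole corpus and normalising over the vocabulary; a minimal sketch with a toy count matrix:

```python
import numpy as np

def unigram_mle(X):
    """Eq. 4.4: theta_hat_w is the fraction of all word occurrences in
    the corpus that belong to word w. X is the (D, W) matrix of
    per-document word counts x_{d,w}."""
    counts = X.sum(axis=0)        # total count of each word over all documents
    return counts / counts.sum()  # normalise over the vocabulary

# Toy corpus: D = 3 documents over a vocabulary of W = 4 visual words
X = np.array([[2, 0, 1, 1],
              [1, 3, 0, 0],
              [0, 1, 1, 2]])
theta_hat = unigram_mle(X)  # -> [3/12, 4/12, 2/12, 3/12]
```

As the estimate only depends on pooled fractions, documents of different lengths contribute in proportion to their total word count.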
It is intuitive that this model is not a good approximation of real places. Therefore, other techniques have to be developed.

4.2 Mixture of Unigrams

To overcome the drawback of the unigram model that each document within a corpus shares the same word parametrization, Nigam et al. [28] introduced the mixture of unigrams by augmenting the unigram model with a random topic variable z. Under this modified model, a document is generated by first choosing a topic

variable z out of T topics. Words are then drawn independently from a multinomial conditioned on that topic. The probability of a document is

    p(w) = Σ_z p(z) ∏_{n=1}^{N} p(w_n | z).   (4.5)

Parametrizing the mixing weights with τ and the word distributions with Θ, the likelihood of a document under the mixture of unigrams model becomes [7]

    p(w | τ, Θ) = Σ_{j=1}^{T} p(z_j | τ) ∏_{n=1}^{N} p(w_n | z_j, θ_{z_j}).   (4.6)

Rewriting the equation using word counts x_w instead of the document vector w, we get

    p(x | τ, Θ) = Σ_{j=1}^{T} p(z_j | τ) ∏_{w=1}^{W} p(x_w | z_j, θ_{z_j}),   (4.7)

where τ is a multinomial hyperparameter of dimension T and Θ is a T × W matrix whose jth row θ_{z_j} determines the probabilities of the words in the vocabulary given the jth topic. These parameters are learned from a corpus beforehand.

Figure 4.2: Graphical representation of the mixture of unigrams. Blue denotes observed variables, red denotes multinomial parameters, white denotes hidden variables.

The T multinomial distributions over topics represent an underlying semantic structure in the corpus [7]. Although a corpus contains documents generated from different topics, each individual document is a manifestation of only one topic. Using terminology similar to bag-of-words, the mixture of unigrams model is best described as bag-of-topics-of-bag-of-words, i.e., we have a bag of topics, where each topic implies a distribution over words (Fig. 4.5(a)). In the language of computer vision, the term topic is rather vague. However, one could interpret a topic as all words arising from, e.g., a sofa or a fridge. This would imply that an image should contain only sofas when it is conditioned on the sofa topic. It is obvious that this will hardly be the case with real data. Hence, a model allowing several topics per measurement would be a better approximation of real-life measurements.
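Eq. 4.7 can be evaluated in log space for numerical stability; in the sketch below the count-independent multinomial coefficient is dropped, since it does not affect parameter learning, and all numbers are toy values:

```python
import numpy as np

def mixture_loglik(x, tau, Theta):
    """log p(x | tau, Theta) of Eq. 4.7, omitting the multinomial
    coefficient (it is independent of the parameters).
    x: (W,) word counts; tau: (T,) mixing weights;
    Theta: (T, W) per-topic word distributions, rows summing to 1."""
    # log of each topic's term: log tau_j + sum_w x_w log theta_{j,w}
    log_terms = np.log(tau) + np.log(Theta) @ x
    # log-sum-exp over topics for numerical stability
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())

tau = np.array([0.6, 0.4])           # T = 2 topic mixing weights
Theta = np.array([[0.7, 0.2, 0.1],   # topic 1 word distribution
                  [0.1, 0.3, 0.6]])  # topic 2 word distribution
x = np.array([2, 1, 0])              # word counts of one document
ll = mixture_loglik(x, tau, Theta)   # log(0.6*0.49*0.2 + 0.4*0.01*0.3)
```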
4.3 Probabilistic Latent Semantic Indexing

Hofmann [15] augmented the mixture of unigrams with a new variable γ, resulting in the probabilistic latent semantic indexing (plsi) model. The plsi

model posits that a word w_n and a document label γ are conditionally independent given an unobserved topic z [8]:

    p(γ, w_n) = p(γ) Σ_z p(w_n | z) p(z | γ).   (4.8)

The plsi model relaxes the simplifying assumption that a document is generated by just one topic. Because the multinomial p(z | γ) serves as a mixing component for a particular document d, it allows a document to contain more than one topic (Fig. 4.5(a)). However, it is important to note that γ is a dummy variable that takes as many values as there are training documents [8]. Thus, the model only learns the topic mixtures p(z | γ) for the documents it is trained with and cannot generalize to previously unseen documents. Furthermore, the plsi model is likely to overfit the data because its number of parameters grows linearly with the size of the corpus. For more information on this issue see [8].

Figure 4.3: Graphical representation of the plsi model. Blue denotes observed variables, red denotes multinomial parameters, white denotes hidden variables.

4.4 Latent Dirichlet Allocation

To overcome the overfitting and low generalizability of the plsi model, Blei et al. [8] introduced a new model called latent Dirichlet allocation (LDA). LDA treats the topic mixture weights p(z | τ) as a T-parameter hidden random variable rather than a large set of stand-alone parameters explicitly linked to the training set, i.e., the topic mixture τ is itself drawn from a Dirichlet distribution (see Fig. 4.4). The general idea behind LDA is that documents are generated by a random mixture over latent topics, where each topic is characterized by a distribution over words [8]. LDA assumes the following generative process for each document d in a corpus D [8]:

1. Choose the number of words N ∼ Poisson(ξ).
2. Choose the topic mixing parameter τ ∼ Dir(α).
3.
For each of the N words w_n:
   (a) Choose a topic z_n ∼ Multinomial(τ).
   (b) Choose a word w_n from p(w_n | θ_{z_n}), a multinomial probability conditioned on the topic z_n.

Thus, given the parameters α and Θ, this leads to the joint distribution

    p(τ, z, w | α, Θ) = p(τ | α) ∏_{n=1}^{N} p(z_n | τ) p(w_n | θ_{z_n}),   (4.9)
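The three generative steps above can be sketched directly; the topic count, the hyperparameters and the topic-word matrix below are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(alpha, Theta, xi=50):
    """Sample one document with the LDA generative process:
    1. N ~ Poisson(xi), 2. tau ~ Dir(alpha), 3. for each word a topic
    z_n ~ Multinomial(tau) and a word w_n ~ Multinomial(theta_{z_n})."""
    T, W = Theta.shape
    N = rng.poisson(xi)                  # 1. document length
    tau = rng.dirichlet(alpha)           # 2. per-document topic mixture
    z = rng.choice(T, size=N, p=tau)     # 3a. one topic per word
    return np.array([rng.choice(W, p=Theta[zn]) for zn in z])  # 3b. words

Theta = np.array([[0.5, 0.3, 0.1, 0.1],   # topic 0 prefers words 0 and 1
                  [0.1, 0.1, 0.4, 0.4]])  # topic 1 prefers words 2 and 3
doc = generate_document(alpha=np.array([2.0, 1.0]), Theta=Theta)
```

Unlike the mixture of unigrams, each document draws its own topic mixture τ, so several topics can contribute words to a single image.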


More information

CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, slee2010@stanford.edu June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

More information

CS 585 Computer Vision Final Report Puzzle Solving Mobile App

CS 585 Computer Vision Final Report Puzzle Solving Mobile App CS 585 Computer Vision Final Report Puzzle Solving Mobile App Developed by Timothy Chong and Patrick W. Crawford December 9, 2014 Introduction and Motivation This project s aim is to create a mobile application

More information

Mean-Shift Tracking with Random Sampling

Mean-Shift Tracking with Random Sampling 1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of

More information

Fast Matching of Binary Features

Fast Matching of Binary Features Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

More information

Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Robert Collins CSE598C, PSU. Introduction to Mean-Shift Tracking

Robert Collins CSE598C, PSU. Introduction to Mean-Shift Tracking Introduction to Mean-Shift Tracking Appearance-Based Tracking current frame + previous location likelihood over object location appearance model (e.g. image template, or Mode-Seeking (e.g. mean-shift;

More information

Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

More information

Face Recognition in Low-resolution Images by Using Local Zernike Moments

Face Recognition in Low-resolution Images by Using Local Zernike Moments Proceedings of the International Conference on Machine Vision and Machine Learning Prague, Czech Republic, August14-15, 014 Paper No. 15 Face Recognition in Low-resolution Images by Using Local Zernie

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Evaluation of local spatio-temporal features for action recognition

Evaluation of local spatio-temporal features for action recognition Evaluation of local spatio-temporal features for action recognition Heng WANG 1,3, Muhammad Muneeb ULLAH 2, Alexander KLÄSER 1, Ivan LAPTEV 2, Cordelia SCHMID 1 1 LEAR, INRIA, LJK Grenoble, France 2 VISTA,

More information

S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n)

S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n) last revised: 26 January 2009 1 Markov Chains A Markov chain process is a simple type of stochastic process with many social science applications. We ll start with an abstract description before moving

More information

EE 368 Project: Face Detection in Color Images

EE 368 Project: Face Detection in Color Images EE 368 Project: Face Detection in Color Images Wenmiao Lu and Shaohua Sun Department of Electrical Engineering Stanford University May 26, 2003 Abstract We present in this report an approach to automatic

More information

Digital Image Processing. Prof. P. K. Biswas. Department of Electronics & Electrical Communication Engineering

Digital Image Processing. Prof. P. K. Biswas. Department of Electronics & Electrical Communication Engineering Digital Image Processing Prof. P. K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 28 Colour Image Processing - III Hello,

More information

C4 Computer Vision. 4 Lectures Michaelmas Term Tutorial Sheet Prof A. Zisserman. fundamental matrix, recovering ego-motion, applications.

C4 Computer Vision. 4 Lectures Michaelmas Term Tutorial Sheet Prof A. Zisserman. fundamental matrix, recovering ego-motion, applications. C4 Computer Vision 4 Lectures Michaelmas Term 2004 1 Tutorial Sheet Prof A. Zisserman Overview Lecture 1: Stereo Reconstruction I: epipolar geometry, fundamental matrix. Lecture 2: Stereo Reconstruction

More information

Classification of Fine Art Oil Paintings by Semantic Category

Classification of Fine Art Oil Paintings by Semantic Category Classification of Fine Art Oil Paintings by Semantic Category William Kromydas kromydas@stanford.edu Abstract In this paper we explore supervised learning techniques that are able to classify fine-art

More information

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding

More information

Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

More information

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS Alexander Velizhev 1 (presenter) Roman Shapovalov 2 Konrad Schindler 3 1 Hexagon Technology Center, Heerbrugg, Switzerland 2 Graphics & Media

More information

Part-Based Recognition

Part-Based Recognition Part-Based Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, Part-Based Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

Face Model Fitting on Low Resolution Images

Face Model Fitting on Low Resolution Images Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode Value

Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode Value IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode

More information

Eyeglass Localization for Low Resolution Images

Eyeglass Localization for Low Resolution Images Eyeglass Localization for Low Resolution Images Earl Arvin Calapatia 1 1 De La Salle University 1 earl_calapatia@dlsu.ph Abstract: Facial data is a necessity in facial image processing technologies. In

More information

Jiří Matas. Hough Transform

Jiří Matas. Hough Transform Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian

More information

Determining optimal window size for texture feature extraction methods

Determining optimal window size for texture feature extraction methods IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

Scalar Visualization

Scalar Visualization Scalar Visualization 4-1 Motivation Visualizing scalar data is frequently encountered in science, engineering, and medicine, but also in daily life. Recalling from earlier, scalar datasets, or scalar fields,

More information

Motivation. Lecture 31: Object Recognition: SIFT Keys. Simple Example. Simple Example. Simple Example

Motivation. Lecture 31: Object Recognition: SIFT Keys. Simple Example. Simple Example. Simple Example Lecture 31: Object Recognition: SIFT Keys Motivation Want to recognize a known objects from unknown viewpoints. find them in an image database of models Local Feature based Approaches Represent appearance

More information

THE development of methods for automatic detection

THE development of methods for automatic detection Learning to Detect Objects in Images via a Sparse, Part-Based Representation Shivani Agarwal, Aatif Awan and Dan Roth, Member, IEEE Computer Society 1 Abstract We study the problem of detecting objects

More information

Learning 3D Object Recognition Models from 2D Images

Learning 3D Object Recognition Models from 2D Images From: AAAI Technical Report FS-93-04. Compilation copyright 1993, AAAI (www.aaai.org). All rights reserved. Learning 3D Object Recognition Models from 2D Images Arthur R. Pope David G. Lowe Department

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

CVChess: Computer Vision Chess Analytics

CVChess: Computer Vision Chess Analytics CVChess: Computer Vision Chess Analytics Jay Hack and Prithvi Ramakrishnan Abstract We present a computer vision application and a set of associated algorithms capable of recording chess game moves fully

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Android Ros Application

Android Ros Application Android Ros Application Advanced Practical course : Sensor-enabled Intelligent Environments 2011/2012 Presentation by: Rim Zahir Supervisor: Dejan Pangercic SIFT Matching Objects Android Camera Topic :

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Multiple Object Tracking Using SIFT Features and Location Matching

Multiple Object Tracking Using SIFT Features and Location Matching Multiple Object Tracking Using SIFT Features and Location Matching Seok-Wun Ha 1, Yong-Ho Moon 2 1,2 Dept. of Informatics, Engineering Research Institute, Gyeongsang National University, 900 Gazwa-Dong,

More information

CHAPTER 3 Numbers and Numeral Systems

CHAPTER 3 Numbers and Numeral Systems CHAPTER 3 Numbers and Numeral Systems Numbers play an important role in almost all areas of mathematics, not least in calculus. Virtually all calculus books contain a thorough description of the natural,

More information

Efficient visual search of local features. Cordelia Schmid

Efficient visual search of local features. Cordelia Schmid Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images

More information

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

More information

Robot Perception Continued

Robot Perception Continued Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart

More information

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections

Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Maximilian Hung, Bohyun B. Kim, Xiling Zhang August 17, 2013 Abstract While current systems already provide

More information

A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof

A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology, Austria {mauthner,bischof}@icg.tu-graz.ac.at

More information

A Comparative Study between SIFT- Particle and SURF-Particle Video Tracking Algorithms

A Comparative Study between SIFT- Particle and SURF-Particle Video Tracking Algorithms A Comparative Study between SIFT- Particle and SURF-Particle Video Tracking Algorithms H. Kandil and A. Atwan Information Technology Department, Faculty of Computer and Information Sciences, Mansoura University,El-Gomhoria

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information