Online Place Recognition for Mobile Robots


Autonomous Systems Lab
Prof. Roland Siegwart

Master Thesis

Online Place Recognition for Mobile Robots

Spring Term 2011

Supervised by: Jérôme Maye, Ralf Kästner
Author: Ciril Baselgia
Contents

Abstract
Acknowledgment
1 Introduction
    Outline
2 Related Work
3 Modeling Images
    SIFT Descriptor
    Bag of Visual Words
    CENTRIST Descriptor
    Spatial Weighting
    Color Descriptor
        RGB Color
        Hue Color
        Transformed Color
    Image Descriptor Concatenation
4 Modeling Places
    The Unigram Model
    Mixture of Unigrams
    Probabilistic Latent Semantic Indexing
    Latent Dirichlet Allocation
    Dirichlet Compound Multinomial
5 Online Place Recognition using Bayesian Change-Point Detection
    Model Based Change-Point Detection
    Transition Probability
    Data Likelihood
    Particle Filter
    Change-Point Based Place Labelling
6 Experiments
    Experimental Setup
        VPC Database
        Cosy Localization Database
        ETH Data Set
    Change-Point Detection and Place Labeling
    Supervised Place Recognition
    Supervised Place Categorization
7 Discussion and Further Work
A VPC: Change Point Detection and Labeling
B COLD: Change Point Detection and Labeling
C Ground Truth COLD
D Ground Truth ETH Data Set
Bibliography
Abstract

Although visual place recognition and categorization is one of the most fundamental and natural tasks for humans, it generally remains an unsolved problem in robotics. In this work, we aim to tackle this problem by applying an unsupervised and online approach to place recognition and categorization on image streams obtained from a monocular video camera. Our approach begins by segmenting the incoming image stream into coherent segments that correspond to distinctive places. This is accomplished using a model-based Bayesian change-point detection framework. Change-point detection flags abrupt changes in the generative parameters of a statistical model, which is computed online in a recursive and unsupervised manner. After segmenting the image stream, we assign the segments to the relevant distinctive places using an online unsupervised framework in which new places are detected by means of hypothesis testing with Bayes factors. To model the places in our system, we use a Dirichlet Compound Multinomial (DCM) model, which is known for modeling word burstiness, the concept that a word is more likely to occur again once it has already appeared. It has been demonstrated that this phenomenon is also likely to occur for an image stream corresponding to a single place. In order to reduce the complexity and the running time of the maximum-likelihood update for the hyperparameters of the DCM model, we develop a new update scheme that is multiple times faster than the originally used gradient descent optimization. To assess the accuracy of our system, we present multiple experiments performed on two existing image data sets and a third data set which we recorded at the Autonomous Systems Lab of the Swiss Federal Institute of Technology. All experiments include a comparison of the concatenation of different image features such as SIFT, CENTRIST and color histograms.
Acknowledgment

I would like to thank everyone who encouraged, supported and motivated me throughout my studies, especially during the master thesis. In particular, I am grateful to Jérôme Maye and Ralf Kästner, my supervisors, for their excellent support during this project, as well as for their fruitful feedback and discussions; to Prof. Roland Y. Siegwart for giving me the opportunity to carry out this project at the Autonomous Systems Lab; to Yuanshan Lee for proofreading this thesis; to my family, who supported me wholeheartedly throughout my studies; and to my friends, for making my life at ETH so enjoyable and colorful.
Chapter 1 Introduction

"Where am I?" and "Have I been here before?" are two very different questions, although both relate to being at a certain place. The answer to the former question is often an exact place, such as Zurich Paradeplatz, or a category of places, such as kitchen. For the latter, there are usually only two possible answers: yes or no. Although these seem easy questions for us as humans, they are difficult questions for robots. In many ways, this is still an unsolved problem in robot localization. Nevertheless, it is sometimes essential for a robot or an intelligent agent to recognize or categorize places in a manner similar to how humans do it. This would, for instance, facilitate human-robot interaction or allow the system to overcome the kidnapped-robot problem. Robot localization is also used in many other applications such as Simultaneous Localization And Mapping (SLAM), where good localization enables proper loop closure. Whereas it is highly important in a SLAM approach that the localization gives the exact position and orientation of the robot, this is of less significance in a framework for topological mapping. A topological map is a graph-based structure of the environment. It consists of nodes and edges, where nodes indicate landmarks or significant places, while edges denote their connectivity. This is in contrast to a metric map, which shows space to scale (Fig. 1.1) [37]. In a topological framework, the robot only has to decide whether the current measurement comes from an already seen place and, if yes, from which one. It does not need to know its exact position within a metric space.

Figure 1.1: (a) Topological map, (b) metric map. The dotted red line is the robot path taken while gathering the measurements [37].

Approaches to place recognition or place labeling can be divided into
top-down and bottom-up approaches. In top-down approaches, one infers from the overall room appearance the kind of place label to expect, whereas in a bottom-up approach, one infers from the objects found in the measurement the kind of place category from which the measurement was taken. The solutions to the two questions from the beginning of this chapter can be classified into place categorization and place recognition. Place categorization, also known as scene recognition, usually refers to the task of recognizing the semantic label of a scene when asked for a category of places. As already mentioned, semantic labels can have a very wide range, from corridor right up to coffee bar on the first floor of ETH Zurich's main building. Recognizing the scene category is part of the task of understanding the scene. The label of a scene strongly indicates the types of objects which can be found there or the types of tasks a person in such a specific place could be doing. For instance, it is much more likely to find a person brewing coffee with a coffee machine in a kitchen than to find a person sleeping on the floor. There are many other arguments why semantic labeling can be useful; a collection of them can be found in [44]. Most existing place categorization algorithms, which assume a finite set of place categories, require a lot of learning. The labels are commonly learned offline in a supervised manner with a corresponding set of training data. The training data contains manually labeled measurements, which the system uses to learn how measurements are grouped. During runtime, a classifier separates and categorizes input measurements into their corresponding labels using the previously learned groups. While such supervised systems relying on classifiers have the advantage of simplicity, they also have several drawbacks [36], [13]:

1.
The classifier needs a huge amount of labeled training data due to the large variation in the measurements (e.g. offices differ vastly from one another in many aspects), which implies long hours of tedious and expensive manual annotation of images.

2. Expert-defined labels are somewhat arbitrary and therefore possibly suboptimal [13].

3. For the classifier to learn the different labels in the best possible manner, each training data set must contain the main characteristics of the underlying scene. When testing the system on new data, the data must also have the same main characteristics. This means that a human has to supervise the recording of the data, making the use of continuous measurements almost impossible.

4. The training data assumes a fixed number of different labels, and the system will classify new measurements according to previously learned labels, which makes the recognition of possibly new categories impossible.

5. The system classifies each measurement individually and does not make decisions based on recently seen measurements.

One very important issue when using supervised place categorization is that it needs to be applicable across a wide range of spatial environments. Otherwise, accurate semantic labeling will not be possible for places which the robot has not visited before [44]. Recognizing a place is the ability to consistently label a place as the same when it is revisited [36]. No semantic labeling of the place is
performed with this approach. This implies that the whole scene understanding part can be omitted. Furthermore, there are two main tasks when talking about place recognition: global localization, where the robot's exact pose¹ is determined, and topological place recognition, where just a rough location (e.g. corridor) is determined [45]. In the following sections, we use the term place recognition to mean topological place recognition. While a topological mapping does not have to coincide with the human understanding of rooms or places [10], we will use the term place to indicate a room as humans define it. As in the place categorization task, most existing place recognition approaches require training. This is done by taking some measurements from a specific environment with manually labeled places and testing the system on previously unseen measurements from the same environment. Supervised place recognition is an easier task than place categorization because the learning and testing measurements look quite similar, as they come from the same places. The drawbacks of the existing place recognition approaches are similar to those of place categorization. One still needs labeled training data, although not as much as for place categorization. The system has the same limitation that it simply classifies the current measurement according to the learned labels, making the creation of new labels impossible. No matter what kind of measurement is taken as the system's input, a good place recognition algorithm must be robust to dynamic changes in the environment. These could, for example, range from overall low-dynamic changes, such as lighting and/or viewpoint changes (Fig. 1.2) when images are used, to highly dynamic changes, such as a person walking by.

Figure 1.2: Two different views of the same scene.

In this thesis, we devise an unsupervised place recognition approach.
In contrast to most existing approaches, the input of our system consists of image streams or videos instead of standalone images. Thus we are able to intrinsically capture temporal information in our algorithm. When using image streams, one has to overcome the problem that not all images will capture the characteristics of a room or scene. In fact, it is often the case that just a single wall or some other close-up view is present, such that it is impossible even for a human being to characterize the current room. We extract and combine several descriptors from the images to form a single distinctive image histogram, in the hope that at least one descriptor will overcome the shortcomings described. Our method is based on change-point detection, which detects abrupt variations in the generative parametrization of a statistical model [30]. The change-points will indicate place changes, as we assume that each place has its own parametrization and that the parameters within a place do not change drastically. Thus, when a change-point is observed, the robot exits one place and enters another.

¹ pose = position + orientation

We use a Bayesian algorithm to infer the change-points by
computing the probability that a change-point occurs for every new input image. The exact algorithm would keep track of all possibilities that a change-point could occur and make no irrevocable decision. As the computational cost of the exact algorithm would increase linearly with every time step, we use a Rao-Blackwellized particle filter to keep the cost almost constant. While change-points deliver boundaries between places, the place label is assigned based on the probability that the measurement comes from an already seen place. Thus, the place label assignment depends on the distribution of all change-points and the past model assignments. The algorithm then calculates a probability distribution across all place labels seen so far and uses a Bayes factor to test whether the robot is currently in a previously unseen place. It is thus possible for the algorithm to start from scratch and to systematically learn previously unseen places online by assigning them their corresponding parametrization.

1.1 Outline

The remainder of the thesis is structured as follows. In Chapter 2, we summarize the previous work related to the topic of this thesis. Chapter 3 describes our method of image representation, and we discuss several descriptors with their advantages and disadvantages. In Chapter 4, we give a theoretical overview of five document modeling techniques which will be used later as place models in this thesis. Based on the evaluation of these techniques and the theory, we will select one model to be used in our system. Chapter 5 describes the place recognition algorithm, which is based on visual change-point detection. In Chapter 6, experimental results are provided, where we compare our system with the PLISS [36] and VPC [44] systems. Finally, we conclude in Chapter 7 by summarizing the results and providing insights for future work.
Chapter 2 Related Work

In robotics, many approaches exist to perform visual place recognition. Although place classification methods based on laser and sonar range scans have also been used for place recognition [18], [26], they are not within the scope of this thesis. In this thesis, we focus on image feature-based methods instead. Typically, visual place recognition methods use measures of distinctiveness of image features to determine the location. Examples of such measures include comparing color histograms [41], matching SIFT features [20] obtained from different images [47], retrieving images by means of keypoint matching [11], and using classifiers such as SVMs trained on manually labeled data [32]. In a recent work by Pronobis et al. [33], sensory data is merged and the system's output for place recognition is obtained from individually trained SVM classifiers. However, as with other existing classifier-based approaches, the method by Pronobis et al. has the disadvantage that it cannot generalize to learn previously unseen places. Methods for visual place recognition and classification can be divided into two main types: those that use global features [41], [39], [13], [29], and those that use local features [35] or model distinctive parts of the image [34]. These can be further divided into methods that use omnidirectional cameras [41], [27], [24], [3] and those that use perspective cameras [19], [34]. In this thesis, we adopt the latter approach of using perspective cameras and follow the context-based vision system of Torralba et al. [39] in using global image texture features (Gist), which are related to functional constraints. According to Torralba et al., this method is more robust than using local features, which could occur randomly and would therefore be highly variable. Their method is based on a hidden Markov model (HMM) which recursively computes the place label based on past measurements.
This method, however, has the disadvantage that prior training is required to learn the transition function for the HMM and the observation likelihood for each place in order to obtain the probability of the current measurement coming from a specific place. Ullah et al. [40] combine Harris-Laplace detectors [14] with SIFT descriptors for place recognition, as they provide an excellent trade-off between descriptive power (due to the SIFT descriptor) and generalizability, because they capture significant fragments which are very likely to appear again in different settings. In the work by Wu et al. [44], the CENTRIST [45] image descriptor is used as the system input, whereby place recognition and classification is done based on Bayesian filtering. These approaches, which are all classifier-based methods, have the implicit disadvantage that they have to pre-learn labels and are not able to learn new place categories during runtime. More recently, a new approach called PLISS (Place Labeling through Image Sequence Segmentation), which is based on online change-point detection, was introduced by Ranganathan [36].

Figure 2.1: Maximum-likelihood place labeling using PLISS. Thumbnails of the images are shown on top, followed by the ground truth, the maximum-likelihood place labeling output and the change-point detection of the algorithm [36].

Compared to other approaches, PLISS is an algorithm which is able to learn new place labels in an online manner. This is particularly interesting and useful for mobile robots, as they are constantly confronted with new places when exploring the environment. Change-point detection in PLISS is done using the approach proposed by Adams and MacKay [30], whereby a particle filter approach by Fearnhead et al. [12] is used to control the computational cost. Place labeling is conditioned on the change-point detection (see Fig. 2.1). In addition, the algorithm performs at each time step a hypothesis test which uses a likelihood ratio to determine the place model to which the current measurement belongs. If all hypotheses are rejected, the algorithm will then introduce a new place label. To model places, a Dirichlet Compound Multinomial (DCM) framework, which is supposed to model word burstiness [22], is used. PLISS uses a maximum-likelihood parameter update for the DCM model each time a new measurement is available, and is hence able to learn online the statistical model parameters for a specific place by using all measurements which belong to this place. As input, PLISS uses images that are modeled using the bag-of-words approach, where a word is assigned to each SIFT descriptor and SIFT features are calculated on a dense grid over the image. This group of words is then further processed with the spatial pyramid algorithm by Lazebnik et al. [19] to introduce some spatial information. Spatial pyramids are obtained by dividing the image into a grid as shown in Fig. 2.2. Finally, each cell is represented as a histogram of words, which is then weighted and concatenated to form an image descriptor.
There are two main drawbacks of the DCM approach. Firstly, because no analytical closed-form maximum-likelihood update exists for the DCM parameters, iterative algorithms have to be used, which can be quite time-consuming, especially when a lot of data has to be considered. Secondly, the framework will encounter storage problems over time, as the algorithm needs to store every measurement from each visited place in order to compute the maximum-likelihood parameters. In this thesis, we mainly adopt the PLISS approach, and we will show that better results can be achieved when a combination of different global and local descriptors is used. According to [19], some confusion occurs in the classification of indoor images (such as kitchen, bedroom, living room) when using the spatial
pyramid approach. We believe that this confusion results from the fact that indoor scenes are not as spacious as outdoor scenes. This leads to large image variations even when the camera is only slightly moved. This narrow spatiality results in drastic changes of the pyramid cell contents in a short time; in particular, words which are close to a cell border can easily move to a different cell, which can lead to a very different overall image descriptor when the single histograms are concatenated. Thus, we found that for image sequences in which no emphasis is placed on image content (in contrast to [19], where only scene-characteristic images were used), the spatial pyramid approach is not suitable for place recognition. Due to these drawbacks, we will introduce other methods in this thesis to capture some of the image's spatial information. To model places, we also use the DCM model, but instead of using a maximum-likelihood update for its parameters, we will demonstrate a Bayesian approach that is many times faster and hence able to provide real-time responses. As we do not use a maximum-likelihood update, we can perform our hypothesis testing using a Bayes factor [17] instead of likelihood ratio hypothesis testing. Finally, we will provide a thorough evaluation of our algorithm, testing it on different databases with clearly defined parameters.

Figure 2.2: Spatial pyramid histogram according to Lazebnik et al. [19].
Chapter 3 Modeling Images

In this chapter, we describe three different image descriptors. The first one is the well-known SIFT descriptor, which can be represented by a histogram when using the bag-of-words approach. As most people are familiar with SIFT, we will only provide a brief introduction to it and focus on describing the bag-of-words method instead. The second section describes the recently introduced CENTRIST descriptor, and finally, we discuss three different color descriptors.

3.1 SIFT Descriptor

SIFT was introduced by David G. Lowe in his well-known paper "Object Recognition from Local Scale-Invariant Features" [20]. Since its introduction, many researchers worldwide have used the SIFT descriptor for a wide range of tasks. The main reason for its success is its invariance to image translation, scaling and rotation. SIFT is robust to local variations arising from nearby clutter, resulting in a very distinctive local descriptor. Furthermore, it is partially invariant to illumination changes as well as to affine or projective transformations [20]. SIFT's scale invariance results from a staged filtering approach in which so-called SIFT keypoints or interest points are found. Keypoints are extrema of a difference-of-Gaussian function sampled at different scale-space coordinates. To each keypoint an orientation is assigned by computing a gradient orientation histogram in the keypoint's neighborhood (see the bar in Fig. 3.1). Projective, affine and rotational invariance is then achieved because all properties of a keypoint are measured relative to the keypoint orientation. Once the orientation is set for a keypoint, the SIFT descriptor is computed as a set of orientation histograms over the 4 × 4 pixel subregions of a neighborhood oriented relative to the keypoint orientation. Each histogram contains 8 bins, and each descriptor contains an array of 4 × 4 histograms around the keypoint. Hence, the SIFT feature vector has 4 × 4 × 8 = 128 elements.
Finally, the vector is normalized to reduce the effects of illumination changes.

3.2 Bag of Visual Words

The bag-of-words model originated from document modeling. It makes the simplifying assumption that a document can be represented as an unordered accumulation of words. The same method is also applicable to images by stating that an image is a document whose content is built out of visual words. A visual word could be anything. In the simplest case, it is the intensity value of a pixel. Hence, the representation of an image by a bag of visual words, where a word is associated with the pixel's intensity value, is simply a histogram with at most 256 bins (the range for intensity values
is [0, 255]). By representing intensity values with words, they now have their own intrinsic, so-called dictionary. A dictionary is a collection of words which describe the image. In the case of intensity values, a word is a single intensity value, and hence the dictionary, which is the bag, has a size of 256 words.

Figure 3.1: SIFT keypoint detection. The size of the ellipse indicates the scale at which the keypoint was found, and the bar shows the keypoint's orientation.

Contrary to intensity values, SIFT descriptors do not come with a given dictionary. Instead, a dictionary has to be learnt in advance. To learn a dictionary of SIFT words, SIFT descriptors have to be extracted from a set of training images. With N the number of SIFT descriptors extracted from all training images, we get N feature points within a 128-dimensional feature space. Learning is then accomplished using the well-known k-means clustering algorithm [21]. When k-means is applied to the feature space, it provides W cluster centers. Thus, this procedure generates a dictionary of size W, where each word is associated with one of the W cluster centers in the feature space. Note that the dictionary size W can be set to any number required; a typical dictionary size, however, ranges from 200 to 400 [19]. With the dictionary of SIFT words, an image can therefore be represented by a bag of words in which a word is assigned to every descriptor. Word assignment is accomplished by nearest-neighbor classification, i.e. each computed SIFT descriptor is assigned to the word whose cluster center is closest to it in feature space. To be more concrete, let k denote the closest cluster center for a given SIFT feature vector in the feature space. The word w_k is then represented by a vector containing only zeros except at the k-th position, where a 1 is set, i.e., w_k = (0, ..., 1, ..., 0).
Furthermore, let x_W be the bag-of-words, where W denotes the size of the dictionary, w_i denotes the i-th word and x_i denotes the number of times an individual word i (i.e., w_i) is observed in an image. Summing up all resulting word vectors w_1:N yields a one-dimensional histogram called the bag-of-words, which represents the word frequencies of an input image and is given by x_W = [x_1, x_2, x_3, ..., x_W]. In Fig. 3.2, the process is represented graphically. SIFT descriptors only contain local information around the scale at which they were computed. In contrast, histograms only contain global information. Bringing both together results in a more sophisticated image representation.
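As an illustrative sketch (not the thesis implementation), the quantization step just described could look as follows; the function name and the tiny example dictionary are hypothetical, and the cluster centers are assumed to have been learned beforehand with k-means:

```python
import numpy as np

def bag_of_words(descriptors, centers):
    """Quantize local descriptors into a bag-of-words histogram x_W.

    descriptors: (N, D) array of SIFT descriptors from one image (D = 128).
    centers:     (W, D) array of k-means cluster centers (the dictionary).
    Returns a length-W histogram in which x_i counts how often word w_i occurs.
    """
    # Nearest-neighbor assignment: for each descriptor, find the closest
    # cluster center in the D-dimensional feature space.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)  # index k of the closest center
    # Summing the one-hot word vectors w_k = (0, ..., 1, ..., 0) amounts
    # to counting word occurrences per dictionary entry.
    return np.bincount(words, minlength=len(centers))
```

In a real system, `centers` would come from running k-means on descriptors extracted from the training images, as described above.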
Figure 3.2: Graphical representation of the bag-of-words algorithm. The algorithm starts with the extraction of interest points (keypoints). For each interest point a descriptor is calculated, which is then quantized into a histogram using the precomputed dictionary (see text for details).

3.3 CENTRIST Descriptor

Recently, Wu et al. [45] introduced a new global descriptor named CENTRIST (CENsus TRansform histogram), which they used for place categorization. CENTRIST is based on the Census Transform (CT) [46], which compares the intensity value of a pixel with those of its eight neighboring pixels [45]. If the center pixel's intensity exceeds (or is equal to) that of the neighboring pixel, a bit 1 is set at the corresponding location, otherwise a bit 0 is set. The bit stream resulting from the eight comparisons for each individual pixel is then converted into a base-10 number, e.g.

CT = (11010110)_2 = 214.    (3.1)

Hence, each center pixel is census-transformed into a value in the range [0, 255]. Although it is possible to arrange the individual bits arbitrarily, we follow [45] and order the bits from top left to bottom right throughout this thesis. Once all CT values are calculated, one can easily collect them into a histogram with 256 bins, which results in the so-called CENTRIST descriptor. As with other nonparametric local transforms of intensity values, the CT is robust to illumination changes, gamma variation, etc. [45]. In addition, the underlying global image structure is retained after an image undergoes the census transformation. This is shown in Fig. 3.3, where each pixel's intensity value is replaced by its CT value. Additionally, the census transformation highlights the discontinuities of an image, which is a very useful property, as the discontinuities are among the most distinctive features in an image. In general, the CT represents the underlying image geometry, as it captures structural properties by modeling the distribution of local structures [45].
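A minimal sketch of the census transform and the resulting CENTRIST histogram (our own illustration, not the thesis code; it assumes a grayscale image given as a 2-D array):

```python
import numpy as np

def census_transform(img):
    """Census-transform the interior pixels of a grayscale image.

    Each pixel is compared with its eight neighbors; a bit 1 is set where
    the center is >= the neighbor. The eight bits, ordered from top left
    to bottom right as in Eq. 3.1, give a value in [0, 255].
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    ct = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbor offsets in top-left-to-bottom-right order (center skipped);
    # the first neighbor ends up in the most significant bit.
    for dy, dx in [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]:
        neighbor = img[dy:dy + h - 2, dx:dx + w - 2]
        ct = (ct << 1) | (center >= neighbor)
    return ct

def centrist(img):
    """CENTRIST descriptor: 256-bin histogram of census transform values."""
    return np.bincount(census_transform(img).ravel(), minlength=256)
```

For example, a 3 × 3 patch [[32, 64, 96], [32, 64, 96], [32, 32, 96]] (a hypothetical neighborhood, center 64) yields the bit string 11010110 and hence the CT value 214 of Eq. 3.1.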
Many properties can be inferred from the census transform. One of them is that neighboring census-transformed values are highly correlated, because each of two neighboring pixels is involved in the census transform of the other. Hence, bit five of the pixel at (x, y) is strictly complementary to bit four of the pixel at (x + 1, y) (Fig. 3.4). Extending this constraint to the whole image implies that the number of 1s at bit 5 must be at least equal to the number of 0s at bit 4. Furthermore, there are eight other constraints belonging to a single pixel, arising from its eight neighbors (strictly speaking, more constraints exist; see [45]). As a result of these constraints, the feature vector, although being
256-dimensional, is located in a much smaller subspace of the feature space, which makes PCA attractive for dimension reduction. In fact, Wu et al. [45] found that 15, 23, 32, 232 (excluding 0 and 255) are among the most frequent CT values. These values correspond to local shapes with horizontal or close-to-diagonal edge structure. It is counterintuitive that vertical structures are not amongst these values; Wu et al. [45] state that vertical edges are possibly inclined in pictures owing to the perspective nature of cameras.

Figure 3.3: Example of (a) an original and (b) a census-transformed image.

Figure 3.4: Example demonstrating the correlation of two neighboring pixels when the census transform is applied.

Due to the many constraints that can be derived from a census-transformed image, the individual bins of the CENTRIST are not independent. A CENTRIST descriptor therefore implicitly encodes some of the underlying spatial image structure (note that this is not the same as shown in Fig. 3.3, because we do not look at local image structure; instead we investigate the global structure). The encoding of spatiality is best demonstrated in an image reconstruction experiment. The initial image is shuffled by repeatedly exchanging two randomly chosen pixels. The subsequent reconstruction is done under the constraint that the initial and final image must have the same CENTRIST description. As shown in Fig. 3.5, the probability that the resulting reconstruction shares a similar structure with the input image is very high [45]. We have to note that the images used in this example are black and white and contain just a small number of pixels. Hence, a CENTRIST alone is insufficient for the reconstruction of larger grayscale images. Nevertheless, this example shows that the CENTRIST captures at least some small-scale image structure. CENTRIST descriptors are well suited for use in computer vision because census transform values are very efficient to compute.
In practice, this is done with a sliding window of size 3 × 3. As comparing different pixels only involves integer calculations, it is possible to achieve a frame rate of up to 50 frames per second. Furthermore, the implementation is very easy, and there are hardly any parameters that require tuning (so far, tuning is only required if PCA is used). As mentioned, the CENTRIST descriptor is invariant to illumination changes. It is
also invariant to translations and robust against scale changes. However, it is very sensitive to rotations. Although a CENTRIST descriptor implicitly encodes some spatial image structure, as discussed above, it is worth thinking about techniques to improve the spatial information.

Figure 3.5: Image reconstruction from CENTRIST, shown for six examples (a)-(f). The left image is always the initial image, the middle image is the shuffled image after repeatedly exchanging two randomly chosen pixels, and the right image is the reconstructed one. The first two examples are reconstructed completely, the next two only partially, and in the last two the reconstruction fails completely [45].

3.4 Spatial Weighting

Wu et al. [45] proposed a kind of spatial pyramid [19] to represent spatial information. As mentioned, we believe that such a method is not suitable for image sequences, because keypoints close to cell boundaries are very likely to change their cell, which results in a completely different image description. Furthermore, such a spatial pyramid enlarges the dimension of a descriptor several times over, due to the concatenation of the different histograms arising from different scales and positions. To be more precise, let W denote the size of the dictionary (i.e. the histogram size when no spatial pyramid is used) and let L denote the number of pyramid levels. The resulting descriptor has dimension W · Σ_{l=0}^{L} 4^l = (1/3) W (4^{L+1} − 1) [19]. Thus, choosing W = 256 and L = 2 results in a 5376-dimensional descriptor. Nevertheless, it is obvious that spatial information is useful for both classifying and recognizing rooms; on the other hand, a descriptor of huge dimension can be problematic. Therefore, we propose a new method that keeps spatial information intrinsically in the descriptor without the unnecessary dimension expansion. In many images, the most characteristic part is located mainly in the middle of the image.
When dealing with image streams, e.g., when a robot moves around and records images, it is likely that it records an image where all the characteristics lie completely in the centre, and in the next frame the main characteristics are shifted to the side but at the same height. With this reasoning we propose a weighting scheme where an image is divided into horizontal strips instead of a pyramid. As shown in Fig. 3.6, we divide the image horizontally into three approximately equally sized patches, resulting in a 3×1 representation. To avoid artefacts arising from the non-overlapping regions, we introduce two new patches (dashed lines), which results in a total of 5 blocks. We then extract from each block a CENTRIST descriptor (note that the same scheme can be applied to other descriptors). These are weighted in such a way that the descriptor from the innermost block is assigned the highest weight and the weights decrease as we move outwards. Blocks which are equally far away from the horizontal image center line are assigned the same weight. Note that the assignment of weights can differ in other applications. Finally, the weighted histograms are summed up, which results in a descriptor whose dimension is not expanded but which retains some rough spatial information.
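The weighting scheme above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the thesis implementation: `describe` stands in for any per-block descriptor (here a plain grayscale histogram instead of CENTRIST), and the concrete weight values a > b > c are assumptions.

```python
import numpy as np

def describe(block, bins=256):
    """Stand-in for a per-block descriptor (e.g. CENTRIST):
    here simply a grayscale intensity histogram."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist.astype(float)

def spatially_weighted_descriptor(img, weights=(1.0, 0.75, 0.5)):
    """Split the image into 3 horizontal strips plus 2 overlapping
    strips (5 blocks total) and sum the weighted block histograms.
    The result keeps the dimension of a single block descriptor."""
    h = img.shape[0]
    t = h // 3                 # strip height
    a, b, c = weights          # innermost > overlapping > outermost
    blocks_and_weights = [
        (img[0:t], c),                         # top strip
        (img[t:2 * t], a),                     # central strip (highest weight)
        (img[2 * t:], c),                      # bottom strip
        (img[t // 2:t // 2 + t], b),           # overlapping upper block
        (img[3 * t // 2:3 * t // 2 + t], b),   # overlapping lower block
    ]
    d = sum(w * describe(blk) for blk, w in blocks_and_weights)
    return d / d.sum()         # normalize to unit mass

img = (np.arange(60 * 80).reshape(60, 80) % 256).astype(np.uint8)
desc = spatially_weighted_descriptor(img)
print(desc.shape)  # (256,) - no dimension expansion
```

Note how the descriptor dimension stays at 256 regardless of the number of blocks, in contrast to the spatial pyramid's multiplicative growth.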
Figure 3.6: Image split into 5 patches, where the histogram of each patch is individually weighted with a, b or c, respectively.

3.5 Color Descriptor

In 1986, Biederman wrote [4]: Surface characteristics such as color and texture will typically have only secondary roles in primal access... We may know that a chair has a particular color and texture simultaneously with its volumetric description, but it is only the volumetric description that provides efficient access to the representation of CHAIR. Biederman implicitly meant that geometrical cues are the most reliable for identifying objects [38]. This might be one of the reasons why color descriptors are not very often used in the computer vision community. As our focus in this thesis does not lie in object categorization but rather in visual place recognition, we found color to be very useful when dealing with room recognition and especially when dealing with changes in the overall room representation (see Chap. 5 for the changepoint algorithm). For instance, the color of a bathroom can look quite different from that of most other rooms in a house. Furthermore, global color descriptors have some very valuable properties such as invariance to rotation, translation and scale, and color is widely independent of the view and its resolution. On the other hand, color can be very sensitive to illumination and other light changes. Therefore, we provide in the following subsections a discussion of three different kinds of color-based histograms with their invariances. At the end of the section, we summarize all invariances of the discussed color descriptors in a table.

3.5.1 RGB Color

RGB color is what most people understand when they talk about colors. Indeed, it is widely used, for instance, in the television market where each color is a mixture of three or four base colors. Every color in an RGB image is a mixture of red (R), green (G) and blue (B).
In computer vision, an RGB image is represented as a three-dimensional matrix where each of the three channels represents one color and each matrix entry denotes the intensity of the corresponding color. Thus, each channel of the RGB matrix represents one dimension in the three-dimensional color space. A color histogram is obtained by discretizing the individual dimensions of the color space and counting the number of times each color intensity occurs in the image array [38]. This results in a three-dimensional histogram where each bin is a representation of a color in the discretized color space. A bin in the three-dimensional space can be understood as a sphere whose center is located at the
discretized color position. The radius of the sphere is proportional to the bin count of the corresponding color. See Fig. 3.7 for an illustration of the three-dimensional histogram.

Figure 3.7: An image of a baboon with its corresponding color histogram [16].

The RGB color model is very intuitive to handle, but it has absolutely no invariance to illumination changes such as intensity change, intensity shift, etc. (Tab. 3.1).

3.5.2 Hue Color

Besides the previously discussed RGB color model, there exists another model named HSV color. In contrast to the cubic RGB model, the HSV model is a cylindrical model that describes the hue (H), saturation (S) and value (V) of a color (Fig. 3.8). Because the hue becomes unstable near the gray axis [42], Weijer et al. [43] applied an error propagation analysis to the hue transform and found that the certainty of the hue is inversely proportional to the saturation. Therefore, the hue histogram becomes more robust when each hue value is weighted with its corresponding saturation value [42]. Hue color histograms are invariant to light intensity changes and shifts, but they are not invariant to light color changes (Tab. 3.1).

3.5.3 Transformed Color

As already mentioned, the RGB color histogram is not invariant to any light changes. Yet, with proper normalization of the single RGB channels, invariance against scale and shift with respect to light intensity and color changes can be achieved [42]. The color channels are normalized as follows:

R_t = (R − µ_R) / σ_R,  G_t = (G − µ_G) / σ_G,  B_t = (B − µ_B) / σ_B,  (3.2)

with µ_X the mean and σ_X the standard deviation of the color distribution in channel X [42]. Thus, we have normalized the distribution in each channel and obtain a new color model with µ = 0 and σ = 1.
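The saturation-weighted hue histogram and the transformed color normalization of Eq. 3.2 can both be sketched compactly in NumPy. This is an illustrative sketch, not the thesis implementation; the choice of 36 hue bins and the random test image are assumptions.

```python
import numpy as np
import colorsys

def saturation_weighted_hue_hist(rgb, bins=36):
    """Hue histogram in which each pixel's vote is weighted by its
    saturation, since hue is unstable near the gray axis."""
    pix = rgb.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pix])
    hue, sat = hsv[:, 0], hsv[:, 1]
    hist, _ = np.histogram(hue, bins=bins, range=(0.0, 1.0), weights=sat)
    return hist / max(hist.sum(), 1e-12)

def transformed_color(rgb):
    """Normalize each channel to zero mean and unit standard deviation
    (Eq. 3.2), giving scale and shift invariance with respect to light
    intensity and light color changes."""
    x = rgb.reshape(-1, 3).astype(float)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)

rgb = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3))
h = saturation_weighted_hue_hist(rgb)
t = transformed_color(rgb)
```

After normalization, each channel of `t` indeed has mean 0 and standard deviation 1, as stated above.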
Figure 3.8: The cylindrical HSV color model [2].

Table 3.1: Invariances of color descriptors (+ = invariant, − = not invariant).

           Light intensity  Light intensity  Light intensity   Light color  Light color
           change           shift            change and shift  change       change and shift
RGB Hist.  −                −                −                 −            −
Hue Hist.  +                +                +                 −            −
Tr. Col.   +                +                +                 +            +

3.6 Image Descriptor Concatenation

So far, we have briefly provided the theory of three very different image descriptors, all of which have their advantages and disadvantages. In the hope of eliminating most of the disadvantages while retaining the advantages, we use a combination of all three descriptors. We combine the descriptors by concatenating the individual histograms, where either the ordinary RGB, the Hue, the Transformed Color histogram or none of them is used. This results in four slightly different image descriptors. While the SIFT, CENTRIST and Hue color descriptors are represented as one-dimensional histograms, the RGB and Transformed Color descriptors have a three-dimensional representation. In order to be able to concatenate the three-dimensional histograms, we must reduce their dimension to one. The reduction is done by projecting the bins down to their individual dimensions. For the red axis, for instance, this is achieved by summing first along the green and afterwards along the blue axis, which results in a one-dimensional histogram of bin counts. By applying the same procedure to the two other axes, we get three different histograms which we connect in series to form a single color histogram, which is then combined with the SIFT and CENTRIST histograms.
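The projection of a three-dimensional color histogram onto its axes, followed by concatenation with the other histograms, can be sketched as follows. This is a minimal NumPy sketch; the 8-bins-per-channel discretization and the zero-filled stand-ins for the SIFT and CENTRIST histograms are assumptions for illustration.

```python
import numpy as np

def rgb_histogram_3d(rgb, bins=8):
    """Count pixels in a bins x bins x bins discretization of RGB space."""
    pix = rgb.reshape(-1, 3)
    hist, _ = np.histogramdd(pix, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist

def project_to_1d(hist3d):
    """Project the 3D histogram onto each axis and connect the three
    marginals in series: e.g. the red marginal is obtained by summing
    over the green axis and then the blue axis."""
    r = hist3d.sum(axis=(1, 2))  # sum over green, then blue
    g = hist3d.sum(axis=(0, 2))  # sum over red, then blue
    b = hist3d.sum(axis=(0, 1))  # sum over red, then green
    return np.concatenate([r, g, b])

rgb = np.random.default_rng(1).integers(0, 256, size=(24, 24, 3))
color_hist = project_to_1d(rgb_histogram_3d(rgb))

# Dummy stand-ins for the SIFT bag-of-words and CENTRIST histograms.
sift_hist = np.zeros(256)
centrist_hist = np.zeros(256)
descriptor = np.concatenate([sift_hist, centrist_hist, color_hist])
print(descriptor.shape)  # (536,) = 256 + 256 + 3 * 8
```

The projection keeps the color contribution at 3 · 8 = 24 dimensions instead of 8³ = 512, which keeps the concatenated descriptor compact.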
Chapter 4

Modeling Places

As mentioned in Sec. 3.2, we represent images with the widely used bag-of-words approach. It seems therefore natural to also use other models which originate from document modeling. Many generative approaches for document modeling exist, some of which use latent variables while others do not. In this chapter a few models are discussed with their mathematical background, and we explain the rationale for the model used in this thesis. Before going into details, it is important to clearly define the notation that we will use in the following sections, as we will be using the language of text collections throughout this thesis. Terms such as words, documents and corpus will be used often. The definitions are as follows:

A word is the basic unit of data, i.e., a single measurement. It is defined as one of the cluster centres calculated with k-means; see Sec. 3.2 for more detail. In computer vision, a word is associated with one descriptor.

A topic reflects the latent structure of a document.

A document is a sequence of N words represented as a vector w = (w_1, w_2, ..., w_N) with w_n the n-th word in the sequence [8]. This vector w can be binned into a histogram x with W bins, where W denotes the size of the vocabulary (see Sec. 3.2). The count for a particular word w is denoted by x_w. A document can be associated with a single image.

A corpus is a collection of D documents, D = (w_1, w_2, ..., w_D). In the language of computer vision, where we perform place recognition/categorization, this can be associated with a single place.

4.1 The Unigram Model

The simplest statistical model for document modeling is the unigram. With this model, the words of each document are generated by independent samples from a multinomial distribution [8]

p(w) = ∏_{n=1}^{N} p(w_n),  (4.1)

where p(w_n) denotes the emission probability of the n-th word w_n. The multinomial distribution specifies the probability that a given vector x = (x_1, x_2, ..., x_W) of
word counts is observed [22], where x_w denotes the number of times word w is the outcome, i.e.,

x_w = ∑_i δ(w_i, w).  (4.2)

We parametrize the multinomial with θ and denote by θ_w the probability that a specific word w is emitted, subject to the constraints ∑_{w=1}^{W} θ_w = 1 and θ_w ≥ 0. Then the probability of a document having word counts x is given by

p(x | θ) = N! / (x_1! x_2! ... x_W!) ∏_{w=1}^{W} θ_w^{x_w},  (4.3)

where N = ∑_{w=1}^{W} x_w [5], [22]. Using the multinomial implies that the emission probability of a particular word depends just on the word itself and is not influenced by the other words. In Fig. 4.1 the graphical representation of the unigram is shown.

Figure 4.1: Graphical representation of the Unigram model. Blue denotes observed variables, red denotes multinomial parameters.

The multinomial distribution depends on the document length N and is therefore different for each N [22]. Nevertheless, this is not a problem because we only want to learn the model parameters θ. Since the maximum likelihood parameter estimate ˆθ depends only on the fraction of times a particular word appears in the entire corpus, the length of a document has no influence (Eq. 4.4). To compute the maximum likelihood estimate ˆθ, consider a training data set D with D independent observations x_1, x_2, ..., x_D, i.e., a corpus with D documents. Then the maximum likelihood solution for the parameter is given by

ˆθ_w = ∑_{d=1}^{D} x_{d,w} / ∑_{w'=1}^{W} ∑_{d=1}^{D} x_{d,w'}.  (4.4)

As mentioned at the beginning of this section, the unigram model samples each document from the same multinomial parameter θ. This implies that a single word has exactly the same emission probability across all documents within a corpus. In the case of visual places, this implies that each image recorded at a particular place should capture approximately the same word frequency.
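Eq. 4.4 amounts to pooling the word counts of the whole corpus and normalizing over the vocabulary; a minimal NumPy sketch (the toy count matrix is an assumption for illustration):

```python
import numpy as np

def unigram_mle(counts):
    """Maximum likelihood estimate of the multinomial parameter theta
    (Eq. 4.4): pool the word counts over all documents and normalize.
    counts: (D, W) matrix with counts[d, w] = occurrences of word w
    in document d."""
    pooled = counts.sum(axis=0)   # sum over the D documents
    return pooled / pooled.sum()  # normalize over the vocabulary

# Toy corpus: 3 documents over a vocabulary of 4 visual words.
counts = np.array([[2, 0, 1, 1],
                   [1, 1, 0, 2],
                   [0, 3, 1, 0]])
theta_hat = unigram_mle(counts)
print(theta_hat)  # fraction of times each word appears in the corpus
```

Note that the estimate depends only on the pooled fractions, so document lengths indeed have no influence, as stated above.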
It is intuitive that this model is not a good approximation of real places. Therefore, other techniques have to be developed.

4.2 Mixture of Unigrams

To overcome the drawback of the unigram model that every document within a corpus shares the same word parametrization, Nigam et al. [28] introduced the mixture of unigrams by augmenting the unigram model with a random topic variable z. Using this modified model, a document is generated by first choosing a topic
variable z out of T topics. Words are then drawn independently from a multinomial conditioned on that topic. The probability of a document is

p(w) = ∑_z p(z) ∏_{n=1}^{N} p(w_n | z).  (4.5)

Parametrizing the mixing weights with τ and the word distributions with Θ, the likelihood of a document under the mixture of unigrams model becomes [7]

p(w | τ, Θ) = ∑_{j=1}^{T} p(z_j | τ) ∏_{n=1}^{N} p(w_n | z_j, θ_{z_j}).  (4.6)

Rewriting the equation using word counts x_w instead of the document vector w, we get

p(x | τ, Θ) = ∑_{j=1}^{T} p(z_j | τ) ∏_{w=1}^{W} p(x_w | z_j, θ_{z_j}),  (4.7)

where τ is a multinomial parameter of dimension T and Θ is a T × W matrix whose j-th row θ_{z_j} determines the probabilities of the words in the vocabulary given the j-th topic. These parameters are learned from a corpus beforehand.

Figure 4.2: Graphical representation of the Mixture of Unigrams. Blue denotes observed variables, red denotes multinomial parameters, white denotes hidden variables.

The multinomial distribution over the T topics represents an underlying semantic structure in the corpus [7]. Although a corpus contains documents which are generated from different topics, each individual document is a manifestation of only one topic. Using terminology similar to bag-of-words, the mixture of unigrams model is best described as a bag-of-topics-of-bags-of-words, i.e., we have a bag of topics, where each topic implies a distribution over words (Fig. 4.5(a)). In the language of computer vision, the term topic is somewhat vague. However, one could interpret a topic as all words arising from, e.g., a sofa or a fridge. This would imply that an image should contain just sofas when it is conditioned on the sofa topic. It is obvious that this will hardly be the case when using real data. Hence, a model allowing several topics per measurement would be a better approximation for real-life measurements.

4.3 Probabilistic Latent Semantic Indexing

Hofmann et al.
[15] augmented the mixture of unigrams with a new variable γ, resulting in the probabilistic latent semantic indexing (pLSI) model. The pLSI
model posits that a word w_n and a document label γ are conditionally independent given an unobserved topic z [8]:

p(γ, w_n) = p(γ) ∑_z p(w_n | z) p(z | γ).  (4.8)

The pLSI model relaxes the simplifying assumption that a document is generated by just one topic. Because the multinomial p(z | γ) serves as a mixing component for a particular document d, it allows a document to contain more than one topic (Fig. 4.5(a)). However, it is important to note that γ is a dummy variable taking as many values as there are training documents [8]. Thus, the model only learns the topic mixtures p(z | γ) of the documents it is trained on and cannot generalize to previously unseen documents. Furthermore, the pLSI model is likely to overfit the data because its number of parameters grows linearly with the size of the corpus. For more information on this issue see [8].

Figure 4.3: Graphical representation of the pLSI model. Blue denotes observed variables, red denotes multinomial parameters, white denotes hidden variables.

4.4 Latent Dirichlet Allocation

To overcome the overfitting and low generalizability of the pLSI model, Blei et al. [8] introduced a new model called latent Dirichlet allocation (LDA). LDA treats the topic mixture weights as a T-dimensional hidden random variable rather than a large set of stand-alone parameters explicitly linked to the training set (see Fig. 4.4). The general idea behind LDA is that documents are generated by a random mixture over latent topics, where each topic is characterized by a distribution over words [8]. LDA assumes the following generative process for each document d in a corpus D [8]:

1. Choose the number of words N ∼ Poisson(ξ)
2. Choose the topic mixing parameter τ ∼ Dir(α)
3.
For each of the N words w_n:
(a) Choose a topic z_n ∼ Multinomial(τ)
(b) Choose a word w_n from p(w_n | θ_{z_n}), a multinomial probability conditioned on the topic z_n.

Thus, given the parameters α and Θ, this leads to the joint distribution

p(τ, z, w | α, Θ) = p(τ | α) ∏_{n=1}^{N} p(z_n | τ) p(w_n | θ_{z_n}),  (4.9)
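The generative process above can be simulated directly; a minimal NumPy sketch, where the toy values for T, W, α, ξ and the topic-word matrix Θ are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

T, W = 3, 10                  # number of topics, vocabulary size
alpha = np.ones(T) * 0.5      # Dirichlet prior on the topic mixture
xi = 20                       # Poisson rate for the document length
Theta = rng.dirichlet(np.ones(W), size=T)  # T x W topic-word matrix

def generate_document():
    """Sample one document following the LDA generative process:
    draw N ~ Poisson(xi) and tau ~ Dir(alpha), then for each word
    draw a topic z_n ~ Multinomial(tau) and a word
    w_n ~ Multinomial(theta_{z_n})."""
    N = rng.poisson(xi)
    tau = rng.dirichlet(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(T, p=tau)       # topic assignment for this word
        w = rng.choice(W, p=Theta[z])  # word drawn from that topic
        words.append(int(w))
    return words

doc = generate_document()
print(len(doc), doc[:5])
```

Unlike pLSI, the per-document mixture τ is itself a random draw, so the same process also generates previously unseen documents.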
More informationEE 368 Project: Face Detection in Color Images
EE 368 Project: Face Detection in Color Images Wenmiao Lu and Shaohua Sun Department of Electrical Engineering Stanford University May 26, 2003 Abstract We present in this report an approach to automatic
More informationDigital Image Processing. Prof. P. K. Biswas. Department of Electronics & Electrical Communication Engineering
Digital Image Processing Prof. P. K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture  28 Colour Image Processing  III Hello,
More informationC4 Computer Vision. 4 Lectures Michaelmas Term Tutorial Sheet Prof A. Zisserman. fundamental matrix, recovering egomotion, applications.
C4 Computer Vision 4 Lectures Michaelmas Term 2004 1 Tutorial Sheet Prof A. Zisserman Overview Lecture 1: Stereo Reconstruction I: epipolar geometry, fundamental matrix. Lecture 2: Stereo Reconstruction
More informationClassification of Fine Art Oil Paintings by Semantic Category
Classification of Fine Art Oil Paintings by Semantic Category William Kromydas kromydas@stanford.edu Abstract In this paper we explore supervised learning techniques that are able to classify fineart
More informationTheory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras
Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding
More informationMarkov chains and Markov Random Fields (MRFs)
Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume
More informationIMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS
IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS Alexander Velizhev 1 (presenter) Roman Shapovalov 2 Konrad Schindler 3 1 Hexagon Technology Center, Heerbrugg, Switzerland 2 Graphics & Media
More informationPartBased Recognition
PartBased Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, PartBased Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple
More informationSection 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
More informationFace Model Fitting on Low Resolution Images
Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationImage Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode Value
IJSTE  International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode
More informationEyeglass Localization for Low Resolution Images
Eyeglass Localization for Low Resolution Images Earl Arvin Calapatia 1 1 De La Salle University 1 earl_calapatia@dlsu.ph Abstract: Facial data is a necessity in facial image processing technologies. In
More informationJiří Matas. Hough Transform
Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian
More informationDetermining optimal window size for texture feature extraction methods
IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237242, ISBN: 8480213515. Determining optimal window size for texture feature extraction methods Domènec
More informationSupervised and unsupervised learning  1
Chapter 3 Supervised and unsupervised learning  1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More informationScalar Visualization
Scalar Visualization 41 Motivation Visualizing scalar data is frequently encountered in science, engineering, and medicine, but also in daily life. Recalling from earlier, scalar datasets, or scalar fields,
More informationMotivation. Lecture 31: Object Recognition: SIFT Keys. Simple Example. Simple Example. Simple Example
Lecture 31: Object Recognition: SIFT Keys Motivation Want to recognize a known objects from unknown viewpoints. find them in an image database of models Local Feature based Approaches Represent appearance
More informationTHE development of methods for automatic detection
Learning to Detect Objects in Images via a Sparse, PartBased Representation Shivani Agarwal, Aatif Awan and Dan Roth, Member, IEEE Computer Society 1 Abstract We study the problem of detecting objects
More informationLearning 3D Object Recognition Models from 2D Images
From: AAAI Technical Report FS9304. Compilation copyright 1993, AAAI (www.aaai.org). All rights reserved. Learning 3D Object Recognition Models from 2D Images Arthur R. Pope David G. Lowe Department
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationCVChess: Computer Vision Chess Analytics
CVChess: Computer Vision Chess Analytics Jay Hack and Prithvi Ramakrishnan Abstract We present a computer vision application and a set of associated algorithms capable of recording chess game moves fully
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationINTRODUCTION TO NEURAL NETWORKS
INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbookchapterslides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationAndroid Ros Application
Android Ros Application Advanced Practical course : Sensorenabled Intelligent Environments 2011/2012 Presentation by: Rim Zahir Supervisor: Dejan Pangercic SIFT Matching Objects Android Camera Topic :
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationMultiple Object Tracking Using SIFT Features and Location Matching
Multiple Object Tracking Using SIFT Features and Location Matching SeokWun Ha 1, YongHo Moon 2 1,2 Dept. of Informatics, Engineering Research Institute, Gyeongsang National University, 900 GazwaDong,
More informationCHAPTER 3 Numbers and Numeral Systems
CHAPTER 3 Numbers and Numeral Systems Numbers play an important role in almost all areas of mathematics, not least in calculus. Virtually all calculus books contain a thorough description of the natural,
More informationEfficient visual search of local features. Cordelia Schmid
Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images
More informationData Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a
More informationRobot Perception Continued
Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart
More informationBlind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections
Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Maximilian Hung, Bohyun B. Kim, Xiling Zhang August 17, 2013 Abstract While current systems already provide
More informationA Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof
A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology, Austria {mauthner,bischof}@icg.tugraz.ac.at
More informationA Comparative Study between SIFT Particle and SURFParticle Video Tracking Algorithms
A Comparative Study between SIFT Particle and SURFParticle Video Tracking Algorithms H. Kandil and A. Atwan Information Technology Department, Faculty of Computer and Information Sciences, Mansoura University,ElGomhoria
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More information