GPS-aided Recognition-based User Tracking System with Augmented Reality in Extreme Large-scale Areas


Wei Guan, Suya You, Ulrich Neumann
Computer Graphics and Immersive Technologies, Computer Science, USC

ABSTRACT
We present a recognition-based user tracking and augmented reality system that works in extremely large-scale areas. The system provides a user who captures an image of a building facade with the precise location of the building and augmented information about it. While GPS cannot provide camera pose information, it is used to reduce the search range in the image database. A patch-retrieval method is used for efficient computation and real-time camera pose recovery. With the patch matching as prior information, the whole-image matching can then be done efficiently through propagations, so that a more stable camera pose can be generated. Augmented information such as building names and locations is then delivered to the user. The proposed system mainly contains two parts, offline database building and online user tracking. The database is composed of images for different locations of interest. The locations are clustered into groups according to their UTM coordinates; an overlapped clustering method is used in order to restrict the retrieval range and avoid ping-pong effects. For each cluster, a vocabulary tree is built for searching the most similar view. In the tracking part, the rough location of the user is obtained from GPS, and the exact location and camera pose are calculated by querying patches of the captured image. The patch property makes the tracking robust to occlusions and dynamics in the scenes. Moreover, due to the overlapped clusters, the system simulates a "soft handoff" feature and avoids frequent swaps in memory.
Experiments show that the proposed tracking and augmented reality system is efficient and robust in many cases.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Image Retrieval, Performance Analysis; I.4.9 [Image Processing and Computer Vision]: Applications - Scene Recognition, User Tracking

General Terms
Performance

Keywords
Patch Approach, Real-time Recognition, Augmented Reality

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MMSys'11, February 23-25, 2011, San Jose, California, USA. Copyright 2011 ACM.

Figure 1: The proposed tracking and augmented reality system. (a) The captured image from the user. (b) The nearest cluster is selected (yellow dots). The current location is shown with a blue dot and the estimated location with a red dot. (c) The image with augmented information: a 2D annotating label and a 3D annotating sign are embedded in the image.

1. INTRODUCTION
The Global Positioning System (GPS) is a space-based satellite system that can provide location information anywhere there is an unobstructed line of sight to more than three satellites. A receiver precisely times the signals sent by GPS satellites and determines the distances to these satellites; these distances and the satellite locations are used to calculate the receiver's position. While GPS is widely used to aid navigation, many applications demand more accurate locations and more detailed sensor information for better environment-interaction capabilities.

Vision sensors provide a tremendous amount of information about the user's environment and are considered one of the most powerful sources of information among all sensors. Not only can they provide more accurate location information, they can also provide users with more context-based information, such as the appearance of buildings. However, because of the wealth of information they provide, the processing usually takes much longer than for other sensor types such as ultrasound or inertial sensors. Besides, a picture by itself provides no location information; the location must be estimated by matching against images in the database. Image querying is therefore an essential process, which consumes extra computational resources.

Determining locations from vision-based sensors is a critical problem in the vision and robotics community. When a user obtains information from the optical sensors, the visual information is summarized and compared with existing landmarks. Point-based features such as SIFT [11] or SURF [3] are usually used in the matching process due to their robustness. However, because of the complexity of generating and matching such features, the image retrieval and matching processes are not fast enough for applications such as real-time tracking and augmented reality.

We propose a novel system that can track the user with augmented information in large-scale areas. The system works in real time by speeding up the retrieval and matching processes. Using the GPS information, the system first selects the nearest cluster and loads the corresponding database. Then, for the image captured by the user, the system picks the most promising part of the image and uses it to query the best-matching patch in the existing database. The query results are used to refine the user's location.
The camera pose calculated this way is still not stable or accurate enough for applications like augmented reality, since the querying features are located in a small area of the image. So in the next step, an algorithm propagates the matchings to the whole image. The searching range for feature matchings is largely limited, so the matching speed is significantly increased and the calculated camera pose is more accurate. Another advantage of our framework is its ability to handle occlusions and dynamics. It is common that newly captured images differ from the existing images in the database because of moving pedestrians and objects. The proposed algorithm picks patches from the non-occluded parts and matches them in the database, so in most cases the framework is robust to large occlusions and dynamics.

The remainder of this paper presents the proposed system in more detail. After discussing related work in Section 2, we provide an overview of our system in Section 3. The process of building the database is presented in Section 4. Following that, we discuss the recognition-based user tracking in Section 5. In Section 6, we propose a propagation method for camera pose refinement. We show experimental results in Section 7 and conclude the paper in Section 8.

2. RELATED WORK
Over the past years, many vision-based localization systems have been proposed. Depending on the features used to describe the images and the methods exploited for matching, these systems can be divided into two categories. In the first category [7, 8, 5, 4], simple features like lines and colors are used, but sophisticated learning techniques are usually required to locate the users. Horswill [7] extracts features of the environment such as walls, doors or openings and identifies the robot position from these features. The algorithm is efficient due to its specialization to its task and environment.
In [8], vertical lines are extracted from images and combined with distance information obtained from ultrasound sensors; a Bayesian network is used to combine them and estimate the robot location. Dodds and Hager [5] use a color interest operator, a weighted combination of heuristic scores, to identify landmarks; the operator selects regions that are robust representations for scene recognition. A Bayesian filtering method was proposed in [4]; it uses a sampling-based representation and localizes the robot using scalar brightness measurements. In the second category, more sophisticated features are used [15, 16, 19, 9]. In [15], Se et al. propose a vision-based simultaneous localization and mapping (SLAM) system that tracks SIFT features. SIFT features [11] are robust to scale, orientation and viewpoint variations, so they are good natural visual landmarks for tracking over long periods of time from different views. Tamimi et al. [16] propose an approach that reduces the number of features generated by SIFT; with the help of a particle filter, the robot location can still be estimated accurately. In [19], Wolf et al. use local scale-invariant features combined with Monte Carlo localization to estimate robot positions; the system is robust against occlusions and dynamics such as people walking by. In [9], scale-invariant features are also used, combined with a proposed probabilistic environment model to locate the robot. To match images in large-scale databases, image retrieval techniques are usually exploited [19, 10, 17, 13]. Wolf et al. [19] use image retrieval together with sample-based Monte Carlo localization to extract the possible viewpoints for the current image. Krose and Bunschoten [10] describe a vision-based localization method that applies principal component analysis to images captured at different locations.
In [17], Ulrich and Nourbakhsh propose an efficient approach that uses color histogram matching for localization; the color images can be classified in real time based on nearest-neighbor learning and a voting scheme. In [13], Nister and Stewenius propose a recognition scheme that scales to large numbers of objects. The scheme builds on indexing descriptors based on SIFT features and efficiently integrates indexing and hierarchical quantization with a vocabulary tree. Some systems are not specifically designed for localization but try to recognize objects with less computational cost. Wagner et al. [18] present a modified SIFT created from a fixed patch size of 15x15 pixels that forms a descriptor with only 36 dimensions; the modified feature enables efficient natural feature tracking at interactive speed on current-generation phones. Henze et al. [6] combine the simplified SIFT with a scalable vocabulary tree to achieve interactive object recognition on mobile phones; the simplified features consume less computational cost, which is necessary for mobile applications. Azad et al. [2] present a combination of the Harris corner detector and the SIFT descriptor, which computes features with high repeatability and good matching properties; by replacing the SIFT keypoint detector with an extended Harris corner detector, the algorithm can generate features in real time. The techniques described above match images captured by vision sensors with existing landmarks in the database. The goal of this paper is to propose a novel system that can track users in large-scale areas and provide them with augmented information. We describe how the landmarks are built in the database and how an image can be queried efficiently. In practical experiments we demonstrate that our system locates the user and recovers the camera pose at real-time speed and is robust to large occlusions and dynamics.

3. OVERVIEW OF THE SYSTEM
The system consists of two main parts, database building and user tracking, and each part contains several steps, as shown in Fig. 2. In the database building process, we first select locations of interest in the large-scale area and take images from different viewpoints for each location. Then we use the bundler algorithm [1] to build a 3D point cloud per location. The point cloud is manually registered with the world coordinates, i.e. assigning the UTM coordinates and adjusting the scale and orientation. We use an overlapped clustering method, based on k-means, to cluster the locations so that each cluster contains a few nearby locations; due to the inaccuracy of GPS devices, locations that are almost equally close to two or more cluster centers are assigned to all of those clusters. The images are partitioned into smaller patches, and the patches in each cluster are put into a vocabulary tree for retrieval during tracking. For the online tracking part, the user's rough location is obtained through the GPS device, and the closest cluster is determined. When the user captures a new image, the image is partitioned into patches, and the most distinctive patches are used for retrieval through the corresponding vocabulary tree. The more accurate location and the camera pose are then recovered from 2D-to-3D matchings. We discuss each step in more detail in the following sections.

4. DATABASE BUILDING
We use a method similar to [1] to build the environment. Instead of obtaining images from the web and covering the whole area, we take the images ourselves and only cover the locations of interest, such as building facades. At each location, the images are taken from different views at different distances. SURF features [3] are used for their speed and robustness. Moreover, the UTM coordinates and building-related information for each location are also recorded.
Such information can be obtained from a GPS device or entered manually. In our system, 120 locations are recorded. For each location, we take 10 images from different viewing angles and distances, so there are 1,200 images in total in our database.

4.1 Overlapped Clustering with K-Means
To reduce the searching range, we cluster the locations of interest into groups. The user is associated with the cluster whose center is nearest to his or her rough location, represented as the UTM coordinates returned from the GPS device, so the retrieval process can be done within the cluster. To build K overlapped clusters, we first use the traditional k-means method to get K non-overlapped clusters. Locations at the boundaries are then assigned to all clusters that are almost as close as the closest one. The steps are as follows:

1. Suppose there are N different locations, with UTM coordinates (easting and northing) L_i = (X_i, Y_i), i = 1..N. Let the maximum error caused by a GPS device be E, let the longest distance from which a user is allowed to take pictures of a location be D, and let H be the handoff threshold (discussed in a later section). We want to cluster the locations into K groups.
2. (Traditional k-means) Randomly select K locations as the centers of K clusters.
3. For each location, assign it to the cluster whose center is nearest to it.
4. For each cluster, recalculate the cluster center. Repeat steps 3 and 4 until convergence, then go to step 5.
5. (Overlapped assignment) For each location i, let d_i be the distance to the nearest cluster center, and let the K cluster centers be C_j = (X_j, Y_j), j = 1..K. Assign location i to cluster j if and only if ||L_i - C_j||_2 <= d_i + max(E, D, H).

Steps 2 to 4 are the traditional k-means clustering method. We add one more step for overlapped assignment to handle three problems.
First, GPS readings are unstable and inaccurate. Second, the position of the user usually differs from the location of interest itself. Third, the user may go back and forth within certain areas, which can cause unnecessary swaps of databases in memory. In our system, we have N=120 locations and cluster them into K=5 groups. The clustering results are shown in Fig. 3.

Figure 3: The overlapped clustering with N=120 and K=5. Only half of the locations are displayed in the figure.

4.2 Image Partitioning
When we build the database, instead of bagging SURF features per image, we make smaller bags based on patches. For a 640 by 480 image, we partition it into 8 by 8 grids. As shown in Fig. 4, four different patch sizes are used. For patches of the same size, two neighboring patches overlap by half the patch size; for example, in Fig. 4, the two patches of size 1x1 grid overlap by half a grid. For an 8 by 8 partitioning, there are 15x15 + 7x7 + 3x3 + 1 = 284 patches per image. Over the 1,200 database images this gives 284 x 1,200 = 340,800 patches in total, so on average 340,800/5 = 68,160 patches per cluster. Moreover, we remove patches that contain too few features, since they are not good representations of the locations; after removal, there are around 30,000 to 40,000 patches per cluster. Many of the patches share SURF features; in other words, each SURF feature is contained in many different patches. Therefore, when we calculate the visual words of these features, we need to make sure each feature is quantized only once to avoid overheads caused by partitioning. It is also important to calculate the SURF features first and then partition them into patches; otherwise, features near patch boundaries would not be correctly described.
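The 284-patch figure follows directly from the half-overlap scheme; a quick sketch, assuming a stride of half the patch size for every size except the full image:

```python
# Count the overlapped patches in an 8x8 grid. Patch sizes are 1, 2, 4
# and 8 grid cells; same-size neighbors overlap by half a patch, so the
# stride is size/2 (the full image occurs exactly once).
def count_patches(grid: int = 8, sizes=(1, 2, 4, 8)) -> int:
    total = 0
    for size in sizes:
        if size == grid:
            total += 1                      # the whole image
            continue
        step = size / 2                     # half-patch overlap
        per_axis = int((grid - size) / step) + 1
        total += per_axis * per_axis        # same count along x and y
    return total

print(count_patches())  # 15*15 + 7*7 + 3*3 + 1 = 284
```

Across the 1,200 database images this gives 284 x 1,200 = 340,800 patches, matching the count above.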

Figure 2: Overview of the system.

Figure 4: An image is partitioned into 8 by 8 small grids. Four different patch sizes are used: 1x1 grid, 2x2 grids, 4x4 grids and 8x8 grids (the whole image).

4.3 Scalable Vocabulary Tree with Patches
Every patch is represented by a bag of SURF features. For a large number of patches, comparing the 64-dimensional SURF descriptors between every two images is extremely expensive. A vocabulary tree is usually used to quantize SURF descriptors into more compact features. It is a hierarchically structured tree that efficiently integrates quantization and classification, and the classification results can further be used for indexing based on a well-designed scoring scheme. The quantization is built by hierarchical k-means clustering, and the tree can be trained unsupervisedly with a large set of SURF features. In our implementation, the vocabulary tree has 6 levels and each level has 10 branches, so there are 1 million leaf nodes or classes. A patch usually contains hundreds of features, and each feature generates a visual word by going through the vocabulary tree, so a patch can be represented by a bag of visual words. It can be further described by the frequency or distribution of visual words, and such a distribution can be represented by a vector whose length equals the number of leaf nodes. To compare two patches, we only need to compare the two distribution vectors. Though the vectors are very high-dimensional, they have only a few non-zero elements, so the comparison can be done in little time. When a new patch is queried, a score is calculated for each comparison with every patch containing the same nodes along the path.

To calculate the distribution vector for a patch, the most time-consuming part is the quantization process. Instead of going through the vocabulary tree patch by patch, we first do the quantization for all the features in the image to avoid duplicate quantization. The distribution vectors can then be calculated hierarchically in a simple way. As shown in Fig. 5-(a), let the numbers of features for the patches be N_1 to N_6. The distribution vectors for patch 7 and patch 8 can be calculated as

v_7 = (Σ_{k=1..4} v_k N_k) / (Σ_{k=1..4} N_k),   v_8 = (Σ_{k=3..6} v_k N_k) / (Σ_{k=3..6} N_k).   (1)

For each image, though we have 284 distribution vectors instead of one, the total time for calculating the vectors is negligible compared to the quantization process. For an image on a 4GHz CPU, the quantization process takes about 70-80ms, while calculating all the vectors takes about 2-4ms. Therefore, the overheads for image partitioning are no more than 3 percent.
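Eq. (1) is simply a feature-count-weighted average of the child distributions, so parent vectors can be formed without re-quantizing any feature. A minimal numpy sketch with a toy 3-word vocabulary (all values illustrative):

```python
import numpy as np

# Combine child-patch visual-word distributions into a parent patch:
# the parent distribution is the average of its children's
# distributions, weighted by the number of features N_k in each child.
def combine(dists, counts):
    dists = np.asarray(dists, dtype=float)    # one distribution per row
    counts = np.asarray(counts, dtype=float)  # N_k per child patch
    return (counts[:, None] * dists).sum(axis=0) / counts.sum()

# Two child patches: one holds 3 features of word 0, the other holds
# 1 feature of word 1; the parent mixes them 3:1.
parent = combine([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [3, 1])
```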

Figure 5: (a) The hierarchical way to calculate the distribution vector. (b) A 3-level vocabulary tree with 5 features (a, b, c, d, e). The weights for levels 1 to 3 are 1/8, 1/4 and 1/2. The numbers in parentheses are the numbers of features going through each node.

5. RECOGNITION-BASED TRACKING
For the online stage, we first associate the user with the nearest cluster. The most distinctive patches in the new image are matched with images in the corresponding database. The overlapped clusters are used to achieve a "soft handoff" feature. The use of patches greatly increases retrieval speed and thus reduces the localization time. With careful selection of the patch used for retrieval, the performance is still competitive with using the whole image, and in cases with large occlusions the patch-based method can perform even better.

5.1 Cluster Association and Soft Handoff
With the user's UTM location provided by GPS, the cluster whose center is nearest to the user can be found, and the corresponding database is loaded into memory. However, when a user is at a location almost equally close to two cluster centers and frequently crosses the boundary, there will be unnecessary memory swaps. Therefore, we set a handoff threshold H to avoid such overheads.

Figure 6: The soft handoff process. H is the threshold.

As shown in Fig. 6, C_1 and C_2 are two clusters. The user is initially at point a, which is closer to cluster C_1. When the user crosses the boundary to point b, which is closer to cluster C_2, he or she is still associated with cluster C_1, and the association does not change if the user goes back to point a. However, if the user goes further by an extra distance H to point c, the association changes to cluster C_2; when he or she then goes back to b or a, the user remains associated with cluster C_2. In our system, since the GPS has an error E and the user may take pictures at distance D from the location of interest, we use max(E, D, H) instead of H.

5.2 Patch Distinctiveness
Similar to the partitioning in the database building process, an image is partitioned into 8 by 8 smaller patches. However, in the querying process, we do not overlap patches, so only 64 patches are used. For each image, the top 5 patches are selected according to the number of features, as shown in Fig. 7. The quantization process for these patches takes about 10-20ms, which is 5 to 8 times faster than using the whole image.

Figure 7: The top 5 patches are selected according to feature numbers. The selected patches are shown in red rectangles.

The patch with the most features is considered the one with the richest visual information. However, these features may not be the best for image retrieval, because some features are less distinctive than others for identifying the patch. By learning from the database images, we can estimate the distinctiveness of the features. One way to measure feature distinctiveness is to calculate feature frequencies. For a specific feature, the more patches contain it, the less distinctive it is; conversely, a feature contained in only one patch is considered very distinctive. For any quantized feature (visual word), we count the number of patches that contain it; only the smallest non-overlapping patches in the database are counted. The process is illustrated with a simpler vocabulary tree in Fig. 5-(b). Suppose there are 5 patches and each patch contains only one feature. The 5 features are a, b, c, d, e, whose paths are shown in red. Since a and b are quantized to the same value, there are 4 different features in total, and their frequencies are f_a = f_b = 2, f_c = f_d = f_e = 1. The distinctiveness weight is

w_i = 1/f_i for i = a, b, c, d, e;   w_i = 0 for other features.   (2)

With such weights, features a and b are less distinctive than features c, d and e. Features c and d are similar, so they should be less distinctive than feature e; however, the above assignment cannot distinguish this. We improve it by assigning weights to the nodes along a feature path at different levels. For an n-level vocabulary tree, the weights are 1/2^n, 1/2^(n-1), ..., 1/2 for nodes at level 1 to level n. For any path p in the tree, let the number of patches that go through its node at level i be N_i. The frequency and weight of path p are calculated as

f_p = Σ_{i=1..n} N_i / 2^(n+1-i),   w_p = 1/f_p for f_p ≠ 0;   w_p = 0 for f_p = 0.   (3)
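The level-weighted frequency of Eq. (3) can be sketched as follows. The per-level node counts below are an assumed configuration for the Fig. 5-(b) example, chosen to be consistent with the frequencies stated in the text:

```python
# Level-weighted path frequency (Eq. 3): the node at level i of an
# n-level tree contributes N_i / 2^(n+1-i), so deeper (more specific)
# levels dominate. The distinctiveness weight is the reciprocal.
def path_frequency(node_counts):
    n = len(node_counts)
    return sum(N / 2 ** (n + 1 - i) for i, N in enumerate(node_counts, start=1))

def weight(node_counts):
    f = path_frequency(node_counts)
    return 1.0 / f if f else 0.0

# Assumed node counts (level 1 -> leaf) for a 3-level tree:
f_a = path_frequency([4, 2, 2])  # 4/8 + 2/4 + 2/2 = 2.0
f_c = path_frequency([4, 2, 1])  # 4/8 + 2/4 + 1/2 = 1.5
f_e = path_frequency([1, 1, 1])  # 1/8 + 1/4 + 1/2 = 0.875 (e on its own path)
```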

With the new weight assignment, the frequencies for a to e become f_a = f_b = 2, f_c = f_d = 1.5 and f_e = 0.875, so e is considered more distinctive than c and d. With the top 5 patches selected, we measure the distinctiveness of each patch by summing the weights of all the features in the patch. The 5 patches are then ranked according to this distinctiveness, and the ordered patches are used for querying the database.

5.3 Querying with Patches
The more distinctive the patch, the higher the probability that the correct patch will be retrieved from the database. Therefore, we query patches in order of distinctiveness. In the retrieval process, a similarity score is assigned to each database patch that contains the same features as the querying patch. For any two patches, the similarity score is calculated by multiplying their two distribution vectors, weighted by the distinctiveness of the features. Let v_1 and v_2 be the two vectors of length n, with v_1 = (e_11, e_12, ..., e_1n) and v_2 = (e_21, e_22, ..., e_2n). The score S_12 is calculated as

S_12 = Σ_{p=1..n} e_1p e_2p w_p,   (4)

where w_p is the distinctiveness weight for path p. For each of the 5 selected querying patches, the top 3 database patches with the highest scores are returned. As shown in Table 1, the top-1 query results are not always reliable; however, the probability is more than 90% that the correct patch is among the top 3 returned patches when querying the best selected patch. The retrieval accuracy can be increased further by querying more patches. From the table, we see that using 3 of the 5 patches increases the accuracy to more than 95%, while using all 5 patches adds little further improvement. Therefore, we use 3 querying patches, which return 9 patches in total.
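The score of Eq. (4) is a weighted dot product over sparse distribution vectors; a minimal sketch, storing each patch as a dict from visual-word id to frequency (toy values, hypothetical weights):

```python
# Weighted similarity score (Eq. 4): S12 = sum over shared visual words
# of e1p * e2p * w_p, where w_p is the distinctiveness weight of path p.
# Sparse dicts avoid touching the million-dimensional dense vector.
def similarity(v1: dict, v2: dict, w: dict) -> float:
    return sum(e1 * v2[p] * w.get(p, 0.0) for p, e1 in v1.items() if p in v2)

v1 = {3: 0.5, 7: 0.5}    # querying patch
v2 = {3: 0.25, 9: 0.75}  # database patch
score = similarity(v1, v2, {3: 1.0, 7: 0.5, 9: 1.0})  # only word 3 is shared
```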
          Top 1   Top 2   Top 3
Rank 1      …     85.3%   90.7%
Rank 2      …     89.3%   93.2%
Rank 3      …     91.2%   96.3%
Rank 4      …     92.5%   96.5%
Rank 5      …     93.1%   96.7%

Table 1: The accuracy rate for patch retrieval. The value in the ith column, jth row is the probability that the correct patch is returned by querying j patches, each returning the top i results.

To find the correct patch, RANSAC is used to verify the matchings. In our implementation, we set the inlier threshold to 20. On average 4 patch matchings are needed, which costs about 10-15ms.

5.4 More Accurate Locations
When we build the database with the method in [1], a 3D feature point cloud is generated for each location of interest from images at different positions and views. Each 3D feature point corresponds to some 2D features in the database patches. When we match features from the querying patch against the retrieved patch, we therefore also obtain matchings with the 3D points, and the camera pose can be calculated from these 2D-to-3D matchings. Let the camera position for the returned patch be L_w = (X_w, Y_w, Z_w) in world coordinates. Let the camera pose of the returned patch in camera coordinates be P_c = [R_c T_c], and the pose of the current camera be P'_c = [R'_c T'_c]. The user's location in world coordinates is then calculated as

L'_w = (X'_w, Y'_w, Z'_w) = L_w + (T'_c - T_c),   (5)

where Z'_w is set to 0 if the height information is unknown.

6. CAMERA POSE REFINEMENT
The location information is obtained through patch querying and camera pose calculation. However, the calculated pose is not accurate enough for some other applications; for example, augmented reality usually needs high accuracy in order to place virtual objects correctly into the real scene. The pose calculated from the two matched patches is not accurate because all the features are located in a small area of the image. With matching propagations, we can obtain a much more refined pose.
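A minimal sketch of the location update in Eq. (5), assuming the user's position is the database camera's world position shifted by the difference of the two camera translations (all coordinate values illustrative):

```python
import numpy as np

# Shift the database camera's world position L_w by the translation
# difference between the current camera and the returned patch's camera;
# the height is zeroed when unknown.
def user_location(L_w, T_db, T_cur, height_known=False):
    L = np.asarray(L_w, float) + (np.asarray(T_cur, float) - np.asarray(T_db, float))
    if not height_known:
        L[2] = 0.0
    return L

# Hypothetical UTM-like coordinates (easting, northing, height).
loc = user_location([527000.0, 3765000.0, 10.0],
                    [0.0, 0.0, 0.0],     # T_c of returned patch
                    [2.0, -1.0, 0.5])    # T'_c of current camera
```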
6.1 Matching Propagations
We can calculate a homography from the matched features in the two patches. This is reasonable because most building facades are more or less planar. With the estimated homography H_1, we can predict the locations of matches for feature points that lie outside the patch. The procedure is shown in Fig. 8. The red rectangles represent the two matching patches. H_1 is calculated from the matched features, and x_2 is one of the feature points outside the patch. The location of its corresponding point can be estimated as x'_2 = H_1 x_2, and the real match x''_2 can then be found by searching the area around x'_2.

Figure 8: The propagation process. Two propagations are shown in the figure. H_1 is calculated from the matched points in the red area (x_1 and x'_1 are one example matching pair). The corresponding point for x_2 is estimated as x'_2 = H_1 x_2, and the real corresponding point x''_2 is found near x'_2. H_2 is then calculated from all matched points within the blue rectangle, and H_3 is calculated similarly.

Not all matches of feature points lie near their estimates. For example, after the first propagation, x''_3, the real match of x_3, is probably far from the estimated point x'_3 computed from H_1. This is because the further a feature point is from the matched pairs, the larger the error produced by the homographic transformation. Therefore, a single propagation is not enough. For an image partitioned into 8 by 8 patches, we use three propagations, with steps of one patch, two patches and four patches, as shown in Fig. 9. With the proposed propagations, the searching area for feature matchings can be largely restricted. Another advantage of this matching scheme is that the RANSAC process can be simplified.
This is because the number of outliers is largely reduced by restricting the searching areas. We randomly select two inliers p_1 and p_2 in the patch, and two matched feature points p_3 and p_4 in the propagation

7 Figure 9: The 3 propagations in the image matching process. Different colors are used for the correspondence lines in different propagations. The propagation steps are 1 patch, 2 patches and 4 patches for the 1st, 2nd and 3rd propagation. area. The four matching pairs are used to calculate the homography, and it is used to determine the outliers. If there are too many outliers, it is very possible that the selected points p 3 or p 4 is an outlier. In this case, we randomly choose another two points and calculate the pose again. If the number of outliers is small, we remove these outliers and calculate the pose using all the matched points. The matching time with 3 propagation steps is about 15-25ms, while it usually take ms without the proposed patch-approach and such propagations. 6.2 Pose Refinement with Propagations When the camera pose is estimated with features in the patch area, the calculated pose is not accurate. The accuracy can be improved through matching propagations. When the matchings are propagated to the whole image, the calculated pose is accurate enough for most applications. As described in previous section, the propagation method is 5 to 6 times faster than the traditional way of feature matchings. Figure 11: Reprojection errors of the calculated pose after each propagation for Fig. 9. Only the inlier features are displayed. The features are displayed according to order of propagations. ample, the errors in the third propagation area are much larger than the errors in the second propagation area, and those errors in the patch area are close to 0. Figure 11 shows similar results but only reprojection errors of the inliers are displayed. 7. EXPERIMENTAL RESULTS 7.1 System Speedup With the overlapped clustering, we can reduce the database size so that the retrieval performance will not be hurt due to its large scale. The retrieval time is reduced by querying a patch instead of querying the whole image. 
The matching speed is increased through first matching with a smaller patch and then propagating to the whole image. Therefore, the whole process is speeded up. Table 2 shows the comparisons of computing time between the framework with patch-approach and the one without patch-approach. without patch (ms) with patch (ms) improved by Retrieval times Matching times Total (1) % Total (2) % Total (3) % Table 2: The comparisons between the framework with patchapproach and the one without patch-approach. SURF descriptor (10ms) is employed but different detectors are used. Nearest neighbor method is used for the matching process. (1) SURF detector (300ms) (2) extended Harris corner detector (25ms) (3) FAST corner detector (<5ms). Figure 10: Reprojection errors of the calculated pose after each propagation for Fig. 9. All the matched features are displayed. The features are displayed according to the order of propagations. Those with very large errors are outliers. As shown in Fig. 10, the reprojection errors of all the matching points are shown. We can see that the errors after the third propagation are the smallest. Without any propagations, the reprojection errors are large for the features that are far from the patch. For ex- From the experiments, we conclude that if SURF detector is used, the performance can be increased by up to 50%. If faster detectors such as extended Harris corner detector [12] or FAST corner detector [14] are used, the performance will be increased significantly which can be up to 2 to 5 times. This is because patch retrieval and matching process become the most time consuming parts in the whole system.
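The outlier test used during propagation can be sketched as follows. This is a minimal illustration under the assumption that a homography relates the two views of the (roughly planar) facade; the DLT solve, the 3-pixel threshold, and the trial count are choices of this sketch, not values from the paper:

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Estimate a 3x3 homography from >= 4 point pairs via the DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)          # null vector of A, up to scale
    return H / H[2, 2]

def transfer_errors(H, src, dst):
    """Forward transfer (reprojection) error of each correspondence under H."""
    src_h = np.hstack([src, np.ones((len(src), 1))])
    proj = src_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - dst, axis=1)

def propagation_outliers(patch_pairs, new_pairs, thresh=3.0, trials=10):
    """Pick 2 patch inliers + 2 propagated matches, fit H, flag outliers.

    patch_pairs / new_pairs: arrays of shape (N, 2, 2) holding (src_xy, dst_xy).
    If a sampled quad yields too many outliers, re-sample, as in the text.
    """
    rng = np.random.default_rng(0)
    all_pairs = np.concatenate([patch_pairs, new_pairs])
    best = None
    for _ in range(trials):
        i = rng.choice(len(patch_pairs), 2, replace=False)
        j = rng.choice(len(new_pairs), 2, replace=False)
        quad = np.concatenate([patch_pairs[i], new_pairs[j]])
        H = homography_from_pairs(quad[:, 0], quad[:, 1])
        err = transfer_errors(H, all_pairs[:, 0], all_pairs[:, 1])
        inlier = err < thresh
        if best is None or inlier.sum() > best[1].sum():
            best = (H, inlier)
        if inlier.mean() > 0.7:       # few outliers: accept, stop re-sampling
            break
    return best                       # (H, boolean inlier mask over all matches)
```

The surviving inliers would then feed the full pose computation; the 70% acceptance fraction above is likewise an assumption of this sketch.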

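The reprojection-error measure plotted in Figures 10 and 11 can be computed as in the sketch below. The intrinsic matrix K and the (R, t) pose parameterization are generic assumptions of this sketch, not the paper's calibration:

```python
import numpy as np

def reprojection_errors(K, R, t, pts3d, pts2d):
    """Pixel reprojection error of each 2D-3D match under pose (R, t).

    K: 3x3 intrinsics; R: 3x3 rotation; t: translation (3,);
    pts3d: (N, 3) world points; pts2d: (N, 2) observed image points.
    """
    cam = R @ pts3d.T + t.reshape(3, 1)   # world -> camera frame
    proj = K @ cam                        # camera -> image (homogeneous)
    proj = (proj[:2] / proj[2]).T         # perspective divide -> (N, 2)
    return np.linalg.norm(proj - pts2d, axis=1)
```

Matches with very large values from this function are the outliers visible in Figure 10; after each propagation the pose is re-estimated and these errors shrink.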
7.2 Localization and Augmented Reality

We conducted many experiments with our localization system outdoors. Some of the results are shown in Fig. 12 and Fig. 13. As we can see, the system is robust in many different cases, such as images captured with rotations, at different distances, and with large occlusions. The user locations are displayed as red dots on the map. The building names and locations are also displayed with a 2D label and a 3D sign in the captured images.

8. CONCLUSION

We propose an efficient system for user tracking and augmented reality in large-scale areas. The system is composed of two parts: an offline database-building process and an online user-tracking process. The database is built from images captured at many locations with different viewing directions and distances. The locations are clustered with overlaps to limit the database size. The images are partitioned into patches of different sizes, and each patch is represented by a distribution of visual words generated from the quantization of SURF descriptors. A 3D point cloud is also built for each location and used for pose calculation. At the online stage, the new image is likewise partitioned into smaller patches. The nearest cluster is selected, and the most distinctive patch is used to retrieve similar patches from that cluster's database. The camera pose is calculated from the 2D-to-3D matches. Since these features are located in a small area, the calculated pose is not accurate enough for augmented reality applications; therefore, the pose is further refined through matching propagations in an efficient way. Experiments show that the proposed system works well in large-scale areas. Furthermore, the system is robust to large occlusions and dynamics in the images. Through matching propagations, the pose can be generated with the expected accuracy.

9. REFERENCES

[1] S. Agarwal, N. Snavely, I. Simon, S. Seitz, and R. Szeliski. Building Rome in a day. In Proceedings of the International Conference on Computer Vision (ICCV).
[2] P. Azad, T. Asfour, and R. Dillmann. Combining Harris interest points and the SIFT descriptor for fast scale-invariant object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[3] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3).
[4] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the Condensation algorithm for robust, vision-based mobile robot localization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Z. Dodds and G. D. Hager. A color interest operator for landmark-based navigation. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[6] N. Henze, T. Schinke, and S. Boll. What is that? Object recognition from natural features on a mobile phone. In Proceedings of the Workshop on Mobile Interaction with the Real World.
[7] I. Horswill. Polly: A vision-based artificial agent. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[8] D. Kortenkamp and T. Weymouth. Topological mapping for mobile robots using a combination of sonar and vision sensing. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[9] J. Kosecka, F. Li, and X. Yang. Global localization and relative positioning based on scale-invariant keypoints. Robotics and Autonomous Systems, 52(1).
[10] B. Krose and R. Bunschoten. Probabilistic localization by appearance models and active vision. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[11] D. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh International Conference on Computer Vision (ICCV).
[12] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. In Proceedings of the International Conference on Computer Vision (ICCV).
[13] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision (ECCV).
[15] S. Se, D. Lowe, and J. Little. Vision-based mobile robot localization and mapping using scale-invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[16] H. Tamimi, H. Andreasson, A. Treptow, T. Duckett, and A. Zell. Localization of mobile robots with omnidirectional vision using particle filter and iterative SIFT. In Proceedings of the European Conference on Mobile Robots (ECMR).
[17] I. Ulrich and I. Nourbakhsh. Appearance-based place recognition for topological localization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[18] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Pose tracking from natural features on mobile phones. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR).
[19] J. Wolf, W. Burgard, and H. Burkhardt. Robust vision-based localization for mobile robots using an image retrieval system based on invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.

(a) (b) (c) (d)
Figure 12: The localization system and augmented reality. (a) The captured image is rotated. (b) The image is captured at a farther distance. (a) and (b) are within the same cluster area; (c) and (d) are two examples captured in other cluster areas.

(a) (b)
Figure 13: Occlusion handling of the localization system. Due to its patch property, the proposed system is effective in handling occlusions.
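The overlapped clustering of locations by UTM coordinates and the resulting "soft handoff" between clusters can be sketched as follows. The radius, the hysteresis factor, and the function names are illustrative assumptions of this sketch; the paper does not give these parameters:

```python
import numpy as np

def build_overlapped_clusters(centers, locations, radius):
    """Assign each location to every cluster center within `radius` (UTM metres).

    Because of the overlap, locations near a boundary belong to several
    clusters, so a nearby query always finds its images in the active cluster.
    """
    clusters = {i: [] for i in range(len(centers))}
    for j, loc in enumerate(locations):
        dists = np.linalg.norm(centers - loc, axis=1)
        for i in np.flatnonzero(dists <= radius):
            clusters[i].append(j)
    return clusters

def select_cluster(current, user_xy, centers, radius, hysteresis=1.2):
    """Soft handoff: keep the current cluster until the user leaves an
    enlarged radius, then switch to the nearest center. The hysteresis
    avoids ping-pong swaps of vocabulary trees at a cluster boundary."""
    dists = np.linalg.norm(centers - user_xy, axis=1)
    if current is not None and dists[current] <= radius * hysteresis:
        return current
    return int(np.argmin(dists))
```

At runtime, the GPS fix would drive `select_cluster`, and only the chosen cluster's vocabulary tree needs to be resident in memory.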


More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed

More information

Open-Set Face Recognition-based Visitor Interface System

Open-Set Face Recognition-based Visitor Interface System Open-Set Face Recognition-based Visitor Interface System Hazım K. Ekenel, Lorant Szasz-Toth, and Rainer Stiefelhagen Computer Science Department, Universität Karlsruhe (TH) Am Fasanengarten 5, Karlsruhe

More information

Segmentation & Clustering

Segmentation & Clustering EECS 442 Computer vision Segmentation & Clustering Segmentation in human vision K-mean clustering Mean-shift Graph-cut Reading: Chapters 14 [FP] Some slides of this lectures are courtesy of prof F. Li,

More information

Intelligent Flexible Automation

Intelligent Flexible Automation Intelligent Flexible Automation David Peters Chief Executive Officer Universal Robotics February 20-22, 2013 Orlando World Marriott Center Orlando, Florida USA Trends in AI and Computing Power Convergence

More information

Image Retrieval for Image-Based Localization Revisited

Image Retrieval for Image-Based Localization Revisited SATTLER et al.: IMAGE RETRIEVAL FOR IMAGE-BASED LOCALIZATION REVISITED 1 Image Retrieval for Image-Based Localization Revisited Torsten Sattler 1 tsattler@cs.rwth-aachen.de Tobias Weyand 2 weyand@vision.rwth-aachen.de

More information

Neural Network based Vehicle Classification for Intelligent Traffic Control

Neural Network based Vehicle Classification for Intelligent Traffic Control Neural Network based Vehicle Classification for Intelligent Traffic Control Saeid Fazli 1, Shahram Mohammadi 2, Morteza Rahmani 3 1,2,3 Electrical Engineering Department, Zanjan University, Zanjan, IRAN

More information

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

More information

Dynamic composition of tracking primitives for interactive vision-guided navigation

Dynamic composition of tracking primitives for interactive vision-guided navigation Dynamic composition of tracking primitives for interactive vision-guided navigation Darius Burschka and Gregory Hager Johns Hopkins University, Baltimore, USA ABSTRACT We present a system architecture

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Point Matching as a Classification Problem for Fast and Robust Object Pose Estimation

Point Matching as a Classification Problem for Fast and Robust Object Pose Estimation Point Matching as a Classification Problem for Fast and Robust Object Pose Estimation Vincent Lepetit Julien Pilet Pascal Fua Computer Vision Laboratory Swiss Federal Institute of Technology (EPFL) 1015

More information

Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects

Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects Interactive Segmentation, Tracking, and Kinematic Modeling of Unknown 3D Articulated Objects Dov Katz, Moslem Kazemi, J. Andrew Bagnell and Anthony Stentz 1 Abstract We present an interactive perceptual

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Online Learning of Patch Perspective Rectification for Efficient Object Detection

Online Learning of Patch Perspective Rectification for Efficient Object Detection Online Learning of Patch Perspective Rectification for Efficient Object Detection Stefan Hinterstoisser 1, Selim Benhimane 1, Nassir Navab 1, Pascal Fua 2, Vincent Lepetit 2 1 Department of Computer Science,

More information

Static Environment Recognition Using Omni-camera from a Moving Vehicle

Static Environment Recognition Using Omni-camera from a Moving Vehicle Static Environment Recognition Using Omni-camera from a Moving Vehicle Teruko Yata, Chuck Thorpe Frank Dellaert The Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 USA College of Computing

More information

Real-Time Camera Tracking Using a Particle Filter

Real-Time Camera Tracking Using a Particle Filter Real-Time Camera Tracking Using a Particle Filter Mark Pupilli and Andrew Calway Department of Computer Science University of Bristol, UK {pupilli,andrew}@cs.bris.ac.uk Abstract We describe a particle

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

Building an Advanced Invariant Real-Time Human Tracking System

Building an Advanced Invariant Real-Time Human Tracking System UDC 004.41 Building an Advanced Invariant Real-Time Human Tracking System Fayez Idris 1, Mazen Abu_Zaher 2, Rashad J. Rasras 3, and Ibrahiem M. M. El Emary 4 1 School of Informatics and Computing, German-Jordanian

More information

COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION

COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION Tz-Sheng Peng ( 彭 志 昇 ), Chiou-Shann Fuh ( 傅 楸 善 ) Dept. of Computer Science and Information Engineering, National Taiwan University E-mail: r96922118@csie.ntu.edu.tw

More information

Binary Image Scanning Algorithm for Cane Segmentation

Binary Image Scanning Algorithm for Cane Segmentation Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch ricardo.castanedamarin@pg.canterbury.ac.nz Tom

More information

Wide Area Localization on Mobile Phones

Wide Area Localization on Mobile Phones Wide Area Localization on Mobile Phones Clemens Arth, Daniel Wagner, Manfred Klopschitz, Arnold Irschara, Dieter Schmalstieg Graz University of Technology, Austria ABSTRACT We present a fast and memory

More information