GPS-aided Recognition-based User Tracking System with Augmented Reality in Extreme Large-scale Areas
Wei Guan, Suya You, Ulrich Neumann
Computer Graphics and Immersive Technologies, Computer Science, USC

ABSTRACT
We present a recognition-based user tracking and augmented reality system that works in extreme large-scale areas. The system provides a user who captures an image of a building facade with the precise location of the building and augmented information about it. Although GPS cannot provide camera pose information, it is used to reduce the search range in the image database. A patch-retrieval method is used for efficient computation and real-time camera pose recovery. With the patch matching as prior information, the whole-image matching can then be done efficiently through propagations, so that a more stable camera pose can be generated. Augmented information such as building names and locations is then delivered to the user.

The proposed system mainly contains two parts: offline database building and online user tracking. The database is composed of images of different locations of interest. The locations are clustered into groups according to their UTM coordinates; an overlapped clustering method is used in order to restrict the retrieval range and avoid ping-pong effects. For each cluster, a vocabulary tree is built for searching the most similar view. In the tracking part, the rough location of the user is obtained from GPS, and the exact location and camera pose are calculated by querying patches of the captured image. The patch approach makes the tracking robust to occlusions and dynamics in the scenes. Moreover, due to the overlapped clusters, the system simulates a "soft handoff" feature and avoids frequent swaps of memory resources.
Experiments show that the proposed tracking and augmented reality system is efficient and robust in many cases.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Image Retrieval, Performance Analysis; I.4.9 [Image Processing and Computer Vision]: Applications - Scene Recognition, User Tracking

General Terms
Performance

Keywords
Patch Approach, Real-time Recognition, Augmented Reality

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MMSys'11, February 23-25, 2011, San Jose, California, USA. Copyright 2011 ACM /11/02...$

Figure 1: The proposed tracking and augmented reality system. (a) The image captured by the user. (b) The nearest cluster is selected (yellow dots). The current location is shown with a blue dot and the estimated location with a red dot. (c) The image with augmented information: a 2D annotating label and a 3D annotating sign are embedded in the image.

1. INTRODUCTION
The Global Positioning System (GPS) is a space-based satellite system that can provide location information anywhere there is an unobstructed line of sight to more than three satellites. A receiver precisely times the signals sent by the GPS satellites and determines its distances to them; these distances and the satellite locations are then used to calculate the receiver's position. While GPS is widely used to aid navigation, many applications demand more accurate locations and more detailed sensor information for better environment-interaction capabilities.
Vision sensors provide a tremendous amount of information about the user's environment and are considered one of the most powerful sources of information among all sensors. Not only can they provide more accurate location information, they can also provide users with more context-based information, such as the appearance of buildings. However, due to the wealth of information they provide, the processing usually takes much longer than for other sensor types such as ultrasound and inertial sensors. Moreover, a picture by itself provides no location information; the location must be estimated by matching against images in a database. Image querying is therefore an essential process, which consumes extra computational resources.

Determining locations from vision-based sensors is a critical problem in the vision and robotics communities. When a user obtains information from the optical sensors, the visual information is summarized and compared with existing landmarks. Point-based features such as SIFT [11] or SURF [3] are usually used in the matching process due to their robustness. However, because of the complexity of generating and matching such features, the image retrieval and matching processes are not fast enough for applications such as real-time tracking and augmented reality.

We propose a novel system that can track the user with augmented information in large-scale areas. The system works in real time by speeding up the retrieval and matching processes. Using the GPS information, the system first selects the nearest cluster and loads the corresponding database. Then, for the image captured by the user, the system picks the most promising part of the image and uses it to query the best matching patch in the existing database. The query results are used to refine the user's location.
The calculated camera pose is still not stable or accurate enough for applications like augmented reality, since the querying features are located in a small area of the image. So in the next step, an algorithm propagates the matchings to the whole image. The searching range for feature matching is largely restricted; therefore, the matching speed is significantly increased and the calculated camera pose is more accurate.

Another advantage of the proposed framework is its ability to handle occlusions and dynamics. It is common that newly captured images differ from the existing images in the database due to moving pedestrians and objects. The proposed algorithm picks patches from the non-occluded parts and matches them against the database; in most cases, the framework is robust to large occlusions and dynamics.

The remainder of this paper presents the proposed system in more detail. After discussing related work in Section 2, we provide an overview of the proposed system in Section 3. The process of building the database is presented in Section 4. Following that, we describe the recognition-based user tracking in Section 5. In Section 6, we propose a propagation method for camera pose refinement. We show experimental results in Section 7 and conclude the paper in Section 8.

2. RELATED WORK
Over the past years, many vision-based localization systems have been proposed. Depending on the features they use to describe images and the methods they exploit for matching, these systems can be divided into two categories. In the first category [7, 8, 5, 4], simple features like lines and colors are used, but sophisticated learning techniques are usually required to locate the users. Horswill [7] extracts features of the environment like walls, doors or openings and identifies the robot position from these features. The algorithm is efficient due to its specialization to its task and environment.
In [8], vertical lines are extracted from images and combined with distance information obtained from ultrasound sensors; a Bayesian network is used for the combination and the estimation of the robot location. Dodds and Hager [5] use a color interest operator consisting of a weighted combination of heuristic scores to identify landmarks. The operator can select regions that are robust representations for scene recognition. A Bayesian filtering method was proposed in [4]; it uses a sampling-based representation and localizes the robot using scalar brightness measurements.

In the second category, more sophisticated features are used [15, 16, 19, 9]. In [15], Se et al. propose a vision-based simultaneous localization and mapping (SLAM) system that tracks SIFT features. SIFT features [11] are robust to scale, orientation and viewpoint variations, so they are good natural visual landmarks for tracking over long periods of time from different views. Tamimi et al. [16] propose an approach that reduces the number of features generated by SIFT; with the help of a particle filter, the robot location can still be estimated accurately. In [19], Wolf et al. use local scale-invariant features combined with Monte Carlo localization to estimate robot positions. The system is robust against occlusions and dynamics such as people walking by. In [9], scale-invariant features are also used, combined with a proposed probabilistic environment model in order to locate the robot.

To match images in large-scale databases, image retrieval techniques are usually exploited [19, 10, 17, 13]. Wolf et al. [19] make use of an image retrieval technique together with sample-based Monte Carlo localization to extract the possible viewpoints for the current image. Krose and Bunschoten [10] describe a vision-based localization method that uses principal component analysis on images captured at different locations.
In [17], Ulrich and Nourbakhsh propose an efficient approach that uses color histogram matching for localization; the color images can be classified in real time based on a nearest-neighbor learning and voting scheme. In [13], Nister and Stewenius propose a recognition scheme that scales to large numbers of objects. The scheme builds upon indexing descriptors based on SIFT features and efficiently integrates indexing and hierarchical quantization with a vocabulary tree.

Some systems are not specially designed for localization but try to recognize objects with less computational cost. Wagner et al. [18] present a modified SIFT that is created from a fixed patch size of 15x15 pixels and forms a descriptor with only 36 dimensions. The modified feature is used for efficient natural feature tracking at interactive speed on current-generation phones. Henze et al. [6] combine the simplified SIFT with a scalable vocabulary tree to achieve interactive object recognition on mobile phones; the simplified features consume less computation, which is necessary for mobile applications. Azad et al. [2] present a combination of the Harris corner detector and the SIFT descriptor, which computes features with high repeatability and good matching properties. By replacing the SIFT keypoint detector with an extended Harris corner detector, the algorithm can generate features in real time.

The techniques described above match images captured by vision sensors with existing landmarks in a database. The goal of this paper is to propose a novel system that can track users in large-scale areas and provide them with augmented information. We describe how the landmarks are built in the database and how images can be queried efficiently. In practical experiments we demonstrate that our system is able to locate the user and recover the camera pose at real-time speed, and that it is robust to large occlusions and dynamics.
3. OVERVIEW OF THE SYSTEM
The system consists of two main parts: database building and user tracking. As shown in Fig. 2, each part contains several steps. In the database building process, we first select locations of interest in the large-scale area and take images from different viewpoints for each location. Then we use the bundler algorithm [1] to build a 3D point cloud per location. The point cloud is manually registered with the world coordinates, i.e., the UTM coordinates are assigned and the scale and orientation are adjusted. We use an overlapped clustering method based on k-means to cluster these locations so that each cluster contains a few nearby locations; due to the inaccuracy of GPS devices, locations that are nearly equally close to two or more cluster centers are assigned to all of these clusters. The images are partitioned into smaller patches, and the patches in each cluster are put into a vocabulary tree for retrieval in the tracking process.

In the online tracking part, the user's rough location is obtained through a GPS device and the closest cluster is determined. When the user captures a new image, the image is partitioned into patches, and the most distinctive patches are used for retrieval through the corresponding vocabulary tree. The more accurate location and the camera pose are then recovered from 2D-to-3D matchings. We discuss each step in more detail in the following sections.

4. DATABASE BUILDING
We use a method similar to [1] to build the environment model. Instead of obtaining images from the web and covering the whole area, we take the images ourselves and only cover the locations of interest, such as building facades. At each location, images are taken from different views at different distances. SURF features [3] are used due to their speed and robustness. Moreover, the UTM coordinates and building-related information for each location are also recorded.
Such information can be obtained from a GPS device or input manually. In our system, 120 locations are recorded. For each location, we take 10 images from different viewing angles and distances, so there are 1,200 images in total in our database.

4.1 Overlapped Clustering with K-Means
To reduce the searching range, we cluster the locations of interest into groups. The user is associated with the cluster whose center is nearest to his or her rough location, represented as the UTM coordinates returned by the GPS device, so the retrieval process can be done within that cluster. To build K overlapped clusters, we first use the traditional k-means method to obtain K non-overlapped clusters; locations at the boundaries are then assigned to all clusters that are nearly as close as the closest one. The steps are as follows:

1. Suppose there are N different locations, and the UTM coordinates (easting and northing) of location i are L_i = (X_i, Y_i), i = 1..N. Let the maximum error of the GPS device be E, and the longest distance from which a user is allowed to take pictures of a location be D. Let H be the handoff threshold (discussed in a later section). We want to cluster the locations into K groups.
2. (Traditional k-means) Randomly select K locations and consider them the centers of K clusters.
3. For each location, assign it to the cluster whose center is nearest to it.
4. For each cluster, recalculate the cluster center. Repeat steps 3 and 4 until convergence, then go to step 5.
5. (Overlapped assignment) For each location i, let d_i be the distance to its nearest cluster center, and let the centers of the K clusters be C_j = (X_j, Y_j), j = 1..K. We assign location i to cluster j if and only if ||L_i - C_j||_2 <= d_i + max(E, D, H).

Steps 2 to 4 are the traditional k-means clustering method. We add one more step for overlapped assignment to handle three problems.
First, GPS readings are unstable and inaccurate. Second, the position of the user usually differs from the location of interest itself. Third, the user may move back and forth within a certain area, which can cause unnecessary swaps of databases in memory. In our system, we have N = 120 locations and cluster them into K = 5 groups. The clustering results are shown in Fig. 3.

Figure 3: The overlapped clustering with N = 120 and K = 5. Only half of the locations are displayed in the figure.

4.2 Image Partitioning
When we build the database, instead of bagging SURF features per image, we make smaller bags based on patches. For a 640 by 480 image, we partition it into an 8 by 8 grid. As shown in Fig. 4, there are 4 different patch sizes, and two neighboring patches of the same size overlap by half the patch size. For example, in Fig. 4, the two patches of size 1x1 grid have an overlap of half a grid. For an 8 by 8 partitioning, there are 225 + 49 + 9 + 1 = 284 patches per image. Therefore, there are 1,200 x 284 = 340,800 patches in total, i.e., on average 340,800/5 = 68,160 patches per cluster. Moreover, we remove patches that contain too few features, since they are not good representations of the locations; after removal, there are around 30,000 to 40,000 patches per cluster.

As we can see, many of the patches share SURF features; in other words, each SURF feature is contained in many different patches. Therefore, when we calculate the visual words of these features, we need to make sure each feature is quantized only once to avoid overheads caused by the partitioning. It is also important to calculate the SURF features first and then partition them into patches; otherwise, the features near patch boundaries would not be described correctly.
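The patch counts above can be checked with a short sketch of the layout described in the text (patch sizes of 1, 2, 4 and 8 grid cells, with a half-size stride between same-size neighbors):

```python
# Count the overlapping patches of an 8x8 grid as described above:
# patch sizes of 1, 2, 4 and 8 grid cells, where two neighboring
# patches of the same size overlap by half a patch (stride = size/2).
def count_patches(grid=8, sizes=(1, 2, 4, 8)):
    total = 0
    for size in sizes:
        stride = size / 2
        # start positions per axis: 0, size/2, size, ..., grid - size
        per_axis = int((grid - size) / stride) + 1
        total += per_axis * per_axis   # same count in x and y
    return total

print(count_patches())         # 225 + 49 + 9 + 1 = 284
print(1200 * count_patches())  # 340800 patches over all 1,200 images
```

The per-size counts are 15x15, 7x7, 3x3 and 1x1 start positions, which sum to the 284 patches per image used throughout the paper.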
Figure 2: Overview of the system.

Figure 4: An image is partitioned into 8 by 8 small grids. Four different patch sizes are used: 1x1 grid, 2x2 grids, 4x4 grids and 8x8 grids (the whole image).

4.3 Scalable Vocabulary Tree with Patches
Every patch is represented by a bag of SURF features. For a large number of patches, comparing the 64-dimensional SURF feature vectors between every two images is extremely expensive, so a vocabulary tree is used to quantize SURF descriptors into more compact features. A vocabulary tree is a hierarchically structured tree that efficiently integrates quantization and classification. The classification results can further be used for indexing based on a well-designed scoring scheme. The quantization is built by hierarchical k-means clustering, and the tree can be trained unsupervised with a large set of SURF features. In our implementation, the vocabulary tree has 6 levels and each level has 10 branches, so there are 1 million leaf nodes or classes. A patch usually contains hundreds of features, and each feature generates a visual word by going through the vocabulary tree, so a patch can be represented by a bag of visual words. It can further be described by the frequency distribution of the visual words, represented as a vector whose length equals the number of leaf nodes. To compare two patches, we only need to compare the two distribution vectors. Though the vectors are very high-dimensional, they have only a few non-zero elements, so the comparison can be done in little time. When a new patch is queried, a score is calculated for the comparison with every patch containing the same nodes along the path.

To calculate the distribution vector for a patch, the most time-consuming part is the quantization process. Instead of going through the vocabulary tree patch by patch, we first quantize all the features in the image to avoid duplicate quantization. The distribution vectors can then be calculated hierarchically in a simple way. As shown in Fig. 5-(a), let the numbers of features for patches 1 to 6 be N_1 to N_6. The distribution vectors for patch 7 and patch 8 can be calculated as

    v_7 = (sum_{k=1..4} v_k N_k) / (sum_{k=1..4} N_k),   v_8 = (sum_{k=3..6} v_k N_k) / (sum_{k=3..6} N_k).    (1)

For each image, though we have 284 distribution vectors instead of one, the total time for calculating the vectors is negligible compared to the quantization process. For a 640 by 480 image on a 4 GHz CPU, the quantization process takes about 70-80 ms, while calculating all the vectors takes about 2-4 ms. Therefore, the overhead of image partitioning is no more than 3 percent.
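As a minimal sketch of the hierarchical combination in Eq. (1): the vector of a parent patch is the feature-count-weighted average of its children's vectors. Here sparse vectors are modelled as plain dicts, which is an assumed layout, not the paper's implementation:

```python
# Combine child-patch distribution vectors into the vector of their
# union per Eq. (1): a weighted average, with each child weighted by
# its feature count N_k. Sparse vectors are {visual_word: frequency}.
def combine(children):
    """children: list of (vector, feature_count) pairs."""
    total = sum(n for _, n in children)
    parent = {}
    for vec, n in children:
        for word, freq in vec.items():
            parent[word] = parent.get(word, 0.0) + freq * n / total
    return parent

# e.g. merging two small patches into their parent patch
v1 = {"w3": 0.5, "w7": 0.5}   # patch with 10 features
v2 = {"w3": 1.0}              # patch with 30 features
print(combine([(v1, 10), (v2, 30)]))  # {'w3': 0.875, 'w7': 0.125}
```

Because only the per-word sums change, the parent vectors cost a handful of additions per feature, matching the 2-4 ms figure above.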
5. RECOGNITION-BASED TRACKING
In the online stage, we first associate the user with the nearest cluster. The most distinctive patches in the new image are then matched with images in the corresponding database. The overlapped clusters are used to achieve a "soft handoff" feature. The use of patches greatly increases the retrieval speed and thus reduces the localization time; with careful selection of the patch used for retrieval, the performance remains competitive with using the whole image. In cases where large occlusions exist, the patch-based method can perform even better.

5.1 Cluster Association and Soft Handoff
With the user's UTM location provided by GPS, the cluster whose center is nearest to the user can be found, and the corresponding database is loaded into memory. However, when a user is at a location nearly equally close to two cluster centers and frequently crosses the boundary, there will be unnecessary memory swaps. Therefore, we set a handoff threshold H to avoid such overheads.

Figure 6: The soft handoff process. H is the threshold.

As shown in Fig. 6, C_1 and C_2 are two clusters. The user is initially at point a, which is closer to cluster C_1. When the user crosses the boundary to point b, which is closer to cluster C_2, he or she is still associated with cluster C_1, and the association does not change if the user goes back to point a. However, if the user goes further by an extra distance H to point c, the association changes to cluster C_2; when he or she then goes back to b or a, the user stays associated with cluster C_2. In our system, since the GPS has an error E and the user may take pictures at a distance D from the location of interest, we use max(E, D, H) instead of H.

5.2 Patch Distinctiveness
Similar to the partitioning in the database building process, an image is partitioned into 8 by 8 smaller patches. However, in the querying process we do not overlap patches, so only 64 patches are used. For each image, the top 5 patches are selected according to the number of features, as shown in Fig. 7. The quantization process for these patches takes about 10-20 ms, which is 5 to 8 times faster than using the whole image.

Figure 7: The top 5 patches are selected according to feature numbers. The selected patches are shown in red rectangles.

The patch with the largest number of features is considered the one with the richest visual information. However, these features may not be the best for image retrieval, because some features may not be as distinctive as others in identifying the patch. By learning from the database images, we can estimate the distinctiveness of the features.

One way to measure feature distinctiveness is to calculate feature frequencies. For a specific feature, the more patches contain it, the less distinctive it is; conversely, if the feature is contained in only one patch, it is considered very distinctive. For any quantized feature (visual word), we count the number of patches that contain it; only the smallest non-overlapping patches in the database are counted. The process is illustrated with a simpler vocabulary tree in Fig. 5-(b).

Figure 5: (a) The hierarchical way to calculate the distribution vector. (b) A 3-level vocabulary tree with 5 features (a, b, c, d, e). The weights for levels 1 to 3 are 1/8, 1/4 and 1/2. The numbers in parentheses are the numbers of features going through each node.

Suppose there are 5 patches and each patch contains only one feature. The 5 features are a, b, c, d, e, whose paths are shown in red. As we can see, a and b are quantized to the same value, so there are 4 different features in total, and their frequencies are f_a = f_b = 2, f_c = f_d = f_e = 1. The distinctiveness weight is

    w_i = 1/f_i for i = a, b, c, d, e;  w_i = 0 for all other features.    (2)

With such weights, features a and b are less distinctive than features c, d and e. Features c and d are similar, so they should be less distinctive than feature e; however, the above assignment cannot distinguish this. We improve it by assigning weights to the nodes along a feature path at different levels. For an n-level vocabulary tree, the weights are 1/2^n, 1/2^(n-1), ..., 1/2 for nodes at level 1 to level n. For any path p in the tree, let the number of patches that go through its node at level i be N_i. The frequency and weight of the path p are calculated as

    f_p = sum_{i=1..n} N_i / 2^(n+1-i);  w_p = 1/f_p for f_p != 0, and w_p = 0 for f_p = 0.    (3)
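As a sketch, the path-weighted frequency f_p of Eq. (3) can be computed for the 3-level toy tree of Fig. 5-(b); the per-level counts used below are assumptions chosen to be consistent with the worked values in the text, since the figure itself is not reproduced here:

```python
# Path-weighted frequency per Eq. (3): level i of an n-level tree gets
# weight 1/2^(n+1-i) (i.e. 1/8, 1/4, 1/2 for n = 3), and counts[i-1]
# is the number of patches passing through the path's node at level i.
def path_frequency(counts):
    n = len(counts)
    return sum(N / 2 ** (n + 1 - i) for i, N in enumerate(counts, 1))

# Assumed per-level counts for the five features of Fig. 5-(b):
paths = {
    "a": [4, 2, 2],  # a and b share every node, including the leaf
    "b": [4, 2, 2],
    "c": [4, 2, 1],  # c and d share levels 1-2 but not the leaf
    "d": [4, 2, 1],
    "e": [1, 1, 1],  # e has a path of its own
}
freqs = {f: path_frequency(p) for f, p in paths.items()}
weights = {f: 1.0 / fp for f, fp in freqs.items()}  # w_p = 1/f_p
print(freqs)  # a, b -> 2.0; c, d -> 1.5; e -> 0.875
```

Under these counts, e gets the largest weight, c and d a smaller one, and a and b the smallest, which is the ordering the refinement is designed to produce.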
With the new weight assignment, the frequencies for a to e are f_a = f_b = 2, f_c = f_d = 1.5 and f_e = 0.875. In this way, e is considered more distinctive than c and d. With the top 5 patches selected, we measure the distinctiveness of each patch by summing the weights of all its features; the 5 patches are then ranked by distinctiveness, and the ordered patches are used for querying the database.

5.3 Querying with Patches
The more distinctive the patch, the higher the probability that the correct patch is retrieved from the database. Therefore, we query patches in the order of their distinctiveness. In the retrieval process, a similarity score is assigned to each database patch that shares features with the querying patch. For any two patches, the similarity score is calculated by multiplying their distribution vectors; to account for the distinctiveness of different features, we also multiply by the distinctiveness weights. Let v_1 = (e_11 e_12 ... e_1n) and v_2 = (e_21 e_22 ... e_2n) be the two vectors of length n. The score S_12 is calculated as

    S_12 = sum_{p=1..n} e_1p e_2p w_p,    (4)

where w_p is the distinctiveness weight for path p. For each of the 5 selected querying patches, the top 3 patches in the database with the highest scores are returned. As shown in Table 1, the top 1 query results are not always reliable; however, the probability is more than 90% that the correct patch is among the top 3 returned patches when querying the best selected patch. The retrieval accuracy can be increased further by querying more patches. From the table, we see that using 3 of the 5 patches increases the accuracy to more than 95%, while using all 5 patches does not increase it much further. Therefore, we use 3 querying patches, which return 9 patches in total.
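Equation (4) is a distinctiveness-weighted dot product over the shared visual words. A minimal sketch, again using sparse dict vectors as an assumed representation:

```python
# Similarity score of Eq. (4): multiply the two distribution vectors
# element-wise and weight each visual word by its distinctiveness w_p.
# Only words present in both sparse vectors contribute to the score.
def similarity(v1, v2, w):
    shared = v1.keys() & v2.keys()
    return sum(v1[p] * v2[p] * w.get(p, 0.0) for p in shared)

v1 = {"w1": 0.2, "w2": 0.8}
v2 = {"w2": 0.5, "w9": 0.5}
w = {"w1": 1.0, "w2": 0.5, "w9": 0.25}
print(similarity(v1, v2, w))  # only w2 is shared: 0.8 * 0.5 * 0.5 = 0.2
```

Since only shared words contribute, the cost per comparison is proportional to the small number of non-zero elements, as noted in Section 4.3.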
              Top 1    Top 2    Top 3
    Rank 1      -      85.3%    90.7%
    Rank 2      -      89.3%    93.2%
    Rank 3      -      91.2%    96.3%
    Rank 4      -      92.5%    96.5%
    Rank 5      -      93.1%    96.7%

Table 1: The accuracy rate of patch retrieval. The value in the ith column and jth row is the probability that the correct patch is returned by querying j patches, each returning the top i results.

To find the correct patch, RANSAC is used to verify that the matchings are correct. In our implementation, we set the inlier threshold to 20. On average 4 patch matchings are needed, costing about 10-15 ms.

5.4 More Accurate Locations
When we build the database with the method in [1], a 3D feature point cloud is generated for each location of interest from images at different positions and views. Each 3D feature point corresponds to some 2D features in the database patches. When we match features from the querying patch with features in the retrieved patch, we therefore also obtain matchings with the 3D points, so the camera pose can be calculated from the 2D-to-3D matchings. Let the camera position for the returned patch be L_w = (X_w, Y_w, Z_w) in world coordinates, the camera pose of the returned patch be P_c = [R_c T_c], and the pose of the current camera be P'_c = [R'_c T'_c]. The user's location in world coordinates is then calculated as

    L'_w = (X'_w, Y'_w, Z'_w) = L_w + (T_c - T'_c),    (5)

where Z'_w is set to 0 if the height information is unknown.

6. CAMERA POSE REFINEMENT
The location information is obtained through patch querying and camera pose calculation. However, the calculated pose is not accurate enough for some applications; for example, augmented reality applications usually need high accuracy in order to correctly place virtual objects into the real scene. The pose calculated from the matched patch pair is not accurate because all the features are located in a small area of the image. With matching propagations, we can obtain a much more accurate pose.
6.1 Matching Propagations
We can calculate the homography from the matched features in the two patches. This is reasonable because most building facades are more or less planar. With the estimated homography H_1, we can predict the locations of the matches for feature points that lie outside the patch. The procedure is shown in Fig. 8. The red rectangles represent the two matching patches. H_1 is calculated from the matched features, and x_2 is one of the feature points outside the patch. The location of the corresponding point for x_2 can be estimated as x'_2 = H_1 x_2, and the real match point x''_2 can then be found by searching the neighborhood of x'_2.

Figure 8: The propagation process. Two propagations are shown in the figure. H_1 is calculated from the matched points in the red area (x_1 and x'_1 are one example matching pair). The corresponding point for x_2 is estimated as x'_2 = H_1 x_2, and the real corresponding point x''_2 is found near x'_2. Then H_2 is calculated from all matched points within the blue rectangle, and H_3 is calculated similarly.

Not all matches of feature points lie in the nearby area. For example, after the first propagation, x''_3, the real match of x_3, is probably far from the estimated match point x'_3 calculated from H_1. This is because the further the feature points are from the matched pairs, the larger the errors produced by the homographic transformation. Therefore, a one-time propagation is not enough. For an image partitioned into 8 by 8 patches, we use three propagations with steps of one patch, two patches and four patches, as shown in Fig. 9. With the proposed propagations, the searching area for feature matching can be largely restricted. Another advantage of this matching scheme is that the RANSAC process can be simplified.
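The propagation step described above (predicting a match location through the patch homography and searching a small neighborhood for the real match) can be sketched as follows; the function name and the search radius are illustrative assumptions:

```python
import numpy as np

# One propagation step: map each feature point outside the patch
# through the homography H, then accept the nearest candidate feature
# in the other image if it lies within a small search window.
def propagate_matches(H, pts, candidates, radius=8.0):
    """pts: Nx2 points outside the patch; candidates: Mx2 points in
    the other image. Returns (i, j) index pairs of accepted matches."""
    matches = []
    for i, (x, y) in enumerate(pts):
        p = H @ np.array([x, y, 1.0])
        p = p[:2] / p[2]                      # predicted location H*x
        d = np.linalg.norm(candidates - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= radius:                    # restrict the search area
            matches.append((i, j))
    return matches

# With an identity homography, a point matches only a nearby candidate.
pts = np.array([[10.0, 20.0], [100.0, 40.0]])
cands = np.array([[11.0, 21.0], [300.0, 300.0]])
print(propagate_matches(np.eye(3), pts, cands))  # [(0, 0)]
```

Restricting the search to a window around the prediction is what removes most outliers before RANSAC, which is why the verification step can then use so few samples.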
This is because the number of outliers is largely reduced by restricting the search areas. We randomly select two inliers p1 and p2 in the patch, and two matched feature points p3 and p4 in the propagation area. The four matching pairs are used to compute a homography, which is then used to identify outliers. If there are too many outliers, it is very likely that the selected point p3 or p4 is itself an outlier; in this case, we randomly choose another two points and compute the pose again. If the number of outliers is small, we remove these outliers and compute the pose using all the remaining matched points. The matching time with 3 propagation steps is about 15-25 ms, while it is usually several times longer without the proposed patch approach and propagations.

Figure 9: The three propagations in the image matching process. Different colors are used for the correspondence lines in different propagations. The propagation steps are 1 patch, 2 patches, and 4 patches for the 1st, 2nd, and 3rd propagation, respectively.

6.2 Pose Refinement with Propagations

When the camera pose is estimated from the features in the patch area alone, the calculated pose is not accurate. The accuracy can be improved through matching propagations: once the matches are propagated to the whole image, the calculated pose is accurate enough for most applications. As described in the previous section, the propagation method is 5 to 6 times faster than the traditional way of feature matching.

Figure 10: Reprojection errors of the calculated pose after each propagation for Fig. 9. All the matched features are displayed, in the order of the propagations. Those with very large errors are outliers.

Fig. 10 shows the reprojection errors of all the matching points. We can see that the errors after the third propagation are the smallest. Without any propagations, the reprojection errors are large for the features that are far from the patch: for example, the errors in the third propagation area are much larger than those in the second propagation area, while the errors in the patch area are close to 0.

Figure 11: Reprojection errors of the calculated pose after each propagation for Fig. 9. Only the inlier features are displayed, in the order of the propagations.

Figure 11 shows similar results, but only the reprojection errors of the inliers are displayed.

7. EXPERIMENTAL RESULTS

7.1 System Speedup

With the overlapped clustering, we reduce the size of each searched database so that retrieval performance is not hurt by the overall scale. The retrieval time is reduced by querying a patch instead of the whole image, and the matching speed is increased by first matching a smaller patch and then propagating the matches to the whole image. Therefore, the whole process is sped up. Table 2 compares the computing time of the framework with the patch approach against the framework without it.

                    without patch (ms) | with patch (ms) | improved by
  Retrieval times                      |                 |
  Matching times                       |                 |
  Total (1)                            |                 |           %
  Total (2)                            |                 |           %
  Total (3)                            |                 |           %

Table 2: Comparison between the framework with the patch approach and the one without it. The SURF descriptor (10 ms) is employed, but different detectors are used; the nearest-neighbor method is used for the matching process. (1) SURF detector (300 ms); (2) extended Harris corner detector (25 ms); (3) FAST corner detector (<5 ms).

From the experiments, we conclude that if the SURF detector is used, the performance can be increased by up to 50%. If faster detectors such as the extended Harris corner detector [12] or the FAST corner detector [14] are used, the performance increases significantly, by up to 2 to 5 times. This is because the patch retrieval and matching process then become the most time-consuming parts of the whole system.
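The homography-based outlier test used during matching propagation can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, plain NumPy is assumed, the homography is estimated from the four selected correspondences (two patch inliers plus two propagation-area matches) with a direct linear transform, and matches whose reprojection error exceeds a pixel threshold are flagged as outliers.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Estimate a 3x3 homography from four (or more) 2D point pairs via the DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # null-space vector holds the homography entries
    return H / H[2, 2]         # fix the scale ambiguity

def reprojection_outliers(H, src, dst, thresh=3.0):
    """Indices of matches whose reprojection error under H exceeds thresh pixels."""
    pts = np.hstack([np.asarray(src, float), np.ones((len(src), 1))])  # homogeneous
    proj = pts @ H.T
    proj = proj[:, :2] / proj[:, 2:3]                # back to inhomogeneous coords
    errors = np.linalg.norm(proj - np.asarray(dst, float), axis=1)
    return np.where(errors > thresh)[0]

# Four selected correspondences (two patch inliers + two propagation-area matches),
# here generated from a synthetic pure-translation homography for illustration.
quad_src = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
quad_dst = quad_src + np.array([5., -3.])
H = homography_from_pairs(quad_src, quad_dst)

# Check all matches in the propagation area; the last pair is a gross mismatch.
all_src = np.vstack([quad_src, [[3., 4.], [7., 2.]]])
all_dst = np.vstack([quad_dst, [[8., 1.], [50., 50.]]])
print(reprojection_outliers(H, all_src, all_dst))    # → [5]
```

Mirroring the retry loop in the text, if the check flags many matches, another random pair from the propagation area would be drawn and the homography re-estimated; with few outliers, they are dropped and the pose recomputed from the surviving matches.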
7.2 Localization and Augmented Reality

We conducted many experiments with our localization system outdoors. Some of the results are shown in Fig. 12 and Fig. 13. As we can see, the system is demonstrated to be robust in many different cases, such as images captured with rotations, at different distances, and with large occlusions. The user locations are displayed as red dots on the map. The building names and locations are also displayed with a 2D label and a 3D sign in the captured images.

8. CONCLUSION

We propose an efficient system for user tracking and augmented reality in large-scale areas. The system is composed of two parts: an offline database building process and an online user tracking process. The database is built from images captured at many locations with different viewing directions and distances. The locations are clustered with overlaps to limit the database size. The images are partitioned into patches of different sizes, and each patch is represented by a distribution of visual words generated from the quantization of SURF descriptors. A 3D point cloud is also built for each location and used for pose calculation. At the online stage, the new image is likewise partitioned into smaller patches. The nearest cluster is selected, and the most distinctive patch is used to retrieve similar patches from the corresponding database for that cluster. The camera pose can then be calculated from the 2D-to-3D matches. Since these features are located in a small area, the calculated pose is not accurate enough for augmented reality applications; therefore, the pose is further refined through matching propagations in an efficient way. Experiments show that the proposed system works well in large-scale areas. Furthermore, the system is demonstrated to be robust to large occlusions and dynamics in the images. Through matching propagations, the pose can be generated with the expected accuracy.

9. REFERENCES

[1] S. Agarwal, N. Snavely, I. Simon, S. Seitz, and R. Szeliski. Building Rome in a day. In Proceedings of the International Conference on Computer Vision (ICCV).
[2] P. Azad, T. Asfour, and R. Dillmann. Combining Harris interest points and the SIFT descriptor for fast scale-invariant object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[3] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3).
[4] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the CONDENSATION algorithm for robust, vision-based mobile robot localization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Z. Dodds and G. D. Hager. A color interest operator for landmark-based navigation. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[6] N. Henze, T. Schinke, and S. Boll. What is that? Object recognition from natural features on a mobile phone. In Proceedings of the Workshop on Mobile Interaction with the Real World.
[7] I. Horswill. Polly: A vision-based artificial agent. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[8] D. Kortenkamp and T. Weymouth. Topological mapping for mobile robots using a combination of sonar and vision sensing. In Proceedings of the National Conference on Artificial Intelligence (AAAI).
[9] J. Kosecka, F. Li, and X. Yang. Global localization and relative positioning based on scale-invariant keypoints. Robotics and Autonomous Systems, 52(1).
[10] B. Krose and R. Bunschoten. Probabilistic localization by appearance models and active vision. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[11] D. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh International Conference on Computer Vision (ICCV).
[12] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. In Proceedings of the International Conference on Computer Vision (ICCV).
[13] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision (ECCV).
[15] S. Se, D. Lowe, and J. Little. Vision-based mobile robot localization and mapping using scale-invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[16] H. Tamimi, H. Andreasson, A. Treptow, T. Duckett, and A. Zell. Localization of mobile robots with omnidirectional vision using particle filter and iterative SIFT. In Proceedings of the European Conference on Mobile Robots (ECMR).
[17] I. Ulrich and I. Nourbakhsh. Appearance-based place recognition for topological localization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
[18] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Pose tracking from natural features on mobile phones. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR).
[19] J. Wolf, W. Burgard, and H. Burkhardt. Robust vision-based localization for mobile robots using an image retrieval system based on invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.
Figure 12: The localization system and augmented reality. (a) The captured image is rotated. (b) The image is captured at a farther distance. (a) and (b) are within the same cluster area; (c) and (d) are two examples captured in other cluster areas.
Figure 13: Occlusion handling in the localization system. Due to its patch property, the proposed system is effective at handling occlusions.
More informationLarge-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationAutomatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo
More information14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)
Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,
More informationBuilding an Advanced Invariant Real-Time Human Tracking System
UDC 004.41 Building an Advanced Invariant Real-Time Human Tracking System Fayez Idris 1, Mazen Abu_Zaher 2, Rashad J. Rasras 3, and Ibrahiem M. M. El Emary 4 1 School of Informatics and Computing, German-Jordanian
More informationCOLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION
COLOR-BASED PRINTED CIRCUIT BOARD SOLDER SEGMENTATION Tz-Sheng Peng ( 彭 志 昇 ), Chiou-Shann Fuh ( 傅 楸 善 ) Dept. of Computer Science and Information Engineering, National Taiwan University E-mail: r96922118@csie.ntu.edu.tw
More informationBinary Image Scanning Algorithm for Cane Segmentation
Binary Image Scanning Algorithm for Cane Segmentation Ricardo D. C. Marin Department of Computer Science University Of Canterbury Canterbury, Christchurch ricardo.castanedamarin@pg.canterbury.ac.nz Tom
More informationWide Area Localization on Mobile Phones
Wide Area Localization on Mobile Phones Clemens Arth, Daniel Wagner, Manfred Klopschitz, Arnold Irschara, Dieter Schmalstieg Graz University of Technology, Austria ABSTRACT We present a fast and memory
More information