Fast Multidimensional Nearest Neighbor Search Algorithm Based on Ellipsoid Distance


International Journal of Advanced Intelligence, Volume 1, Number 1, November, © AIA International Advanced Information Institute

Fast Multidimensional Nearest Neighbor Search Algorithm Based on Ellipsoid Distance

Tadashi Uemiya
Faculty of Engineering, Tokushima University, 2-1 Minami-josanjima, Tokushima, Japan
uchikosi@helen.ocn.ne.jp

Yoshihide Matsumoto
Faculty of Engineering, Tokushima University, 2-1 Minami-josanjima, Tokushima, Japan
tecumacu@gmail.com

Daichi Koizumi
Faculty of Engineering, Tokushima University, 2-1 Minami-josanjima, Tokushima, Japan
koizumi@is.tokushima-u.ac.jp

Masami Shishibori
Faculty of Engineering, Tokushima University, 2-1 Minami-josanjima, Tokushima, Japan
bori@is.tokushima-u.ac.jp

Kenji Kita
Center for Advanced Information Technology, Tokushima University, 2-1 Minami-josanjima, Tokushima, Japan
kita@is.tokushima-u.ac.jp

Received (January 2009)
Revised (August 2009)

The nearest neighbor search in high-dimensional spaces is an interesting and important problem that is relevant for a wide variety of applications, including multimedia information retrieval, data mining, and pattern recognition. For such applications, the curse of high dimensionality tends to be a major obstacle in the development of efficient search methods. This paper addresses the problem of designing a new and efficient algorithm for high-dimensional nearest neighbor search based on ellipsoid distance. The proposed algorithm uses Cholesky decomposition to convert the data beforehand so that calculation with the ellipsoid distance function can be replaced with calculation of the Euclidean distance, and it improves efficiency by omitting unnecessary operations. Experimental results indicate that our scheme scales well even for a very large number of dimensions.

Keywords: Nearest neighbor search; high-dimensional space; ellipsoid distance; multimedia information retrieval.

1. Introduction

High-performance computing and large-capacity storage at lower prices have led to explosive growth in multimedia information, and the need for information retrieval technology that can handle multimedia content is growing stronger by the day. In recent years, content-based searches that perform similarity searches based on feature quantities extracted from multimedia data have become the mainstream in searches of multimedia content. In many cases, such systems express multiple feature quantities as multidimensional vectors and judge the degree of similarity between items of content on the basis of the distance between these vectors. For example, in text search, weight vectors of index words can be used to express texts and search queries [1], while in image search, the image content can be represented by feature vectors involving color histograms, texture, shape, and other features [2, 3]. Similarity searches of content based on feature vectors thus come down to the problem of a nearest neighbor search, which seeks the target vectors that are closest to the vector given as a search query. Finding nearest neighbors in high-dimensional space is one of the important topics of current research, not only in multimedia content retrieval, but also in data mining, pattern recognition, and other fields of application. The main challenges in nearest neighbor search are increasing the search speed and improving the accuracy of the search.

We have already proposed a very fast search algorithm for finding nearest neighbors, achieved simply by improving the basic linear search [4]. That algorithm speeds up the process by eliminating unnecessary operations in the computation of the distance between vectors. By using the maximum distance among the current candidates as a bound, it is possible to cut off unnecessary calculations midway through a distance computation. Since updating the set of search result candidates requires removing the candidate with the maximum distance and inserting a new candidate, a priority queue is used as the data structure so that these operations can be carried out efficiently (a small sketch of such a candidate queue is given at the end of this chapter). Data conversion using dimension sorting by distribution value, together with principal component analysis, is used as pre-processing for the early detection of unnecessary operations.

For content similarity searches based on feature vectors, the definition of the distance that represents the degree of similarity is important. Euclidean distance is a typical distance measure, but it has the following problems: (1) Euclidean distance is extremely sensitive to the scales of the feature values, and (2) Euclidean distance is blind to correlations among features. In this paper, we propose a fast multidimensional nearest neighbor search algorithm based on ellipsoid distance. Ellipsoid distance takes the correlation among features into account when calculating distance. By using ellipsoid distance, the problems of scale and correlation inherent in Euclidean distance are no longer an issue. With the algorithm proposed in this paper, efficient elimination of unnecessary arithmetic operations is achieved by converting the calculation of ellipsoid distance into a calculation of Euclidean distance through a spatial transformation, performed using Cholesky decomposition, that pre-processes the data that are the targets of the search.
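As a rough illustration of the candidate management described above, the following is a minimal sketch in Python of a bounded candidate queue that keeps the k best results found so far and exposes the current worst distance as a pruning bound. It uses the standard heapq module; the class and variable names are our own illustrative choices, not code from the paper.

```python
import heapq

class CandidateQueue:
    """Keeps the k nearest candidates seen so far.

    heapq is a min-heap, so distances are stored negated to make the
    *largest* distance pop first (a max-heap); that is exactly the
    candidate that must be replaced when a closer vector is found.
    """

    def __init__(self, k):
        self.k = k
        self._heap = []          # entries are (-distance, item); item is usually a data index

    def bound(self):
        """Current maximum distance among the candidates (the pruning bound)."""
        if len(self._heap) < self.k:
            return float("inf")  # queue not full yet: nothing can be pruned
        return -self._heap[0][0]

    def offer(self, distance, item):
        """Insert a candidate, evicting the farthest one if the queue is full."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-distance, item))
        elif distance < self.bound():
            heapq.heapreplace(self._heap, (-distance, item))

    def results(self):
        """Candidates sorted from nearest to farthest."""
        return sorted((-d, item) for d, item in self._heap)
```

With such a queue, the value of bound() is exactly the threshold used in Chapter 3 to abandon a distance computation early.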

The remainder of this paper is organized as follows. Chapter 2 presents an overview of nearest neighbor searches of multidimensional data and describes the problems that multidimensional indexing technology faces in higher-dimensional space. Chapter 3 explains the fast nearest neighbor search algorithm that we have already proposed, after which Chapter 4 proposes a fast nearest neighbor search algorithm based on ellipsoid distance. Chapter 5 describes an experiment for assessing the validity of the proposed method, and finally, Chapter 6 concludes the paper and discusses future challenges.

2. Nearest Neighbor Search of Multidimensional Data

2.1. Nearest neighbor search of multidimensional data

A nearest neighbor search in a multidimensional space is the problem of finding the nearest vectors to a given vector (query vector) q among N data vectors (candidate vectors) x_i (i = 1, 2, ..., N) placed in n-dimensional space. There are two typical varieties of nearest neighbor search:

(i) k-nearest neighbor search (search restricted by number): the search attempts to find the k vectors closest to the given query vector q;
(ii) ε-nearest neighbor search (search restricted by range): the search attempts to find all vectors within a distance ε of the given query vector q, that is, the vectors x_i satisfying d(q, x_i) ≤ ε.

(A brute-force sketch of both query types is given at the end of this subsection.)

In a linear search, wherein the given vector is compared sequentially with all vectors in a database, the computational complexity increases in direct proportion to the database size. Therefore, the development of multidimensional indexing techniques for efficient nearest neighbor search has been attracting much attention [5]. There are various algorithms for multidimensional indexing in a Euclidean space, such as the R-tree [6], R+-tree [7], R*-tree [8], SS-tree [9], SS+-tree [10], CSS+-tree [11], X-tree [12], and SR-tree [13], as well as more general indexing methods for metric spaces, for example, the VP-tree [14], MVP-tree [15], and M-tree [16]. Such indexing techniques are based on restricting the search range by hierarchically partitioning the multidimensional search space, thereby limiting the scope of the basic linear search.
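For concreteness, here is a minimal brute-force sketch of the two query types above, written as a linear scan in Python with numpy. The function names and the use of numpy are our own illustrative assumptions, not part of the paper.

```python
import numpy as np

def knn_search(data, q, k):
    """k-nearest neighbor search: indices of the k vectors in `data`
    (shape N x n) closest to the query vector q."""
    dists = np.sum((data - q) ** 2, axis=1)      # squared Euclidean distances
    return np.argsort(dists)[:k]

def range_search(data, q, eps):
    """epsilon-nearest neighbor search: indices of all vectors whose
    distance from q is at most eps."""
    dists = np.sqrt(np.sum((data - q) ** 2, axis=1))
    return np.nonzero(dists <= eps)[0]

# Example usage with random stand-in data:
# data = np.random.rand(1000, 48); q = np.random.rand(48)
# print(knn_search(data, q, k=10))
# print(range_search(data, q, eps=1.5))
```

Both scans are linear in the database size N, which is precisely the cost that the indexing methods listed above try to avoid.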

2.2. VP-tree

In the experiment described in Chapter 5, we use the VP-tree as the object of comparison with the proposed method. The following is a brief overview of the VP-tree. The VP-tree is a typical multidimensional indexing method for metric spaces. It aims to shrink the amount of space explored in the search by recursively partitioning the multidimensional space based on the distances between data points. The VP-tree uses a reference point known as a vantage point, and it has the special characteristic of not allowing any common area to arise between the partitioned regions, because hyperspheres are used to partition the space in a top-down manner. By contrast, the M-tree, which partitions space in a bottom-up manner, has the drawback that there are many common areas between the partitioned regions, with the result that search efficiency declines.

VP-tree index building can be summarized as follows. A vantage point (hereinafter referred to as vp) is selected for the data set S consisting of N data points by means of the following randomized procedure:

(i) select a temporary vp randomly from the data set;
(ii) calculate the distances from the temporary vp to the rest of the N − 1 objects;
(iii) calculate the median and variance of these distances;
(iv) the point with the maximum variance, obtained by repeating steps (i)–(iii), is designated as the vp.

Let µ be the median of the distances from all data in the data set S to the vp chosen for the root node. With d(p, q) denoting the distance between points p and q, the data set S is partitioned into S_1 and S_2 as follows:

S_1 = {s ∈ S | d(s, vp) < µ}
S_2 = {s ∈ S | d(s, vp) ≥ µ}    (1)

This partitioning operation is applied recursively to S_1 and S_2 to create the index. The VP-tree index is represented by a tree structure, and subsets such as S_1 and S_2 above each correspond to one node of the tree. In addition, each leaf node stores a number of data points. The search starts from the root node and follows the nodes that conform to the search scope; it then accesses the data stored in the leaf node it finally arrives at, point by point, calculates the distances, and determines whether or not each point conforms to the search scope. A small sketch of this index construction is given below.
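The construction just described might look as follows in code. This is a simplified Python sketch under our own naming: the vantage-point selection samples a few random candidates rather than all points, and the leaf size and the search procedure are omitted for brevity.

```python
import random
import statistics

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def choose_vp(points, trials=10, sample=50):
    """Steps (i)-(iv): pick the candidate whose distances to a sample
    of the remaining points have the largest variance."""
    best_vp, best_var = None, -1.0
    for _ in range(trials):
        cand = random.choice(points)
        others = random.sample(points, min(sample, len(points)))
        dists = [euclidean(cand, o) for o in others]
        var = statistics.pvariance(dists)
        if var > best_var:
            best_vp, best_var = cand, var
    return best_vp

def build_vp_tree(points):
    """Recursive partition of Eq. (1): S1 = {d < mu}, S2 = {d >= mu}."""
    if len(points) <= 2:                      # small leaf node
        return {"leaf": points}
    vp = choose_vp(points)
    dists = [euclidean(vp, p) for p in points]
    mu = statistics.median(dists)
    s1 = [p for p, d in zip(points, dists) if d < mu]
    s2 = [p for p, d in zip(points, dists) if d >= mu]
    if not s1 or not s2:                      # degenerate split: stop here
        return {"leaf": points}
    return {"vp": vp, "mu": mu,
            "inside": build_vp_tree(s1), "outside": build_vp_tree(s2)}

# Example usage with random stand-in data:
# pts = [[random.random() for _ in range(8)] for _ in range(500)]
# tree = build_vp_tree(pts)
```

Each internal node keeps its vantage point and the median radius µ, which is what allows a search to discard whole subtrees whenever the query ball cannot intersect the corresponding region.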

2.3. Problems with multidimensional indexing technology in high dimensions

Content searches of images and other multimedia content employ multidimensional feature vectors that may exceed 100 dimensions. Phenomena of a kind that cannot even be imagined in two- or three-dimensional space are known to occur in such high-dimensional spaces. Because the degree of spatial freedom is extremely high in higher-dimensional space, solving various problems in computational geometry and multivariate analysis involves an enormous amount of calculation and is hence notoriously difficult. These difficulties are collectively referred to as the curse of dimensionality.

In nearest neighbor searches in high-dimensional space, the search becomes more and more difficult as the dimensionality grows. For example, when points are uniformly distributed in n-dimensional space, the ratio of the expected distance to the (k + 1)-th nearest point to that of the k-th nearest point from a given point can be approximated by the following formula [17]:

E{d_{(k+1)nn}} / E{d_{knn}} ≈ ((k + 1) / k)^{1/n}    (2)

As can be seen from the above, as n becomes larger, the ratio of the distance to the k-th nearest point and the distance to the (k + 1)-th nearest point asymptotically approaches 1. Moreover, when the points are uniformly distributed, the ratio of the distance to the nearest point to the distance to the most distant point also asymptotically approaches 1 as the dimensionality becomes higher. Therefore, methods that divide the space hierarchically face the problem that the differences among distances become small, making it impossible to limit the area explored, so that an amount of calculation approaching that of a linear search is required.

3. Fast Nearest Neighbor Search Algorithm

3.1. Basic idea

The fast nearest neighbor search algorithm that we have proposed aims to speed up the process by skipping unnecessary operations in the calculation of the distances between vectors. Let us briefly explain the main idea of the algorithm. Assume that a search query vector q and search target vectors x_1, x_2, x_3, and x_4 in three-dimensional space are given as follows:

q = (1, 2, 1)^T, x_1 = (2, 2, 2)^T, x_2 = (2, 3, 2)^T, x_3 = (4, 1, 2)^T, x_4 = (2, 5, 2)^T    (3)

Here, we consider the problem of searching for the top two search target vectors closest to the search query vector q. The distances between the search query vector q and the first two search target vectors x_1 and x_2 are computed as follows (the square of the distance is used to simplify the explanation):

d^2(q, x_1) = (1 − 2)^2 + (2 − 2)^2 + (1 − 2)^2 = 2
d^2(q, x_2) = (1 − 2)^2 + (2 − 3)^2 + (1 − 2)^2 = 3    (4)

For now, x_1 and x_2 are kept as candidates for the search results. The maximum distance among the two candidates is 3. The calculation for the third search target vector x_3 is as follows:

d^2(q, x_3) = (1 − 4)^2 + (2 − 1)^2 + (1 − 2)^2    (5)

Already at the first term of the right-hand side, the current maximum distance to the search result candidates, 3, is exceeded, so it is clear, even without calculating the second and third terms, that this search target vector cannot be part of the search result. Similarly, the calculation of the distance to the fourth search target vector x_4 is as follows:

d^2(q, x_4) = (1 − 2)^2 + (2 − 5)^2 + (1 − 2)^2    (6)

The sum up through the second term of the right-hand side exceeds the current maximum distance to the search result candidates, 3, making calculation of the third term unnecessary. Thus, by eliminating unnecessary arithmetic in the calculation of the distances between the search query vector and the search target vectors, it becomes possible to perform a nearest neighbor search efficiently.

3.2. Early detection of unnecessary operations by conversion of data

If unnecessary operations can be detected early while accumulating the per-dimension distances of a vector, those operations can be omitted, making the search that much faster. The following two methods are used as pre-processing for the early detection of unnecessary operations; it can be demonstrated that an approach incorporating data conversion by principal component analysis maintains high speed without degradation in performance even in high dimensions.

3.2.1. Dimension sorting by distribution value

The distribution value (variance) of the elements is found for every dimension of the search target vectors, and the elements are permuted so that the dimension with the largest distribution value comes first. As a result, the calculation of the cumulative distance proceeds from dimensions with large distribution values toward those with smaller distribution values, and hence one can expect the cumulative distance to increase quickly, providing for early detection of unnecessary operations.

3.2.2. Data conversion via principal component analysis

Dimension sorting by distribution value involves only a permutation of the vector elements; applying a linear transformation can be considered as a more efficient means. An orthogonal transformation must be used to preserve the distances between vectors. Among the various orthogonal transformations, principal component analysis (the KL transform) finds a basis that best represents the fluctuations of the multidimensional vectors. The eigenvalue decomposition of the covariance matrix is performed, and the eigenvectors are made the new basis. The variance along each eigenvector grows with its eigenvalue. The eigenvector with the largest eigenvalue is called the first principal component, after which comes the second principal component, and so on. Early detection of unnecessary operations can be made more practicable by transforming the data beforehand so that the coordinates are arranged in the order of the principal components. A sketch combining such pre-processing with the early-termination distance computation is given below.
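The following is a minimal sketch of the search loop described in this chapter, written in Python with numpy. It includes the variance-based column ordering of Sec. 3.2.1; the PCA variant would simply replace the permutation with a projection onto the principal components. The function names and structure are our own illustration, not the authors' code.

```python
import heapq
import numpy as np

def preprocess(data):
    """Sec. 3.2.1: reorder the dimensions so that the one with the largest
    variance comes first (the same permutation must be applied to queries)."""
    order = np.argsort(-data.var(axis=0))
    return data[:, order], order

def knn_early_termination(data, q, k):
    """k-NN linear scan that abandons a distance computation as soon as the
    partial sum exceeds the current k-th best distance (Sec. 3.1)."""
    heap = []                                  # max-heap via negated distances
    bound = np.inf                             # current k-th best squared distance
    for idx, x in enumerate(data):
        d2 = 0.0
        for xi, qi in zip(x, q):               # accumulate dimension by dimension
            d2 += (xi - qi) ** 2
            if d2 >= bound:                    # partial sum already too large:
                break                          # skip the remaining dimensions
        else:                                  # inner loop completed: x is a candidate
            if len(heap) < k:
                heapq.heappush(heap, (-d2, idx))
            else:
                heapq.heapreplace(heap, (-d2, idx))
            if len(heap) == k:
                bound = -heap[0][0]
        # when the inner loop breaks, the vector is discarded without any update
    return sorted((-d, i) for d, i in heap)

# Example usage with random stand-in data:
# data, order = preprocess(np.random.rand(51067, 48))
# q = np.random.rand(48)[order]
# print(knn_early_termination(data, q, k=100))
```

After variance sorting (or the PCA transform), the partial sums grow fastest in the first few dimensions, so the inner loop tends to break early and most of the per-dimension work is skipped.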

4. Fast Nearest Neighbor Search Algorithm Based on Ellipsoid Distance

4.1. Ellipsoid distance

Unlike the ordinary Euclidean distance, the ellipsoid distance can express correlations and weights between dimensions, so a high degree of freedom in designing the distance function can be expected, along with improved search accuracy [18]. With the ordinary Euclidean distance, the surface equidistant from a given point is an n-dimensional sphere. With the weighted Euclidean distance, the equidistant surface is an n-dimensional ellipsoid whose principal axes are parallel to the coordinate axes. With the ellipsoid distance, on the other hand, the principal axes of the equidistant ellipsoid can point in any direction. For this reason, the ellipsoid distance can be regarded as a generalization of both the Euclidean distance and the weighted Euclidean distance.

When a search query vector in n-dimensional space is given by q = [q_1, ..., q_n]^T and an arbitrary search target vector in the data set is given by x = [x_1, ..., x_n]^T, the ellipsoid distance is represented by the following formula (the superscript T denotes the transposition of a vector or matrix):

D^2(x, q) = (x − q) A (x − q)^T    (7)

Here, A = [a_{ij}] is an n × n positive definite symmetric matrix, called the correlation matrix. By expanding the formula for the ellipsoid distance, we obtain the following formula:

D^2(x, q) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} (x_i − q_i)(x_j − q_j)    (8)

Because the right-hand side of the ellipsoid distance formula is a quadratic form and A is positive definite, D^2(x, q) ≥ 0 and D(x, q) corresponds to a distance. A short numerical sketch of this definition is given below.
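As a quick numerical illustration of Eqs. (7) and (8), the following Python/numpy sketch (with made-up values, not data from the paper) shows that the quadratic form and the expanded double sum give the same value.

```python
import numpy as np

def ellipsoid_distance_sq(x, q, A):
    """Eq. (7): D^2(x, q) = (x - q) A (x - q)^T for a positive definite A."""
    diff = x - q
    return float(diff @ A @ diff)

def ellipsoid_distance_sq_expanded(x, q, A):
    """Eq. (8): the same value written as a double sum over dimensions."""
    n = len(x)
    return sum(A[i, j] * (x[i] - q[i]) * (x[j] - q[j])
               for i in range(n) for j in range(n))

# A small positive definite correlation matrix (illustrative values only).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
x = np.array([1.0, 2.0, 3.0])
q = np.array([0.0, 1.0, 1.0])

print(ellipsoid_distance_sq(x, q, A))           # quadratic form
print(ellipsoid_distance_sq_expanded(x, q, A))  # double sum, same result
```

With A equal to the identity matrix this reduces to the squared Euclidean distance, and with a diagonal A to the weighted Euclidean distance, which is the sense in which the ellipsoid distance generalizes both.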

4.2. Application of the fast nearest neighbor search algorithm to ellipsoid distance

Because a high degree of freedom in designing the search function can be expected, together with improved search accuracy, numerous studies have used the ellipsoid distance in similarity searches of content, such as image searches [19, 20]. Research has also been carried out on how to improve the efficiency of similarity searches based on ellipsoid distance. Sakurai et al. proposed a method known as SST (Spatial Transformation Technique) as a way of efficiently supporting similarity searches based on ellipsoid distance [21]. SST converts a bounding rectangle positioned in the original space, where the distance from the search query must be calculated with the ellipsoid distance, into an object in Euclidean space. In addition, Ankerst et al. proposed an efficient similarity search method for ellipsoid distance using a spatial transformation based on principal component analysis (the KL transform) [22].

In the following, we propose a method for applying the fast nearest neighbor search algorithm described in the previous chapter to a search based on ellipsoid distance. The basic idea is similar to Sakurai et al.'s method using SST and to Ankerst et al.'s method: a spatial transformation is used to replace calculation of the ellipsoid distance function with calculation of the Euclidean distance function. Cholesky decomposition of the correlation matrix is used to carry out the spatial transformation [23]. By using Cholesky decomposition to transform all search target vectors in the database beforehand, it becomes possible to carry out a similarity search efficiently, without major alteration of the fast nearest neighbor search algorithm described in the previous chapter.

Cholesky decomposition is the special case of the LU decomposition of a square matrix applied to a positive definite symmetric matrix; it refers to the decomposition of a positive definite symmetric matrix A = [a_{ij}] as

A = L L^T,    (9)

where L = [l_{ij}] is a lower triangular matrix (l_{ij} = 0 for j > i) whose diagonal elements are all positive. If we seek the elements l_{ij} satisfying the formula above, we obtain

Σ_{k=0}^{min(i,j)} l_{i,k} l_{j,k} = a_{i,j}.    (10)

Accordingly, for the off-diagonal elements of L (i > j), this becomes

Σ_{k=0}^{j} l_{i,k} l_{j,k} = a_{i,j}.    (11)

From Eq. (11), the off-diagonal elements are therefore obtained as

l_{i,j} = (1 / l_{j,j}) ( a_{i,j} − Σ_{k=0}^{j−1} l_{i,k} l_{j,k} ),  j = 0, 1, 2, ..., i − 1.    (12)

For the diagonal elements, Eq. (10) gives

Σ_{k=0}^{i} l_{i,k}^2 = a_{i,i},    (13)

and hence they can be calculated as

l_{i,i} = √( a_{i,i} − Σ_{k=0}^{i−1} l_{i,k}^2 ).    (14)

Because A is positive definite, the quantity under the square root in the above equation is always positive, so l_{i,i} is a real number.

If we apply Cholesky decomposition to the ellipsoid distance in the n-dimensional space S, the ellipsoid distance can be expressed as shown below (L is a lower triangular matrix whose diagonal elements are all positive):

D^2(x, q) = (x − q) L L^T (x − q)^T    (15)

If we now consider the point x′ = (x − q) L in a Euclidean space S′, the Euclidean distance between the origin O and the point x′ in S′ is equal to the ellipsoid distance D(x, q) in the original space S. Because the correlation between features is reflected in the matrix L, it is possible, by converting the data beforehand using the matrix L, to replace calculation based on the ellipsoid distance function with calculation based on the Euclidean distance. As a result, unnecessary operations in the calculation of the distances between vectors can be eliminated efficiently, and a fast nearest neighbor search algorithm based on the ellipsoid distance can be achieved. A sketch of this pre-processing step is given below.
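As an illustration, the element-wise formulas (12) and (14) and the transformation x′ = (x − q)L translate directly into the following Python/numpy sketch (0-indexed as above). The variable names and the random test matrix are our own; in practice numpy.linalg.cholesky could be used instead of the hand-written loop.

```python
import numpy as np

def cholesky_lower(A):
    """Hand-rolled Cholesky decomposition A = L L^T following Eqs. (12)-(14)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i):                         # off-diagonal elements, Eq. (12)
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
        L[i, i] = np.sqrt(A[i, i] - np.dot(L[i, :i], L[i, :i]))   # diagonal, Eq. (14)
    return L

def transform_database(data, L):
    """Pre-transform every search target vector x into x L (Eq. (15)), so that
    Euclidean distances in the new space equal the original ellipsoid distances."""
    return data @ L

# --- small self-check with a random positive definite A (illustrative only) ---
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                        # positive definite by construction

L = cholesky_lower(A)
data = rng.standard_normal((100, 5))
q = rng.standard_normal(5)

data_t = transform_database(data, L)               # done once, offline
q_t = q @ L                                        # done once per query

x = data[0]
ellipsoid = (x - q) @ A @ (x - q)                  # Eq. (7) in the original space
euclidean = np.sum((data_t[0] - q_t) ** 2)         # squared Euclidean distance in S'
print(np.isclose(ellipsoid, euclidean))            # True
```

Because the transformation of the database is performed once in advance, the search itself runs exactly as in Chapter 3, with plain squared Euclidean partial sums and early termination.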

5. Evaluation Experiment

In order to evaluate the effectiveness of the nearest neighbor search algorithm based on ellipsoid distance, we conducted an experiment using actual image data and random data. A search specifying the number of items (k-nearest neighbor search) was used in the experiment. The evaluation used the Mahalanobis distance, which is a kind of ellipsoid distance; in other words, the correlation matrix of the ellipsoid distance was obtained by calculating the covariance matrix P from the data set. The following is the definition of the distance used in this experiment:

D^2(x, q) = (x − q) P^{−1} (x − q)^T,
p_{ij} = (1/N) Σ_{k=1}^{N} (x_{ik} − u_i)(x_{jk} − u_j),    (16)

where P = [p_{ij}], N is the number of data vectors, x_{ik} is the i-th component of the k-th data vector, and u_i is the mean of the i-th component over the data set.

5.1. Experimental setup

5.1.1. Image data

In the experiments on image similarity search, 51,067 color photographs from the Corel image database were used. Among these images, 1,000 were chosen at random as query images. The four types of feature vector data used were created from these images and had different numbers of dimensions, as shown below.

HSI-48 (48-dimensional data): HSI features were found for the 256-level hue, saturation, and intensity; each feature was compressed to 16 dimensions (a total of 48 dimensions).
HSI-192 (192-dimensional data): HSI features were found for the 256-level hue, saturation, and intensity; each feature was compressed to 64 dimensions (a total of 192 dimensions).
HSI-384 (384-dimensional data): HSI features were found for the 256-level hue, saturation, and intensity; each feature was compressed to 128 dimensions (a total of 384 dimensions).
HSI-432 (432-dimensional data): Images were partitioned into 9 (3 × 3) equal fragments, and HSI features were found for each fragment; each fragment was compressed to 48 dimensions (a total of 432 dimensions).

5.1.2. Random data

We used a uniformly distributed data set with 51,067 items, created using a random function with [0, 1] as its range. From this, 1,000 items were randomly extracted and used as search query data. Four types of random data having different numbers of dimensions were prepared, as in the case of the image data.

5.1.3. Nearest neighbor search methods used in the evaluation experiment

The experiment was performed using the following three techniques:

(i) the proposed method (Fast-Ellipsoid);
(ii) the conventional method (VP-tree);
(iii) a linear search (Linear).

For the proposed method (Fast-Ellipsoid) and the conventional method (VP-tree), we carried out the spatial transformation of the original data set using Cholesky decomposition when creating the index. A sketch of this setup step is given below.
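Concretely, the index-building step just described might be set up as follows (a Python/numpy sketch under our own naming; Eq. (16) supplies the correlation matrix A = P^{-1}, and the rest is the transformation of Chapter 4).

```python
import numpy as np

def build_transformed_index(data):
    """Compute the Mahalanobis correlation matrix A = P^{-1} from the data set
    (Eq. (16)) and pre-transform the data with the Cholesky factor of A."""
    P = np.cov(data, rowvar=False, bias=True)   # covariance matrix P, 1/N normalization
    A = np.linalg.inv(P)                        # correlation matrix used as A
    L = np.linalg.cholesky(A)                   # A = L L^T, L lower triangular
    return data @ L, L                          # transformed vectors and the factor

def transform_query(q, L):
    """Queries are mapped with the same factor, so that squared Euclidean
    distances in the transformed space equal the ellipsoid distances."""
    return q @ L

# Example usage (random stand-in for the feature data):
# data = np.random.rand(51067, 48)
# data_t, L = build_transformed_index(data)
# q_t = transform_query(np.random.rand(48), L)
```

For the linear search baseline, no transformation is applied and Eq. (16) is evaluated directly for every vector, which is the setting compared against below.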

In the experiment using the linear search (Linear), the distance calculation was performed each time directly according to the definition of the ellipsoid distance, without the spatial transformation.

5.1.4. Experimental method

The number of operations (number of distance calculations) and the CPU time per search item were determined for a search over the 1,000 items of search query data, using a PC with a Xeon 2.4 GHz CPU and a 512 Kbyte main memory, under the condition of 100 nearest neighbor search items. For the conventional method (VP-tree) and the proposed method (Fast-Ellipsoid), we also performed a comparison experiment in which the number of nearest neighbors searched for was varied from 10 to 100. In the measurement of CPU time, we repeated the same experiment twice and took the average.

5.2. Experimental results

5.2.1. Results for image data

The number of operations for 100 nearest neighbor searches is shown in Table 1 and Figure 1. The number of operations for the proposed method (Fast-Ellipsoid) is greatly reduced compared to the linear search (Linear) and the conventional method (VP-tree). The results of the CPU time comparison are shown in Table 2 and Figure 2; here, too, the search time is shorter for the proposed method (Fast-Ellipsoid) than for the linear search (Linear) and the conventional method (VP-tree).

Table 1. The number of distance calculations for image data

Search Algorithm   HSI-48      HSI-192     HSI-384      HSI-432
Linear             2,451,216   9,804,864   19,609,728   22,060,944
VP-tree            2,194,731   8,962,259   18,192,135   21,905,091
Fast-Ellipsoid     1,179,581   5,533,128   11,445,424   13,383,128

The results obtained for the number of operations when the number of search items was varied are shown in Figure 3. Regardless of the number of search items, and regardless of the dimensionality of the data, the number of operations was smaller for the proposed method (Fast-Ellipsoid) than for the conventional method (VP-tree). In terms of CPU time as well, as shown in Figure 4, the proposed method (Fast-Ellipsoid) is faster than the conventional method (VP-tree).

Fig. 1. The number of distance calculations for image data

Table 2. Comparison of CPU time for image data: CPU time (sec) of the Linear, VP-tree, and Fast-Ellipsoid methods on HSI-48, HSI-192, HSI-384, and HSI-432.

Fig. 2. Comparison of CPU time for image data

Fig. 3. The number of distance calculations

Fig. 4. CPU time

5.2.2. Results for random data

The number of operations performed for 100 nearest neighbor searches is shown in Table 3 and Figure 5. This number is smaller for the proposed method (Fast-Ellipsoid) than for the linear search (Linear) and the conventional method (VP-tree), but as shown in Table 4 and Figure 6, the CPU time is slightly longer than for the conventional method (VP-tree).

The results obtained for the number of operations when the number of searches was varied are shown in Figure 7; here we find that the proposed method (Fast-Ellipsoid) did not achieve a reduction in the number of operations comparable to that obtained for the image data.

Table 3. The number of distance calculations for random data

Search Algorithm   HSI-48      HSI-192     HSI-384      HSI-432
Linear             2,451,216   9,804,864   19,609,728   22,060,944
VP-tree            2,451,120   9,804,473   19,608,055   22,060,208
Fast-Ellipsoid     1,582,790   7,950,796   16,941,416   19,219,825

Fig. 5. The number of distance calculations for random data

Table 4. Comparison of CPU time for random data: CPU time (sec) of the Linear, VP-tree, and Fast-Ellipsoid methods on HSI-48, HSI-192, HSI-384, and HSI-432.

The CPU time was also slightly longer for the proposed method (Fast-Ellipsoid) than for the conventional method (VP-tree), as shown in Figure 8.

The reason for the above findings is believed to lie in the characteristics of the data. In cases where the data elements are uniformly distributed and there is hardly any bias, as with the random data, data conversion cannot efficiently reduce the amount of calculation. The overhead of the early-termination checks then grows relative to the savings, which is believed to lengthen the search time. On the other hand, in the case of image data, data conversion is believed to lead effectively to a reduction in the amount of calculation because the degree of randomness is small.

Fig. 6. Comparison of CPU time for random data

In cases such as actual searches of multimedia content, there is believed to be a certain bias in the data distribution, so the approach proposed in this paper is sufficiently valid.

Fig. 7. The number of distance calculations

Fig. 8. CPU time

6. Conclusions

This paper proposed a fast multidimensional nearest neighbor search algorithm for ellipsoid distance. The proposed method can efficiently eliminate unnecessary operations in the distance calculations performed during nearest neighbor searches, because it uses Cholesky decomposition to convert the data beforehand, making it possible to replace calculations based on the ellipsoid distance function with calculations based on the Euclidean distance. In the evaluation experiment using image data, the search time was reduced by 26 to 55 percent compared to the conventional VP-tree method. In the case of random data showing no bias, however, the proposed method was found to be slightly inferior to the VP-tree method. Since images and other real-world data can be expected to have some bias in their distribution, the proposed method should be usable effectively in fields such as multimedia content search and pattern recognition.

As a future challenge, there is a need to devise search techniques that perform well even in distance spaces with a relatively high degree of spatial freedom. In this work, we used the inverse of the covariance matrix as the correlation matrix; in the future, we plan to develop a fast, highly accurate system by devising a correlation matrix that achieves high-precision searches and then incorporating it into multimedia search and cross-media search systems.

Acknowledgments

This work was supported in part by a Grant-in-Aid for Scientific Research (B) and a Grant-in-Aid for Exploratory Research from the Japan Society for the Promotion of Science.

References

1. G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing, Communications of the ACM, Vol. 18, No. 11.
2. M. Flickner et al. Query by image and video content: The QBIC system, IEEE Computer, Vol. 28, No. 9, pp. 23-32.
3. A. Pentland, R. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases, International Journal of Computer Vision, Vol. 18, No. 3.
4. A. Shiroo, S. Tsuge, M. Shishibori, and K. Kita. Fast multidimensional nearest neighbor search algorithm using priority queue, Journal of Electronics, Vol. 126, No. 3, 2006.
5. V. Gaede and O. Gunther. Multidimensional access methods, ACM Computing Surveys, Vol. 30, No. 2.
6. A. Guttman. R-trees: A dynamic index structure for spatial searching, Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 47-57.
7. T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multidimensional objects, Proceedings of the 12th International Conference on Very Large Data Bases.
8. N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles, Proceedings of the ACM SIGMOD International Conference on Management of Data.
9. D. A. White and R. Jain. Similarity indexing with the SS-tree, Proceedings of the 12th IEEE International Conference on Data Engineering.
10. R. Kurniawati, J. S. Jin, and J. A. Shepherd. The SS+-tree: An improved index structure for similarity searches in a high dimensional feature space, Proceedings of the SPIE: Storage and Retrieval for Image and Video Databases.
11. J. S. Jin. Indexing and retrieving high dimensional visual features, in: Multimedia Information Retrieval and Management, D. Feng, W. C. Siu, and H. J. Zhang (Eds.), Springer.
12. S. Berchtold, D. A. Keim, and H. P. Kriegel. The X-tree: An index structure for high-dimensional data, Proceedings of the 22nd International Conference on Very Large Data Bases, pp. 28-39.
13. N. Katayama and S. Satoh. The SR-tree: An index structure for high-dimensional nearest neighbor queries, Proceedings of the ACM SIGMOD International Conference on Management of Data.
14. P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces, Proceedings of the Fourth ACM-SIAM Symposium on Discrete Algorithms.
15. T. Bozkaya and Z. M. Ozsoyoglu. Indexing large metric spaces for similarity search queries, ACM Transactions on Database Systems, Vol. 24, No. 3.
16. P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces, Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97).
17. M. Katayama and S. Sato. Similarity search indexing technologies (in Japanese), Joho Shori (Information Processing), Vol. 42, No. 10, 2001.
18. Y. S. Wu, Y. Ishikawa, and H. Kitagawa. Implementation and evaluation of similar-image retrieval techniques based on ellipsoid distance (in Japanese), Proceedings of the 11th IEICE Data Engineering Workshop (DEWS 2000).
19. J. Hafner, H. S. Sawhney, W. Equitz, M. Flickner, and W. Niblack. Efficient color histogram indexing for quadratic form distance functions, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 7.

20. T. Seidl and H. P. Kriegel. Efficient user-adaptable similarity search in large multimedia databases, Proceedings of the 23rd International Conference on Very Large Data Bases.
21. H. Sakurai, M. Yoshikawa, S. Uemura, and Y. Kataoka. Similarity search algorithm for ellipsoid distance based on spatial transformation (in Japanese), IEICE Transactions, Vol. J85-D-I, No. 3.
22. M. Ankerst and H. P. Kriegel. A multistep approach for shape similarity search in image databases, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 6.
23. H. Yanai and A. Takeuti. Projection Matrices and Singular Value Decomposition (in Japanese), University of Tokyo Press.

Tadashi Uemiya
He received the B.E. degree in science and engineering from Waseda University, Tokyo, Japan. From 1968 to 2000, he worked for Kawasaki Heavy Industries, Ltd., Kobe, Japan. From 2000 to 2006, he worked for the Benesse Corporation, Okayama, Japan. Since 2006, he has been with the University of Tokushima, Tokushima, Japan. His current research interests include information retrieval and multimedia processing.

Yoshihide Matsumoto
He received the B.E. degree in information engineering from the Kochi University of Technology, Kochi, Japan. Since 2002, he has been with Laboatec in Japan Co. Ltd., Okayama, Japan. Since 2006, he has also been with the University of Tokushima, Tokushima, Japan. His current research interests include information retrieval and multimedia processing.

Daichi Koizumi
He received the B.E., M.E., and Dr. Eng. degrees in information science and intelligent systems from the University of Tokushima, Japan, in 2002, 2004, and 2007, respectively. Since 2007, he has been with the Justsystems Corporation, Tokushima, Japan. His current research interests include information retrieval and multimedia processing.

Masami Shishibori
He received the B.E., M.E., and Dr. Eng. degrees in information science and intelligent systems from the University of Tokushima, Japan, in 1991, 1993, and 1997, respectively. Since 1995 he has been with the University of Tokushima, where he is currently an Associate Professor in the Department of Information Science and Intelligent Systems. His research interests include multimedia information retrieval, natural language processing, and multimedia processing.

Kenji Kita
He received the B.S. degree in mathematics and the Ph.D. degree in electrical engineering, both from Waseda University, Tokyo, Japan, in 1981 and 1992, respectively. From 1983 to 1987, he worked for the Oki Electric Industry Co. Ltd., Tokyo, Japan. From 1987 to 1992, he was a researcher at ATR Interpreting Telephony Research Laboratories, Kyoto, Japan. Since 1992, he has been with the University of Tokushima, Tokushima, Japan, where he is currently a Professor at the Center for Advanced Information Technology. His current research interests include multimedia information retrieval, natural language processing, and speech recognition.


More information

QuickDB Yet YetAnother Database Management System?

QuickDB Yet YetAnother Database Management System? QuickDB Yet YetAnother Database Management System? Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Department of Computer Science, FEECS,

More information

Enterprise Organization and Communication Network

Enterprise Organization and Communication Network Enterprise Organization and Communication Network Hideyuki Mizuta IBM Tokyo Research Laboratory 1623-14, Shimotsuruma, Yamato-shi Kanagawa-ken 242-8502, Japan E-mail: e28193@jp.ibm.com Fusashi Nakamura

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Classifying Manipulation Primitives from Visual Data

Classifying Manipulation Primitives from Visual Data Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Content Based Data Retrieval on KNN- Classification and Cluster Analysis for Data Mining

Content Based Data Retrieval on KNN- Classification and Cluster Analysis for Data Mining Volume 12 Issue 5 Version 1.0 March 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: & Print ISSN: Abstract - Data mining is sorting

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

Scatterplot Layout for High-dimensional Data Visualization

Scatterplot Layout for High-dimensional Data Visualization Noname manuscript No. (will be inserted by the editor) Scatterplot Layout for High-dimensional Data Visualization Yunzhu Zheng Haruka Suematsu Takayuki Itoh Ryohei Fujimaki Satoshi Morinaga Yoshinobu Kawahara

More information

Big Ideas in Mathematics

Big Ideas in Mathematics Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

More information

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Data Warehousing und Data Mining

Data Warehousing und Data Mining Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Design call center management system of e-commerce based on BP neural network and multifractal

Design call center management system of e-commerce based on BP neural network and multifractal Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce

More information

Visualization Techniques in Data Mining

Visualization Techniques in Data Mining Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano

More information

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean

More information

Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS

TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS D.V. Vranić, D. Saupe, and J. Richter Department of Computer Science, University of Leipzig, Leipzig, Germany phone +49 (341)

More information

Prentice Hall Mathematics: Course 1 2008 Correlated to: Arizona Academic Standards for Mathematics (Grades 6)

Prentice Hall Mathematics: Course 1 2008 Correlated to: Arizona Academic Standards for Mathematics (Grades 6) PO 1. Express fractions as ratios, comparing two whole numbers (e.g., ¾ is equivalent to 3:4 and 3 to 4). Strand 1: Number Sense and Operations Every student should understand and use all concepts and

More information

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive

More information

Evaluation of Lump-sum Update Methods for Nonstop Service System

Evaluation of Lump-sum Update Methods for Nonstop Service System International Journal of Informatics Society, VOL.5, NO.1 (2013) 21-27 21 Evaluation of Lump-sum Update Methods for Nonstop Service System Tsukasa Kudo, Yui Takeda, Masahiko Ishino*, Kenji Saotome**, and

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

Part-Based Recognition

Part-Based Recognition Part-Based Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, Part-Based Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple

More information

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

The Big Data methodology in computer vision systems

The Big Data methodology in computer vision systems The Big Data methodology in computer vision systems Popov S.B. Samara State Aerospace University, Image Processing Systems Institute, Russian Academy of Sciences Abstract. I consider the advantages of

More information

Credit Number Lecture Lab / Shop Clinic / Co-op Hours. MAC 224 Advanced CNC Milling 1 3 0 2. MAC 229 CNC Programming 2 0 0 2

Credit Number Lecture Lab / Shop Clinic / Co-op Hours. MAC 224 Advanced CNC Milling 1 3 0 2. MAC 229 CNC Programming 2 0 0 2 MAC 224 Advanced CNC Milling 1 3 0 2 This course covers advanced methods in setup and operation of CNC machining centers. Emphasis is placed on programming and production of complex parts. Upon completion,

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Computational Geometry. Lecture 1: Introduction and Convex Hulls Lecture 1: Introduction and convex hulls 1 Geometry: points, lines,... Plane (two-dimensional), R 2 Space (three-dimensional), R 3 Space (higher-dimensional), R d A point in the plane, 3-dimensional space,

More information

Fast Matching of Binary Features

Fast Matching of Binary Features Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

More information