Performance of KDB-Trees with Query-Based Splitting*


Yves Lépouchard, Dept. of Computer Science, University of Virginia, Charlottesville, VA 2293
Ratko Orlandic, Dept. of Computer Science, Illinois Institute of Technology, Chicago, IL 6616
John L. Pfaltz, Dept. of Computer Science, University of Virginia, Charlottesville, VA

Abstract

While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multidimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. A potentially very appealing idea, frequently suggested in the literature, is to adopt a node-splitting policy that takes into account the importance of individual dimensions, which could be determined either a priori or through a statistical sampling of actual queries. This paper presents the results of some carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a cyclic fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.

Keywords: information databases, multi-dimensional databases, access methods, data dimensionality.

1. Introduction

Typical retrieval mechanisms are based on a subdivision of the search space into finer and finer subspaces organized into a tree structure. Each subspace is represented by an index node (page) of the tree structure. The leaf nodes correspond to the smallest regions in which the desired items are to be found. With large amounts of data, the structure must reside on secondary storage.
While the exact-match search typically follows a single path of the tree, range-search queries usually require access to a possibly large number of nodes. To search a 2- or 3-dimensional space based on region queries, one would divide the space into index regions (typically rectangular ones) through a splitting process that partitions individual dimensions. How this process takes place may have a significant impact on the retrieval performance. In part because of this, we have today a variety of different structures for spatial (2- or 3-dimensional) retrieval [2].

The real problem arises when we consider data in higher-dimensional spaces. These spaces naturally occur when data is regarded with respect to a multi-dimensional parameter space. Examples include information retrieval, data mining, OLAP, multimedia systems and numerous scientific simulations, such as high-energy physics, long-term environmental observations, as well as genome and protein studies. Unfortunately, the traditional multi-dimensional access methods do not scale well to spaces with many dimensions. Their performance rapidly deteriorates as the number of dimensions grows. As a result, they impose a practical limit on the number of dimensions, which is typically quite low. The limitations of contemporary multi-dimensional access methods in spaces with many dimensions have attracted considerable attention lately [1,3,5,9].

Further complicating the matter, the multi-dimensional queries of many applications tend to specify only a relatively small subset of parameters of interest (dimensions). For example, the study reported in [8] cites large data spaces with more than 2 data dimensions. However, the dimensionality of queries is typically about 2 to 4. While the events of interest for experimental high-energy physics may have more than 1 dimensional properties, the number of properties specified by the queries is much smaller, typically about 1 to 8 [4]. Since the traditional multi-dimensional access methods are generally designed assuming fully specified search predicates, they perform poorly for these partial queries.

* Partly funded by the DOE grant no. DE-FG295-ER25254.

Given this situation, a potentially very appealing approach is to take into account the importance of individual parameters/dimensions in the process of node splitting, so as to minimize the probability of overlap between the index regions and the queries and, thereby, reduce the number of page accesses. The importance of the dimensions could be determined either a priori or through a statistical sampling of actual queries. This idea is frequently suggested in the literature as a way of increasing the retrieval performance of multi-dimensional access methods. For example, in [7], Robinson suggested query-based splitting as a possible space-partitioning strategy that could improve the performance of KDB-trees, but the idea was never pursued.

The KDB-tree has become a point access method of choice in many applications. While the structure was originally designed for low-dimensional data, in [5], we have shown that it can serve as a basis for an effective retrieval mechanism in high-dimensional spaces as well. The topic of this paper arose from a larger investigation, in which we studied the effects of various splitting policies on the performance of KDB-trees in high-dimensional situations. The goal of this paper is to present the results of some carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a cyclic fashion, which has been shown to be very effective, especially in high-dimensional situations [5].
Based on the results, we will argue that query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.

The rest of the paper is organized as follows. In Section 2, we review the structure of KDB-trees and their policy of cyclic splitting. Section 3 discusses the idea of query-based splitting, introducing some relevant terminology. Section 4 presents the results of the experimental study conducted to compare the performance of KDB-trees with query-based and cyclic splitting. Section 5 concludes the paper by summarizing the results.

2. Cyclic Splitting of KDB-Trees

A KDB-tree is a height-balanced hierarchy of nodes (pages), in which each node represents a portion of space. At every level of the structure, the d-dimensional universe is recursively divided into hyper-rectangles by means of (d-1)-dimensional hyper-planes, each of which is perpendicular to one of the axes. The root node represents the entire universe, which itself is a multi-dimensional rectangle. Figure 1 illustrates a portion of a KDB-tree and its partition of a 2-dimensional space. Observe that each rectangular subspace has been split first with respect to one dimension, and then with respect to another. This is the characteristic of cyclic splitting.

Figure 1. A 2-dimensional KDB-tree.

The leaf nodes of KDB-trees, also called point pages, contain actual data objects, i.e. points in space. In the conceptual subdivision of the space corresponding to this level of the structure, the directions of the dividing hyper-planes alternate among individual dimensions. Every interior node, called a region page, maintains index entries, each representing a child node at the level below. With cyclic splitting of point pages, the splitting dimension of a point page is determined as (splitting dimension of the old page + 1) MOD d, where d is the dimensionality of the universe. We enhance this policy with a splitting strategy for region pages, called first-division splitting [5].
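The cyclic rule for point pages can be illustrated with a minimal sketch. The function names, the tuple representation of points, and the choice of the median coordinate as the split position are assumptions made for illustration only; the text above fixes just the choice of dimension, (old dimension + 1) MOD d.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def next_split_dimension(old_split_dim: int, d: int) -> int:
    """Cyclic rule: (splitting dimension of the old page + 1) MOD d."""
    if d <= 0:
        raise ValueError("dimensionality d must be positive")
    return (old_split_dim + 1) % d


def split_point_page(
    points: List[Point], old_split_dim: int, d: int
) -> Tuple[int, float, List[Point], List[Point]]:
    """Split an overflowing point page along the next dimension in the cycle.

    Assumption: the split position is the median coordinate along that
    dimension; the paper does not specify how the position is chosen.
    """
    dim = next_split_dimension(old_split_dim, d)
    ordered = sorted(points, key=lambda p: p[dim])
    split_value = ordered[len(ordered) // 2][dim]
    left = [p for p in ordered if p[dim] < split_value]
    right = [p for p in ordered if p[dim] >= split_value]
    return dim, split_value, left, right
```

For example, a 2-dimensional page that was last split along dimension 0 would next be split along dimension 1, and a page last split along dimension d-1 wraps around to dimension 0.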
According to this strategy, a region page R is split along the dividing hyper-plane by which the index region of R was split for the first time (the first-division plane). As a result, this policy follows the partitioning sequence at the leaf level, selecting the splitting dimensions of the region pages in a cyclic fashion. As shown in [5], this strategy significantly improves the performance of KDB-trees in high-dimensional spaces.

3. Query-Based Splitting

In [7], Robinson suggested a finer splitting policy that can take advantage of the actual query patterns. In the following, we say that a query is partially specified if it restricts only a subset of dimensions, leaving the rest of the dimensions unspecified. Of particular interest for our analysis will be mono-specified queries, whose result set can be formally defined as

$R = \{x \in S \mid x_{min} \le x.a_i \le x_{max}\}$,

where $S$ is the given set of points, $a_i$ is the coordinate of a point object along the $i$-th dimension, and $x_{min}$ and $x_{max}$ are two scalar values. Figure 2 shows a mono-specified query (gray volume) defined on a 3-dimensional space.

Figure 2. A mono-specified query.

We say that a query is fully specified when all dimensions are restricted by the query predicate. Reusing the above notation, the result set can be defined as

$R = \{x \in S \mid \forall i \in [0, d),\ x_{min}.a_i \le x.a_i \le x_{max}.a_i\}$.

Note that now $x_{min}$ and $x_{max}$ represent vectors, not scalar values as in the previous formula.

Since queries of typical multi-dimensional applications tend to be partially specified, we can perform a statistical analysis of the dimensions that are specified most often. If such an analysis is possible, one can compute the probability of specifying each dimension and build the tree structure accordingly. Whenever a split of a point page occurs, we pick the splitting dimension in relation to the probability distribution resulting from the anticipated query pattern. This is the underlying idea of query-based (QB-) splitting. Note that this policy applies to point pages. The splitting of region pages follows the first-division splitting strategy described earlier.

4. Experimental Evidence

In order to compare cyclic and query-based splitting policies, we implemented two versions of KDB-trees that differ in the way the splitting dimensions are selected. We also constructed three different test cases.
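As a rough illustration of how the query-based variant might select a split dimension, the sketch below draws the dimension at random according to the anticipated probability of each dimension being specified. Random sampling is only one possible reading of "in relation to the probability distribution"; the paper does not describe the exact selection mechanism, and the function name and parameters are hypothetical.

```python
import random
from typing import Sequence


def qb_split_dimension(probabilities: Sequence[float], rng: random.Random) -> int:
    """Pick a split dimension for a point page under query-based splitting.

    probabilities[i] is the anticipated probability that a query specifies
    dimension i. Assumption: the dimension is sampled at random with these
    weights, so that in the long run the splits follow the distribution.
    """
    if not probabilities:
        raise ValueError("at least one dimension is required")
    if any(p < 0 for p in probabilities) or sum(probabilities) <= 0:
        raise ValueError("probabilities must be non-negative with a positive sum")
    dimensions = range(len(probabilities))
    return rng.choices(dimensions, weights=probabilities, k=1)[0]
```

With the distribution (1.0, 0.0, ..., 0.0), every split falls on dimension 0, which corresponds to a scenario in which all queries specify the same single dimension.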
The first test case compares the two variants of KDB-trees for queries that are always specified with respect to one particular dimension. In the second test case, the queries are mono-specified but with respect to different dimensions. The third case compares cyclic splitting with QB-splitting for fully specified queries. For all test cases, the input was the same set of 1, randomly generated points. In each test case, we performed 1, queries and measured the average number of page accesses per query as dimensionality increases from 2 to 16.

In the first two test cases, the queries were mono-specified and the probability distribution used to determine the split axes of the KDB-tree with QB-splitting was the same as the importance of the dimensions implied by the queries. In the first mono-specified case, the query dimension was constant. In other words, all queries specified only this single dimension. In the second case, one dimension was clearly dominant and it was specified by most queries, but some queries specified other dimensions as well. The third case was constructed to observe the performance of QB-splitting when the actual queries do not behave as anticipated. In this test case, the queries were fully specified.

Obviously, these experiments were constructed for some extreme scenarios. For mono-specified queries, a traditional B-tree index would be far superior to any multi-dimensional structure. However, the test cases were constructed so that they can reveal the promise of query-based splitting. Certainly, if the policy does not perform well with mono-specified queries, for which the distribution of splits can be selected to match perfectly the implied relevance of the dimensions, it is unlikely that it can perform well for partial queries that specify more than one dimension.

In the first test case, whose results are shown in Figure 3, the same dimension was specified by all queries.
Thus, the probability of specifying the predominant dimension was 1.0, whereas the probability of specifying any other dimension was 0.0. In this case, query-based splitting guarantees that all splits occur along the predominant dimension. As one can see from Figure 3, for this scenario, QB-splitting is clearly superior to cyclic splitting, especially in high-dimensional spaces.

Figure 3. Test Case 1: Mono-specified queries along one exclusively predominant dimension.

It is unrealistic to have all queries specify only one dimension. A more realistic scenario might have 60% of all queries specify one dimension, 30% specify a second, and only 10% specify a third. However, an arbitrary distribution need not scale with data dimensionality. Here, we need a density distribution that remains self-similar as dimensionality grows. Thus, in the second test case, we applied a continuous square-root function to compute the probability distribution for any number of dimensions. The square-root function $F: x \mapsto x^{1/\alpha}$, where $\alpha = 2$, was used on the real interval [0, 1]. In the example below, $F(i)$ stands for $F$ evaluated at $(i+1)/d$, the right endpoint of the $i$-th of $d$ equal subintervals of [0, 1], which is consistent with the listed values. For example, in a 5-dimensional space, we calculate the probability $p(i)$ that each dimension $i$ is specified as follows:

$F(0) \approx 0.447$, $p(0) = F(0) \approx 0.447$;
$F(1) \approx 0.632$, $p(1) = F(1) - F(0) \approx 0.185$;
$F(2) \approx 0.775$, $p(2) = F(2) - F(1) \approx 0.143$;
$F(3) \approx 0.894$, $p(3) = F(3) - F(2) \approx 0.119$;
$F(4) = 1$, $p(4) = F(4) - F(3) \approx 0.106$.

Observe that the first dimension is still clearly predominant, as it is specified in more than 40% of all queries. The second dimension is specified in nearly 20% of queries, with the remaining dimensions appearing in about 10-15% of queries. The strong bias toward the first dimension should still make QB-splitting desirable. Nevertheless, Figure 4 reveals little difference between QB- and cyclic splitting, even though the KDB-tree structure has a distribution of splits that perfectly matches the priorities of the dimensions. This is because, whenever a split occurs along a certain dimension, all queries that discriminate along other dimensions are penalized by this choice. Thus, even though the QB-splitting policy pays off in some extreme cases, for more general cases, it does not appear to bring any improvement that could justify the effort of forecasting the actual queries.

Figure 4. Test Case 2: Skewed importance of the dimensions.

The queries used for test case 3, shown in Figure 5, were fully specified. Therefore, all dimensions are important. However, even though other dimensions may appear in the queries, this test case forces splits only along a single dimension. While this test case is somewhat contrived, its purpose is to show what could happen when the actual queries do not behave as expected. Clearly, the QB-splitting policy adapts very poorly to unexpected query patterns. Cyclic splitting is a clear winner in this situation.

In summary, the idea of adapting the splitting policy to suit the collected statistics about queries is intuitively a fine concept. It is backed up by at least one scenario investigated in this paper. However, for more general cases, query-based splitting does not bring a significant improvement that can justify the effort of forecasting the actual queries. As appealing as it may sound, QB-splitting does not appear to be an effective splitting strategy for KDB-trees.
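The skewed distribution used in test case 2 can be reproduced with a short sketch. It assumes that the cumulative function is evaluated at the right endpoints (i+1)/d of d equal subintervals of [0, 1], which matches the 5-dimensional values quoted in this section; the function name and signature are hypothetical.

```python
from typing import List


def split_probabilities(d: int, alpha: float = 2.0) -> List[float]:
    """Probability that each of d dimensions is specified by a query.

    Uses F(x) = x ** (1 / alpha) on [0, 1], evaluated at (i + 1) / d,
    and takes successive differences: p(0) = F(0), p(i) = F(i) - F(i - 1).
    """
    if d <= 0:
        raise ValueError("dimensionality d must be positive")
    cumulative = [((i + 1) / d) ** (1.0 / alpha) for i in range(d)]
    probabilities = [cumulative[0]]
    for i in range(1, d):
        probabilities.append(cumulative[i] - cumulative[i - 1])
    return probabilities
```

For d = 5 and alpha = 2 the result agrees with the listed probabilities to within rounding. The small discrepancies in the third decimal place arise because the listed values are differences of already rounded values of F. Because the last cumulative value is always 1, the probabilities sum to 1 for any d, which is what lets the same rule be applied at every dimensionality from 2 to 16.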

Figure 5. Test Case 3: Fully specified queries and only one splitting dimension.

5. Summary and Discussion

The problem of accessing data in high-dimensional spaces has attracted considerable attention. The proposed solutions generally assume well-defined queries that restrict the values of all dimensions in the universe. However, in advanced applications such as scientific studies and OLAP, typical queries posed on high-dimensional data restrict a relatively small subset of dimensions, leaving the rest of the dimensions unspecified. The contemporary multi-dimensional access methods tend to perform poorly for these partially specified queries.

In this paper, we investigated the idea of adopting a splitting policy that takes into account the priorities of individual dimensions, which we call query-based splitting. We presented the results of an experimental study conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy was compared to a splitting policy that selects the split dimensions in a cyclic fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.

In [6], we have proposed a much more effective retrieval technique for partial queries. The idea is to apply an elaborate storage organization, called the inverted space (IS), which assigns to a high-dimensional universe one data store and a number of multi-dimensional indexes, each supporting efficient selections on a subset of dimensions. This organization allows the system administrator to control the size of individual indexes and avoid the negative impact of very high data dimensionality on the retrieval performance. To support the IS storage organization, we have also developed a new point access method, called the KDB_HD-tree [6].
Together, the two solutions can enable efficient access to persistent data of high dimensionality based on partially specified queries.

6. References

[1] S. Berchtold, C. Bohm, and H.P. Kriegel, "The Pyramid-Technique: Towards Breaking the Curse of Dimensionality," Proc. ACM SIGMOD Int. Conf. on Management of Data.
[2] V. Gaede and O. Gunther, "Multidimensional Access Methods," ACM Computing Surveys 30(2):170-231.
[3] K.I. Lin, H.V. Jagadish, and C. Faloutsos, "The TV-Tree: An Index Structure for High-Dimensional Data," VLDB Journal 3(4).
[4] R. Orlandic, J. Lukaszuk, and C. Swietlik, "The Design of a Retrieval Technique for High-Dimensional Data on Tertiary Storage," SIGMOD Record, 2002 (in press).
[5] R. Orlandic and B. Yu, "Implementing KDB-trees to Support High-Dimensional Data," Proc. Int. Database Engineering and Applications Symposium (IDEAS 01), 2001.
[6] R. Orlandic and B. Yu, "A Retrieval Technique for High-Dimensional Data and Partially Specified Queries," Data and Knowledge Eng., 2002 (in press).
[7] J.T. Robinson, "The K-D-B Tree: A Search Structure for Large Multidimensional Dynamic Indexes," Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 10-18.
[8] K.A. Ross and K.A. Zaman, "Optimizing Selections over Databases," Proc. 12th Int. Conf. on Scientific and Statistical Database Management, 2000.
[9] R. Weber, H.-J. Schek, and S. Blott, "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," Proc. 24th Int. Conf. on VLDB, 1998.


More information

Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information
