Clustering Techniques: A Brief Survey of Different Clustering Algorithms

Size: px
Start display at page:

Download "Clustering Techniques: A Brief Survey of Different Clustering Algorithms"

Transcription

1 Clustering Techniques: A Brief Survey of Different Clustering Algorithms Deepti Sisodia Technocrates Institute of Technology, Bhopal, India Lokesh Singh Technocrates Institute of Technology, Bhopal, India Sheetal Sisodia Samrat Ashoka Technological Institute, Vidisha, India Khushboo saxena Technocrates Institute of Technology, Bhopal, India Abstract - Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining. The operation is needed in a number of data mining tasks. Clustering or data grouping is the key technique of the data mining. It is an unsupervised learning task where one seeks to identify a finite set of categories termed clusters to describe the data. The grouping of data into clusters is based on the principle of maximizing the intra class similarity and minimizing the inter class similarity. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how to decide what constitutes a good clustering? This paper deal with the study of various clustering algorithms of data mining and it focus on the clustering basics, requirement, classification, problem and application area of the clustering algorithms. I. INTRODUCTION Clustering[1,2] is an unsupervised learning task where one seeks to identify a finite set of categories termed clusters to describe the data.unlike classification that analyses class-labeled instances, clustering has no training stage, and is usually used when the classes are not known in advance. A similarity metric is defined between items of data, and then similar items are grouped together to form clusters. Often, the attributes providing the best clustering should be identified as well. The grouping of data into clusters is based on the principle of maximizing the intra class similarity and minimizing the inter class similarity. Properties about the clusters can be analyzed to determine cluster profiles, which distinguish one cluster from another. A good clustering method[1,2] will produce high quality clusters with high intra-class similarity - Similar to one another within the same cluster low inter-class similarity - Dissimilar to the objects in other clusters The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns. Figure.1:Data partitioning & clustering Vol. 1 Issue 3 September ISSN: X

2 II. CLASSIFICATION OF CLUSTERING ALGORITHMS Clustering algorithms can be broadly classified[4] into three categories, in the following subsections together with specific algorithms: 2.1 Partitioning 2.2 Hierarchical 2.3 Density-based In short, partitioning algorithms attempt to determine k clusters that optimize a certain, often distance-based criterion function. Hierarchical algorithms create a hierarchical decomposition of the database that can be presented as a dendrogram. Density-based algorithms search for dense regions in the data space that are separated from one another by low density noise regions Partitioning Clustering Algorithms Partitioning clustering attempts to decompose a set of N objects into k clusters such that the partitions optimize a certain criterion function. Each cluster is represented by the centre of gravity (or centroid) of the cluster, e.g. k-means, or by the closest instance to the gravity centre (or medoid), e.g. k-medoids. Typically, k seeds are randomly selected and then a relocation scheme iteratively reassigns points between clusters to optimize the clustering criterion. The minimization of the square-error criterion - sum of squared Euclidean distances of points from their closest cluster centroid, is the most commonly used. A serious drawback of partitioning algorithms is that there are a number of possible solutions. In particular, the number of all possible partitions P (n, k) that can be derived by partitioning n patterns into k clusters is : 1 k k 1 k n P ( n, k ) ( 1) ( i ) i 1 i k! K-Means K-means is perhaps the most popular clustering method in metric spaces. Initially k cluster centroids[3,7] are selected at random; k- means then reassigns all the points to their nearest centroids and recomputed centroids of the newly assembled groups. The iterative relocation continues until the criterion function, e.g. square-error converges. Despite its wide popularity, k-means is very sensitive to noise and outliers since a small number of such data can substantially influence the centroids. Other weaknesses are sensitivity to initialization, entrapments into local optima, poor cluster descriptors, inability to deal with clusters of arbitrary shape, size and density, reliance on user to specify the number of clusters. Finally, this algorithm aims at minimizing an objective function; in this case a squared error function. The objective function where xi(j) cj 2 is a chosen distance measure between a data point xi(j)and the cluster centre Cj, is an indicator of the distance of the n data points from their respective cluster centres. The algorithm steps are Choose the number of clusters, k. Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers. Assign each point to the nearest cluster center. Recompute the new cluster centers. Repeat the two previous steps until some convergence criterion is met (usually that the assignment hasn't changed) Fuzzy k-means clustering In fuzzy clustering, each point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster. Thus, points on the edge of a cluster, may be in the cluster to a lesser degree than points in the center of cluster. For Vol. 1 Issue 3 September ISSN: X

3 each point x we have a coefficient giving the degree of being in the kth cluster uk(x). Usually, the sum of those coefficients is defined to be With fuzzy k-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster: The degree of belonging is related to the inverse of the distance to the cluster center: then the coefficients are normalized and fuzzyfied with a real parameter m > 1 so that their sum is 1. So For m equal to 2, this is equivalent to normalising the coefficient linearly to make their sum 1. When m is close to 1, then cluster center closest to the point is given much more weight than the others, and the algorithm is similar to k-means. The fuzzy k-means algorithm is very similar to the k-means algorithm: Choose a number of clusters. Assign randomly to each point coefficients for being in the clusters. Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than, the given sensitivity threshold) : Compute the centroid for each cluster, using the formula above. For each point, compute its coefficients of being in the clusters, using the formula above. The algorithm minimizes intra-cluster variance as well, but has the same problems as k-means, the minimum is a local minimum, and the results depend on the initial choice of weights. The Expectation-maximization algorithm is a more statistically formalized method which includes some of these ideas: partial membership in classes. It has better convergence properties and is in general preferred to fuzzy-k-means K-Medoids Unlike k-means, in the k-medoids or Partitioning Around Medoids (PAM)[1,2]method a cluster is represented by its medoid that is the most centrally located object (pattern) in the cluster. Medoids are more resistant to outliers and noise compared to centroids. PAM begins by selecting randomly an object as medoid for each of the k clusters. Then, each of the non-selected objects is grouped with the medoid to which it is the most similar. PAM then iteratively replaces one of the medoids by one of the non-medoids objects yielding the greatest improvement in the cost function. Clearly, PAM is an expensive algorithm as regards finding the medoids, as it compares each medoid with the entire dataset at each iteration of the algorithm Clustering Large Applications based on Randomized Search -CLARANS CLARANS combines the sampling techniques with PAM. The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k-medoids. The clustering obtained after replacing a medoid is called the neighbor of the current clustering. CLARANS [13]selects a node and compares it to a user-defined number of neighbors searching for a local minimum. If a better neighbor is found having lower square error, CLARANS moves to the neighbor's node and the process starts Vol. 1 Issue 3 September ISSN: X

4 again; otherwise the current clustering is a local optimum. If the local optimum is found, CLARANS starts with a new randomly selected node in search for a new local optimum. The advantages and disadvantages of partitioning clustering methods are: Advantages 1. Relatively scalable and simple. 2. Suitable for datasets with compact spherical clusters that are well-separated International Journal of Latest Trends in Engineering and Technology (IJLTET) I. Disadvantages 1. Severe effectiveness degradation in high dimensional spaces as almost all pairs of points are about as far away as average; the concept of distance between points in high dimensional spaces is ill-defined 2. Poor cluster descriptors 3. Reliance on the user to specify the number of clusters in advance 4. High sensitivity to initialization phase, noise and outliers 5. Frequent entrapments into local optima 6. Inability to deal with non-convex clusters of varying size and density HIERARCHICAL ALGORITHMS Unlike partitioning methods that create a single partition, hierarchical algorithms[11] produce a nested sequence (or dendrogram) of clusters, with a single all-inclusive cluster at the top and singleton clusters of individual points at the bottom. The hierarchy can be formed in top-down (divisive) or bottom-up (agglomerative) fashion and need not necessarily be Figure 2: Hierarchical Clustering extended to the extremes. The merging or splitting stops once the desired number of clusters has been formed. Typically, each iteration involves merging or splitting a pair of clusters based on a certain criterion, often measuring the proximity between clusters. Hierarchical techniques suffer from the fact that previously taken steps (merge or split), possibly erroneous, are irreversible. Some representative examples are: CURE Clustering Using Representatives (CURE)[9] is an agglomerative method introducing two novelties. First, clusters are represented by a fixed number of well-scattered points instead of a single centroid. Second, the representatives are shrunk toward their cluster centers by a constant factor. At each iteration, the pair of clusters with the closest representatives is merged. The use of multiple representatives allows CURE to deal with arbitrary-shaped clusters of different sizes, while the shrinking dampens the effects of outliers and noise. CURE uses a combination of random sampling and partitioning to improve scalability. Vol. 1 Issue 3 September ISSN: X

5 2.2.2 CHAMELEON CHAMELEON [11] improves the clustering quality by using more elaborate merging criteria compared to CURE. Initially, a graph containing links between each point and its k-nearest neighbors is created. Then a graph-partitioning algorithm recursively splits the graph into many small-unconnected sub-graphs. During the second phase, each sub-graph is treated as an initial sub-cluster and an agglomerative hierarchical algorithm repeatedly combines the two most similar clusters. Two clusters are eligible for merging only if the resultant cluster has similar inter-connectivity and closeness to the two individual clusters before merging. Due to its dynamic merging model CHAMELEON is more effective than CURE in discovering arbitrary-shaped clusters of varying density. However, the improved effectiveness comes at the expense of computational cost that is quadratic in the database size BIRCH Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) [8] introduces a novel hierarchical data structure, CF-tree, for compressing the data into many small sub-clusters and then performs clustering with these summaries rather than the raw data. Sub-clusters are represented by compact summaries, called cluster-features (CF) that are stored in the leafs. The non-leaf nodes store the sums of the CF of their children. A CF-tree is built dynamically and incrementally, requiring a single scan of the dataset. An object is inserted in the closest leaf entry. Two input parameters control the maximum number of children per non-leaf node and the maximum diameter of sub-clusters stored in the leafs. By varying these parameters, BIRCH can create a structure that fits in main memory. Once the CF-tree is built, any partitioning or hierarchical algorithms can use it to perform clustering in main memory. BIRCH is reasonably fast, but has two serious drawbacks: data order sensitivity and inability to deal with non-spherical clusters of varying size because it uses the concept of diameter to control the boundary of a cluster. The advantages and disadvantages of hierarchical clustering methods are: II. III. Advantages Embedded flexibility regarding the level of granularity. Well suited for problems involving point linkages, e.g. taxonomy trees. A. Disadvantages Inability to make corrections once the splitting/merging decision is made. Lack of interpretability regarding the cluster descriptors. Vagueness of termination criterion. Prohibitively expensive for high dimensional and massive datasets. Severe effectiveness degradation in high dimensional spaces due to the curse of dimensionality phenomenon. 2.3.DENSITY-BASED CLUSTERING ALGORITHMS Density-based clustering methods group neighboring objects into clusters based on local density conditions rather than proximity between objects[15]. These methods regard clusters as dense regions being separated by low density noisy regions. Density-based methods have noise tolerance, and can discover non-convex clusters. Similar to hierarchical and partitioning methods, density-based techniques encounter difficulties in high dimensional spaces because of the inherent scarcity of the feature space, which in turn, reduces any clustering tendency. Some representative examples of density based clustering algorithms are: DBSCAN Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [5] seeks for core objects whose neighborhood (radius) contains at least Minpts points. A set of core objects with overlapping neighborhoods define the skeleton of a cluster. Non-core points lying inside the neighborhood of core objects represent the boundaries of the clusters, while the remaining are noise. DBSCAN can discover arbitrary-shaped clusters, is insensitive to outliers and order of data input, while its complexity is O(N2). If a spatial index data structure is used the complexity can be improved up to O(N log N ). DBSCAN breaks down in high dimensional spaces and is very sensitive to the input parameters and Minpts. DENCLUE Density-based Clustering (DENCLUE) uses an influence function to describe the impact of a point about its neighborhood while the overall density of the data space is the sum of influence functions from all data. Clusters are determined using density attractors, local maxima of the overall density function. To compute the sum of influence functions a grid structure is used. DENCLUE scales Vol. 1 Issue 3 September ISSN: X

6 well (O(N)), can find arbitrary-shaped clusters, is noise resistant, is insensitive to the data ordering, but suffers from its sensitivity to the input parameters. The curse of dimensionality phenomenon heavily affects Denclue s effectiveness. Moreover, similar to hierarchical and partitioning techniques, the output, e.g. labeled points with cluster identifier, of density-based methods can not be easily assimilated by humans. In short, the advantages and disadvantages of density-based clustering are: IV. Advantages Discovery of arbitrary-shaped clusters with varying size Resistance to noise and outliers V. Disadvantages High sensitivity to the setting of input parameters Poor cluster descriptors Unsuitable for high-dimensional datasets because of the curse of dimensionality phenomenon. Applications of Clustering Clustering has wide applications in [1] Pattern Recognition [2] Spatial Data Analysis: [3] Image Processing [4] Economic Science (especially market research) [5] Document classification [6] Cluster Web log data to discover groups of similar access patterns International Journal of Latest Trends in Engineering and Technology (IJLTET) III. CONCLUSION In this paper we study the different kind of clustering techniques in details and summarized it.we included definition, requirement, application of clustering techniques. we also give detail about classification of clustering techniques and their respective algorithms with the advantages and disadvantages. So this paper provides a quick review of the different clustering techniques in data mining. REFERENCES [1] Jiawei Han and Michheline Kamber,Data mining concepts and techniques-a reffrence book,pg. no [2] Arun K. Pujari,Data mining techniques-a reffrence book,pg. no [3] Ji He,Man Lan, Chew-Lim Tan, Sam-Yuan Sung, Hwee-Boon Low, Initialization of Cluster refinement algorithms: a review and comparative study, Proceeding of International Joint Conference on Neural Networks [C]. Budapest, [4] P. Arabie, L. J. Hubert, and G. De Soete. Clustering and Classification. World Scietific, 1996 [5] Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". Published in Proceeding of 2nd international Conference on Knowledge Discovery and date Mining (KDD 96) [6] Anderberg, M.R., Cluster Analysis for Applications, Academic Press, New York, 1973, pp [7] Biswas, G., Weingberg, J. and Fisher, D.H., ITERATE: A conceptual clustering algorithm for data mining. IEEE Transactions on Systems, Man, and Cybernetics. v28c [8] Tian Zhang, Raghu Ramakrishnan, Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p , June 04-06, 1996, Montreal, Quebec, Canada [9] Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States. [10] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States [11] George Karypis, Eui-Hong (Sam) Han, Vipin Kumar, Chameleon: Hierarchical Clustering Using Dynamic Modeling, Computer, v.32 n.8, p.68-75, August 1999 [doi> / [12] He Zengyou, Xu Xiaofei, Deng Shengchun, Squeezer: an efficient algorithm for clustering categorical data, Journal of Computer Science and Technology, v.17 n.5, p , May 2002 [13] He, Z., Xu, X. and Deng, S., Scalable algorithms for clustering large datasets with mixed type attributes. International Journal of Intelligence Systems. v Vol. 1 Issue 3 September ISSN: X

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Clustering: Techniques & Applications. Nguyen Sinh Hoa, Nguyen Hung Son. 15 lutego 2006 Clustering 1

Clustering: Techniques & Applications. Nguyen Sinh Hoa, Nguyen Hung Son. 15 lutego 2006 Clustering 1 Clustering: Techniques & Applications Nguyen Sinh Hoa, Nguyen Hung Son 15 lutego 2006 Clustering 1 Agenda Introduction Clustering Methods Applications: Outlier Analysis Gene clustering Summary and Conclusions

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Clustering methods for Big data analysis

Clustering methods for Big data analysis Clustering methods for Big data analysis Keshav Sanse, Meena Sharma Abstract Today s age is the age of data. Nowadays the data is being produced at a tremendous rate. In order to make use of this large-scale

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Chapter 7. Cluster Analysis

Chapter 7. Cluster Analysis Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based

More information

A comparison of various clustering methods and algorithms in data mining

A comparison of various clustering methods and algorithms in data mining Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

More information

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

Data Clustering Techniques Qualifying Oral Examination Paper

Data Clustering Techniques Qualifying Oral Examination Paper Data Clustering Techniques Qualifying Oral Examination Paper Periklis Andritsos University of Toronto Department of Computer Science periklis@cs.toronto.edu March 11, 2002 1 Introduction During a cholera

More information

BIRCH: An Efficient Data Clustering Method For Very Large Databases

BIRCH: An Efficient Data Clustering Method For Very Large Databases BIRCH: An Efficient Data Clustering Method For Very Large Databases Tian Zhang, Raghu Ramakrishnan, Miron Livny CPSC 504 Presenter: Discussion Leader: Sophia (Xueyao) Liang HelenJr, Birches. Online Image.

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

The Role of Visualization in Effective Data Cleaning

The Role of Visualization in Effective Data Cleaning The Role of Visualization in Effective Data Cleaning Yu Qian Dept. of Computer Science The University of Texas at Dallas Richardson, TX 75083-0688, USA qianyu@student.utdallas.edu Kang Zhang Dept. of Computer

More information

Clustering Data Streams

Clustering Data Streams Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Data Mining Process Using Clustering: A Survey

Data Mining Process Using Clustering: A Survey Data Mining Process Using Clustering: A Survey Mohamad Saraee Department of Electrical and Computer Engineering Isfahan University of Techno1ogy, Isfahan, 84156-83111 saraee@cc.iut.ac.ir Najmeh Ahmadian

More information

Cluster Analysis: Basic Concepts and Methods

Cluster Analysis: Basic Concepts and Methods 10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

On Clustering Validation Techniques

On Clustering Validation Techniques Journal of Intelligent Information Systems, 17:2/3, 107 145, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques MARIA HALKIDI mhalk@aueb.gr YANNIS

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering

GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering Yu Qian Kang Zhang Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083-0688, USA {yxq012100,

More information

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between

More information

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009 Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

A Survey of Clustering Techniques

A Survey of Clustering Techniques A Survey of Clustering Techniques Pradeep Rai Asst. Prof., CSE Department, Kanpur Institute of Technology, Kanpur-0800 (India) Shubha Singh Asst. Prof., MCA Department, Kanpur Institute of Technology,

More information

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

More information

A Comparative Study of clustering algorithms Using weka tools

A Comparative Study of clustering algorithms Using weka tools A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets

A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets Preeti Baser, Assistant Professor, SJPIBMCA, Gandhinagar, Gujarat, India 382 007 Research Scholar, R. K. University,

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms 8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

A Method for Decentralized Clustering in Large Multi-Agent Systems

A Method for Decentralized Clustering in Large Multi-Agent Systems A Method for Decentralized Clustering in Large Multi-Agent Systems Elth Ogston, Benno Overeinder, Maarten van Steen, and Frances Brazier Department of Computer Science, Vrije Universiteit Amsterdam {elth,bjo,steen,frances}@cs.vu.nl

More information

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

More information

Linköpings Universitet - ITN TNM033 2011-11-30 DBSCAN. A Density-Based Spatial Clustering of Application with Noise

Linköpings Universitet - ITN TNM033 2011-11-30 DBSCAN. A Density-Based Spatial Clustering of Application with Noise DBSCAN A Density-Based Spatial Clustering of Application with Noise Henrik Bäcklund (henba892), Anders Hedblom (andh893), Niklas Neijman (nikne866) 1 1. Introduction Today data is received automatically

More information

Data Clustering Using Data Mining Techniques

Data Clustering Using Data Mining Techniques Data Clustering Using Data Mining Techniques S.R.Pande 1, Ms. S.S.Sambare 2, V.M.Thakre 3 Department of Computer Science, SSES Amti's Science College, Congressnagar, Nagpur, India 1 Department of Computer

More information

The SPSS TwoStep Cluster Component

The SPSS TwoStep Cluster Component White paper technical report The SPSS TwoStep Cluster Component A scalable component enabling more efficient customer segmentation Introduction The SPSS TwoStep Clustering Component is a scalable cluster

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

An Introduction to Cluster Analysis for Data Mining

An Introduction to Cluster Analysis for Data Mining An Introduction to Cluster Analysis for Data Mining 10/02/2000 11:42 AM 1. INTRODUCTION... 4 1.1. Scope of This Paper... 4 1.2. What Cluster Analysis Is... 4 1.3. What Cluster Analysis Is Not... 5 2. OVERVIEW...

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths

More information

A Two-Step Method for Clustering Mixed Categroical and Numeric Data

A Two-Step Method for Clustering Mixed Categroical and Numeric Data Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department

More information

On Data Clustering Analysis: Scalability, Constraints and Validation

On Data Clustering Analysis: Scalability, Constraints and Validation On Data Clustering Analysis: Scalability, Constraints and Validation Osmar R. Zaïane, Andrew Foss, Chi-Hoon Lee, and Weinan Wang University of Alberta, Edmonton, Alberta, Canada Summary. Clustering is

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Clustering. Clustering. What is Clustering? What is Clustering? What is Clustering? Types of Data in Cluster Analysis

Clustering. Clustering. What is Clustering? What is Clustering? What is Clustering? Types of Data in Cluster Analysis What is Clustering? Clustering Tpes of Data in Cluster Analsis Clustering A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods What is Clustering? Clustering of data is

More information

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Published in the Proceedings of 14th International Conference on Data Engineering (ICDE 98) A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Xiaowei Xu, Martin Ester, Hans-Peter

More information

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar

More information

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of

More information

Recent Advances in Clustering: A Brief Survey

Recent Advances in Clustering: A Brief Survey Recent Advances in Clustering: A Brief Survey S.B. KOTSIANTIS, P. E. PINTELAS Department of Mathematics University of Patras Educational Software Development Laboratory Hellas {sotos, pintelas}@math.upatras.gr

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering

More information

A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory

A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory Yen-Jen Oyang, Chien-Yu Chen, and Tsui-Wei Yang Department of Computer Science and Information Engineering National Taiwan

More information

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

Identifying erroneous data using outlier detection techniques

Identifying erroneous data using outlier detection techniques Identifying erroneous data using outlier detection techniques Wei Zhuang 1, Yunqing Zhang 2 and J. Fred Grassle 2 1 Department of Computer Science, Rutgers, the State University of New Jersey, Piscataway,

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Smart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets

Smart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets Smart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets Dudu Lazarov, Gil David, Amir Averbuch School of Computer Science, Tel-Aviv University Tel-Aviv 69978, Israel Abstract

More information

Fuzzy Clustering Technique for Numerical and Categorical dataset

Fuzzy Clustering Technique for Numerical and Categorical dataset Fuzzy Clustering Technique for Numerical and Categorical dataset Revati Raman Dewangan, Lokesh Kumar Sharma, Ajaya Kumar Akasapu Dept. of Computer Science and Engg., CSVTU Bhilai(CG), Rungta College of

More information

Concept of Cluster Analysis

Concept of Cluster Analysis RESEARCH PAPER ON CLUSTER TECHNIQUES OF DATA VARIATIONS Er. Arpit Gupta 1,Er.Ankit Gupta 2,Er. Amit Mishra 3 arpit_jp@yahoo.co.in, ank_mgcgv@yahoo.co.in,amitmishra.mtech@gmail.com Faculty Of Engineering

More information

A Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis

A Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014 1 A Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis A. Fahad, N. Alshatri, Z. Tari, Member, IEEE, A. Alamri, I. Khalil A.

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of

More information

GE-INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH VOLUME -3, ISSUE-6 (June 2015) IF-4.007 ISSN: (2321-1717) EMERGING CLUSTERING TECHNIQUES ON BIG DATA

GE-INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH VOLUME -3, ISSUE-6 (June 2015) IF-4.007 ISSN: (2321-1717) EMERGING CLUSTERING TECHNIQUES ON BIG DATA EMERGING CLUSTERING TECHNIQUES ON BIG DATA Pooja Batra Nagpal 1, Sarika Chaudhary 2, Preetishree Patnaik 3 1,2,3 Computer Science/Amity University, India ABSTRACT The term "Big Data" defined as enormous

More information

How To Cluster On A Large Data Set

How To Cluster On A Large Data Set An Ameliorated Partitioning Clustering Algorithm for Large Data Sets Raghavi Chouhan 1, Abhishek Chauhan 2 MTech Scholar, CSE department, NRI Institute of Information Science and Technology, Bhopal, India

More information

A Review of Clustering Methods forming Non-Convex clusters with, Missing and Noisy Data

A Review of Clustering Methods forming Non-Convex clusters with, Missing and Noisy Data International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 A Review of Clustering Methods forming Non-Convex clusters with, Missing and Noisy

More information

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/4 What is

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

Visual Data Mining with Pixel-oriented Visualization Techniques

Visual Data Mining with Pixel-oriented Visualization Techniques Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 mihael.ankerst@boeing.com Abstract Pixel-oriented visualization

More information

Original Article Survey of Recent Clustering Techniques in Data Mining

Original Article Survey of Recent Clustering Techniques in Data Mining International Archive of Applied Sciences and Technology Volume 3 [2] June 2012: 68-75 ISSN: 0976-4828 Society of Education, India Website: www.soeagra.com/iaast/iaast.htm Original Article Survey of Recent

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

More information

A Grid-based Clustering Algorithm using Adaptive Mesh Refinement

A Grid-based Clustering Algorithm using Adaptive Mesh Refinement Appears in the 7th Workshop on Mining Scientific and Engineering Datasets 2004 A Grid-based Clustering Algorithm using Adaptive Mesh Refinement Wei-keng Liao Ying Liu Alok Choudhary Abstract Clustering

More information

Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

More information

Comparison the various clustering algorithms of weka tools

Comparison the various clustering algorithms of weka tools Comparison the various clustering algorithms of weka tools Narendra Sharma 1, Aman Bajpai 2, Mr. Ratnesh Litoriya 3 1,2,3 Department of computer science, Jaypee University of Engg. & Technology 1 narendra_sharma88@yahoo.com

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Clustering Via Decision Tree Construction

Clustering Via Decision Tree Construction Clustering Via Decision Tree Construction Bing Liu 1, Yiyuan Xia 2, and Philip S. Yu 3 1 Department of Computer Science, University of Illinois at Chicago, 851 S. Morgan Street, Chicago, IL 60607-7053.

More information

CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA

CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA 1 SADAF KHAN, 2 ROHIT SINGH 1,2 Career Point University Abstract- Big data if used properly can bring huge benefits to the

More information

A Review on Clustering and Outlier Analysis Techniques in Datamining

A Review on Clustering and Outlier Analysis Techniques in Datamining American Journal of Applied Sciences 9 (2): 254-258, 2012 ISSN 1546-9239 2012 Science Publications A Review on Clustering and Outlier Analysis Techniques in Datamining 1 Koteeswaran, S., 2 P. Visu and

More information

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Unsupervised and Semi-supervised Clustering: a Brief Survey

Unsupervised and Semi-supervised Clustering: a Brief Survey Unsupervised and Semi-supervised Clustering: a Brief Survey Nizar Grira, Michel Crucianu, Nozha Boujemaa INRIA Rocquencourt, B.P. 105 78153 Le Chesnay Cedex, France {Nizar.Grira, Michel.Crucianu, Nozha.Boujemaa}@inria.fr

More information

Proposed Application of Data Mining Techniques for Clustering Software Projects

Proposed Application of Data Mining Techniques for Clustering Software Projects Proposed Application of Data Mining Techniques for Clustering Software Projects HENRIQUE RIBEIRO REZENDE 1 AHMED ALI ABDALLA ESMIN 2 UFLA - Federal University of Lavras DCC - Department of Computer Science

More information

Constrained Ant Colony Optimization for Data Clustering

Constrained Ant Colony Optimization for Data Clustering Constrained Ant Colony Optimization for Data Clustering Shu-Chuan Chu 1,3,JohnF.Roddick 1, Che-Jen Su 2, and Jeng-Shyang Pan 2,4 1 School of Informatics and Engineering, Flinders University of South Australia,

More information

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Data Mining for Knowledge Management. Clustering

Data Mining for Knowledge Management. Clustering Data Mining for Knowledge Management Clustering Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management Thanks for slides to: Jiawei Han Eamonn Keogh Jeff

More information

A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining

A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining Binu Thomas and Rau G 2, Research Scholar, Mahatma Gandhi University,Kerala, India. binumarian@rediffmail.com 2 SCMS School of Technology

More information

Determining the Number of Clusters/Segments in. hierarchical clustering/segmentation algorithm.

Determining the Number of Clusters/Segments in. hierarchical clustering/segmentation algorithm. Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms Stan Salvador and Philip Chan Dept. of Computer Sciences Florida Institute of Technology Melbourne, FL 32901

More information