Data Warehousing and Data Mining


 Bernadette Armstrong
 2 years ago
 Views:
Transcription
1 Data Warehousing and Data Mining Lecture Clustering Methods CITS CITS Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement: The Lecture Slides are adapted from the original slides of Han s textbook.
2 Lecture Outline Partitionbased Clustering KMeans Algorithm KMediods Algorithm DensityBased Clustering DBScan
3 Major Clustering Approaches (I) Partitioning approach: kmeans, kmedoids, CLARANS Construct kpartitions for the given nobjects (k n). Each group contains at least one object. Each object must belong to exactly one group. Hierarchical approach: Diana, Agnes, BIRCH, ROCK, CAMELEON Create a hierarchical decomposition of the set of objects using some criterion (linkage function ) Agglomerative Approach: bottomup merging Divisive Approach: topdown splitting Densitybased approach: DBSACN, OPTICS, DenClue Based on connectivity and density functions. i.e., for each data point within a given cluster, the radius of a given cluster has to contain at least a minimum number of points.
4 Major Clustering Approaches (II) Gridbased approach: based on a multiplelevel granularity structure Typical methods: STING, WaveCluster, CLIQUE Modelbased: A model is hypothesized for each of the clusters and tries to find the best fit of that model to each other Typical methods: EM, SOFM, COBWEB Frequent patternbased: Based on the analysis of frequent patterns Typical methods: pcluster Userguided or constraintbased: Clustering by considering userspecified or applicationspecific constraints Typical methods: COD (obstacles), constrained clustering
5 Calculate the distance between Clusters Single link smallest distance between an element in one cluster and an element in the other, i.e., dis K i, K j = min(t ip, t jp ) Complete link largest distance between an element in one cluster and an element in the other, i.e., dis K i, K j = max(t ip, t jp ) Average avg distance between an element in one cluster and an element in the other, i.e., dis K i, K j = avg(t ip, t jp ) Centroid distance between the centroids of two clusters, i.e., dis K i, K j = dis(c i, C j ) Medoid: medoid is the most centrally located object in a cluster distance between the medoids of two clusters, i.e., dis K i, K j = dis(m i, M j )
6 Centroid, Radius and Diameter of a Cluster (for numerical data sets) Centroid: the middle of a cluster C m = i= N tip N Radius: square root of average distance from any point of the cluster to its centroid R m = i= N (t ip c m ) N Diameter: square root of average mean squared distance between all pairs of points in the cluster D m = i= N i= N (t ip t iq ) N(N )
7 Partitioning Algorithms: Basic Concept Partitioning method: Construct a partition of a database D of n objects into a set of k clusters, s.t., min sum of squared distance (within cluster variance) k E = i= p Ci (p m i ) Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion Global optimal: exhaustively enumerate all partitions Heuristic methods: kmeans and kmedoids algorithms kmeans (MacQueen ): Each cluster is represented by the center of the cluster kmedoids or PAM (Partition around medoids) (Kaufman & Rousseeuw ): Each cluster is represented by one of the objects in the cluster
8 kmeans: A centroidbased partitioning algorithm Given k, the kmeans algorithm is implemented in four steps: Partition objects into k nonempty subsets Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster) Assign each object to the cluster with the nearest seed point Go back to Step, stop when no more new assignment
9 Example: the kmeans Clustering Method Assign each objects to most similar center reassign Update the cluster means reassign K= Arbitrarily choose K object as initial cluster center Update the cluster means
10 Comments on the kmeans Method Strength: Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t n. Comparing: PAM: O(k(n k) ), CLARA: O(ks + k(n k)) Comment: Often terminates at a local optimum. The global optimum may be found using techniques such as: deterministic annealing and genetic algorithms Weakness Applicable only when mean is defined, then what about categorical data? Need to specify k, the number of clusters, in advance Unable to handle noisy data and outliers Not suitable to discover clusters with nonconvex shapes
11 Variations of kmeans A few variants of the kmeans which differ in Selection of the initial k means Dissimilarity calculations Strategies to calculate cluster means Handling categorical data: kmodes (Huang ) Replacing means of clusters with modes Using new dissimilarity measures to deal with categorical objects Using a frequencybased method to update modes of clusters A mixture of categorical and numerical data: kprototype method
12 The problem of kmeans method The kmeans algorithm is sensitive to outliers! Since an object with an extremely large value may substantially distort the distribution of the data. KMedoids: Instead of taking the mean value of the object in a cluster as a reference point, medoids can be used, which is the most centrally located object in a cluster.
13 The KMedoids Clustering Method Find representative objects, called medoids, in clusters PAM (Partitioning Around Medoids, ) starts from an initial set of medoids and iteratively replaces one of the medoids by one of the nonmedoids if it improves the total distance of the resulting clustering PAM works effectively for small data sets, but does not scale well for large data sets CLARA (Kaufmann & Rousseeuw, ) CLARANS (Ng & Han, ): Randomized sampling Focusing + spatial data structure (Ester et al., )
14 A Typical KMedoids Algorithm (PAM) Arbitrary choose k object as initial medoids Assign each remainin g object to nearest medoids K= Total Cost = Randomly select a nonmedoid object,o ramdom Do loop Until no change Swapping O and O ramdom If quality is improved. Compute total cost of swapping
15 PAM Algorithm PAM (Kaufman and Rousseeuw, ), built in S+ Use real object to represent the cluster Select k representative objects arbitrarily For each pair of nonselected object h and selected object i, calculate the total swapping cost TC ih For each pair of i and h, If TC ih <, i is replaced by h Then assign each nonselected object to the most similar representative object repeat steps  until there is no change
16 Determining O random a good replacement of O j We calculate the distance from every object p to the closest object in the set {o,, o j, o random, o j+,, ok}, Then use the distance to update the cost function. Suppose object p is currently assigned to a cluster represented by medoid o j (Figure a or b). Do we need to reassign p to a different cluster if o j is being replaced by o random? What if object p is currently assigned to a different cluster represented by medoid o i? (Figure c or d)
17 PAM Clustering: Total Swapping Cost: TC ih = j C jih j i h t C jih = t i h j C jih = d(j, h)  d(j, i) h i t j C jih = d(j, t)  d(j, i) t i h j C jih = d(j, h)  d(j, t)
18 What is the problem with PAM Pam is more robust than kmeans in the presence of noise and outliers because a medoid is less influenced by outliers or other extreme values than a mean Pam works efficiently for small data sets but does not scale well for large data sets. O(k(nk) ) for each iteration where n is # of data,k is # of clusters Sampling based method, CLARA(Clustering LARge Applications)
19 CLARA (Clustering Large Applications) () CLARA (Kaufmann and Rousseeuw in ) Built in statistical analysis packages, such as S+ It draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output Strength: deals with larger data sets than PAM Weakness: Efficiency depends on the sample size A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased
20 CLARANS ( Randomized CLARA) () CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han ) CLARANS draws sample of neighbors dynamically The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids If the local optimum is found, CLARANS starts with new randomly selected node in search for a new local optimum It is more efficient and scalable than both PAM and CLARA Focusing techniques and spatial access structures may further improve its performance (Ester et al. )
21 DensityBased Clustering Algorithms Clustering based on density (local cluster criterion), such as densityconnected points Major features: Discover clusters of arbitrary shape Handle noise One scan Need density parameters as termination condition Several interesting studies: DBSCAN: Ester, et al. (KDD ) OPTICS: Ankerst, et al (SIGMOD ). DENCLUE: Hinneburg & D. Keim (KDD ) CLIQUE: Agrawal, et al. (SIGMOD ) (more gridbased)
22 Density Based Clustering: Basic Concepts Two parameters: Eps: Maximum radius of the neighbourhood MinPts: Minimum number of points in an Epsneighbourhood of that point N Eps p : q D dist(p, q) Eps} Directly densityreachable: A point p is directly densityreachable from a point q w.r.t. Eps, MinPts if p belongs to N Eps p & q is a core. Core point condition: N Eps (q) MinPts q p MinPts = Eps = cm
23 DensityReachable and DensityConnected Densityreachable: A point p is densityreachable from a point q w.r.t. Eps, MinPts if there is a chain of points p,, p n, p = q, p n = p such that p i+ is directly densityreachable from p i Densityconnected q p p A point p is densityconnected to a point q w.r.t. Eps, MinPts if there is a point o such that both, p and q are densityreachable from o w.r.t. Eps and MinPts p q o
24 DBSCAN: Density Based Spatial Clustering of Applications with Noise Relies on a densitybased notion of cluster: A cluster is defined as a maximal set of densityconnected points Discovers clusters of arbitrary shape in spatial databases with noise Outlier Border Core Eps = cm MinPts =
25 DBSCAN: The Algorithm Arbitrary select a point p Retrieve all points densityreachable from p w.r.t. Eps and MinPts. If p is a core point, a cluster is formed. If p is a border point, no points are densityreachable from p and DBSCAN visits the next point of the database. Continue the process until all of the points have been processed.
26 DBSCAN: Sensitive to Parameters
27 Extensions of DBSCAN OPTICS: (Ordering Points To Identify the Clustering Structure) Produces cluster ordering obtained for wide range of parameter setting in the framework of densitybased clustering structure. coredistance, reachabilitydistance. Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure Can be represented graphically or using visualization techniques DENCLUE: Clusters object based on set of density distribution functions. Each point is modelled using a mathematical function called influence function, which describes the influence of the function within its neighborhood. Data space density is modelled as sum of individual influence function. Clusters are then determined by identifying density attractors, which are nothing but local maxima.
28 Summary Cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. Cluster analysis can be used as a standalone data mining tool to gain insight into the data distribution or can serve as a preprocessing step for other data mining algorithms operated on the detected clusters. The quality of cluster is based on a measure of dissimilarity of objects, computed for various types of data (intervalscaled, binary, categorical, ordinal and ratio scaled). Cosine measure and Tanimoto coefficients are used for nonmetric vector data. Partitioning Method: iterative relocation technique kmeans, k medoids, CLARANS, etc. Kmedoid is efficient in presence of noise and outliers and CLARANS is its extension for working with large data sets. Desnsity Based Method Discover clusters of arbitrary shape, good at handling noise, requires only one scan But sensitive to density parameters, which are required as termination condition
29 Overview of Clustering Methods Partitioning methods Hierarchical methods Densitybased methods Gridbased methods General Characteristics Find mutually exclusive clusters of spherical shape Distancebased May use mean or medoid (etc.) to represent cluster center Effective for small to mediumsize data sets Clustering is a hierarchical decomposition (i.e., multiple levels) Cannot correct erroneous merges or splits May incorporate other techniques like microclustering or consider object linkages Can find arbitrarily shaped clusters Clusters are dense regions of objects in space that are separated by lowdensity regions Cluster density: Each point must have a minimum number of points within its neighborhood May filter out outliers Use a multiresolution grid data structure Fast processing time (typically independent of the number of data objects, yet dependent on grid size)
Chapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. DensityBased Methods 6. GridBased Methods 7. ModelBased
More informationData Mining for Knowledge Management. Clustering
Data Mining for Knowledge Management Clustering Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management Thanks for slides to: Jiawei Han Eamonn Keogh Jeff
More informationClustering: Techniques & Applications. Nguyen Sinh Hoa, Nguyen Hung Son. 15 lutego 2006 Clustering 1
Clustering: Techniques & Applications Nguyen Sinh Hoa, Nguyen Hung Son 15 lutego 2006 Clustering 1 Agenda Introduction Clustering Methods Applications: Outlier Analysis Gene clustering Summary and Conclusions
More informationCluster Analysis: Basic Concepts and Methods
10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all
More informationData Mining Project Report. Document Clustering. Meryem UzunPer
Data Mining Project Report Document Clustering Meryem UzunPer 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. Kmeans algorithm...
More informationData Mining. Session 9 Main Theme Clustering. Dr. JeanClaude Franchitti
Data Mining Session 9 Main Theme Clustering Dr. JeanClaude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Adapted from course textbook resources
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototypebased clustering Densitybased clustering Graphbased
More informationCluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationAn Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
More informationOn Clustering Validation Techniques
Journal of Intelligent Information Systems, 17:2/3, 107 145, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques MARIA HALKIDI mhalk@aueb.gr YANNIS
More informationClustering methods for Big data analysis
Clustering methods for Big data analysis Keshav Sanse, Meena Sharma Abstract Today s age is the age of data. Nowadays the data is being produced at a tremendous rate. In order to make use of this largescale
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototypebased Fuzzy cmeans
More informationA Novel Density based improved kmeans Clustering Algorithm Dbkmeans
A Novel Density based improved kmeans Clustering Algorithm Dbkmeans K. Mumtaz 1 and Dr. K. Duraiswamy 2, 1 Vivekanandha Institute of Information and Management Studies, Tiruchengode, India 2 KS Rangasamy
More informationA comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 3236 May 2015 www.allsubjectjournal.com eissn: 23494182 pissn: 23495979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
More informationData Mining Process Using Clustering: A Survey
Data Mining Process Using Clustering: A Survey Mohamad Saraee Department of Electrical and Computer Engineering Isfahan University of Techno1ogy, Isfahan, 8415683111 saraee@cc.iut.ac.ir Najmeh Ahmadian
More informationClustering and Outlier Detection
Clustering and Outlier Detection Application Examples Customer segmentation How to partition customers into groups so that customers in each group are similar, while customers in different groups are dissimilar?
More informationFig. 1 A typical Knowledge Discovery process [2]
Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering
More informationData Clustering Techniques Qualifying Oral Examination Paper
Data Clustering Techniques Qualifying Oral Examination Paper Periklis Andritsos University of Toronto Department of Computer Science periklis@cs.toronto.edu March 11, 2002 1 Introduction During a cholera
More informationData Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
More informationClustering Techniques: A Brief Survey of Different Clustering Algorithms
Clustering Techniques: A Brief Survey of Different Clustering Algorithms Deepti Sisodia Technocrates Institute of Technology, Bhopal, India Lokesh Singh Technocrates Institute of Technology, Bhopal, India
More informationData Clustering Using Data Mining Techniques
Data Clustering Using Data Mining Techniques S.R.Pande 1, Ms. S.S.Sambare 2, V.M.Thakre 3 Department of Computer Science, SSES Amti's Science College, Congressnagar, Nagpur, India 1 Department of Computer
More informationWhat is Cluster Analysis?
What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping a set of data objects
More informationA Comparative Analysis of Various Clustering Techniques used for Very Large Datasets
A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets Preeti Baser, Assistant Professor, SJPIBMCA, Gandhinagar, Gujarat, India 382 007 Research Scholar, R. K. University,
More informationAn Introduction to Cluster Analysis for Data Mining
An Introduction to Cluster Analysis for Data Mining 10/02/2000 11:42 AM 1. INTRODUCTION... 4 1.1. Scope of This Paper... 4 1.2. What Cluster Analysis Is... 4 1.3. What Cluster Analysis Is Not... 5 2. OVERVIEW...
More informationClustering. Adrian Groza. Department of Computer Science Technical University of ClujNapoca
Clustering Adrian Groza Department of Computer Science Technical University of ClujNapoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 Kmeans 3 Hierarchical Clustering What is Datamining?
More informationClustering. Clustering. What is Clustering? What is Clustering? What is Clustering? Types of Data in Cluster Analysis
What is Clustering? Clustering Tpes of Data in Cluster Analsis Clustering A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods What is Clustering? Clustering of data is
More informationClustering. 15381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv BarJoseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationClustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance Knearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
More informationData Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
More informationClustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
More informationDistance based clustering
// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means
More informationData Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
More informationAn Ameliorated Partitioning Clustering Algorithm for Large Data Sets
An Ameliorated Partitioning Clustering Algorithm for Large Data Sets Raghavi Chouhan 1, Abhishek Chauhan 2 MTech Scholar, CSE department, NRI Institute of Information Science and Technology, Bhopal, India
More informationUnsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationData Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distancebased Kmeans, Kmedoids,
More informationLinköpings Universitet  ITN TNM033 20111130 DBSCAN. A DensityBased Spatial Clustering of Application with Noise
DBSCAN A DensityBased Spatial Clustering of Application with Noise Henrik Bäcklund (henba892), Anders Hedblom (andh893), Niklas Neijman (nikne866) 1 1. Introduction Today data is received automatically
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDDLAB ISTI CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationA DistributionBased Clustering Algorithm for Mining in Large Spatial Databases
Published in the Proceedings of 14th International Conference on Data Engineering (ICDE 98) A DistributionBased Clustering Algorithm for Mining in Large Spatial Databases Xiaowei Xu, Martin Ester, HansPeter
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationData Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
More informationComparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool
Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract: Data mining is used to find the hidden information pattern and relationship between
More informationClustering & Association
Clustering  Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects
More informationThe Role of Visualization in Effective Data Cleaning
The Role of Visualization in Effective Data Cleaning Yu Qian Dept. of Computer Science The University of Texas at Dallas Richardson, TX 750830688, USA qianyu@student.utdallas.edu Kang Zhang Dept. of Computer
More informationData Clustering Analysis, from simple groupings to scalable clustering with constraints. Objectives. Data Clustering Outline. Data Clustering Outline
Data Clustering Analysis, from simple groupings to scalable clustering with constraints Objectives Learn basic techniques for data clustering. Understand the issues and the major challenges in clustering
More informationNeural Networks Lesson 5  Cluster Analysis
Neural Networks Lesson 5  Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt.  Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationCluster Algorithms. Adriano Cruz adriano@nce.ufrj.br. 28 de outubro de 2013
Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 KMeans Adriano Cruz adriano@nce.ufrj.br
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster
More informationOPTICS: Ordering Points To Identify the Clustering Structure
Proc. ACM SIGMOD 99 Int. Conf. on Management of Data, Philadelphia PA, 1999. OPTICS: Ordering Points To Identify the Clustering Structure Mihael Ankerst, Markus M. Breunig, HansPeter Kriegel, Jörg Sander
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis DensityBased Cluster Analysis Cluster Evaluation Constrained
More informationData Mining Clustering. Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering Toon Calders Sheets are based on the those provided b Tan, Steinbach, and Kumar. Introduction to Data Mining What is Cluster Analsis? Finding groups of objects such that the objects
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationOn Data Clustering Analysis: Scalability, Constraints and Validation
On Data Clustering Analysis: Scalability, Constraints and Validation Osmar R. Zaïane, Andrew Foss, ChiHoon Lee, and Weinan Wang University of Alberta, Edmonton, Alberta, Canada Summary. Clustering is
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationText Clustering. Clustering
Text Clustering 1 Clustering Partition unlabeled examples into disoint subsets of clusters, such that: Examples within a cluster are very similar Examples in different clusters are very different Discover
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 WolfTilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
More informationCluster Analysis: Basic Concepts and Algorithms
Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering Kmeans Intuition Algorithm Choosing initial centroids Bisecting Kmeans Postprocessing Strengths
More informationClustering in Ratemaking: Applications in Territories Clustering
Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, Ph.D. Abstract: Clustering methods are briefly reviewed and their applications in insurance ratemaking are discussed in this paper.
More informationA Survey of Clustering Techniques
A Survey of Clustering Techniques Pradeep Rai Asst. Prof., CSE Department, Kanpur Institute of Technology, Kanpur0800 (India) Shubha Singh Asst. Prof., MCA Department, Kanpur Institute of Technology,
More informationA Review on Clustering and Outlier Analysis Techniques in Datamining
American Journal of Applied Sciences 9 (2): 254258, 2012 ISSN 15469239 2012 Science Publications A Review on Clustering and Outlier Analysis Techniques in Datamining 1 Koteeswaran, S., 2 P. Visu and
More informationAn Enhanced Clustering Algorithm to Analyze Spatial Data
International Journal of Engineering and Technical Research (IJETR) ISSN: 23210869, Volume2, Issue7, July 2014 An Enhanced Clustering Algorithm to Analyze Spatial Data Dr. Mahesh Kumar, Mr. Sachin Yadav
More informationGraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering
GraphZip: A Fast and Automatic Compression Method for Spatial Data Clustering Yu Qian Kang Zhang Department of Computer Science, The University of Texas at Dallas, Richardson, TX 750830688, USA {yxq012100,
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationGEINTERNATIONAL JOURNAL OF ENGINEERING RESEARCH VOLUME 3, ISSUE6 (June 2015) IF4.007 ISSN: (23211717) EMERGING CLUSTERING TECHNIQUES ON BIG DATA
EMERGING CLUSTERING TECHNIQUES ON BIG DATA Pooja Batra Nagpal 1, Sarika Chaudhary 2, Preetishree Patnaik 3 1,2,3 Computer Science/Amity University, India ABSTRACT The term "Big Data" defined as enormous
More informationRobust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
More informationA Comparative Study of clustering algorithms Using weka tools
A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering
More informationA TwoStep Method for Clustering Mixed Categroical and Numeric Data
Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A TwoStep Method for Clustering Mixed Categroical and Numeric Data MingYi Shih*, JarWen Jheng and LienFu Lai Department
More informationCluster Analysis: Basic Concepts and Algorithms
8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should
More informationCLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
More informationA Gridbased Clustering Algorithm using Adaptive Mesh Refinement
Appears in the 7th Workshop on Mining Scientific and Engineering Datasets 2004 A Gridbased Clustering Algorithm using Adaptive Mesh Refinement Weikeng Liao Ying Liu Alok Choudhary Abstract Clustering
More informationForschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
More informationCluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
More informationFlat Clustering KMeans Algorithm
Flat Clustering KMeans Algorithm 1. Purpose. Clustering algorithms group a set of documents into subsets or clusters. The cluster algorithms goal is to create clusters that are coherent internally, but
More informationThe SPSS TwoStep Cluster Component
White paper technical report The SPSS TwoStep Cluster Component A scalable component enabling more efficient customer segmentation Introduction The SPSS TwoStep Clustering Component is a scalable cluster
More informationRobotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard
Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,
More informationComputational Complexity between KMeans and KMedoids Clustering Algorithms for Normal and Uniform Distributions of Data Points
Journal of Computer Science 6 (3): 363368, 2010 ISSN 15493636 2010 Science Publications Computational Complexity between KMeans and KMedoids Clustering Algorithms for Normal and Uniform Distributions
More informationThere are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
More informationChapter 4 Data Mining A Short Introduction
Chapter 4 Data Mining A Short Introduction Data Mining  1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining  2 2 3. Clustering  Descriptive
More informationRecent Advances in Clustering: A Brief Survey
Recent Advances in Clustering: A Brief Survey S.B. KOTSIANTIS, P. E. PINTELAS Department of Mathematics University of Patras Educational Software Development Laboratory Hellas {sotos, pintelas}@math.upatras.gr
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms Kmeans and its variants Hierarchical clustering
More informationKMeans Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
KMeans Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar
More informationUnsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets Knearest neighbor Support vectors Linear regression Logistic regression...
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/4 What is
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationDATA mining in general is the search for hidden patterns
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 5, SEPTEMBER/OCTOBER 2002 1003 CLARANS: A Method for Clustering Objects for Spatial Data Mining Raymond T. Ng and Jiawei Han, Member, IEEE
More informationLecture 20: Clustering
Lecture 20: Clustering Wrapup of neural nets (from last lecture Introduction to unsupervised learning Kmeans clustering COMP424, Lecture 20  April 3, 2013 1 Unsupervised learning In supervised learning,
More informationClustering and Data Mining in R
Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches
More informationProposed Application of Data Mining Techniques for Clustering Software Projects
Proposed Application of Data Mining Techniques for Clustering Software Projects HENRIQUE RIBEIRO REZENDE 1 AHMED ALI ABDALLA ESMIN 2 UFLA  Federal University of Lavras DCC  Department of Computer Science
More informationClustering Hierarchical clustering and kmean clustering
Clustering Hierarchical clustering and kmean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity
More informationCUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA
CUP CLUSTERING USING PRIORITY: AN APPROXIMATE ALGORITHM FOR CLUSTERING BIG DATA 1 SADAF KHAN, 2 ROHIT SINGH 1,2 Career Point University Abstract Big data if used properly can bring huge benefits to the
More informationComparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development eissn: 2278067X, pissn: 2278800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 1924 Comparative Analysis of EM Clustering Algorithm
More informationSurvey of Clustering Data Mining Techniques
Survey of Clustering Data Mining Techniques Pavel Berkhin Accrue Software, Inc. Clustering is a division of data into groups of similar obects. Representing the data by fewer clusters necessarily loses
More informationOn DensityBased Data Streams Clustering Algorithms: A Survey
Amini A, Wah TY, Saboohi H. On densitybased data streams clustering algorithms: A survey. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 29(1): 116 141 Jan. 2014. DOI 10.1007/s1139001314163 On DensityBased
More informationVisually driven analysis of movement data by progressive clustering
Visually driven analysis of movement data by progressive clustering Salvatore Rinzivillo, Dino Pedreschi, Mirco Nanni, Fosca Giannotti, Natalia Andrienko, Gennady Andrienko Information Visualization 2008
More informationClustering Model for Evaluating SaaS on the Cloud
Clustering Model for Evaluating SaaS on the Cloud 1 Mrs. Dhanamma Jagli, 2 Mrs. Akanksha Gupta 1 Assistant Professor, V.E.S Institute of Technology, Mumbai, India 2 Student, M.E (IT) 2 nd year, V.E.S Institute
More information. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering NonSupervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
More informationA Review of Clustering Methods forming NonConvex clusters with, Missing and Noisy Data
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume4, Issue3 EISSN: 23472693 A Review of Clustering Methods forming NonConvex clusters with, Missing and Noisy
More informationGene Expression Data Clustering Analysis: A Survey
Gene Expression Data Clustering Analysis: A Survey Sajid Nagi #, Dhruba K. Bhattacharyya *, Jugal K. Kalita # Department of Computer Science, St. Edmund s College, Shillong 793001, Meghalaya, India. *
More informationA Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis
TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014 1 A Survey of Clustering Algorithms for Big Data: Taxonomy & Empirical Analysis A. Fahad, N. Alshatri, Z. Tari, Member, IEEE, A. Alamri, I. Khalil A.
More information