Hierarchical vs. Partitional Clustering

Similar documents
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Clustering UE 141 Spring 2013

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Cluster Analysis: Advanced Concepts

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:

Unsupervised learning: Clustering

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Cluster Analysis: Basic Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Hierarchical Cluster Analysis Some Basics and Algorithms

Distances, Clustering, and Classification. Heatmaps

Chapter 7. Cluster Analysis

Cluster Analysis. Isabel M. Rodrigues. Lisboa, Instituto Superior Técnico

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Cluster Analysis using R

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

An Introduction to Cluster Analysis for Data Mining

How To Cluster

Information Retrieval and Web Search Engines

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

Concept of Cluster Analysis

SoSe 2014: M-TANI: Big Data Analytics

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

Summary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen

Neural Networks Lesson 5 - Cluster Analysis

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Statistical Databases and Registers with some datamining

Introduction to Clustering

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

BIRCH: An Efficient Data Clustering Method For Very Large Databases

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

Territorial Analysis for Ratemaking. Philip Begher, Dario Biasini, Filip Branitchev, David Graham, Erik McCracken, Rachel Rogers and Alex Takacs

Model Efficiency Through Data Compression

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA

Data Mining K-Clustering Problem

Clustering. Chapter Introduction to Clustering Techniques Points, Spaces, and Distances

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

2 Basic Concepts and Techniques of Cluster Analysis

Clustering methods for Big data analysis

Clustering Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Cluster analysis Cosmin Lazar. COMO Lab VUB

A comparison of various clustering methods and algorithms in data mining

Distances between Clustering, Hierarchical Clustering

How To Solve The Cluster Algorithm

A Survey of Clustering Techniques

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari

Cluster Analysis Overview. Data Mining Techniques: Cluster Analysis. What is Cluster Analysis? What is Cluster Analysis?

SPSS Tutorial. AEB 37 / AE 802 Marketing Research Methods Week 7

Data Preprocessing. Week 2

DATA CLUSTERING USING MAPREDUCE

Hadoop SNS. renren.com. Saturday, December 3, 11

Machine Learning using MapReduce

Chapter ML:XI (continued)

SOME PRINCIPLES OF UNSUPERVISED LEARNING AND APPLICATION IN EDUCATION 1

Clustering & Visualization

USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva

A successful market segmentation initiative answers the following critical business questions: * How can we a. Customer Status.

Standardization and Its Effects on K-Means Clustering Algorithm

Feature Selection vs. Extraction

Cluster Analysis: Basic Concepts and Methods

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Data Clustering Techniques Qualifying Oral Examination Paper

B490 Mining the Big Data. 2 Clustering

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Unsupervised Data Mining (Clustering)

Closest Pair of Points. Kleinberg and Tardos Section 5.4

Using multiple models: Bagging, Boosting, Ensembles, Forests

Environmental Remote Sensing GEOG 2021

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

HES-SO Master of Science in Engineering. Clustering. Prof. Laura Elena Raileanu HES-SO Yverdon-les-Bains (HEIG-VD)

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

A Method for Decentralized Clustering in Large Multi-Agent Systems

DISTRIBUTED ANOMALY DETECTION IN WIRELESS SENSOR NETWORKS

Mining Social Network Graphs

Computational Geometry. Lecture 1: Introduction and Convex Hulls

A Cluster Analysis Approach for Banks Risk Profile: The Romanian Evidence

The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon

Time series experiments

Map-Reduce for Machine Learning on Multicore

Forschungskolleg Data Analytics Methods and Techniques

Graph/Network Visualization

Time series clustering and the analysis of film style

Modifying Insurance Rating Territories Via Clustering

Clustering Techniques: A Brief Survey of Different Clustering Algorithms

CLUSTER ANALYSIS. Kingdom Phylum Subphylum Class Order Family Genus Species. In economics, cluster analysis can be used for data mining.

Transcription:

Hierarchical vs. Partitional Clustering A distinction among different types of clusterings is whether the set of clusters is nested or unnested. A partitional clustering a simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. A hierarchical clustering is a set of nested clusters that are organized as a tree.

Why of Hierarchical Clustering?. It does not assume a particular value of k, as needed by k-means clustering.. The generated tree may correspond to a meaningful taxonomy.. Only a distance or proximity matrix is needed to compute the hierarchical clustering.

Hierarchical Clustering Two types of algorithms: Agglomerative ( Bottom-up ) Start with the points as individual clusters and, at each step, merge the closest pair of clusters. Divisive ( Top-down ) Start with one, all-inclusive cluster and, at each step, split a cluster until only singleton clusters of individual points remain.

Basic Agglomerative Clustering Basic agglomerative hierarchical clustering algorithm. : Compute the proximity matrix, if necessary. : repeat : Merge the closest two clusters. : Update the proximity matrix to reflect the proximity between the new cluster and the original clusters. : until Only one cluster remains.

Defining Proximity Between Clusters The key operation of basic agglomerative clustering is the computation of the proximity between two clusters. The definition of cluster proximity differentiates the various agglomerative hierarchical techniques. MAX (single link), MIN (complete link), and group average are graph-based proximities. Ward s method is a prototype-based proximity.

MIN (Single Link) Proximity Defines cluster proximity as the shortest distance between two points, x and y, that are in different clusters, A and B: d A, B = min x A,y B d x y

MAX (Complete Link) Proximity Defines cluster proximity as the furthest distance between two points, x and y, that are in different clusters, A and B: d A, B = max x A,y B d x y 7

Group Average Proximity Defines cluster proximity as the average distance between two points, x and y, that are in different clusters, A and B (number of points in cluster j is n j ): d A, B = Σ x A,y B d x y n An B 8

Example Data for Clustering Set of Two-Dimensional Points 0. 0. 0. 0. 0. 0. xy Coordinates of Points Point x Coordinate y Coordinate p 0.0 0. p 0. 0.8 p 0. 0. p 0. 0.9 p 0.08 0. p 0. 0.0 0 0. 0. 0. 0. 0. 0. 9

Example Data for Clustering Set of Two-Dimensional Points 0. 0. 0. 0. Euclidean Distance Matrix for Points p p p p p p p 0.00 0. 0. 0.7 0. 0. p 0. 0.00 0. 0.0 0. 0. p 0. 0. 0.00 0. 0.8 0. 0. 0. 0 0. 0. 0. 0. 0. 0. p 0.7 0.0 0. 0.00 0.9 0. p 0. 0. 0.8 0.9 0.00 0.9 p 0. 0. 0. 0. 0.9 0.00 0

Example of Single Link Clustering Single Link Distance Matrix 0 0. 0. 0.7 0. 0. 0 0. 0.0 0. 0. 0 0. 0.8 0. 0 0.9 0. 0 0.9 0

Example of Single Link Clustering Single Link Distance Matrix 0 0. 0. 0.7 0. 0. 0 0. 0.0 0. 0. 0 0. 0.8 0. 0 0.9 0. 0 0.9 0 Points and have the smallest single link proximity distance. Merge these points into one cluster and update the distances to this new cluster. For example, the distance from point to this cluster is 0. (the distance to point ).

Example of Single Link Clustering Single Link Distance Matrix, 0 0. 0.7 0. 0. 0 0.0 0. 0. 0 0.9 0. 0 0.8, 0 Points and have the smallest single link proximity distance. Merge these points into one cluster and update the distances to this new cluster.

Example of Single Link Clustering Single Link Distance Matrix,, 0 0.7 0. 0. 0 0.0 0., 0 0., 0 And iterate

Example of Single Link Clustering Single Link Distance Matrix,,, 0 0.7 0. 0 0.,,, 0 And iterate

Example of Single Link Clustering Single Link Distance Matrix,,,, 0 0.,,, 0 And iterate until there is one all-inclusive cluster.

Example of Single Link Clustering Hierarchical Tree Diagram 0. 0. 0. 0.0 0 7

Example of Complete Link Clustering Single Link Distance Matrix 0 0. 0. 0.7 0. 0. 0 0. 0.0 0. 0. 0 0. 0.8 0. 0 0.9 0. 0 0.9 0 8

Example of Complete Link Clustering 9 Complete Link Distance Matrix 0 0. 0. 0.7 0. 0. 0 0. 0.0 0. 0. 0 0. 0.8 0. 0 0.9 0. 0 0.9 0 Points and have the smallest complete link proximity distance. Merge these points into one cluster and update the distances to this new cluster. For example, the distance from point to this cluster is 0. (the distance to point ).

Example of Complete Link Clustering Complete Link Distance Matrix, 0 0. 0.7 0. 0. 0 0.0 0. 0. 0 0.9 0. 0 0.9, 0 0 Points and have the smallest complete link proximity distance. Merge these points into one cluster and update the distances to this new cluster.

Example of Complete Link Clustering Complete Link Distance Matrix,, 0 0.7 0. 0. 0 0.9 0., 0 0.9, 0 And iterate

Example of Complete Link Clustering Complete Link Distance Matrix,,, 0 0. 0.7, 0 0.9,, 0 And iterate

Example of Complete Link Clustering Complete Link Distance Matrix,,,,,, 0 0.9,, 0 And iterate until there is one all-inclusive cluster.

Example of Complete Link Clustering Hierarchical Tree Diagram 0. 0. 0. 0. 0

Example of Group Average Clustering Hierarchical Tree Diagram 0. 0. 0. 0. 0.0 0

Discussion of Proximity Methods Single link is chain-like and good at handling nonelliptical shapes, but is sensitive to outliers. Complete link is less susceptible to noise and outliers, but can break large clusters and favors globular shapes. Group average is an intermediate approach between the single and complete link approaches.

Ward s Method Assumes that a cluster is represented by its centroid, and measures the proximity between two clusters in terms of the increase in sum of the squared error (SSE) that results from merging the two clusters: d A, B = SSE A B SSE A SSE B where A and B are clusters. Note that for hierarchical clustering, the SSE starts at 0. 7

Discussion of Hierarchical Clustering Useful if the underlying application has a taxonomy. Agglomerative hierarchical clustering algorithms are expensive in terms of their computational and storage requirements. Merges are final and cannot be undone at a later time, preventing global optimization and causing trouble for noisy, high-dimensional data. 8

And now Lets see some hierarchical clustering! 9