
Clustering
15-381 Artificial Intelligence
Henry Lin
Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore

What is Clustering?
Organizing data into clusters such that there is:
- high intra-cluster similarity
- low inter-cluster similarity
Informally: finding natural groupings among objects. Why do we want to do that? Any REAL application?

Example: Clusty. Example: clustering genes.
Microarrays measure the activities of all genes in different conditions, and clustering genes provides a lot of information about the genes. This was an early killer application in this area: the most cited (7,812 citations) paper in PNAS!

Why clustering?
- Organizing data into clusters shows the internal structure of the data (e.g., Clusty and the clustering of genes above).
- Sometimes the partitioning itself is the goal (e.g., market segmentation).
- It prepares data for other AI techniques (e.g., summarize news: cluster, then find each cluster's centroid).
- Clustering techniques are useful for knowledge discovery in data (e.g., underlying rules, recurring patterns, topics).

Outline
- Motivation
- Distance measure
- Hierarchical clustering
- Partitional clustering: K-means, Gaussian Mixture Models
- Number of clusters

What is a natural grouping among these objects? Clustering is subjective: the same characters can be grouped as Simpson's Family vs. School Employees, or as Females vs. Males.

What is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
Similarity is hard to define, but we know it when we see it. The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.

Defining Distance Measures
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2); for example, D(gene1, gene2) = 0.23.

Inside these black boxes: some function on two variables (might be simple or very complex). For example, the edit distance between two strings can be defined recursively:

d('', '') = 0
d(s, '') = d('', s) = length of s
d(s1+ch1, s2+ch2) = min( d(s1, s2) + (0 if ch1 = ch2 else 1),
                         d(s1+ch1, s2) + 1,
                         d(s1, s2+ch2) + 1 )

What properties should a distance measure have?
- D(A,B) = D(B,A)  (Symmetry)
- D(A,B) = 0 iff A = B  (Constancy of Self-Similarity)
- D(A,B) >= 0  (Positivity)
- D(A,B) <= D(A,C) + D(B,C)  (Triangle Inequality)

(A code sketch of the edit-distance recursion follows the outline below.)

Outline
- Motivation
- Distance measure
- Hierarchical clustering
- Partitional clustering: K-means, Gaussian Mixture Models
- Number of clusters
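A minimal dynamic-programming sketch of the edit-distance recursion above (the test strings are illustrative); note that the resulting distance satisfies all four properties:

```python
def edit_distance(s1, s2):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning s1 into s2."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # d[i][j] = d(s1[:i], s2[:j])
    for i in range(m + 1):
        d[i][0] = i                  # d(s, '') = length of s
    for j in range(n + 1):
        d[0][j] = j                  # d('', s) = length of s
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitute (0 if ch1 = ch2)
                          d[i - 1][j] + 1,        # delete ch1
                          d[i][j - 1] + 1)        # insert ch2
    return d[m][n]

print(edit_distance("gene1", "gene2"))  # 1
```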

Two Types of Clustering
Partitional algorithms: construct various partitions and then evaluate them by some criterion (we will see an example called BIRCH).
Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion.

Desirable Properties of a Clustering Algorithm
- Scalability (in terms of both time and space)
- Ability to deal with different data types
- Minimal requirements for domain knowledge to determine input parameters
- Able to deal with noise and outliers
- Insensitive to the order of input records
- Incorporation of user-specified constraints
- Interpretability and usability

A Useful Tool for Summarizing Similarity Measurements
In order to better appreciate and evaluate the examples given in the early part of this talk, we will now introduce the dendrogram, whose parts are the root, internal nodes, internal branches, terminal branches, and leaves. The similarity between two objects in a dendrogram is represented as the height of the lowest internal node they share.
Note that hierarchies are commonly used to organize information, for example in a web portal. Yahoo's hierarchy is manually created; we will focus on the automatic creation of hierarchies in data mining. (Example from Yahoo's hierarchy: Business & Economy branches into B2B, Finance, Shopping, and Jobs, which branch further, e.g. B2B into Aerospace and Agriculture, Finance into Banking and Bonds, Shopping into Animals and Apparel, Jobs into Career and Workspace.)

We can look at the dendrogram to determine the correct number of clusters. In this case, the two highly separated subtrees are highly suggestive of two clusters. (Things are rarely this clear-cut, unfortunately.)
One potential use of a dendrogram is to detect outliers: a single isolated branch is suggestive of a data point that is very different from all the others.

(How-to) Hierarchical Clustering
The number of dendrograms with n leaves = (2n - 3)! / [2^(n-2) * (n - 2)!]:

Number of leaves    Number of possible dendrograms
2                   1
3                   3
4                   15
5                   105
...
10                  34,459,425

Since we cannot test all possible trees, we will have to do a heuristic search over the space of possible trees. We could do this:
Bottom-up (agglomerative): starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Top-down (divisive): starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides.
We begin with a distance matrix which contains the distances between every pair of objects in our database. For the five objects in the slides, the upper triangle looks like:

0  8  8  7  7
   0  2  4  4
      0  3  3
         0  1
            0

(e.g., the distance between the first two objects is 8, and between the last two is 1).
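As a quick sanity check of the dendrogram-count formula above (exact integer arithmetic; the helper name is illustrative):

```python
from math import factorial

def num_dendrograms(n):
    # (2n - 3)! / (2^(n-2) * (n - 2)!)  for n >= 2 leaves
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))
# prints 2 1, 3 3, 4 15, 5 105, 10 34459425 -- matching the table
```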

Bottom-up (agglomerative): starting with each item in its own cluster, find the best pair to merge into a new cluster, and repeat until all clusters are fused together. Each step is the same: consider all possible merges, choose the best, and continue until one cluster remains. (The slides animate this merge by merge on the five-object example above; a code sketch follows.)
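A minimal sketch of this bottom-up loop, run on the distance matrix above (single-link cluster distance is assumed here; the linkage choices are discussed next):

```python
def agglomerative(D):
    """Naive bottom-up clustering on an n x n distance matrix D.
    Each step considers all possible merges and chooses the best
    (closest) pair; cluster distance is single link (minimum pair)."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# The five-object distance matrix from the slide (symmetrically filled in):
D = [[0, 8, 8, 7, 7],
     [8, 0, 2, 4, 4],
     [8, 2, 0, 3, 3],
     [7, 4, 3, 0, 1],
     [7, 4, 3, 1, 0]]
print(agglomerative(D))
# [([3], [4], 1), ([1], [2], 2), ([1, 2], [3, 4], 3), ([0], [1, 2, 3, 4], 7)]
```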

Similarity criteria:
Single link: cluster similarity = similarity of the two most similar members. Con: potentially long and skinny clusters.
Complete link: cluster similarity = similarity of the two least similar members. Pro: tight clusters.

Average link: cluster similarity = average similarity of all pairs. This is the most widely used similarity measure, and it is robust against noise. (The slides contrast dendrograms of the same 30 points under single linkage and average linkage.)
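In practice, library routines implement these criteria directly. A sketch using SciPy, whose linkage methods "single", "complete", and "average" correspond to the criteria above (the two-blob data is made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),   # synthetic blob 1
               rng.normal(5, 1, (20, 2))])  # synthetic blob 2

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # the merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
    print(method, np.bincount(labels))
# dendrogram(Z) draws the tree with matplotlib, if installed
```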

Summary of Hierarchical Clustering Methods
- No need to specify the number of clusters in advance.
- The hierarchical structure maps nicely onto human intuition for some domains.
- They do not scale well: time complexity of at least O(n^2), where n is the total number of objects.
- Like any heuristic search algorithm, they can get stuck in local optima.
- Interpretation of the results is (very) subjective.

Partitional Clustering
Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters. Since only one set of clusters is output, the user normally has to input the desired number of clusters K.

K-means Clustering: Initialization. Decide K, and initialize the K centers k1, k2, k3 (randomly).
K-means Clustering: Iteration 1. Assign all objects to the nearest center; move each center to the mean of its members.

K-means Clustering: Iteration 2. After moving the centers, re-assign the objects to the nearest centers, then move each center to the mean of its new members.

K-means Clustering: Finished! Re-assign objects and move centers until no object changes membership.

Algorithm k-means:
1. Decide on a value for K, the number of clusters.
2. Initialize the K cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the K cluster centers, assuming the memberships found above are correct.
5. Repeat steps 3 and 4 until none of the N objects changes membership in an iteration.
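A minimal NumPy transcription of these five steps (a sketch: Euclidean distance and centers initialized to K random data points are assumed choices):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: decide K and initialize the K cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    members = None
    for _ in range(max_iters):
        # Step 3: assign each object to the nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_members = dists.argmin(axis=1)
        # Step 5: stop once no object changed membership.
        if members is not None and np.array_equal(new_members, members):
            break
        members = new_members
        # Step 4: re-estimate each center as the mean of its members.
        for k in range(K):
            if np.any(members == k):
                centers[k] = X[members == k].mean(axis=0)
    # Squared-error objective: total squared distance to assigned centers.
    se = ((X - centers[members]) ** 2).sum()
    return centers, members, se
```

The returned squared error is the objective discussed next, and is reused below for choosing the number of clusters.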

Why K-means Works
What is a good partition? One with high intra-cluster similarity. K-means optimizes the average distance to members of the same cluster,

\sum_{k=1}^{K} \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{j=1}^{n_k} \| x_{ki} - x_{kj} \|^2,

which is twice the total squared distance to the centers, also called the squared error:

se = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \| x_{ki} - \mu_k \|^2

Comments on K-Means
Strengths:
- Simple, easy to implement and debug.
- Intuitive objective function: optimizes intra-cluster similarity.
- Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally, k, t << n.
Weaknesses:
- Applicable only when a mean is defined; what about categorical data?
- Often terminates at a local optimum, so initialization is important.
- Need to specify K, the number of clusters, in advance.
- Unable to handle noisy data and outliers.
- Not suitable for discovering clusters with non-convex shapes.
Summary: assign members based on the current centers; re-estimate centers based on the current assignment.

Outline
- Motivation
- Distance measure
- Hierarchical clustering
- Partitional clustering: K-means, Gaussian Mixture Models
- Number of clusters

Gaussian Mixture Models
Gaussian:

P(x) = \varphi(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

(ex. the height of one population)

Gaussian mixture:

P(C = i) = \omega_i, \qquad P(x \mid C = i) = \varphi(x; \mu_i, \sigma_i)
P(x) = \sum_{i=1}^{K} P(x, C = i) = \sum_{i=1}^{K} P(C = i)\, P(x \mid C = i) = \sum_{i=1}^{K} \omega_i\, \varphi(x; \mu_i, \sigma_i)

(ex. the heights of two populations)
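In code, the mixture density is just a weighted sum of Gaussian densities. A minimal sketch (the example parameters are made up):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # phi(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def mixture_pdf(x, weights, mus, sigmas):
    # P(x) = sum_i omega_i * phi(x; mu_i, sigma_i); weights must sum to 1
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Illustrative parameters: heights (cm) of two subpopulations.
print(mixture_pdf(170.0, [0.5, 0.5], [165.0, 180.0], [6.0, 7.0]))
```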

Gaussian Mixture Models
Mixture of multivariate Gaussians:

P(C = i) = \omega_i, \qquad P(x \mid C = i) = \varphi(x; \mu_i, \Sigma_i)

(ex. the y-axis is blood pressure and the x-axis is age)

GMM + EM = Soft K-means
- Decide the number of clusters, K.
- Initialize the parameters (randomly).
- E-step: assign probabilistic memberships.
- M-step: re-estimate the parameters based on the probabilistic memberships.
- Repeat until the change in parameters is smaller than a threshold.
See R&N for details.
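A minimal sketch of these E and M steps for a one-dimensional mixture (scalar variances assumed, and no numerical safeguards against collapsing components):

```python
import numpy as np

def em_gmm_1d(x, K, max_iters=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)   # initial means, drawn from data
    sigma = np.full(K, x.std())                 # initial spreads
    w = np.full(K, 1.0 / K)                     # mixing weights omega_i
    for _ in range(max_iters):
        # E-step: probabilistic membership r[n, i] = P(C = i | x_n).
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
                 / (np.sqrt(2 * np.pi) * sigma)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft memberships.
        nk = r.sum(axis=0)
        new_mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk)
        w = nk / len(x)
        # Repeat until the change in parameters is below a threshold.
        converged = np.max(np.abs(new_mu - mu)) < tol
        mu = new_mu
        if converged:
            break
    return w, mu, sigma

# Example: heights of two subpopulations (synthetic data).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(165, 6, 300), rng.normal(180, 7, 300)])
print(em_gmm_1d(x, K=2))
```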

Iteration 1: the cluster means are randomly assigned.

Iteration 2. Iteration 5. (Figures show the mixture fit after these iterations.)

Iteration 25 (final figure).

Strengths of Gaussian Mixture Models
- Interpretability: it learns a generative model of each cluster, so you can generate new data from the learned model (see the sampling sketch below).
- Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally, k, t << n.
- Intuitive (?) objective function: optimizes the data likelihood.
- Extensible to other mixture models for other data types, e.g., a mixture of multinomials for categorical data (with the appropriate maximization instead of the mean; sensitivity to noise and outliers depends on the distribution).
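Because the model is generative, sampling new data is direct: pick a component i with probability omega_i, then draw from phi(x; mu_i, sigma_i). A sketch, reusing the made-up two-population parameters from earlier:

```python
import numpy as np

def sample_mixture(n, weights, mus, sigmas, seed=0):
    """Generate n points from a learned 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n, p=weights)  # pick components
    return rng.normal(np.asarray(mus)[comps], np.asarray(sigmas)[comps])

print(sample_mixture(5, [0.5, 0.5], [165.0, 180.0], [6.0, 7.0]))
```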

Weaknesses of Gaussian Mixture Models
- Often terminates at a local optimum, so initialization is important.
- Need to specify K, the number of clusters, in advance.
- Not suitable for discovering clusters with non-convex shapes.
Summary: to learn a Gaussian mixture, assign probabilistic memberships based on the current parameters, then re-estimate the parameters based on the current memberships.

Clustering methods: Comparison

                  Hierarchical               K-means                   GMM
Running time      naively O(N^3)             fastest (each iteration   fast (each iteration
                                             is linear)                is linear)
Assumptions       requires a similarity /    strong assumptions        strongest assumptions
                  distance measure
Input parameters  none                       K (number of clusters)    K (number of clusters)
Clusters          subjective (only a         exactly K clusters        exactly K clusters
                  tree is returned)

Outline
- Motivation
- Distance measure
- Hierarchical clustering
- Partitional clustering: K-means, Gaussian Mixture Models
- Number of clusters

How can we tell the right number of clusters? In general, this is an unsolved problem; however, there are many approximate methods. In the next few slides we will see an example on a small two-dimensional dataset.

When k = 1, the objective function is 873.0. When k = 2, the objective function is 173.1.

When k = 3, the objective function is 133.6.
We can plot the objective function values for k = 1 to 6. The abrupt change at k = 2 is highly suggestive of two clusters in the data. This technique for determining the number of clusters is known as "knee finding" or "elbow finding". Note that the results are not always as clear-cut as in this toy example.
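A sketch of elbow finding with scikit-learn, whose KMeans exposes the squared-error objective as inertia_ (the two-cluster data here is synthetic, made up to mirror the toy example):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # synthetic cluster 1
               rng.normal(5, 0.5, (50, 2))])  # synthetic cluster 2

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # inertia_ is the squared error se
# The objective drops sharply from k = 1 to k = 2, then flattens:
# the "elbow" suggests two clusters.
```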

What you should know
- Why clustering is useful.
- What the different types of clustering algorithms are.
- What assumptions we make for each, and what we can get from them.
- Unsolved issues: the number of clusters, initialization, etc.

Acknowledgement: modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, Andrew Moore, and others.