Clustering. CMPUT 466/551 Nilanjan Ray


What is Clustering? Clustering attaches a label to each observation (data point) in a set; it can be viewed as unsupervised classification. Clustering is also called grouping. Intuitively, we want to assign the same label to data points that are close to each other. Thus, clustering algorithms rely on a distance metric between data points. It is sometimes said that, for clustering, the distance metric is more important than the clustering algorithm.

Distances: Quantitative Variables Data point: x_i = [x_i1, ..., x_ip]^T, a vector of p quantitative attributes. Some examples of such distances: Euclidean, squared Euclidean, and L1 (Manhattan).

Distances: Ordinal and Categorical Variables Ordinal variables can be forced to lie within (0, 1) and then a quantitative metric can be applied: replace the k-th of M ordered values by (k - 1/2) / M, k = 1, 2, ..., M. For categorical variables, distances must be specified by the user between each pair of categories.

Combining Distances Often a weighted sum is used:

D(x_i, x_j) = sum_{l=1}^p w_l d(x_il, x_jl), with sum_{l=1}^p w_l = 1 and w_l > 0.
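The weighted sum above can be sketched in a few lines of Python; `combined_distance` and the example attribute dissimilarities are hypothetical helpers, not from the original notes:

```python
import numpy as np

def combined_distance(xi, xj, dists, weights):
    """D(x_i, x_j) = sum_l w_l * d_l(x_il, x_jl), with positive weights
    summing to 1. `dists` holds one dissimilarity function per attribute."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights > 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * d(a, b) for w, d, a, b in zip(weights, dists, xi, xj))

# Example: one quantitative attribute (squared difference) and one
# categorical attribute (0/1 mismatch), weighted equally.
sq = lambda a, b: (a - b) ** 2
mismatch = lambda a, b: float(a != b)
print(combined_distance([1.0, 'red'], [3.0, 'blue'], [sq, mismatch], [0.5, 0.5]))
# 0.5 * 4 + 0.5 * 1 = 2.5
```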

Combinatorial Approach In how many ways can we assign K labels to N observations? For each such assignment we can compute a cost, and pick the assignment with the best cost. The number of possible assignments is formidable: the number of partitions of N observations into K non-empty groups is the Stirling number of the second kind,

S(N, K) = (1/K!) sum_{k=1}^{K} (-1)^{K-k} C(K, k) k^N.

(I'll post a page about the origin of this formula.)
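To see concretely how fast this count grows, the Stirling number can be computed directly from the formula; a small Python sketch (not part of the original notes):

```python
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind: the number of ways to
    partition n observations into k non-empty clusters."""
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1)) // factorial(k)

print(stirling2(10, 4))  # 34105
print(stirling2(19, 4))  # ~10^10: already infeasible to enumerate
```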

K-means Overview An unsupervised clustering algorithm. K stands for the number of clusters; it is typically a user input to the algorithm, though some criteria can be used to estimate K automatically. It is an approximation to an NP-hard combinatorial optimization problem. The K-means algorithm is iterative in nature. It converges, but only to a local minimum. Works only for numerical data. Easy to implement.

K-means: Setup x_1, ..., x_N are data points or vectors of observations. Each observation x_i is assigned to one and only one cluster; C(i) denotes the cluster number for the i-th observation. Dissimilarity measure: Euclidean distance. K-means minimizes the within-cluster point scatter:

W(C) = (1/2) sum_{k=1}^K sum_{C(i)=k} sum_{C(j)=k} ||x_i - x_j||^2 = sum_{k=1}^K N_k sum_{C(i)=k} ||x_i - m_k||^2 (Exercise)

where m_k is the mean vector of the k-th cluster and N_k is the number of observations in the k-th cluster.

Within and Between Cluster Criteria Consider the total point scatter for a set of N data points:

T = (1/2) sum_{i=1}^N sum_{j=1}^N d(x_i, x_j)

where d(x_i, x_j) is the distance between two points. T can be re-written as:

T = (1/2) sum_{k=1}^K sum_{C(i)=k} ( sum_{C(j)=k} d(x_i, x_j) + sum_{C(j)!=k} d(x_i, x_j) ) = W(C) + B(C)

where the within-cluster scatter is

W(C) = (1/2) sum_{k=1}^K sum_{C(i)=k} sum_{C(j)=k} d(x_i, x_j)

and the between-cluster scatter is

B(C) = (1/2) sum_{k=1}^K sum_{C(i)=k} sum_{C(j)!=k} d(x_i, x_j).

If d is squared Euclidean distance, then

W(C) = sum_{k=1}^K N_k sum_{C(i)=k} ||x_i - m_k||^2 and B(C) = sum_{k=1}^K N_k ||m_k - m||^2 (Exercise)

where m is the grand mean. Since T is fixed, minimizing W(C) is equivalent to maximizing B(C).

K-means Algorithm Step 1: For a given cluster assignment C of the data points, compute the cluster means:

m_k = ( sum_{i: C(i)=k} x_i ) / N_k, k = 1, ..., K.

Step 2: For the current set of cluster means, assign each observation to the nearest mean:

C(i) = argmin_{1<=k<=K} ||x_i - m_k||^2, i = 1, ..., N.

Iterate the two steps until convergence.
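The two alternating steps can be sketched in Python (a minimal illustration under the setup above, not the course's reference implementation):

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=None):
    """Lloyd's algorithm: alternate the two steps until the cluster
    assignment stops changing (a local minimum of W(C))."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(rng)
    m = X[rng.choice(len(X), size=K, replace=False)]  # initial means
    C = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 2: assign each observation to the nearest current mean.
        d = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        C_new = d.argmin(axis=1)
        if np.array_equal(C_new, C):
            break
        C = C_new
        # Step 1: recompute each cluster mean.
        for k in range(K):
            if np.any(C == k):
                m[k] = X[C == k].mean(axis=0)
    return C, m

# Two well-separated blobs: K-means recovers them exactly.
X = np.vstack([np.zeros((10, 2)), 10 * np.ones((10, 2))])
C, m = kmeans(X, 2, rng=0)
```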

K-means clustering example

K-means Image Segmentation An image (I); a three-cluster image (J) computed on the gray values of I. Matlab code:

I = double(imread('...'));   % filename elided in the original
J = reshape(kmeans(I(:),3),size(I));

Note that the K-means result is noisy.

K-means: summary Algorithmically, very simple to implement. K-means converges, but it finds a local minimum of the cost function. Works only for numerical observations. K is a user input; alternatively BIC (Bayesian information criterion) or MDL (minimum description length) can be used to estimate K. Outliers can cause considerable trouble for K-means.

K-medoids Clustering K-means is appropriate when we can work with Euclidean distances; thus, K-means can work only with numerical, quantitative variable types. Euclidean distances do not work well in at least two situations: some variables are categorical, and outliers can be potential threats. A generalized version of the K-means algorithm, called K-medoids, can work with any distance measure. K-medoids clustering is computationally more intensive.

K-medoids Algorithm Step 1: For a given cluster assignment C, find the observation in each cluster that minimizes the total distance to the other points in that cluster:

i_k = argmin_{i: C(i)=k} sum_{C(j)=k} d(x_i, x_j).

Step 2: Assign m_k = x_{i_k}, k = 1, 2, ..., K.

Step 3: Given the set of cluster centers {m_1, ..., m_K}, minimize the total error by assigning each observation to the closest (current) cluster center:

C(i) = argmin_{1<=k<=K} d(x_i, m_k), i = 1, ..., N.

Iterate steps 1 to 3.
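A sketch of the three steps in Python, assuming only a precomputed pairwise distance matrix D (exactly the setting where K-medoids is useful); this is an illustrative implementation, not the course's:

```python
import numpy as np

def kmedoids(D, K, n_iter=100, rng=None):
    """K-medoids on an N x N distance matrix D: any dissimilarity
    works, and no coordinates are needed."""
    rng = np.random.default_rng(rng)
    N = len(D)
    medoids = rng.choice(N, size=K, replace=False)
    for _ in range(n_iter):
        # Step 3: assign each point to its closest current medoid.
        C = D[:, medoids].argmin(axis=1)
        # Steps 1-2: inside each cluster, the new medoid is the point
        # minimizing the total distance to the other cluster members.
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.flatnonzero(C == k)
            if len(members):
                new_medoids[k] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return C, medoids

# Two clusters on a line, using absolute difference as the distance.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
C, med = kmedoids(D, 2, rng=0)
```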

K-medoids Summary A generalized K-means. Computationally much costlier than K-means. Apply when dealing with categorical data, or when the data points themselves are not available and only pairwise distances are. Converges to a local minimum.

Choice of K? Can W_K(C), i.e., the within-cluster scatter as a function of K, serve as an indicator? Note that W_K(C) decreases monotonically with increasing K: the within-cluster scatter shrinks as more centroids are added, so it cannot be minimized directly. Instead, look at the gap in the successive differences of W_K(C): choose K* such that

{W_K - W_{K+1} : K < K*} >> {W_K - W_{K+1} : K >= K*}.

Choice of K Data points simulated from two pdfs; the log(W_K) curve and the gap curve are plotted. This is essentially a visual heuristic.

Vector Quantization A codebook is a set of centroids/codewords {m_1, m_2, ..., m_K}. A quantization function maps each data point to a codeword: q(x_i) = m_k; often q is the nearest-neighbor function. K-means can be used to construct the codebook.
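A minimal sketch of the nearest-neighbor quantization step in Python (the codebook here is a made-up example; in practice it would come from K-means as described above):

```python
import numpy as np

def quantize(X, codebook):
    """Nearest-neighbor quantization: map each vector to the index of
    its closest codeword, q(x_i) = m_k."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# A K-codeword codebook needs log2(K) bits per vector; with K = 2,
# each vector is encoded in a single bit.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[0.5, 1.0], [9.0, 9.5], [0.1, 0.2]])
codes = quantize(X, codebook)
print(codes)            # [0 1 0]
print(codebook[codes])  # reconstructed (quantized) vectors
```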

Image Compression by VQ Original at 8 bits/pixel; 1.9 bits/pixel using 200 codewords; 0.5 bits/pixel using 4 codewords.

Otsu's Image Thresholding Method Based on the clustering idea: find the threshold that minimizes the weighted within-cluster point scatter. This turns out to be the same as maximizing the between-class scatter. It operates directly on the gray-level histogram [e.g., 256 numbers, P(i)], so it's fast (once the histogram is computed).

Otsu's Method Assumes the histogram (and the image) is bimodal. Makes no use of spatial coherence, nor any other notion of object structure. Assumes (implicitly) uniform illumination, so the bimodal brightness behavior arises from object appearance differences only.

The weighted within-class variance is:

sigma_w^2(t) = q_1(t) sigma_1^2(t) + q_2(t) sigma_2^2(t)

where the class probabilities are estimated as:

q_1(t) = sum_{i=1}^t P(i), q_2(t) = sum_{i=t+1}^I P(i)

and the class means are given by:

mu_1(t) = sum_{i=1}^t i P(i) / q_1(t), mu_2(t) = sum_{i=t+1}^I i P(i) / q_2(t).

Finally, the individual class variances are:

sigma_1^2(t) = sum_{i=1}^t [i - mu_1(t)]^2 P(i) / q_1(t), sigma_2^2(t) = sum_{i=t+1}^I [i - mu_2(t)]^2 P(i) / q_2(t).

Now, we could actually stop here: just run through the full range of t values [1, 256] and pick the value that minimizes sigma_w^2(t). But the relationship between the within-class and between-class variances can be exploited to generate a recursion relation that permits a much faster calculation.

Finally... Initialization:

q_1(1) = P(1); mu_1(0) = 0.

Recursion:

q_1(t+1) = q_1(t) + P(t+1)
mu_1(t+1) = [q_1(t) mu_1(t) + (t+1) P(t+1)] / q_1(t+1)
mu_2(t+1) = [mu - q_1(t+1) mu_1(t+1)] / [1 - q_1(t+1)]

where mu is the total mean.

After some algebra, we can express the total variance as:

sigma^2 = sigma_w^2(t) + q_1(t)[1 - q_1(t)][mu_1(t) - mu_2(t)]^2

The first term is the within-class variance from before; the second is the between-class variance, sigma_B^2(t). Since the total is constant and independent of t, the effect of changing the threshold is merely to move the contributions of the two terms back and forth. So, minimizing the within-class variance is the same as maximizing the between-class variance. The nice thing about this is that we can compute the quantities in sigma_B^2(t) recursively as we run through the range of t values.
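The whole method (class probabilities, class means, and the between-class variance sigma_B^2) can be sketched in vectorized Python; the cumulative sums play the role of the recursion, and the bimodal histogram at the end is a made-up example:

```python
import numpy as np

def otsu_threshold(hist):
    """Otsu's method on a gray-level histogram: pick the threshold t
    maximizing sigma_B^2(t) = q1(t) [1 - q1(t)] [mu1(t) - mu2(t)]^2."""
    P = hist / hist.sum()              # normalized histogram P(i)
    levels = np.arange(1, len(P) + 1)  # gray levels i = 1..I
    q1 = np.cumsum(P)                  # q1(t) = sum_{i<=t} P(i)
    m1 = np.cumsum(levels * P)         # q1(t) * mu1(t)
    mu = m1[-1]                        # total mean
    with np.errstate(divide='ignore', invalid='ignore'):
        mu1 = m1 / q1
        mu2 = (mu - m1) / (1 - q1)
        sigma_B = q1 * (1 - q1) * (mu1 - mu2) ** 2
    sigma_B = np.nan_to_num(sigma_B)   # empty classes contribute nothing
    return int(np.argmax(sigma_B)) + 1 # threshold as a 1-based gray level

# Bimodal histogram with peaks around levels 3 and 12: the optimal
# threshold falls in the empty valley between the two modes.
hist = np.array([1, 5, 20, 5, 1, 0, 0, 0, 0, 1, 5, 20, 5, 1, 0, 0], float)
t = otsu_threshold(hist)
```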

Result of Otsu's Algorithm An image; the binary image produced by Otsu's method; the gray-level histogram. Matlab code:

I = double(imread('...'));   % filename elided in the original
I = (I-min(I(:)))/(max(I(:))-min(I(:)));
J = I > graythresh(I);

Hierarchical Clustering Two types: (1) agglomerative (bottom up), (2) divisive (top down). Agglomerative: two groups are merged if the distance between them is less than a threshold. Divisive: one group is split into two if the inter-group distance is more than a threshold. The process can be expressed by an excellent graphical representation called a dendrogram when it is monotonic, i.e., the dissimilarity between merged clusters increases at each step. Agglomerative clustering possesses this property; not all divisive methods do. Heights of nodes in a dendrogram are proportional to the threshold value that produced them.

An Example Hierarchical Clustering

Linkage Functions A linkage function computes the dissimilarity between two groups of data points. Single linkage (minimum distance between two groups):

d_SL(G, H) = min_{i in G, j in H} d_ij

Complete linkage (maximum distance between two groups):

d_CL(G, H) = max_{i in G, j in H} d_ij

Group average (average distance between two groups):

d_GA(G, H) = (1 / (N_G N_H)) sum_{i in G} sum_{j in H} d_ij

Linkage Functions SL considers only a single pair of data points; if this pair is close enough, the merge is made. So SL can form a chain by combining data points that are relatively far apart. SL often violates the compactness property of a cluster: it can produce clusters with large diameters, where

D_G = max_{i in G, j in G} d_ij.

CL is just the opposite of SL; it produces many clusters with small diameters. CL can violate the closeness property: two close data points may be assigned to different clusters. GA is a compromise between SL and CL.
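The three linkage functions can be sketched directly from a pairwise distance matrix; a minimal Python illustration (the helper `linkage_distance` is hypothetical, not from the notes):

```python
import numpy as np

def linkage_distance(D, G, H, method='single'):
    """Dissimilarity between two groups G, H (index lists) taken from
    the pairwise distance matrix D, for the three linkage functions."""
    block = D[np.ix_(G, H)]
    if method == 'single':      # d_SL: minimum pairwise distance
        return block.min()
    if method == 'complete':    # d_CL: maximum pairwise distance
        return block.max()
    if method == 'average':     # d_GA: mean pairwise distance
        return block.mean()
    raise ValueError(method)

x = np.array([0.0, 1.0, 5.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
G, H = [0, 1], [2, 3]
print(linkage_distance(D, G, H, 'single'))    # 4.0  (|1 - 5|)
print(linkage_distance(D, G, H, 'complete'))  # 6.0  (|0 - 6|)
print(linkage_distance(D, G, H, 'average'))   # 5.0
```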

Different Dendrograms

Hierarchical Clustering on Microarray Data

Hierarchical Clustering Matlab Demo