Cluster Algorithms
Adriano Cruz (adriano@nce.ufrj.br)
28 October 2013

Summary

1 K-Means
2 Fuzzy C-means
3 Possibilistic Clustering
4 Gustafson-Kessel Algorithm
5 Gath-Geva Algorithm
6 K-medoids

K-Means Algorithm
1 Based on the Euclidean distances among the elements of the cluster.
2 The centre of a cluster is the mean value of the objects in the cluster.
3 Classifies objects in a hard way.
4 Each object belongs to a single cluster.

Initial Definitions
1 Consider $n$ objects and $c$ clusters.
2 Each object $x_e \in X$ is defined by $l$ characteristics: $x_e = (x_{e,1}, x_{e,2}, \dots, x_{e,l})$.
3 Let $A$ be the set of $c$ clusters: $A = \{A_1, A_2, \dots, A_c\}$.

K-Means Properties
1 The union of all $c$ clusters makes the Universe: $\bigcup_{i=1}^{c} A_i = X$.
2 No element belongs to more than one cluster: $\forall\, i,j \le c,\ i \ne j \Rightarrow A_i \cap A_j = \emptyset$.
3 There is no empty cluster and no cluster holds everything: $\emptyset \ne A_i \ne X$.

Membership Function
$$\chi_{A_i}(x_e) = \chi_{ie} = \begin{cases} 1 & x_e \in A_i \\ 0 & x_e \notin A_i \end{cases}$$
$$\sum_{i=1}^{c} \chi_{ie} = 1, \quad \forall e \qquad\qquad \chi_{ie}\,\chi_{je} = 0, \quad \forall e,\ i \ne j \qquad\qquad 0 < \sum_{e=1}^{n} \chi_{ie} < n$$

Membership Matrix U
1 Matrix containing the value of inclusion of each element in each cluster (0 or 1).
2 The matrix has $c$ rows (clusters) and $n$ columns (elements).
3 The sum of each column must be equal to one (each element belongs to exactly one cluster).
4 The sum of each row must be less than $n$ and greater than 0: no empty cluster, and no cluster containing all elements.

Matrix Examples
1 Two examples of clustering.
2 What do the matrices represent?
$$\begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \qquad\qquad \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$$

Matrix Examples (contd.)
1 Matrices $U_1$ and $U_2$ represent the same clusters.
$$U_1 = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \qquad\qquad U_2 = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}$$

K-Means Inputs and Outputs
1 Inputs: the number of clusters $c$ and a database containing $n$ objects with $l$ characteristics each.
2 Output: a set of $c$ clusters that minimises the square-error criterion.

K-Means Algorithm v1
Arbitrarily assign each object to a cluster (matrix U).
repeat
    Update the cluster centres;
    Reassign the objects to the clusters to which they are most similar;
until no change;

K-Means Algorithm v2
Arbitrarily choose $c$ objects as the initial cluster centres.
repeat
    Reassign the objects to the clusters to which they are most similar;
    Update the cluster centres;
until no change;

Algorithm Details
1 The algorithm tries to minimise the function
$$J(U,v) = \sum_{e=1}^{n} \sum_{i=1}^{c} \chi_{ie}\,(d_{ie})^2$$
2 $(d_{ie})^2$ is the squared distance between the element $x_e$ ($l$ characteristics) and the centre $v_i$ of cluster $i$:
$$d_{ie} = d(x_e, v_i) = \lVert x_e - v_i \rVert = \left[ \sum_{j=1}^{l} (x_{ej} - v_{ij})^2 \right]^{1/2}$$

Cluster Centres
1 The centre $v_i$ of cluster $i$ is a vector of $l$ characteristics.
2 The $j$th coordinate is calculated as
$$v_{ij} = \frac{\sum_{e=1}^{n} \chi_{ie}\, x_{ej}}{\sum_{e=1}^{n} \chi_{ie}}$$

Detailed Algorithm
Choose $c$ (the number of clusters). Set the error tolerance ($\varepsilon > 0$) and the step counter ($r = 0$).
Arbitrarily set the matrix $U^{(r)}$. Do not forget: each element belongs to a single cluster, there is no empty cluster, and no cluster contains all elements.

Detailed Algorithm (cont.)
repeat
    Calculate the centres of the clusters $v_i^r$;
    Calculate the distance $d_{ie}^r$ of each point to each cluster centre;
    Generate $U^{r+1}$ by recalculating all characteristic functions using
    $$\chi_{ie}^{r+1} = \begin{cases} 1 & \text{if } d_{ie}^{r} = \min_{1 \le j \le c} d_{je}^{r} \\ 0 & \text{otherwise} \end{cases}$$
until $\lVert U^{r+1} - U^{r} \rVert < \varepsilon$
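
To make the loop concrete, here is a minimal NumPy sketch of the detailed algorithm above. The names (kmeans, X, c, eps) are illustrative, not from the slides, and ties are broken by taking the first minimising cluster:

```python
import numpy as np

def kmeans(X, c, eps=1e-6, seed=0, max_iter=100):
    """Hard K-means as in the slides: X is (n, l), c is the number of clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Arbitrary initial hard assignment: U is (c, n) with exactly one 1 per column.
    labels = rng.integers(0, c, size=n)
    U = np.zeros((c, n))
    U[labels, np.arange(n)] = 1.0
    for _ in range(max_iter):
        # Cluster centres: v_ij = sum_e chi_ie x_ej / sum_e chi_ie
        counts = U.sum(axis=1, keepdims=True)
        counts[counts == 0] = 1.0            # guard against an empty cluster
        V = (U @ X) / counts
        # Distances d_ie of every element to every centre, shape (c, n).
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        # New hard memberships: 1 for the closest centre, 0 otherwise.
        U_new = np.zeros_like(U)
        U_new[np.argmin(D, axis=0), np.arange(n)] = 1.0
        if np.abs(U_new - U).max() < eps:    # stop when U no longer changes
            return V, U_new
        U = U_new
    return V, U
```

Representing U explicitly as a c x n matrix of 0s and 1s keeps the code close to the slides; a practical implementation would usually keep only the label vector.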

Matrix Norms
1 Consider a matrix $A$ with $n$ rows and $n$ columns.
2 Column norm: $\lVert A \rVert_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|$
3 Row norm: $\lVert A \rVert_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$
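
As a quick sanity check of the two definitions, both norms are one line each in NumPy (a sketch; the matrix A is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
col_norm = np.abs(A).sum(axis=0).max()   # ||A||_1: maximum absolute column sum
row_norm = np.abs(A).sum(axis=1).max()   # ||A||_inf: maximum absolute row sum
print(col_norm, row_norm)                # 6.0 7.0
```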

Stop Criteria
1 Some implementations use the value of the objective function to stop the algorithm.
2 The algorithm stops when the improvement of the objective function between two consecutive iterations is less than the specified minimum amount of improvement.
3 The objective function is
$$J(U,v) = \sum_{e=1}^{n} \sum_{i=1}^{c} \chi_{ie}\,(d_{ie})^2$$

K-Means Problems
1 Suitable when the clusters are compact, well-separated clouds.
2 Scalable, because the computational complexity is $O(nkr)$ ($k$ clusters, $r$ iterations).
3 The necessity of choosing $c$ in advance is a disadvantage.
4 Not suitable for non-convex shapes.
5 Sensitive to noise and outliers, because they influence the means.
6 Depends on the initial allocation.

Examples of Result
(figure: scatter plot of an example K-means clustering result)

Original x Result Data
(figure: original data side by side with the K-means result)


Fuzzy C-means
1 Fuzzy version of K-means.
2 Elements may belong to more than one cluster.
3 The values of the characteristic function range from 0 to 1.
4 The value is interpreted as the degree of membership of an element in a cluster, relative to all the other clusters.

Initial Definitions
1 Consider $n$ objects and $c$ clusters.
2 Each object $x_e \in X$ is defined by $l$ characteristics: $x_e = (x_{e1}, x_{e2}, \dots, x_{el})$.
3 Let $A$ be the set of $c$ clusters: $A = \{A_1, A_2, \dots, A_c\}$.

C-Means Properties
1 The union of all $c$ clusters makes the Universe: $\bigcup_{i=1}^{c} A_i = X$.
2 There is no empty cluster and no cluster holds everything: $\emptyset \ne A_i \ne X$.

Membership Function
$$\mu_{A_i}(x_e) = \mu_{ie} \in [0,1]$$
$$\sum_{i=1}^{c} \mu_{ie} = 1, \quad \forall e \qquad\qquad 0 < \sum_{e=1}^{n} \mu_{ie} < n$$

Membership Matrix
1 Matrix containing the value of inclusion of each element in each cluster, in $[0,1]$.
2 The matrix has $c$ rows (clusters) and $n$ columns (elements).
3 The sum of each column must be equal to one.
4 The sum of each row must be less than $n$ and greater than 0.
5 No empty cluster, and no cluster containing all elements.

Matrix Examples
1 Two examples of clustering over elements $e_1, \dots, e_6$.
2 What do the clusters represent?
$$U_1 = \begin{bmatrix} 0.9 & 0 & 1 & 0 & 0.85 & 0.95 \\ 0.1 & 1 & 0 & 1 & 0.15 & 0.05 \end{bmatrix} \qquad U_2 = \begin{bmatrix} 0 & 0.2 & 0 & 0 & 0.4 & 1 \\ 0 & 0.4 & 1 & 0 & 0.1 & 0 \\ 1 & 0.4 & 0 & 1 & 0.5 & 0 \end{bmatrix}$$

C-Means Algorithm v1
Arbitrarily assign each object to a cluster (matrix U).
repeat
    Update the cluster centres;
    Reassign the objects to the clusters to which they are most similar;
until no change;

C-Means Algorithm v2
Arbitrarily choose $c$ objects as the initial cluster centres.
repeat
    Reassign the objects to the clusters to which they are most similar;
    Update the cluster centres;
until no change;

Algorithm Details
1 The algorithm tries to minimise the function
$$J(U,v) = \sum_{e=1}^{n} \sum_{i=1}^{c} \mu_{ie}^{m}\,(d_{ie})^2$$
2 $m$ is the nebulization factor.
3 $(d_{ie})^2$ is the squared distance between the element $x_e$ ($l$ characteristics) and the centre $v_i$ of cluster $i$:
$$d_{ie} = d(x_e, v_i) = \lVert x_e - v_i \rVert = \left[ \sum_{j=1}^{l} (x_{ej} - v_{ij})^2 \right]^{1/2}$$

Nebulization Factor
1 $m$ is the nebulization factor.
2 Its range is $[1, \infty)$.
3 If $m = 1$ then the system is crisp.
4 If $m \to \infty$ then all the membership values tend to $1/c$.
5 The most common values are 1.25 and 2.0.

Cluster Centres
1 The centre $v_i$ of cluster $i$ is a vector of $l$ characteristics.
2 The $j$th coordinate is calculated as
$$v_{ij} = \frac{\sum_{e=1}^{n} \mu_{ie}^{m}\, x_{ej}}{\sum_{e=1}^{n} \mu_{ie}^{m}}$$

Detailed Algorithm
Choose $c$ (the number of clusters). Set the error tolerance ($\varepsilon > 0$), the nebulization factor ($m$) and the step counter ($r = 0$).
Arbitrarily set the matrix $U^{(r)}$. Do not forget: each column must sum to one, there is no empty cluster, and no cluster contains all elements.

Detailed Algorithm (cont.)
repeat
    Calculate the centres of the clusters $v_i^r$;
    Calculate the distance $d_{ie}^r$ of each point to each cluster centre;
    Generate $U^{r+1}$ by recalculating all membership functions. How?
until $\lVert U^{r+1} - U^{r} \rVert < \varepsilon$

How to Recalculate?
1 If every distance is greater than zero, the membership grade is a weighted average of the distances to all the centres.
2 Otherwise the element lies exactly on a centre: it belongs to that cluster and to no other.
if $d_{ie} > 0,\ \forall i \in [1..c]$ then
$$\mu_{ie} = \left[ \sum_{k=1}^{c} \left( \frac{d_{ie}}{d_{ke}} \right)^{\frac{2}{m-1}} \right]^{-1}$$
else if $d_{ie} = 0$ then $\mu_{ie} = 1$ else $\mu_{ie} = 0$
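
A minimal NumPy sketch of the fuzzy C-means loop, combining the centre update, the distance computation and the membership update above; the names (fcm, X, c, m, eps) are illustrative, and the $d_{ie} = 0$ case follows the else branches:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-6, seed=0, max_iter=100):
    """Fuzzy C-means: X is (n, l); returns centres V (c, l) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # columns must sum to one
    for _ in range(max_iter):
        W = U ** m                          # mu_ie^m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # d_ie, (c, n)
        U_new = np.empty_like(U)
        for e in range(n):
            d = D[:, e]
            if np.all(d > 0):
                # mu_ie = 1 / sum_k (d_ie / d_ke)^(2/(m-1))
                U_new[:, e] = 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (m - 1))).sum(axis=1)
            else:
                U_new[:, e] = (d == 0).astype(float)
                U_new[:, e] /= U_new[:, e].sum()   # split if the point sits on several centres
        if np.abs(U_new - U).max() < eps:
            return V, U_new
        U = U_new
    return V, U
```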

Original x Result Data
(figure: original data side by side with the fuzzy C-means result)

Possibilistic Clustering

Clustering Details - I
$$\mu_{A_i}(x_e) = \mu_{ie} \in [0,1] \qquad\qquad \sum_{i=1}^{c} \mu_{ie} > 0, \quad \forall e$$
1 The membership degree represents the typicality of the datum $x_e$ with respect to cluster $i$; note that the memberships of an element no longer need to sum to one.

Algorithm Details - I
1 The algorithm tries to minimise the function
$$J(U,v) = \sum_{e=1}^{n} \sum_{i=1}^{c} \mu_{ie}^{m}\,(d_{ie})^2 + \sum_{i=1}^{c} \eta_i \sum_{e=1}^{n} (1 - \mu_{ie})^m$$
2 The first sum is the usual one; the second rewards high memberships.

Algorithm Details - II
1 It is possible to prove that, in order to minimise the function $J$, the membership degree must be calculated as
$$\mu_{ie} = \frac{1}{1 + \left( \frac{d_{ie}^2}{\eta_i} \right)^{\frac{1}{m-1}}}$$
2 where
$$\eta_i = \frac{\sum_{e=1}^{n} \mu_{ie}^{m}\, d_{ie}^2}{\sum_{e=1}^{n} \mu_{ie}^{m}}$$
3 $\eta_i$ is a factor that controls the expansion of the cluster.

Algorithm Details - III
1 $\eta_i$ is a factor that controls the expansion of the cluster.
2 For instance, if $\eta_i$ is equal to the squared distance $d_{ie}^2$, then at that distance the membership will be equal to 0.5:
$$\mu_{ie} = \frac{1}{1 + \left( \frac{d_{ie}^2}{\eta_i} \right)^{\frac{1}{m-1}}} \qquad \eta_i = d_{ie}^2 \ \Rightarrow\ \mu_{ie} = 0.5$$
3 So, using $\eta_i$, it is possible to control the membership degrees.
4 $\eta_i$ is usually estimated.

Detailed Algorithm
1 It seems natural to repeat the K-Means and C-Means algorithms, which are similar.
2 However, this does not give satisfactory results.
3 The algorithm tends to interpret data with low membership in all clusters as outliers, instead of adjusting the results.
4 So an initial probabilistic clustering is performed.

Detailed Algorithm
Choose $c$ (the number of clusters). Set the error tolerance ($\varepsilon > 0$), the nebulization factor ($m$) and the step counter ($r = 0$).
Execute the Fuzzy C-Means algorithm.

Detailed Algorithm (cont.)
for 2 times do
    Initialize $U^0$ and the cluster centres with the previous results; $r \leftarrow 0$;
    Initialize $\eta_i$ with the previous results;
    repeat
        $r \leftarrow r + 1$;
        Calculate the cluster centres using $U^{r-1}$;
        Calculate $U^{r}$;
    until $\lVert U^{r} - U^{r-1} \rVert < \varepsilon$
end for
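
A sketch of the possibilistic refinement described above; it assumes the fcm function from the earlier sketch for the initial probabilistic clustering, and all names are illustrative:

```python
import numpy as np

def pcm(X, c, m=2.0, eps=1e-6, max_iter=100):
    """Possibilistic clustering: initialise with FCM, then drop the column-sum constraint."""
    V, U = fcm(X, c, m=m, eps=eps)           # initial probabilistic clustering (sketch above)
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # squared distances d_ie^2
    for _ in range(2):                        # the slides repeat the refinement twice
        W = U ** m
        eta = (W * D2).sum(axis=1) / W.sum(axis=1)            # expansion factor eta_i
        for _ in range(max_iter):
            # mu_ie = 1 / (1 + (d_ie^2 / eta_i)^(1/(m-1)))
            U_new = 1.0 / (1.0 + (D2 / eta[:, None]) ** (1.0 / (m - 1)))
            done = np.abs(U_new - U).max() < eps
            U = U_new
            W = U ** m
            V = (W @ X) / W.sum(axis=1, keepdims=True)        # recompute centres
            D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
            if done:
                break
    return V, U
```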

Gustafson-Kessel Algorithm

GK Method
1 This method (GK) is similar to Fuzzy C-means (FCM).
2 The difference is the way the distance is calculated:
3 FCM uses Euclidean distances;
4 GK uses Mahalanobis distances.

C-Means Problem
1 C-Means and K-Means produce spherical clusters.
(figure: an elongated cloud of points; the circular C-means cluster around the centre misses the shape, while an elliptic cluster would be a possible better solution)

Deforming Space - I
1 An arbitrary, positive definite and symmetric matrix $A \in \mathbb{R}^{p \times p}$ induces a scalar product $\langle x, y \rangle = x^T A y$.
2 For instance, the matrix $A = \begin{bmatrix} 1/4 & 0 \\ 0 & 4 \end{bmatrix}$ provides
$$\lVert (x, y) \rVert_A^2 = (x\ \ y) \begin{bmatrix} 1/4 & 0 \\ 0 & 4 \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2}{4} + 4y^2 = \left( \frac{x}{2} \right)^2 + (2y)^2$$
3 This corresponds to a deformation of the unit circle into an ellipse with double the diameter in the x-direction and half the diameter in the y-direction.
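
A quick numeric check of this deformation (a sketch; the test points are arbitrary):

```python
import numpy as np

A = np.array([[0.25, 0.0],
              [0.0,  4.0]])

def norm_A(p):
    """Induced norm ||p||_A = sqrt(p^T A p)."""
    return np.sqrt(p @ A @ p)

# The unit ball of ||.||_A reaches x = 2 along the x-axis but only y = 1/2 along the y-axis:
print(norm_A(np.array([2.0, 0.0])))   # 1.0
print(norm_A(np.array([0.0, 0.5])))   # 1.0
```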

Deforming Space - II
1 If there is a priori knowledge about the data, elliptic clusters can be recognized.
2 If each cluster has its own matrix, different elliptic clusters can be obtained.

GK Details
1 The Mahalanobis distance is calculated as
$$d_{ie}^2 = (x_e - v_i)^T A_i (x_e - v_i)$$
2 It is possible to prove that the matrices $A_i$ are given by
$$A_i = \sqrt[p]{\det S_i}\; S_i^{-1}$$
3 $S_i$ is the fuzzy covariance matrix, equal to
$$S_i = \sum_{e=1}^{n} \mu_{ie}^{m} (x_e - v_i)(x_e - v_i)^T$$
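
A sketch of the GK distance computation for one iteration, to be used in place of the Euclidean distances in the FCM loop; the small ridge added to $S_i$ is my assumption (not in the slides) so that the inverse always exists:

```python
import numpy as np

def gk_distances(X, V, U, m=2.0):
    """Squared Mahalanobis distances d_ie^2 for Gustafson-Kessel.

    X: (n, l) data, V: (c, l) centres, U: (c, n) memberships."""
    c, l = V.shape
    W = U ** m
    D2 = np.empty((c, X.shape[0]))
    for i in range(c):
        diff = X - V[i]                              # (n, l)
        S = (W[i, :, None] * diff).T @ diff          # fuzzy covariance S_i, (l, l)
        S += 1e-9 * np.eye(l)                        # small ridge (assumption) for invertibility
        A = np.linalg.det(S) ** (1.0 / l) * np.linalg.inv(S)   # A_i = det(S_i)^(1/p) S_i^{-1}
        D2[i] = np.einsum('nj,jk,nk->n', diff, A, diff)
    return D2
```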

GK Comments
1 In addition to its centre, each cluster is characterized by a symmetric and positive definite matrix $A_i$.
2 The clusters are hyper-ellipsoids in $\mathbb{R}^l$.
3 The hyper-ellipsoids have approximately the same size.
4 For $S_i^{-1}$ to be computable, the number of samples $n$ must be at least the number of dimensions $l$ plus 1.

GK Results
(figure: GK clustering result with elliptic clusters)

Gath-Geva Algorithm

Introduction
1 The method is also known as Gaussian Mixture Decomposition.
2 It is similar to the FCM method.
3 The Gauss distance is used instead of the Euclidean distance.
4 The clusters no longer have a definite shape and may have various sizes.

Gauss Distance
1 Gauss distance:
$$d_{ie} = \frac{\sqrt{\det A_i}}{P_i} \exp\left( \frac{1}{2} (x_e - v_i)^T A_i^{-1} (x_e - v_i) \right)$$
2 Cluster centre:
$$v_i = \frac{\sum_{e=1}^{n} \mu_{ie}\, x_e}{\sum_{e=1}^{n} \mu_{ie}}$$

Gauss Distance (cont.)
1 The matrix $A_i$ is the fuzzy covariance matrix:
$$A_i = \frac{\sum_{e=1}^{n} \mu_{ie}^{m} (x_e - v_i)(x_e - v_i)^T}{\sum_{e=1}^{n} \mu_{ie}^{m}}$$
2 Probability that an element $x_e$ belongs to cluster $i$:
$$P_i = \frac{\sum_{e=1}^{n} \mu_{ie}^{m}}{\sum_{e=1}^{n} \sum_{i=1}^{c} \mu_{ie}^{m}}$$
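
Under the same conventions as the GK sketch, a sketch of the Gath-Geva distance for one iteration; again the ridge term is an assumption of mine, not part of the slides:

```python
import numpy as np

def gg_distances(X, V, U, m=2.0):
    """Gauss distances d_ie for Gath-Geva. X: (n, l), V: (c, l), U: (c, n)."""
    c, l = V.shape
    W = U ** m
    P = W.sum(axis=1) / W.sum()                            # cluster priors P_i
    D = np.empty((c, X.shape[0]))
    for i in range(c):
        diff = X - V[i]
        A = (W[i, :, None] * diff).T @ diff / W[i].sum()   # fuzzy covariance A_i
        A += 1e-9 * np.eye(l)                              # ridge (assumption)
        maha = np.einsum('nj,jk,nk->n', diff, np.linalg.inv(A), diff)
        D[i] = np.sqrt(np.linalg.det(A)) / P[i] * np.exp(0.5 * maha)
    return D
```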

GG Comments
1 $P_i$ is a parameter that influences the size of a cluster.
2 Bigger clusters attract more elements.
3 The exponential term makes it more difficult to avoid local minima.
4 Usually another clustering method is used to initialise the partition matrix $U$.

GG Results
(figure: GG clustering result with clusters of various shapes and sizes)

K-medoids

Source
1 The algorithm is presented in: L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons.

Methods
1 K-means is sensitive to outliers, since an object with an extremely large value may distort the distribution of the data.
2 Instead of taking the mean value, the most centrally located object (the medoid) is used as the reference point.
3 The algorithm minimizes the sum of the dissimilarities between each object and its medoid (similar to k-means).

Strategies
1 Find $c$ medoids arbitrarily.
2 Cluster each remaining object with the medoid to which it is most similar.
3 Then iteratively replace one of the medoids by a non-medoid as long as the quality of the clustering improves.
4 The quality is measured using a cost function: the sum of the dissimilarities between the objects and the medoid of their cluster.

Reassignment
1 Each time a reassignment occurs, a difference in the square error $J$ is contributed.
2 The cost function $J$ calculates the total cost of replacing a current medoid $m_j$ by a non-medoid $m_{random}$.
3 If the total cost is negative, then $m_j$ is replaced by $m_{random}$; otherwise the replacement is not accepted.

Phases
1 Build phase: an initial clustering is obtained by successively selecting representative objects until $c$ (the number of clusters) objects have been found.
2 Swap phase: an attempt is made to improve the set of $c$ representative objects.

Build Phase
1 The first object is the one for which the sum of the dissimilarities to all other objects is as small as possible.
2 This is the most centrally located object.
3 At each subsequent step, another object that decreases the objective function is selected.

Build Phase: Next Steps I
1 Consider an object $e_i$ (a candidate) which has not yet been selected.
2 Consider a non-selected object $e_j$.
3 Calculate the difference $C_{ji}$ between the dissimilarity $D_j = d(e_n, e_j)$ of $e_j$ to the most similar previously selected object $e_n$, and its dissimilarity $d(e_i, e_j)$ to the candidate $e_i$.
4 If $D_j - d(e_i, e_j)$ is positive, then object $e_j$ contributes to the selection of $e_i$:
$$C_{ji} = \max(D_j - d(e_i, e_j),\ 0)$$
5 $C_{ji} > 0$ means that $e_j$ is closer to $e_i$ than to any previously selected object.

Build Phase: Next Steps II
1 Calculate the total gain $G_i$ obtained by selecting object $e_i$:
$$G_i = \sum_j C_{ji}$$
2 Choose the not yet selected object $e_i$ which maximizes $G_i$ (a sketch of this build phase follows below).
3 The process continues until $c$ objects have been found.
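
A sketch of the build phase on a precomputed dissimilarity matrix; the names (pam_build, D, c) are illustrative:

```python
import numpy as np

def pam_build(D, c):
    """Build phase of k-medoids: D is an (n, n) dissimilarity matrix, c the number of clusters."""
    n = D.shape[0]
    # First medoid: the most centrally located object.
    selected = [int(np.argmin(D.sum(axis=1)))]
    while len(selected) < c:
        best = D[:, selected].min(axis=1)        # D_j: distance to the closest selected medoid
        gains = np.full(n, -np.inf)
        for i in range(n):
            if i in selected:
                continue
            # G_i = sum over non-selected j of C_ji = max(D_j - d(e_i, e_j), 0)
            contrib = np.maximum(best - D[i], 0.0)
            contrib[selected + [i]] = 0.0
            gains[i] = contrib.sum()
        selected.append(int(np.argmax(gains)))
    return selected
```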

Swap Phase
1 An attempt is made to improve the set of representative elements.
2 Consider all pairs of elements $(e_i, e_h)$ for which $e_i$ has been selected and $e_h$ has not.
3 What is the effect of swapping $e_i$ and $e_h$?
4 Consider the objective function as the sum of the dissimilarities between each element and the most similar representative object.

Swap Phase - Possibility (a)
1 What is the effect of swapping $e_i$ and $e_h$?
2 Consider a non-selected object $e_j$ and calculate its contribution $C_{jih}$ to the swap.
3 If $e_j$ is more distant from both $e_i$ and $e_h$ than from one of the other representatives, e.g. $e_k$, then $C_{jih} = 0$.
4 In that case $e_j$ belongs to object $e_k$, sometimes referred to as the medoid $m_k$, and the swap does not change the quality of the clustering.
5 Remember: positive contributions decrease the quality of the clustering.

Swap Phase - Possibility (a)
1 Object $e_j$ belongs to medoid $e_k$ ($i \ne k$).
2 If $e_i$ is replaced by $e_h$ and $e_j$ is still closer to $e_k$, then $C_{jih} = 0$.
(figure: $e_j$ close to $e_k$, with $e_i$ and $e_h$ further away)

Swap Phase - Possibility (b)
1 If $e_j$ is not further from $e_i$ than from any other representative ($d(e_i, e_j) = D_j$), two situations must be considered.
2 If $e_j$ is closer to $e_h$ than to the second-closest representative $e_n$, i.e. $d(e_j, e_h) < d(e_j, e_n)$, then $C_{jih} = d(e_j, e_h) - d(e_j, e_i)$.
3 The contribution $C_{jih}$ can be either positive or negative.
4 If element $e_j$ is closer to $e_i$ than to $e_h$, the contribution is positive and the swap is not favourable.

Swap Phase - Possibility (b.1+)
1 Object $e_j$ belongs to medoid $e_i$.
2 If $e_i$ is replaced by $e_h$ and $e_j$ is closer to $e_i$ than to $e_h$, the contribution is positive: $C_{jih} > 0$.
(figure: $e_j$ close to $e_i$, with $e_h$ further away)

Swap Phase - Possibility (b.1-)
1 Object $e_j$ belongs to medoid $e_i$.
2 If $e_i$ is replaced by $e_h$ and $e_j$ is closer to $e_h$ than to $e_i$, the contribution is negative: $C_{jih} < 0$.
(figure: $e_j$ close to $e_h$, with $e_i$ further away)

Swap Phase - Possibility (b.2)
1 If $e_j$ is at least as distant from $e_h$ as from the second-closest representative, $d(e_j, e_h) \ge d(e_j, e_n)$, then $C_{jih} = d(e_j, e_n) - d(e_j, e_i)$.
2 The contribution is always positive, because it is never advantageous to replace $e_i$ by an $e_h$ that is further away from $e_j$ than the second-closest representative object.

Swap Phase - Possibility (b.2)
1 Object $e_j$ belongs to medoid $e_i$.
2 If $e_i$ is replaced by $e_h$ and $e_j$ is further from $e_h$ than from $e_n$, the contribution is always positive: $C_{jih} > 0$.
(figure: $e_j$ between $e_i$ and $e_n$, with $e_h$ far away)

Swap Phase - Possibility (c)
1 If $e_j$ is more distant from $e_i$ than from at least one of the other representative objects ($e_n$), but closer to $e_h$ than to any representative object, then $C_{jih} = d(e_j, e_h) - d(e_j, e_n)$.
(figure: $e_j$ close to $e_h$, with $e_i$ and $e_n$ further away)
A sketch of the complete swap phase follows below.
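
Summing the contributions $C_{jih}$ over all non-selected $e_j$ equals the change in the objective function caused by the swap, which gives a compact sketch of the swap phase (names illustrative, same conventions as pam_build):

```python
import numpy as np

def swap_cost(D, medoids, i, h):
    """Total cost of replacing medoid i by non-medoid h (negative = improvement)."""
    new_medoids = [m for m in medoids if m != i] + [h]
    # Objective: each object contributes its dissimilarity to the closest medoid.
    before = D[:, medoids].min(axis=1).sum()
    after = D[:, new_medoids].min(axis=1).sum()
    return after - before

def pam_swap(D, medoids):
    """Swap phase: keep applying improving swaps until none remains."""
    medoids = list(medoids)
    improved = True
    while improved:
        improved = False
        for i in list(medoids):
            for h in range(D.shape[0]):
                if h not in medoids and swap_cost(D, medoids, i, h) < 0:
                    medoids[medoids.index(i)] = h   # accept the improving swap
                    improved = True
                    break
            if improved:
                break
    return medoids
```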

Comparisons
1 K-medoids is more robust than k-means in the presence of noise and outliers.
2 K-means is less costly in terms of processing time.

The End