Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining




Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8, by Tan, Steinbach, Kumar

What is Cluster Analysis?
    Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
    Intra-cluster distances are minimized.
    Inter-cluster distances are maximized.
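The two distance criteria above can be made concrete with a small sketch. The toy points, the function names, and the choice of Euclidean distance averaged over point pairs are assumptions made for illustration; the notes do not fix a particular measure.

```python
from itertools import combinations
from math import dist

def mean_intra_distance(clusters):
    """Average Euclidean distance between pairs of points in the same cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)

def mean_inter_distance(clusters):
    """Average Euclidean distance between pairs of points in different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(pairs) / len(pairs)

# Hypothetical toy data: two tight groups that lie far apart.
clusters = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
            [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(intra < inter)  # a good grouping keeps intra small relative to inter
```

A clustering that mixed points from the two groups would raise the intra-cluster average and lower the inter-cluster average.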

Applications of Cluster Analysis
    Understanding
        Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations.

        Discovered Clusters and their Industry Group:
        Cluster 1 (Industry Group: Technology1-DOWN)
            Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN
        Cluster 2 (Industry Group: Technology2-DOWN)
            Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
        Cluster 3 (Industry Group: Financial-DOWN)
            Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN
        Cluster 4 (Industry Group: Oil-UP)
            Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP
    Summarization
        Reduce the size of large data sets.
        [Figure: clustering precipitation in Australia]

What is not Cluster Analysis?
    Supervised classification
        Have class label information
    Simple segmentation
        Dividing students into different registration groups alphabetically, by last name
    Results of a query
        Groupings are a result of an external specification
    Graph partitioning
        Some mutual relevance and synergy, but areas are not identical

Notion of a Cluster can be Ambiguous
    How many clusters?
    [Figure panels: Six Clusters, Two Clusters, Four Clusters]

Types of Clusterings
    A clustering is a set of clusters.
    There is an important distinction between hierarchical and partitional sets of clusters.
    Partitional clustering
        A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
    Hierarchical clustering
        A set of nested clusters organized as a hierarchical tree

Partitional Clustering
    [Figure panels: Original Points, A Partitional Clustering]

Hierarchical Clustering
    [Figure panels: Hierarchical Clustering of points p1, p2, p3, p4; Dendrogram]

Hierarchical Clustering
    [Figure: Dendrogram]
    Source: http://cs.jhu.edu/~razvanm/fs-expedition/tux3.html

Hierarchical Clustering
    Example: clustering of file systems from Linux Kernel 2.6.29 + tux3, based on shared external symbols (Hamming distance)
    Source: http://cs.jhu.edu/~razvanm/fs-expedition/tux3.html

Other Distinctions Between Sets of Clusters
    Exclusive versus non-exclusive
        In non-exclusive clusterings, points may belong to multiple clusters.
        Can represent multiple classes or border points
    Fuzzy versus non-fuzzy
        In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1
        Weights must sum to 1
        Probabilistic clustering has similar characteristics
    Partial versus complete
        In some cases, we only want to cluster some of the data
    Heterogeneous versus homogeneous
        Clusters of widely different sizes, shapes, and densities
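The fuzzy-membership constraint (every weight between 0 and 1, weights summing to 1) can be illustrated with a small sketch. Normalized inverse distance to each cluster center is an assumption chosen here for simplicity; it is not a weighting scheme given in the notes, and the function name and data are hypothetical.

```python
from math import dist

def fuzzy_weights(point, centers, eps=1e-12):
    """Membership weight of `point` in each cluster.

    Assumption for illustration: the weight is proportional to the inverse
    distance to the cluster center, then normalized so the weights sum to 1.
    `eps` avoids division by zero when the point sits exactly on a center.
    """
    inverse = [1.0 / (dist(point, c) + eps) for c in centers]
    total = sum(inverse)
    return [w / total for w in inverse]

# Hypothetical cluster centers and a point that lies closer to the first one.
centers = [(0.0, 0.0), (4.0, 0.0)]
weights = fuzzy_weights((1.0, 0.0), centers)
print(weights)
```

The point belongs to both clusters at once, with the larger weight going to the nearer center.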

Types of Clusters
    Well-separated clusters
    Center-based clusters
    Contiguous clusters
    Density-based clusters
    Property or Conceptual
    Described by an Objective Function

Types of Clusters: Well-Separated
    Well-separated clusters
        A cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
    [Figure: 3 well-separated clusters]
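The well-separated definition can be checked directly on a small data set. The sketch below assumes Euclidean distance and uses a hypothetical helper name; it compares, for each point, the farthest point in its own cluster with the nearest point outside it.

```python
from math import dist

def is_well_separated(clusters):
    """True if every point is closer to all points in its own cluster
    than to any point in another cluster (Euclidean distance assumed)."""
    for i, cluster in enumerate(clusters):
        outside = [q for j, c in enumerate(clusters) if j != i for q in c]
        for p in cluster:
            # Distance to p itself is 0, so including it does not change the max.
            farthest_inside = max(dist(p, q) for q in cluster)
            nearest_outside = min((dist(p, q) for q in outside),
                                  default=float("inf"))
            if farthest_inside >= nearest_outside:
                return False
    return True

# Hypothetical examples: the first grouping satisfies the definition,
# the second does not because (5, 0) is nearer to (6, 0) than to (0, 0).
good = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
bad = [[(0.0, 0.0), (5.0, 0.0)], [(6.0, 0.0), (7.0, 0.0)]]
print(is_well_separated(good), is_well_separated(bad))
```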

Types of Clusters: Center-Based
    Center-based clusters
        A cluster is a set of objects such that an object in a cluster is closer (more similar) to the center of its cluster than to the center of any other cluster.
        The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most representative point of a cluster.
    [Figure: 4 center-based clusters]
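A minimal sketch of the two kinds of center follows. The centroid is the coordinate-wise average, as stated above. For the medoid, "most representative" is interpreted here as the data point with the smallest total distance to the other points; that interpretation, the function names, and the data are assumptions for illustration.

```python
from math import dist

def centroid(points):
    """Coordinate-wise average of the points (need not be a data point)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """Data point with the smallest total distance to all other points
    (one common reading of 'most representative'; an assumption here)."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

def nearest_center(point, centers):
    """Index of the center closest to `point` (center-based assignment)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

# Hypothetical one-cluster example with an outlying point at x = 10.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(centroid(points))  # pulled toward the outlying point
print(medoid(points))    # always one of the original points
print(nearest_center((3.0, 0.0), [(0.0, 0.0), (10.0, 0.0)]))
```

The medoid is always an actual data point, which makes it usable when averaging is not meaningful for the attribute types involved.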

Types of Clusters: Contiguity-Based
    Contiguous cluster (nearest neighbor or transitive)
        A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.
    [Figure: 8 contiguous clusters]

Types of Clusters: Density-Based
    Density-based clusters
        A cluster is a dense region of points that is separated from other regions of high density by low-density regions.
        Used when the clusters are irregular or intertwined, and when noise and outliers are present.
    [Figure: 6 density-based clusters]

Types of Clusters: Conceptual Clusters
    Shared property or conceptual clusters
        Finds clusters that share some common property or represent a particular concept.
    [Figure: 2 overlapping circles]

Types of Clusters: Objective Function
    Clusters defined by an objective function
        Finds clusters that minimize or maximize an objective function.
        Enumerate all possible ways of dividing the points into clusters and evaluate the 'goodness' of each potential set of clusters by using the given objective function. (NP-hard)
        Can have global or local objectives.
            Hierarchical clustering algorithms typically have local objectives
            Partitional algorithms typically have global objectives
        A variation of the global objective function approach is to fit the data to a parameterized model.
            Parameters for the model are determined from the data.
            Mixture models assume that the data is a 'mixture' of a number of statistical distributions.
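The exhaustive approach described above can be sketched for a tiny data set. The objective used here, the sum of squared distances from each point to its cluster centroid (SSE), is an assumption chosen for illustration; the notes do not name a specific objective at this point. The function names and data are hypothetical.

```python
from itertools import product

def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        center = tuple(sum(x) / len(members) for x in zip(*members))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center))
                     for p in members)
    return total

def best_partition(points, k):
    """Try every assignment of points to k non-empty clusters.

    There are k ** len(points) candidate labelings, which is why this
    only works for toy inputs.
    """
    best_labels, best_score = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # skip labelings with an empty cluster
            continue
        score = sse(points, labels, k)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score

# Hypothetical toy data: two obvious pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, score = best_partition(points, k=2)
print(labels, score)
```

With only four points there are 16 labelings to score; the count grows exponentially with the number of points, which is the practical reason heuristic algorithms are used instead.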

Types of Clusters: Objective Function
    Map the clustering problem to a different domain and solve a related problem in that domain.
        The proximity matrix defines a weighted graph, where the nodes are the points being clustered and the weighted edges represent the proximities between points.
        Clustering is equivalent to breaking the graph into connected components, one for each cluster.
        Want to minimize the edge weight between clusters and maximize the edge weight within clusters.
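The graph view can be sketched as follows. The rule used here for breaking the graph, dropping every edge whose Euclidean distance exceeds a threshold, is an assumption for illustration and is not an algorithm specified in the notes; `max_edge` and the function name are hypothetical.

```python
from math import dist

def threshold_components(points, max_edge):
    """Label each point with the connected component it falls into after
    removing all edges longer than `max_edge` from the proximity graph."""
    n = len(points)
    neighbors = {
        i: [j for j in range(n)
            if j != i and dist(points[i], points[j]) <= max_edge]
        for i in range(n)
    }
    labels = [None] * n
    current = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        # Depth-first search marks everything reachable from `start`.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in neighbors[node]:
                if labels[nb] is None:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

# Hypothetical data: a chain of three points and a separate pair.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = threshold_components(points, max_edge=1.5)
print(labels)
```

The first and third points end up in the same cluster even though they are not directly connected, because the middle point links them.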

Characteristics of the Input Data Are Important
    Type of proximity or density measure
        This is a derived measure, but central to clustering
    Sparseness
        Dictates type of similarity
        Adds to efficiency
    Attribute type
        Dictates type of similarity
    Type of data
        Dictates type of similarity
        Other characteristics, e.g., autocorrelation
    Dimensionality
    Noise and outliers
    Type of distribution

Clustering Algorithms
    K-means and its variants
    Hierarchical clustering
    Density-based clustering