# Chapter ML:XI (continued)

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis ML:XI-11 Cluster Analysis STEIN

2 Cluster analysis is the unsupervised classification of a set of objects in groups, pursuing the following objectives: 1. maximize the similarities within the groups (intra groups) 2. minimize the similarities between the groups (inter groups) ML:XI-12 Cluster Analysis STEIN

3 Cluster analysis is the unsupervised classification of a set of objects in groups, pursuing the following objectives: 1. maximize the similarities within the groups (intra groups) 2. minimize the similarities between the groups (inter groups) Applications: identification of similar groups of buyers higher-level image processing: object recognition search of similar gene profiles specification of syndromes analysis of traffic data in computer networks visualization of complex graphs text categorization in information retrieval ML:XI-13 Cluster Analysis STEIN

4 Remarks: The setting of a cluster analysis is reverse to the setting of a variance analysis: A variance analysis verifies whether a nominal feature defines groups such that the members of the different groups differ significantly with regard to a numerical feature. I.e., the nominal feature is in the role of the independent variable, while the numerical feature(s) is (are) in role of dependent variable(s). Example: The type of a product packaging (the independent variable) may define the number of customers (the dependent variable) in a supermarket who look at the product. A cluster analysis in turn can be used to identify such a nominal feature, namely by constructing a suited feature domain for the nominal variable: each cluster corresponds implicitly to a value of the domain. Example: Equivalent but differently presented products in a supermarket are clustered (= the impact of product packaging is identified) with regard to the number of customers who buy the products. Cluster analysis is a tool for structure generation. Nearly nothing is known about the nominal variable that is to be identified. In particular, there is no knowledge about the number of domain values (the number of clusters). Variance analysis is a tool for structure verification. ML:XI-14 Cluster Analysis STEIN

5 Let x 1,... x n denote the p-dimensional feature vectors of n objects: Feature 1 Feature 2... Feature p x 1 x 11 x x 1p x 2 x 21 x x 2p x n x n1 x n2... x np ML:XI-15 Cluster Analysis STEIN

6 Let x 1,... x n denote the p-dimensional feature vectors of n objects: Feature 1 Feature 2... Feature p x 1 x 11 x x 1p x 2 x 21 x x 2p x n x n1 x n2... x np no Target concept c 1 c 2. c n ML:XI-16 Cluster Analysis STEIN

7 Let x 1,... x n denote the p-dimensional feature vectors of n objects: Feature 1 Feature 2... Feature p x 1 x 11 x x 1p x 2 x 21 x x 2p x n x n1 x n2... x np no Target concept c 1 c 2. c n 30 two-dimensional feature vectors (n = 30, p = 2) : Feature 2 Feature 1 ML:XI-17 Cluster Analysis STEIN

8 Definition 3 (Exclusive Clustering [splitting]) Let X be a set of feature vectors. An exclusive clustering C of X, C = {C 1, C 2,..., C k }, C i X, is a partitioning of X into non-empty, mutually exclusive subsets C i with C i C C i = X. ML:XI-18 Cluster Analysis STEIN

9 Definition 3 (Exclusive Clustering [splitting]) Let X be a set of feature vectors. An exclusive clustering C of X, C = {C 1, C 2,..., C k }, C i X, is a partitioning of X into non-empty, mutually exclusive subsets C i with C i C C i = X. Algorithms for cluster analysis are unsupervised learning methods: the learning process is self-organized there is no (external) teacher the optimization criterion is task- and domain-independent ML:XI-19 Cluster Analysis STEIN

10 Definition 3 (Exclusive Clustering [splitting]) Let X be a set of feature vectors. An exclusive clustering C of X, C = {C 1, C 2,..., C k }, C i X, is a partitioning of X into non-empty, mutually exclusive subsets C i with C i C C i = X. Algorithms for cluster analysis are unsupervised learning methods: the learning process is self-organized there is no (external) teacher the optimization criterion is task- and domain-independent Supervised learning: a learning objective such as the target concept is provided the optimization criterion depends on the task or the domain information is provided about how the optimization criterion can be maximized. Keyword: instructive feedback ML:XI-20 Cluster Analysis STEIN

11 Main Stages of a Cluster Analysis Objects Feature extraction & -preprocessing Similarity computation Merging Clusters ML:XI-21 Cluster Analysis STEIN

12 Feature Extraction and Preprocessing Required are (possibly new) features of high variance. Approaches: analysis of dispersion parameters dimension reduction: PCA, factor analysis, MDS visual inspection: scatter plots, box plots [Webis 2012, VDM tool] ML:XI-22 Cluster Analysis STEIN

13 Feature Extraction and Preprocessing Required are (possibly new) features of high variance. Approaches: analysis of dispersion parameters dimension reduction: PCA, factor analysis, MDS visual inspection: scatter plots, box plots Feature standardization can dampen the structure and make things worse: ML:XI-23 Cluster Analysis STEIN

14 Computation of Distances or Similarities Feature 1 Feature 2... Feature p x 1 x 11 x x 1p x 2 x 21 x x 2p x n x n1 x n2... x np x 1 x 2... x n x 1 0 d(x 1, x 2 )... d(x 1, x n ) x d(x 2, x n ). x n ML:XI-24 Cluster Analysis STEIN

15 Remarks: Usually, the distance matrix is defined implicitly by a metric on the feature space. The distance matrix can be understood as the adjacency matrix of a weighted, undirected graph G, G = V, E, w. The set X of feature vectors is mapped one-to-one (bijection) onto a set of nodes V. The distance d(x i, x j ) corresponds to the weight w({u, v}) of edge {u, v} E between those nodes u and v that are associated with x i and x j respectively. ML:XI-25 Cluster Analysis STEIN

16 Computation of Distances or Similarities (continued) Properties of a distance function: 1. d(x 1, x 2 ) 0 2. d(x 1, x 1 ) = 0 3. d(x 1, x 2 ) = d(x 2, x 1 ) 4. d(x 1, x 3 ) d(x 1, x 2 ) + d(x 2, x 3 ) ML:XI-26 Cluster Analysis STEIN

17 Computation of Distances or Similarities (continued) Properties of a distance function: 1. d(x 1, x 2 ) 0 2. d(x 1, x 1 ) = 0 3. d(x 1, x 2 ) = d(x 2, x 1 ) 4. d(x 1, x 3 ) d(x 1, x 2 ) + d(x 2, x 3 ) Minkowsky metric for features with interval-based measurement scales: where d(x 1, x 2 ) = ( p ) 1/r x 1i x 2i r i=1 r = 1. Manhattan or Hamming distance, L 1 norm r = 2. Euclidean distance, L 2 norm r =. Maximum distance, L norm or L max norm ML:XI-27 Cluster Analysis STEIN

18 Computation of Distances or Similarities (continued) Cluster analysis does not presume a particular measurement scale. Generalization of the distance function towards a (dis)similarity function by omitting the triangle inequality. (Dis)similarities can be quantified between all kinds of features irrespective of the given levels of measurement. ML:XI-28 Cluster Analysis STEIN

19 Computation of Distances or Similarities (continued) Cluster analysis does not presume a particular measurement scale. Generalization of the distance function towards a (dis)similarity function by omitting the triangle inequality. (Dis)similarities can be quantified between all kinds of features irrespective of the given levels of measurement. Similarity coefficients given two feature vectors, x 1, x 2, with binary features: Simple Matching Coefficient (SMC) = f 11 + f 00 f 11 + f 00 + f 01 + f 10 Jaccard Coefficient (J) = f 11 f 11 + f 01 + f 10 where f 11 = number of features with a value of 1 in both x 1 and x 2 f 00 = number of features with a value of 0 in both x 1 and x 2 f 01 = number of features with value 0 in x 1 and value 1 in x 2 f 10 = number of features with value 1 in x 1 and value 0 in x 2 ML:XI-29 Cluster Analysis STEIN

20 Remarks: The definitions for the above similarity coefficients can be extended towards features with a nominal measurement scale. Particular heterogeneous metrics have been developed, such as HEOM and HVDM, which allow the combined computation of feature values from different measurement scales. The computation of the correlation between all features of two feature vectors (not: between two features over all feature vectors) allows to compare feature profiles. Example: Q correlation coefficient The development of a suited, realistic, and expressive similarity measure may pose the biggest challenge within a cluster analysis tasks. Typical problems: (unwanted) structure damping due to normalization (unwanted) sensitivity concerning outliers (unrecognized) feature correlations (neglected) varying feature importance Similarity measures can be transformed straightforwardly into dissimilarity measures and vice versa. ML:XI-30 Cluster Analysis STEIN

21 Merging Principles [Analysis stages] hierarchical iterative agglomerative divisive exemplar-based exchange-based single link, group average MinCut k-means, k-medoid Kerninghan-Lin Cluster analysis density-based point-density-based attraction-based DBSCAN MajorClust meta-searchcontrolled gradient-based competitive simulated annealing evolutionary strategies stochastic Gaussian mixtures... ML:XI-31 Cluster Analysis STEIN

### DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

### Chapter 7. Cluster Analysis

Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based

### Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

### Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

### Chapter ML:XI. XI. Cluster Analysis

Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### Unsupervised learning: Clustering

Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

### Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### CLUSTERING (Segmentation)

CLUSTERING (Segmentation) Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 What is Clustering? Given a set of records, organize the records into

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

### An Introduction to Cluster Analysis for Data Mining

An Introduction to Cluster Analysis for Data Mining 10/02/2000 11:42 AM 1. INTRODUCTION... 4 1.1. Scope of This Paper... 4 1.2. What Cluster Analysis Is... 4 1.3. What Cluster Analysis Is Not... 5 2. OVERVIEW...

### Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### L15: statistical clustering

Similarity measures Criterion functions Cluster validity Flat clustering algorithms k-means ISODATA L15: statistical clustering Hierarchical clustering algorithms Divisive Agglomerative CSCE 666 Pattern

### A comparison of various clustering methods and algorithms in data mining

Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

### Data Mining Techniques Chapter 11: Automatic Cluster Detection

Data Mining Techniques Chapter 11: Automatic Cluster Detection Clustering............................................................. 2 k-means Clustering......................................................

### Clustering & Association

Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

### . Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

### A Micro-course on Cluster Analysis Techniques

Università di Trento DiSCoF Dipartimento di Scienze della Cognizione e Formazione A Micro-course on Cluster Analysis Techniques luigi.lombardi@unitn.it Methodological course (COBRAS-DiSCoF) Finding groups

### Clustering Lecture 1: Basics

Clustering Lecture 1: Basics Jing Gao SUNY Buffalo 1 Topics Clustering, Classification Network mining Anomaly detection Expectation Class Structure Sign-in Take quiz in class Two more projects on clustering

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

### Data Warehousing and Data Mining

Data Warehousing and Data Mining Lecture Clustering Methods CITS CITS Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement: The

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

### Data Mining 5. Cluster Analysis

Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

### Clustering UE 141 Spring 2013

Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

### An Enhanced Clustering Algorithm to Analyze Spatial Data

International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-2, Issue-7, July 2014 An Enhanced Clustering Algorithm to Analyze Spatial Data Dr. Mahesh Kumar, Mr. Sachin Yadav

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Layout Based Visualization Techniques for Multi Dimensional Data

Layout Based Visualization Techniques for Multi Dimensional Data Wim de Leeuw Robert van Liere Center for Mathematics and Computer Science, CWI Amsterdam, the Netherlands wimc,robertl @cwi.nl October 27,

### ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

### Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

### Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand

### SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

### Text Clustering. Clustering

Text Clustering 1 Clustering Partition unlabeled examples into disoint subsets of clusters, such that: Examples within a cluster are very similar Examples in different clusters are very different Discover

### Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

### City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Fundamentals of Data Science Course Code:

### Statistical Databases and Registers with some datamining

Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

### Feature Extraction and Selection. More Info == Better Performance? Curse of Dimensionality. Feature Space. High-Dimensional Spaces

More Info == Better Performance? Feature Extraction and Selection APR Course, Delft, The Netherlands Marco Loog Feature Space Curse of Dimensionality A p-dimensional space, in which each dimension is a

### Data Clustering Techniques Qualifying Oral Examination Paper

Data Clustering Techniques Qualifying Oral Examination Paper Periklis Andritsos University of Toronto Department of Computer Science periklis@cs.toronto.edu March 11, 2002 1 Introduction During a cholera

### Chapter ML:XI (continued)

Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototype-based Fuzzy c-means Mixture Model Clustering Density-based

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

### Data Mining Clustering. Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering Toon Calders Sheets are based on the those provided b Tan, Steinbach, and Kumar. Introduction to Data Mining What is Cluster Analsis? Finding groups of objects such that the objects

Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Multidimensional data analysis

Multidimensional data analysis Ella Bingham Dept of Computer Science, University of Helsinki ella.bingham@cs.helsinki.fi June 2008 The Finnish Graduate School in Astronomy and Space Physics Summer School

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### Data Mining for Knowledge Management. Clustering

Data Mining for Knowledge Management Clustering Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management Thanks for slides to: Jiawei Han Eamonn Keogh Jeff

### Distance based clustering

// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means

### Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

### A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

### Mixture Models. Jia Li. Department of Statistics The Pennsylvania State University. Mixture Models

Mixture Models Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based

### IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 7, JULY 2009 1181

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 7, JULY 2009 1181 The Global Kernel k-means Algorithm for Clustering in Feature Space Grigorios F. Tzortzis and Aristidis C. Likas, Senior Member, IEEE

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster

### Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig

### Forschungskolleg Data Analytics Methods and Techniques

Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### 2 Basic Concepts and Techniques of Cluster Analysis

The Challenges of Clustering High Dimensional Data * Michael Steinbach, Levent Ertöz, and Vipin Kumar Abstract Cluster analysis divides data into groups (clusters) for the purposes of summarization or

### Data visualization and clustering. Genomics is to no small extend a data science

Data visualization and clustering Genomics is to no small extend a data science [www.data2discovery.org] Data visualization and clustering Genomics is to no small extend a data science [Andersson et al.,

### Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

### Cluster Analysis: Basic Concepts and Algorithms

Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths

### K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar

### HES-SO Master of Science in Engineering. Clustering. Prof. Laura Elena Raileanu HES-SO Yverdon-les-Bains (HEIG-VD)

HES-SO Master of Science in Engineering Clustering Prof. Laura Elena Raileanu HES-SO Yverdon-les-Bains (HEIG-VD) Plan Motivation Hierarchical Clustering K-Means Clustering 2 Problem Setup Arrange items

### Cluster Analysis: Basic Concepts and Algorithms

8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should

### ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of

### Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

### A Binary Method for Fast Computation of Inter and Intra Cluster Similarities for Combining Multiple Clusterings

A Binary Method for Fast Computation of Inter and Intra Cluster Similarities for Combining Multiple Clusterings Selim Mimaroglu and A. Murat Yagci Department of Computer Engineering Bahcesehir University

### Cluster analysis Cosmin Lazar. COMO Lab VUB

Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,

### B490 Mining the Big Data. 2 Clustering

B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Unsupervised and supervised dimension reduction: Algorithms and connections

Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

### Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization Clustering Formulation Objects.................................

### Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

### Cluster Analysis: Basic Concepts and Methods

10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

### Face Recognition using SIFT Features

Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

### An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

### Chapter 15 CLUSTERING METHODS. 1. Introduction. Lior Rokach. Oded Maimon

Chapter 15 CLUSTERING METHODS Lior Rokach Department of Industrial Engineering Tel-Aviv University liorr@eng.tau.ac.il Oded Maimon Department of Industrial Engineering Tel-Aviv University maimon@eng.tau.ac.il

### COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

### Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

### Linköpings Universitet - ITN TNM033 2011-11-30 DBSCAN. A Density-Based Spatial Clustering of Application with Noise

DBSCAN A Density-Based Spatial Clustering of Application with Noise Henrik Bäcklund (henba892), Anders Hedblom (andh893), Niklas Neijman (nikne866) 1 1. Introduction Today data is received automatically

### Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

### SoSe 2014: M-TANI: Big Data Analytics

SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering

### Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

### Virtual Landmarks for the Internet

Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser

### Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

### PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin