# Alessandro Laio, Maria d Errico and Alex Rodriguez SISSA (Trieste)

Save this PDF as:

Size: px
Start display at page:

Download "Alessandro Laio, Maria d Errico and Alex Rodriguez SISSA (Trieste)"

## Transcription

1 Clustering by fast search- and- find of density peaks Alessandro Laio, Maria d Errico and Alex Rodriguez SISSA (Trieste)

2 What is a cluster? clus ter [kluhs- ter], noun 1.a number of things of the same kind, growing or held together; a bunch: a cluster of grapes. 2.a group of things or persons close together: There was a cluster of tourists at the gate. 3.U.S. Army. a small metal design placed on a ribbon represen=ng an awarded medal to ind icate that the same medal has been awarded again: oak- leaf cluster. 4.Phone<cs. a succession of two or more con=guous consonants in an u?erance: cluster of strap. 5.Astronomy. a group of neighboring stars, held together by mutual gravita=on, that have essen=ally the same age and composi=on and thus supposedly a common origin.

3 What is a cluster? Visually it is a region with a high density of points. How many clusters do you see? separated from other dense regions Computers easily give the same answer

4 What is a cluster? And now? How many clusters? Computers have big problems here!

5 Inefficiency of standard algorithms K-means algorithm (11748 citations!!!!) It assigns each point to the closest cluster center. Variables: the number of centers and their location By construction it is unable to recognize non-spherical clusters

6 What is a cluster? Clusters = peaks in the density of points = peaks in the mother probability distribu=on

7 Our approach: fast search- and- find of density peaks SCIENCE, 1492, vol 322 (2014) EXAMPLE: Clustering in a 2- dimensional space 1) Compute the local density around each point ρ(1)=7 ρ(8)=5 ρ(10)=4 2) For each point compute the distance with all the points with higher density. Take the minimum value.

8 Our approach: fast search- and- find of density peaks 3) For each point, plot the minimum distance as a func=on of the density. SCIENCE, 1492, vol 322 (2014)

9 Our approach: fast search- and- find of density peaks SCIENCE, 1492, vol 322 (2014) 4) the outliers in this graph are the cluster centers

10 Our approach: fast search- and- find of density peaks 4) ) the outliers in this graph are the cluster centers 5) Assign each point to the same cluster of its nearest neighbor of higher density

11 Not even an algorithm Given i δ a distance matrix i d ij, for each data point i compute: d c α ρ i = j χ (d ij d c ) (number of data points within a distance d c ) δ i = min (d ij ) j:ρ j >ρ i χ (x) =0 d c dc i (distance of the closest data point of higher density) Plot δ as a func=on of ρ {j : ρ j > ρ i } δ One free parameter: 1 the cutoff distance d c ρ i = j d α ij However: the method is only sensi=ve to the rela=ve density of two data points, not to the ρ i δ i i absolute value of ρ ρ i d c α

12 The clustering approach at work

13 The clustering approach at work

14 The clustering approach at work

15 The clustering approach at work

16 The clustering approach at work

17 The clustering approach at work SCIENCE, 1492, vol 322 (2014)

18 Robust with respect to changes in the metric SCIENCE, 1492, vol 322 (2014)

19 What about noise? The true density 3000 points sampled from this density The density reconstructed from the 3000 points (Gaussian es=mator)

20 What about noise? Density maxima )( )( )( For each maximum, find the highest saddle point towards a higher peak )( How can we dis=nguish genuine density peaks from noise? Build a tree- like structure, where each peak is a leaf, and nodes are assigned according to the density values at the saddles

21 What about noise? Density maxima )( )( )( For each maximum, find the highest saddle point towards a highest peak ç )( Build a tree- like structure, where each peak is a leaf, and nodes are assigned according to the density values at the saddles

22

23 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set:

24 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set: classification of living organisms marketing strategies libraries (book sorting) google search even FACE recognition!!!

25 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set: classification of living organisms marketing strategies libraries (book sorting) google search even FACE recognition!!!

26 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set: classification of living organisms marketing strategies libraries (book sorting) google search even FACE recognition!!!

27 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set: classification of living organisms marketing strategies libraries (book sorting) google search even FACE recognition!!!

28 Possible applications This algorithm can be applied any time it is possible to define a DISTANCE function between each pair of elements in a set: classification of living organisms marketing strategies libraries (book sorting) google search even FACE recognition!!!

29 Clustering Algorithm applied to faces

30 Clustering Algorithm applied to faces It is possible to define a distance between faces, based on some stable features. *Sampat, M. P., Wang, Z., Gupta, S., Bovik, A. C., & Markey, M. K. (2009). Complex wavelet structural similarity: A new image similarity index. Image Processing, IEEE Transac<ons on, 18(11),

31 DECISION GRAPH

32 DECISION GRAPH

33 The clustering approach at work: Analysis 1 of Introduction a fmri experiment (D. Ama\, M. Maieron, F. Pizzagalli) T Outcome of a fmri experiment: signal intensity for ~100,000 v i (!) = X 1 v i (t) e i! T t voxels covering densely the brain. t=0 The signal is measured every ~2 seconds for a total =me of a few minutes. Intensity Voxel index: i=1,2,,# of voxels P (!) = 1 N v i (t) NX v i (!) vi (!) General idea: if the subject is performing a task, the voxels in Using the transformed voxel signal u the brain region involved in this i (k) we then defined the distance between task must have a similar v(t). voxel i and j as follows: We look for large and connected d ij = regions with voxels with a! similar v(t), namely with a similar =me evolu=on. Similarity measure: esult). i=1 ṽ i (!) =v i (!) / p P (!) u i (!) =ṽ i (!) / max kṽ i (!)k! s X ku i (!) u j (!)k 2 (1) v ux d ij = t T (v i (t) v j (t)) 2 (2) t=1

34 The clustering approach at work: Analysis of a fmri experiment (D. Ama\, M. Maieron, F. Pizzagalli) The subject was scanned while moving the right or lej hand. They saw the words "move lej", "move right" or "stop" in a random fashion through the glasses HAND MOVEMENT dx sx 3T Achieva Philips T2* BOLD sensi=ve gradient- recalled EPI sequence standard Head Coil 8 channels TR/TE = 2500/32 ms matrix 128X128, in- plane resolu=on 1.8 X 1.8 #slices 34, thickness = 3mm, no gap 102 scans

35 The clustering approach at work: Analysis of a fmri experiment (D. Ama\, M. Maieron, F. Pizzagalli) Time window 24-36: decision graph Time window 24-36: clusters Overlap between the cluster of all the =me windows

36 The clustering approach at work: molecular dynamics 3000 ns of molecular dynamics of 3- Ala in water solu=on, at 300 K 7 slow relaxa=on =mes

37 The clustering approach at work: molecular dynamics

38 The clustering approach at work: molecular dynamics Dihedral distance RMSD SCIENCE, 1492, vol 322 (2014)

39 The clustering approach at work: Test on molecular dynamics Dihedral distance RMSD SCIENCE, 1492, vol 322 (2014)

40 Conclusions The approach allows detec=ng non- spherical clusters It allows detec=ng clusters with different densi=es Robust with respect to changes in the metric No op=miza=on, no varia=onal parameters: clusters are found determinis=cally from the data. The number of clusters is found automa=cally Outliers and background noise are automa=cally recognized and excluded from the analysis. Alex Rodriguez Maria d Errico Thank- you: Daniele Ama=, Erio Tosat, Jessica Nasica, Francesca Rizzato

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototype-based Fuzzy c-means Mixture Model Clustering Density-based

### Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

### Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC

Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

### ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION

ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION CSE 537 Ar@ficial Intelligence Professor Anita Wasilewska GROUP 2 TEAM MEMBERS: SAEED BOOR BOOR - 110564337 SHIH- YU TSAI - 110385129 HAN LI 110168054 SOURCES

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### CLUSTERING (Segmentation)

CLUSTERING (Segmentation) Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 What is Clustering? Given a set of records, organize the records into

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Layout Based Visualization Techniques for Multi Dimensional Data

Layout Based Visualization Techniques for Multi Dimensional Data Wim de Leeuw Robert van Liere Center for Mathematics and Computer Science, CWI Amsterdam, the Netherlands wimc,robertl @cwi.nl October 27,

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

### GE Medical Systems Training in Partnership. Module 8: IQ: Acquisition Time

Module 8: IQ: Acquisition Time IQ : Acquisition Time Objectives...Describe types of data acquisition modes....compute acquisition times for 2D and 3D scans. 2D Acquisitions The 2D mode acquires and reconstructs

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster

### Hierarchical Clustering. Clustering Overview

lustering Overview Last lecture What is clustering Partitional algorithms: K-means Today s lecture Hierarchical algorithms ensity-based algorithms: SN Techniques for clustering large databases IRH UR ata

### Distance based clustering

// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means

### Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### A New Robust Algorithm for Video Text Extraction

A New Robust Algorithm for Video Text Extraction Pattern Recognition, vol. 36, no. 6, June 2003 Edward K. Wong and Minya Chen School of Electrical Engineering and Computer Science Kyungpook National Univ.

### SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

### Clustering UE 141 Spring 2013

Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

### Clustering & Association

Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

### Segmentation & Clustering

EECS 442 Computer vision Segmentation & Clustering Segmentation in human vision K-mean clustering Mean-shift Graph-cut Reading: Chapters 14 [FP] Some slides of this lectures are courtesy of prof F. Li,

### Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

### Ensemble Methods. Adapted from slides by Todd Holloway h8p://abeau<fulwww.com/2007/11/23/ ensemble- machine- learning- tutorial/

Ensemble Methods Adapted from slides by Todd Holloway h8p://abeau

### Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

### Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

### Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

### Character Image Patterns as Big Data

22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

### Alignment and Preprocessing for Data Analysis

Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise

### Drug Store Sales Prediction

Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,

### Automate the Process of Image Recognizing a Scatter Plot: an Application of Non-parametric Cluster Analysis in Capturing Data from Graphical Output

PharmaSUG 2014 - Paper DG12 Automate the Process of Image Recognizing a Scatter Plot: an Application of Non-parametric Cluster Analysis in Capturing Data from Graphical Output Zhaojie Wang, Celgene Corporation,

### Face Recognition using SIFT Features

Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

### R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions

### Efficient visual search of local features. Cordelia Schmid

Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images

### L15: statistical clustering

Similarity measures Criterion functions Cluster validity Flat clustering algorithms k-means ISODATA L15: statistical clustering Hierarchical clustering algorithms Divisive Agglomerative CSCE 666 Pattern

### Virtual Landmarks for the Internet

Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### - What is a feature? - Image processing essentials - Edge detection (Sobel & Canny) - Hough transform - Some images

Seminar: Feature extraction by André Aichert I Feature detection - What is a feature? - Image processing essentials - Edge detection (Sobel & Canny) - Hough transform - Some images II An Entropy-based

### Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

### A K-means-like Algorithm for K-medoids Clustering and Its Performance

A K-means-like Algorithm for K-medoids Clustering and Its Performance Hae-Sang Park*, Jong-Seok Lee and Chi-Hyuck Jun Department of Industrial and Management Engineering, POSTECH San 31 Hyoja-dong, Pohang

### Fast Matching of Binary Features

Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

### Structural Matching of 2D Electrophoresis Gels using Graph Models

Structural Matching of 2D Electrophoresis Gels using Graph Models Alexandre Noma 1, Alvaro Pardo 2, Roberto M. Cesar-Jr 1 1 IME-USP, Department of Computer Science, University of São Paulo, Brazil 2 DIE,

### A comparison of various clustering methods and algorithms in data mining

Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

### Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

### Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

### CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

### Musculoskeletal MRI Technical Considerations

Musculoskeletal MRI Technical Considerations Garry E. Gold, M.D. Professor of Radiology, Bioengineering and Orthopaedic Surgery Stanford University Outline Joint Structure Image Contrast Protocols: 3.0T

### MHI3000 Big Data Analytics for Health Care Final Project Report

MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given

### K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar

### Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.

Sept 03-23-05 22 2005 Data Mining for Model Creation Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.com page 1 Agenda Data Mining and Estimating Model Creation

### Application of Face Recognition to Person Matching in Trains

Application of Face Recognition to Person Matching in Trains May 2008 Objective Matching of person Context : in trains Using face recognition and face detection algorithms With a video-surveillance camera

### Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction

Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction Oliver Sutton February, 2012 Contents 1 Introduction 1 1.1 Example........................................

### Introduction to knn Classification and CNN Data Reduction

Introduction to knn Classification and CNN Data Reduction Oliver Sutton February, 2012 1 / 29 1 The Classification Problem Examples The Problem 2 The k Nearest Neighbours Algorithm The Nearest Neighbours

### Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

### Part-Based Recognition

Part-Based Recognition Benedict Brown CS597D, Fall 2003 Princeton University CS 597D, Part-Based Recognition p. 1/32 Introduction Many objects are made up of parts It s presumably easier to identify simple

### Nodes, Ties and Influence

Nodes, Ties and Influence Chapter 2 Chapter 2, Community Detec:on and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010. 1 IMPORTANCE OF NODES 2 Importance of Nodes Not

### A discussion of Statistical Mechanics of Complex Networks P. Part I

A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### Cluster Algorithms. Adriano Cruz adriano@nce.ufrj.br. 28 de outubro de 2013

Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz adriano@nce.ufrj.br

### Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

### A Comparison of Five Methods for Signal Intensity Standardization in MRI

A Comparison of Five Methods for Signal Intensity Standardization in MRI Jan-Philip Bergeest, Florian Jäger Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg jan.p.bergeest@informatik.stud.uni-erlangen.de

### Data visualization and clustering. Genomics is to no small extend a data science

Data visualization and clustering Genomics is to no small extend a data science [www.data2discovery.org] Data visualization and clustering Genomics is to no small extend a data science [Andersson et al.,

### Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

### Standardization and Its Effects on K-Means Clustering Algorithm

Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

### Unsupervised learning: Clustering

Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

### Classification/Decision Trees (II)

Classification/Decision Trees (II) Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Right Sized Trees Let the expected misclassification rate of a tree T be R (T ).

### Lecture 4 Online and streaming algorithms for clustering

CSE 291: Geometric algorithms Spring 2013 Lecture 4 Online and streaming algorithms for clustering 4.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

### Biometric Authentication using Online Signatures

Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,

### There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:

Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are

### Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between

### Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

### Randomized Trees for Real-Time Keypoint Recognition

Randomized Trees for Real-Time Keypoint Recognition Vincent Lepetit Pascal Lagger Pascal Fua Computer Vision Laboratory École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland Email:

### STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

Stock Asian-African Market Trends Journal using of Economics Cluster Analysis and Econometrics, and ARIMA Model Vol. 13, No. 2, 2013: 303-308 303 STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

### Rapid Image Retrieval for Mobile Location Recognition

Rapid Image Retrieval for Mobile Location Recognition G. Schroth, A. Al-Nuaimi, R. Huitl, F. Schweiger, E. Steinbach Availability of GPS is limited to outdoor scenarios with few obstacles Low signal reception

### Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

### Clustering & Visualization

Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

### Data Mining and Visualization

Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research

### Big Data Analytics CSCI 4030

High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

### Food brand image (Logos) recognition

Food brand image (Logos) recognition Ritobrata Sur(rsur@stanford.edu), Shengkai Wang (sk.wang@stanford.edu) Mentor: Hui Chao (huichao@qti.qualcomm.com) Final Report, March 19, 2014. 1. Introduction Food

### Cluster Analysis: Basic Concepts and Algorithms

8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should

### Automated Stellar Classification for Large Surveys with EKF and RBF Neural Networks

Chin. J. Astron. Astrophys. Vol. 5 (2005), No. 2, 203 210 (http:/www.chjaa.org) Chinese Journal of Astronomy and Astrophysics Automated Stellar Classification for Large Surveys with EKF and RBF Neural

### SITE IMAGING MANUAL ACRIN 6698

SITE IMAGING MANUAL ACRIN 6698 Diffusion Weighted MR Imaging Biomarkers for Assessment of Breast Cancer Response to Neoadjuvant Treatment: A sub-study of the I-SPY 2 TRIAL Version: 1.0 Date: May 28, 2012

### Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

### Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin

Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class