Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering


1 Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining: Clustering
2 Clustering Algorithms: Contents. K-means; hierarchical algorithms; linkage functions; vector quantization.
3 Clustering: Formulation. Find groups of similar points (observations) in multidimensional space. No target variable (unsupervised learning). [Figure labels: Objects, Attributes, Model]
4 Methods of Clustering: Overview
Variety of methods:
- Hierarchical clustering: creates a hierarchy of clusters (one cluster entirely contained within another cluster).
- Nonhierarchical methods: create disjoint clusters.
- Overlapping clusters: objects can belong to more than one cluster simultaneously.
- Fuzzy clusters: defined by the probability (grade) of membership of each object in each cluster.
Useful data preprocessing prior to clustering:
- PCA (Principal Components Analysis) to reduce the dimensionality of the data.
- Data standardization: transform the data to reduce the large influence of variables with larger variance on the results of clustering.
5 Introductory Example: 97 countries described by 3 attributes: Birth, Death and InfantDeath rates (each given as a number per 1000; data from 1995).
6 Example (continued), Analysis I: clustering the raw data with the K-means algorithm. Result: 3 clusters (number of observations in each cluster: 13, 32, 52).
7
8 Example: Profiles of Clusters
9 Example: Profiles of Clusters. Notice: the data were clustered based on the InfantDeath rate only!
10 Example: Standardization of Data. Analysis II: data standardized prior to clustering (each variable divided by its standard deviation). Result: 3 clusters (with 35, 46 and 16 observations). The data are now clustered based on InfantDeath and Death. Observe that the variables with the largest variance have the largest influence on the results of clustering. [Figure panels: Analysis II, Analysis I]
11 Example: Profiles of Clusters (Analysis II)
12 Methods of Clustering
Nonhierarchical methods:
- K-means clustering.
- Nondeterministic (the result depends on the randomly selected initial centroids).
- $O(n)$, where $n$ is the number of observations.
Hierarchical methods:
- Agglomerative (join small clusters).
- Divisive (split big clusters).
- Deterministic.
- $O(n^2)$ to $O(n^3)$, depending on the clustering method (i.e. the definition of intercluster distance).
13 Methods of Clustering: Remarks
Clustering large datasets:
- Use K-means.
- If results of hierarchical clustering are needed, first use K-means to obtain e.g. 50 clusters, then run hierarchical clustering on the results of K-means.
Consensus clustering:
- Discover the real clusters in the data by analyzing the stability of the results with noise injected.
14 K-means Algorithm
K-means clustering:
1. Select k points (the centroids of the initial clusters; select randomly).
2. Assign each observation to the nearest centroid (nearest cluster).
3. For each cluster, find the new centroid.
4. Repeat steps 2 and 3 until no change occurs in the cluster assignments.
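The four steps on slide 14 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the lecture: the function name `kmeans`, the `seed` and `max_iter` parameters, the choice of squared Euclidean distance, and the toy data are all assumptions made for the example.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct observations as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each observation to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop when no assignment changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical toy data: two visually separate groups in 2D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the initial centroids are drawn at random, a different `seed` may give a different (possibly worse) partition, which is the nondeterminism mentioned on slide 12.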
15 K-means Algorithm. Result: k separate clusters. The algorithm requires that the correct number of clusters k is specified in advance (a difficult problem: how can we know the real number of clusters in the data?).
16 Hierarchical Clustering
Notation:
- $x_i$: observations, $i = 1..n$
- $C_k$: clusters
- $G$: current number of clusters
- $D_{KL}$: distance between clusters $C_K$ and $C_L$
The between-cluster distance $D_{KL}$ is called the linkage function (various definitions are available; the results of clustering depend on $D_{KL}$).
[Figure labels: $C_K$, $C_L$, $D_{KL}$]
17 Hierarchical Clustering
Algorithm (agglomerative hierarchical clustering):
1. Start with $C_k = \{x_k\}$, $k = 1..n$, and $G = n$.
2. Find $K$, $L$ such that $D_{KL} = \min D_{IJ}$ over $1 \le I < J \le G$.
3. Replace clusters $C_K$ and $C_L$ by the cluster $C_K \cup C_L$ and set $G = G - 1$.
4. Repeat steps 2 and 3 while $G > 1$.
Result: a hierarchy of clusters (dendrogram).
[Figure labels: $C_K$, $C_L$, $D_{KL}$]
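A naive sketch of the agglomerative loop on slide 17, with the linkage function passed in as a parameter. The function names (`single`, `complete`, `average`, `agglomerate`) and the one-dimensional toy data are assumptions made for illustration; the pairwise search is recomputed from scratch at every merge, so this version is far slower than a production implementation.

```python
import math

def single(a, b):
    # Smallest distance between an observation in a and one in b.
    return min(math.dist(x, y) for x in a for y in b)

def complete(a, b):
    # Largest distance between an observation in a and one in b.
    return max(math.dist(x, y) for x in a for y in b)

def average(a, b):
    # Mean of all pairwise distances between a and b.
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

def agglomerate(points, linkage):
    """Return the merge history [(cluster_K, cluster_L, D_KL), ...]."""
    clusters = [[p] for p in points]  # step 1: each observation is a cluster, G = n
    history = []
    while len(clusters) > 1:  # step 4: repeat while G > 1
        # Step 2: find the pair (K, L) with the smallest inter-cluster distance.
        d, k, l = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        history.append((clusters[k], clusters[l], d))
        # Step 3: replace C_K and C_L by their union, so G decreases by 1.
        merged = clusters[k] + clusters[l]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (k, l)] + [merged]
    return history

# Hypothetical toy data on a line.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
for name, fn in [("single", single), ("complete", complete), ("average", average)]:
    print(name, [d for _, _, d in agglomerate(points, fn)])
```

The merge distances recorded in `history` are the heights at which a dendrogram would join the corresponding branches; running the loop with different linkage functions shows how the hierarchy depends on $D_{KL}$.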
18 Hierarchy of Clusters: Dendrogram
19 Definitions of Distance Between Clusters
Different definitions of the distance between clusters:
- Average linkage
- Single linkage
- Complete linkage
- Density linkage
- Ward's minimum variance method
(The SAS CLUSTER procedure accepts 11 different definitions of intercluster distance.)
20 Average Linkage
Notation: $x_i$ observations, $i = 1..n$; $d(x, y)$ distance between observations (Euclidean distance assumed from now on); $C_k$ clusters; $N_K$ number of observations in cluster $C_K$; $D_{KL}$ distance between clusters $C_K$ and $C_L$; $\bar{x}_{C_K}$ mean observation in cluster $C_K$; $W_K = \sum_{x_i \in C_K} \|x_i - \bar{x}_{C_K}\|^2$ variance in cluster $C_K$.
Average linkage:
- Tends to join clusters with small variance.
- Resulting clusters tend to have similar variance.
21 Complete Linkage
Notation: $x_i$ observations, $i = 1..n$; $d(x, y)$ distance between observations; $C_k$ clusters; $N_K$ number of observations in cluster $C_K$; $D_{KL}$ distance between clusters $C_K$ and $C_L$; $\bar{x}_{C_K}$ mean observation in cluster $C_K$; $W_K = \sum_{x_i \in C_K} \|x_i - \bar{x}_{C_K}\|^2$ variance in cluster $C_K$.
Complete linkage:
- Resulting clusters tend to have similar diameter.
22 Single Linkage
Notation: $x_i$ observations, $i = 1..n$; $d(x, y)$ distance between observations; $C_k$ clusters; $N_K$ number of observations in cluster $C_K$; $D_{KL}$ distance between clusters $C_K$ and $C_L$; $\bar{x}_{C_K}$ mean observation in cluster $C_K$; $W_K = \sum_{x_i \in C_K} \|x_i - \bar{x}_{C_K}\|^2$ variance in cluster $C_K$.
Single linkage:
- Tends to produce elongated clusters, irregular in shape.
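The formulas for $D_{KL}$ on slides 20-22 did not survive in the transcript. For reference, the standard textbook definitions of the three linkage functions, written in the notation of the slides, are given below; they are supplied here as an assumption about what the slides showed, not recovered from them.

```latex
\begin{align*}
\text{Average linkage:}  \quad & D_{KL} = \frac{1}{N_K N_L} \sum_{x_i \in C_K} \sum_{x_j \in C_L} d(x_i, x_j) \\
\text{Complete linkage:} \quad & D_{KL} = \max_{x_i \in C_K} \, \max_{x_j \in C_L} d(x_i, x_j) \\
\text{Single linkage:}   \quad & D_{KL} = \min_{x_i \in C_K} \, \min_{x_j \in C_L} d(x_i, x_j)
\end{align*}
```

Single linkage only needs one close pair of observations to merge two clusters, which is why it can chain along elongated shapes, while complete linkage bounds the largest within-cluster distance and so favors clusters of similar diameter.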
23 Ward's Minimum Variance Method
Notation: $x_i$ observations, $i = 1..n$; $d(x, y)$ distance between observations; $C_k$ clusters; $N_K$ number of observations in cluster $C_K$; $D_{KL}$ distance between clusters $C_K$ and $C_L$; $\bar{x}_{C_K}$ mean observation in cluster $C_K$; $W_K = \sum_{x_i \in C_K} \|x_i - \bar{x}_{C_K}\|^2$ variance in cluster $C_K$; $B_{KL} = W_M - W_K - W_L$, where $C_M = C_K \cup C_L$.
Ward's minimum variance method:
- Tends to join small clusters.
- Tends to produce clusters with a similar number of observations.
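A small numeric check may help make $B_{KL}$ concrete. The sketch below computes $B_{KL} = W_M - W_K - W_L$ directly from the definition of the within-cluster sum of squares and compares it with the well-known closed form $\|\bar{x}_{C_K} - \bar{x}_{C_L}\|^2 / (1/N_K + 1/N_L)$, where $\bar{x}_{C_K}$ is the mean observation in cluster $C_K$. The closed form is not on the slide and is quoted here as standard background; the function names and the two toy clusters are hypothetical.

```python
def mean(cluster):
    # Coordinate-wise mean observation of a cluster.
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def within_ss(cluster):
    """W_K: sum of squared Euclidean distances to the cluster mean."""
    m = mean(cluster)
    return sum(sum((a - b) ** 2 for a, b in zip(x, m)) for x in cluster)

def ward_distance(ck, cl):
    """B_KL = W_M - W_K - W_L, where C_M is the union of C_K and C_L."""
    return within_ss(ck + cl) - within_ss(ck) - within_ss(cl)

def ward_closed_form(ck, cl):
    """Squared distance between the means, scaled by the cluster sizes."""
    mk, ml = mean(ck), mean(cl)
    gap = sum((a - b) ** 2 for a, b in zip(mk, ml))
    return gap / (1 / len(ck) + 1 / len(cl))

# Hypothetical clusters: W_K = 2, W_L = 8, W_M = 70.
ck = [(0.0, 0.0), (2.0, 0.0)]
cl = [(5.0, 4.0), (7.0, 4.0), (6.0, 7.0)]
print(ward_distance(ck, cl), ward_closed_form(ck, cl))
```

Both expressions evaluate to 60 for this data (up to floating-point rounding). The size-dependent denominator in the closed form shows why the method tends to join small clusters first: for the same gap between means, merging two small clusters costs less than merging two large ones.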
24 Density Linkage
Notation: $x_i$ observations, $i = 1..n$; $d(x, y)$ distance between observations; $r$ a fixed constant; $f(x)$ the proportion of observations within the sphere centered at $x$ with radius $r$, divided by the volume of the sphere (a measure of the density of points near observation $x$).
Density linkage:
- Single linkage is performed using the dissimilarity measure $d^*$, which is derived from the density estimate $f$.
- Capable of discovering clusters of irregular shape.
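The definition of $d^*$ is missing from the transcript of slide 24. Assuming the slide follows the uniform-kernel variant of density linkage as documented for the SAS CLUSTER procedure (an assumption, since the slide's own formula is not preserved), the measure would read:

```latex
d^*(x_i, x_j) =
\begin{cases}
\dfrac{1}{2}\left( \dfrac{1}{f(x_i)} + \dfrac{1}{f(x_j)} \right) & \text{if } d(x_i, x_j) \le r, \\[2ex]
\infty & \text{otherwise.}
\end{cases}
```

Under this definition two observations are close in the $d^*$ sense only if they lie within radius $r$ of each other and both sit in dense regions, so single linkage on $d^*$ follows ridges of high density rather than raw geometric proximity.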
25 Example: Average Linkage. Elongated clusters in data.
26 Example: K-means. Elongated clusters in data.
27 Example: Density Linkage. Elongated clusters in data.
28 Example: K-means. Nonconvex clusters in data.
29 Example: Centroid Linkage. Nonconvex clusters in data.
30 Example: Density Linkage. Nonconvex clusters in data.
31 Example: True Clusters. Clusters of unequal size.
32 Example: K-means. Clusters of unequal size.
33 Example: Ward's Method. Clusters of unequal size.
34 Example: Average Linkage. Method: average linkage.
35 Example: Centroid Linkage. Clusters of unequal size.
36 Example: Single Linkage. Clusters of unequal size.
37 Example: Well Separated Data. Any method will work.
38 Example: Poorly Separated Data. True clusters.
39 Example: Poorly Separated Data. Method: K-means.
40 Example: Poorly Separated Data. Method: Ward's method.
41 Clustering Methods: Final Remarks
Standardization of variables prior to clustering:
- Often necessary; otherwise variables with large variance tend to have a large influence on the clustering.
- Often the standardized measurement $z_{ij}$ is computed as the z-score $z_{ij} = (x_{ij} - \mu_j) / s_j$, where $x_{ij}$ is the original measurement in observation $i$ and variable $j$, $\mu_j$ is the mean value of variable $j$, and $s_j$ is the mean absolute deviation of variable $j$ (or its standard deviation).
- Other ideas: divide each variable by its range, maximum value or standard deviation.
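The z-score standardization described on slide 41 can be sketched as follows. The function name `standardize`, the `use_mad` flag and the sample rows are hypothetical; the standard deviation used is the sample standard deviation from Python's `statistics` module, which is one reasonable reading of the slide rather than something it specifies.

```python
import statistics

def standardize(rows, use_mad=False):
    """Return z_ij = (x_ij - mu_j) / s_j for every observation i and variable j."""
    columns = list(zip(*rows))
    mus = [statistics.fmean(col) for col in columns]
    if use_mad:
        # s_j = mean absolute deviation of variable j
        scales = [statistics.fmean([abs(x - mu) for x in col])
                  for col, mu in zip(columns, mus)]
    else:
        # s_j = sample standard deviation of variable j
        scales = [statistics.stdev(col) for col in columns]
    # Note: a constant column has s_j = 0 and would need special handling.
    return [tuple((x - mu) / s for x, mu, s in zip(row, mus, scales)) for row in rows]

# Made-up data: the second variable has a much larger spread than the first.
rows = [(10.0, 100.0), (20.0, 300.0), (30.0, 500.0)]
z_sd = standardize(rows)
z_mad = standardize(rows, use_mad=True)
print(z_sd)
print(z_mad)
```

After standardization both columns have the same spread, so neither dominates a Euclidean distance computed on the result, which is the effect observed between Analysis I and Analysis II in the introductory example.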
42 Clustering Methods: Final Remarks
The number of clusters:
- There is no satisfactory theory for determining the right number of clusters in data.
- Various criteria can be observed to help determine the right number of clusters, e.g.:
  - criteria based on the variance accounted for by the clusters, $R^2 = 1 - P_G / T$, or the semipartial $R^2 = B_{KL} / T$, where $T$ is the total variance of the observations, $P_G = \sum W_K$ over the $G$ clusters, and $B_{KL} = W_M - W_K - W_L$ with $C_M = C_K \cup C_L$;
  - the Cubic Clustering Criterion (CCC).
- Data visualization is often useful for determining the number of clusters:
  - a scatterplot for 2-3 dimensional data;
  - in high dimensions, apply the PCA transformation (or similar) and visualize the data in the 2-3 dimensional space of the first principal components.
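To illustrate the two variance-based criteria on slide 42, the sketch below computes $R^2 = 1 - P_G/T$ for a given partition and the semipartial $R^2 = B_{KL}/T$ for merging two clusters. It treats $T$ and $W_K$ as sums of squared distances to the respective means, which is an interpretation of the slide's "variance"; the function names and the toy clusters are hypothetical.

```python
def sum_sq(points):
    """Sum of squared Euclidean distances from the points to their mean."""
    m = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, m)) for p in points)

def r_squared(clusters):
    """R^2 = 1 - P_G / T for a list of clusters (each a list of points)."""
    t = sum_sq([p for c in clusters for p in c])  # T: total over all observations
    p_g = sum(sum_sq(c) for c in clusters)        # P_G: sum of W_K over the G clusters
    return 1 - p_g / t

def semipartial_r_squared(ck, cl, all_points):
    """B_KL / T, with B_KL = W_M - W_K - W_L for the merged cluster C_M."""
    b_kl = sum_sq(ck + cl) - sum_sq(ck) - sum_sq(cl)
    return b_kl / sum_sq(all_points)

# Hypothetical data: two clusters with W_K = 2, W_L = 8 and T = 70.
ck = [(0.0, 0.0), (2.0, 0.0)]
cl = [(5.0, 4.0), (7.0, 4.0), (6.0, 7.0)]
r2_before = r_squared([ck, cl])   # two clusters
r2_after = r_squared([ck + cl])   # after merging them into one
spr2 = semipartial_r_squared(ck, cl, ck + cl)
print(r2_before, r2_after, spr2)
```

For this data the semipartial $R^2$ equals the drop in $R^2$ caused by the merge, which is how the criterion is usually read: a large value at some step of the hierarchy suggests that two genuinely distinct clusters were just joined.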
43 Example: $R^2$ and Semipartial $R^2$
44 Example: Number of Clusters, Useful Checks. PST2 (pseudo t-squared statistic): 3 or 6 or 9 (one before a peak in value). PSF (pseudo F statistic): 9 (peak in value). CCC: 18 (CCC around 3).
45 Kohonen VQ (Vector Quantization)
Algorithm similar to K-means. Idea of the VQ algorithm:
1. Select k points (initial cluster centroids).
2. For observation $x_i$, find the nearest centroid (winning seed), denoted by $S_n$.
3. Modify $S_n$ according to the formula $S_n = S_n (1 - L) + x_i L$, where $L$ is the learning constant (decreasing during the learning process).
4. Repeat steps 2 and 3 over all training observations.
5. Repeat steps 2-4 for a given number of iterations.
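The VQ update rule on slide 45 can be sketched as an online loop. The slide only says that $L$ decreases during learning, so the linear decay used below, together with the names `kohonen_vq`, `epochs`, `l0`, `seed` and the toy data, are assumptions made for the example.

```python
import random

def kohonen_vq(data, k, epochs=20, l0=0.5, seed=0):
    """Return k seeds trained with the update S_n = S_n (1 - L) + x_i L."""
    rng = random.Random(seed)
    seeds = [list(p) for p in rng.sample(data, k)]  # step 1: initial centroids
    for epoch in range(epochs):                      # step 5: repeat steps 2-4
        rate = l0 * (1 - epoch / epochs)             # L, decreasing (assumed linear decay)
        for x in data:                               # step 4: all training observations
            # Step 2: find the winning seed (nearest centroid).
            n = min(range(k),
                    key=lambda j: sum((s - a) ** 2 for s, a in zip(seeds[j], x)))
            # Step 3: move only the winner towards the observation.
            seeds[n] = [s * (1 - rate) + a * rate for s, a in zip(seeds[n], x)]
    return seeds

# Hypothetical toy data: two groups in 2D.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seeds = kohonen_vq(data, k=2)
print(seeds)
```

Unlike K-means, which recomputes each centroid from all of its members at once, this rule nudges one seed per observation, so the result depends on the order in which the observations are presented.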
46 Kohonen SOM (Self-Organizing Maps)
Idea of the SOM algorithm:
1. Select k initial points (cluster centroids) and represent them on a 2D map.
2. For observation $x_i$, find the winning seed $S_n$.
3. Modify all centroids: $S_j = S_j (1 - K(j,n) L) + x_i K(j,n) L$, where $L$ is the learning constant (decreasing during training) and $K(j,n)$ is a function that decreases with increasing distance on the 2D map between the centroids $S_j$ and $S_n$ ($K(j,j) = 1$).
4. Repeat steps 2 and 3 over all training observations.
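A minimal sketch of the SOM update on slide 46. The slide does not fix the shape of $K(j,n)$ or the decay of $L$; the Gaussian neighborhood, the linear decay, the repeated passes over the data, and the names `som`, `rows`, `cols`, `sigma`, `l0` and `epochs` are all assumptions chosen for illustration.

```python
import math
import random

def som(data, rows, cols, epochs=20, l0=0.5, sigma=1.0, seed=0):
    """Return (grid, seeds): map positions and the trained centroid of each map cell."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]  # positions on the 2D map
    seeds = [list(rng.choice(data)) for _ in grid]             # step 1: initial centroids
    for epoch in range(epochs):
        rate = l0 * (1 - epoch / epochs)                       # L, decreasing (assumed linear)
        for x in data:                                         # step 4: all observations
            # Step 2: winning seed S_n (nearest centroid in data space).
            n = min(range(len(seeds)),
                    key=lambda j: sum((s - a) ** 2 for s, a in zip(seeds[j], x)))
            # Step 3: update every centroid, weighted by its map distance to the winner.
            for j, s_j in enumerate(seeds):
                d2 = (grid[j][0] - grid[n][0]) ** 2 + (grid[j][1] - grid[n][1]) ** 2
                k_jn = math.exp(-d2 / (2 * sigma ** 2))        # K(j, n); equals 1 when j == n
                seeds[j] = [s * (1 - k_jn * rate) + a * k_jn * rate
                            for s, a in zip(s_j, x)]
    return grid, seeds

# Hypothetical toy data: two groups in 2D, mapped onto a 2 x 2 grid.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
grid, seeds = som(data, rows=2, cols=2)
for pos, s in zip(grid, seeds):
    print(pos, s)
```

Because neighbors of the winner on the map are pulled in the same direction, centroids that are adjacent on the 2D map tend to end up close in data space, which is what makes the map usable for visualization. With $K(j,n) = 0$ for all $j \ne n$ the rule reduces to the VQ update.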
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets Knearest neighbor Support vectors Linear regression Logistic regression...
More informationOriginal Article Survey of Recent Clustering Techniques in Data Mining
International Archive of Applied Sciences and Technology Volume 3 [2] June 2012: 6875 ISSN: 09764828 Society of Education, India Website: www.soeagra.com/iaast/iaast.htm Original Article Survey of Recent
More informationChurn problem in retail banking Current methods in churn prediction models Fuzzy cmeans clustering algorithm vs. classical kmeans clustering
CHURN PREDICTION MODEL IN RETAIL BANKING USING FUZZY C MEANS CLUSTERING Džulijana Popović Consumer Finance, Zagrebačka banka d.d. Bojana Dalbelo Bašić Faculty of Electrical Engineering and Computing University
More informationSPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di MilanoBicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationAn Introduction to Cluster Analysis for Data Mining
An Introduction to Cluster Analysis for Data Mining 10/02/2000 11:42 AM 1. INTRODUCTION... 4 1.1. Scope of This Paper... 4 1.2. What Cluster Analysis Is... 4 1.3. What Cluster Analysis Is Not... 5 2. OVERVIEW...
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/4 What is
More informationSELFORGANISING MAPPING NETWORKS (SOM) WITH SAS EMINER
SELFORGANISING MAPPING NETWORKS (SOM) WITH SAS EMINER C.Sarada, K.Alivelu and Lakshmi Prayaga Directorate of Oilseeds Research, Rajendranagar, Hyderabad saradac@yahoo.com Self Organising mapping networks
More informationAn Ameliorated Partitioning Clustering Algorithm for Large Data Sets
An Ameliorated Partitioning Clustering Algorithm for Large Data Sets Raghavi Chouhan 1, Abhishek Chauhan 2 MTech Scholar, CSE department, NRI Institute of Information Science and Technology, Bhopal, India
More informationIntroduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones
Introduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation
More informationMultivariate Analysis
Table Of Contents Multivariate Analysis... 1 Overview... 1 Principal Components... 2 Factor Analysis... 5 Cluster Observations... 12 Cluster Variables... 17 Cluster KMeans... 20 Discriminant Analysis...
More informationPCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker
PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342
More information