Introduction to Clustering
Transcription
1 Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, / 36
2 Microarray Example: N = 65, p = 1756.
3 Clustering. The data set $\{x_{ij}\}$, $i = 1, \dots, N$, $j = 1, \dots, p$, consists of $p$ features measured on $N$ independent observations. Clustering: a clustering algorithm seeks to assign the $N$ observations in $p$-dimensional space, labeled $x_1, \dots, x_N$, to one of $K$ groups, based on some similarity measure. Unsupervised learning: the problem of finding groups in data without the help of a response variable. There is no right or wrong partition.
5 What is a similarity measure? $x_i$ and $x_{i'}$ are observation vectors in $p$ dimensions. Some examples: Euclidean distance, $\|x_i - x_{i'}\|_2$; absolute distance, $\|x_i - x_{i'}\|_1$; correlation, with $d = 1 - \text{correlation}$.
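The three measures on this slide can be written out directly. The following is a minimal sketch in plain Python; the function names are illustrative choices, and the correlation distance assumes that neither vector is constant.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def absolute(x, y):
    """Absolute (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """d = 1 - Pearson correlation; assumes neither vector is constant."""
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
print(euclidean(x, y), absolute(x, y), correlation_distance(x, y))
```

Note that the correlation distance ignores location and scale: two profiles with the same shape are close even when their magnitudes differ.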
6 Clustering methods: hierarchical clustering; non-hierarchical clustering (K-means).
7 Hierarchical Clustering. It produces a dendrogram that represents a nested set of clusters: depending on where the dendrogram is cut, between 1 and N clusters can result. Cool microarray example: http : //genome c ancer/molecularportraits/download.shtm
8 Hierarchical Clustering. Pro: a nice tree (dendrogram) that visualizes the different levels of similarity between observations. Con: computationally expensive.
9 Non-hierarchical clustering, K-means. K-means with Euclidean distance as the similarity measure. The solution of K-means clustering is the partition $C_1, \dots, C_K$ that minimizes the within-cluster sum of squares (WSS): $\min_{C_1, \dots, C_K} \mathrm{WSS} = \min_{C_1, \dots, C_K} \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i, i' \in C_k} \|x_i - x_{i'}\|^2$, where $n_k$ is the number of observations in $C_k$. (Worked on the whiteboard.)
10 Note: $\mathrm{WSS} = \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i, i' \in C_k} \|x_i - x_{i'}\|^2 = \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i \in C_k} \sum_{i' \in C_k} \|x_i - \bar{x}_k + \bar{x}_k - x_{i'}\|^2 = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \bar{x}_k\|^2$, where $\bar{x}_k$ is the mean of the observations in $C_k$.
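The identity on this slide (the pairwise form of WSS equals the sum of squared distances to the cluster mean) can be checked numerically for a single cluster. This is only a sanity-check sketch on simulated data; the helper names are hypothetical.

```python
import random

def pairwise_wss(cluster):
    """(1 / (2 n_k)) times the sum over all ordered pairs of ||x_i - x_i'||^2."""
    n = len(cluster)
    total = sum(
        sum((a - b) ** 2 for a, b in zip(xi, xj))
        for xi in cluster
        for xj in cluster
    )
    return total / (2 * n)

def centroid_wss(cluster):
    """Sum over the cluster of ||x_i - cluster mean||^2."""
    n = len(cluster)
    p = len(cluster[0])
    mean = [sum(x[j] for x in cluster) / n for j in range(p)]
    return sum(sum((x[j] - mean[j]) ** 2 for j in range(p)) for x in cluster)

random.seed(0)
# One simulated cluster: 10 observations in 3 dimensions.
cluster = [[random.gauss(0, 1) for _ in range(3)] for _ in range(10)]
print(pairwise_wss(cluster), centroid_wss(cluster))  # the two values agree up to rounding
```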
11 Algorithm for K-means. Steps 1 and 2 are iterated until convergence. Step 1. Given the cluster assignment $C_1, \dots, C_K$, the cluster centroids are calculated as $\hat{\mu}_k = \frac{1}{n_k} \sum_{i \in C_k} x_i$, $k = 1, \dots, K$. Step 2. Given the cluster centroids, the objective function is minimized by assigning each observation to the closest cluster mean: $I_i = \arg\min_{1 \le k \le K} \|x_i - \hat{\mu}_k\|^2$. (Worked on the whiteboard.)
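The two alternating steps can be sketched with NumPy as follows. This is a minimal illustration, not the code used in the seminar. The initialization at K randomly chosen observations, the stopping rule (assignments stop changing), and the handling of empty clusters are assumptions added for the sketch.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate centroid updates (step 1) and reassignment (step 2)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from K distinct, randomly chosen observations.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 2: assign each observation to the closest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Step 1: recompute each centroid as the mean of its cluster.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    wss = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, wss

# Two well-separated simulated groups of 20 observations each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids, wss = kmeans(X, K=2)
print(labels, wss)
```

Because the result depends on the starting centroids, it is common to run the algorithm from several random starts and keep the partition with the smallest WSS.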
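13 Correlation as similarity measurement in K-means. It is not so easy to create an algorithm when the similarity measurement is correlation: there is no simple analytic form for the cluster centroid. Data transformation approach:
1. Normalize each observation vector: $\tilde{x} = \dfrac{x - \bar{x}}{\sqrt{\|x - \bar{x}\|^2 / p}}$, where $\bar{x}$ is the mean of the $p$ entries of $x$.
2. Then $\|\tilde{x} - \tilde{y}\|^2 \propto d_\rho(x, y)$, because
$$\|\tilde{x} - \tilde{y}\|^2 = \left\| \frac{x - \bar{x}}{\sqrt{\|x - \bar{x}\|^2 / p}} - \frac{y - \bar{y}}{\sqrt{\|y - \bar{y}\|^2 / p}} \right\|^2 = \frac{\|x - \bar{x}\|^2}{\|x - \bar{x}\|^2 / p} - \frac{2\,(x - \bar{x})^\top (y - \bar{y})}{\sqrt{\|x - \bar{x}\|^2 / p}\,\sqrt{\|y - \bar{y}\|^2 / p}} + \frac{\|y - \bar{y}\|^2}{\|y - \bar{y}\|^2 / p} = 2p - 2p\,\frac{(x - \bar{x})^\top (y - \bar{y})}{\|x - \bar{x}\|\,\|y - \bar{y}\|} = 2p\, d_\rho(x, y).$$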
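The practical consequence is that ordinary Euclidean K-means on the normalized vectors behaves like K-means with correlation distance. A quick numerical check of the relation $\|\tilde{x} - \tilde{y}\|^2 = 2p\,(1 - \rho)$ on simulated vectors is sketched below; `standardize` and `pearson` are hypothetical helper names.

```python
import math
import random

def standardize(x):
    """Center x and scale it so that the squared norm of the result equals p."""
    p = len(x)
    mean = sum(x) / p
    centered = [a - mean for a in x]
    scale = math.sqrt(sum(c * c for c in centered) / p)
    return [c / scale for c in centered]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    p = len(x)
    mx, my = sum(x) / p, sum(y) / p
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den

random.seed(0)
p = 8
x = [random.gauss(0, 1) for _ in range(p)]
y = [random.gauss(0, 1) for _ in range(p)]

# Squared Euclidean distance between the normalized vectors ...
lhs = sum((a - b) ** 2 for a, b in zip(standardize(x), standardize(y)))
# ... against 2p times the correlation distance d = 1 - correlation.
rhs = 2 * p * (1 - pearson(x, y))
print(lhs, rhs)  # the two values agree up to rounding
```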
14 Non-hierarchical method: K-means. Drawbacks of K-means: no pretty tree; the number of clusters must be known in advance; not robust.
15 K must be known in advance, but how? Gap statistic, Tibshirani et al. (2001); Clest.
16 Gap statistic: the idea behind it. Find $\hat{k}$ such that $\mathrm{WSS}(k)$ shows an elbow-like decline. (Cool example in R.)
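17 Definition. $\mathrm{GAP}(k) = \hat{E}_{\text{null}}[\log \mathrm{WSS}(k)] - \log \mathrm{WSS}(k)$, and $\hat{k}$ is the smallest $k$ such that $\mathrm{GAP}(k) \ge \mathrm{GAP}(k+1) - s_{k+1}$. The idea is to standardize the graph of $\log \mathrm{WSS}(k)$ by comparing it with its expectation under an appropriate null reference distribution of the data, estimated from $B$ reference data sets as $\hat{E}_{\text{null}}[\log \mathrm{WSS}(k)] = \frac{1}{B} \sum_{b=1}^{B} \log \mathrm{WSS}(k)_b$. Here $s_k = \mathrm{sd}_k \sqrt{1 + 1/B}$, where $\mathrm{sd}_k$ is the standard deviation of the $B$ reference values $\log \mathrm{WSS}(k)_b$, as in Tibshirani et al. (2001). (Another cool example in R.)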
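The selection rule can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the R code shown in the seminar. It uses a small built-in K-means with several random starts, and, as a simplification, draws the reference data uniformly over the range of each feature rather than over a box aligned with the principal components. The function names `kmeans_wss` and `gap_statistic` are hypothetical.

```python
import numpy as np

def kmeans_wss(X, K, rng, n_iter=20, n_init=10):
    """Smallest within-cluster sum of squares over several random K-means starts."""
    best = np.inf
    for _ in range(n_init):
        centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            for k in range(K):
                if np.any(labels == k):
                    centroids[k] = X[labels == k].mean(axis=0)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

def gap_statistic(X, k_max, B=10, seed=0):
    """Return (k_hat, gap, s); gap[k - 1] and s[k - 1] belong to k clusters."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gap = np.zeros(k_max + 1)
    s = np.zeros(k_max + 1)
    for k in range(1, k_max + 1):
        log_wss = np.log(kmeans_wss(X, k, rng))
        # Simplification: uniform reference data over the range of each feature.
        ref = np.array([
            np.log(kmeans_wss(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(B)
        ])
        gap[k] = ref.mean() - log_wss
        s[k] = ref.std() * np.sqrt(1 + 1 / B)
    for k in range(1, k_max):
        # Smallest k with GAP(k) >= GAP(k + 1) - s_{k+1}.
        if gap[k] >= gap[k + 1] - s[k + 1]:
            return k, gap[1:], s[1:]
    return k_max, gap[1:], s[1:]

# Three simulated, well-separated groups of 20 observations each.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 8.0]])
X = np.vstack([c + rng.normal(0, 0.5, (20, 2)) for c in centers])
k_hat, gap, s = gap_statistic(X, k_max=4)
print(k_hat, gap)
```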
22 Reference distribution: K=1. Generate the reference features from a uniform distribution over a box aligned with the principal components of the data. (1) Orthogonally diagonalize $S_X = P D P^\top$, where $D$ is a diagonal matrix with the eigenvalues $\lambda_1, \dots, \lambda_p$ of $S_X$ on the diagonal, arranged so that $\lambda_1 \ge \dots \ge \lambda_p \ge 0$, and $P$ is an orthogonal matrix whose columns are the corresponding unit eigenvectors $u_1, \dots, u_p$. (2) Transform via $X' = XP$. Then $S_{X'} = P^\top S_X P = P^\top P D P^\top P = D$, so the transformed data are no longer correlated. (3) Draw uniform features $Z'$ over the range of the columns of $X'$. (4) Finally, back-transform via $Z = Z' P^\top$ to give the reference data set $Z$.
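The four steps translate almost line by line into NumPy. The sketch below assumes the columns are centered before the covariance is computed and that there are at least two features. `pca_reference` is a hypothetical name. `numpy.linalg.eigh` returns eigenvalues in ascending order, which does not affect the result here.

```python
import numpy as np

def pca_reference(X, rng):
    """Draw one reference data set, uniform over a box aligned with the PCs of X."""
    # Assumption: center the columns so that S_X is the sample covariance.
    mean = X.mean(axis=0)
    Xc = X - mean
    S = np.cov(Xc, rowvar=False)
    # Step 1: orthogonal diagonalization S_X = P D P^T (columns of P are eigenvectors).
    _, P = np.linalg.eigh(S)
    # Step 2: rotate the data, X' = X P, so that its columns are uncorrelated.
    X_rot = Xc @ P
    # Step 3: draw Z' uniformly over the range of each column of X'.
    lo, hi = X_rot.min(axis=0), X_rot.max(axis=0)
    Z_rot = rng.uniform(lo, hi, size=X.shape)
    # Step 4: back-transform, Z = Z' P^T, and restore the original location.
    return Z_rot @ P.T + mean

# Simulated correlated data: 50 observations, 3 features.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing
Z = pca_reference(X, rng)
print(Z.shape)
```

Aligning the box with the principal components keeps the reference data close to the overall shape of the observed data, which a box aligned with the original axes does not do when features are strongly correlated.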
27 Prediction-based resampling method, Clest. Clest returns the K with the most stable predictability in the clustering procedure. Algorithm: for each K, repeat the following process B times for the data set and for the reference data set: (1) partition the data set into a learning set and a testing set; (2) perform K-means on the learning set and return classifiers; (3) classify the testing set with the classifiers, returning $C_{1,\text{classifier}}, \dots, C_{K,\text{classifier}}$; (4) cluster the testing set by K-means, returning $C_{1,K\text{-means}}, \dots, C_{K,K\text{-means}}$; (5) measure the similarity of the two partitions $C_{1,\text{classifier}}, \dots, C_{K,\text{classifier}}$ and $C_{1,K\text{-means}}, \dots, C_{K,K\text{-means}}$.
34 Compute $S(k, \text{cluster labels 1}, \text{cluster labels 2})$. Repeat this process $B$ times for each $K$ and obtain the average similarity $S_k$. Repeat the algorithm $B$ times for the reference data set and obtain $S_k^0$. Obtain the standardized similarity measure $d_k = S_k - S_k^0$ and $\hat{K} = \arg\max_{k \in \{1, \dots, K\}} d_k$.
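A compact sketch of the whole Clest procedure is given below. Several choices are assumptions made for illustration and do not come from the slides: the classifier is nearest learning-set centroid, the similarity $S$ is the fraction of observation pairs on which the two labelings agree, the data are split in half, a single uniform reference data set (over the range of each feature) is used, and the search starts at $k = 2$ because $k = 1$ trivially gives perfect agreement.

```python
import numpy as np

def kmeans(X, K, rng, n_iter=30):
    """Basic K-means from K random observations; returns labels and centroids."""
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    labels = ((X[:, None] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids

def pair_agreement(a, b):
    """Fraction of observation pairs on which two labelings agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return np.mean(same_a[upper] == same_b[upper])

def mean_similarity(X, K, B, rng):
    """Average similarity between predicted and clustered test labels over B splits."""
    scores = []
    for _ in range(B):
        perm = rng.permutation(len(X))
        half = len(X) // 2
        learn, test = X[perm[:half]], X[perm[half:]]
        _, centroids = kmeans(learn, K, rng)
        # Classifier (assumption): assign to the nearest learning-set centroid.
        predicted = ((test[:, None] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        clustered, _ = kmeans(test, K, rng)
        scores.append(pair_agreement(predicted, clustered))
    return float(np.mean(scores))

def clest(X, k_max, B=10, seed=0):
    """Return (k_hat, d) where d[k] = S_k - S_k^0 for k = 2, ..., k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    d = {}
    for k in range(2, k_max + 1):
        s_k = mean_similarity(X, k, B, rng)
        # Simplification: one uniform reference data set per k.
        s0_k = mean_similarity(rng.uniform(lo, hi, size=X.shape), k, B, rng)
        d[k] = s_k - s0_k
    return max(d, key=d.get), d

# Three simulated, well-separated groups of 20 observations each.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 8.0]])
X = np.vstack([c + rng.normal(0, 0.5, (20, 2)) for c in centers])
k_hat, d = clest(X, k_max=4)
print(k_hat, d)
```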
44 Similarity measures of two partitions. Let $P$ and $Q$ represent two partitions of the $n$ observations. $\mathrm{CER} = \sum_{i > i'} |I_P(i, i') - I_Q(i, i')| \big/ \binom{n}{2}$, where $I_P(i, i') = 1$ if $i$ and $i'$ belong to the same cluster under partition $P$ and $0$ otherwise (and similarly for $I_Q$). $0 \le \mathrm{CER} \le 1$: CER = 0 means perfect agreement of the two partitions, and CER = 1 means complete disagreement.
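Since CER only compares whether pairs of observations are grouped together, it does not depend on how the clusters are numbered. A minimal sketch in plain Python follows; the function name `cer` is an illustrative choice.

```python
from itertools import combinations

def cer(labels_p, labels_q):
    """Share of observation pairs on which partitions P and Q disagree."""
    n = len(labels_p)
    disagreements = sum(
        (labels_p[i] == labels_p[j]) != (labels_q[i] == labels_q[j])
        for i, j in combinations(range(n), 2)
    )
    return disagreements / (n * (n - 1) / 2)

# Same grouping with swapped label names: perfect agreement.
print(cer([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
# One big cluster versus all singletons: complete disagreement.
print(cer([0, 0, 0, 0], [0, 1, 2, 3]))  # 1.0
```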
45 Does Clest outperform the gap statistic?
46 Reference: Tibshirani, Robert, et al. (2001). Estimating the number of clusters in a data set via the gap statistic.
NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES Silvija Vlah Kristina Soric Visnja Vojvodic Rosenzweig Department of Mathematics
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501
CLUSTER ANALYSIS Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 January 2006 Introduction Cluster analysis includes a broad suite of techniques designed to find groups
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Vector Quantization and Clustering
Vector Quantization and Clustering Introduction K-means clustering Clustering issues Hierarchical clustering Divisive (top-down) clustering Agglomerative (bottom-up) clustering Applications to speech recognition
Bisecting K-Means for Clustering Web Log data
Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining
A Demonstration of Hierarchical Clustering
Recitation Supplement: Hierarchical Clustering and Principal Component Analysis in SAS November 18, 2002 The Methods In addition to K-means clustering, SAS provides several other types of unsupervised
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Cluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
Manifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
A Relevant Document Information Clustering Algorithm for Web Search Engine
A Relevant Document Information Clustering Algorithm for Web Search Engine Y.SureshBabu, K.Venkat Mutyalu, Y.A.Siva Prasad Abstract Search engines are the Hub of Information, The advances in computing
Package MixGHD. June 26, 2015
Type Package Package MixGHD June 26, 2015 Title Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions Version 1.7 Date 2015-6-15 Author
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
B490 Mining the Big Data. 2 Clustering
B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern
