Cluster Analysis Overview. Data Mining Techniques: Cluster Analysis. What is Cluster Analysis?
- Austen Daniels
- 8 years ago
Transcription
Data Mining Techniques: Cluster Analysis
Mirek Riedewald
Many slides based on presentations by Han/Kamber, Tan/Steinbach/Kumar, and Andrew Moore

Cluster Analysis Overview
- Introduction
- Foundations: Measuring Distance (Similarity)
- Partitioning Methods: K-Means
- Hierarchical Methods
- Density-Based Methods
- Clustering High-Dimensional Data
- Cluster Evaluation

What is Cluster Analysis?
- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
- Unsupervised learning: usually no training set with known classes
- Typical applications
  - As a stand-alone tool to get insight into data properties
  - As a preprocessing step for other algorithms
- Intra-cluster distances are minimized; inter-cluster distances are maximized

Rich Applications, Multidisciplinary Efforts
- Pattern recognition
- Spatial data analysis
- Image processing
- Data reduction
- Economic science
  - Market research
- WWW
  - Document classification
  - Weblogs: discover groups of similar access patterns
[Figure: clustering precipitation in Australia]

Examples of Clustering Applications
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
- Land use: identification of areas of similar land use in an earth observation database
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost
- City planning: identifying groups of houses according to their house type, value, and geographical location
- Earthquake studies: observed earthquake epicenters should be clustered along continental faults
Quality: What Is Good Clustering?
- Cluster membership corresponds to objects in the same class
- High intra-class similarity, low inter-class similarity
  - Choice of similarity measure is important
- Ability to discover some or all of the hidden patterns
  - Difficult to measure without ground truth

Notion of a Cluster can be Ambiguous
- How many clusters?
[Figure: the same points grouped into two clusters, four clusters, and six clusters]

Distinctions Between Sets of Clusters
- Exclusive versus non-exclusive
  - Non-exclusive clustering: points may belong to multiple clusters
- Fuzzy versus non-fuzzy
  - Fuzzy clustering: a point belongs to every cluster with some weight between 0 and 1
  - Weights must sum to 1
- Partial versus complete
  - Cluster some or all of the data
- Heterogeneous versus homogeneous
  - Clusters of widely different sizes, shapes, densities

Cluster Analysis Overview (section: Foundations: Measuring Distance (Similarity))

Distance
- Clustering is inherently connected to the question of (dis-)similarity of objects
- How can we define similarity between objects?

Similarity Between Objects
- Usually measured by some notion of distance
- Popular choice: Minkowski distance, where q is a positive integer
  $$dist(x(i), x(j)) = \left( |x_1(i) - x_1(j)|^q + |x_2(i) - x_2(j)|^q + \cdots + |x_d(i) - x_d(j)|^q \right)^{1/q}$$
- q = 1: Manhattan distance
  $$dist(x(i), x(j)) = |x_1(i) - x_1(j)| + |x_2(i) - x_2(j)| + \cdots + |x_d(i) - x_d(j)|$$
- q = 2: Euclidean distance
  $$dist(x(i), x(j)) = \sqrt{ |x_1(i) - x_1(j)|^2 + |x_2(i) - x_2(j)|^2 + \cdots + |x_d(i) - x_d(j)|^2 }$$
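The Minkowski family above can be sketched in a few lines. This is a minimal illustration, not code from the slides; the function names `minkowski`, `manhattan`, and `euclidean` are chosen here for the example.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def manhattan(x, y):
    """Special case q = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Special case q = 2."""
    return minkowski(x, y, 2)


p1, p2 = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p1, p2))  # 7.0
print(euclidean(p1, p2))  # 5.0
```

Larger q gives more weight to the attribute with the largest difference, which is one reason attribute scaling matters before clustering.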
Metrics
- Properties of a metric
  - $d(i,j) \ge 0$
  - $d(i,j) = 0$ if and only if $i = j$
  - $d(i,j) = d(j,i)$
  - $d(i,j) \le d(i,k) + d(k,j)$
- Examples: Euclidean distance, Manhattan distance
- Many other non-metric similarity measures exist
- After selecting the distance function, is it now clear how to compute similarity between objects?

Challenges
- How to compute a distance for categorical attributes
- An attribute with a large domain often dominates the overall distance
  - Weight and scale the attributes like for k-NN
- Curse of dimensionality

Curse of Dimensionality
- Best solution: remove any attribute that is known to be very noisy or not interesting
- Try different subsets of the attributes and determine where good clusters are found

Nominal Attributes
- Method 1: work with original values
  - Difference = 0 if same value, difference = 1 otherwise
- Method 2: transform to binary attributes
  - New binary attribute for each domain value
  - Encode a specific domain value by setting the corresponding binary attribute to 1 and all others to 0

Ordinal Attributes
- Method 1: treat as nominal
  - Problem: loses ordering information
- Method 2: map to [0,1]
  - Problem: to which values should the original values be mapped?
  - Default: equi-distant mapping to [0,1]

Scaling and Transforming Attributes
- Sometimes it might be necessary to transform numerical attributes to [0,1] or use another normalizing transformation, maybe even a nonlinear one (e.g., logarithm)
- Might need to weight attributes differently
- Often requires expert knowledge or trial and error
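The attribute transformations listed above are easy to try out. The sketch below shows one possible encoding for each case; the helper names and the example domains are assumptions made for illustration only.

```python
def one_hot(value, domain):
    """Nominal attribute, method 2: one binary attribute per domain value."""
    return [1 if value == v else 0 for v in domain]


def ordinal_to_unit(value, ordered_domain):
    """Ordinal attribute, method 2: equi-distant mapping of the ranks to [0, 1]."""
    if len(ordered_domain) == 1:
        return 0.0
    return ordered_domain.index(value) / (len(ordered_domain) - 1)


def min_max_scale(values):
    """Numerical attribute: linear rescaling of the observed values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute carries no information
    return [(v - lo) / (hi - lo) for v in values]


print(one_hot("red", ["red", "green", "blue"]))            # [1, 0, 0]
print(ordinal_to_unit("medium", ["low", "medium", "high"]))  # 0.5
print(min_max_scale([10, 20, 30]))                          # [0.0, 0.5, 1.0]
```

Min-max scaling is sensitive to extreme values, so a nonlinear transformation such as a logarithm may be applied first, as the slide suggests.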
Other Similarity Measures
- Special distance or similarity measures for many applications
  - Might be a non-metric function
- Information retrieval
  - Document similarity based on keywords
- Bioinformatics
  - Gene features in micro-arrays

Calculating Cluster Distances
- Single link = smallest distance between an element in one cluster and an element in the other: $dist(K_i, K_j) = \min_{p,q} dist(x_{ip}, x_{jq})$
- Complete link = largest distance between an element in one cluster and an element in the other: $dist(K_i, K_j) = \max_{p,q} dist(x_{ip}, x_{jq})$
- Average distance between an element in one cluster and an element in the other: $dist(K_i, K_j) = \mathrm{avg}_{p,q}\, dist(x_{ip}, x_{jq})$
- Distance between cluster centroids: $dist(K_i, K_j) = dist(m_i, m_j)$
- Distance between cluster medoids: $dist(K_i, K_j) = dist(x_{m_i}, x_{m_j})$
  - Medoid: one chosen, centrally located object in the cluster

Cluster Centroid, Radius, and Diameter
- Centroid: the "middle" of a cluster C
  $$m = \frac{1}{|C|} \sum_{x \in C} x$$
- Radius: square root of the average squared distance from any point of the cluster to its centroid
  $$R = \sqrt{ \frac{ \sum_{x \in C} (x - m)^2 }{ |C| } }$$
- Diameter: square root of the average squared distance between all pairs of points in the cluster
  $$D = \sqrt{ \frac{ \sum_{x \in C} \sum_{y \in C,\, y \ne x} (x - y)^2 }{ |C| \, (|C| - 1) } }$$

Cluster Analysis Overview (section: Partitioning Methods: K-Means)

Partitioning Algorithms: Basic Concept
- Construct a partition of a database D of n objects into a set of K clusters, such that the sum of squared distances to the cluster representatives $m_i$ is minimized
  $$\sum_{i=1}^{K} \sum_{x \in C_i} (x - m_i)^2$$
- Given a K, find the partition into K clusters that optimizes the chosen partitioning criterion
  - Globally optimal: enumerate all partitions
  - Heuristic methods
    - K-means: each cluster represented by its centroid
    - K-medoids: each cluster represented by one of the objects in the cluster

K-means Clustering
- Each cluster is associated with a centroid
- Each object is assigned to the cluster with the closest centroid
- Algorithm
  1. Given K, select K random objects as initial centroids
  2. Repeat until centroids do not change
     - Form K clusters by assigning every object to its nearest centroid
     - Recompute the centroid of each cluster
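The two K-means steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the slides' own code: it uses squared Euclidean distance, stops when the assignment no longer changes (which implies the centroids no longer change), keeps the old centroid if a cluster becomes empty, and the `seed` and `max_iter` parameters are additions for reproducibility and safety.

```python
import random


def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dims))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K random objects as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every object to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the centroids would not change either
        assignment = new_assignment
        # Step 2: recompute the centroid of each cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = centroid(members)
    return centroids, assignment


data = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centers, labels = k_means(data, k=2)
print(centers, labels)
```

Cluster numbering depends on the random initial choice, so only the grouping, not the label values, is meaningful.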
K-Means Example
[Figure: K-means clusters and centroids after each iteration]

Overview of K-Means Convergence
[Figure: six panels showing the clustering at successive iterations]

K-means Questions
- What is it trying to optimize?
- Will it always terminate?
- Will it find an optimal clustering?
- How should we start it?
- How could we automatically choose the number of centers?
- ...we'll deal with these questions next

K-means Clustering Details
- Initial centroids often chosen randomly
  - Clusters produced vary from one run to another
- Distance usually measured by Euclidean distance, cosine similarity, correlation, etc.
- Comparably fast algorithm: O( n * K * I * d )
  - n = number of objects
  - I = number of iterations
  - d = number of attributes

Evaluating K-means Clusters
- Most common measure: Sum of Squared Error (SSE)
  - For each point, the error is the distance to the nearest centroid
    $$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist(m_i, x)^2$$
  - $m_i$ = centroid of cluster $C_i$
- Given two clusterings, choose the one with the smallest error
- Easy way to reduce SSE: increase K
  - In practice, large K not interesting

K-means Convergence
- (1) Assign each x to its nearest center (minimizes SSE for fixed centers)
- (2) Choose the centroid of all points in the same cluster as the cluster center (minimizes SSE for fixed clusters)
- Cycle through steps (1) and (2) = K-means algorithm
- Algorithm terminates when neither (1) nor (2) results in a change of configuration
  - Finite number of ways of partitioning n records into K groups
  - If the configuration changes on an iteration, it must have improved SSE
  - So each time the configuration changes, it must go to a configuration it has never been to before
  - So if it tried to go on forever, it would eventually run out of configurations
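The claim in step (2) of the convergence argument, that the centroid minimizes SSE for a fixed cluster, can be checked with a short derivation. The sketch below assumes squared Euclidean distance; for other distance functions the minimizer is generally not the mean.

```latex
\begin{aligned}
\mathrm{SSE}(m_i) &= \sum_{x \in C_i} \lVert x - m_i \rVert^2 \\
\frac{\partial\, \mathrm{SSE}}{\partial m_i} &= -2 \sum_{x \in C_i} (x - m_i) = 0
\;\Longrightarrow\;
m_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
\end{aligned}
```

Since SSE is a convex quadratic in $m_i$, this stationary point is the minimum, so step (2) can never increase SSE.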
Will it Find the Optimal Clustering?
[Figure: example data set with initial centroids]

Importance of Initial Centroids
[Figure: six iterations of K-means; one choice of initial centroids leads to the optimal clustering, another to a sub-optimal clustering]

Will It Find The Optimal Clustering Now?
[Figure: example data set with a different choice of initial centroids]

Importance of Initial Centroids
[Figure: iterations of K-means from the second set of initial centroids]

Problems with Selecting Initial Centroids
- Probability of starting with exactly one initial centroid per real cluster is very low
  - K selected for the algorithm might be different from the inherent K of the data
  - Might randomly select multiple initial objects from the same cluster
- Sometimes initial centroids will readjust themselves in the right way, and sometimes they don't

Clusters Example
[Figure: first iteration, starting with two initial centroids in one cluster of each pair of clusters]
Clusters Example
[Figure: iterations of K-means, starting with two initial centroids in one cluster of each pair of clusters]

Clusters Example
[Figure: iterations of K-means, starting with some pairs of clusters having three initial centroids, while others have only one]

Solutions to Initial Centroids Problem
- Multiple runs
  - Helps, but probability is not on our side
- Sample and use hierarchical clustering to determine initial centroids
- Select more than K initial centroids and then select among these the initial centroids
  - Select those that are most widely separated
- Postprocessing
  - Eliminate small clusters that may represent outliers
  - Split clusters with high SSE
  - Merge clusters that are close and have low SSE

Limitations of K-means
- K-means has problems when clusters are of differing
  - Sizes
  - Densities
  - Non-globular shapes
- K-means has problems when the data contains outliers

Limitations of K-means: Differing Sizes
[Figure: original points versus K-means clusters]
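One of the remedies listed above for the initial-centroid problem is to draw more than K candidates and keep those that are most widely separated. A greedy farthest-first selection is one possible way to read that bullet; the sketch below is an assumption about how it could be done, not the method prescribed by the slides, and the function name is hypothetical.

```python
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def widely_separated(candidates, k):
    """Greedily pick k candidates, each as far as possible from those already chosen."""
    if k > len(candidates):
        raise ValueError("need at least k candidate centroids")
    chosen = [candidates[0]]  # assumption: start from the first candidate
    while len(chosen) < k:
        # Next pick: the candidate whose nearest chosen centroid is farthest away.
        best = max(
            candidates,
            key=lambda p: min(squared_distance(p, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


candidates = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(widely_separated(candidates, 3))  # [(0, 0), (20, 0), (10, 11)]
```

A known drawback of this greedy choice is that it tends to pick outliers, which is why it is usually applied to a sample rather than to the full data set.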
Limitations of K-means: Differing Density
[Figure: original points versus K-means clusters]

Limitations of K-means: Non-globular Shapes
[Figure: original points versus K-means clusters]

Overcoming K-means Limitations
- One solution is to use many clusters
  - Find parts of clusters, then put them together
[Figures: original points versus K-means with many clusters, for differing sizes, differing densities, and non-globular shapes]

K-Means and Outliers
- K-means algorithm is sensitive to outliers
  - Centroid is the average of cluster members
  - Outlier can dominate the average computation
- Solution: K-medoids
  - Medoid = most centrally located real object in a cluster
  - Algorithm similar to K-means, but finding the medoid is much more expensive
    - Try all objects in the cluster to find the one that minimizes SSE, or just try a few randomly to reduce cost
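The exhaustive medoid search described above ("try all objects in the cluster") can be sketched as below. The one-dimensional example data and the function name are assumptions for illustration; the search costs a quadratic number of distance computations per cluster, which is the extra expense the slide refers to.

```python
def medoid(cluster, dist):
    """Return the cluster member with the smallest sum of squared distances to all members."""
    return min(cluster, key=lambda c: sum(dist(c, p) ** 2 for p in cluster))


def absolute_difference(a, b):
    return abs(a - b)


values = [1.0, 2.0, 3.0, 100.0]  # hypothetical cluster with one outlier
mean_value = sum(values) / len(values)
print(mean_value)                           # 26.5, pulled toward the outlier
print(medoid(values, absolute_difference))  # 3.0, a real object near the bulk of the data
```

Unlike the centroid, the medoid is always one of the input objects, so it also works when only pairwise distances are available.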
Cluster Analysis Overview (section: Hierarchical Methods)

Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Visualized as a dendrogram
  - Tree-like diagram that records the sequences of merges or splits

Strengths of Hierarchical Clustering
- Do not have to assume any particular number of clusters
  - Any number of clusters can be obtained by cutting the dendrogram at the proper level
- May correspond to meaningful taxonomies
  - Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...)

Hierarchical Clustering
- Two main types of hierarchical clustering
  - Agglomerative:
    - Start with the given objects as individual clusters
    - At each step, merge the closest pair of clusters until only one cluster (or K clusters) left
  - Divisive:
    - Start with one, all-inclusive cluster
    - At each step, split a cluster until each cluster contains a single object (or there are K clusters)

Agglomerative Clustering Algorithm
- More popular hierarchical clustering technique
- Basic algorithm is straightforward
  1. Compute the proximity matrix
  2. Let each data object be a cluster
  3. Repeat until only a single cluster remains
     - Merge the two closest clusters
     - Update the proximity matrix
- Key operation: computation of the proximity of two clusters
  - Different approaches to defining the distance between clusters distinguish the different algorithms

Starting Situation
- Clusters of individual objects, proximity matrix
[Figure: individual points p1, p2, ... and the corresponding proximity matrix]
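The basic agglomerative algorithm above can be sketched compactly. For brevity, this illustration recomputes cluster distances from the points at every step instead of maintaining and updating a proximity matrix, so it is slower than the matrix-based version; the `linkage` parameter (Python's built-in `min` for single link, `max` for complete link) and the stopping parameter `k` are choices made for this example.

```python
def agglomerative(points, dist, linkage=min, k=1):
    """Merge the two closest clusters until only k clusters remain.

    linkage reduces all pairwise distances between two clusters to one number:
    min gives single link, max gives complete link.
    """
    clusters = [[p] for p in points]  # every object starts as its own cluster
    merges = []                       # record of (cluster_a, cluster_b, distance)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


def absolute_difference(a, b):
    return abs(a - b)


data = [1.0, 2.0, 6.0, 7.0, 20.0]
final_clusters, history = agglomerative(data, absolute_difference, linkage=min, k=2)
print(final_clusters)  # [[20.0], [1.0, 2.0, 6.0, 7.0]]
```

The `history` list contains the information a dendrogram would display: which clusters were merged and at what distance.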
Intermediate Situation
- Some clusters are merged
[Figure: current clusters and their proximity matrix]

Intermediate Situation
- Merge the two closest clusters and update the proximity matrix
[Figure: the two closest clusters highlighted, with the proximity matrix]

After Merging
- How do we update the proximity matrix?
[Figure: proximity matrix in which the entries for the merged cluster are unknown]

Defining Cluster Distance
- Min: clusters near each other
- Max: low diameter
- Avg: more robust against outliers
- Distance between centroids

Strength of MIN
- Can handle non-elliptical shapes
[Figure: original points and the two clusters found]

Limitations of MIN
- Sensitive to noise and outliers
[Figure: original points and the two clusters found]
Strength of MAX
- Less susceptible to noise and outliers
[Figure: original points and the two clusters found]

Limitations of MAX
- Tends to break large clusters
- Biased towards globular clusters
[Figure: original points and the two clusters found]

Hierarchical Clustering: Average
- Compromise between single and complete link
- Strengths
  - Less susceptible to noise and outliers
- Limitations
  - Biased towards globular clusters

Cluster Similarity: Ward's Method
- Distance of two clusters is based on the increase in squared error when two clusters are merged
  - Similar to group average if distance between objects is distance squared
- Less susceptible to noise and outliers
- Biased towards globular clusters
- Hierarchical analogue of K-means
  - Can be used to initialize K-means

Hierarchical Clustering: Comparison
[Figure: results of MIN, MAX, Group Average, and Ward's Method on the same data]

Time and Space Requirements
- $O(n^2)$ space for the proximity matrix
  - n = number of objects
- $O(n^3)$ time in many cases
  - There are n steps, and at each step the proximity matrix must be updated and searched
  - Complexity can be reduced to $O(n^2 \log n)$ time for some approaches
Hierarchical Clustering: Problems and Limitations
- Once a decision is made to combine two clusters, it cannot be undone
- No objective function is directly minimized
- Different schemes have problems with one or more of the following:
  - Sensitivity to noise and outliers
  - Difficulty handling different sized clusters and convex shapes
  - Breaking large clusters

Cluster Analysis Overview (section: Density-Based Methods)

Density-Based Clustering Methods
- Clustering based on density of data objects in a neighborhood
  - Local clustering criterion
- Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - Need density parameters as termination condition

DBSCAN: Basic Concepts
- Two parameters:
  - Eps: maximum radius of the neighborhood
    - $N_{Eps}(q) = \{ p \in D \mid dist(q,p) \le Eps \}$
  - MinPts: minimum number of points in an Eps-neighborhood of that point
- A point p is directly density-reachable from a point q w.r.t. Eps and MinPts if
  - p belongs to $N_{Eps}(q)$
  - Core point condition: $|N_{Eps}(q)| \ge MinPts$
[Figure: point p inside the Eps-neighborhood of core point q]

Density-Reachable, Density-Connected
- A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points $q = p_1, p_2, \ldots, p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$
- A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
- Cluster = set of density-connected points
[Figure: chain of points from q to p; points p and q both reachable from o]

DBSCAN: Classes of Points
- A point is a core point if it has at least MinPts points within Eps
  - At the interior of a cluster
- A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point
  - At the outer surface of a cluster
- A noise point is any point that is not a core point or a border point
  - Not part of any cluster
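The three point classes defined above can be computed directly from the definitions. The sketch below is a brute-force illustration that assumes the Eps-neighborhood of a point includes the point itself and uses the core point condition with "at least MinPts"; the function name and example data are hypothetical.

```python
def classify_points(points, eps, min_pts, dist):
    """Label every point as 'core', 'border', or 'noise' (brute force, quadratic time)."""
    # Eps-neighborhood of each point, as indices; a point is its own neighbor.
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps] for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


def absolute_difference(a, b):
    return abs(a - b)


data = [0.0, 0.5, 1.0, 1.8, 10.0]
print(classify_points(data, eps=1.0, min_pts=3, dist=absolute_difference))
# ['core', 'core', 'core', 'border', 'noise']
```

A full DBSCAN run would additionally group core points that are density-reachable from each other and attach each border point to a neighboring core point's cluster.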
DBSCAN: Core, Border, and Noise Points
[Figure: example of core, border, and noise points]

DBSCAN Algorithm
- Repeat until all points have been processed
  - Select a point p
  - If p is a core point, then
    - Retrieve and remove all points density-reachable from p w.r.t. Eps and MinPts; output them as a cluster
- Discards all noise points (how?)
- Discovers clusters of arbitrary shape
- Fairly robust against noise
- Runtime: $O(n^2)$, space: $O(n)$
  - O(n * time to find points in the Eps-neighborhood)
  - Can be $O(n \log n)$ with a spatial index

DBSCAN: Core, Border and Noise Points
[Figure: original points and their point types (core, border, noise) for given Eps and MinPts]

When DBSCAN Works Well
[Figure: original points and the clusters found]

DBSCAN: Determining Eps and MinPts
- Idea: for points in a cluster, their k-th nearest neighbors are at roughly the same distance
  - Noise points have the k-th nearest neighbor at a farther distance
- Plot the sorted distance of every point to its k-th nearest neighbor
  - Choose Eps where a sharp change occurs
  - MinPts = k
    - k too large: small clusters labeled as noise
    - k too small: small groups of outliers labeled as cluster

When DBSCAN Does NOT Work Well
- Varying densities
- High-dimensional data
[Figure: original points, and DBSCAN results for the same MinPts with a large Eps and with a small Eps]
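The k-distance heuristic for choosing Eps, described above, only needs each point's distance to its k-th nearest neighbor. The brute-force sketch below computes the sorted values that would be plotted; the function name and the example data are assumptions for illustration.

```python
def sorted_k_distances(points, k, dist):
    """Distance of every point to its k-th nearest neighbor, in ascending order."""
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and the number of other points")
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def absolute_difference(a, b):
    return abs(a - b)


data = [0.0, 1.0, 2.0, 3.0, 50.0]
print(sorted_k_distances(data, k=2, dist=absolute_difference))
# [1.0, 1.0, 2.0, 2.0, 48.0]
```

In this toy data the sharp jump from 2.0 to 48.0 marks the isolated point, so an Eps slightly above 2.0 with MinPts = k would be a natural first guess.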
DBSCAN: Sensitive to Parameters
[Figure: DBSCAN results on the same data for different parameter settings]

Cluster Analysis Overview (section: Clustering High-Dimensional Data)

Clustering High-Dimensional Data
- Many applications: text documents, DNA micro-array data
- Major challenges:
  - Irrelevant dimensions may mask clusters
  - Curse of dimensionality for distance computation
  - Clusters may exist only in some subspaces
- Methods
  - Feature transformation, e.g., PCA and SVD
    - Some useful only when features are highly correlated/redundant
  - Feature selection: wrapper or filter approaches
  - Subspace clustering: find clusters in all subspaces
    - CLIQUE

Curse of Dimensionality
- Data in only one dimension is relatively packed
- Adding a dimension stretches the objects across that dimension, moving them further apart
  - High-dimensional data is very sparse
- Distance measure becomes meaningless
  - For many distributions, distances between objects become more similar in high dimensions
[Figure: the same objects in one, two, and three dimensions; graphs adapted from Parsons et al., KDD Explorations]

Why Subspace Clustering?
[Figure: clusters visible only in certain subspaces; adapted from Parsons et al., SIGKDD Explorations]

CLIQUE (Clustering In QUEst)
- Automatically identifies clusters in subspaces
- Exploits monotonicity property
  - If a set of points forms a dense cluster in d dimensions, they also form a cluster in any subset of these dimensions
  - A region is dense if the fraction of data points in the region exceeds the input model parameter
  - Sound familiar? Apriori algorithm...
- Algorithm is both density-based and grid-based
  - Partitions each dimension into the same number of equal-length intervals
  - Partitions an m-dimensional data space into non-overlapping rectangular units
  - Cluster = maximal set of connected dense units within a subspace
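The grid partitioning and the monotonicity property described above can be illustrated on a small example. This sketch only finds dense units in a given subspace and checks the property; it is not a full CLIQUE implementation. The function name, the shared attribute range `[lo, hi]`, and the sample data are assumptions made for the example.

```python
from collections import Counter


def dense_units(points, dims, n_intervals, lo, hi, threshold):
    """Grid cells in the subspace `dims` holding more than `threshold` of all points.

    Assumption: every attribute lies in [lo, hi] and is split into
    n_intervals equal-length intervals.
    """
    width = (hi - lo) / n_intervals
    counts = Counter()
    for p in points:
        # Interval index per selected dimension; hi itself falls into the last interval.
        cell = tuple(min(int((p[d] - lo) / width), n_intervals - 1) for d in dims)
        counts[cell] += 1
    return {cell for cell, c in counts.items() if c / len(points) > threshold}


data = [
    (1.1, 1.2), (1.3, 1.4), (1.5, 1.1), (1.2, 1.8),  # a dense group
    (5.5, 8.5), (8.2, 3.3), (3.1, 6.7), (9.5, 9.5),  # scattered points
]
dense_0 = dense_units(data, dims=(0,), n_intervals=5, lo=0.0, hi=10.0, threshold=0.3)
dense_1 = dense_units(data, dims=(1,), n_intervals=5, lo=0.0, hi=10.0, threshold=0.3)
dense_01 = dense_units(data, dims=(0, 1), n_intervals=5, lo=0.0, hi=10.0, threshold=0.3)

# Monotonicity: every dense 2-dim unit projects onto dense 1-dim units.
monotone = all((c[0],) in dense_0 and (c[1],) in dense_1 for c in dense_01)
print(dense_0, dense_1, dense_01, monotone)
```

Because of this property, candidate 2-dim units only need to be formed from combinations of dense 1-dim units, which is the same pruning idea as in the Apriori algorithm.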
[Figure: dense regions in the age-salary and age-vacation planes; compute the intersection of the dense age-salary and age-vacation regions. Axes: age, Salary, Vacation (week)]

CLIQUE Algorithm
- Find all dense regions in 1-dim space for each attribute. This is the set of dense 1-dim cells.
- Let k = 1
- Repeat until there are no dense k-dim cells
  - k = k + 1
  - Generate all candidate k-dim cells from dense (k-1)-dim cells
  - Eliminate cells with fewer points than the density threshold
- Find clusters by taking the union of all adjacent, high-density cells of the same dimensionality
- Summarize each cluster using a small set of inequalities that describe the attribute ranges of the cells in the cluster

Strengths and Weaknesses of CLIQUE
- Strengths
  - Automatically finds subspaces of the highest dimensionality that contain high-density clusters
  - Insensitive to the order of objects in the input and does not presume some canonical data distribution
  - Scales linearly with input size and has good scalability with the number of dimensions
- Weaknesses
  - Need to tune grid size and density threshold
  - Each point can be a member of many clusters
  - Can still have high mining cost (inherent problem for subspace clustering)
  - Same density threshold for low and high dimensionality

Cluster Analysis Overview (section: Cluster Evaluation)

Cluster Validity on Test Data

Cluster Validity
- Clustering: usually no ground truth available
- Problem: clusters are in the eye of the beholder
- Then why do we want to evaluate them?
  - To avoid finding patterns in noise
  - To compare clustering algorithms
  - To compare two sets of clusters
  - To compare two clusters
Clusters found in Random Data
[Figure: random points and the clusters found in them by K-means, DBSCAN, and complete link]

Measuring Cluster Validity Via Correlation
- Two matrices
  - Similarity matrix
  - Incidence matrix
    - One row and one column for each object
    - Entry is 1 if the associated pair of objects belongs to the same cluster, otherwise 0
- Compute the correlation between the two matrices
  - Since the matrices are symmetric, only the correlation between n(n-1)/2 entries needs to be calculated
- High correlation: objects close to each other tend to be in the same cluster
- Not a good measure when clusters can be non-globular and intertwined

Measuring Cluster Validity Via Correlation
[Figure: correlation values for K-means clusterings of two data sets]

Similarity Matrix for Cluster Validation
- Order the similarity matrix with respect to cluster labels and inspect visually
  - Block-diagonal matrix for well-separated clusters
[Figure: data set and its ordered similarity matrix]

Similarity Matrix for Cluster Validation
- Clusters in random data are not so crisp
[Figures: ordered similarity matrices for DBSCAN and K-means clusters in random data]
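The correlation-based validity measure described above can be sketched as follows. This illustration uses a distance (proximity) matrix rather than a similarity matrix, so a good clustering yields a strongly negative correlation: small distances go with incidence 1. The function names and example data are assumptions made here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def incidence_distance_correlation(points, labels, dist):
    """Correlate pairwise distances with same-cluster indicators.

    Only the n(n-1)/2 entries above the diagonal are used, because both
    matrices are symmetric.
    """
    distances, incidence = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            distances.append(dist(points[i], points[j]))
            incidence.append(1 if labels[i] == labels[j] else 0)
    return pearson(distances, incidence)


def absolute_difference(a, b):
    return abs(a - b)


data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
good_labels = [0, 0, 0, 1, 1, 1]
corr = incidence_distance_correlation(data, good_labels, absolute_difference)
print(corr)
```

With a similarity matrix in place of distances, the sign flips and a good clustering gives a correlation close to +1.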
Similarity Matrix for Cluster Validation
- Clusters in random data are not so crisp
[Figures: ordered similarity matrices for complete link and DBSCAN clusters]

Sum of Squared Error
- For a fixed number of clusters, lower SSE indicates better clustering
  - Not necessarily true for non-globular, intertwined clusters
- Can also be used to estimate the number of clusters
  - Run K-means for different K, compare SSE
[Figure: SSE of clusters found using K-means, plotted against K]

When SSE Is Not So Great
[Figure: data set for which SSE is a poor indicator of cluster quality]

Comparison to Random Data or Clustering
- Need a framework to interpret any measure
  - E.g., given only the value of a measure, is that good or bad?
- Statistical framework for cluster validity
  - Compare the cluster quality measure on random data or random clustering to that on real data
  - If the value obtained on real data is unlikely under the random setting, then the cluster results are valid (cluster = non-random structure)
- For comparing the results of two different sets of cluster analyses, a framework is less necessary
  - But: need to know whether the difference between two index values is significant

Statistical Framework for SSE
- Example: cluster the given data set and record the SSE
- Compare to the SSE of the same number of clusters in random data
  - Histogram: SSE of the clusters found in many sets of random data points, drawn from the same range as the real data
  - Estimate mean and standard deviation of SSE on random data
  - Check how many standard deviations away from the mean the real-data SSE is
[Figure: random data and histogram of SSE values (axes: SSE, Count)]
Statistical Framework for Correlation
- Compare the correlation of incidence and proximity matrices for well-separated data versus random data
[Figure: well-separated data and random data with their correlation values]

Cluster Cohesion and Separation
- Cohesion: how closely related are objects in a cluster
  - Can be measured by SSE ($m_i$ = centroid of cluster i):
    $$SSE = \sum_{i} \sum_{x \in C_i} (x - m_i)^2 = \sum_{i} \frac{1}{2 |C_i|} \sum_{x \in C_i} \sum_{y \in C_i} (x - y)^2$$
- Separation: how well-separated are clusters
  - Can be measured by the between-cluster sum of squares (m = overall mean):
    $$BSS = \sum_{i} |C_i| \, (m - m_i)^2$$
[Figure: cohesion (within a cluster) and separation (between clusters)]

Cohesion and Separation Example
- Note: BSS + SSE = constant
  - Minimizing SSE gives maximal BSS
[Figure: points on a line with overall mean m and cluster centroids $m_1$, $m_2$; SSE, BSS, and their total compared for K = 1 cluster and K = 2 clusters]

Silhouette Coefficient
- Combines ideas of both cohesion and separation
- For an individual object i
  - Calculate $a_i$ = average distance of i to the objects in its cluster
  - Calculate $b_i$ = average distance of i to the objects in another cluster C, choosing the C that minimizes $b_i$
  - Silhouette coefficient of i = $(b_i - a_i) / \max\{a_i, b_i\}$
    - Range: [-1,1], but typically between 0 and 1
    - The closer to 1, the better
- Can calculate the average silhouette width over all objects

Final Comment on Cluster Validity
"The validation of clustering structures is the most difficult and frustrating part of cluster analysis. Without a strong effort in this direction, cluster analysis will remain a black art accessible only to those true believers who have experience and great courage."
Algorithms for Clustering Data, Jain and Dubes
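The cohesion, separation, and silhouette measures defined earlier on this page can be checked numerically. The sketch below uses hypothetical one-dimensional data (the points 1, 2, 4, 5) chosen for illustration; the function names are likewise assumptions. It shows that SSE + BSS stays constant when going from one cluster to two, and computes the silhouette coefficient of a single object.

```python
def mean(values):
    return sum(values) / len(values)


def sse_and_bss(clusters):
    """Within-cluster SSE and between-cluster sum of squares for 1-dim clusters."""
    overall_mean = mean([x for c in clusters for x in c])
    sse = sum((x - mean(c)) ** 2 for c in clusters for x in c)
    bss = sum(len(c) * (overall_mean - mean(c)) ** 2 for c in clusters)
    return sse, bss


def silhouette(x, own_others, other_clusters):
    """Silhouette coefficient of object x.

    own_others: the other members of x's cluster (x excluded, must be non-empty).
    other_clusters: list of the remaining clusters.
    """
    a = mean([abs(x - y) for y in own_others])
    b = min(mean([abs(x - y) for y in c]) for c in other_clusters)
    return (b - a) / max(a, b)


one_cluster = [[1.0, 2.0, 4.0, 5.0]]
two_clusters = [[1.0, 2.0], [4.0, 5.0]]
print(sse_and_bss(one_cluster))   # (10.0, 0.0)
print(sse_and_bss(two_clusters))  # (1.0, 9.0)

s = silhouette(1.0, own_others=[2.0], other_clusters=[[4.0, 5.0]])
print(s)  # (3.5 - 1.0) / 3.5
```

In both clusterings SSE + BSS equals 10, the total sum of squares around the overall mean, which is why lowering SSE necessarily raises BSS.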
Summary
- Cluster analysis groups objects based on their similarity (or distance) and has wide applications
- Measure of similarity (or distance) can be computed for all types of data
- Many different types of clustering algorithms
  - Discover different types of clusters
- Many measures of clustering quality, but absence of ground truth always a challenge
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationComparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool
Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between
More informationCluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009
Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationClustering: Techniques & Applications. Nguyen Sinh Hoa, Nguyen Hung Son. 15 lutego 2006 Clustering 1
Clustering: Techniques & Applications Nguyen Sinh Hoa, Nguyen Hung Son 15 lutego 2006 Clustering 1 Agenda Introduction Clustering Methods Applications: Outlier Analysis Gene clustering Summary and Conclusions
More informationAn Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
More informationClustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationData Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
More informationConcept of Cluster Analysis
RESEARCH PAPER ON CLUSTER TECHNIQUES OF DATA VARIATIONS Er. Arpit Gupta 1,Er.Ankit Gupta 2,Er. Amit Mishra 3 arpit_jp@yahoo.co.in, ank_mgcgv@yahoo.co.in,amitmishra.mtech@gmail.com Faculty Of Engineering
More informationData Clustering Techniques Qualifying Oral Examination Paper
Data Clustering Techniques Qualifying Oral Examination Paper Periklis Andritsos University of Toronto Department of Computer Science periklis@cs.toronto.edu March 11, 2002 1 Introduction During a cholera
More informationGoing Big in Data Dimensionality:
LUDWIG- MAXIMILIANS- UNIVERSITY MUNICH DEPARTMENT INSTITUTE FOR INFORMATICS DATABASE Going Big in Data Dimensionality: Challenges and Solutions for Mining High Dimensional Data Peer Kröger Lehrstuhl für
More informationClassification Techniques (1)
10 10 Overview Classification Techniques (1) Today Classification Problem Classification based on Regression Distance-based Classification (KNN) Net Lecture Decision Trees Classification using Rules Quality
More informationThere are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
More informationA comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
More informationCluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
More informationSummary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen
Summary Data Mining & Process Mining (1BM46) Made by S.P.T. Ariesen Content Data Mining part... 2 Lecture 1... 2 Lecture 2:... 4 Lecture 3... 7 Lecture 4... 9 Process mining part... 13 Lecture 5... 13
More informationData Mining 資 料 探 勘. 分 群 分 析 (Cluster Analysis)
Data Mining 資 料 探 勘 Tamkang University 分 群 分 析 (Cluster Analysis) DM MI Wed,, (:- :) (B) Min-Yuh Day 戴 敏 育 Assistant Professor 專 任 助 理 教 授 Dept. of Information Management, Tamkang University 淡 江 大 學 資
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More information2 Basic Concepts and Techniques of Cluster Analysis
The Challenges of Clustering High Dimensional Data * Michael Steinbach, Levent Ertöz, and Vipin Kumar Abstract Cluster analysis divides data into groups (clusters) for the purposes of summarization or
More informationClustering methods for Big data analysis
Clustering methods for Big data analysis Keshav Sanse, Meena Sharma Abstract Today s age is the age of data. Nowadays the data is being produced at a tremendous rate. In order to make use of this large-scale
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationPERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,
More informationHow To Cluster On A Large Data Set
An Ameliorated Partitioning Clustering Algorithm for Large Data Sets Raghavi Chouhan 1, Abhishek Chauhan 2 MTech Scholar, CSE department, NRI Institute of Information Science and Technology, Bhopal, India
More informationCross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.
More informationOutlier Detection in Clustering
Outlier Detection in Clustering Svetlana Cherednichenko 24.01.2005 University of Joensuu Department of Computer Science Master s Thesis TABLE OF CONTENTS 1. INTRODUCTION...1 1.1. BASIC DEFINITIONS... 1
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationCluster Analysis using R
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other
More informationClustering Techniques: A Brief Survey of Different Clustering Algorithms
Clustering Techniques: A Brief Survey of Different Clustering Algorithms Deepti Sisodia Technocrates Institute of Technology, Bhopal, India Lokesh Singh Technocrates Institute of Technology, Bhopal, India
More informationData Mining 5. Cluster Analysis
Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More information0.1 What is Cluster Analysis?
Cluster Analysis 1 2 0.1 What is Cluster Analysis? Cluster analysis is concerned with forming groups of similar objects based on several measurements of different kinds made on the objects. The key idea
More informationRobust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
More informationTerritorial Analysis for Ratemaking. Philip Begher, Dario Biasini, Filip Branitchev, David Graham, Erik McCracken, Rachel Rogers and Alex Takacs
Territorial Analysis for Ratemaking by Philip Begher, Dario Biasini, Filip Branitchev, David Graham, Erik McCracken, Rachel Rogers and Alex Takacs Department of Statistics and Applied Probability University
More informationHow To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationLinköpings Universitet - ITN TNM033 2011-11-30 DBSCAN. A Density-Based Spatial Clustering of Application with Noise
DBSCAN A Density-Based Spatial Clustering of Application with Noise Henrik Bäcklund (henba892), Anders Hedblom (andh893), Niklas Neijman (nikne866) 1 1. Introduction Today data is received automatically
More informationData Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationThe Role of Visualization in Effective Data Cleaning
The Role of Visualization in Effective Data Cleaning Yu Qian Dept. of Computer Science The University of Texas at Dallas Richardson, TX 75083-0688, USA qianyu@student.utdallas.edu Kang Zhang Dept. of Computer
More informationA Comparative Study of clustering algorithms Using weka tools
A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering
More information. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns
Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationOn Clustering Validation Techniques
Journal of Intelligent Information Systems, 17:2/3, 107 145, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques MARIA HALKIDI mhalk@aueb.gr YANNIS
More informationStatistical Databases and Registers with some datamining
Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University
More informationData Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
More informationCLUSTER ANALYSIS FOR SEGMENTATION
CLUSTER ANALYSIS FOR SEGMENTATION Introduction We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
More informationA Method for Decentralized Clustering in Large Multi-Agent Systems
A Method for Decentralized Clustering in Large Multi-Agent Systems Elth Ogston, Benno Overeinder, Maarten van Steen, and Frances Brazier Department of Computer Science, Vrije Universiteit Amsterdam {elth,bjo,steen,frances}@cs.vu.nl
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationClustering. Chapter 7. 7.1 Introduction to Clustering Techniques. 7.1.1 Points, Spaces, and Distances
240 Chapter 7 Clustering Clustering is the process of examining a collection of points, and grouping the points into clusters according to some distance measure. The goal is that points in the same cluster
More informationA Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases
Published in the Proceedings of 14th International Conference on Data Engineering (ICDE 98) A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases Xiaowei Xu, Martin Ester, Hans-Peter
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationA Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
More informationDynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked
More informationAuthors. Data Clustering: Algorithms and Applications
Authors Data Clustering: Algorithms and Applications 2 Contents 1 Grid-based Clustering 1 Wei Cheng, Wei Wang, and Sandra Batista 1.1 Introduction................................... 1 1.2 The Classical
More informationSolving Quadratic Equations by Graphing. Consider an equation of the form. y ax 2 bx c a 0. In an equation of the form
SECTION 11.3 Solving Quadratic Equations b Graphing 11.3 OBJECTIVES 1. Find an ais of smmetr 2. Find a verte 3. Graph a parabola 4. Solve quadratic equations b graphing 5. Solve an application involving
More informationSPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
More informationA Comparative Analysis of Various Clustering Techniques used for Very Large Datasets
A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets Preeti Baser, Assistant Professor, SJPIBMCA, Gandhinagar, Gujarat, India 382 007 Research Scholar, R. K. University,
More informationForschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
More informationDownloaded from www.heinemann.co.uk/ib. equations. 2.4 The reciprocal function x 1 x
Functions and equations Assessment statements. Concept of function f : f (); domain, range, image (value). Composite functions (f g); identit function. Inverse function f.. The graph of a function; its
More informationSoSe 2014: M-TANI: Big Data Analytics
SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationUSING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva
382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent
More informationEM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationUnsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More information