Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel
Institut für Statistik, LMU München. Sommersemester (summer semester) 2013.
Outline
1 Setting the scene
2 Methods for fuzzy clustering
3 The assessment of fuzzy clustering
4 Software for fuzzy clustering
5 Example
6 Literature
1 Setting the scene
The concept of a membership function
In fuzzy clustering, also referred to as soft clustering, objects are not assigned to a single cluster. Instead, each object possesses a membership function indicating its strength of membership in all or some of the clusters. In most other clustering techniques, strength of membership is either zero or one: an object is either in a cluster or not (hard clustering). A possible exception is the finite mixture approach, where strength of membership may be taken as the posterior probability of belonging to a cluster.
The concept of a membership function / 2
In fuzzy clustering jargon, methods where strength of membership is zero or one are known as crisp methods. Fuzzy clustering has the advantage over crisp methods that the memberships of any given object indicate whether there is a second-best cluster that is almost as good as the best cluster. The concept of a membership function derives from fuzzy logic. The term fuzzy logic was introduced with the 1965 proposal of fuzzy set theory by Lotfi A. Zadeh.
Fuzzy clustering and fuzzy logic
Fuzzy logic is an extension of Boolean logic in which the binary notions of true and false are replaced by degrees of partial truth. Let a 100 ml glass contain 30 ml of water and consider two concepts: empty and full. One person might define the glass as being 0.7 empty and 0.3 full. Another might design a membership function under which the glass is considered full for all values down to 50 ml. The connection between fuzzy cluster analysis and fuzzy logic is usually only through the application of membership functions, and not through the more comprehensive theory.
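The glass example can be sketched as two alternative membership functions for the concept "full". This is an illustration only: the function names are hypothetical, and the behaviour of the second function below 50 ml is an assumption, since the slide does not specify it.

```python
def full_linear(volume_ml, capacity_ml=100.0):
    """First person's view: 'full' grows linearly with the fill level."""
    return min(1.0, max(0.0, volume_ml / capacity_ml))


def full_threshold(volume_ml, threshold_ml=50.0):
    """Second person's view: completely 'full' at or above the threshold.

    The slide does not say what happens below 50 ml; returning 0 there
    is an assumption made only for this sketch.
    """
    return 1.0 if volume_ml >= threshold_ml else 0.0


def empty_from_full(full_degree):
    """Membership in 'empty', taken here as the complement of 'full'."""
    return 1.0 - full_degree


if __name__ == "__main__":
    volume = 30.0  # the 100 ml glass holding 30 ml of water
    for name, membership in (("linear", full_linear),
                             ("threshold", full_threshold)):
        full = membership(volume)
        print(name, "full:", full, "empty:", empty_from_full(full))
```

Both functions describe the same glass, which illustrates that a membership function is a modelling choice rather than a property of the data.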
Example of a membership function
[Figure: a fuzzy membership function for the verbal description of intelligence quotient (IQ).]
2 Methods for fuzzy clustering
Fuzzy partitioning versus hard partitioning
Given a data matrix $X \in \mathbb{R}^{n \times p}$ of $n$ observations measured on $p$ variables, a partition (membership) matrix $U = (u_{ik}) \in \mathbb{R}^{n \times K}$ is sought such that
1. $u_{ik} \in [0, 1]$ $(i = 1, \dots, n;\ k = 1, \dots, K)$;
2. $\sum_{k=1}^{K} u_{ik} = 1$ $(i = 1, \dots, n)$;
3. $0 < \sum_{i=1}^{n} u_{ik} < n$ $(k = 1, \dots, K)$.
In hard partitioning, the first condition above would be replaced by $u_{ik} \in \{0, 1\}$ $(i = 1, \dots, n;\ k = 1, \dots, K)$.
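The three conditions can be checked mechanically for a candidate membership matrix. The following is a minimal sketch in plain Python; the function names and the numerical tolerance `tol` are assumptions made for illustration, and `U` is taken to be a list of n rows with K entries each.

```python
def is_fuzzy_partition(U, tol=1e-9):
    """Check the three fuzzy-partition conditions for an n x K matrix U."""
    n, K = len(U), len(U[0])
    # Condition 1: every membership lies in [0, 1].
    if any(u < 0.0 or u > 1.0 for row in U for u in row):
        return False
    # Condition 2: the memberships of each object sum to one.
    if any(abs(sum(row) - 1.0) > tol for row in U):
        return False
    # Condition 3: no cluster is empty and none absorbs all the membership.
    for k in range(K):
        column_sum = sum(row[k] for row in U)
        if not (0.0 < column_sum < n):
            return False
    return True


def is_hard_partition(U, tol=1e-9):
    """Hard partitioning: as above, but memberships restricted to {0, 1}."""
    only_zero_or_one = all(u in (0.0, 1.0) for row in U for u in row)
    return only_zero_or_one and is_fuzzy_partition(U, tol)


if __name__ == "__main__":
    print(is_fuzzy_partition([[0.7, 0.3], [0.2, 0.8]]))  # fuzzy, valid
    print(is_hard_partition([[1.0, 0.0], [0.0, 1.0]]))   # crisp, valid
    print(is_fuzzy_partition([[1.0, 0.0], [1.0, 0.0]]))  # second cluster empty
```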
Fuzzy k-means (fuzzy c-means)
The fuzzy k-means algorithm minimizes
\[ \sum_{k=1}^{K} \sum_{i=1}^{n} u_{ik}^{\nu} \, d^2(x_i, m_k), \]
where $m_k$ is the centre of cluster $k$, $d(x_i, m_k)$ is the Euclidean distance between data point $x_i$ and cluster centre $m_k$, and $\nu$ is the fuzzifier that affects the membership distribution. The cluster centres are weighted cluster means:
\[ m_k = \frac{\sum_{i=1}^{n} u_{ik}^{\nu} x_i}{\sum_{i=1}^{n} u_{ik}^{\nu}}. \]
If the $u_{ik}$ are restricted to zero or one, the k-means method is obtained.
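A minimal sketch of the alternating scheme usually used to minimize this objective is given below. The centre update is the weighted mean from the slide. The membership update, $u_{ik} \propto d^2(x_i, m_k)^{-1/(\nu - 1)}$ normalized over $k$, is the standard fuzzy c-means rule and is not stated on the slide. The function name, the random initialization, the stopping rule and the defaults are assumptions for illustration; the sketch requires $\nu > 1$.

```python
import random


def fuzzy_c_means(X, K, nu=2.0, n_iter=200, tol=1e-6, seed=0):
    """Alternate centre and membership updates; returns (U, centres)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to one.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(K)]
        total = sum(row)
        U.append([u / total for u in row])
    centres = []
    for _ in range(n_iter):
        # Centres: means of the data weighted by u_ik ** nu.
        centres = []
        for k in range(K):
            w = [U[i][k] ** nu for i in range(n)]
            w_sum = sum(w)
            centres.append([sum(w[i] * X[i][j] for i in range(n)) / w_sum
                            for j in range(p)])
        # Squared Euclidean distances from every point to every centre.
        D2 = [[sum((X[i][j] - centres[k][j]) ** 2 for j in range(p))
               for k in range(K)] for i in range(n)]
        # Memberships: standard update (not given on the slide).
        new_U = []
        for i in range(n):
            on_centre = [k for k in range(K) if D2[i][k] == 0.0]
            if on_centre:
                # A point lying exactly on a centre belongs fully to it.
                row = [1.0 / len(on_centre) if k in on_centre else 0.0
                       for k in range(K)]
            else:
                inv = [D2[i][k] ** (-1.0 / (nu - 1.0)) for k in range(K)]
                inv_sum = sum(inv)
                row = [v / inv_sum for v in inv]
            new_U.append(row)
        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(n) for k in range(K))
        U = new_U
        if change < tol:
            break
    return U, centres


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
         [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    U, centres = fuzzy_c_means(X, K=2)
    for point, row in zip(X, U):
        print(point, [round(u, 3) for u in row])
```

The returned centres are those computed in the last centre update, so they correspond to the memberships of the preceding iteration; after convergence the difference is negligible.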
FANNY (Fuzzy analysis)
Given a proximity matrix $D = (d_{ij}) \in \mathbb{R}^{n \times n}$ and number of clusters $K$, the unknown membership strengths $u_{ik}$ are found by minimizing the objective function
\[ \sum_{k=1}^{K} \frac{\sum_{i,j=1}^{n} u_{ik}^{\nu} u_{jk}^{\nu} d_{ij}}{2 \sum_{l=1}^{n} u_{lk}^{\nu}}, \]
where $\nu$ is the membership exponent. The objective function is minimized subject to the nonnegativity and unit sum restrictions by using an iterative algorithm. FANNY only requires a proximity matrix and is relatively robust to nonspherical clusters.
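To make the objective concrete, the sketch below only evaluates it for a given membership matrix and dissimilarity matrix; it does not implement the iterative minimization. The function name is hypothetical, and the code assumes that no cluster has zero total membership, as guaranteed by the third partition condition.

```python
def fanny_objective(U, D, nu=2.0):
    """Evaluate the FANNY objective for memberships U (n x K) and
    dissimilarities D (n x n). Assumes every cluster has some membership."""
    n, K = len(U), len(U[0])
    total = 0.0
    for k in range(K):
        w = [U[i][k] ** nu for i in range(n)]
        numerator = sum(w[i] * w[j] * D[i][j]
                        for i in range(n) for j in range(n))
        denominator = 2.0 * sum(w)
        total += numerator / denominator
    return total


if __name__ == "__main__":
    points = [0.0, 1.0, 10.0, 11.0]
    D = [[abs(x - y) for y in points] for x in points]
    good = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    bad = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    print(fanny_objective(good, D), fanny_objective(bad, D))
```

In this toy example the partition that groups neighbouring points yields a smaller objective value than the one that mixes distant points, which is the behaviour the minimization exploits.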
3 The assessment of fuzzy clustering
The silhouette plot
Assume the data have been clustered via any technique. For each datum $i$ $(i = 1, \dots, n)$, let $a(i)$ be the average dissimilarity of $i$ to all other data within the same cluster. Then find the average dissimilarity of $i$ to the data of another single cluster, and repeat this for every cluster of which $i$ is not a member. Denote the lowest such average dissimilarity by $b(i)$. The cluster attaining this minimum is the neighbouring cluster of $i$: apart from the cluster to which $i$ is assigned, it is the cluster in which $i$ fits best.
The silhouette plot / 2
Define
\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \]
which can be written as
\[ s(i) = \begin{cases} 1 - a(i)/b(i) & \text{if } a(i) < b(i), \\ 0 & \text{if } a(i) = b(i), \\ b(i)/a(i) - 1 & \text{if } a(i) > b(i). \end{cases} \]
It holds that $-1 \le s(i) \le 1$ for all $i = 1, \dots, n$. An $s(i)$ close to 1 means that the datum is appropriately clustered. If $s(i)$ is close to $-1$, it would be more appropriate to place the datum in its neighbouring cluster. When the index is near zero, it is not clear whether the object should have been assigned to its current cluster or to its neighbouring cluster.
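The definition of $a(i)$, $b(i)$ and $s(i)$ translates directly into code. The sketch below works on a dissimilarity matrix and crisp cluster labels and assumes at least two clusters. Setting $s(i) = 0$ for an object that is alone in its cluster is a common convention that the slides do not state. The function name is hypothetical.

```python
def silhouette_widths(D, labels):
    """Return s(i) for every object, given dissimilarities D (n x n)
    and one crisp cluster label per object (at least two clusters)."""
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            # Singleton cluster: s(i) = 0 by convention (an assumption here).
            widths.append(0.0)
            continue
        # a(i): average dissimilarity to the other members of i's cluster.
        a = sum(D[i][j] for j in own) / len(own)
        # b(i): lowest average dissimilarity to any other cluster.
        b = min(
            sum(D[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        largest = max(a, b)
        widths.append(0.0 if largest == 0.0 else (b - a) / largest)
    return widths


if __name__ == "__main__":
    points = [0.0, 1.0, 10.0, 11.0]
    D = [[abs(x - y) for y in points] for x in points]
    s = silhouette_widths(D, [0, 0, 1, 1])
    print([round(v, 3) for v in s])
    print("average silhouette width:", round(sum(s) / len(s), 3))
```

For a fuzzy result, one option is to assign each object to the cluster in which it has the highest membership and compute the silhouette widths for that crisp assignment.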
Silhouette plot of Iris data clusters
[Figure: silhouette plot of the Iris data clusters, listing for each cluster $C_j$ its size $n_j$ and its average silhouette width; horizontal axis: silhouette width $s_i$. Average silhouette width: 0.57.]
Dunn's partition coefficient
Dunn's partition coefficient (Dunn, 1974) is a criterion for assessing the strength of membership. It can be computed as
\[ D_K = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}^{2}, \qquad D_K \in [1/K, 1]. \]
When normalized to lie in the range $[0, 1]$, it has the form
\[ \frac{K D_K - 1}{K - 1}. \]
Dunn's coefficient and silhouette plots give information to allow a number of clusters to be chosen so that a balance can be struck in the degree of fuzziness in different clusters.
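Both forms of the coefficient are short to compute from a membership matrix. The sketch below uses a hypothetical function name and assumes $K \ge 2$. A crisp partition gives $D_K = 1$, and a completely fuzzy one, with all memberships equal to $1/K$, gives $D_K = 1/K$.

```python
def dunn_coefficient(U):
    """Return (D_K, normalized D_K) for an n x K membership matrix, K >= 2."""
    n, K = len(U), len(U[0])
    d_k = sum(u * u for row in U for u in row) / n
    normalized = (K * d_k - 1.0) / (K - 1.0)
    return d_k, normalized


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0]]
    fuzzy = [[0.5, 0.5], [0.5, 0.5]]
    print(dunn_coefficient(crisp))
    print(dunn_coefficient(fuzzy))
```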
4 Software for fuzzy clustering
User-written packages for the open-source software R
cluster: Cluster Analysis (extended original from Peter J. Rousseeuw, Anja Struyf and Mia Hubert), written by Martin Maechler, Anja Struyf and Mia Hubert. This is a recommended package that ships with the standard R distribution. Function fanny().
e1071: Miscellaneous functions of the Department of Statistics (e1071), TU Wien, written by David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch. Function cmeans().
5 Example (R code: fcim-examples-03fuzzy.r)
6 Literature
Book
Kaufman, L. and Rousseeuw, P. J. (2005): Finding Groups in Data: An Introduction to Cluster Analysis, Wiley. Chapter 4: Fuzzy Analysis (Program FANNY).
More informationTorgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances
Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationComparison of Heterogeneous Probability Models for Ranking Data
Comparison of Heterogeneous Probability Models for Ranking Data Master Thesis Leiden University Mathematics Specialisation: Statistical Science Pieter Marcus Thesis Supervisors: Prof. dr. W. J. Heiser
More informationMath Review. for the Quantitative Reasoning Measure of the GRE revised General Test
Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important
More informationPackage MixGHD. June 26, 2015
Type Package Package MixGHD June 26, 2015 Title Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions Version 1.7 Date 2015-6-15 Author
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationVisualizing non-hierarchical and hierarchical cluster analyses with clustergrams
Visualizing non-hierarchical and hierarchical cluster analyses with clustergrams Matthias Schonlau RAND 7 Main Street Santa Monica, CA 947 USA Summary In hierarchical cluster analysis dendrogram graphs
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationAn analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
More information