Cluster Analysis
0.1 What is Cluster Analysis?

Cluster analysis is concerned with forming groups of similar objects based on several measurements of different kinds made on the objects. The key idea is to identify classifications of the objects that are useful for the aims of the analysis. This idea has been applied in many areas, including astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology. For example, the biological sciences have made extensive use of classes and sub-classes to organize species. A spectacular success of the clustering idea in chemistry was Mendeleev's periodic table of the elements.

In marketing and political forecasting, clustering of neighborhoods using US postal ZIP codes has been used successfully to group neighborhoods by lifestyle. Claritas, a company that pioneered this approach, grouped neighborhoods into 40 clusters using various measures of consumer expenditure and demographics. Examining the clusters enabled Claritas to come up with evocative names, such as "Bohemian Mix," "Furs and Station Wagons," and "Money and Brains," for the groups that captured the dominant lifestyles in the neighborhoods. Knowledge of lifestyles can be used to estimate the potential demand for products (such as sport utility vehicles) and services (such as pleasure cruises).

The objective of this chapter is to help you understand the key ideas underlying the most commonly used techniques for cluster analysis and to appreciate their strengths and weaknesses. We cannot aspire to be comprehensive, as there are literally hundreds of methods (there is even a journal dedicated to clustering ideas: The Journal of Classification!).

Typically, the basic data used to form clusters is a table of measurements on several variables, where each column represents a variable and each row represents an object, often referred to in statistics as a case. The set of rows is to be grouped so that similar cases are in the same group. The number of groups may be specified in advance or may have to be determined from the data.

0.2 Example 1: Public Utilities Data

Table 1 below gives corporate data on 22 US public utilities. We are interested in forming groups of similar utilities. The objects to be clustered are the utilities, and there are 8 measurements on each utility, described in Table 2. An example where clustering would be useful is a study to predict the cost impact of deregulation. To do the requisite analysis, economists would need to build a detailed cost model of the various utilities.
It would save a considerable amount of time and effort if we could cluster similar types of utilities, build detailed cost models for just one typical utility in each cluster, and then scale up from these models to estimate results for all utilities.

Before we can use any technique for clustering, we need to define a measure of distance between utilities so that similar utilities are a short distance apart and dissimilar ones are far from each other. A popular distance measure for variables that take on continuous values is to standardize the values by dividing by the standard deviation (sometimes other measures, such as the range, are used) and then to compute the distance between objects using the Euclidean metric. The Euclidean distance d_{ij} between two cases i and j, with variable values (x_{i1}, x_{i2}, ..., x_{ip}) and (x_{j1}, x_{j2}, ..., x_{jp}), is defined by:

    d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}

All the variables in this example are continuous, so we compute distances using this metric. The result of the calculations is given in Table 3 below. If we felt that some variables should be given more importance than others, we would modify the squared difference terms by multiplying them by weights (positive numbers adding up to one), using larger weights for the more important variables. The weighted Euclidean distance measure is given by:

    d_{ij} = \sqrt{w_1 (x_{i1} - x_{j1})^2 + w_2 (x_{i2} - x_{j2})^2 + \cdots + w_p (x_{ip} - x_{jp})^2}

where w_1, w_2, ..., w_p are the weights for variables 1, 2, ..., p, with w_m >= 0 and \sum_{m=1}^{p} w_m = 1.
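As an illustration of these calculations, the following minimal Python sketch standardizes each variable by its standard deviation and computes both the ordinary and the weighted Euclidean distance matrices with NumPy and SciPy. The file name utilities.csv, the column names X1-X8, and the equal weights are assumptions made for illustration, not part of the data description.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical input file: one row per utility, columns X1..X8 as in Tables 1-2.
df = pd.read_csv("utilities.csv")          # assumed file name
X = df[[f"X{k}" for k in range(1, 9)]].to_numpy(dtype=float)

# Standardize each variable by dividing by its standard deviation.
X_std = X / X.std(axis=0, ddof=1)

# Ordinary Euclidean distances between all pairs of utilities (analogue of Table 3).
D = squareform(pdist(X_std, metric="euclidean"))

# Weighted Euclidean distance: weights w_m >= 0 summing to 1. Scaling each
# standardized column by sqrt(w_m) and then using the plain Euclidean metric is
# algebraically identical to the weighted formula above.
w = np.full(X.shape[1], 1.0 / X.shape[1])  # equal weights, purely for illustration
D_weighted = squareform(pdist(X_std * np.sqrt(w), metric="euclidean"))

print(D.shape, D_weighted.shape)           # (22, 22) matrices for the 22 utilities
```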
Table 1: Public Utilities Data. Measurements on the eight variables X1 through X8 were recorded for each of the following 22 US public utilities:

 1. Arizona Public Service
 2. Boston Edison Company
 3. Central Louisiana Electric Co
 4. Commonwealth Edison Co
 5. Consolidated Edison Co. (NY)
 6. Florida Power and Light
 7. Hawaiian Electric Co
 8. Idaho Power Co
 9. Kentucky Utilities Co
10. Madison Gas & Electric Co
11. Nevada Power Co
12. New England Electric Co
13. Northern States Power Co
14. Oklahoma Gas and Electric Co
15. Pacific Gas & Electric Co
16. Puget Sound Power & Light Co
17. San Diego Gas & Electric Co
18. The Southern Co
19. Texas Utilities Co
20. Wisconsin Electric Power Co
21. United Illuminating Co
22. Virginia Electric & Power Co

Table 2: Explanation of variables.

X1: Fixed-charge covering ratio (income/debt)
X2: Rate of return on capital
X3: Cost per KW capacity in place
X4: Annual load factor
X5: Peak KWH demand growth from 1974 to 1975
X6: Sales (KWH use per year)
X7: Percent nuclear
X8: Total fuel costs (cents per KWH)
Table 3: Distances between the utilities based on standardized variable values.

0.3 Clustering Algorithms

A large number of techniques have been proposed for forming clusters from distance matrices. The most important types are hierarchical techniques, optimization techniques, and mixture models. We discuss the first two types here; mixture models are covered in a separate note that also includes their use in classification and regression.

Hierarchical Methods

There are two major types of hierarchical techniques: divisive and agglomerative. Agglomerative hierarchical techniques are the more commonly used. The idea behind this family of techniques is to start with each cluster consisting of exactly one object and then progressively agglomerate (combine) the two nearest clusters until just one cluster is left, consisting of all the objects. Nearness of clusters is based on a measure of distance between clusters.

All agglomerative methods require as input a distance measure between all the objects that are to be clustered. This measure of distance between objects is mapped into a metric for the distance between clusters (sets of objects). The only difference between the various agglomerative techniques is the way in which this inter-cluster distance metric is defined. The most popular agglomerative techniques are the following (a code sketch using these linkage rules appears after the dendrogram discussion below):

1. Nearest neighbor (also called single linkage). Here the distance between two clusters is defined as the distance between the nearest pair of objects, one object taken from each cluster. If cluster A is the set of objects A1, A2, ..., Am and cluster B is B1, B2, ..., Bn, the single linkage distance between A and B is min{distance(Ai, Bj) : i = 1, 2, ..., m; j = 1, 2, ..., n}. This method has a tendency, at an early stage, to place in the same cluster objects that are distant from each other, because a chain of intermediate objects can link them. Such clusters have elongated, sausage-like shapes when visualized as objects in space.
2. Farthest neighbor (also called complete linkage). Here the distance between two clusters is defined as the distance between the farthest pair of objects, one object taken from each cluster. If cluster A is the set of objects A1, A2, ..., Am and cluster B is B1, B2, ..., Bn, the complete linkage distance between A and B is max{distance(Ai, Bj) : i = 1, 2, ..., m; j = 1, 2, ..., n}. This method tends to produce clusters at the early stages whose objects lie within a narrow range of distances from each other. If we visualize them as objects in space, the objects in such clusters have a more spherical shape.

3. Group average (also called average linkage). Here the distance between two clusters is defined as the average distance between all possible pairs of objects, one object taken from each cluster. If cluster A is the set of objects A1, A2, ..., Am and cluster B is B1, B2, ..., Bn, the average linkage distance between A and B is (1/mn) \sum distance(Ai, Bj), the sum being taken over i = 1, 2, ..., m and j = 1, 2, ..., n.

Note that the results of the single linkage and complete linkage methods depend only on the ordering of the inter-object distances and so are invariant to monotonic transformations of those distances.

The nearest neighbor clusters for the utilities are displayed in Figure 1 below in a useful graphic format called a dendrogram. For any given number of clusters we can determine the cases in the clusters by sliding a vertical line from left to right until the number of horizontal intersections of the vertical line equals the desired number of clusters. For example, if we wanted to form 6 clusters we would find that they are: {1, 18, 14, 19, 9, 10, 13, 4, 20, 2, 12, 21, 7, 15, 22, 6}; {3}; {8, 16}; {17}; {11}; and {5}. Notice that if we wanted 5 clusters they would be the same as for 6, except that the first two clusters above would be merged into one. In general, all hierarchical methods produce clusters that are nested within each other as we decrease the number of clusters we desire.

The average linkage dendrogram is shown in Figure 2. If we want six clusters using average linkage, they are: {1, 18, 14, 19, 6, 3, 9}; {2, 22, 4, 20, 10, 13}; {12, 21, 7, 15}; {17}; {5}; {8, 16, 11}. Notice that both methods identify {5} and {17} as small ("individualistic") clusters. The clusters tend to group geographically: for example, there is a southern group, {1, 18, 14, 19, 6, 3, 9}, and an east/west seaboard group, {12, 21, 7, 15}.
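The linkage calculations and the cutting of a dendrogram into a chosen number of clusters can be reproduced with SciPy's hierarchical clustering routines. The sketch below assumes the standardized matrix X_std from the earlier snippet and a list of company names called labels; it is an illustration, not the software used to produce Figures 1 and 2.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

# Condensed vector of pairwise Euclidean distances between the standardized cases.
d = pdist(X_std, metric="euclidean")       # X_std assumed from the earlier sketch

# Agglomerative clustering with two of the linkage rules described above.
Z_single = linkage(d, method="single")     # nearest neighbor
Z_average = linkage(d, method="average")   # group average

# Cut each tree into 6 clusters (the "sliding vertical line" on the dendrogram).
six_single = fcluster(Z_single, t=6, criterion="maxclust")
six_average = fcluster(Z_average, t=6, criterion="maxclust")
print(six_single)                          # cluster label (1..6) for each utility

# Horizontal dendrogram for single linkage, labeled by company (labels is assumed).
dendrogram(Z_single, labels=labels, orientation="right")
plt.show()
```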
Figure 1: Dendrogram, single linkage.
Figure 2: Dendrogram, average linkage between groups.

Similarity Measures

Sometimes it is more natural or convenient to work with a similarity measure between cases rather than a distance, which measures dissimilarity. An example is the square of the correlation coefficient, r^2_{ij}, defined by

    r^2_{ij} = \frac{\left[\sum_{m=1}^{p} (x_{im} - \bar{x}_m)(x_{jm} - \bar{x}_m)\right]^2}{\sum_{m=1}^{p} (x_{im} - \bar{x}_m)^2 \; \sum_{m=1}^{p} (x_{jm} - \bar{x}_m)^2}

Such measures can always be converted to distance measures. In the example above we could define the distance measure d_{ij} = 1 - r^2_{ij}.
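A direct transcription of this similarity-to-distance conversion is sketched below: each variable is centered at its mean and the quantity 1 - r^2_{ij} is computed for every pair of cases. The matrix X of raw values is assumed from the earlier sketches.

```python
import numpy as np

def correlation_distance_matrix(X):
    """Distance d_ij = 1 - r^2_ij, with r^2_ij as defined in the text."""
    Xc = X - X.mean(axis=0)                     # deviations from the variable means
    cross = Xc @ Xc.T                           # numerator sums for every pair (i, j)
    ss = np.sum(Xc ** 2, axis=1)                # sum of squared deviations per case
    r_squared = cross ** 2 / np.outer(ss, ss)   # squared correlation-type similarity
    return 1.0 - r_squared

D_corr = correlation_distance_matrix(X)         # X assumed: 22 x 8 raw data matrix
print(D_corr.shape)                             # (22, 22)
```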
However, in the case of binary values of x it is more intuitively appealing to use similarity measures. Suppose we have binary values for all the x_{ij}'s and that, for individuals i and j, the p variables give the following 2 x 2 table of counts:

                    Individual j
                      0       1
    Individual i  0   a       b     a + b
                  1   c       d     c + d
                    a + c   b + d     p

The most useful similarity measures in this situation are:

- The matching coefficient, (a + d)/p.
- Jaccard's coefficient, d/(b + c + d). This coefficient ignores zero matches, which is desirable when we do not want to consider two individuals similar simply because there are a large number of characteristics that both of them lack.

When the variables are mixed, a similarity coefficient suggested by Gower is very useful. It is defined as

    s_{ij} = \frac{\sum_{m=1}^{p} w_{ijm} s_{ijm}}{\sum_{m=1}^{p} w_{ijm}}

where each weight w_{ijm} equals 1, subject to the following rules:

- w_{ijm} = 0 when the value of variable m is not known for one of the pair of individuals, or, for binary variables, when both individuals have a zero (to remove zero matches).
- For non-binary categorical variables, s_{ijm} = 0 unless the individuals are in the same category, in which case s_{ijm} = 1.
- For continuous variables, s_{ijm} = 1 - |x_{im} - x_{jm}| / (max(x_m) - min(x_m)).
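Before turning to other distance measures, the sketch below computes the matching and Jaccard coefficients directly from the 2 x 2 counts defined above, following the table's convention that d counts the variables on which both individuals have a 1. The example vectors are hypothetical.

```python
import numpy as np

def binary_similarities(xi, xj):
    """Matching and Jaccard coefficients for two 0/1 vectors of equal length p."""
    xi, xj = np.asarray(xi), np.asarray(xj)
    a = np.sum((xi == 0) & (xj == 0))   # both 0
    b = np.sum((xi == 0) & (xj == 1))   # i is 0, j is 1
    c = np.sum((xi == 1) & (xj == 0))   # i is 1, j is 0
    d = np.sum((xi == 1) & (xj == 1))   # both 1
    p = a + b + c + d
    matching = (a + d) / p
    jaccard = d / (b + c + d)           # zero matches (a) are ignored
    return matching, jaccard

# Hypothetical example vectors.
print(binary_similarities([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```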
Other Distance Measures

Useful measures of dissimilarity other than the Euclidean distance that satisfy the triangle inequality, and so qualify as distance metrics, include the following:

- Mahalanobis distance, defined by

      d_{ij} = \sqrt{(x_i - x_j)' S^{-1} (x_i - x_j)}

  where x_i and x_j are the p-dimensional vectors of variable values for cases i and j, respectively, and S is the covariance matrix for these vectors. This measure takes into account the correlation between the variables: variables that are highly correlated with other variables do not contribute as much as variables that are uncorrelated or only mildly correlated.

- Manhattan distance, defined by

      d_{ij} = \sum_{m=1}^{p} |x_{im} - x_{jm}|

- Maximum coordinate distance, defined by

      d_{ij} = \max_{m = 1, 2, \ldots, p} |x_{im} - x_{jm}|
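All three of these measures are available in SciPy's distance module. The sketch below, again assuming the raw data matrix X from the earlier snippets, computes each of them for a single pair of utilities.

```python
import numpy as np
from scipy.spatial.distance import chebyshev, cityblock, mahalanobis

# X assumed: 22 x 8 matrix of raw variable values from the earlier sketches.
S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance matrix S^-1

xi, xj = X[0], X[1]                              # e.g., utilities 1 and 2

d_mahalanobis = mahalanobis(xi, xj, S_inv)       # sqrt((xi - xj)' S^-1 (xi - xj))
d_manhattan = cityblock(xi, xj)                  # sum of absolute coordinate differences
d_max_coord = chebyshev(xi, xj)                  # maximum coordinate difference

print(d_mahalanobis, d_manhattan, d_max_coord)
```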