CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS
Venkat Venkateswaran
Department of Engineering and Science
Rensselaer Polytechnic Institute
275 Windsor Street
Hartford, CT USA
(+1) [email protected]

John Maleyeff
Lally School of Management & Technology
Rensselaer Polytechnic Institute
275 Windsor Street
Hartford, CT USA
(+1) [email protected]

ABSTRACT

A new classification approach is explored where service systems are grouped by dimensions of performance important to customers. Service systems are coded as binary vectors and Ward's Algorithm is used to group these systems into eight clusters, using the simple matching metric to measure distances between vectors. The resulting clusters were analyzed. Across the clusters, similar types of customers (i.e., internal vs. external) and similar process characteristics were evident. Hence, this clustering approach generates sets of services that differ from classifications based purely on process characteristics. The implications of this result for leaders of service innovation efforts are discussed.

KEYWORDS: Cluster analysis, Service operations, Service marketing, Innovation

INTRODUCTION

Innovation is often accomplished by adapting ideas, processes, and techniques successful in one situation to solve problems or make improvements in a seemingly unrelated situation. With respect to service innovation, the challenge would be to identify hidden patterns that exist within multiple services, even those that appear unrelated. For example, emergency room trauma center teams of physicians, nurses, and technicians learned to effectively and quickly treat patients by incorporating methods used by pit crews at automobile racing events [1].

Preliminary results of an ongoing research project are presented below. This research attempts to create sets of services that would be deemed similar because, within each set, customers have similar needs.
For example, both trauma center patients and automobile racers need fast and expert service, with little need for other dimensions of performance that customers of other services would consider important. Once the sets are created, their characteristics are explored to determine
whether or not leaders or innovation teams would gain a better understanding of how innovation could be achieved.

BACKGROUND

Prior work in classifying service systems is plentiful, most of it contained in the service marketing literature. For the sake of brevity, only a brief background is presented here. Numerous attempts have been made to classify services in an effort to provide some understanding of the special challenges faced by service managers. A popular scheme separates services into four types: the service factory, the service shop, the mass service, and the professional service [2]. However, it is not clear that the classification schemes offered in the past will be helpful to managers who wish to manage or improve customer satisfaction, because they tend to be based on the structure of the service rather than on the needs or wants of customers. For example, Verma showed that only 4 of 22 important management challenges are affected by the differences in this classification scheme [2]. The research presented below uses a mathematical approach to cluster services based on the dimensions of performance deemed important by customers of each service.

METHODOLOGY

The approach to classifying services based on performance dimensions uses a binary vector clustering algorithm. The work began with the creation of a data set consisting of 168 services. Each service was analyzed by a professional employee of the organization who was very familiar with the activities associated with the delivery of the service and had access to customers of the service. The services were not selected at random, but they did form a cross section of various service types, albeit biased towards services within technologically sophisticated organizations. A mixture of customer types existed within the database. Many of the services were primarily for internal customers, many served external customers exclusively, and some served both internal and external customers.
No single analyst studied more than one service. All of the analysts were working professionals enrolled in a part-time graduate management program on the Hartford, Connecticut campus of Rensselaer Polytechnic Institute, in a course called Service Operations Management. Each analyst asked several customers of the service to list strengths and weaknesses of the process and to list the key performance dimensions important to customers. The resulting reports followed a standard template that allowed for easy tabulation of key results. To ensure quality and consistency in the data, the authors studied the data generated from each report and at times modified the resulting list of performance dimensions.

The resulting database that was input to the clustering algorithm consisted of 168 records and 9 fields, one field for each of 9 potential dimensions of performance. A binary code was used to signify whether or not the dimension was important to customers (1=important, 0=unimportant). The following dimensions were specified:

(1) empathy (e.g., courtesy, professionalism);
(2) knowledge (of service providers);
(3) communication (providers with customers);
(4) speed (e.g., responsiveness, turnaround time);
(5) usefulness (e.g., comprehension, completeness, flexibility);
(6) quality (e.g., accuracy, consistency);
(7) tangible (a physical good);
(8) convenience (e.g., availability, ease); and
(9) security (e.g., information, financial, personal).

Clustering Algorithm

The problem then becomes one of clustering 168 binary vectors (one per service) into groups containing like vectors. To do this, a metric is needed to gauge the closeness of each pair of binary vectors. Several different distance metrics have been proposed in the literature. We have used the simple matching metric. This metric is described as follows: given two binary vectors V₁ and V₂ of length L (here L = 9), let B denote the number of digits where V₁ and V₂ agree. The intervector distance is D(V₁,V₂) = 1 - B/L. We note that 0 ≤ D(V₁,V₂) ≤ 1 and that D(V₁,V₂) is 0 when V₁ = V₂ and 1 when V₁ and V₂ are complements.

Ward's Algorithm is a well-known and widely used algorithm for grouping binary vectors into clusters. We have used the version of this algorithm wherein the user specifies the target number of clusters. The algorithm is agglomerative and begins by placing each vector in its own separate cluster. Thus, in the present case, the method began with 168 clusters. Then, clusters are successively merged in a systematic way until the requisite number of clusters is obtained.

We next describe how clusters are selected for merging. The algorithm computes a medoid for every cluster. This is a member of the cluster (not necessarily unique) that has the smallest sum of distances (based on the simple matching metric) to other members [3]. Thus, a medoid is the binary-vector analogue of the familiar centroid of a cluster of points on a plane. However, a medoid (unlike a centroid) is necessarily a member of the cluster. Next, to determine which pair of clusters to merge, the algorithm considers all pairs for merging and selects the pair with least variance (calculated as the sum of squares of the distances averaged over the number of members in this tentative cluster).
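The coding, distance, medoid, and merge criterion described so far can be sketched in a few lines of Python. This is a minimal illustration of the description above and not the authors' implementation: the function names, the order of the dimension list, the tie-breaking rule (first candidate found wins), and the brute-force search over all cluster pairs are assumptions made for the example.

```python
from itertools import combinations

# Dimension order follows the list in the text; names are illustrative.
DIMENSIONS = ["empathy", "knowledge", "communication", "speed", "usefulness",
              "quality", "tangible", "convenience", "security"]


def encode(important):
    """Code a service as a binary tuple: 1 = important, 0 = unimportant."""
    return tuple(1 if d in important else 0 for d in DIMENSIONS)


def distance(v1, v2):
    """Simple matching distance D = 1 - B/L (B agreements over L digits)."""
    agreements = sum(a == b for a, b in zip(v1, v2))
    return 1 - agreements / len(v1)


def medoid(cluster):
    """Member with the smallest sum of distances to the other members.

    Ties are broken by taking the first such member (an assumption).
    """
    return min(cluster, key=lambda m: sum(distance(m, o) for o in cluster))


def merge_cost(c1, c2):
    """Variance of the tentative merged cluster.

    Computed as the sum of squared distances from the medoid to every
    member, averaged over the number of members.
    """
    merged = c1 + c2
    center = medoid(merged)
    return sum(distance(center, m) ** 2 for m in merged) / len(merged)


def cluster(vectors, target):
    """Start from singleton clusters and merge until `target` remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > target:
        # Try every pair and keep the one whose merge has least variance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

For instance, `cluster([encode({"speed", "quality"}), encode({"empathy"}), encode({"speed", "quality"})], 2)` places the two identical hypothetical services in one cluster, since merging them has zero variance. The exhaustive pair search recomputes every medoid at each step, which is acceptable for a data set of a few hundred short vectors but would need caching for larger inputs.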
Any two clusters under consideration are temporarily merged, a medoid is determined, and the sum of squares of distances from this medoid to all members is computed. At each stage, in selecting pairs for merging with minimum variance, the algorithm seeks to merge clusters so that the resulting clusters are round (i.e., they have members that tend to be equally distant from the medoid of that cluster). The algorithm terminates when the requisite number of clusters has been generated.

RESULTS

After some trial and error, the target number of clusters was specified to be 8. This level of discrimination was chosen because, with fewer clusters, individual clusters appeared to contain dissimilar services, and more clusters would provide a less than useful classification scheme. It is important to note that the clusters generated by Ward's Algorithm are known to be fairly immune to the ordering of the input data. The authors verified this characteristic by running the algorithm using a number of reordered data sets. The numbers of services within each cluster group (numbered 1-8 in the tables that follow) were 13, 16, 16, 33, 32, 32, 11, and 12, respectively. Table 1 provides a summary of the 8 clusters by showing, for each cluster, the percentage of services that indicated each potential dimension as important to that service. In the table, the dimensions are abbreviated (Emp is empathy, Knw is
knowledge, Cmc is communication, Spd is speed, Use is usefulness, Qua is quality, Tan is tangibles, Cnv is convenience, and Sec is security). For example, the first row shows that, for the 13 services included within Cluster #1, each had empathy as an important dimension, 11 of the 13 services (85%) had knowledge as an important dimension, none of the 13 services had communication as an important dimension, etc.

Table 1: Clusters and Associated Dimensions

Cluster   Emp    Knw    Cmc    Spd    Use    Qua    Tan    Cnv    Sec
1         100%   85%    0%     100%   69%    92%    8%     100%   8%
2         19%    0%     0%     88%    0%     94%    13%    94%    0%
3         81%    100%   100%   100%   31%    88%    13%    88%    0%
4         6%     21%    70%    94%    88%    94%    6%     100%   6%
5         13%    34%    0%     91%    84%    94%    25%    0%     0%
6         16%    53%    100%   97%    84%    100%   22%    0%     3%
7         9%     45%    9%     100%   100%   73%    100%   100%   0%
8         75%    67%    8%     100%   0%     100%   58%    8%     0%

To explore the usefulness of the resulting classification scheme, and to compare this scheme to a scheme based on process characteristics alone, a number of statistical analyses were performed. Perhaps the most important of these analyses compared the clusters with another classification scheme that was based on process characteristics, rather than customer dimensions. Details on this scheme may be obtained from the authors. In Table 2, the process-oriented classifications are abbreviated (A=analysis, C=consultation, E=evaluation, G=gathering, P=planning, and T=troubleshooting). For example, of the 13 services contained in Cluster #1, one service was classified as an analysis process, 4 services were classified as a consultation process, one service was classified as an evaluation process, etc. As implied by the diversity of process types within each cluster and supported by a chi-square statistical analysis, no relationship was evident between these two classification schemes (p=0.235). An example of two similar processes that were assigned to different clusters will help to explain this result. Both processes involved the testing of material.
The algorithm assigned one testing process to cluster 4 and a second testing process to cluster 5. Both testing services included quality, speed, and usefulness as important dimensions, but the service classified in cluster 4 also listed convenience and communication. Therefore, the material testing service assigned to cluster 4 had customers who expected more interaction with the service provider than did the material testing service assigned to cluster 5.

Table 2: Clusters and Associated Service Process Classification

Cluster   A   C   E   G   P   T
Table 3 shows the fraction of services in each cluster whose customers were primarily internal or primarily external, and the average number of functions through which the service flowed in each cluster. For example, 53.8% (7 of the 13) of the services in Cluster #1 served primarily internal customers and 46.2% (6 of the 13) of the services in Cluster #1 served primarily external customers. In some clusters, some services served internal and external customers in about equal measure. In these cases, the internal and external fractions will not add to one. Also, in Cluster #1, an average of 5.2 departments or functions took part in delivering the service. An analysis of variance concluded that the number of functions did not vary across clusters (p=0.164). A chi-square analysis showed that the prevalence of internal or external customers did not vary across clusters at a 5% level of significance (p=0.093). Significance at a 10% level for this test may indicate that a relationship exists between cluster membership and the prevalence of internal customers, although it is weak in magnitude.

Table 3: Clusters and Characteristics

Cluster   Internal   External   Functions

IMPLICATIONS

The main result of this exploratory investigation is that a difference exists between a classification scheme based on process characteristics and a scheme based on customer preferences. This result has implications for leaders of service improvement or service innovation teams. It also supports an earlier conclusion by Maleyeff [5], who argued that, based on characteristics unique to service systems, improvement efforts should start by focusing on the information being provided to customers of internal services rather than the physical manifestation of that information.
For example, he suggests that rather than focus an improvement project on speeding up the flow of a payment invoice, project teams should first ensure that the information contained on the invoice is useful, clearly printed, unambiguous in meaning, and accurate. A secondary implication could be stated as a word of caution to leaders who may base the improvement or innovation of services exclusively on process improvements. Many service improvement methodologies, such as those contained in the Lean Six Sigma toolbox [6], are process-based, such as mistake proofing, process standardization, or visual workflow control. For example, it would appear that a dimension such as empathy may be ignored by these project teams. In the case of the emergency room trauma team learning from pit crews, perhaps the innovation was successful because the customers' needs have similar dimensions (e.g., speed and competency).

FUTURE WORK

This research has some limitations. Because binary data is much less powerful than continuous data, perhaps a similar analysis that incorporated dimensions measured on a continuous scale should be undertaken. The precision and reliability of the data used here can also be questioned, due to the multiple analysts and the potential for mischaracterization of customer preferences. This limitation can easily be overcome in future analyses. Future research could also investigate whether other well-known metrics (besides the simple matching metric) would generate clusters similar to the clusters obtained above. Finally, a more thorough analysis of the best number of clusters may prove useful.

REFERENCES

[1] Nicholson, Kieran, Hospital teams find vroom to improve by changing race-car tires. Denver Post, April 16, 2004, p. B1.
[2] Verma, Rohit, An empirical analysis of management challenges in service factories, service shops, mass services and professional services. International Journal of Service Industry Management, 2000, 11(1),
[3] Guralnik, V. and Karypis, G., "A Scalable Algorithm for Clustering Protein Sequences." in Workshop on Data Mining in Bioinformatics, 2001,
[4] Luke, Brian T., Agglomerative Linkages.
[5] Maleyeff, John, Exploration of Internal Service Systems using Lean Principles. Management Decision, 2006, 44(5),
[6] Maleyeff, John, Improving Service Delivery in Government Using Lean Six Sigma. IBM Center for The Business of Government, Washington, DC, 2007.
K-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science
Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Procedia Engineering Engineering 00 (2011) 15 (2011) 000 000 1822 1826 Procedia Engineering www.elsevier.com/locate/procedia
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
SECTION 1-6 Quadratic Equations and Applications
58 Equations and Inequalities Supply the reasons in the proofs for the theorems stated in Problems 65 and 66. 65. Theorem: The complex numbers are commutative under addition. Proof: Let a bi and c di be
Cluster Analysis using R
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other
CLUSTERING FOR FORENSIC ANALYSIS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
Feature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao ABSTRACT Department of Computer Engineering, Fr.
Time series clustering and the analysis of film style
Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such
KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
Marketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
How To Identify Noisy Variables In A Cluster
Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,
A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities
A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities The first article of this series presented the capability model for business analytics that is illustrated in Figure One.
Microsoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
Lean Certification Program Blended Learning Program Cost: $5500. Course Description
Lean Certification Program Blended Learning Program Cost: $5500 Course Description Lean Certification Program is a disciplined process improvement approach focused on reducing waste, increasing customer
Mathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
A comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
Offline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: [email protected] 2 IBM India Research Lab, New Delhi. email: [email protected]
Analysis of JOLTS Research Estimates by Size of Firm
Analysis of JOLTS Research Estimates by Size of Firm Katherine Bauer Klemmer 1 1 U.S. Bureau of Labor Statistics, 2 Massachusetts Ave. NE, Washington DC 2212 Abstract The Job Openings and Labor Turnover
