CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS


Venkat Venkateswaran
Department of Engineering and Science
Rensselaer Polytechnic Institute
275 Windsor Street, Hartford, CT USA (+1)

John Maleyeff
Lally School of Management & Technology
Rensselaer Polytechnic Institute
275 Windsor Street, Hartford, CT USA (+1)

ABSTRACT

A new classification approach is explored where service systems are grouped by dimensions of performance important to customers. Service systems are coded as binary vectors and Ward's Algorithm is used to group these systems into eight clusters, using the simple matching metric to measure distances between vectors. The resulting clusters were analyzed. Across the clusters, similar types of customers (i.e., internal vs. external) and similar process characteristics were evident. Hence, this clustering approach generates sets of services that differ from classifications based purely on process characteristics. The implications of this result for leaders of service innovation efforts are discussed.

KEYWORDS: Cluster analysis, Service operations, Service marketing, Innovation

INTRODUCTION

Innovation is often accomplished by adapting ideas, processes, and techniques successful in one situation to solve problems or make improvements in a seemingly unrelated situation. With respect to service innovation, the challenge would be to identify hidden patterns that exist within multiple services, even those that appear unrelated. For example, emergency room trauma center teams of physicians, nurses, and technicians learned to effectively and quickly treat patients by incorporating methods used by pit crews at automobile racing events [1].

Preliminary results of an ongoing research project are presented below. This research attempts to create sets of services that would be deemed similar because, within each set, customers have similar needs.
For example, both trauma center patients and automobile racers need fast and expert service, with little need for other dimensions of performance that customers of other services would consider important. Once the sets are created, their characteristics are explored to determine
whether or not leaders or innovation teams would gain a better understanding of how innovation could be achieved.

BACKGROUND

Prior work in classifying service systems is plentiful, most of it contained in the service marketing literature. For the sake of brevity, only a short background is presented. Numerous attempts have been made to classify services in an effort to provide some understanding of the special challenges faced by service managers. A popular scheme separates services into four types: the service factory, the service shop, the mass service, and the professional service [2]. But it is not clear that the classification schemes offered in the past will be helpful to managers who wish to manage or improve customer satisfaction, because they tend to be based on the structure of the service rather than on the needs or wants of customers. For example, Verma showed that only 4 of 22 important management challenges are affected by the differences in this classification scheme [2]. The research presented below uses a mathematical approach to cluster services based on the dimensions of performance deemed important by customers of each service.

METHODOLOGY

The approach to classifying services based on performance dimensions uses a binary vector clustering algorithm. The work began with the creation of a data set consisting of 168 services. Each service was analyzed by a professional employee of the organization who was very familiar with the activities associated with the delivery of the service and had access to customers of the service. The services selected were not random, but did consist of a cross section of various service types, albeit biased towards services contained within technologically sophisticated organizations. A mixture of customer types existed within the database. Many of the services were primarily for internal customers, many served external customers exclusively, and some served both internal and external customers.
No single analyst studied more than one service. All of the analysts were working professionals, enrolled in a part-time graduate management program on the Hartford, Connecticut campus of Rensselaer Polytechnic Institute, in a course called Service Operations Management. Each analyst asked several customers of the service to list strengths and weaknesses of the process, and list key performance dimensions important to customers. The resulting reports followed a standard template that allowed for easy tabulation of key results. To ensure quality and consistency in the data, the authors studied the data generated from each report and at times modified the resulting list of performance dimensions. The resulting database that was input to the clustering algorithm consisted of 168 records and 9 fields, one field for each of 9 potential dimensions of performance. A binary code was used to signify whether or not the dimension was important to customers (1=important, 0=unimportant). The following dimensions were specified: (1) empathy (e.g., courtesy, professionalism); (2) knowledge (of service providers); (3) communication (providers with customers); (4) speed (e.g., responsiveness, turnaround time); (5) usefulness (e.g., comprehension, completeness, flexibility);
(6) quality (e.g., accuracy, consistency); (7) tangible (a physical good); (8) convenience (e.g., availability, ease); and (9) security (e.g., information, financial, personal).

Clustering Algorithm

The problem then becomes one of clustering 168 binary vectors (one per service) into groups containing like vectors. To do this, a metric must be chosen to gauge the closeness of each pair of binary vectors. Several different distance metrics have been proposed in the literature. We have used the simple matching metric. This metric is described as follows: given two binary vectors V₁ and V₂ of length L, let B denote the number of digits where V₁ and V₂ agree. The intervector distance is D(V₁,V₂) = 1 − B/L. We note that 0 ≤ D(V₁,V₂) ≤ 1 and that D(V₁,V₂) is 0 when V₁ = V₂ and 1 when V₁ and V₂ are complements.

Ward's Algorithm is a well-known and widely used algorithm for grouping binary vectors into clusters. We have used the version of this algorithm wherein the user specifies the target number of clusters. The algorithm is agglomerative and begins by placing each vector in its own separate cluster. Thus, in the present case, the method began with 168 clusters. Then, clusters are successively merged in a systematic way until the requisite number of clusters is obtained. We next describe how clusters are selected for merging. The algorithm computes a medoid for every cluster. This is a member of the cluster (not necessarily unique) that has the smallest sum of distances (based on the simple matching metric) to other members [3]. Thus, a medoid is the binary-vector analogue of the familiar centroid of a cluster of points on a plane. However, a medoid (unlike a centroid) is necessarily a member of the cluster. Next, to determine which pair of clusters to merge, the algorithm considers all pairs for merging and selects the pair with least variance (calculated as the sum of squares of the distances averaged over the number of members in this tentative cluster).
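The distance, medoid, and merge criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch of the procedure as worded in the text, not the implementation used in the study: the function names are hypothetical, ties are broken by taking the first candidate (an assumption, since the text does not say how ties are handled), and the brute-force search over all pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def simple_matching_distance(v1, v2):
    """D(V1, V2) = 1 - B/L, where B counts agreeing positions and L is the length."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    agreements = sum(a == b for a, b in zip(v1, v2))
    return 1 - agreements / len(v1)


def medoid(cluster):
    """Member with the smallest sum of distances to the other members.

    Assumption: when several members tie, the first one is returned.
    """
    return min(
        cluster,
        key=lambda m: sum(simple_matching_distance(m, other) for other in cluster),
    )


def variance(cluster):
    """Sum of squared distances to the medoid, averaged over cluster size."""
    center = medoid(cluster)
    total = sum(simple_matching_distance(center, member) ** 2 for member in cluster)
    return total / len(cluster)


def agglomerate(vectors, target_clusters):
    """Merge clusters pairwise until only target_clusters remain."""
    clusters = [[v] for v in vectors]  # every vector starts in its own cluster
    while len(clusters) > target_clusters:
        # Tentatively merge every pair and keep the pair with the least variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: variance(clusters[pair[0]] + clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For the data set described here, each service would be a tuple of nine 0/1 values and the call would be along the lines of agglomerate(services, 8), where services is a hypothetical list holding the 168 coded vectors.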
Any two clusters under consideration are temporarily merged, a medoid determined, and the sum of squares of distances to all members from this medoid computed. At each stage, in selecting pairs for merging with minimum variance, the algorithm seeks to merge clusters so that the resulting clusters are round (i.e., they have members that tend to be equally distant from the medoid of that cluster). The algorithm terminates when the requisite number of clusters has been generated.

RESULTS

After some trial and error, the target number of clusters was specified to be 8. This level of discrimination was chosen because fewer clusters appeared to contain dissimilar services and more clusters would provide a less than useful classification scheme. It is important to note that the clusters generated by Ward's Algorithm are known to be fairly immune to the ordering of the input data. The authors verified this characteristic by running the algorithm using a number of reordered data sets. The numbers of services within each cluster group (numbered 1 to 8 in the tables that follow) were 13, 16, 16, 33, 32, 32, 11, and 12, respectively. Table 1 provides a summary of the 8 clusters by showing, for each cluster, the percentage of services that indicated each potential dimension as important to that service. In the table, the dimensions are abbreviated (Emp is empathy, Knw is
knowledge, Cmc is communication, Spd is speed, Use is usefulness, Qua is quality, Tan is tangibles, Cnv is convenience, and Sec is security). For example, the first row shows that, for the 13 services included within Cluster #1, each had empathy as an important dimension, 11 of the 13 services (85%) had knowledge as an important dimension, none of the 13 services had communication as an important dimension, etc.

Table 1: Clusters and Associated Dimensions

Cluster  Emp   Knw   Cmc   Spd   Use   Qua   Tan   Cnv   Sec
1        100%  85%   0%    100%  69%   92%   8%    100%  8%
2        19%   0%    0%    88%   0%    94%   13%   94%   0%
3        81%   100%  100%  100%  31%   88%   13%   88%   0%
4        6%    21%   70%   94%   88%   94%   6%    100%  6%
5        13%   34%   0%    91%   84%   94%   25%   0%    0%
6        16%   53%   100%  97%   84%   100%  22%   0%    3%
7        9%    45%   9%    100%  100%  73%   100%  100%  0%
8        75%   67%   8%    100%  0%    100%  58%   8%    0%

To explore the usefulness of the resulting classification scheme, and to compare this scheme to a scheme based on process characteristics alone, a number of statistical analyses were performed. Perhaps the most important of these analyses compared the clusters with another classification scheme that was based on process characteristics, rather than customer dimensions. Details on this scheme may be obtained from the authors. In Table 2, the process-oriented classifications are abbreviated (A=analysis, C=consultation, E=evaluation, G=gathering, P=planning, and T=troubleshooting). For example, of the 13 services contained in Cluster #1, one service was classified as an analysis process, 4 services were classified as a consultation process, one service was classified as an evaluation process, etc. As implied by the diversity of process types within each cluster and supported by a chi-square statistical analysis, no relationship was evident between these two classification schemes (p=0.235). An example of two similar processes that were assigned to different clusters will help to explain this result. Both processes involved the testing of material.
The algorithm assigned one testing process to cluster 4 and a second testing process to cluster 5. Both testing services included quality, speed, and usefulness as important dimensions, but the service classified in cluster 4 also listed convenience and communication. Therefore, the material testing service assigned to cluster 4 had customers who expected more interaction with the service provider than did the material testing service assigned to cluster 5.

Table 2: Clusters and Associated Service Process Classification
Cluster A C E G P T
Table 3 shows the fraction of services in each cluster whose customers were primarily internal or primarily external, and the average number of functions through which the service flowed in each cluster. For example, 53.8% (7 of the 13) of the services in Cluster #1 served primarily internal customers and 46.2% (6 of the 13) of the services in Cluster #1 served primarily external customers. In some clusters, some services served internal and external customers in about equal measure. In these cases, the internal and external fractions will not add to one. Also, in Cluster #1, an average of 5.2 departments or functions took part in delivering the service. An analysis of variance concluded that the number of functions did not vary across clusters (p=0.164). A chi-square analysis showed that the prevalence of internal or external customers did not vary across clusters at a 5% level of significance (p=0.093). Significance at a 10% level for this test may indicate that a relationship exists between cluster membership and the prevalence of internal customers, although it is weak in magnitude.

Table 3: Clusters and Characteristics
Cluster Internal External Functions

IMPLICATIONS

The main result of this exploratory investigation is that a difference exists between a classification scheme based on process characteristics and a scheme based on customer preferences. This result has implications for leaders of service improvement or service innovation teams. It also supports an earlier conclusion by Maleyeff [5] who argued that, based on characteristics unique to service systems, improvement efforts should start by focusing on the information being provided to customers of internal services rather than the physical manifestation of that information.
For example, he suggests that rather than focus an improvement project on speeding up the flow of a payment invoice, project teams should first ensure that the information contained on the invoice is useful, clearly printed, unambiguous in meaning, and accurate. A secondary implication could be stated as a word of caution to leaders who may base the improvement or innovation of services exclusively on process improvements. Many service improvement methodologies, such as those contained in the Lean Six Sigma toolbox [6], are process-based, such as mistake proofing, process standardization, or visual workflow control. For example, it would appear that a dimension such as empathy may be ignored by these project teams. In the case of the emergency room trauma team learning from pit crews, perhaps the
innovation was successful because the customers' needs have similar dimensions (e.g., speed and competency).

FUTURE WORK

This research has some limitations. Because binary data is much less powerful than continuous data, perhaps a similar analysis that incorporated dimensions measured on a continuous scale should be undertaken. The precision and reliability of the data used here can also be questioned, due to the multiple analysts and the potential for mischaracterization of customer preferences. This limitation can easily be overcome in future analyses. Future research could also investigate whether other well-known metrics (besides the simple matching metric) would generate clusters similar to the clusters obtained above. Finally, a more thorough analysis of the best number of clusters may prove useful.

REFERENCES

[1] Nicholson, Kieran, Hospital teams find vroom to improve by changing racecar tires. Denver Post, April 16, 2004, p. B1.
[2] Verma, Rohit, An empirical analysis of management challenges in service factories, service shops, mass services and professional services. International Journal of Service Industry Management, 2000, 11(1).
[3] Guralnik, V. and Karypis, G., A Scalable Algorithm for Clustering Protein Sequences. In Workshop on Data Mining in Bioinformatics, 2001.
[4] Luke, Brian T., Agglomerative Linkages.
[5] Maleyeff, John, Exploration of Internal Service Systems using Lean Principles. Management Decision, 2006, 44(5).
[6] Maleyeff, John, Improving Service Delivery in Government Using Lean Six Sigma. IBM Center for The Business of Government, Washington, DC, 2007.
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationEXPERIMENTAL ERROR AND DATA ANALYSIS
EXPERIMENTAL ERROR AND DATA ANALYSIS 1. INTRODUCTION: Laboratory experiments involve taking measurements of physical quantities. No measurement of any physical quantity is ever perfectly accurate, except
More informationMATHEMATICS CLASS  XII BLUE PRINT  II. (1 Mark) (4 Marks) (6 Marks)
BLUE PRINT  II MATHEMATICS CLASS  XII S.No. Topic VSA SA LA TOTAL ( Mark) (4 Marks) (6 Marks). (a) Relations and Functions 4 () 6 () 0 () (b) Inverse trigonometric Functions. (a) Matrices Determinants
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 3448 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationAnother Way to Burn. The rise of the burndown chart
This article describes a burndown chart based upon test cases rather than effort and describes its advantages. It is intended for readers already familiar with the concept of a burndown chart. The rise
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 15 scale to 0100 scores When you look at your report, you will notice that the scores are reported on a 0100 scale, even though respondents
More informationSummary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen
Summary Data Mining & Process Mining (1BM46) Made by S.P.T. Ariesen Content Data Mining part... 2 Lecture 1... 2 Lecture 2:... 4 Lecture 3... 7 Lecture 4... 9 Process mining part... 13 Lecture 5... 13
More informationNovel Automatic PCB Inspection Technique Based on Connectivity
Novel Automatic PCB Inspection Technique Based on Connectivity MAURO HIROMU TATIBANA ROBERTO DE ALENCAR LOTUFO FEEC/UNICAMP Faculdade de Engenharia Elétrica e de Computação/ Universidade Estadual de Campinas
More informationClustering on Large Numeric Data Sets Using Hierarchical Approach Birch
Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global
More informationMeasurement Information Model
mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides
More informationTrends in Interdisciplinary Dissertation Research: An Analysis of the Survey of Earned Doctorates
Trends in Interdisciplinary Dissertation Research: An Analysis of the Survey of Earned Doctorates Working Paper NCSES 12200 April 2012 by Morgan M. Millar and Don A. Dillman 1 Disclaimer and Acknowledgments
More informationTeaching Multivariate Analysis to BusinessMajor Students
Teaching Multivariate Analysis to BusinessMajor Students WingKeung Wong and TeckWong Soon  Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationKMeans Clustering Tutorial
KMeans Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. KMeans Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More informationSTT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables
Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random
More informationImproving Generalization
Improving Generalization Introduction to Neural Networks : Lecture 10 John A. Bullinaria, 2004 1. Improving Generalization 2. Training, Validation and Testing Data Sets 3. CrossValidation 4. Weight Restriction
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis DensityBased Cluster Analysis Cluster Evaluation Constrained
More informationCLUSTERING FOR FORENSIC ANALYSIS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 23218843; ISSN(P): 23474599 Vol. 2, Issue 4, Apr 2014, 129136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS
More informationVaccination Level Deduplication in Immunization
Vaccination Level Deduplication in Immunization Information Systems (IIS) One of the major functions of an Immunization Information System (IIS) is to create and maintain an accurate and timely record
More informationKATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE CQAS 747 Principles of
More informationRobotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard
Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,
More informationPREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao ABSTRACT Department of Computer Engineering, Fr.
More informationLargeScale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 59565963 Available at http://www.jofcis.com LargeScale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
More informationTime series clustering and the analysis of film style
Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such
More informationAvailable online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science
Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Procedia Engineering Engineering 00 (2011) 15 (2011) 000 000 1822 1826 Procedia Engineering www.elsevier.com/locate/procedia
More informationData Desk Professional: Statistical Analysis for the Macintosh. PUB DATE Mar 89 NOTE
DOCUMENT RESUME ED 309 760 IR 013 926 AUTHOR Wise, Steven L.; Kutish, Gerald W. TITLE Data Desk Professional: Statistical Analysis for the Macintosh. PUB DATE Mar 89 NOTE 10p,; Paper presented at the Annual
More informationIdentification of noisy variables for nonmetric and symbolic data in cluster analysis
Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,
More informationClustering Hierarchical clustering and kmean clustering
Clustering Hierarchical clustering and kmean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity
More informationLean Certification Program Blended Learning Program Cost: $5500. Course Description
Lean Certification Program Blended Learning Program Cost: $5500 Course Description Lean Certification Program is a disciplined process improvement approach focused on reducing waste, increasing customer
More informationSteven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 306022501
CLUSTER ANALYSIS Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 306022501 January 2006 Introduction Cluster analysis includes a broad suite of techniques designed to find groups
More informationMeasurement Systems Analysis MSA for Suppliers
Measurement Systems Analysis MSA for Suppliers Copyright 20032007 Raytheon Company. All rights reserved. R6σ is a Raytheon trademark registered in the United States and Europe. Raytheon Six Sigma is a
More informationClassification of Household Devices by Electricity Usage Profiles
Classification of Household Devices by Electricity Usage Profiles Jason Lines 1, Anthony Bagnall 1, Patrick CaigerSmith 2, and Simon Anderson 2 1 School of Computing Sciences University of East Anglia
More information