# Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points

Save this PDF as:

Size: px
Start display at page:

Download "Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points"

## Transcription

1 Journal of Computer Science 6 (3): , 2010 ISSN Science Publications Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points T. Velmurugan and T. Santhanam Department of Computer Science, DG Vaishnav College, Chennai, India Abstract: Problem statement: Clustering is one of the most important research areas in the field of data mining. Clustering means creating groups of objects based on their features in such a way that the objects belonging to the same groups are similar and those belonging to different groups are dissimilar. Clustering is an unsupervised learning technique. The main advantage of clustering is that interesting patterns and structures can be found directly from very large data sets with little or none of the background knowledge. Clustering algorithms can be applied in many domains. Approach: In this research, the most representative algorithms K-Means and K-Medoids were examined and analyzed based on their basic approach. The best algorithm in each category was found out based on their performance. The input data points are generated by two ways, one by using normal distribution and another by applying uniform distribution. Results: The randomly distributed data points were taken as input to these algorithms and clusters are found out for each algorithm. The algorithms were implemented using JAVA language and the performance was analyzed based on their clustering quality. The execution time for the algorithms in each category was compared for different runs. The accuracy of the algorithm was investigated during different execution of the program on the input data points. Conclusion: The average time taken by K-Means algorithm is greater than the time taken by K-Medoids algorithm for both the case of normal and uniform distributions. The results proved to be satisfactory. Key words: K-Means clustering, K-Medoids clustering, data clustering, cluster analysis INTRODUCTION Clustering can be considered the most important unsupervised learning problem; so, as every other problem of this kind, it deals with finding a structure in a collection of unlabeled data (Jain and Dubes, 1988; Jain et al., 1999). A loose definition of clustering could be the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects which are similar between them and are dissimilar to the objects belonging to other clusters. Unlike classification, in which objects are assigned to predefined classes, clustering does not have any predefined classes. The main advantage of clustering is that interesting patterns and structures can be found directly from very large data sets with little or none of the background knowledge. The cluster results are subjective and implementation dependent. The quality of a clustering method depends on: The similarity measure used by the method and its implementation Its ability to discover some or all of the hidden patterns The definition and representation of cluster chosen A number of algorithms for clustering have been proposed by researchers, of which this study establishes with a comparative study of K-Means and K-Medoids clustering algorithms (Berkhin, 2002; Dunham, 2002; Han and Kamber, 2006; Xiong et al., 2009; Park et al., 2006; Khan and Ahmad, 2004; Borah and Ghose, 2009; Rakhlin and Caponnetto, 2007). MATERIALS AND METHODS Problem specification: In this research, two unsupervised clustering methods, namely K-Means and K-Medoids are examined to analyze based on the distance between the various input data points. The clusters are formed according to the distance between data points and cluster centers are formed for each cluster. The implementation plan will be in two parts, one in normal distribution and another in uniform distribution of input data points. Both the algorithms Corresponding Author: T. Santhanam, Department of Computer Science, DG Vaishnav College, Chennai, India Tel:

2 are implemented by using JAVA language. The number of clusters is chosen by the user. The data points in each cluster are displayed by different colors and the execution time is calculated in milliseconds. K-Means algorithm: K-Means is one of the simplest unsupervised learning algorithms that solve the well known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way because of different location causes different result. So, the better choice is to place them as much as possible far away from each other. The next step is to take each point belonging to a given data set and associate it to the nearest centroid. When no point is pending, the first step is completed and an early group is done. At this point, it is need to re-calculate k new centroids as centers of the clusters resulting from the previous step. After these k new centroids, a new binding has to be done between the same data points and the nearest new centroid. A loop has been generated. As a result of this loop it may notice that the k centroids change their location step by step until no more changes are done. In other words centroids do not move any more. Finally, this algorithm aims at minimizing an objective function, in this case a squared error function. The objective function: where, x ( j) i c j 2 k n J = x c j= 1 i= 1 ( j) i j 2 is a chosen distance measure between ( j) a data point xi and the cluster centre c j, is an indicator of the distance of the n data points from their respective cluster centers. The algorithm is composed of the following steps: 1. Place k points into the space represented by the objects that are being clustered. These points represent initial group centroids 2. Assign each object to the group that has the closest centroid 3. When all objects have been assigned, recalculate the positions of the k centroids 4. Repeat Step 2 and 3 until the centroids no longer move J. Computer Sci., 6 (3): , 2010 This produces a separation of the objects into groups from which the metric to be minimized can be calculated. Although it can be proved that the procedure Until no change; 364 will always terminate, the K-Means algorithm does not necessarily find the most optimal configuration, corresponding to the global objective function minimum. The algorithm is also significantly sensitive to the initial randomly selected cluster centers. The K- Means algorithm can be run multiple times to reduce this effect. K-Means is a simple algorithm that has been adapted to many problem domains. First, the algorithm randomly selects k of the objects. Each selected object represents a single cluster and because in this case only one object is in the cluster, this object represents the mean or center of the cluster. K-Medoids algorithm: The K-means algorithm is sensitive to outliers since an object with an extremely large value may substantially distort the distribution of data. How might the algorithm be modified to diminish such sensitivity? Instead of taking the mean value of the objects in a cluster as a reference point, a Medoid can be used, which is the most centrally located object in a cluster. Thus the partitioning method can still be performed based on the principle of minimizing the sum of the dissimilarities between each object and its corresponding reference point. This forms the basis of the K-Medoids method. The basic strategy of K- Mediods clustering algorithms is to find k clusters in n objects by first arbitrarily finding a representative object (the Medoids) for each cluster. Each remaining object is clustered with the Medoid to which it is the most similar. K-Medoids method uses representative objects as reference points instead of taking the mean value of the objects in each cluster. The algorithm takes the input parameter k, the number of clusters to be partitioned among a set of n objects. A typical K- Mediods algorithm for partitioning based on Medoid or central objects is as follows: Input: K: The number of clusters D: A data set containing n objects Output: A set of k clusters that minimizes the sum of the dissimilarities of all the objects to their nearest medoid. Method: Arbitrarily choose k objects in D as the initial representative objects; Repeat: Assign each remaining object to the cluster with the nearest medoid; Randomly select a non medoid object O random ; compute the total points S of swap point O j with O ramdom if S < 0 then swap O j with O random to form the new set of k medoid

3 Like this algorithm, a Partitioning Around Medoids (PAM) was one of the first k-medoids algorithms introduced. It attempts to determine k partitions for n objects. After an initial random selection of k medoids, the algorithm repeatedly tries to make a better choice of medoids. RESULTS In this study, the k-means algorithm is explained with an example first, followed by k-medoids algorithm. The two types of experimental results are discussed for the K-Means algorithm. One is Normal and another is Uniform distributions of input data points. For both the types of distributions, the data points are created by using Box-Muller formula. The resulting clusters of the normal distribution of K-Means algorithm is presented in Fig. 1. The number of clusters and data points is given by the user during the execution of the program. The number of data points is 1000 and the number of clusters given by the user is 10 (k = 10). The algorithm is repeated 1000 times to get efficient output. The cluster centers (centroids) are calculated for each cluster by its mean value and clusters are formed depending upon the distance between data points. For different input data points, the algorithm gives different types of outputs. The input data points are generated in red color and the output of the algorithm is displayed in different colors as shown in Fig. 1. The center point of each cluster is displayed in green color. The execution time of each run is calculated in milliseconds. The time taken for execution of the algorithm varies from one run to another run and also it differs from one computer to another computer. The result shows the normal distribution of 1000 data points around ten cluster centers. The number of data points is the size of the cluster. The size of the cluster is high for some clusters due to increase in the number of data points around a particular cluster center. For example, size of cluster 10 is 139, because of the distance between the data points of that particular cluster and the cluster centers is very less. Since the data points are normally distributed, clusters vary in size with maximum data points in cluster 10 and minimum data points in cluster 8. For the same normal distribution of the input data points and for the uniform distribution, the algorithm is executed 10 times and the results are tabulated in the Table 1. From Table 1 (against the rows indicated by N ), it is observed that the execution time for Run 6 is 7640 m sec and for Run 8 is 7297 msec. In a normal distribution of the data points, there is much difference between the sizes of the clusters. For example, in Run 7, cluster 2 has 167 points and cluster 5 has only 56 points. Next, the uniformly distributed one thousand data points are taken as input. One of the executions is shown in Fig. 2. The number of clusters chosen by user is 10. The result of the algorithm for 10 different execution of the program is given in Table 1 (against the rows indicated by U ). Fig. 1: Normal distribution output-k-means 365

4 Table 1: Cluster results for K-Means algorithm Cluster Time (m sec) Run 1 N U Run 2 N U Run 3 N U Run 4 N U Run 5 N U Run 6 N U Run 7 N U Run 8 N U Run 9 N U Run 10 N U From the result, it can be inferred that the size of the clusters are near to the average data points in the uniform distribution. This can easily be observed that for 1000 points and 10 clusters, the average is 100. But this is not true in the case of normal distribution. It is easy to identify from the Fig. 1 and 2, the size of the clusters are different. This shows that the behavior of the algorithm is different for both types of distributions. Next, for the K-Medoids algorithm, the input data points and the experimental results are considered by the same method as discussed for K-Means algorithm. Fig. 2: Uniform distribution output-k-means 366 In the next example, one thousand normally distributed data points are taken as input. Number of clusters chosen by the user is 10. The output of one of the trials is shown in Fig. 3. Next, the uniformly distributed one thousand data points are taken as input. The number of clusters chosen by the user is 10. One of the executions is shown in Fig. 4. The result of the algorithm for ten different executions of the program is given in Table 2. For both the algorithms, K-Means and K-Medoids, the program is executed many times and the results are analyzed based on the number of data points and the number of clusters. The behavior of the algorithms is analyzed based on observations.

5 Fig. 3: Normal distribution output-k-medoids Fig. 4: Uniform distribution output-k-medoids Table 2: Cluster results for K-Medoids algorithm Cluster Time (m sec) Run 1 N U Run 2 N U Run 3 N U Run 4 N U Run 5 N U Run 6 N U Run 7 N U Run 8 N U Run 9 N U Run 10 N U

6 The performance of the algorithms have also been analyzed for several executions by considering different data points (for which the results are not shown) as input (250 data points, 500 data points etc), the outcomes are found to be highly satisfactory. DISCUSSION From the experimental results, for K-Means algorithm, the average time for clustering the normal distribution of data points is found to be m sec. The average time for clustering the uniform distribution of data points is found to be m sec. For the K-Medoids algorithm, the average time for clustering the normal distribution of data points is found to be m sec and for the average time for clustering the uniform distribution of data points are m sec. It is understood that the average time for normal distribution is greater than the average time for the uniform distribution. This is true for both the algorithms K-Means and K-Medoids. If the number of data points is less, then the K-Means algorithm takes lesser execution time. But when the data points are increased to maximum the K-Means algorithm takes maximum time and the K-Medoids algorithm performs reasonably better than the K-Means algorithm. The characteristic feature of this algorithm is that it requires the distance between every pair of objects only once and uses this distance at every stage of iteration. CONCLUSION The time taken for one execution of the program for the Uniform Distribution is less than the time taken for Normal Distribution. Usually the time complexity varies from one processor to another processor, which depends on the speed and the type of the system. The partition based algorithms work well for finding spherical-shaped clusters in small to medium-sized data points. The advantage of the K-Means algorithm is its favorable execution time. Its drawback is that the user has to know in advance how many clusters are searched for. From the experimental results (by many execution of the programs), it is observed that K-Means algorithm is efficient for smaller data sets and K-Medoids algorithm seems to perform better for large data sets. REFERENCES Berkhin, P., Survey of clustering data mining techniques. Technical Report, Accrue Software, Inc. rvey.pdf Borah, S. and M.K. Ghose, Performance analysis of AIM-K-Means and K-Means in quality cluster generation. J. Comput., 1: Dunham, M., Data Mining: Introductory and Advanced Topics. 1st Edn., Prentice Hall, USA., ISBN: 10: , pp: 315. Han, J. and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2nd Edn., New Delhi, ISBN: Jain, A.K. and R.C. Dubes, Algorithms for Clustering Data. Prentice Hall Inc., Englewood Cliffs, New Jersey, ISBN: X, pp: 320. Jain, A.K., M.N. Murty and P.J. Flynn, Data clustering: A review. ACM Comput. Surveys, 31: DOI: / Khan, S.S. and A. Ahmad, Cluster center initialization algorithm for K-Means clustering. Patt. Recog. Lett., 25: DOI: /j.patrec Park, H.S., J.S. Lee and C.H. Jun, A K-meanslike algorithm for K-medoids clustering and its performance &rep=rep1&type=pdf Rakhlin, A. and A. Caponnetto, Stability of k- Means clustering. Adv. Neural Inform. Process. Syst., 12: hlin-stability-clustering.pdf Xiong, H., J. Wu and J. Chen, K-Means clustering versus validation measures: A data distribution perspective. IEEE Trans. Syst., Man, Cybernet. Part B, 39:

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### A K-means-like Algorithm for K-medoids Clustering and Its Performance

A K-means-like Algorithm for K-medoids Clustering and Its Performance Hae-Sang Park*, Jong-Seok Lee and Chi-Hyuck Jun Department of Industrial and Management Engineering, POSTECH San 31 Hyoja-dong, Pohang

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### Clustering & Association

Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

### Standardization and Its Effects on K-Means Clustering Algorithm

Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

### EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING

EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING Navjot Kaur, Jaspreet Kaur Sahiwal, Navneet Kaur Lovely Professional University Phagwara- Punjab Abstract Clustering is an essential

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

### USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva

382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent

### CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

### Clustering Breast Cancer Dataset using Self- Organizing Map Method

International Journal of Advanced Studies in Computer Science and Engineering Clustering Breast Cancer Dataset using Self- Organizing Map Method M. Thenmozhi Assistant Professor Dept. of Computer Science

### Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

### An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

### A fuzzy k-modes algorithm for clustering categorical data. IEEE Transactions on Fuzzy Systems, 1999, v. 7 n. 4, p

Title A fuzzy k-modes algorithm for clustering categorical data Author(s) Huang, Z; Ng, MKP Citation IEEE Transactions on Fuzzy Systems, 1999, v. 7 n. 4, p. 446-452 Issue Date 1999 URL http://hdl.handle.net/10722/42992

### Data Warehousing and Data Mining

Data Warehousing and Data Mining Lecture Clustering Methods CITS CITS Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement: The

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between

### The Result Analysis of the Cluster Methods by the Classification of Municipalities

The Result Analysis of the Cluster Methods by the Classification of Municipalities PAVEL PETR, KAŠPAROVÁ MILOSLAVA System Engineering and Informatics Institute Faculty of Economics and Administration University

### Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

### Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

### Fuzzy Clustering Technique for Numerical and Categorical dataset

Fuzzy Clustering Technique for Numerical and Categorical dataset Revati Raman Dewangan, Lokesh Kumar Sharma, Ajaya Kumar Akasapu Dept. of Computer Science and Engg., CSVTU Bhilai(CG), Rungta College of

### PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

### Unsupervised learning: Clustering

Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

### Chapter 7. Cluster Analysis

Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based

### UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

### An Enhanced Clustering Algorithm to Analyze Spatial Data

International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-2, Issue-7, July 2014 An Enhanced Clustering Algorithm to Analyze Spatial Data Dr. Mahesh Kumar, Mr. Sachin Yadav

### K-Means Clustering Tutorial

K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

### Data Clustering Using Data Mining Techniques

Data Clustering Using Data Mining Techniques S.R.Pande 1, Ms. S.S.Sambare 2, V.M.Thakre 3 Department of Computer Science, SSES Amti's Science College, Congressnagar, Nagpur, India 1 Department of Computer

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

### Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

### A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining

A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining Binu Thomas and Rau G 2, Research Scholar, Mahatma Gandhi University,Kerala, India. binumarian@rediffmail.com 2 SCMS School of Technology

### Study of Euclidean and Manhattan Distance Metrics using Simple K-Means Clustering

Study of and Distance Metrics using Simple K-Means Clustering Deepak Sinwar #1, Rahul Kaushik * # Assistant Professor, * M.Tech Scholar Department of Computer Science & Engineering BRCM College of Engineering

### Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

### A Proposed Framework for Analyzing Crime Data Set Using Decision Tree and Simple K-Means Mining Algorithms

Journal of Kufa for Mathematics and Computer Vol.1, No.3, may, 2011, pp.8-24 A Proposed Framework for Analyzing Crime Data Set Using Decision Tree and Simple K-Means Mining Algorithms Kadhim B. Swadi Al-Janabi

### Information Retrieval of K-Means Clustering For Forensic Analysis

Information Retrieval of K-Means Clustering For Forensic Analysis Prachi Gholap 1, Vikas Maral 2 1 Pune University, K.J College of Engineering Kondhwa, 411043, Pune, Maharashtra, India 2 Pune University,

### K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo.

Enhanced Binary Small World Optimization Algorithm for High Dimensional Datasets K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo.com

### Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application

Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application Marta Marrón Romera, Miguel Angel Sotelo Vázquez, and Juan Carlos García García Electronics Department, University

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering

### Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

### Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process

Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek Abstract Depending

### Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain Situations

Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101 ISSN: 2249-6645 Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain Situations Ahmed Bahgat El Seddawy 1, Prof. Dr. Turky Sultan 2, Dr.

### Classification and Segmentation in Satellite Imagery Using Back Propagation Algorithm of ANN and K-Means Algorithm

Classification and Segmentation in Satellite Imagery Using Back Propagation Algorithm of ANN and K-Means Algorithm P. Sathya and L. Malathi Abstract Now a days unsupervised image classification and segmentation

### Distance based clustering

// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means

### PLAANN as a Classification Tool for Customer Intelligence in Banking

PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department

### Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

### Clustering in Ratemaking: Applications in Territories Clustering

Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, Ph.D. Abstract: Clustering methods are briefly reviewed and their applications in insurance ratemaking are discussed in this paper.

### A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

### Data Mining Individual Assignment report

Björn Þór Jónsson bjrr@itu.dk Data Mining Individual Assignment report This report outlines the implementation and results gained from the Data Mining methods of preprocessing, supervised learning, frequent

### EXTENDED CENTROID BASED CLUSTERING TECHNIQUE FOR ONLINE SHOPPING FRAUD DETECTION

EXTENDED CENTROID BASED CLUSTERING TECHNIQUE FOR ONLINE SHOPPING FRAUD DETECTION Priya J Rana 1, Jwalant Baria 2 1 ME IT, Department of IT, Parul institute of engineering & Technology, Gujarat, India 2

### International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

### K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

### ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

### Unsupervised Learning. What is clustering for? What is clustering for? (cont )

Unsupervised Learning c: Artificial Intelligence The University of Iowa Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### Data Mining and Clustering Techniques

DRTC Workshop on Semantic Web 8 th 10 th December, 2003 DRTC, Bangalore Paper: K Data Mining and Clustering Techniques I. K. Ravichandra Rao Professor and Head Documentation Research and Training Center

### Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

### Clustering In Data Mining : A Brief Review

Clustering In Data Mining : A Brief Review Meenu Sharma Mtech scholar, Department of computer science,sd Bansal college of tehnology,indore. Mr. Kamal Borana Assistant Professor Department of computer

### Fractal Images Compressing by Estimating the Closest Neighborhood with Using of Schema Theory

Journal of Computer Science 6 (5): 591-596, 2010 ISSN 1549-3636 2010 Science Publications Fractal Images Compressing by Estimating the Closest Neighborhood with Using of Schema Theory Mahdi Jampour, Mahdi

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...

### Flat Clustering K-Means Algorithm

Flat Clustering K-Means Algorithm 1. Purpose. Clustering algorithms group a set of documents into subsets or clusters. The cluster algorithms goal is to create clusters that are coherent internally, but

### Performance Analysis of Clustering using Partitioning and Hierarchical Clustering Techniques. Karunya University, Coimbatore, India. India.

Vol.7, No.6 (2014), pp.233-240 http://dx.doi.org/10.14257/ijdta.2014.7.6.21 Performance Analysis of Clustering using Partitioning and Hierarchical Clustering Techniques S. C. Punitha 1, P. Ranjith Jeba

### CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS

CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS Venkat Venkateswaran Department of Engineering and Science Rensselaer Polytechnic Institute 275 Windsor Street Hartford,

### City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Fundamentals of Data Science Course Code:

### A Review on Clustering and Outlier Analysis Techniques in Datamining

American Journal of Applied Sciences 9 (2): 254-258, 2012 ISSN 1546-9239 2012 Science Publications A Review on Clustering and Outlier Analysis Techniques in Datamining 1 Koteeswaran, S., 2 P. Visu and

### Data Mining 資 料 探 勘. 分 群 分 析 (Cluster Analysis)

Data Mining 資 料 探 勘 Tamkang University 分 群 分 析 (Cluster Analysis) DM MI Wed,, (:- :) (B) Min-Yuh Day 戴 敏 育 Assistant Professor 專 任 助 理 教 授 Dept. of Information Management, Tamkang University 淡 江 大 學 資

### Cluster Analysis: Basic Concepts and Methods

10 Cluster Analysis: Basic Concepts and Methods Imagine that you are the Director of Customer Relationships at AllElectronics, and you have five managers working for you. You would like to organize all

### Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

### Neural Networks in Data Mining

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

### Question 1. Question 2. Question 3. Question 4. Mert Emin Kalender CS 533 Homework 3

Question 1 Cluster hypothesis states the idea of closely associating documents that tend to be relevant to the same requests. This hypothesis does make sense. The size of documents for information retrieval

### ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

### Distributed Framework for Data Mining As a Service on Private Cloud

RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

### Precise Recommendation System for the Long Tail Problem Using Adaptive Clustering Technique

Precise Recommendation System for the Long Tail Problem Using Adaptive Clustering Technique S.Jeyshirii 1, Dr.T.K.Thivakaran 2 P.G. Student, Department of Computer Science & Engineering,Sri Venkateswara

### Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

### Identifying Peer-to-Peer Traffic Based on Traffic Characteristics

Identifying Peer-to-Peer Traffic Based on Traffic Characteristics Prof S. R. Patil Dept. of Computer Engineering SIT, Savitribai Phule Pune University Lonavala, India srp.sit@sinhgad.edu Suraj Sanjay Dangat

### CLUSTER ANALYSIS OF TRAFFIC FLOWS ON A CAMPUS NETWORK

CLUSTER ANALYSIS OF TRAFFIC FLOWS ON A CAMPUS NETWORK Asim Karim 1, Irfan Ahmad 2, Syed Imran Jami 3, and Mansoor Sarwar 2 1 Dept. of Computer Science 2 School of Science and Technology 3 Dept. of Computer

### Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

### STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

Stock Asian-African Market Trends Journal using of Economics Cluster Analysis and Econometrics, and ARIMA Model Vol. 13, No. 2, 2013: 303-308 303 STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

### Research Article Traffic Analyzing and Controlling using Supervised Parametric Clustering in Heterogeneous Network. Coimbatore, Tamil Nadu, India

Research Journal of Applied Sciences, Engineering and Technology 11(5): 473-479, 215 ISSN: 24-7459; e-issn: 24-7467 215, Maxwell Scientific Publication Corp. Submitted: March 14, 215 Accepted: April 1,

### II. SPATIAL DATA MINING DEFINITION

Spatial Data Mining using Cluster Analysis Ch.N.Santhosh Kumar 1, V. Sitha Ramulu 2, K.Sudheer Reddy 3, Suresh Kotha 4, Ch. Mohan Kumar 5 1 Assoc. Professor, Dept. of CSE, Swarna Bharathi Inst. of Sc.

### Applications of Data Mining in Correlating Stock Data and Building Recommender Systems

Applications of Data Mining in Correlating Stock Data and Building Recommender Systems Sharang Bhat Student, Department of Computer Engineering Mukesh Patel School of Technology Management and Engineering,

### SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

### A Novel Density based improved k-means Clustering Algorithm Dbkmeans

A Novel Density based improved k-means Clustering Algorithm Dbkmeans K. Mumtaz 1 and Dr. K. Duraiswamy 2, 1 Vivekanandha Institute of Information and Management Studies, Tiruchengode, India 2 KS Rangasamy

### LVQ Plug-In Algorithm for SQL Server

LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico licinia.monteiro@tagus.ist.utl.pt I. Executive Summary In this Resume we describe a new functionality implemented

### A Comparative Study of clustering algorithms Using weka tools

A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering

### Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

### Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

### K- means++ Optimal Clustering Initialization Algorithm

K- means++ Optimal Clustering Initialization Algorithm Isaiah Shaefer Executive Summary After using the K- means Clustering algorithm several times, analysts noticed some inconsistencies and decided to

### . Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### A Two-Step Method for Clustering Mixed Categroical and Numeric Data

Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department

### POISSON DISTRIBUTION BASED INITIALIZATION FOR FUZZY CLUSTERING

POISSON DISTRIBUTION BASED INITIALIZATION FOR FUZZY CLUSTERING Tomas Vintr, Vanda Vintrova, Hana Rezankova Abstract: A quality of centroid-based clustering is highly dependent on initialization. In the