# EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING Navjot Kaur, Jaspreet Kaur Sahiwal, Navneet Kaur Lovely Professional University Phagwara- Punjab Abstract Clustering is an essential task in Data Mining process which is used for the purpose to make groups or clusters of the given data set based on the similarity between them. K-Means clustering is a clustering method in which the given data set is divided into K number of clusters. This paper is intended to give the introduction about K-means clustering and its algorithm. The experimental results of K- means clustering and its performance in case of execution time is discussed here. But there are certain limitations in K- means clustering algorithm such as it takes more time for execution. So in order to reduce the execution, time we are using the Ranking Method. And also shown that how clustering is performed in less execution time as compared to the traditional method. This work makes an attempt at studying the feasibility of K-means clustering algorithm in data mining using the Ranking Method. Index Terms Clustering, K-means Clustering, Ranking method. I. INTRODUCTION In today s highly competitive business environment Clustering play an important role. As K- means Clustering is a method for making groups of the data set or the objects that are having similar properties. In this paper the II section includes the introduction part of Clustering and section III contain the related study or the literature survey about K-Means clustering algorithm. IV Section contains introduction about K-means Clustering algorithm and also examples of this algorithm. This Section also includes how in K-means algorithm the distance between the objects and mean is calculated and the methods of selecting initial points in K-means Clustering algorithm. Section V contains main steps in K-means clustering algorithm, then Section VI includes introduction about our proposed method Ranking Method and what is the need of Ranking method. Now section VII includes the introduction about the tool used for the implementation. Section VIII includes the results of both K-means clustering and Ranking Method and also shown the execution time taken by both algorithm during the clustering process. Section IX includes the research diagram which shows how in this research, the step by step implementation is done. Then the last Sections X includes the conclusion. II. CLUSTERING Mainly Clustering is the method which includes the grouping of similar type objects into one cluster and a cluster which includes the objects of data set is chosen in order to minimize some measure of dissimilarity. Clustering is a type of unsupervised learning not supervised learning like Classification. In clustering method, objects of the dataset are grouped into clusters, in such a way that groups are very different from each other and the objects in the same group or cluster are very similar to each other. Unlike Classification, in which predefined set of classes are presented, but in Clustering there are no predefined set of classes which means that resulting clusters are not known before the execution of clustering algorithm. In this these clusters are extracted from the dataset by grouping the objects in it. Types of Clustering Algorithms Hierarchical Clustering Algorithm K-means Clustering Algorithm Density Based Clustering Algorithm Self-organization maps (SOM) EM clustering Algorithm III. RELATED WORK K. A. Abdul Nazeer, M. P. Sebastian (2009) Improving the Accuracy and Efficiency of the k-means Clustering Algorithm In this paper an improvement in K-means clustering is shown. In this paper in the first phase of K- means clustering algorithm, the initial centroids are determined systematically so as to produce clusters with better accuracy [1]. The second phase makes use of an efficient way for assigning data points to clusters. 85

2 D. Napoleon, P. Ganga lakshmi (2010) An Efficient K- Means Clustering Algorithm for Reducing Time Complexity using Uniform Distribution Data Points In this paper the uniform distribution of the data points is discussed that how this approach reduce the time complexity of the K-means clustering algorithm [2]. By using this approach the elapsed time is reduced and the cluster is of better quality. In this a very good method is used for finding the initial centroid. In this initially, the distance between each data points is computed. Madhuri A. Dalal1 Nareshkumar D. Harale 2 Umesh L.Kulkarni (2011) An Iterative Improved k-means Clustering discuss an iterative approach which is beneficial in reducing the number of iterations from k- mean algorithm, so as to improve the execution time or by reducing the total number of distance calculations [3]. So Iterative improved K-means clustering produces good starting point for the K-means algorithm instead of selecting them randomly. And it will lead to a better cluster at the last result. IV. K-MEANS CLUSTERING ALGORITHM K-means clustering is a well known partitioning method. In this objects are classified as belonging to one of K- groups. The results of Partitioning method is a set of K clusters, each object of data set belonging to one cluster. In each cluster there may be a centroid or a cluster representative. In case where we consider real-valued data, the arithmetic mean of the attribute vectors for all objects within a cluster provides an appropriate representative; alternative types of centroid may be required in other cases. Example: A cluster of documents can be represented by a list of those keywords that occur in some minimum number of documents within a cluster. If the number of the clusters is large, the centroids can be further clustered to produces hierarchy within a dataset. K-means is a data mining algorithm which performs clustering of the data samples. As mentioned previously, clustering means the division of a dataset into a number of groups such that similar items falls or belong to same groups. In order to cluster the database, K-means algorithm uses an iterative approach. The input in this case is the number of desired clusters and the initial means and also produces final means as output. These mentioned initial and final means are the means of clusters. If in the algorithm requirement is to produce K clusters then there will be K initial means and final means also. After termination of this clustering algorithm, each object of dataset becomes a member of one cluster. The cluster is determined by searching throughout the means for the purpose to find the cluster having nearest mean to the object. Cluster with shortest distanced mean is cluster to which examined object belongs. In case of K-means ISSN: algorithm, it tries to group the data items in dataset into desired number of clusters. To perform this task well it makes some iteration until some converges criteria meets. After each iteration, recently calculated means are updated such that they become closer to the final means. And at final, the algorithm converges and then stops performing iterations. Expected convergence of K-means algorithm is illustrated in the figure 2. In this example the algorithm converges in three iterations. The initial means which may be gathered randomly are represented by Blue points. Purple points are for the intermediate means. Finally, red points describe the final means which are also the results of K-means clustering. A. Measurement of Distance Between Objects And Means In order to measure the distance between objects and means different K-means clustering techniques can be used. Most popular distant metric that used is Euclidean Distance. Euclidean distance is represented as the square root of addition of squared differences between corresponding dimensions of object and the mean or cluster centroid. Euclidean distance is the most common distance metric which is most commonly used when dealing with multidimensional data. B. Selection of Initial Means Basically the selection of initial means is up to the developer of clustering system what he/she wants. But this selection of initial means is independent of K-means clustering, because these initial means are inputs of K- means algorithm. In some cases, it is preferred to select initial means randomly from the given dataset while some others prefer to produce initial points randomly. As known that selection of initial means affects both the execution time of the algorithm and also the success of K-means algorithm. Certain strategies are introduced to gather better results that are considering the initial means. a) The simplest form of these strategies is that, in order to execute K-means algorithm with different sets of initial means considered and then select the best results. But this strategy is hardly feasible when dataset is large and especially for serial K-means. b) Another strategy that is used to gather better clustering results is to use refine initial points method. If in case, it is possible to begin K-means with initial means which are closer to final means, then it is strongly possible case that the number of iterations that the clustering algorithm needs to converge will decrease. It also lessens the time required for conversion and also increases the accuracy of final means. So there are different ways that are used for evaluating clustering results. This is upto the developers of clustering systems who need to decide on which criteria to use, in order to select the best results for clustering. 86

3 V. STEPS OF K-MEANS CLUSTERING ALGORITHM K-Means Clustering algorithm is an idea, in which there is need to classify the given data set into K clusters, the value of K (Number of clusters) is defined by the user which is fixed. In this first the centroid of each cluster is selected for clustering and then according to the chosen centriod, the data points having minimum distance from the given cluster, is assigned to that particular cluster. Euclidean Distance is used for calculating the distance of data point from the particular centroid. This algorithm consists of four steps: 1. Initialization In this first step data set, number of clusters and the centroid that we defined for each cluster. 2. Classification The distance is calculated for each data point from the centroid and the data point having minimum distance from the centriod of a cluster is assigned to that particular cluster. 3. Centroid Recalculation Clusters generated previously, the centriod is again repeatly calculated means recalculation of the centriod. 4. Convergence Condition Some convergence conditions are given as below: 4.1 Stopping when reaching a given or defined number of iterations. 4.2 Stopping when there is no exchange of data points between the clusters. 4.3 Stopping when a threshold value is achieved. 5. If all of the above conditions are not satisfied, then go to step 2 and the whole process repeat again, until the given conditions are not satisfied. Main advantages: 1. K-means clustering is very Fast, robust and easily understandable. If the data set is well separated from each other data set, then it gives best results. 2. The clusters do not having overlapping character and are also non-hierarchical in nature. Main disadvantages: 1. In this algorithm, complexity is more as compared to others. 2. Need of predefined cluster centres. 3. Handling any of empty Clusters: One more problem with K-means clustering is that empty clusters are generated during execution, if in case no data points are allocated to a cluster under consideration during the assignment phase. VI. RANKING METHOD With regards to Clustering, ranking operations are a natural way to estimate the likelihood of the occurance of data items or the objects. So we propose evaluating ranking overall design of database for student data in order to form the clusters. So Ranking function introduce new opportunities to optimize the results of K-means clustering algorithm. A. Need of Ranking Method Search of relevant records or similar data search is a most popular function of database to obtain knowledge. There are certain similar records that we want to fall in one category or form one cluster. That`s why, we need to rank the more relevance student marks by a ranking method and to improve search effectiveness. In last, related answers will be returned for a given keyword query by the created index and better ranking strategy. So I have applied this Ranking method with K- means clustering method because this method is also having the property to find relevant records. So it is also helpful in creating clusters that are having similar properties between all data points within that cluster. VII. TOOLS USED FOR K-MEANS CLUSTERING ALGORITHM IMPLEMENTATION The tools that are used for the implementation of this improved k-means clustering algorithm incorporated with threshold value and also for Ranking Method is the Visual Studio 2008 using C#. VIII. RESULTS A. K-Means Clustering Results In this case, clusters are created in K-means clustering algorithm, using the concept of threshold value. Graph that is given below shows the number of clusters that are made on the basis of the threshold value. On the basis of the centroid the clusters are formed. This graph is made on the basis of the values x and y, which values are taken on the both axis of the graph. The Euclidean distance is calculated between both the centroid and the data points. Each cluster is shown with different color in order to distinguish between them. 87

4 As if the number of records are100 that are considered for clustering, then it takes execution time 132ms and so on for all records. So in this way using different number of records, the execution time differentiation is shown. Fig 1 Graph Shows Clusters in K-means Clustering Algorithm B. K-Means Clustering Results using Ranking Method Graph below shows the accuracy and performance of ranking method. In this case, clusters are formed on the basis of rank that is calculated by applying ranking method. The execution time also reduces as compared to K-means clustering algorithm and it is used over large data set. As shown in graph, the clusters are created with accuracy and well differentiated from each- other. Fig 3 Execution time of K-means clustering algorithm. In the table that also shows the number of records and the clustering execution time taken by K-means clustering algorithm is shown. As if the number of records are 50, the the execution time will be 98ms and so on. With the help of this type of tables we can easily calculate the performance. Table 1. Execution time for K-means clustering Records Execution Time for Clustering Method Fig 2 Graph of K-means clustering algorithm by applying Ranking Method C. Execution Time Analysis For K- Means Clustering Algorithm Execution time analysis for K-means clustering algorithm is done on the basis of the number of records that are considered for clustering and how much time is taken by this whole process. D. Execution Time Analysis for using Ranking Method 88

5 In this graph x-axis represents number of records and y- axis represents time that are included during ranking method when applied on clustering approach. Start Students marks is taken as data sets Initial Centroid is chosen Euclidean Distance is calculated for each object from the centroid of the cluster Fig 4. Execution time using ranking method for clustering. The execution time for ranking method is less. So this is an appropriate approach applied for clustering method. As in case of only K-means clustering for 50 records take the execution time that is 98ms, but in this case of Ranking method, for the purpose of executing same number of records, it takes 91ms. And the main table that shows the execution time for the Ranking method for each particular records Table 2 Execution time table for Ranking method Records Execution time for Ranking Method IX. RESEARCH DIAGRAM Threshold value is set for all data set Ranking method is applied on data set Clusters are formed on the basis of minimum distance between the data point and centroid. Finish Fig 5 Research Diagram for K-means clustering algorithm. X. CONCLUSION The proposed work represents ranking based method that improved K-means clustering algorithm performance and accuracy. In this we have also done analysis of K-means clustering algorithm by applying two methods, one is the existing K-means clustering approach which is incorporated with some threshold value and second one is ranking method applied on K-means algorithm and also compared the performance of both the methods by using graphs. The experimental results demonstrated that the 89

6 proposed ranking based K-means algorithm produces better results than that of the existing k-means algorithm. FUTURE WORK In future, in case of clustering the marks of students from different-2 databases are considered by using the concept of Query redirection. By using the Query redirection approach we can easily cluster the large amount of data from distributed environment as from different databases. So if this approach is considered, then the performance of K-means clustering algorithm is improved for large samples of data set that are also distributed in nature. Advance Computing Conference (IACC 2009) Patialae, India, 6-7 March [9] Shi Na & et.al, Research on k-means Clustering Algorithm, IEEE Third International Symposium on Intelligent Information Technology and Security Informatics, [10] Jaehui Park, Sang-goo Lee Probabilistic Ranking for Relational Databases based on Correlations ACM ACKNOWLEDGMENT Foremost, I would like to express my sincere gratitude to Ms. Jaspreet Kaur Sahiwal who gave her heart whelming full support in the completion of this research paper with her stimulating suggestions and encouragement to go ahead in all the time. She has always been a source of inspiration and confidence for me. She has beaconed light to me as a guide at all stages of preparation of my Research work. I express my thanks from the core of my heart to my parents and friends for encouragement, cooperation and also help in challenging circumstances. At last I am very thankful to my GOD who has given me this golden opportunity to do M.Tech as well as to do research work. REFERENCES [1] K. A. Abdul Nazeer & M. P. Sebastian Improving the Accuracy and Efficiency of the K-Means Clustering Algorithm.Proceedings of the World Congress on Engineering 2009 Vol I WCE 2009, London, U.K, July 1-3, [2] D. Napoleon & P. Ganga lakshmi, An Efficient K- Means Clustering Algorithm for Reducing Time Complexity using Uniform Distribution Data Points, IEEE, [3] Madhuri A. Dalal & Nareshkumar D. Harale An Iterative Improved k-means Clustering Proc. of Int. Conf. on Advances in Computer Engineering, [4] Paul S. Bradley & Usama M. Fayyad, Refining Initial Points for K-Means Clustering, 15th International Conference on Machine Learning, ICML98. [5] Osama Abu Abbas Comparison of various clustering algorithms The International Arab Journal of Information Technology, Vol. 5, No. 3, July [6] Jirong Gu & et.al, An Enhancement of K-means Clustering Algorithm, IEEE International Conference on Business Intelligence and Financial Engineering, [7] Dost Muhammad Khan & Nawaz Mohamudally A Multiagent System (MAS) for the Generation of Initial Centroids for k-means clustering Data Mining Algorithm Based on Actual Sample datapoints, IEEE, [8] Malay K. Pakhira, Clustering Large Databases in Distributed Environment, IEEE 2009 WEE International AUTHORS PROFILE 90

7 1)Navjot Kaur: Received her Bachelor degree in Information Technology and and currently doing M.Tech from Lovely Professional University, Jalandhar in department of Computer science and Information Technology. She has published her paper in ICTRCTA 2012 titled Comparison of various Clustering Algorithms. Currently doing research on K- Means Clustering algorithm enhancement using Ranking Method in Data Mining. And research interests are in Database and Data Mining. 3) Navneet Kaur: Received her Bachelor Degree and currently doing her Masters (M.Tech) in Information Technology from Lovely Professional University. Currently doing research on Keyword Search in Distributed Database environment and also published one International publication ICTRCTA 2012 titled Review study of Keyword Search over Relational Database using the Ranking method. Her research interests are in Database. 2) Jaspreet Kaur Sahiwal: Received her Bachelor Degree from Guru Nanak Dev University, Amritsar. She is an Assistant Professor at Lovely Professional University at department of Computer Science and Technology. She has completed her masters in Database. Her research interests are in Data Warehouse and Data Mining. 91

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

### PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

### Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points

Journal of Computer Science 6 (3): 363-368, 2010 ISSN 1549-3636 2010 Science Publications Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### An Enhanced Clustering Algorithm to Analyze Spatial Data

International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-2, Issue-7, July 2014 An Enhanced Clustering Algorithm to Analyze Spatial Data Dr. Mahesh Kumar, Mr. Sachin Yadav

### An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

### Enhanced Boosted Trees Technique for Customer Churn Prediction Model

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

### Distance based clustering

// Distance based clustering Chapter ² ² Clustering Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 99). What is a cluster? Group of objects separated from other clusters Means

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Clustering & Association

Clustering - Overview What is cluster analysis? Grouping data objects based only on information found in the data describing these objects and their relationships Maximize the similarity within objects

### ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

### Role of Neural network in data mining

Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Colour Image Segmentation Technique for Screen Printing

60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

### Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

### Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

### Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

### Machine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

### A Study of Various Methods to find K for K-Means Clustering

International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 A Study of Various Methods to find K for K-Means Clustering Hitesh Chandra Mahawari

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

### A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

### A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

### MapReduce Design of K-Means Clustering Algorithm

MapReduce Design of K-Means Clustering Algorithm Prajesh P Anchalia Department of CSE, Email: prajeshanchalia@yahoo.in Anjan K Koundinya Department of CSE, Email: annjank@gmail.com Srinath N K Department

### Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

### Inner Classification of Clusters for Online News

Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant

### ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of

### International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### A Novel Density based improved k-means Clustering Algorithm Dbkmeans

A Novel Density based improved k-means Clustering Algorithm Dbkmeans K. Mumtaz 1 and Dr. K. Duraiswamy 2, 1 Vivekanandha Institute of Information and Management Studies, Tiruchengode, India 2 KS Rangasamy

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

### HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM

HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:

### Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

### CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment

### Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig

### Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

### Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### CLUSTER ANALYSIS FOR SEGMENTATION

CLUSTER ANALYSIS FOR SEGMENTATION Introduction We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every

### Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

### A Survey of Kernel Clustering Methods

A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Introduction to Clustering

Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi

### Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

### SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

### Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

### Cluster analysis with SPSS: K-Means Cluster Analysis

analysis with SPSS: K-Means Analysis analysis is a type of data classification carried out by separating the data into groups. The aim of cluster analysis is to categorize n objects in k (k>1) groups,

### Web Usage Mining: Identification of Trends Followed by the user through Neural Network

International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web

### Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

### The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study

WCE 23, July 3-5, 23, London, U.K. The Effect of Clustering in the Apriori Data Mining Algorithm: A Case Study Nergis Yılmaz and Gülfem Işıklar Alptekin Abstract Many organizations collect and store data

### A comparison of various clustering methods and algorithms in data mining

Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

### EXTENDED ANGEL: KNOWLEDGE-BASED APPROACH FOR LOC AND EFFORT ESTIMATION FOR MULTIMEDIA PROJECTS IN MEDICAL DOMAIN

EXTENDED ANGEL: KNOWLEDGE-BASED APPROACH FOR LOC AND EFFORT ESTIMATION FOR MULTIMEDIA PROJECTS IN MEDICAL DOMAIN Sridhar S Associate Professor, Department of Information Science and Technology, Anna University,

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### DEA implementation and clustering analysis using the K-Means algorithm

Data Mining VI 321 DEA implementation and clustering analysis using the K-Means algorithm C. A. A. Lemos, M. P. E. Lins & N. F. F. Ebecken COPPE/Universidade Federal do Rio de Janeiro, Brazil Abstract

### Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process

Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek Abstract Depending

### Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

### International Journal of Software and Web Sciences (IJSWS) Web Log Mining Based on Improved FCM Algorithm using Multiobjective

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

### Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

### Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

### uclust - A NEW ALGORITHM FOR CLUSTERING UNSTRUCTURED DATA

uclust - A NEW ALGORITHM FOR CLUSTERING UNSTRUCTURED DATA D. Venkatavara Prasad, Sathya Madhusudanan and Suresh Jaganathan Department of Computer Science and Engineering, SSN College of Engineering, Chennai,

### Clustering UE 141 Spring 2013

Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

### Clustering Genetic Algorithm

Clustering Genetic Algorithm Petra Kudová Department of Theoretical Computer Science Institute of Computer Science Academy of Sciences of the Czech Republic ETID 2007 Outline Introduction Clustering Genetic

### Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain Situations

Vol.2, Issue.6, Nov-Dec. 2012 pp-4094-4101 ISSN: 2249-6645 Enhanced K-Mean Algorithm to Improve Decision Support System Under Uncertain Situations Ahmed Bahgat El Seddawy 1, Prof. Dr. Turky Sultan 2, Dr.

### A Review of Anomaly Detection Techniques in Network Intrusion Detection System

A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In

### Data mining knowledge representation

Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

### Visualization of Breast Cancer Data by SOM Component Planes

International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

### A survey on Data Mining based Intrusion Detection Systems

International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion

### Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application

Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application Marta Marrón Romera, Miguel Angel Sotelo Vázquez, and Juan Carlos García García Electronics Department, University

### INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### A NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM

A NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM PPrabhu 1 and NAnbazhagan 2 1 Directorate of Distance Education, Alagappa University, Karaikudi, Tamilnadu, INDIA 2 Department of Mathematics,

### IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

### ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

### Distributed Framework for Data Mining As a Service on Private Cloud

RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

### Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

### IMPLEMENTATION OF DATA MINING IN ONLINE SHOPPING SYSTEM USING TANAGRA TOOL

International Journal of Computer Science and Engineering (IJCSE) ISSN 2278-9960 Vol. 2, Issue 1, Feb 2013, 47-58 IASET IMPLEMENTATION OF DATA MINING IN ONLINE SHOPPING SYSTEM USING TANAGRA TOOL VISHAL

### A QoS-Aware Web Service Selection Based on Clustering

International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

### A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

### Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

### Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch

Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global

### K-Means Clustering Tutorial

K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July