Grid Density Clustering Algorithm



Similar documents
Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Data Mining Solutions for the Business Environment

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Clustering Model for Evaluating SaaS on the Cloud

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

Introduction to Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

The Scientific Data Mining Process

Data Mining & Data Stream Mining Open Source Tools

An Overview of Database management System, Data warehousing and Data Mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Information Management course

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Unsupervised Data Mining (Clustering)

SPATIAL DATA CLASSIFICATION AND DATA MINING

Quality Assessment in Spatial Clustering of Data Mining

A Survey on Intrusion Detection System with Data Mining Techniques

Cluster Analysis: Advanced Concepts

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Healthcare Measurement Analysis Using Data mining Techniques

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

Data Mining: Overview. What is Data Mining?

Specific Usage of Visual Data Analysis Techniques

Foundations of Artificial Intelligence. Introduction to Data Mining

The Prophecy-Prototype of Prediction modeling tool

not possible or was possible at a high cost for collecting the data.

Data Mining for Fun and Profit

Chapter 7. Cluster Analysis

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Dynamic Data in terms of Data Mining Streams

Data Mining + Business Intelligence. Integration, Design and Implementation

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

A survey on Data Mining based Intrusion Detection Systems

[Kaur, 3(3): March, 2014] ISSN: Impact Factor: 1.852

A Comparative Study of clustering algorithms Using weka tools

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

A Comparative Analysis of Various Clustering Techniques used for Very Large Datasets

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students

MS1b Statistical Data Mining

IJCSES Vol.7 No.4 October 2013 pp Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

REVIEW OF ENSEMBLE CLASSIFICATION

Social Media Mining. Data Mining Essentials

Classification algorithm in Data mining: An Overview

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Image Compression through DCT and Huffman Coding Technique

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Using multiple models: Bagging, Boosting, Ensembles, Forests

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

EXTENDED CENTROID BASED CLUSTERING TECHNIQUE FOR ONLINE SHOPPING FRAUD DETECTION

Hadoop Operations Management for Big Data Clusters in Telecommunication Industry

Credit Card Fraud Detection Using Self Organised Map

COURSE RECOMMENDER SYSTEM IN E-LEARNING

Data Mining K-Clustering Problem

The Data Mining Process

A comparison of various clustering methods and algorithms in data mining

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Data Mining Applications in Higher Education

2.1. Data Mining for Biomedical and DNA data analysis

ISSN: A Review: Image Retrieval Using Web Multimedia Mining

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

Data Mining Algorithms Part 1. Dejan Sarka

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Subject Description Form

International Journal of Advanced Computer Technology (IJACT) ISSN: PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Clustering Data Streams

IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Customer Classification And Prediction Based On Data Mining Technique

Hexaware E-book on Predictive Analytics

ISSN: (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

International Journal of Innovative Research in Computer and Communication Engineering

Role of Social Networking in Marketing using Data Mining

A comparative study of data mining (DM) and massive data mining (MDM)

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

Transcription:

Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2 Abstract: Data mining is the method of finding the useful information in huge data repositories. Clustering is the significant task of the data mining. It is an unsupervised learning task. Similar data items are grouped together to form clusters. These days the clustering plays a major role in every day-to-day application. In this paper, the field of KDD i.e. Knowledge Discovery in Databases, Data mining, clustering analysis and the prevailing the Grid Density Clustering Algorithm are described. Keywords: Data mining, KDD, Clustering, Cluster Analysis, Grid Density Clustering Algorithm. I. INTRODUCTION KDD stands on the Knowledge Discovery in Databases is the process of finding associations and patterns in raw data automatically from large databases and gives the output results. In particular, the KDD process consists of the following steps: 1. Selection and Integration: The data that is to be mined may exist in dissimilar and mixed data form. Therefore, It is necessary to first selecting the appropriate data from a variety of databases for analysis and integrate that data values into a logical data store. 2. Pre-processing: Raw data may have invalid or missing values. Invalid values are corrected while misplaced values are supplied or predicted. 3. Transformation: The data are transformed into representations that are suitable for mining tasks. 4. Data Mining: Data Mining is the core part of the KDD process referring to the purpose of intelligent techniques to take out the secreted knowledge from the transformed data. Figure1: Steps of the KDD process. 5. Interpretation and Evaluation: A data mining system has the ability to produce a great number of patterns but only a small fraction of them may be of attention. Therefore the suitable methods are required to estimate the valuable results. Copyright to IJIRSET www.ijirset.com 5548

Data Mining Data mining is the method of without human intervention for finding the useful information in huge data repositories. Data mining involves six common tasks: Anomaly Detection, Association rules learning, Classification, Clustering and Regression. In this paper we only discussed the clustering. Clustering is the most significant unsupervised-learning problem. The major function of clustering is to finding a structure of similar data items. Totally, the clustering involves partitioning a given dataset into a number of groups of data whose members are similar in some way. The usability of cluster analysis has been used broadly in data recovery, pattern recognition, text and web mining, software reverse engineering and image segmentation. Figure 2: The four core of Data mining task Specific uses of data mining include: Market segmentation - Identify the universal characteristics of customers who buy the similar products from your company. Customer churn - calculate which customers are likely to go away from your company and buy the products from your competitor. Fraud detection - Identify which transactions are most likely to be fraudulent. Direct marketing - Identify which prediction should be included in a mailing list to gain the maximum response rate. Interactive marketing - Predict what each individual accessing a Web site is most likely interested in seeing. Market basket analysis - Understand and identify the products or services that are usually purchased together; e.g., soap and oil. Trend analysis - Reveal the difference between typical customers this month and last. Dividing the data objects in significant groups of data objects or classes known as cluster are based on familiar attribute and play an important role in how people analyse and explain the world. For an example, children can speedily label the object in a photograph, such as buildings, trees, people and so on. In the field of understanding data, clusters are potential classes, and cluster analysis is a studying method to discover classes. Cluster Analysis Cluster Analysis method as fields grow extreme rapidly with the objective of together data objects, based on information found in data and describing the associations inside the data. Cluster analysis is a descriptive data analysis task that aims to find the intrinsic structure of a collection of data points (or objects) by partitioning them into identical Copyright to IJIRSET www.ijirset.com 5549

clusters based on the values of their attributes. A similarity metric is defined between the data objects, and then similar objects are grouped together to form clusters. Clustering is unsupervised learning since it does not require assumptions about category labels that tag objects with prior identifiers. Since there is no universal definition of clustering, there is no universal measure with which to evaluate clustering algorithms. Figure 3: Different types of clusters II. GRID DENSITY BASED ALGORITHMS Grid Density based clustering is concerned with the value space that surrounds the data points not with the data objects. This algorithm uses the grid data structure and use dense grids to form clusters. It first quantized the original data space into finite number of cells which form the grid structure and then perform all the operations on the quantized space. Grid based clustering maps the infinite amount of data records in data streams to finite numbers of grids. Its main uniqueness is the fastest processing time, since like data points will fall into similar cell and will be treated as a single point. It makes the algorithm self-governing of the number of data points in the original data set. The grid-based clustering algorithms are STING, Wave Cluster, and GDCLU. Grid density takes the advantage of the density and the grid algorithms. Grid density is suitable for handling noise. It can find the arbitrary shaped clusters used for high dimensional data. The grid density algorithm does not require the distance computation. K-mean knows the number of clusters in advance but the grid density does not. Grid density algorithm is better than the k-mean algorithm in clustering. The advantage of grid density method is lower processing time. Therefore, we implement the grid density clustering algorithm for analyse and increase the speed, and accuracy of the dataset. We implement this in the MATLAB environment. The name MATLAB stands for matrix laboratory. MATLAB is a numerical computing environment and fourth-generation programming language. MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. The steps of the grid based algorithm are: 1. Creating the grid structure, in other words divide the data space into a finite number of cells. 2. Calculating the cell density. 3. Sorting of the cells according to their densities. 4. Identifying cluster. Copyright to IJIRSET www.ijirset.com 5550

Figure4: Grid based Clustering GDCLU GDCLU generates major clusters by merging dense grids. Grid density is defined as number of points mapped to one grid. A grid is called sporadic when its density is less than the input argument. Two dense grids should be special neighbours in order to be merged; by merging these grids all their special neighbours also will be joined to the similar cluster. Based on the input parameter density, the algorithm is processed. The different types of the dataset are taken and their performance is analysed III. RESULTS The results obtained from grid density clustering algorithm on different types of dataset based on number of numeric data values are shown in figure 5, 6, 7, 8. As the data set values are increases the computation time also increases. When the number of clusters is more than the execution time is also more. The analytical results of the entire algorithms under the following parameters are implemented for given clustering algorithms and results are shown: Elapsed Time: It is defined as the completion time of the algorithm. Lesser the elapsed time, less time to execute the algorithm more efficient is the algorithm. Elapsed time is calculated by measuring the finishing time of the exit task by the algorithm. Clusters: Clusters defined as the similar data items in one group. More clusters means the more clearly representation of data values and easily findable. When more clusters are formed then elapsed time increases. Density: Grid density is defined as number of points mapped to one grid. A grid is called sporadic when its density is less than the input argument Copyright to IJIRSET www.ijirset.com 5551

Figure5: Computation time and analysis of grid for 4clusters Figure6: Computation time analysis of grid for 9clusters Copyright to IJIRSET www.ijirset.com 5552

Figure7: Computation time and analysis of grid for 28clusters 35 30 25 20 15 cluster time 10 5 0 1 2 3 Figure8: This figure shows that as the accuracy of the data values are increased, and then the execution time is also increased. Copyright to IJIRSET www.ijirset.com 5553

IV. CONCLUSION AND FUTURE SCOPE The clustering is the part of data mining. Clustering is used everywhere in our daily life. Traditional way of clustering is the partitional algorithm that is the k-mean. Although this algorithm is a number of limitations but this is being widely used in every part of the clustering and that clustering is the soft clustering. Grid density takes the advantage of the density and the grid algorithms. Grid density is suitable for handling noise. It can find the arbitrary shaped clusters. Grid density algorithm is better than the k-mean algorithm in clustering. The advantage of grid density method is lower processing time. The input parameters of any algorithms to get the best results are very carefully taken. This algorithm is implemented on the numeric dataset. The execution time depends upon the how large the size of the dataset. The future scope of this work is that we can also implement this hybrid approach clustering algorithm in the alphanumeric data and see the impact of the results and by varying the input parameters. As these results are concluded by using dense grids, the new approach can be developed by using the C-mean with Grid Clustering algorithm. Acknowledgement I would like to thanks the Department of Computer Science & Engineering of RIMT Institutes near Floating Restaurant, Sirhind Side, Mandi Gobindgarh-147301, and Punjab, India. References 1. Wei-keng Liao, et.all. A Grid-based Clustering Algorithm using Adaptive Mesh Refinement, Appears in the 7th Workshop on Mining Scientific and Engineering Datasets, pp.1-9, 2004. 2. Oded Maimon, Lior Rokach, DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK, Springer Science+Business Media.Inc, pp.321-352, 2005. 3. GuiBin Hou, RuiXia Yao, et.all., Irregular Grid-based Clustering Over High-Dimensional Data Streams, IEEE 1 st International Conference on Pervasive Computing, Signal Processing and Applications, pp.783-786, 2010. 4. MR ILANGO, Dr V MOHAN, A Survey of Grid Based Clustering Algorithms, International Journal of Engineering Science and Technology, pp 3441-3446. 2010. 5. Zheng Hua, Wang Zhenxing, et.all, Clustering Algorithm Based on Characteristics of Density Distribution, IEEE 2 nd International Conference On Advanced Computer Control, vol.2, pp.431-435, 2010. 6. Amineh Amini, Teh Ying Wah, et.all, A Study of Density-Grid based Clustering Algorithms on Data Streams,IEEE 8 th International Conference on Fuzzy Systems and Knowledge Discovery, vol.3, pp.1652-1656, 2011. 7. Guohua Lei, Xiang Yu, et.all, An Incremental Clustering Algorithm Based on Grid,IEEE 8 th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp.1099-1103, 2011. 8. Amineh Amini1, Teh Ying Wah, DENGRIS-Stream: A Density-Grid based Clustering Algorithm for Evolving Data Streams over Sliding Window, International Conference on Data Mining and Computer Engineering, pp-206-211, 2012. 9. Cheng-Fa Tsai, Tang-Wei Huang, QIDBSCAN: A Quick Density-Based Clustering Technique, IEEE International Symposium on Computer, Consumer and Control, pp.638-641, 2012 Copyright to IJIRSET www.ijirset.com 5554