A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining

Size: px
Start display at page:

Download "A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining"

Transcription

1 A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining Binu Thomas and Rau G 2, Research Scholar, Mahatma Gandhi University,Kerala, India. binumarian@rediffmail.com 2 SCMS School of Technology & Management, Cochin, Kerala, India. kurupgrau@rediffmail.com Abstract In data mining, the conventional clustering algorithms have difficulties in handling the challenges posed by the collection of natural data which is often vague and uncertain. Fuzzy clustering methods have the potential to manage such situations efficiently. This paper introduces the limitations of conventional clustering methods through k-means and fuzzy c-means clustering and demonstrates the drawbacks of the algorithms in handling outlier points. In this paper, we propose a new fuzzy clustering method which is more efficient in handling outlier points than conventional fuzzy c-means algorithm. The new method excludes outlier points by giving them extremely small membership values in existing clusters while fuzzy c-means algorithm tends give them outsized membership values. The new algorithm also incorporates the positive aspects of k-means algorithm in calculating the new cluster centers in a more efficient approach than the c-means method. Index Terms fuzzy clustering, outlier points, knowledge discovery, c-means algorithm I. INTRODUCTION The process of finding useful patterns and information from raw data is often known as Knowledge discovery in databases or KDD. Data mining is a particular step in this process involving the application of specific algorithms for extracting patterns (models from data [5]. Cluster analysis is a technique for breaking data down into related components in such a way that patterns and order becomes visible. It aims at sifting through large volumes of data in order to reveal useful information in the form of new relationships, patterns, or clusters, for decisionmaking by a user. Clusters are natural groupings of data items based on similarity metrics or probability density models. Clustering algorithms maps a new data item into one of several known clusters. In fact cluster analysis has the virtue of strengthening the exposure of patterns and behavior as more and more data becomes available [7]. A cluster has a center of gravity which is basically the weighted average of the cluster. Membership of a data item in a cluster can be determined by measuring the distance from each cluster center to the data point [6]. The data item is added to a cluster for which this distance is a minimum. This paper provides an overview of the crisp clustering technique, advantages and limitations of fuzzy c-means clustering and a new fuzzy clustering method which is simple and superior to c-means clustering in handling outlier points. Section 2 describes the basic notions of 6 clustering and also introduces k-means clustering algorithm. In Section 3 we explain the concept of vagueness and uncertainty in natural data. Section 4 introduces the fuzzy clustering method and describes how it can handle vagueness and uncertainty with the concept of overlapping clusters with partial membership functions. The same section also introduces the most common fuzzy clustering algorithm namely c-means algorithm and it ends with the limitations of it. Section 5 proposes the new fuzzy clustering method. Section 6 demonstrates the concepts presented in the paper. Finally, section 7 concludes the paper. II. CRISP CLUSTERING TECHNIQUES Traditional clustering techniques attempt to segment data by grouping related attributes in uniquely defined clusters. Each data point in the sample space is assigned to only one cluster. K-means algorithm and its different variations are the most well-known and commonly used partitioning methods. The value k stands for the number of cluster seeds initially provided for the algorithm. This algorithm takes the input parameter k and partitions a set of m obects into k clusters [7]. The technique work by computing the distance between a data point and the cluster center to add an item into one of the clusters so that intra-cluster similarity is high but inter-cluster similarity is low. A common method to find the distance is to calculate to sum of the squared difference as follows and it is known as the Euclidian distance [0](exp.. where, d k = 2 k d k X Ci ( n : is the distance of the k th data point from C With the definition of the distance of a data point from the cluster centers, the k-means the algorithm is fairly simple. The cluster centers are randomly initialized and we assign a data point x i into a cluster to which it has minimum distance. When all the data points have been assigned to clusters, new cluster centers are calculated by finding the weighted average of all data points in a cluster. The cluster center calculation causes the previous centroid location to move towards the center of the cluster set. This is continued until there is no change in cluster centers.

2 A. Limitations of k-means algorithm The main limitation of the algorithm comes from its crisp nature in assigning cluster membership to data points. Depending on the minimum distance, a data point always becomes a member of one of the clusters. This works well with highly structured data. The real world data is almost never arranged in clear cut groups. Instead, clusters have ill defined boundaries that smear into the data space often overlapping the perimeters of surrounding clusters [4]. In most of the cases the real world data have apparent extraneous data points clearly not belonging to any of the clusters and they are called outlier points. The k-means algorithm is not capable of dealing with overlapping clusters and outlier points since it has to include a data point into one of the existing clusters. Because of this even extreme outlier points will be included in to a cluster based on the minimum distance. III. FUZZY LOGIC The modeling of imprecise and qualitative knowledge, as well as handling of uncertainty at various stages is possible through the use of fuzzy sets. Fuzzy logic is capable of supporting, to a reasonable extent, human type reasoning in natural form by allowing partial membership for data items in fuzzy subsets [2]. Integration of fuzzy logic with data mining techniques has become one of the key constituents of soft computing in handling the challenges posed by the massive collection of natural data [].Fuzzy logic is logic of fuzzy sets. A Fuzzy set has, potentially, an infinite range of truth values between one and zero[3]. Propositions in fuzzy logic have a degree of truth, and membership in fuzzy sets can be fully inclusive, fully exclusive, or some degree in between[3].the fuzzy set is distinct from a crisp set is that it allows the elements to have a degree of membership. The core of a fuzzy set is its membership function: a function which defines the relationship between a value in the sets domain and its degree of membership in the fuzzy set(exp 2. The relationship is functional because it returns a single degree of membership for any value in the domain[]. µ=f(s,x (2 Here, µ : is he fuzzy membership value for the element s : is the fuzzy set x : is the value from the underlying domain. data points are assigned membership values for each of the clusters. The fuzzy clustering algorithms allow the clusters to grow into their natural shapes [5]. In some cases the membership value may be zero indicating that the data point is not a member of the cluster under consideration. Many crisp clustering techniques have difficulties in handling extreme outliers but fuzzy clustering algorithms tend to give them very small membership degree in surrounding clusters [4]. The non-zero membership values, with a maximum of one, show the degree to which the data point represents a cluster. Thus fuzzy clustering provides a flexible and robust method for handling natural data with vagueness and uncertainty. In fuzzy clustering, each data point will have an associated degree of membership for each cluster. The membership value is in the range zero to one and indicates the strength of its association in that cluster. A. C-means fuzzy clustering algorithm[0] Fuzzy c-means clustering involves two processes: the calculation of cluster centers and the assignment of points to these centers using a form of Euclidian distance. This process is repeated until the cluster centers stabilize. The algorithm is similar to k-means clustering in many ways but it assigns a membership value to the data items for the clusters within a range of 0 to. So it incorporates fuzzy set s concepts of partial membership and forms overlapping clusters to support it. The algorithm needs a fuzzification parameter m in the range [,n] which determines the degree of fuzziness in the clusters. When m reaches the value of the algorithm works like a crisp partitioning algorithm and for larger values of m the overlapping of clusters is tend to be more. The algorithm calculates the membership value µ with the formula, m d i (3 µ = p m k = d ki where µ (x i : is the membership of x i in the th cluster d i : is the distance of x i in cluster c m : is the fuzzification parameter p : is the number of specified clusters d ki : is the distance of xi in cluster C k Fuzzy sets provide a means of defining a series of overlapping concepts for a model variable since it represent degrees of membership. The values from the complete universe of discourse for a variable can have memberships in more than one fuzzy set. IV. FUZZY CLUSTERING METHODS The central idea in fuzzy clustering is the non-unique partitioning of the data in a collection of clusters. The 62 The new cluster centers are calculated with these membership values using the exp. 4. c = where C x i m [ µ ] i [ µ ] i x i m (4 : is the center of the th cluster : is the i th data point

3 µ : the function which returns the membership m : is the fuzzification parameter This is a special form of weighted average. We modify the degree of fuzziness in x i s current membership and multiply this by x i. The product obtained is divided by the sum of the fuzzified membership. The first loop of the algorithm calculates membership values for the data points in clusters and the second loop recalculates the cluster centers using these membership values. When the cluster center stabilizes (when there is no change the algorithm ends. B. Limitations of the algorithm The fuzzy c-means approach to clustering suffers from several constrains that affect the performance [0]. The main drawback is from the restriction that the sum of membership values of a data point x i in all the clusters must be one as in Expression (5, and this tends to give high membership values for the outlier points. So the algorithm has difficulty in handling outlier points. Secondly the membership of a data point in a cluster depends directly on the membership values of other cluster centers and this sometimes happens to produce undesirable results. p = = µ (5 In fuzzy c-means method a point will have partial membership in all the clusters. The exp.(4 for calculating the new cluster centers finds a special form of weighted average of all the data points. The third limitation of the algorithm is that due to the influence (partial membership of all the data members, the cluster centers tend to move towards the center of all the data points [0].The fourth constrain of the algorithm is its inability to calculate the membership value if the distance of a data point is zero(exp.3 TABLE. FUZZY C-MEANS ALGORITHM initialize p=number of clusters initialize m=fuzzification parameter initialize C (cluster centers Repeat For i= to n :Update µ (x i applying (3 For = to p :Update C i with(4with current µ (x i Until C estimate stabilize V. THE NEW FUZZY CLUSTERING METHOD The new fuzzy clustering method we propose, removes the restriction imposed by exp (4. Due to this constrain the c-means algorithm tends to give more membership values for the outlier points. In c-means algorithm, the membership of a point a cluster is calculated based on its membership in other clusters. Many limitations of the algorithm arise due to this and in the new method the membership of a point in a cluster center depends only on its distance in that cluster. For calculating the membership values, we use a new simple expression as given in exp (6. ( x = Max ( d Max d ( d i µ i (6 Where µ (x i : is the membership of x i in the th cluster d i : is the distance of x i in cluster c Max(d : is the maximum distance in the cluster c Since Max ( d =, the above membership function Max ( d (exp.6 will generate values closer to one( for smaller distances (d i and a membership value of zero for the maximum distance. If the distance of a data point is zero then the function returns a membership value of one and thus it overcomes the fourth constrain of c-means algorithm. The membership values are calculated only based on the distance of a data member in the cluster and due to this the method does not suffer from the first and second constrains of c-means algorithm. To overcome the third limitation of c-means algorithm in calculating new cluster centers, the new method inherits the features of k- means algorithm. For this purpose, A data point is completely assigned to a cluster where it has got maximum membership and the point is used only for the calculation of new cluster center in that cluster. This way the influence of a data point in the calculation of all the cluster centers can be avoided. The point is used of the calculations only if its membership value falls above a threshold value α. This way we can ensure that outlier points are not considered for new cluster center calculations. Unlike the crisp clustering algorithms like k-means, most of the fuzzy clustering algorithms are sensitive to the selection of initial centroids. The effectiveness of the algorithms depends on the initial cluster seeds [7]. For better results we suggest to use k-means algorithm to get the initial cluster seeds. This way we can ensure that the algorithm converges with best possible results. In the new method, since the membership values vary strictly from 0 to, we find that initializing the threshold value of α to 0.5 produces better results in cluster center calculation. 63

4 TABLE 2. THE NEW FUZZY CLUSTERING ALGORITHM initialize p=number of clusters initialize C (cluster centers initialize α (threshold value Repeat For i= to n :Update µ (x i applying (6 For k= to p : Sum=0 Count=0 For i= to n : If µ(x i is maximum in C k then If µ(x i >=α Sum=Sum+x i count=count + C k =Sum/count Until C estimate stabilize The fuzzy membership values found with the expression 6. can be used in the new fuzzy clustering algorithm given in table2. The algorithm stops when the cluster centers stabilize. The algorithm will be more efficient in handling data with outlier points and in overcoming other constrains imposed by c-means algorithm. The new method we propose is also far superior in the calculation of new cluster centers, which we demonstrate in the next session with two numerical examples. VI. ILLUSTRATIONS In order to find the effectiveness of the new algorithm, we applied it with a small synthetic data set to demonstrate the functioning of the algorithm in detail. The algorithm is also tested with real data collected for Bhutan s Gross National Happiness (GNH program. A. With synthetic data Consider the sales of an item from different shops at different periods of the year, from figure. There is an outlier point at (2,400. If we start data exploration with two random cluster centers at C(.5,50 and C2(8,50, the algorithm ends with the cluster centers at C(.52,66.20 and C2(5.49,72.. We also applied c- means algorithm with the same data set and we found that the algorithm converges to cluster centers C(2.79,47.65 and C2(3.29, Figure. The data points and final overlapping clusters We also found that the algorithm is far superior to c- means in handling outlier points. (See table 3. The new algorithm is capable of giving very low membership values to the outlier points. If we start with C at (.5,50, the algorithm takes four steps to converge the first cluster center to(.52, Similarly the second cluster center C2 converges to (5.49,72. from (8,50 in four steps. The algorithm finally ends with the final cluster centers at C(.5,66.2 and C2 (5.4,72. and with two overlapping clusters as shown in figure. The outlier point is given zero membership in both the clusters. From Table 3, it is clear that the new fuzzy clustering method we propose is better than the conventional c- means algorithm in handling outlier points and in the calculation of new cluster centers. Due the constrain given in expression 5, the c-means algorithm gives membership values.37 and.63 to the outlier point (2,400 so that the sum of memberships of this point is one. But the new algorithm gives zero membership to this point. 6.2 With Natural Data Happiness and satisfaction are directly related with a community s ability to meet their basic needs and these are important factors in safeguarding their physical health. The unique concept of Bhutan s Gross National Happiness (GNH depends on nine factors like health, ecosystem, emotional well being etc.[6]. GNH regional chapter at Sherubtse College conducted a survey among 3 villagers and the responses were converted into numeric values. For the analysis of the new method we took the attributes income and health index as shown in figure 2. TABLE 3. C-MEANS AND NEW METHOD Methods Final Cluster Centers Outlier membership Point - 2,400 C-means C 2.79, C2 3.29, New Method C.52, C2 5.49,72. 0 Figure 2. The data points 64

5 Cent ers TABLE 4. THE FINAL CLUSTER CENTERS Final Cluster Centers C-means New Method X Y X Y C C C As we can see from the data set, in Bhutan the low income group maintains better health than high income group since they are self sufficient in many ways. Like any other natural data this data set also contains many outlier points which do not belong to any of the groups. If we apply c-means algorithm these points tend to get more membership values due to exp.4. To start the data analysis, first we applied k-means algorithm to find the initial three cluster centers. The algorithm ended with three cluster centers at C(24243,6.7,C2(69794,5. and C3(979.29,2.72. We applied these initial values in both c-means algorithm and the new method to analyze the data and the algorithms ended with centroids as given in Table 4. From figure 2, and table 4, It can be seen that the final centroids of c-means method does not represent the actual centers of the clusters and this is due to the influence of outlier points. The memberships of all the points are also considered for the calculation of cluster centers. So the cluster centers tend to move towards the centre of all the points. But the new method identifies the cluster centers in a better way. The efficiency of the new algorithm lies in its ability to provide very low membership values to outlier points and also to consider a point only for the calculation of one cluster centre. VII. CONCLUSION A good clustering algorithm produces high quality clusters to yield low inter cluster similarity and high intra cluster similarity. Many conventional clustering algorithms like k-means and fuzzy c-means algorithm achieve this on crisp and highly structured data. But they have difficulties in handling unstructured natural data which often contain outlier data points. The proposed new fuzzy clustering algorithm combines the positive aspects of both crisp and fuzzy clustering algorithms. It is more efficient in handling the natural data with outlier points than both k-means and fuzzy c-means algorithm. It achieves this by assigning very low membership values to the outlier points. But The Algorithm has limitations in exploring highly structured crisp data which is free from outlier points. The efficiency of the algorithm has to be further tested on a comparatively larger data set. REFERENCES [] Sankar K. Pal, P. Mitra, Data Mining in Soft Computing Framework: A Survey, IEEE transactions on neural networks, vol. 3, no., January 2002 [2] R. Cruse, C. Borgelt, Fuzzy Data Analysis Challenges and Perspective kruse99fuzzy.html [3] G. Rau, Th. Shanta Kumar, Binu Thomas, Integration of Fuzzy Logic in Data Mining: A comparativecase Study, Proc. of International Conf. on Mathematicsand Computer Science, Loyola College, Chennai, 28-36,2008 [4]Maria Halkidi, Quality assessment and Uncertainty Handling in Data Mining Process, halkidi00quality.html [5] W. H. Inmon The data warehouse and data mining, Commn. ACM, vol. 39, pp , 996. [6] U. Fayyad and R. Uthurusamy, Data mining and knowledge discovery in databases, Commn. ACM, vol. 39, pp , 996. [7] Pavel Berkhin, Survey of Clustering Data Mining Techniques, [8] Chau, M., Cheng, R., and Kao, B, Uncertain Data Mining: A New Research Direction, /~mchau/papers/uncertaindatamining_wsa.pdf [9] Keith C.C, C. Wai-Ho Au, B. Choi, Mining Fuzzy Rules in A Donor Database for Direct Marketing by A Charitable Organization, Proc of First IEEE International Conference on Cognitive Informatics, pp: , 2002 [0] E. Cox, Fuzzy Modeling And Genetic Algorithms For Data Mining And Exploration, Elsevier, 2005 [] G. J Klir, T A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall,988 [2] J Han, M Kamber, Data Mining Concepts and Techniques, Elsevier, 2003 [3] J. C. Bezdek, Fuzzy Mathematics in Pattern Classification, Ph.D. thesis, Center for Applied Mathematics, Cornell University, Ithica, N.Y., 973. [4] Carl G. Looney, A Fuzzy Clustering and Fuzzy Merging Algorithm, [5] Frank Klawonn, Anette Keller, Fuzzy Clustering Based on Modified Distance Measures, [6] Sullen Donnelly, How Bhutan Can Develop and Measure GNH, gnh/gnh-papers-st_8-20.pdf [7] Lei Jiang and Wenhui Yang, A Modified Fuzzy C-Means Algorithm for Segmentation of Magnetic Resonance Images Proc. VIIth Digital Image Computing: Techniques and Applications, pp , 0-2 Dec. 2003, Sydney. 65

Enhanced Customer Relationship Management Using Fuzzy Clustering

Enhanced Customer Relationship Management Using Fuzzy Clustering Enhanced Customer Relationship Management Using Fuzzy Clustering Gayathri. A, Mohanavalli. S Department of Information Technology,Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai,

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between

More information

A FUZZY LOGIC APPROACH FOR SALES FORECASTING

A FUZZY LOGIC APPROACH FOR SALES FORECASTING A FUZZY LOGIC APPROACH FOR SALES FORECASTING ABSTRACT Sales forecasting proved to be very important in marketing where managers need to learn from historical data. Many methods have become available for

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

More information

Fuzzy Clustering Technique for Numerical and Categorical dataset

Fuzzy Clustering Technique for Numerical and Categorical dataset Fuzzy Clustering Technique for Numerical and Categorical dataset Revati Raman Dewangan, Lokesh Kumar Sharma, Ajaya Kumar Akasapu Dept. of Computer Science and Engg., CSVTU Bhilai(CG), Rungta College of

More information

SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM

SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM Mrutyunjaya Panda, 2 Manas Ranjan Patra Department of E&TC Engineering, GIET, Gunupur, India 2 Department

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar

More information

Quality Assessment in Spatial Clustering of Data Mining

Quality Assessment in Spatial Clustering of Data Mining Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering

More information

Project Management Efficiency A Fuzzy Logic Approach

Project Management Efficiency A Fuzzy Logic Approach Project Management Efficiency A Fuzzy Logic Approach Vinay Kumar Nassa, Sri Krishan Yadav Abstract Fuzzy logic is a relatively new technique for solving engineering control problems. This technique can

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

Data Mining and Soft Computing. Francisco Herrera

Data Mining and Soft Computing. Francisco Herrera Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: herrera@decsai.ugr.es http://sci2s.ugr.es

More information

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009 Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Knowledge Based Descriptive Neural Networks

Knowledge Based Descriptive Neural Networks Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: jtyao@cs.uregina.ca Abstract This paper presents a

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Fuzzy Logic Based Revised Defect Rating for Software Lifecycle Performance. Prediction Using GMR

Fuzzy Logic Based Revised Defect Rating for Software Lifecycle Performance. Prediction Using GMR BIJIT - BVICAM s International Journal of Information Technology Bharati Vidyapeeth s Institute of Computer Applications and Management (BVICAM), New Delhi Fuzzy Logic Based Revised Defect Rating for Software

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths

More information

Data Mining 資 料 探 勘. 分 群 分 析 (Cluster Analysis)

Data Mining 資 料 探 勘. 分 群 分 析 (Cluster Analysis) Data Mining 資 料 探 勘 Tamkang University 分 群 分 析 (Cluster Analysis) DM MI Wed,, (:- :) (B) Min-Yuh Day 戴 敏 育 Assistant Professor 專 任 助 理 教 授 Dept. of Information Management, Tamkang University 淡 江 大 學 資

More information

Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

More information

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Intrusion Detection Using Data Mining Along Fuzzy Logic and Genetic Algorithms

Intrusion Detection Using Data Mining Along Fuzzy Logic and Genetic Algorithms IJCSNS International Journal of Computer Science and Network Security, VOL.8 No., February 8 7 Intrusion Detection Using Data Mining Along Fuzzy Logic and Genetic Algorithms Y.Dhanalakshmi and Dr.I. Ramesh

More information

K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo.

K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo. Enhanced Binary Small World Optimization Algorithm for High Dimensional Datasets K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo.com

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

A survey on Data Mining based Intrusion Detection Systems

A survey on Data Mining based Intrusion Detection Systems International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Clustering Data Streams

Clustering Data Streams Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

DATA MINING PROCESS FOR UNBIASED PERSONNEL ASSESSMENT IN POWER ENGINEERING COMPANY

DATA MINING PROCESS FOR UNBIASED PERSONNEL ASSESSMENT IN POWER ENGINEERING COMPANY Medunarodna naucna konferencija MENADŽMENT 2012 International Scientific Conference MANAGEMENT 2012 Mladenovac, Srbija, 20-21. april 2012 Mladenovac, Serbia, 20-21 April, 2012 DATA MINING PROCESS FOR UNBIASED

More information

Detection of DDoS Attack Scheme

Detection of DDoS Attack Scheme Chapter 4 Detection of DDoS Attac Scheme In IEEE 802.15.4 low rate wireless personal area networ, a distributed denial of service attac can be launched by one of three adversary types, namely, jamming

More information

Fuzzy Candlestick Approach to Trade S&P CNX NIFTY 50 Index using Engulfing Patterns

Fuzzy Candlestick Approach to Trade S&P CNX NIFTY 50 Index using Engulfing Patterns Fuzzy Candlestick Approach to Trade S&P CNX NIFTY 50 Index using Engulfing Patterns Partha Roy 1, Sanjay Sharma 2 and M. K. Kowar 3 1 Department of Computer Sc. & Engineering 2 Department of Applied Mathematics

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

How To Solve The Cluster Algorithm

How To Solve The Cluster Algorithm Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz adriano@nce.ufrj.br

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

How To Understand The History Of Navigation In French Marine Science

How To Understand The History Of Navigation In French Marine Science E-navigation, from sensors to ship behaviour analysis Laurent ETIENNE, Loïc SALMON French Naval Academy Research Institute Geographic Information Systems Group laurent.etienne@ecole-navale.fr loic.salmon@ecole-navale.fr

More information

Types of Degrees in Bipolar Fuzzy Graphs

Types of Degrees in Bipolar Fuzzy Graphs pplied Mathematical Sciences, Vol. 7, 2013, no. 98, 4857-4866 HIKRI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.37389 Types of Degrees in Bipolar Fuzzy Graphs Basheer hamed Mohideen Department

More information

SoSe 2014: M-TANI: Big Data Analytics

SoSe 2014: M-TANI: Big Data Analytics SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering

More information

Enhanced data mining analysis in higher educational system using rough set theory

Enhanced data mining analysis in higher educational system using rough set theory African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review

More information

K-Means Clustering Tutorial

K-Means Clustering Tutorial K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

More information

Prototype-based classification by fuzzification of cases

Prototype-based classification by fuzzification of cases Prototype-based classification by fuzzification of cases Parisa KordJamshidi Dep.Telecommunications and Information Processing Ghent university pkord@telin.ugent.be Bernard De Baets Dep. Applied Mathematics

More information

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment

More information

A comparison of various clustering methods and algorithms in data mining

A comparison of various clustering methods and algorithms in data mining Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

More information

A Two-Step Method for Clustering Mixed Categroical and Numeric Data

A Two-Step Method for Clustering Mixed Categroical and Numeric Data Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster

More information

Degree of Uncontrollable External Factors Impacting to NPD

Degree of Uncontrollable External Factors Impacting to NPD Degree of Uncontrollable External Factors Impacting to NPD Seonmuk Park, 1 Jongseong Kim, 1 Se Won Lee, 2 Hoo-Gon Choi 1, * 1 Department of Industrial Engineering Sungkyunkwan University, Suwon 440-746,

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

Optimization of Fuzzy Inventory Models under Fuzzy Demand and Fuzzy Lead Time

Optimization of Fuzzy Inventory Models under Fuzzy Demand and Fuzzy Lead Time Tamsui Oxford Journal of Management Sciences, Vol. 0, No. (-6) Optimization of Fuzzy Inventory Models under Fuzzy Demand and Fuzzy Lead Time Chih-Hsun Hsieh (Received September 9, 00; Revised October,

More information

Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining

Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining by Ashish Mangalampalli, Vikram Pudi Report No: IIIT/TR/2008/127 Centre for Data Engineering International Institute of Information Technology

More information

Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel

Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel Fortgeschrittene Computerintensive Methoden: Fuzzy Clustering Steffen Unkel Institut für Statistik LMU München Sommersemester 2013 Outline 1 Setting the scene 2 Methods for fuzzy clustering 3 The assessment

More information

Introduction to Fuzzy Control

Introduction to Fuzzy Control Introduction to Fuzzy Control Marcelo Godoy Simoes Colorado School of Mines Engineering Division 1610 Illinois Street Golden, Colorado 80401-1887 USA Abstract In the last few years the applications of

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms 8 Cluster Analysis: Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should

More information

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

ISSN: 2277-3754 ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 3, September 2013

ISSN: 2277-3754 ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 3, September 2013 Performance Appraisal using Fuzzy Evaluation Methodology Nisha Macwan 1, Dr.Priti Srinivas Sajja 2 Assistant Professor, SEMCOM 1 and Professor, Department of Computer Science 2 Abstract Performance is

More information

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

TDWI Best Practice BI & DW Predictive Analytics & Data Mining TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost

More information

STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL Stock Asian-African Market Trends Journal using of Economics Cluster Analysis and Econometrics, and ARIMA Model Vol. 13, No. 2, 2013: 303-308 303 STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL

More information

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

More information

Segmentation of stock trading customers according to potential value

Segmentation of stock trading customers according to potential value Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS

ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of

More information

A Survey on Parallel Method for Rough Set using MapReduce Technique for Data Mining

A Survey on Parallel Method for Rough Set using MapReduce Technique for Data Mining www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 9 Sep 2015, Page No. 14160-14163 A Survey on Parallel Method for Rough Set using MapReduce Technique

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA. Kevin Voges University of Canterbury. Nigel Pope Griffith University

A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA. Kevin Voges University of Canterbury. Nigel Pope Griffith University Abstract A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA Kevin Voges University of Canterbury Nigel Pope Griffith University Mark Brown University of Queensland Track: Marketing Research and Research

More information

A Study of Various Methods to find K for K-Means Clustering

A Study of Various Methods to find K for K-Means Clustering International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 A Study of Various Methods to find K for K-Means Clustering Hitesh Chandra Mahawari

More information

Solving Mining Aspects Problem by Non Linear Regression Technique and FIRS

Solving Mining Aspects Problem by Non Linear Regression Technique and FIRS Solving Mining Aspects Problem by Non Linear Regression Technique and FIRS DivyaJyoti Shrivastav 1 Abstract- Data mining deals with extracting knowledge from large and infinite amount of stream data with

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

Colour Image Segmentation Technique for Screen Printing

Colour Image Segmentation Technique for Screen Printing 60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

More information

USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva

USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva 382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent

More information

A New Approach For Estimating Software Effort Using RBFN Network

A New Approach For Estimating Software Effort Using RBFN Network IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 008 37 A New Approach For Estimating Software Using RBFN Network Ch. Satyananda Reddy, P. Sankara Rao, KVSVN Raju,

More information

Real Time Traffic Balancing in Cellular Network by Multi- Criteria Handoff Algorithm Using Fuzzy Logic

Real Time Traffic Balancing in Cellular Network by Multi- Criteria Handoff Algorithm Using Fuzzy Logic Real Time Traffic Balancing in Cellular Network by Multi- Criteria Handoff Algorithm Using Fuzzy Logic Solomon.T.Girma 1, Dominic B. O. Konditi 2, Edward N. Ndungu 3 1 Department of Electrical Engineering,

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING USER DEFINED PARAMETERS

A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING USER DEFINED PARAMETERS A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING USER DEFINED PARAMETERS A.Vanitha Katherine (1) and K.Alagarsamy (2 ) 1 Department of Master of Computer Applications, PSNA College

More information

S.Thiripura Sundari*, Dr.A.Padmapriya**

S.Thiripura Sundari*, Dr.A.Padmapriya** Structure Of Customer Relationship Management Systems In Data Mining S.Thiripura Sundari*, Dr.A.Padmapriya** *(Department of Computer Science and Engineering, Alagappa University, Karaikudi-630 003 **

More information

Clustering In Data Mining : A Brief Review

Clustering In Data Mining : A Brief Review Clustering In Data Mining : A Brief Review Meenu Sharma Mtech scholar, Department of computer science,sd Bansal college of tehnology,indore. Mr. Kamal Borana Assistant Professor Department of computer

More information

Neural Networks in Data Mining

Neural Networks in Data Mining IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

More information

A Novel Approach for Network Traffic Summarization

A Novel Approach for Network Traffic Summarization A Novel Approach for Network Traffic Summarization Mohiuddin Ahmed, Abdun Naser Mahmood, Michael J. Maher School of Engineering and Information Technology, UNSW Canberra, ACT 2600, Australia, Mohiuddin.Ahmed@student.unsw.edu.au,A.Mahmood@unsw.edu.au,M.Maher@unsw.

More information

IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES

IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPROVISATION OF STUDYING COMPUTER BY CLUSTER STRATEGIES C.Priyanka 1, T.Giri Babu 2 1 M.Tech Student, Dept of CSE, Malla Reddy Engineering

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia

A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value Amir Hossein Azadnia a,*, Pezhman Ghadimi b, Mohammad Molani- Aghdam a a Department of Engineering, Ayatollah Amoli

More information