A COMPREHENSIVE STUDY OF DATA MINING METHODS WITH IMPLEMENTATION OF NEURAL NETWORK TECHNIQUES




INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 12, December (2014), pp. 264-274
IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2014): 8.5328 (calculated by GISI), www.jifactor.com

A COMPREHENSIVE STUDY OF DATA MINING METHODS WITH IMPLEMENTATION OF NEURAL NETWORK TECHNIQUES

Ms. Aruna J. Chamatkar, Research Scholar, Department of Electronics and Computer Science, RTM Nagpur University, Nagpur
Dr. Pradeep K. Butey, Research Supervisor, HOD, Computer Science Department, Kamla Nehru Mahavidyalaya, Nagpur

ABSTRACT

Data mining methods have been successfully applied in a wide range of unsupervised and supervised learning applications. Neural networks, however, are not commonly used in data mining, because they require long training times and often produce incomprehensible models. In this paper we review our research on data mining and neural networks. The main motive of the research is to study and improve data mining results with neural networks; to this end we summarize the state-of-the-art principles of using neural network models in data mining.

Keywords: Artificial Neural Network, Data Mining, CHARM Algorithm, Top-K Rule Mining, CM-SPAM Algorithm.

I. INTRODUCTION

The central focus of the data mining enterprise is to gain insight into large collections of data. Often, achieving this goal involves applying machine learning methods to inductively construct models of the data at hand. In this article we provide an introduction to the topic of using neural network methods for data mining; neural networks have been applied to a wide variety of problem domains to learn models that are able to perform such interesting tasks.
Although neural network learning algorithms have been successfully applied to a wide range of unsupervised and supervised learning problems, they have not often been applied in data mining settings, in which
two fundamental considerations are the comprehensibility of learned models and the time required to induce models from large data sets. We discuss new developments in neural network learning that effectively address the comprehensibility and speed issues, which are often of prime importance in the data mining community. Specifically, we describe algorithms that are able to extract symbolic rules from trained neural networks and algorithms that are able to directly learn comprehensible models.

Data mining (DM) is the nontrivial extraction of implicit, previously unknown, and potentially useful information (mainly in the form of knowledge models or patterns) from data. The data mining industry has grown rapidly around large business database applications such as e-commerce, where patterns in customer purchasing activities are found in transaction databases. The main data mining challenges were to apply known methods such as neural networks (NN) and decision trees to very large datasets (for example, 100,000 or more records in a single file) and to work with relational database structures. Later, methods such as association rules were developed specifically to address the data mining challenge. Several of the most important data mining problems reduce to machine learning and traditional statistical methods: association rule extraction, prediction, sequence detection, and classification. The techniques used in data mining are very heterogeneous: neural networks, case-based reasoning, Bayesian networks, statistical methods, rough sets, rule induction, decision trees, fuzzy sets, and genetic algorithms/evolutionary programming. The major stages in solving a data mining problem are as follows [1]:
1. Define or identify the problem.
2. Select and collect the data from the data sets, deciding how to collect the data and which data to collect.
3. Transform the data to a format suitable for processing, or apply data cleansing.
4. Preprocess the data to improve the quality of the data sets available for processing.
5. Apply the particular data mining method according to the requirement, which includes (a) selecting a model or algorithm and (b) selecting model/algorithm training parameters.
6. Train the system on the input data sets.
7. Finally, apply the data mining model and evaluate it to find the results.

In today's world a huge amount of data is stored in files, databases, and other repositories; for better decision-making it is therefore very important to develop powerful tools for the interpretation and analysis of such data. Data mining is the optimal solution to this problem. Data mining is the process of extracting hidden predictive information from large databases; it is a very powerful technology with great capability to help organizations focus on the most important information in their data warehouses [2][3][4][5]. Data mining software and tools predict future trends and behaviors, helping firms and organizations to make proactive, knowledge-driven decisions [3]. The prospective and automated analyses offered by data mining methods move beyond the analyses of past events provided by the retrospective tools typical of decision support systems.

II. NEURAL NETWORK WITH DATA MINING

The main motive of the research is to study and improve data mining results with neural networks; to this end we summarize the state-of-the-art principles of using neural network models in data mining, rather than the applications. J. P. Bigus wrote a non-technical book on neural networks for data mining [6, 7]. Neural networks are suitable for very large databases and are typically used for extracting embedded knowledge in the form of rules, self-organization, clustering, quantitative evaluation of these rules, regression and classification, dimensionality reduction, and feature evaluation. In a survey of 43 data mining software products in 1999, either commercially available or research prototypes, Goebel and Gruenwald [8] found 10 products that use neural networks: Intelligent Miner (IBM), Clementine, Brain Maker, Decision Series, Data Surveyor, Tool Diag, Model Quest, Delta Miner, Kepler, and Darwin. Neural networks are used by many standard data mining software packages. However, some of these modules are extremely basic: often just a simple trainable multi-layered perceptron with old-fashioned updating techniques such as standard backpropagation. These techniques are not even capable of fulfilling the important requirement of providing insight into the database. In fact, one could argue whether these standard neural networks are truly methods for data mining as defined above, or at most prediction, clustering, and perhaps classification tools. Some of the latest research in the neural network field brings them much closer to the ideal of data mining: knowledge out of data in understandable terms. Methods have been developed for the visualization and simplification ("pruning") of neural networks, for the discovery of symbolic rules, and for relevance determination from trained NNs. Neural networks and related soft computing techniques have been used in a variety of data mining tasks [9, 10]. The main contribution of neural networks toward data mining stems from clustering and from rule extraction.

Rule Extraction and Evaluation: To achieve the required accuracy rate, a network is typically trained first. Then a pruning algorithm is used to remove the redundant connections of the network. The classification rules are generated from the link activations and weight values of the hidden units, and the networks are analyzed [11]. From the global data mining perspective, rule extraction from neural networks seems to be a temporary solution for getting interpretable results. Direct rule extraction from data can potentially produce better rules, since rules extracted from a neural network may carry artifacts of the network's limitations.

Dimensionality Reduction and Clustering: Kohonen's SOM [12] has proved to be an appropriate tool for handling huge databases. Kohonen et al. [13] have demonstrated the utility of a SOM with more than one million nodes to partition a little less than seven million patent abstracts, where the documents are represented by 500-dimensional feature vectors. Kohonen's LVQ [14] was successfully used for on-line dimensionality reduction [15], [16]. SOM and LVQ, used together with data visualization techniques, are presently among the most promising neural network applications in data mining. The main reason for this is the scalability of the SOM model. Meanwhile, dimensionality reduction is essential for data visualization and analysis.

III. DATA MINING METHODS

Data mining systems can be classified according to the data analysis approach used, such as genetic algorithms, neural networks, machine learning, visualization, statistics, or database- or data warehouse-oriented approaches. The classification can also take into account the degree of user interaction involved in the data mining process, such as query-driven systems, interactive exploratory systems, or autonomous systems. A comprehensive system would provide a wide variety of data mining techniques to fit different situations and options, and offer different degrees of user interaction. Many different methods have been proposed by different authors for data mining.
In this paper we study and compare three data mining methods which we implemented: the CHARM algorithm, Top-K rule mining, and the CM-SPAM algorithm.

1. CHARM Algorithm

CHARM is an efficient algorithm for enumerating the set of all frequent closed itemsets. A number of innovative ideas are employed in the development of CHARM: 1) CHARM simultaneously explores both the itemset space and the transaction space, over a novel IT-tree (itemset-tidset tree) search space of the database; in contrast, previous algorithms exploit only the itemset search space. 2) CHARM uses a highly efficient hybrid search method that skips many levels of the IT-tree to quickly identify the frequent closed itemsets, instead of having to enumerate many possible subsets. 3) It uses a fast hash-based approach to eliminate non-closed itemsets during subsumption checking. CHARM is also able to utilize a novel vertical data representation called the diffset [19] for fast frequency computations. Diffsets keep track of the differences in the tids of a candidate pattern from its prefix pattern. Diffsets drastically cut down (by orders of magnitude) the memory required to store intermediate results, so the entire working set of patterns can fit in memory even for huge databases. Several factors make this a realistic assumption. First, CHARM breaks the search space into small independent chunks (based on prefix equivalence classes [20]). Second, diffsets lead to an extremely small memory footprint. Finally, CHARM uses simple set difference operations and does not require any complex internal data structures (candidate generation and counting happen in a single step). The current trend toward large (gigabyte-sized) main memories, combined with the above features, makes CHARM an efficient and practical algorithm for reasonably large databases. CHARM simultaneously explores both the itemset space and the tidset space using the IT-tree, unlike older algorithms, which typically exploit only the itemset space.
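The vertical representation and diffset idea can be illustrated with a small sketch. This is not the authors' implementation; the toy database and item names are ours, and the real CHARM additionally maintains the IT-tree and closure checks, which are omitted here.

```python
from functools import reduce

# Toy transaction database: tid -> set of items (illustrative only)
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "b"},
    4: {"b", "c"},
}

def tidsets(db):
    """Vertical representation: item -> tidset (set of transaction ids)."""
    vert = {}
    for tid, items in db.items():
        for item in items:
            vert.setdefault(item, set()).add(tid)
    return vert

def support(itemset, vert):
    """Support of an itemset = size of the intersection of its members' tidsets."""
    return len(reduce(set.__and__, (vert[i] for i in itemset)))

def diffset(prefix_tids, item_tids):
    """Diffset of P ∪ {x} relative to prefix P: tids containing P but not x.
    support(P ∪ {x}) = support(P) - |diffset|, so only the (usually small)
    difference needs to be stored instead of the full tidset."""
    return prefix_tids - item_tids

vert = tidsets(transactions)
p_tids = vert["a"]                    # prefix P = {a}
d = diffset(p_tids, vert["c"])       # tids with a but without c
print(support({"a", "c"}, vert))     # 2, via tidset intersection
print(len(p_tids) - len(d))          # 2, same support via the diffset
```

The set-difference form is what keeps intermediate results small: for dense databases the diffset of a candidate relative to its prefix is far smaller than its full tidset.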
CHARM uses a novel search method, based on IT-pair properties, that skips many levels in the IT-tree to quickly converge on the itemset closures, rather than having to enumerate many possible subsets.

2. Mining Top-K Association Rules

Association rule mining [21] consists of discovering associations between items in transactions. It is one of the most important data mining tasks. It has been integrated into many commercial data mining software packages and has wide applications in several domains. The idea of mining top-k association rules presented in this paper is analogous to the idea of mining top-k itemsets [22] and top-k sequential patterns [23] [24] [25] in the field of frequent pattern mining. Note that although many authors have previously used the term "top-k association rules", they did not use the standard definition of an association rule: KORD [26] [27] only finds rules with a single item in the consequent, whereas the algorithm of You et al. [28] mines association rules from a stream instead of a transaction database. To achieve this goal, the question is how to combine the concept of top-k pattern mining with association rules. Two thresholds are used in association rule mining: minimum support (minsup) and minimum confidence (minconf). In practice, minsup is much more difficult to set than minconf, because minsup depends on database characteristics that are unknown to most users, whereas minconf represents the minimal confidence that users want in rules and is generally easy to determine. For this reason, we define top-k on the support rather than the confidence. The main idea of the algorithm is the following. Top-K Rules first sets an internal minsup variable to 0. Then the algorithm starts searching for rules. As soon as a rule is found, it is added to a list of rules L ordered by support. The list is used to maintain the top-k rules found so far. Once k valid rules are found, the internal minsup variable is raised to the support of the rule with the lowest support in L.
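The threshold-raising loop just described can be sketched as follows. This is an illustrative sketch only: rule generation itself is abstracted into a stream of (rule, support) pairs, and the example rule names are hypothetical.

```python
import heapq

def top_k_rules(candidates, k):
    """Sketch of the Top-K Rules threshold-raising loop.

    `candidates` yields (rule, support) pairs in the order the search
    discovers them; the actual rule expansion procedure is omitted."""
    minsup = 0   # internal threshold, starts at 0
    heap = []    # min-heap of (support, rule): root is the least interesting rule in L
    for rule, support in candidates:
        if support < minsup:
            continue                      # such a rule would be pruned by the search
        heapq.heappush(heap, (support, rule))
        if len(heap) > k:
            heapq.heappop(heap)           # drop the rule that left the top k
        if len(heap) == k:
            minsup = heap[0][0]           # raise minsup to the k-th best support
    return sorted(heap, reverse=True), minsup

found = [("r1", 5), ("r2", 2), ("r3", 7), ("r4", 4), ("r5", 6)]
rules, minsup = top_k_rules(found, k=3)
print([r for s, r in rules])   # ['r3', 'r5', 'r1']
print(minsup)                  # 5
```

Maintaining L as a heap makes both the insertion and the removal of the least interesting rule logarithmic in k, which matters because minsup is updated every time a valid rule is found.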
Raising the minsup value prunes the search space when searching for more rules. Thereafter, each time a valid rule is found, the rule is inserted in L, the rules in L no longer respecting minsup are removed from L, and minsup is raised to the support of the least interesting rule in L. The algorithm
continues searching for more rules until no rules are found, which means that it has found the top-k rules. Another idea incorporated in Top-K Rules is to try to generate the most promising rules first: if rules with high support are found early, Top-K Rules can raise its internal minsup variable faster and prune the search space sooner. To perform this, Top-K Rules uses an internal variable R to store all the rules that can be expanded with a chance of yielding more valid rules. Top-K Rules uses this set to determine the rules that are most likely to produce valid rules with high support, in order to raise minsup more quickly and prune a larger part of the search space.

3. CM-SPAM Algorithm

Mining useful patterns in sequential data is a challenging task, and many studies have addressed mining interesting patterns in sequence databases [29]. Sequential pattern mining is probably the most popular research topic among them. A subsequence is called a sequential pattern, or frequent sequence, if it appears in a sequence database with a frequency no less than a user-specified minimum support threshold minsup [30]. Sequential pattern mining plays an important role in data mining and is essential to a wide range of applications such as the analysis of web and medical data, program executions, click-streams, e-learning data and biological data [29]. Several efficient algorithms have been proposed for sequential pattern mining, and one of them is the CM-SPAM algorithm. The sequential pattern mining problem is defined as follows. Let I = {i1, i2, ..., il} be a set of items (symbols). An itemset Ix = {i1, i2, ..., im} ⊆ I is an unordered set of distinct items. The lexicographical order >lex is defined as any total order on I; without loss of generality, it is assumed in the following that all itemsets are ordered according to >lex. A sequence is an ordered list of itemsets s = ⟨I1, I2, ..., In⟩ such that Ik ⊆ I (1 ≤ k ≤ n).
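These definitions can be made concrete with a short sketch that represents sequences as lists of itemsets and computes the support of a pattern by subsequence containment. This is our own illustration of the definitions, not CM-SPAM itself, which uses bitmap ID-lists rather than direct containment checks.

```python
# A sequence database as lists of itemsets; support of a pattern is the number
# of sequences that contain it as a subsequence.

def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of some itemset of
    `sequence`, in order (the subsequence relation of the definitions)."""
    pos = 0
    for pat_itemset in pattern:
        while pos < len(sequence) and not pat_itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(sdb, pattern):
    return sum(contains(seq, pattern) for seq in sdb)

sdb = [
    [{"a", "b"}, {"c"}, {"f", "g"}, {"g"}, {"e"}],    # SID 1
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}], # SID 2
    [{"a"}, {"b"}, {"f"}, {"e"}],                     # SID 3
    [{"b"}, {"f", "g"}],                              # SID 4
]

print(support(sdb, [{"a"}, {"f"}]))       # 3
print(support(sdb, [{"b"}, {"f", "g"}]))  # 2
print(support(sdb, [{"b"}]))              # 4
```

The example database here is the four-sequence database used in Figure 1 below, so the printed supports match patterns P1, P3 and P6 of that figure.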
A sequence database SDB is a list of sequences SDB = ⟨s1, s2, ..., sp⟩ having sequence identifiers (SIDs) 1, 2, ..., p.

Example: A sequence database is shown in Figure 1 (a). It contains four sequences having the SIDs 1, 2, 3 and 4. Each single letter represents an item. Items between curly brackets represent an itemset. The first sequence ⟨{a, b}, {c}, {f, g}, {g}, {e}⟩ contains five itemsets. It indicates that items a and b occurred at the same time, were followed by c, then f and g together, then g, and lastly e.

(a)
SID | Sequence
1   | ⟨{a, b}, {c}, {f, g}, {g}, {e}⟩
2   | ⟨{a, d}, {c}, {b}, {a, b, e, f}⟩
3   | ⟨{a}, {b}, {f}, {e}⟩
4   | ⟨{b}, {f, g}⟩

(b)
ID | Pattern          | Support
P1 | ⟨{a}, {f}⟩       | 3
P2 | ⟨{a}, {c}, {f}⟩  | 2
P3 | ⟨{b}, {f, g}⟩    | 2
P4 | ⟨{g}, {e}⟩       | 2
P5 | ⟨{c}, {f}⟩       | 2
P6 | ⟨{b}⟩            | 4

Figure 1: A sequence database (a) and some sequential patterns found (b)

The pseudocode of SPAM is as follows. SPAM takes as input a sequence database SDB and the minsup threshold. SPAM first scans the input database SDB once to construct the vertical
representation of the database V(SDB) and the set of frequent items F1. For each frequent item s ∈ F1, SPAM calls the SEARCH procedure with ⟨{s}⟩, F1, {e ∈ F1 | e >lex s}, and minsup. The SEARCH procedure outputs the pattern ⟨{s}⟩ and recursively explores candidate patterns starting with the prefix ⟨{s}⟩. The SEARCH procedure takes as parameters a sequential pattern pat and two sets of items to be appended to pat to generate candidates. The first set, Sn, represents items to be appended to pat by s-extension: the s-extension of a sequential pattern ⟨I1, I2, ..., Ih⟩ with an item x is ⟨I1, I2, ..., Ih, {x}⟩. The second set, Si, represents items to be appended to pat by i-extension: the i-extension of a sequential pattern ⟨I1, I2, ..., Ih⟩ with an item x is ⟨I1, I2, ..., Ih ∪ {x}⟩. For each candidate pat generated by an extension, SPAM calculates its support to determine if it is frequent. This is done by a join operation (see [4] for details) and by counting the number of sequences in which the pattern appears. The ID-list representation used by SPAM is based on bitmaps, to get faster operations [4]. If the pattern pat is frequent, it is then used in a recursive call to SEARCH to generate patterns starting with the prefix pat. Note that in the recursive call, only items that resulted in a frequent pattern by extension of pat are considered for extending pat. SPAM prunes the search space by not extending infrequent patterns; this is possible because an infrequent sequential pattern cannot be extended to form a frequent pattern [2].

The performance of the data mining algorithms studied in this section is shown below. This helps to evaluate all of the studied algorithms and compare them with each other.

Table 1: Performance evaluation of CHARM
Database name | Items | Avg. Length | Time (s) | Max Pattern (%)
Chess         | 76    | 37          | 20       | 20
Connect       | 130   | 43          | 40       | 10
Mushrooms     | 120   | 23          | 9        | 0.075
Gazelle       | 498   | 2.5         | 10       | 0.01

Table 2: Performance evaluation of Top-K Rules
Database name | Items | Avg. Length | Time (s) | Max Pattern (%)
Chess         | 75    | 37          | 8        | 1.49
Connect       | 129   | 43          | 283      | 25.51
Mushrooms     | 128   | 23          | 20       | 3.46
Gazelle       | 498   | 2.5         | 368      | 46.39

Table 3: Performance evaluation of CM-SPAM
Database name | Items | Avg. Length | Time (s) | Max Pattern (%)
Chess         | 76    | 37          | 15       | 18.81
Connect       | 130   | 43          | 40       | 12.3
Mushrooms     | 120   | 23          | 60       | 0.59
Gazelle       | 498   | 2.5         | 80       | 24.08

IV. DATA SETS

Data sets are collections of data stored in files. Data sets are used as input to the data mining algorithms to obtain the optimized results. Different companies and organizations maintain their information in data sets. In this paper, for the implementation of the proposed method, we collected demo datasets of the online retailing e-commerce websites Flipkart and Amazon.

Each dataset contains 1000, 2000, 5000 and 10000 entries, and is available for frequent pattern mining, association rule mining and sequential pattern mining. We applied these datasets to each of the algorithms and measured the results in terms of memory and time complexity.

V. NEURAL NETWORK

A neural network is a non-linear predictive model that learns through training and resembles biological neural networks in structure. An artificial neural network (ANN), often just called a "neural network", is a computational or mathematical model based on biological neural networks; in other words, it is an emulation of a biological neural system. An artificial neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [22].

Figure 1: Basic neural network structure

A neural network has to be configured such that the application of a set of inputs produces (either directly or via a relaxation process) the desired set of outputs. Different methods exist to set the strengths of the connections. One way is to set the weights explicitly, using prior knowledge. Another way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.

VI. DIFFERENT NEURAL NETWORK TECHNIQUES

Many different neural network techniques have been presented by different authors. In this paper, three artificial neural network algorithms are used with the data mining algorithms to improve their computational complexity. The following are the three neural network techniques used in this paper.

A. Feedforward Neural Network

A feedforward neural network (FFNN) is one of the simplest neural network techniques. As shown in the figure, it consists of three layers: an input, a hidden and an output layer. In every layer there are one or more processing elements (PEs). The connections between the processing elements in each layer have an associated weight (parameter), which is adjusted during training. Information only travels in the forward direction through the network; there are no feedback loops.
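The forward pass of such a three-layer network can be sketched in a few lines. The layer sizes, random weights and sigmoid activation below are illustrative assumptions, not values from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each processing element computes a weighted sum of its inputs plus a
    # bias, passed through the activation function.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = layer(x, w_hidden, b_hidden)   # input layer -> hidden layer
    return layer(hidden, w_out, b_out)      # hidden layer -> output layer

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs, 4 hidden PEs
b_hidden = [0.0] * 4
w_out = [[random.uniform(-1, 1) for _ in range(4)]]                       # 1 output PE
b_out = [0.0]

y = forward([0.5, -0.2, 0.8], w_hidden, b_hidden, w_out, b_out)
print(y)  # a single output value in (0, 1)
```

Note that information flows strictly forward: each call to `layer` only consumes the previous layer's outputs, so there are no feedback loops.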

Figure 2: Flow of the feedforward neural network

B. The Backpropagation Algorithm

Backpropagation (propagation of error) is a common method of teaching artificial neural networks how to perform a particular task. The backpropagation algorithm is used in layered feedforward ANNs: the artificial neurons are organized in layers and send their signals forward, and the errors are then propagated backwards. The backpropagation algorithm uses supervised learning, which means that we provide the algorithm with examples of the inputs and outputs we want the network to compute; the error (the difference between actual and expected results) is then calculated. The idea of the backpropagation algorithm is to reduce this error until the ANN learns the training data.

C. Radial Basis Function Network

The idea of Radial Basis Function (RBF) networks derives from the theory of function approximation. We have already seen how Multi-Layer Perceptron (MLP) networks with a hidden layer of sigmoidal units can learn to approximate functions. RBF networks take a slightly different approach. Their main features are:
1. They are two-layer feed-forward networks.
2. The hidden nodes implement a set of radial basis functions (e.g. Gaussian functions).
3. The output nodes implement linear summation functions, as in an MLP.
4. The network training is divided into two stages: first the weights from the input to the hidden layer are determined, and then the weights from the hidden to the output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.

Figure 3: Radial Basis Function network
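The two-layer structure listed above can be sketched directly: Gaussian hidden units around fixed centers, followed by a linear output layer. The centers, width and weights below are illustrative assumptions, not values from the paper.

```python
import math

def gaussian(x, center, width=1.0):
    # Radial basis function: the response depends only on the distance of the
    # input from the unit's center.
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

def rbf_output(x, centers, weights, width=1.0):
    # Output node: linear summation of the hidden-unit activations.
    return sum(w * gaussian(x, c, width) for w, c in zip(weights, centers))

centers = [[0.0, 0.0], [1.0, 1.0]]   # stage 1: fix the hidden-layer centers
weights = [1.0, -1.0]                # stage 2: fit the linear output weights

print(rbf_output([0.0, 0.0], centers, weights, width=0.5))  # ≈ 0.98, dominated by the first center
print(rbf_output([1.0, 1.0], centers, weights, width=0.5))  # ≈ -0.98, dominated by the second center
```

The two assignment lines mirror the two training stages of the list above: in practice the centers are typically chosen by clustering the inputs, and the output weights are then fit by linear least squares.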

An input vector x that lies in the receptive field of a center would activate it, and by a proper choice of weights the target output is obtained. The output is given as:

y(x) = Σi wi φ(‖x − ci‖)

where wi is the weight of center ci and φ is some radial function.

The comparative performance of the neural network algorithms studied in this section is shown below. This helps to evaluate all of the studied algorithms and compare them with each other.

Table 4: Comparison of different neural network techniques [30]
Neural Network Algorithm | Set         | Training Set Accuracy | Evaluation Set Accuracy
Backpropagation          | Full set    | 84.3%                 | 85.2%
Backpropagation          | Reduced set | 87.3%                 | 80.7%
Feedforward              | Full set    | 80.4%                 | 82.5%
Feedforward              | Reduced set | 83.9%                 | 77.4%
Radial Basis Function    | Full set    | 83.5%                 | 83.2%
Radial Basis Function    | Reduced set | 89.8%                 | 82.1%
LVQ                      | Full set    | 79.5%                 | 74.2%
LVQ                      | Reduced set | 82.8%                 | 79.1%

VII. CONCLUSION

In this paper we reviewed data mining with neural network techniques. In our research we studied different data mining algorithms, from which we selected the best three for implementation: the CHARM, Top-K Rules and CM-SPAM algorithms. We studied these data mining algorithms, and results for them from different papers are presented here. The different neural network techniques which we researched for the implementation are also studied and briefly reviewed. The neural network techniques backpropagation, feedforward networks and radial basis functions are used with the data mining algorithms to improve the mining performance.

VIII. ACKNOWLEDGEMENT

The first author would like to acknowledge Dr. P. K. Butey for his cooperation and useful suggestions for the research work, and Dr. A. K. Shende, Principal of Kamla Nehru Mahavidyalaya, Nagpur.

REFERENCES

1. Z. Chen. Data Mining and Uncertain Reasoning: An Integrated Approach. Wiley, 2001.
2. Neelamadhab Padhy, Pragnyaban Mishra, and Rasmita Panigrahi. The survey of data mining applications and feature scope. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 2, No. 3, June 2012.

3. Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
4. Larose, D. T., Discovering Knowledge in Data: An Introduction to Data Mining, ISBN 0-471-66657-2, John Wiley & Sons, Inc., 2005.
5. Dunham, M. H. and Sridhar, S., Data Mining: Introductory and Advanced Topics, Pearson Education, New Delhi, ISBN 81-7758-785-4, 1st Edition, 2006.
6. J. P. Bigus, Data Mining with Neural Networks: Solving Business Problems - from Application Development to Decision Support, McGraw-Hill, New York, 1996.
7. S. Mitra, S. K. Pal and P. Mitra, Data mining in soft computing framework: a survey, IEEE Trans. Neural Networks, 13(1):3-14, 2002.
8. M. Goebel and L. Gruenwald, A survey of data mining and knowledge discovery software tools, ACM SIGKDD Explorations, 1:20-33, 1999.
9. S. Abe, Pattern Classification: Neuro-Fuzzy Methods and Their Comparison, Springer Verlag, London, 2001.
10. Y. Bengio, J. M. Buhmann, M. Embrechts and J. M. Zurada, Introduction to the special issue on neural networks for data mining and knowledge discovery, IEEE Trans. Neural Networks, 11:545-549, 2000.
11. A. B. Tickle, R. Andrews, M. Golea and J. Diederich, The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial networks, IEEE Trans. Neural Networks, 9(5):1057-1068, 1998.
12. T. Kohonen, Self-Organizing Maps, Springer Verlag, 1997.
13. T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero and A. Saarela, Self organization of a massive document collection, IEEE Trans. Neural Networks, 11:574-585, 2000.
14. T. Kohonen, Improved versions of learning vector quantization, in Proc. Int. Joint Conf. on Neural Networks, pages 545-550, San Diego, 1990.
15. B. Hammer and T. Villmann, Generalized relevance learning vector quantization, Neural Networks, 15:1059-1068, 2002.
16. A. Cataron and R. Andonie, RLVQ determination using OWA operators, in M. Hamza, editor, Proceedings of the Third IASTED International Conference on Artificial Intelligence and Applications (AIA 2003), Benalmadena, Spain, September 8-10, pages 434-438, ACTA Press, 2003.
17. M. J. Zaki and K. Gouda, Fast vertical mining using Diffsets, Technical Report 01-1, Computer Science Dept., Rensselaer Polytechnic Institute, March 2001.
18. M. J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering, 12(3):372-390, May-June 2000.
19. R. Agrawal, T. Imielinski and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, Proc. ACM Intern. Conf. on Management of Data, ACM Press, June 1993, pp. 207-216.
20. P. Tzvetkov, X. Yan and J. Han, TSP: Mining Top-k Closed Sequential Patterns, Knowledge and Information Systems, vol. 7, no. 4, 2005, pp. 438-457.
21. K.-T. Chuang, J.-L. Huang and M.-S. Chen, Mining Top-k Frequent Patterns in the Presence of the Memory Constraint, VLDB Journal, vol. 17, no. 5, 2008, pp. 1321-1344.
22. J. Wang, Y. Lu and P. Tzvetkov, Mining Top-k Frequent Closed Itemsets, IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 5, 2005, pp. 652-664.
23. A. Pietracaprina and F. Vandin, Efficient Incremental Mining of Top-k Frequent Closed Itemsets, Proc. Tenth Intern. Conf. Discovery Science, Oct. 2004, Springer, pp. 275-280.
24. G. I. Webb and S. Zhang, K-Optimal Rule Discovery, Data Mining and Knowledge Discovery, vol. 10, no. 1, 2005, pp. 39-79.

25. G. I. Webb, Filtered-top-k association discovery, WIREs Data Mining and Knowledge Discovery, vol. 1, 2011, pp. 183-192.
26. Y. You, J. Zhang, Z. Yang and G. Liu, Mining Top-k Fault Tolerant Association Rules by Redundant Pattern Disambiguation in Data Streams, Proc. 2010 Intern. Conf. Intelligent Computing and Cognitive Informatics, March 2010, IEEE Press, pp. 470-473.
27. N. R. Mabroukeh and C. I. Ezeife, A taxonomy of sequential pattern mining algorithms, ACM Computing Surveys, 43(1):1-41, 2010.
28. Brad A. Hawickhorst and Stephen A. Zahorian, A Comparison of Three Neural Network Architectures for Automatic Speech Recognition.
29. M. Karthikeyan, M. Suriya Kumar and S. Karthikeyan, A Literature Review on the Data Mining and Information Security, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141-146, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
30. Shikha Dixit and Appu Kuttan K. K., Artificial Neural Network Based Data Mining Approach for Human Heart Disease Prediction, International Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 6, 2014, pp. 136-142, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
31. Henry Navarro and Leonardo Bennun, Descriptive Examples of the Limitations of Artificial Neural Networks Applied to the Analysis of Independent Stochastic Data, International Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 5, 2014, pp. 40-42, ISSN Print: 0976-6367, ISSN Online: 0976-6375.

AUTHORS' DETAILS

Ms. ARUNA J. CHAMATKAR holds an MCA from Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, and is currently pursuing a PhD at RTM Nagpur University under the guidance of Dr. Pradeep K. Butey. Her research areas are Data Mining and Neural Networks.

Dr. PRADEEP K. BUTEY is a research supervisor for Computer Science at RTM Nagpur University and Head of the Department of Computer Science at Kamla Nehru Mahavidyalaya, Nagpur. His areas of interest include Fuzzy Logic, Neural Networks and Data Mining.