Data Mining based on Rough Set and Decision Tree Optimization


College of Information Engineering, North China University of Water Resources and Electric Power, China

Abstract

This paper presents a new decision tree classification algorithm based on rough set theory. First, decision tree growth and pruning algorithms are analyzed and compared, and the decision tree algorithm is optimized in two respects: attribute reduction and pruning. Second, a reduction algorithm based on attribute dependency, called ER for short, and a rough-set-based pruning algorithm for decision trees are presented. Finally, the proposed algorithm is applied in a supplier evaluation system, and its validity is verified by comparison with the C4.5 algorithm.

Keywords: Data Mining, Decision Tree, Rough Set, Pruning

1. Introduction

In recent years, with the development of computer technology, ever more information and data are being stored. How to find the information hidden behind these data, and use it to guide practice in our industries, is an important problem. Data mining [1] technology addresses this problem: it identifies potential links within the data, supports higher-level analysis, and helps make sound decisions and predict future trends. After many years of development, data mining has shown strong vitality. The core of data mining is building a model of the data set, and different data mining methods construct the model in different ways. Many methods can be used in data mining, such as neural networks, decision trees, genetic algorithms and visualization technology. Data classification is an important task in data mining, and there are likewise many classification methods, such as decision trees, Bayesian networks, genetic algorithms, association-based classification, rough sets and the k-nearest neighbor method.
Among them, the decision tree method is one of the most commonly used methods of data classification. Compared with other classification methods, the decision tree method [2] has the following significant advantages: high speed, high accuracy, ease of understanding and strong scalability. Decision tree technology has attracted the attention of researchers in many data mining systems, and most of the data mining systems launched by domestic and foreign companies have adopted it. SAS Enterprise Miner from SAS, described in [3], is a generic data mining tool. It collects and analyzes a variety of statistical data and customer buying patterns, helping users find business trends, explain known facts, predict future results and identify the key factors needed to complete a task. IBM's Intelligent Miner provides automatic generation of typical data sets, association discovery, sequential pattern discovery, conceptual classification and visualization; it performs data selection, data conversion, data mining and result presentation automatically, and has proved to be a capable data mining tool. Clementine from Solution Inc. provides a visual rapid modeling environment composed of data acquisition, mining, finishing, modeling and reporting components. KnowledgeSEEKER from Angoss is a decision-tree-based analysis program with fairly complete classification tree analysis functions. DataCruncher from RightPoint, described in [4], is a client/server data mining engine that can analyze huge amounts of data in a data warehouse and connect directly to many of today's mainstream relational databases and data mining tools.
To address the deficiencies of existing decision tree algorithms, many researchers have tried to control the size of the decision tree and improve its accuracy. They have studied a variety of pre-pruning and post-pruning algorithms to control tree size, while also modifying the test attribute space, improving the test attribute selection method, limiting the data set and changing the data structure, and they have put forward a number of new algorithms and criteria. This paper presents an improved attribute reduction algorithm based on attribute dependency in rough set theory. It greatly reduces time complexity while keeping the classification ability, and it finds the optimum attribute reduct without a large amount of computation. After studying the existing rough-set-based post-pruning algorithms, and aiming to remedy their inefficiency, the paper also presents an improved rough-set-based post-pruning algorithm for decision trees, which reduces time complexity.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 6, Number 12, July 2012, doi: /jdcta.vol6.issue

2. Related concepts

2.1. Data Mining

Data mining is the process of extracting potentially useful information and knowledge, not known in advance, from large amounts of incomplete, noisy, fuzzy and random data [5]. It is an interdisciplinary field that attracts researchers from many areas and draws on database technology, statistics, artificial intelligence, machine learning, pattern recognition, high-performance computing, visualization technology and information science. The data mining process consists of several steps; the main ones are as follows:

(1) Data cleaning: remove noise and data clearly unrelated to the mining topic.
(2) Data integration: combine data from multiple data sources.
(3) Data transformation: convert the data into a form suited to mining.
(4) Data mining: the fundamental step of knowledge discovery, in which intelligent methods are used to mine data patterns or regularities.
(5) Pattern evaluation: select meaningful patterns from the mining results according to given evaluation criteria.
(6) Knowledge presentation: present the mined knowledge to users through visualization and knowledge representation technology.

Data mining technology can provide users with many kinds of decision-making knowledge. In many cases, users do not know which information and knowledge is valuable.
Therefore, a data mining system should be able to search for and discover several kinds of patterns simultaneously, to meet user expectations and actual needs. It should also be able to discover patterns at several levels. Commonly used data mining techniques include decision trees, neural networks, genetic algorithms and rough set methods.

2.2. Rough set theory

Rough set theory has become one of the important basic theories of data mining, and methods combining decision trees with rough set theory have been widely used. The most significant advantage of rough set theory is its ability to deal with incomplete, inaccurate and inconsistent data. It can be used to perform attribute reduction for a decision tree, removing redundant attributes while preserving classification ability. Rough set theory, a mathematical theory for imprecise and incomplete data, was originally proposed by the Polish mathematician Pawlak and has attracted the attention of scholars worldwide since the early 1990s. It is built on classification: knowledge is understood as a partition of data induced by equivalence relations on a particular space. Rough set theory is widely used in pattern recognition, machine learning, data mining and intelligent control because it can extract implicit knowledge without requiring any prior knowledge beyond the data being processed. Its basic idea is to derive classification rules for a concept through knowledge reduction while preserving classification ability. At present, rough set theory has been successfully applied in many fields.
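To make the idea of knowledge as a partition induced by equivalence relations more concrete, the following minimal Python sketch groups objects into equivalence classes and computes the lower and upper approximations of a concept. These two approximations are the standard rough set constructions, which the text above does not spell out. The toy table and the names `equivalence_classes` and `approximations` are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Partition row indices by their values on attrs (indiscernibility relation)."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper

# Hypothetical toy table: rows 0 and 1 are indiscernible but labeled differently.
rows = [
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "temp": "high", "flu": "no"},
    {"headache": "no", "temp": "normal", "flu": "no"},
    {"headache": "no", "temp": "high", "flu": "yes"},
]
flu = {i for i, r in enumerate(rows) if r["flu"] == "yes"}
lower, upper = approximations(rows, ["headache", "temp"], flu)
```

Here only row 3 certainly belongs to the concept, while rows 0, 1 and 3 possibly belong to it; the gap between the two sets is what makes the concept "rough" with respect to the chosen attributes.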
Reference [6] introduces specific applications of rough set theory in many fields, such as knowledge discovery, expert systems, pattern recognition, stock data analysis, earthquake prediction, rough control, medical diagnostics, artificial neural networks and decision analysis. Current applications of rough set theory include decision assessment, data mining and rule generation. Decision evaluation methods based on rough set theory improve the objectivity of evaluation. They also transform a complex, ambiguous and subjective evaluation process into a series of objective, quantifiable and procedural problem-solving activities, supporting scientific evaluation and giving decision makers sound advice. Data mining and rule generation is the most important practical application of rough set theory. Rough set theory can search for a minimal set of data, handle both qualitative and quantitative data, and derive decisions from the data. Pattern recognition is another main application, where rough sets can be used for feature selection, feature representation, classification and clustering. Feature selection techniques based on rough sets avoid loss of information and alleviate the dimensionality problem of the data set. Rough set theory is important to artificial intelligence and cognitive science and has attracted much attention since it emerged.

2.3. Decision tree

A decision tree is a tree structure similar to a flow chart. Every internal node represents a test on an attribute, namely a logical judgment of the form $a_i = v_i$, where $a_i$ is an attribute and $v_i$ is one of its values. Each branch of the tree represents one test outcome; that is, the possible values and the branches are in one-to-one correspondence. Each leaf node represents a category [7]. The input of a decision tree algorithm is a set of data with category labels, and the result is a binary or multi-way tree. The nodes of the tree fall into two categories: decision nodes and leaf nodes. The decision tree is a commonly used method for supervised learning. First, a subset of instances is selected from the training set and a decision tree is built from it. The remaining training instances are then used to test the accuracy of the tree. If all of them are classified correctly, the process ends. If any instance is misclassified, it is added to the selected subset and a new tree is built; this repeats until the tree correctly classifies all training instances that were not selected.
The basic algorithm for generating a decision tree, Generate_decision_tree, is as follows.

Input: the training samples, in which every attribute takes discrete values, and the set of available candidate attributes, attribute_list.
Output: the decision tree.

(1) Create a node N. The root node corresponds to all the training samples.
(2) If all samples at N belong to the same class C, return N as a leaf node labeled with class C.
(3) If attribute_list is empty, return N as a leaf node labeled with the class that has the most samples at N.
(4) Select the attribute with the greatest information gain from attribute_list and label N with it as test_attribute.
(5) For each known value $a_i$ of test_attribute, partition the samples at N: grow a branch from N for the condition test_attribute = $a_i$, and let $s_i$ be the set of samples satisfying that condition.
(6) If $s_i$ is empty, attach a leaf labeled with the class that has the most samples at N. Otherwise, attach the node returned by applying Generate_decision_tree recursively to $s_i$.

The recursion terminates when one of the following conditions holds:
(1) All samples at a node belong to the same class.
(2) No remaining attributes can be used to divide the samples further.
(3) No samples satisfy the condition test_attribute = $a_i$.

The basic decision tree algorithm is a greedy algorithm that constructs the tree recursively in a top-down, divide-and-conquer manner. Generate_decision_tree is a basic version of the well-known decision tree algorithm ID3.

2.4. Classification algorithm with decision tree

When a decision tree is built, many of its branches are constructed from abnormal data in the training sample set. Branch pruning is used to deal with this noise problem.
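The Generate_decision_tree procedure listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: samples are dictionaries of discrete values, internal nodes are dictionaries, leaves are plain class labels, and the supplier-style toy data and helper names (`entropy`, `info_gain`, `classify`) are hypothetical. Because branches are created only for values that occur at a node, the empty-subset case is handled at classification time by falling back to the node's majority class.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Shannon entropy of the class distribution in rows."""
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attr."""
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder

def majority(rows, target):
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def generate_decision_tree(rows, attribute_list, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:              # all samples belong to one class
        return classes.pop()
    if not attribute_list:             # no attributes left: majority class
        return majority(rows, target)
    # Greedy choice: the attribute with the greatest information gain.
    best = max(attribute_list, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attribute_list if a != best]
    node = {"attr": best, "default": majority(rows, target), "children": {}}
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        node["children"][v] = generate_decision_tree(subset, rest, target)
    return node

def classify(tree, row):
    """Follow branches to a leaf; unseen values use the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical supplier data: the grade depends only on quality.
rows = [
    {"price": "low", "quality": "good", "grade": "A"},
    {"price": "low", "quality": "poor", "grade": "B"},
    {"price": "high", "quality": "good", "grade": "A"},
    {"price": "high", "quality": "poor", "grade": "B"},
]
tree = generate_decision_tree(rows, ["price", "quality"], "grade")
```

On this toy table, splitting on `quality` removes all uncertainty while splitting on `price` removes none, so `quality` becomes the root test and both children are pure leaves.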
There are many post-pruning algorithms, such as REP (Reduced Error Pruning), PEP (Pessimistic Error Pruning), MEP (Minimum Error Pruning), CCP (Cost-Complexity Pruning) and EBP (Error Based Pruning). At present, research on decision trees concentrates on dimensionality reduction (test attribute reduction), attribute test criteria, pruning and related problems.

The data to be mined may contain hundreds of condition attributes, each treated as one dimension. Some of these attributes are key to the mining task, but many are irrelevant, redundant or even harmful. Reducing the number of attributes used to build the decision tree therefore not only makes the tree better at handling large-scale, high-dimensional data and more practical, but also filters out harmful and redundant attributes and so improves prediction accuracy.

In building a decision tree, how to choose the condition attribute tested at the root node and at each subsequent node is one of the core issues of the algorithm. Information entropy is an important measure of uncertainty in information theory: from a statistical point of view, it gives the minimum amount of information needed under a given condition, and so measures the degree of uncertainty by the amount of information required.

When the decision tree is created, many branches reflect anomalies in the training data caused by noise and outliers. Because the training set contains noisy, erroneous or interfering data, the decision tree generated from it often contains some wrong information. Existing pruning methods are divided into pre-pruning and post-pruning. Pre-pruning prunes the tree by stopping its construction early; a node becomes a leaf once construction stops there. Post-pruning removes unsuitable branches from a fully grown decision tree.

3. Data Mining based on rough sets and decision tree optimization

Rough set theory has become one of the important basic theories of data mining, and methods combining decision trees with rough set theory have been widely used [8]. The most significant advantage of rough set theory is its ability to deal with incomplete, inaccurate and inconsistent data; it can be used to perform attribute reduction for a decision tree and remove redundant attributes while maintaining the same classification ability.
This paper presents an algorithm that combines decision tree attribute reduction based on attribute dependency with post-pruning based on rough set theory.

3.1. Attribute reduction based on attribute dependency

Attribute reduction is the core content of rough set theory. It deletes irrelevant or unimportant condition attributes without affecting the original system, so the original system is simplified. Experimental results show that the computational cost of a decision tree is proportional to the number of attributes used in its construction. Generally speaking, the fewer attributes in the reduct, the fewer rules are generated and the lower the cost of classifying new objects. The purpose of the improved attribute reduction algorithm in this paper is to keep reduction effective while lowering its cost.

Attribute reduction algorithm commonly used

The attribute reduction algorithm commonly used in rough set theory starts from the core and gradually adds the more important attributes until the following condition is met: $POS_{Reduct}(D) = POS_C(D)$, where Reduct denotes the reduct, C denotes the condition attribute set and D denotes the decision attribute set [9]. The algorithm may also start from the entire condition attribute set C as a reduct and gradually remove unnecessary attributes, using the positive region as heuristic information, while the equation above remains satisfied. The commonly used algorithm generally takes two steps, as follows:

Input: a decision table
Output: a relative reduct of the decision table

    (1) Calculating the core
    Core = C;                          // Core denotes the core; C denotes the condition attribute set
    For (i = 0; i < K; i++) {          // K denotes the number of attributes
        P = C - {C_i};                 // C_i denotes the i-th attribute
        Dependence_P(D) = |POS_P(D)| / |U|;   // the dependency of D on P
        If (Dependence_P(D) == 1) Core = Core ∩ P;
    }

    (2) Reduction
    Let R = Core_D(C); P = C - Core_D(C).
    Do
        Select the attribute a_i from P that maximizes
            Pos = POS_P(D) - POS_{P-{a_i}}(D)
        R = R ∪ {a_i}
        P = P - {a_i}
    Until POS_R(D) = POS_C(D)
    Return R

This paper proposes an improved reduction algorithm based on attribute dependency, called ER for short, which finds a better attribute reduct without a large amount of computation. The algorithm first calculates the core. It then adds one attribute to the core, chosen so that the dependency of the enlarged attribute set exceeds that of the set before the attribute was added. This is repeated until the dependency of the reduct equals that of the original information table. The algorithm is described as follows:

ER(C, D), where C denotes the condition attribute set and D denotes the decision attribute set.
Input: a decision table.
Output: a relative reduct of the decision table.

    (1) Calculating the core
    Core = C;                          // Core denotes the core; C denotes the condition attribute set
    For (i = 0; i < K; i++) {          // K denotes the number of attributes
        P = C - {C_i};                 // C_i denotes the i-th attribute
        Dependence_P(D) = |POS_P(D)| / |U|;   // the dependency of D on P
        If (Dependence_P(D) == 1) Core = Core ∩ P;
    }

    (2) Reduction
    R ← Core
    Do
        T ← R
        For each x ∈ (C - R)
            If Dependence_{R ∪ {x}}(D) > Dependence_T(D) {
                T ← R ∪ {x}
            }
        R ← T
    Until Dependence_R(D) == Dependence_C(D)
    Return R

The improved algorithm ER also calculates the core first and then performs the reduction step, and it calculates the core in the same way as the Reduct algorithm. In ER, however, the reduction step only has to meet the condition $Dependence_R(D) = Dependence_C(D)$, and it does not evaluate all attributes of the set T, which greatly reduces the time complexity.
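The dependency measure, the core computation and the ER reduction loop described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code. Two details are assumptions of the sketch: an attribute counts as a core attribute when removing it lowers the dependency below that of the full condition set (which matches the test against 1 for a consistent table), and a fallback adds an arbitrary remaining attribute when no single attribute raises the dependency, so that the loop always terminates. The toy table and function names are hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Dependence_P(D) = |POS_P(D)| / |U| for the attribute list attrs."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    # A block belongs to the positive region if all its decisions agree.
    positive = sum(len(b) for b in blocks.values() if len(set(b)) == 1)
    return positive / len(rows)

def core(rows, cond, decision):
    """Attributes whose removal lowers the dependency of the full set."""
    full = dependency(rows, cond, decision)
    return [a for a in cond
            if dependency(rows, [b for b in cond if b != a], decision) < full]

def er_reduct(rows, cond, decision):
    """Grow the core greedily until it matches the full set's dependency."""
    full = dependency(rows, cond, decision)
    reduct = core(rows, cond, decision)
    while dependency(rows, reduct, decision) < full:
        best, best_dep = None, dependency(rows, reduct, decision)
        for x in cond:
            if x in reduct:
                continue
            dep = dependency(rows, reduct + [x], decision)
            if dep > best_dep:         # keep the attribute that helps most
                best, best_dep = x, dep
        if best is None:               # assumption: fallback to guarantee progress
            best = next(x for x in cond if x not in reduct)
        reduct.append(best)
    return reduct

# Hypothetical decision table: d = a OR b, and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
core_attrs = core(rows, ["a", "b", "c"], "d")
reduct = er_reduct(rows, ["a", "b", "c"], "d")
```

In this table only `b` is indispensable, because `a` and `c` can stand in for each other. Starting from the core `["b"]`, adding `a` already restores full dependency, so the redundant attribute `c` is never added.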
The UCI repository is a commonly used source of standard test data; it collects a large number of databases used with a variety of machine learning methods. We chose five discrete databases and ran experiments with the attribute reduction algorithm based on attribute dependency. The experimental results are shown in Table 1.

Table 1. UCI Test Data Sets
Columns: Database name; Number of original attributes; Number of attributes after reduction (Reduct, ER)
Databases: Standardized Audiology Database; German Credit Data; Kinship Domain; Chess End-Game; Mushroom Database Domain

The test results on the UCI data sets show that the proposed algorithm improves both the time complexity and the reduction results.

3.2. Decision tree optimization based on rough set

Decision tree pruning is one of the main topics in current research on decision tree optimization. Pruning techniques are divided into pre-pruning and post-pruning. Pre-pruning considers only local information about the tree, so it is somewhat blind: it may stop the growth of the tree prematurely, and it is difficult to determine whether the child nodes that were cut off would have been worth keeping. In general, pre-pruning cannot produce the optimal decision tree. Post-pruning, by contrast, uses global information about the decision tree, so it is often better than pre-pruning and is commonly used in practice. Based on the study of rough set theory, this paper presents an improved rough-set-based pruning algorithm for decision trees.

Post-pruning method of decision tree

When a decision tree has just been built, many of its branches are constructed from abnormal data in the training sample set (due to noise, etc.). Branch pruning is used to deal with this noise problem. Post-pruning trims the excess branches from a "fully grown" tree [10-13]. Most existing post-pruning algorithms are improvements that take the REP algorithm as a benchmark. REP was first proposed by Quinlan and is one of the simplest pruning methods. It needs an independent test set (the pruning data set) to calculate the accuracy of each sub-tree.
Each internal node of the tree is treated as a pruning candidate, and the process is as follows. For each sub-tree S of tree T, working bottom-up, replace S with a leaf node to generate a new tree. If the new tree gives an equal or smaller classification error on the test set, and S contains no sub-tree with the same property, then S is deleted and replaced by the leaf node. This is repeated until replacing any remaining sub-tree by a leaf would increase the classification error on the test set. Nodes that were added because of coincidental regularities in the training set tend to be removed, because the same coincidences are unlikely to appear in the test set. In each round the error rates are compared, and the node whose removal most improves accuracy on the test set is pruned, until further pruning would reduce accuracy on the test set. The decision tree obtained by REP is the most accurate sub-tree with respect to the test set and the smallest such tree. In addition, its computational complexity is linear, because each non-leaf node is visited only once to assess whether its sub-tree should be pruned. Furthermore, because an independent test set is used, the pruned tree is less biased on future examples than the original decision tree. However, the method has a shortcoming: it tends to over-prune. Branches that correspond to rare cases in the training data but are not represented in the test set are deleted during pruning. This problem is particularly prominent when the test set is much smaller than the training set. If the training data set is small, this method is usually not considered.

Improved post-pruning algorithm based on rough set theory

In this paper, the improved post-pruning algorithm is described as follows. First, calculate the core of the attributes using the method above. Core attributes are usually more important for classification, so the nodes in the decision tree that correspond to core attributes are called important nodes. Next, for each non-leaf node A with corresponding sub-tree T', calculate the error rate e of the root node of T' and the error rates of the important nodes contained in T', and let e' be the minimum of the latter. Then prune the sub-tree if one of the following conditions holds:
(1) T' contains no important nodes;
(2) T' contains important nodes and e ≤ e'.

Algorithm: Prune(T)
Input: a fully grown decision tree T
Output: the pruned tree TP

    Starting from the root of the tree T
    For all sub-trees T' of T {
        e: classification error rate of the root node of T'
        e': minimum classification error rate of the important nodes in the branches of T'
        If no important nodes are found or e ≤ e' {
            Prune the sub-tree to a leaf node labeled with the majority class of the instances in T'
        }
    }

Compared with the commonly used pruning algorithms, the pruning method proposed in this paper only calculates the error rates of the sub-tree root nodes and of the important nodes contained in each sub-tree; it does not have to calculate the error rates of non-critical nodes during pruning, which greatly reduces the computational complexity. In addition, when this pruning method is combined with the attribute reduction method described above to construct decision trees, the core attributes needed to identify important nodes have already been calculated during attribute reduction and can be reused directly. This reduces the complexity of the decision tree algorithm and improves the efficiency of constructing the decision tree.
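For reference, the REP baseline against which the improved method is positioned can be sketched in Python as follows. This is a minimal illustration of bottom-up reduced error pruning, not the paper's rough-set variant. The tree representation (dictionary nodes with a stored `majority` class, leaves as plain labels), the toy tree and the pruning rows are assumptions made for the sketch.

```python
def classify(tree, row):
    """Follow branches to a leaf; unseen values use the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["majority"])
    return tree

def errors(tree, rows, target):
    """Number of rows that the (sub-)tree misclassifies."""
    return sum(classify(tree, r) != r[target] for r in rows)

def reduced_error_prune(tree, prune_rows, target):
    """Replace a sub-tree by its majority-class leaf whenever that does not
    increase the error on the pruning rows that reach it (bottom-up)."""
    if not isinstance(tree, dict):
        return tree
    for value, child in list(tree["children"].items()):
        subset = [r for r in prune_rows if r[tree["attr"]] == value]
        tree["children"][value] = reduced_error_prune(child, subset, target)
    leaf = tree["majority"]
    if errors(leaf, prune_rows, target) <= errors(tree, prune_rows, target):
        return leaf
    return tree

# Hypothetical over-grown tree: the split on price under quality=poor is noise.
tree = {
    "attr": "quality", "majority": "A",
    "children": {
        "good": "A",
        "poor": {"attr": "price", "majority": "B",
                 "children": {"low": "B", "high": "A"}},
    },
}
# Hypothetical independent pruning set.
prune_rows = [
    {"quality": "poor", "price": "high", "grade": "B"},
    {"quality": "poor", "price": "low", "grade": "B"},
    {"quality": "good", "price": "low", "grade": "A"},
]
pruned = reduced_error_prune(tree, prune_rows, "grade")
```

Note that a sub-tree reached by no pruning rows has zero error either way and is therefore collapsed; this is one concrete form of the over-pruning bias discussed in the post-pruning subsection.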
The REP pruning method introduced above is a relatively simple pruning method. In this section, the decision tree is built from the training set.

4. Experiments

Data mining is a very complex process. Each data mining technique has its own characteristics and implementation steps. Differences in the required form and structure of input and output data, in parameter settings, and in training, testing and model evaluation methods mean that algorithms differ in what they can do and where they apply. Data mining is closely tied to the specific application: the goal of each data mining problem, the data collected, the scope of the problem and the choice of algorithm all differ. We select supplier information from a SQL Server 2000 database table as the data mining object. The supplier evaluation system mines the supplier information and determines each supplier's importance in order to guide the company's decision-making. The procedure for creating the decision tree model in this paper is shown in Figure 1.

Figure 1. Flow chart of decision tree modeling

From 500 raw records, we extract part of the data for mining after preprocessing. We then construct a decision tree based on the information gain criterion of the ID3 algorithm. The resulting decision tree is shown in Figure 2. Finally, after pruning, the final decision tree is obtained, as shown in Figure 3.

Figure 2. Initial Decision Tree

Figure 3. Final Decision Tree

To further assess the proposed algorithm, four databases from the public UCI repository are selected for simulation tests. The results of the proposed decision tree algorithm are compared with the corresponding results of the C4.5 algorithm with EBP pruning. Basic information on the four databases is shown in Table 2, and the comparison results are shown in Table 3.

Table 2. Database information
Columns: Australian; German; Sonar; Sat
Rows: Sample number; Attribute number; Category number

Table 3. Test Results

Number of condition attributes for building tree
Database      Decision tree   C4.5
Australian
German
Sonar
Sat

Prediction accuracy
Database      Decision tree   C4.5
Australian    87.2%           83.3%
German        75.1%           73.2%
Sonar         81.3%           74.1%
Sat           81.9%           85.8%

As Table 3 shows, the proposed decision tree algorithm uses significantly fewer attributes to build a decision tree than C4.5, because attribute reduction with the ER algorithm is performed before the tree is built. Because the computational cost of a decision tree is proportional to the number of attributes used to build it, the proposed algorithm significantly reduces the computational cost. Meanwhile, its post-pruning algorithm has lower complexity, which also improves the efficiency of constructing the decision tree. The experiments show that on three of the four data sets (all except Sat), the prediction accuracy of the proposed algorithm is better than that of C4.5. In addition, the two algorithms build decision trees of roughly the same size.

5. Conclusions

By studying data mining and rough set theory, this paper proposes an attribute reduction method based on attribute dependency (ER) and compares it with commonly used rough set attribute reduction methods. A post-pruning method for decision trees based on rough set theory is also proposed. The experimental results show that the decision tree produced by this post-pruning method is smaller than the tree produced by REP and has high accuracy. The decision tree constructed with ER and rough-set-based post-pruning is applied in a supplier evaluation system. Practice has shown that the decision tree constructed by this method is relatively small and has high prediction accuracy.

6. References

[1] Pawlak Z, Skowron A, "Rudiments of Rough Sets", Information Science, vol. 117, no. 1, pp. 3-37.
[2] Fan Ming, Meng Xiao Feng, "Data mining: Concept and technique", Beijing: Machinery Industry Publication, China.
[3] Manish Mehta, Jorma Rissanen, Rakesh Agrawal, "MDL-based Decision Tree Pruning", International Conference on Knowledge Discovery in Databases and Data Mining.
[4] J. Mingers, "An Empirical Comparison of Pruning Methods for Decision Tree Induction", Machine Learning, vol. 4, no. 2.
[5] Agrawal R, Imielinski T, Swami A, "Database Mining: A Performance Perspective", IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6.
[6] Shang Zhi, "Algorithm of Attribute Value Reduction and Its Application Based on Rough Sets", Computer Applications and Software, vol. 26, no. 2.
[7] J.R. Quinlan, "Induction of Decision Trees", Machine Learning, vol. 1, no. 1.

[8] Han J W, Kamber M, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, San Francisco.
[9] Supriya K.D., Krishna R, "Clustering Web Transactions using Rough Approximation", Fuzzy Set and Systems, vol. 148, no. 1.
[10] Jinmao Wei, "Rough set based Approach to Selection of Node", International Journal of computation Cognition, vol. 1, no. 2.
[11] Xuelei Xu, Chunwei Lou, "Applying Decision Tree Algorithms in English Vocabulary Test Item Selection", IJACT: International Journal of Advancements in Computing Technology, vol. 4, no. 4.
[12] Sudheep Elayidom M, Sumam Mary Idikkula, Joseph Alexander, "Design and Performance analysis of Data mining techniques Based on Decision trees and Naive Bayes classifier For", JCIT:
[13] Journal of Convergence Information Technology, vol. 6, no. 5.


More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India lav_dlr@yahoo.com

More information

Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

More information

AnalysisofData MiningClassificationwithDecisiontreeTechnique

AnalysisofData MiningClassificationwithDecisiontreeTechnique Global Journal of omputer Science and Technology Software & Data Engineering Volume 13 Issue 13 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Towards applying Data Mining Techniques for Talent Mangement

Towards applying Data Mining Techniques for Talent Mangement 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Performance Analysis of Decision Trees

Performance Analysis of Decision Trees Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil brunorocha_33@hotmail.com 2 Network Engineering

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad sakshikashyap09@gmail.com

More information

Weather forecast prediction: a Data Mining application

Weather forecast prediction: a Data Mining application Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,ashwini.mandale@gmail.com,8407974457 Abstract

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS

PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao ABSTRACT Department of Computer Engineering, Fr.

More information

Introduction to Data Mining Techniques

Introduction to Data Mining Techniques Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and

More information

Professor Anita Wasilewska. Classification Lecture Notes

Professor Anita Wasilewska. Classification Lecture Notes Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Method of Fault Detection in Cloud Computing Systems

Method of Fault Detection in Cloud Computing Systems , pp.205-212 http://dx.doi.org/10.14257/ijgdc.2014.7.3.21 Method of Fault Detection in Cloud Computing Systems Ying Jiang, Jie Huang, Jiaman Ding and Yingli Liu Yunnan Key Lab of Computer Technology Application,

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

Indian Agriculture Land through Decision Tree in Data Mining

Indian Agriculture Land through Decision Tree in Data Mining Indian Agriculture Land through Decision Tree in Data Mining Kamlesh Kumar Joshi, M.Tech(Pursuing 4 th Sem) Laxmi Narain College of Technology, Indore (M.P) India k3g.kamlesh@gmail.com 9926523514 Pawan

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Healthcare Data Mining: Prediction Inpatient Length of Stay

Healthcare Data Mining: Prediction Inpatient Length of Stay 3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Optimization of C4.5 Decision Tree Algorithm for Data Mining Application

Optimization of C4.5 Decision Tree Algorithm for Data Mining Application Optimization of C4.5 Decision Tree Algorithm for Data Mining Application Gaurav L. Agrawal 1, Prof. Hitesh Gupta 2 1 PG Student, Department of CSE, PCST, Bhopal, India 2 Head of Department CSE, PCST, Bhopal,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

The Research of Data Mining Based on Neural Networks

The Research of Data Mining Based on Neural Networks 2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.09 The Research of Data Mining

More information

Knowledge Based Descriptive Neural Networks

Knowledge Based Descriptive Neural Networks Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: jtyao@cs.uregina.ca Abstract This paper presents a

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

More information

Neural Networks in Data Mining

Neural Networks in Data Mining IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

More information

DATA MINING METHODS WITH TREES

DATA MINING METHODS WITH TREES DATA MINING METHODS WITH TREES Marta Žambochová 1. Introduction The contemporary world is characterized by the explosion of an enormous volume of data deposited into databases. Sharp competition contributes

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016 Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

More information

S.Thiripura Sundari*, Dr.A.Padmapriya**

S.Thiripura Sundari*, Dr.A.Padmapriya** Structure Of Customer Relationship Management Systems In Data Mining S.Thiripura Sundari*, Dr.A.Padmapriya** *(Department of Computer Science and Engineering, Alagappa University, Karaikudi-630 003 **

More information

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer

More information

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

More information

Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

More information

DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)

DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell Leo_Pipino@UML.edu David Kopcso Babson College Kopcso@Babson.edu Abstract: A series of simulations

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
