# D A T A M I N I N G C L A S S I F I C A T I O N

Save this PDF as:

Size: px
Start display at page:

Download "D A T A M I N I N G C L A S S I F I C A T I O N"

## Transcription

1 D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe. The tendency is to keep increasing year after year. It is not hard to find databases with Terabytes of data in enterprises and research facilities. That is over 1,099,511,627,776 bytes of data. There is invaluable information and knowledge hidden in such databases; and without automatic methods for extracting this information it is practically impossible to mine for them. Throughout the years many algorithms were created to extract what is called nuggets of knowledge from large sets of data. There are several different methodologies to approach this problem: classification, association rule, clustering, etc. This paper will focus on classification which is described in more details in the next section. PROBLEM DESCRIPTION Classification consists of predicting a certain outcome based on a given input. In order to predict the outcome, the algorithm processes a training set containing a set of attributes and the respective outcome, usually called goal or prediction attribute. The algorithm tries to discover relationships between the attributes that would make it possible to predict the outcome. Next the algorithm is given a data set not seen before, called prediction set, which contains the same set of attributes, except for the prediction attribute not yet known. The algorithm analyses the input and produces a prediction. The prediction accuracy defines how good the algorithm is. For example, in a medical database the training set would have relevant patient information recorded previously, where the prediction attribute is whether or not the patient had a heart problem. Table 1 below illustrates the training and prediction sets of such database. Training set Age Heart rate Blood pressure Heart problem /70 Yes /76 No /65 No Prediction set Age Heart rate Blood pressure Heart problem /89? /63? /65? TABLE 1 TRAINING AND PREDICTION SETS FOR MEDICAL DATABASE

2 Among several types of knowledge representation present in the literature, classification normally uses prediction rules to express knowledge. Prediction rules are expressed in the form of IF-THEN rules, where the antecedent (IF part) consists of a conjunction of conditions and the rule consequent (THEN part) predicts a certain predictions attribute value for an item that satisfies the antecedent. Using the example above, a rule predicting the first row in the training set may be represented as following: IF (Age=65 AND Heart rate>70) OR (Age>60 AND Blood pressure>140/70) THEN Heart problem=yes In most cases the prediction rule is immensely larger than the example above. Conjunction has a nice property for classification; each condition separated by OR s defines smaller rules that captures relations between attributes. Satisfying any of these smaller rules means that the consequent is the prediction. Each smaller rule is formed with AND s which facilitates narrowing down relations between attributes. How well predictions are done is measured in percentage of predictions hit against the total number of predictions. A decent rule ought to have a hit rate greater than the occurrence of the prediction attribute. In other words, if the algorithm is trying to predict rain in Seattle and it rains 80% of the time, the algorithm could easily have a hit rate of 80% by just predicting rain all the time. Therefore, 80% is the base prediction rate that any algorithm should achieve in this case. The optimal solution is a rule with 100% prediction hit rate, which is very hard, when not impossible, to achieve. Therefore, except for some very specific problems, classification by definition can only be solved by approximation algorithms. APPROXIMATION ALGORITHMS STATISTICAL ALGORITHMS: ID3 AND C4.5 The ID3 algorithm was originally developed by J. Ross Quinlan at the University of Sydney, and he first presented it in the 1975 book Machine Learning. The ID3 algorithm induces classification models, or decision trees, from data. It is a supervised learning algorithm that is trained by examples for different classes. After being trained, the algorithm should be able to predict the class of a new item. ID3 identifies attributes that differentiate one class from another. All attributes must be known in advance, and must also be either continuous or selected from a set of known values. For instance, temperature (continuous), and country of citizenship (set of known values) are valid attributes. To determine which attributes are the most important, ID3 uses the statistical property of entropy. Entropy measures the amount of information in an attribute. This is how the decision tree, which will be used in testing future cases, is built. One of the limitations of ID3 is that it is very sensitive to attributes with a large number of values (e.g. social security numbers). The entropy of such attributes is very low, and they don t help you in performing any type of prediction. The C4.5 algorithm overcomes this problem by using another statistical property known as information gain. Information gain measures how well a given attribute separates the training sets into the output classes. Therefore, the C4.5 algorithm extends

3 the ID3 algorithm through the use of information gain to reduce the problem of artificially low entropy values for attributes such as social security numbers. GENETIC PROGRAMMING Genetic programming (GP) has been vastly used in research in the past 10 years to solve data mining classification problems. The reason genetic programming is so widely used is the fact that prediction rules are very naturally represented in GP. Additionally, GP has proven to produce good results with global search problems like classification. The search space for classification can be described as having several peaks, this causes local search algorithms, such as simulated annealing, to perform badly. GP consists of stochastic search algorithms based on abstractions of the processes of Darwinian evolution. Each candidate solution is represented by an individual in GP. The solution is coded into chromosome like structures that can be mutated and/or combined with some other individual s chromosome. Each individual contains a fitness value, which measures the quality of the individual, in other words how close the candidate solution is from being optimal. Based on the fitness value, individuals are selected to mate. This process creates a new individual by combining two or more chromosomes, this process is called crossover. They are combined with each other in the hope that these new individuals will evolve and become better than their parents. Additionally to matting, chromosomes can be mutated at random. The running time of GPs is usually controlled by the user. There are many parameters used to determine when the algorithm should stop, and each data set can have very different settings. In all cases, the best individual is stored across generations and is returned when the algorithm stops. The most commonly used parameter is number of generations. Another stop parameters used is minimum expected hit ratio, in which case the algorithm will run until a candidate solution has a hit ratio greater than expected. This however can cause the algorithm to run forever. Combinations of stop conditions can also be used to ensure stoppage. NEURAL NETWORKS Neural networks were modeled after the cognitive processes of the brain. They are capable of predicting new observations from existing observations. A neural network consists of interconnected processing elements also called units, nodes, or neurons. The neurons within the network work together, in parallel, to produce an output function. Since the computation is performed by the collective neurons, a neural network can still produce the output function even if some of the individual neurons are malfunctioning (the network is robust and fault tolerant). In general, each neuron within a neural network has an associated activation number. Also, each connection between neurons has a weight associated with it. These quantities simulate their counterparts in the biological brain: firing rate of a neuron, and strength of a synapse. The activation of a neuron depends on the activation of the other neurons and the weight of the edges that are connected to it. The neurons within a neural network are usually arranged in layers. The number of layers within the neural network, and the number of neurons within each layer normally matches the nature of the investigated phenomenon.

4 After the size has been determined, the network is usually then subjected to training. Here, the network receives a sample training input with its associated classes. It then applies an iterative process on the input in order to adjust the weights of the network so that its future predictions are optimal. After the training phase, the network is ready to perform predictions in new sets of data. Neural networks can often produce very accurate predictions. However, one of their greatest criticisms is the fact that they represent a black-box approach to research. They do not provide any insight into the underlying nature of the phenomena. ANT COLONY Ant Colony algorithms were first introduced in the 1992 PhD thesis of Marco Dorigo. They offer a way of finding good paths within a graph, and they were inspired by the behavior of ants in finding paths from the colony to food. When traveling from their colony to food sources, ants deposit chemicals called pheromones on the trails. The trail is used so that ants can find their way back to the colony. If other ants find this path they are likely to follow it. This will cause more pheromone to be deposited on the trail, which has the effect of reinforcing it. However, if a trail has not been used for a while, the pheromone starts to evaporate. Short paths have the advantage of being marched over faster and more often, therefore, the pheromone density remains high. So a short path will have more ants traveling on it, increasing the pheromone density even more, and eventually all the ants will follow it. Ant algorithms mimic this behavior in order to find optimal paths within a graph. Initially, all paths have a random small amount of pheromone deposited on it. An ant departs from the starting node and starts the process of visiting the other nodes in the graph. At each node, the ant decides which node it should visit next. The ant rates the attractiveness of traveling to each node in the graph. A nearby node with a large amount of pheromone deposited in its path will be very attractive. Also, a distant node with little amount of pheromone deposited in its path will be very unattractive. This attractiveness information will be use in order to compute the probability that a given route will be taken. Ant algorithms give good results relatively fast. They can be run continuously in order to adapt to changes in real-time: an advantage as compared to genetic algorithms. For instance, an obstacle such as a traffic jam would quickly lead the ants to discover a different good path. This can be of interest in applications such as network routing and urban transportation. PAPER DISCUSSION The classification algorithm described in [Mendes et al. 2001] offers an interesting combination of approaches. It consists of main GP algorithm, where each individual represents an IF-THEN prediction rule, having the rule modeled as a Boolean expression tree. Matting is done by selecting a random subtree on each individual and then swapping them. Tests have shown that the algorithm has a tendency of creating very large rules. This happens because it is possible to create a 100% accurate rule by making a subrule for each row in the training data set and making them match. This indeed is optimal for the training set, but clearly performs badly with new data. To avoid this, the fitness function penalizes large rules. Interesting another cause of rule bloat is the opposite;

5 rules that do not match any of the rows in the training set. These rules do not affect the individual fitness value in the training set, but it can cause the rule to perform badly with real data. Additionally, it makes rules larger which consume more memory, and takes longer to evaluate. The algorithm deals with this by pruning all non-matching subtrees from the individuals. Continuous attributes poses a challenge to such algorithms because rules impose hard constraints on attributes, e.g. (Age>65 AND Heart rate>70). However, a patient does not start having heart problem on the day he turns 65, but the probability of having heart problems increase as he gets close to turning 65. The algorithm being discussed solves this problem by using fuzzy logic to describe continuous attributes. This allows for smooth transition between states. The challenge however is to partition the attribute correctly, i.e. when should the chance of heart attack start increasing and how fast is should increase are difficult questions to answers. So there is another genetic algorithm (GA) that evolves individuals representing fuzzy membership functions. In the beginning all continuous attributes are equally partitioned, then after each generation of the main GP, a generation of the GA runs to adjust the membership functions. The goal is to find membership functions that maximize the main GP fitness. This algorithm showed accuracy of the resulting rule superior to other similar algorithms. This can be attributed in part to fuzzy rules and how it evolves along side with the prediction rules. CONCLUSION Data mining offers promising ways to uncover hidden patterns within large amounts of data. These hidden patterns can potentially be used to predict future behavior. The availability of new data mining algorithms, however, should be met with caution. First of all, these techniques are only as good as the data that has been collected. Good data is the first requirement for good data exploration. Assuming good data is available, the next step is to choose the most appropriate technique to mine the data. However, there are tradeoffs to consider when choosing the appropriate data mining technique to be used in a certain application. There are definite differences in the types of problems that are conductive to each technique. The best model is often found by trial and error: trying different technologies and algorithms. Often times, the data analyst should compare or even combine available techniques in order to obtain the best possible results. REFERENCES Data Mining. March March 2007 <http://en.wikipedia.org/wiki/datamining/>. Classifying Text with ID3 and C4.5. June March 2007 <http://www.ddj.com/ />. Data Mining Techniques March 2007 <http://www.statsoft.com/textbook/stdatmin.html/>. Ant Colony Algorithm. January March 2007 <http://www.egilh.com/blog/archive/2007/01/08/3322.aspx/>.

6 [Mendes et al. 2001] R.F. Mendes, F.B. Voznika, A.A. Freitas and J.C. Nievola. Discovering fuzzy classification rules with genetic programming and co-evolution. Principles of Data Mining and Knowledge Discovery (Proceedings of the 5 th European Conference, PKDD 2001) Lecture Notes in Artificial Intelligence 2168, , Springer, Genetic Programming, John R. Koza, MIT Press, Genetic Programming An Introduction, W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Morgan Kaufmann Publishers, Data Mining and Knowledge Discovery with Evolutionary Algorithms, A.A. Freitas, Springer-Verlag, 2002.

### Evolutionary Data Mining With Automatic Rule Generalization

Evolutionary Data Mining With Automatic Rule Generalization ROBERT CATTRAL, FRANZ OPPACHER, DWIGHT DEUGO Intelligent Systems Research Unit School of Computer Science Carleton University Ottawa, On K1S

### Improving Decision Making and Managing Knowledge

Improving Decision Making and Managing Knowledge Decision Making and Information Systems Information Requirements of Key Decision-Making Groups in a Firm Senior managers, middle managers, operational managers,

### EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Computational intelligence in intrusion detection systems

Computational intelligence in intrusion detection systems --- An introduction to an introduction Rick Chang @ TEIL Reference The use of computational intelligence in intrusion detection systems : A review

### Neural Networks and Back Propagation Algorithm

Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important

### DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

### Using Artificial Life Techniques to Generate Test Cases for Combinatorial Testing

Using Artificial Life Techniques to Generate Test Cases for Combinatorial Testing Presentation: TheinLai Wong Authors: T. Shiba,, T. Tsuchiya, T. Kikuno Osaka University Backgrounds Testing is an important

### Project Ideas in Computer Science. Keld Helsgaun

Project Ideas in Computer Science Keld Helsgaun 1 Keld Helsgaun Research: Combinatorial optimization Heuristic search (artificial intelligence) Simulation Programming tools Teaching: Programming, algorithms

### An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients

An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento

### Projects - Neural and Evolutionary Computing

Projects - Neural and Evolutionary Computing 2014-2015 I. Application oriented topics 1. Task scheduling in distributed systems. The aim is to assign a set of (independent or correlated) tasks to some

### Learning. Artificial Intelligence. Learning. Types of Learning. Inductive Learning Method. Inductive Learning. Learning.

Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Artificial Intelligence Learning Chapter 8 Learning is useful as a system construction method, i.e., expose

### 14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

### Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone

### GPSQL Miner: SQL-Grammar Genetic Programming in Data Mining

GPSQL Miner: SQL-Grammar Genetic Programming in Data Mining Celso Y. Ishida, Aurora T. R. Pozo Computer Science Department - Federal University of Paraná PO Box: 19081, Centro Politécnico - Jardim das

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00

### An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. XML data mining research based on multi-level technology

[Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 13 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(13), 2014 [7602-7609] XML data mining research based on multi-level technology

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

### What is Artificial Intelligence?

Introduction to Artificial Intelligence What is Artificial Intelligence? One definition: AI is the study of how to make computers do things that people generally do better Many approaches and issues, e.g.:

### An ACO Approach to Solve a Variant of TSP

An ACO Approach to Solve a Variant of TSP Bharat V. Chawda, Nitesh M. Sureja Abstract This study is an investigation on the application of Ant Colony Optimization to a variant of TSP. This paper presents

### Genetic programming with regular expressions

Genetic programming with regular expressions Børge Svingen Chief Technology Officer, Open AdExchange bsvingen@openadex.com 2009-03-23 Pattern discovery Pattern discovery: Recognizing patterns that characterize

### not possible or was possible at a high cost for collecting the data.

Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

### Web Mining using Artificial Ant Colonies : A Survey

Web Mining using Artificial Ant Colonies : A Survey Richa Gupta Department of Computer Science University of Delhi ABSTRACT : Web mining has been very crucial to any organization as it provides useful

### Alpha Cut based Novel Selection for Genetic Algorithm

Alpha Cut based Novel for Genetic Algorithm Rakesh Kumar Professor Girdhar Gopal Research Scholar Rajesh Kumar Assistant Professor ABSTRACT Genetic algorithm (GA) has several genetic operators that can

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

### White Paper. Data Mining for Business

White Paper Data Mining for Business January 2010 Contents 1. INTRODUCTION... 3 2. WHY IS DATA MINING IMPORTANT?... 3 FUNDAMENTALS... 3 Example 1...3 Example 2...3 3. OPERATIONAL CONSIDERATIONS... 4 ORGANISATIONAL

### Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it

### Sanjeev Kumar. contribute

RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

### Weather forecast prediction: a Data Mining application

Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,ashwini.mandale@gmail.com,8407974457 Abstract

### Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Foundations. Computational Intelligence

Foundations Computational Intelligence Intelligence A standard dictionary definition of intelligence is: "1 a (1): The ability to learn or understand or to deal with new or trying situations : REASON;

### Artificial Intelligence (AI)

Overview Artificial Intelligence (AI) A brief introduction to the field. Won t go too heavily into the theory. Will focus on case studies of the application of AI to business. AI and robotics are closely

### Obtaining Optimal Software Effort Estimation Data Using Feature Subset Selection

Obtaining Optimal Software Effort Estimation Data Using Feature Subset Selection Abirami.R 1, Sujithra.S 2, Sathishkumar.P 3, Geethanjali.N 4 1, 2, 3 Student, Department of Computer Science and Engineering,

### Keywords: Travelling Salesman Problem, Map Reduce, Genetic Algorithm. I. INTRODUCTION

ISSN: 2321-7782 (Online) Impact Factor: 6.047 Volume 4, Issue 6, June 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study

### Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

### A Survey on Intrusion Detection System with Data Mining Techniques

A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,

### Research Article www.ijptonline.com EFFICIENT TECHNIQUES TO DEAL WITH BIG DATA CLASSIFICATION PROBLEMS G.Somasekhar 1 *, Dr. K.

ISSN: 0975-766X CODEN: IJPTFI Available Online through Research Article www.ijptonline.com EFFICIENT TECHNIQUES TO DEAL WITH BIG DATA CLASSIFICATION PROBLEMS G.Somasekhar 1 *, Dr. K.Karthikeyan 2 1 Research

### Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

### A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

### In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

### Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

### AN EFFICIENT LOAD BALANCING APPROACH IN CLOUD SERVER USING ANT COLONY OPTIMIZATION

AN EFFICIENT LOAD BALANCING APPROACH IN CLOUD SERVER USING ANT COLONY OPTIMIZATION Shanmuga Priya.J 1, Sridevi.A 2 1 PG Scholar, Department of Information Technology, J.J College of Engineering and Technology

### Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

### Performance Analysis of Decision Trees

Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India

### American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-349, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

### testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello

Extracting Knowledge from Biomedical Data through Logic Learning Machines and Rulex Marco Muselli Institute of Electronics, Computer and Telecommunication Engineering National Research Council of Italy,

### A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM

A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM MS. DIMPI K PATEL Department of Computer Science and Engineering, Hasmukh Goswami college of Engineering, Ahmedabad, Gujarat ABSTRACT The Internet

### A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

### Finding Liveness Errors with ACO

Hong Kong, June 1-6, 2008 1 / 24 Finding Liveness Errors with ACO Francisco Chicano and Enrique Alba Motivation Motivation Nowadays software is very complex An error in a software system can imply the

### NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

### Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson

Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson Artificial Intelligence Models Very experimental, but timely?

### Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

### Data Mining and Pattern Recognition for Large-Scale Scientific Data

Data Mining and Pattern Recognition for Large-Scale Scientific Data Chandrika Kamath Center for Applied Scientific Computing Lawrence Livermore National Laboratory October 15, 1998 We need an effective

### IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil brunorocha_33@hotmail.com 2 Network Engineering

### The Use of Evolutionary Algorithms in Data Mining. Khulood AlYahya Sultanah AlOtaibi

The Use of Evolutionary Algorithms in Data Mining Ayush Joshi Jordan Wallwork Khulood AlYahya Sultanah AlOtaibi MScISE BScAICS MScISE MScACS 1 Abstract With the huge amount of data being generated in the

### Chapter 11. Managing Knowledge

Chapter 11 Managing Knowledge VIDEO CASES Video Case 1: How IBM s Watson Became a Jeopardy Champion. Video Case 2: Tour: Alfresco: Open Source Document Management System Video Case 3: L'Oréal: Knowledge

### Random forest algorithm in big data environment

Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

### Contents. Dedication List of Figures List of Tables. Acknowledgments

Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description

### A Comparative Study on the Use of Classification Algorithms in Financial Forecasting

A Comparative Study on the Use of Classification Algorithms in Financial Forecasting Fernando E. B. Otero and Michael Kampouridis School of Computing University of Kent, Chatham Maritime, UK {F.E.B.Otero,M.Kampouridis}@kent.ac.uk

### A Robust Method for Solving Transcendental Equations

www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. Al-Amin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,

### INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

### A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery

A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery Alex A. Freitas Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana Rua Imaculada Conceicao,

### An Ant Colony Optimization Approach to the Software Release Planning Problem

SBSE for Early Lifecyle Software Engineering 23 rd February 2011 London, UK An Ant Colony Optimization Approach to the Software Release Planning Problem with Dependent Requirements Jerffeson Teixeira de

### An Application of Machine Learning to Network Intrusion Detection

An Application of Machine Learning to Network Intrusion Detection Chris Sinclair Applied Research Laboratories The University of Texas at Austin sinclair@arlututexasedu Lyn Pierce epierce@arlututexasedu

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### Intellectual Property / Copyright Material

What is Data Mining? Author: BALWANT RAI Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 02/27/04 Email: erg@evaltech.com Abstract: In this paper we will be going

### MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL G. Maria Priscilla 1 and C. P. Sumathi 2 1 S.N.R. Sons College (Autonomous), Coimbatore, India 2 SDNB Vaishnav College

### Data Mining Applications in Higher Education

Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

### CONSTRUCTING PREDICTIVE MODELS TO ASSESS THE IMPORTANCE OF VARIABLES IN EPIDEMIOLOGICAL DATA USING A GENETIC ALGORITHM SYSTEM EMPLOYING DECISION TREES

CONSTRUCTING PREDICTIVE MODELS TO ASSESS THE IMPORTANCE OF VARIABLES IN EPIDEMIOLOGICAL DATA USING A GENETIC ALGORITHM SYSTEM EMPLOYING DECISION TREES A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE

### Introduction. Swarm Intelligence - Thiemo Krink EVALife Group, Dept. of Computer Science, University of Aarhus

Swarm Intelligence - Thiemo Krink EVALife Group, Dept. of Computer Science, University of Aarhus Why do we need new computing techniques? The computer revolution changed human societies: communication

### LOAD BALANCING IN CLOUD COMPUTING

LOAD BALANCING IN CLOUD COMPUTING Neethu M.S 1 PG Student, Dept. of Computer Science and Engineering, LBSITW (India) ABSTRACT Cloud computing is emerging as a new paradigm for manipulating, configuring,

### Solving the Travelling Salesman Problem Using the Ant Colony Optimization

Ivan Brezina Jr. Zuzana Čičková Solving the Travelling Salesman Problem Using the Ant Colony Optimization Article Info:, Vol. 6 (2011), No. 4, pp. 010-014 Received 12 July 2010 Accepted 23 September 2011

### Self-Learning Genetic Algorithm for a Timetabling Problem with Fuzzy Constraints

Self-Learning Genetic Algorithm for a Timetabling Problem with Fuzzy Constraints Radomír Perzina, Jaroslav Ramík perzina(ramik)@opf.slu.cz Centre of excellence IT4Innovations Division of the University

### DATA MINING IN FINANCE

DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia

### CLOUD DATABASE ROUTE SCHEDULING USING COMBANATION OF PARTICLE SWARM OPTIMIZATION AND GENETIC ALGORITHM

CLOUD DATABASE ROUTE SCHEDULING USING COMBANATION OF PARTICLE SWARM OPTIMIZATION AND GENETIC ALGORITHM *Shabnam Ghasemi 1 and Mohammad Kalantari 2 1 Deparment of Computer Engineering, Islamic Azad University,

### SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH

330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,

### Web Usage Mining: Identification of Trends Followed by the user through Neural Network

International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web

### Web Mining as a Tool for Understanding Online Learning

Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA jadb3@mizzou.edu James Laffey University of Missouri Columbia Columbia, MO USA LaffeyJ@missouri.edu

### Apache Hama Design Document v0.6

Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault

### Using Ant Colony Optimization for Infrastructure Maintenance Scheduling

Using Ant Colony Optimization for Infrastructure Maintenance Scheduling K. Lukas, A. Borrmann & E. Rank Chair for Computation in Engineering, Technische Universität München ABSTRACT: For the optimal planning

### Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.

Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 42-268-7599 Copyright Andrew W. Moore Slide Decision Trees Decision trees

### A Review of Data Mining Techniques

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

### ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS Michael Affenzeller (a), Stephan M. Winkler (b), Stefan Forstenlechner (c), Gabriel Kronberger (d), Michael Kommenda (e), Stefan

### What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling)

data analysis data mining quality control web-based analytics What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) StatSoft

### IAI : Biological Intelligence and Neural Networks

IAI : Biological Intelligence and Neural Networks John A. Bullinaria, 2005 1. How do Humans do Intelligent Things? 2. What are Neural Networks? 3. What are Artificial Neural Networks used for? 4. Introduction

### SOFT COMPUTING AND ITS USE IN RISK MANAGEMENT

SOFT COMPUTING AND ITS USE IN RISK MANAGEMENT doc. Ing. Petr Dostál, CSc. Brno University of Technology, Kolejní 4, 612 00 Brno, Czech Republic, Institute of Informatics, Faculty of Business and Management,

### EXECUTIVE SUPPORT SYSTEMS (ESS) STRATEGIC INFORMATION SYSTEM DESIGNED FOR UNSTRUCTURED DECISION MAKING THROUGH ADVANCED GRAPHICS AND COMMUNICATIONS *

EXECUTIVE SUPPORT SYSTEMS (ESS) STRATEGIC INFORMATION SYSTEM DESIGNED FOR UNSTRUCTURED DECISION MAKING THROUGH ADVANCED GRAPHICS AND COMMUNICATIONS * EXECUTIVE SUPPORT SYSTEMS DRILL DOWN: ability to move

### An approach towards building human behavior models automatically by observation

An approach towards building human behavior models automatically by observation Hans K. Fernlund and Avelino J. Gonzalez Intelligent Systems Laboratory School of Electrical Engineering and Computer Science

### Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling

### Influence Discovery in Semantic Networks: An Initial Approach

2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

### A Binary Model on the Basis of Imperialist Competitive Algorithm in Order to Solve the Problem of Knapsack 1-0

212 International Conference on System Engineering and Modeling (ICSEM 212) IPCSIT vol. 34 (212) (212) IACSIT Press, Singapore A Binary Model on the Basis of Imperialist Competitive Algorithm in Order

### Neural Networks in Data Mining

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### A Service Revenue-oriented Task Scheduling Model of Cloud Computing

Journal of Information & Computational Science 10:10 (2013) 3153 3161 July 1, 2013 Available at http://www.joics.com A Service Revenue-oriented Task Scheduling Model of Cloud Computing Jianguang Deng a,b,,