# Decision tree algorithm short Weka tutorial

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Decision tree algorithm short Weka tutorial Croce Danilo, Roberto Basili Machine leanring for Web Mining a.a

2 Machine Learning: brief summary Example You need to write a program that: given a Level Hierarchy of a company given an employe described trough some attributes (the number of attributes can be very high) assign to the employe the correct level into the hierarchy.

3 Machine Learning: brief summary Example You need to write a program that: given a Level Hierarchy of a company given an employe described trough some attributes (the number of attributes can be very high) assign to the employe the correct level into the hierarchy. How many if are necessary to select the correct level?

4 Machine Learning: brief summary Example You need to write a program that: given a Level Hierarchy of a company given an employe described trough some attributes (the number of attributes can be very high) assign to the employe the correct level into the hierarchy. How many if are necessary to select the correct level? How many time is necessary to study the relations between the hierarchy and attributes?

5 Machine Learning: brief summary Example You need to write a program that: given a Level Hierarchy of a company given an employe described trough some attributes (the number of attributes can be very high) assign to the employe the correct level into the hierarchy. How many if are necessary to select the correct level? How many time is necessary to study the relations between the hierarchy and attributes? Solution Learn the function to link each employe to the correct level.

6 Supervised Learning process: two steps Learning (Training) Learn a model using the training data

7 Supervised Learning process: two steps Learning (Training) Learn a model using the training data Testing Test the model using unseen test data to assess the model accuracy

8 Supervised Learning process: two steps Learning (Training) Learn a model using the training data Testing Test the model using unseen test data to assess the model accuracy

9 Learning Algorithms Probabilistic Functions (Bayesian Classifier) Functions to partitioning Vector Space Non-Linear: KNN, Neural Networks,... Linear: Support Vector Machines, Perceptron,...

10 Learning Algorithms Probabilistic Functions (Bayesian Classifier) Functions to partitioning Vector Space Non-Linear: KNN, Neural Networks,... Linear: Support Vector Machines, Perceptron,... Boolean Functions (Decision Trees)

11 Decision Tree: Domain Example The class to learn is: approve a loan

12 Decision Tree Decision Tree example for the loan problem

13 Is the decision tree unique? No. Here is a simpler tree.

14 Is the decision tree unique? No. Here is a simpler tree. We want smaller tree and accurate tree. Easy to understand and perform better.

15 Is the decision tree unique? No. Here is a simpler tree. We want smaller tree and accurate tree. Easy to understand and perform better.

16 Is the decision tree unique? No. Here is a simpler tree. We want smaller tree and accurate tree. Easy to understand and perform better. Finding the best tree is NP-hard.

17 Is the decision tree unique? No. Here is a simpler tree. We want smaller tree and accurate tree. Easy to understand and perform better. Finding the best tree is NP-hard. All current tree building algorithms are heuristic algorithms

18 Is the decision tree unique? No. Here is a simpler tree. We want smaller tree and accurate tree. Easy to understand and perform better. Finding the best tree is NP-hard. All current tree building algorithms are heuristic algorithms A decision tree can be converted to a set of rules.

19 From a decision tree to a set of rules

20 From a decision tree to a set of rules Each path from the root to a leaf is a rule

21 From a decision tree to a set of rules Each path from the root to a leaf is a rule Rules Own_house = true Class = yes Own_house = false, Has_job = true Class = yes Own_house = false, Has_job = false Class = no

22 Choose an attribute to partition data How chose the best attribute set?

23 Choose an attribute to partition data How chose the best attribute set? The objective is to reduce the impurity or uncertainty in data as much as possible

24 Choose an attribute to partition data How chose the best attribute set? The objective is to reduce the impurity or uncertainty in data as much as possible A subset of data is pure if all instances belong to the same class.

25 Choose an attribute to partition data How chose the best attribute set? The objective is to reduce the impurity or uncertainty in data as much as possible A subset of data is pure if all instances belong to the same class. The heuristic is to choose the attribute with the maximum Information Gain or Gain Ratio based on information theory.

26 Choose an attribute to partition data How chose the best attribute set? The objective is to reduce the impurity or uncertainty in data as much as possible A subset of data is pure if all instances belong to the same class. The heuristic is to choose the attribute with the maximum Information Gain or Gain Ratio based on information theory.

27 Information Gain Entropy of D Entropy is a measure of the uncertainty associated with a random variable. Given a set of examples D is possible to compute the original entropy of the dataset such as: C H[D] = j=1 where C is the set of desired class. P(c j )log 2 P(c j )

28 Entropy As the data become purer and purer, the entropy value becomes smaller and smaller.

29 Information Gain Entropy of D Given a set of examples D is possible to compute the original entropy of the dataset such as: C H[D] = P(c j )log 2 P(c j ) where C is the set of desired class. j=1 Entropy of an attribute A i If we make attribute A i, with v values, the root of the current tree, this will partition D into v subsets D 1,D 2,...,D v. The expected entropy if A i is used as the current root: H Ai [D] = v j=1 D j D H[D j]

30 Information Gain Information Gain Information gained by selecting attribute A i to branch or to partition the data is given by the difference of prior entropy and the entropy of selected branch gain(d,a i ) = H[D] H Ai [D]

31 Information Gain Information Gain Information gained by selecting attribute A i to branch or to partition the data is given by the difference of prior entropy and the entropy of selected branch gain(d,a i ) = H[D] H Ai [D] We choose the attribute with the highest gain to branch/split the current tree.

32 Decision Tree: Domain Example Back to the example The class to learn is: approve a loan

33 Example 9 examples belong to "YES" category and 6 to "NO". Exploiting prior knowledge we have: C H[D] = P(c j )log 2 P(c j ) j=1 H[D] = 6 15 log log = 0.971

34 Example 9 examples belong to "YES" category and 6 to "NO". Exploiting prior knowledge we have: C H[D] = P(c j )log 2 P(c j ) j=1 H[D] = 6 15 log log = while partitioning through the Age feature: where H Age [D] = 5 15 H[D 1] 5 15 H[D 2] 5 15 H[D 3] = H[D 1 ] = log 3 2( ) log 2 2( ) = H[D 2 ] = log 2 2( ) log 3 2( ) = H[D 3 ] = log 1 2( ) log 4 2( ) = 0.722

35 Example

36 Example H[D] = 6 15 log log = H OH [D] = 6 15 H[D 1] 9 15 H[D 2] = =

37 Example gain(d,age) = = gain(d,own_house) = = gain(d,has_job) = = gain(d,credit) = = H[D] = 6 15 log log = H OH [D] = 6 15 H[D 1] 9 15 H[D 2] = =

38 Example gain(d,age) = = gain(d,own_house) = = gain(d,has_job) = = gain(d,credit) = = H[D] = 6 15 log log = H OH [D] = 6 15 H[D 1] 9 15 H[D 2] = =

39 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now

40 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner

41 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root

42 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root Examples are partitioned recursively based on selected attributes

43 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root Examples are partitioned recursively based on selected attributes Attributes are selected on the basis of an impurity function (e.g., information gain)

44 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root Examples are partitioned recursively based on selected attributes Attributes are selected on the basis of an impurity function (e.g., information gain) Conditions for stopping partitioning All examples for a given node belong to the same class

45 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root Examples are partitioned recursively based on selected attributes Attributes are selected on the basis of an impurity function (e.g., information gain) Conditions for stopping partitioning All examples for a given node belong to the same class There are no remaining attributes for further partitioning? majority class is the leaf

46 Algorithm for decision tree learning Basic algorithm (a greedy divide-and-conquer algorithm) Assume attributes are categorical now Tree is constructed in a top-down recursive manner At start, all the training examples are at the root Examples are partitioned recursively based on selected attributes Attributes are selected on the basis of an impurity function (e.g., information gain) Conditions for stopping partitioning All examples for a given node belong to the same class There are no remaining attributes for further partitioning? majority class is the leaf There are no examples left

47 Algorithm for decision tree learning

48 What is?

49 What is? Collection of ML algorithms - open-source Java package Site: Documentation: documentation.html

50 What is? Collection of ML algorithms - open-source Java package Site: Documentation: documentation.html Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptron

51 What is? Collection of ML algorithms - open-source Java package Site: Documentation: documentation.html Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptron For classification, Weka allows train/test split or Cross-fold validation Schemes for clustering: EM and Cobweb

52 ARFF File Format Require

53 ARFF File Format Require @RELATION declaration associates a name with the <relation-name>

54 ARFF File Format Require @RELATION declaration associates a name with the declaration specifies the name and type of an <attribute-name> <datatype> Datatype can be numeric, nominal, string or date

55 ARFF File Format Require @RELATION declaration associates a name with the declaration specifies the name and type of an <attribute-name> <datatype> Datatype can be numeric, nominal, string or sepallength petalwidth class {Setosa,Versicolor,Virginica}

56 ARFF File Format Require @RELATION declaration associates a name with the declaration specifies the name and type of an <attribute-name> <datatype> Datatype can be numeric, nominal, string or sepallength petalwidth class declaration is a single line denoting the start of the data segment Missing values are represented by?

57 ARFF File Format Require @RELATION declaration associates a name with the declaration specifies the name and type of an <attribute-name> <datatype> Datatype can be numeric, nominal, string or sepallength petalwidth class declaration is a single line denoting the start of the data segment Missing values are represented 1.4, 0.2, Setosa 1.4,?, Versicolor

58 ARFF Sparse File Format Similar to AARF files except that data value 0 are not represented

59 ARFF Sparse File Format Similar to AARF files except that data value 0 are not represented Non-zero attributes are specified by attribute number and value

60 ARFF Sparse File Format Similar to AARF files except that data value 0 are not represented Non-zero attributes are specified by attribute number and value 0, X, 0, Y, class A" 0, 0, W, 0, class B"

61 ARFF Sparse File Format Similar to AARF files except that data value 0 are not represented Non-zero attributes are specified by attribute number and value 0, X, 0, Y, class A" 0, 0, W, 0, class B" {1 X, 3 Y, 4 class A"} {2 W, 4 class B"}

62 ARFF Sparse File Format Similar to AARF files except that data value 0 are not represented Non-zero attributes are specified by attribute number and value 0, X, 0, Y, class A" 0, 0, W, 0, class B" {1 X, 3 Y, 4 class A"} {2 W, 4 class B"} Note that the omitted values in a sparse instance are 0, they are not missing values! If a value is unknown, you must explicitly represent it with a question mark (?)

63 Running Learning Schemes java -Xmx512m -cp weka.jar <learner class> [options]

64 Running Learning Schemes java -Xmx512m -cp weka.jar <learner class> [options] Example learner classes: Decision Tree: weka.classifiers.trees.j48 Naive Bayes: weka.classifiers.bayes.naivebayes k-nn: weka.classifiers.lazy.ibk

65 Running Learning Schemes java -Xmx512m -cp weka.jar <learner class> [options] Example learner classes: Decision Tree: weka.classifiers.trees.j48 Naive Bayes: weka.classifiers.bayes.naivebayes k-nn: weka.classifiers.lazy.ibk Important generic options: -t <training file> Specify training file -T <test files> Specify Test file. If none testing is performed on training data -x <number of folds> Number of folds for cross-validation -l <input file> Use saved model -d <output file> Output model to file -split-percentage <train size> Size of training set -c <class index> Index of attribute to use as class (NB: the index start from 1) -p <attribute index> Only output the predictions and one attribute (0 for none) for all test instances.

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

### Classification and Prediction

Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

### Data Mining with R. Decision Trees and Random Forests. Hugh Murrell

Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

### A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining Sakshi Department Of Computer Science And Engineering United College of Engineering & Research Naini Allahabad sakshikashyap09@gmail.com

### Analysis Tools and Libraries for BigData

+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I

### Data Mining Practical Machine Learning Tools and Techniques

Some Core Learning Representations Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision trees Learning Rules Association rules

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Web Document Clustering

Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

### Professor Anita Wasilewska. Classification Lecture Notes

Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

### Practical Data Mining COMP-321B. Tutorial 4: Preprocessing

Practical Data Mining COMP-321B Tutorial 4: Preprocessing Shevaun Ryan Mark Hall August 1, 2006 c 2006 University of Waikato 1 Introduction For this tutorial we will be using the Preprocess panel, the

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Data mining techniques: decision trees

Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39

### Data Mining Analysis (breast-cancer data)

Data Mining Analysis (breast-cancer data) Jung-Ying Wang Register number: D9115007, May, 2003 Abstract In this AI term project, we compare some world renowned machine learning tools. Including WEKA data

### International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

### Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

### Introduction Predictive Analytics Tools: Weka

Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING

I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and

### Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status

Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of

### Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

### Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Classification and Prediction

Classification and Prediction 1. Objectives...2 2. Classification vs. Prediction...3 2.1. Definitions...3 2.2. Supervised vs. Unsupervised Learning...3 2.3. Classification and Prediction Related Issues...4

### An Introduction to WEKA. As presented by PACE

An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/

### Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

### Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

### Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers

Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers Manjit Kaur Department of Computer Science Punjabi University Patiala, India manjit8718@gmail.com Dr. Kawaljeet Singh University

### Predicting Student Performance by Using Data Mining Methods for Classification

BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

### Decision-Tree Learning

Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

### 152 International Journal of Computer Science and Technology. Paavai Engineering College, India

IJCST Vol. 2, Issue 3, September 2011 Improving the Performance of Data Mining Algorithms in Health Care Data 1 P. Santhi, 2 V. Murali Bhaskaran, 1,2 Paavai Engineering College, India ISSN : 2229-4333(Print)

### Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

### Learning is a very general term denoting the way in which agents:

What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

### Data Mining Essentials

This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides

### MACHINE LEARNING. Introduction. Alessandro Moschitti

MACHINE LEARNING Introduction Alessandro Moschitti Department of Computer Science and Information Engineering University of Trento Email: moschitti@disi.unitn.it Course Schedule Lectures Tuesday, 14:00-16:00

Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

### Introduction to Learning & Decision Trees

Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### Open-Source Machine Learning: R Meets Weka

Open-Source Machine Learning: R Meets Weka Kurt Hornik Christian Buchta Achim Zeileis Weka? Weka is not only a flightless endemic bird of New Zealand (Gallirallus australis, picture from Wekapedia) but

### ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,

### CS570 Introduction to Data Mining. Classification and Prediction. Partial slide credits: Han and Kamber Tan,Steinbach, Kumar

CS570 Introduction to Data Mining Classification and Prediction Partial slide credits: Han and Kamber Tan,Steinbach, Kumar 1 Classification and Prediction Overview Classification algorithms and methods

### Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

### Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: p.ducange@iet.unipi.it Office: Dipartimento di Ingegneria

### BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

### Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### More Data Mining with Weka

More Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz More Data Mining with Weka a practical course

### What is Classification? Data Mining Classification. Certainty. Usual Examples. Predictive / Definitive. Techniques

What is Classification? Data Mining Classification Kevin Swingler Assigning an object to a certain class based on its similarity to previous examples of other objects Can be done with reference to original

### Introduction to Machine Learning Connectionist and Statistical Language Processing

Introduction to Machine Learning Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Introduction to Machine Learning p.1/22

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

### Rule based Classification of BSE Stock Data with Data Mining

International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### Data Mining Algorithms for Classification

Data Mining Algorithms for Classification BSc Thesis Artificial Intelligence Author: Patrick Ozer Radboud University Nijmegen January 2008 Supervisor: Dr. I.G. Sprinkhuizen-Kuyper Radboud University Nijmegen

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

### Data Mining Techniques and its Applications in Banking Sector

Data Mining Techniques and its Applications in Banking Sector Dr. K. Chitra 1, B. Subashini 2 1 Assistant Professor, Department of Computer Science, Government Arts College, Melur, Madurai. 2 Assistant

### Data mining knowledge representation

Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

### Classification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model

10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1

### EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

### Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

### A semi-supervised Spam mail detector

A semi-supervised Spam mail detector Bernhard Pfahringer Department of Computer Science, University of Waikato, Hamilton, New Zealand Abstract. This document describes a novel semi-supervised approach

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

### Predictive Dynamix Inc

Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

### CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

### Feature Selection with Decision Tree Criterion

Feature Selection with Decision Tree Criterion Krzysztof Grąbczewski and Norbert Jankowski Department of Computer Methods Nicolaus Copernicus University Toruń, Poland kgrabcze,norbert@phys.uni.torun.pl

### Improving spam mail filtering using classification algorithms with discretization Filter

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Visualizing class probability estimators

Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers

### Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

### Tutorial Exercises for the Weka Explorer

CHAPTER Tutorial Exercises for the Weka Explorer 17 The best way to learn about the Explorer interface is simply to use it. This chapter presents a series of tutorial exercises that will help you learn

### LCs for Binary Classification

Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Ensemble of Decision Tree Classifiers for Mining Web Data Streams

Ensemble of Decision Tree Classifiers for Mining Web Data Streams Fauzia Yasmeen Tani Department of CSE Jahangirnagar University Dhaka-1342, Bangladesh Dewan Md. Farid Department of CSE Jahangirnagar University

### Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

### A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model

A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model ABSTRACT Mrs. Arpana Bharani* Mrs. Mohini Rao** Consumer credit is one of the necessary processes but lending bears

### TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP

TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions