Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining, Unit #4

Acknowledgement: Most of the slides in this presentation are taken from course slides provided by Han and Kamber (Data Mining: Concepts and Techniques) and by Tan, Steinbach and Kumar (Introduction to Data Mining).

Classification: Definition

Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class, find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it.

Classification: Motivation

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
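The split-train-evaluate workflow just described can be sketched in a few lines of Python. This illustration is added for this write-up, not taken from the slides; the "model" is a deliberately trivial majority-class predictor standing in for a real classifier, and the head/tail split is for illustration only (in practice the split is randomized).

```python
# Minimal sketch of the training/test workflow from the definition above.
from collections import Counter

records = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

train, test = records[:10], records[10:]   # naive split, for illustration only
# Trivial stand-in model: always predict the majority class of the training set.
majority = Counter(r[-1] for r in train).most_common(1)[0][0]
accuracy = sum(r[-1] == majority for r in test) / len(test)
print(f"majority class = {majority}, test accuracy = {accuracy:.2f}")
```

A decision tree classifier, built later in this unit, would take the place of the majority-class predictor while the surrounding split-and-validate logic stays the same.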

Decision/Classification Tree

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit rating?
              excellent -> no
              fair      -> yes

Illustrating Classification Task

Training set (-> Learn Model):
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set (-> Apply Model):
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Example of a Decision Tree

Training data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

Another Example of a Decision Tree (same training data):

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!

Decision Tree Classification Task

As in the general framework above: learn a decision tree model from the training records (Tid 1-10), then apply the model to the test records (Tid 11-15).

Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and, at each internal node, follow the branch that matches the record:
1. Refund? The record has Refund = No, so take the No branch to the MarSt node.
2. MarSt? The record has Marital Status = Married, so take the Married branch, which is a leaf.
3. The leaf predicts No, so assign Cheat to No.

(The TaxInc node is never reached for this record.)
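The walk just described is easy to express in code. Below is a minimal sketch, not from the slides: the tree is a hand-built nested structure mirroring the Refund/MarSt/TaxInc tree above, and the dict layout and the name `classify` are illustrative choices of this write-up, not a prescribed API.

```python
# Minimal sketch of applying the Refund/MarSt/TaxInc tree to a test record.
tree = ("Refund", {
    "Yes": "No",                               # leaf: Cheat = No
    "No": ("MarSt", {
        "Married": "No",                       # leaf: Cheat = No
        "Single": ("TaxInc<80K", {"Yes": "No", "No": "Yes"}),
        "Divorced": ("TaxInc<80K", {"Yes": "No", "No": "Yes"}),
    }),
})

def classify(node, record):
    """Walk from the root to a leaf; leaves are plain class labels."""
    while isinstance(node, tuple):             # internal node: (attribute, branches)
        attr, branches = node
        if attr == "TaxInc<80K":               # continuous attribute: threshold test
            node = branches["Yes" if record["TaxInc"] < 80 else "No"]
        else:                                  # categorical attribute: exact match
            node = branches[record[attr]]
    return node

record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print("Cheat =", classify(tree, record))       # -> Cheat = No
```

Note that the cost of classifying a record is proportional to the depth of the path followed, which is the O(w) worst case mentioned later in this unit.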

Tree Induction

Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.

Issues:
- Determine how to split the records:
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting

How to Specify the Test Condition?

Depends on the attribute type:
- Nominal
- Ordinal
- Continuous

Depends on the number of ways to split:
- 2-way split
- Multi-way split

How to Determine the Best Split

Greedy approach: nodes with a homogeneous class distribution are preferred. We therefore need a measure of node impurity:
- Non-homogeneous: high degree of impurity
- Homogeneous: low degree of impurity

Measures of Node Impurity
- Gini Index
- Entropy
- Misclassification error

Measure of Impurity: GINI

Gini index for a given node t:

    GINI(t) = 1 - Σ_j [p(j|t)]²

where p(j|t) is the relative frequency of class j at node t.

- Maximum (1 - 1/n_c) when records are equally distributed among all n_c classes, implying the least interesting information.
- Minimum (0.0) when all records belong to one class, implying the most interesting information.

C1: 0, C2: 6 -> Gini = 0.000
C1: 1, C2: 5 -> Gini = 0.278
C1: 2, C2: 4 -> Gini = 0.444
C1: 3, C2: 3 -> Gini = 0.500

Examples for Computing GINI

C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0

C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6
  Gini = 1 - (1/6)² - (5/6)² = 0.278

C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6
  Gini = 1 - (2/6)² - (4/6)² = 0.444
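As a quick check of these numbers, here is a small sketch of the Gini computation (added for this write-up, not from the slides; `gini` is our own helper name):

```python
# Gini index of a node from its class counts: GINI(t) = 1 - sum_j p(j|t)^2
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

for counts in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(counts, f"Gini = {gini(counts):.3f}")
# (0, 6) Gini = 0.000
# (1, 5) Gini = 0.278
# (2, 4) Gini = 0.444
# (3, 3) Gini = 0.500
```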

Binary Attributes: Computing the GINI Index

A binary split partitions the records into two nodes. Effect of weighing the partitions: larger and purer partitions are sought.

Example (Buy Computer data), splitting on Student?:

            N1 (Student = Yes)   N2 (Student = No)
Buy = Yes   6                    3
Buy = No    1                    4

Gini(N1) = 1 - (6/7)² - (1/7)² = 0.24
Gini(N2) = 1 - (3/7)² - (4/7)² = 0.49

Weighted Gini of the split:
Gini(Student) = 7/14 × 0.24 + 7/14 × 0.49 = 0.365

Parent node (Buy = Yes: 9, Buy = No: 5):
Gini = 1 - (9/14)² - (5/14)² = 0.46

So splitting on Student reduces the impurity from 0.46 to about 0.37.

GINI Index for the Buy Computer Example (exercise): compute Gini(Income), Gini(Credit_Rating), and Gini(Age) in the same way.
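Extending the earlier `gini` helper, a sketch of the weighted computation (again an illustration added here, with our own helper name `gini_split`, not slide material):

```python
# Weighted Gini of a split; reuses gini() from the previous sketch.
def gini_split(partitions):
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini(p) for p in partitions)

parent = (9, 5)                     # Buy Computer: 9 Yes, 5 No
student_split = [(6, 1), (3, 4)]    # N1: Student=Yes, N2: Student=No
print(f"parent Gini = {gini(parent):.3f}")              # 0.459
print(f"split Gini  = {gini_split(student_split):.3f}")  # 0.367
```

The exact value 0.367 differs slightly from the slide's 0.365 only because the slide rounds Gini(N1) and Gini(N2) to two decimals before weighting.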

Alternative Splitting Criterion Based on Entropy

Entropy at a given node t:

    Entropy(t) = - Σ_j p(j|t) log₂ p(j|t)

where p(j|t) is the relative frequency of class j at node t. Entropy measures the homogeneity of a node:
- Maximum (log₂ n_c) when records are equally distributed among all n_c classes, implying the least information.
- Minimum (0.0) when all records belong to one class, implying the most information.

Entropy-based computations are similar to the GINI index computations.

Entropy in a nutshell: [figure contrasting a low-entropy (homogeneous) sample with a high-entropy (mixed) sample]

Examples for Computing Entropy

    Entropy(t) = - Σ_j p(j|t) log₂ p(j|t)

C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Entropy = - 0 log₂ 0 - 1 log₂ 1 = 0 - 0 = 0

C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6
  Entropy = - (1/6) log₂(1/6) - (5/6) log₂(5/6) = 0.65

C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6
  Entropy = - (2/6) log₂(2/6) - (4/6) log₂(4/6) = 0.92

Splitting Criterion Based on Classification Error

Classification error at a node t:

    Error(t) = 1 - max_i P(i|t)

This measures the misclassification error made by a node:
- Maximum (1 - 1/n_c) when records are equally distributed among all n_c classes, implying the least interesting information.
- Minimum (0.0) when all records belong to one class, implying the most interesting information.
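A companion sketch for the entropy values above (ours, not the slides'; the `if c > 0` guard implements the usual convention 0·log 0 = 0):

```python
import math

# Entropy of a node from its class counts: -sum_j p(j|t) * log2 p(j|t)
def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

for counts in [(0, 6), (1, 5), (2, 4)]:
    print(counts, f"Entropy = {entropy(counts):.2f}")
# (0, 6) Entropy = 0.00
# (1, 5) Entropy = 0.65
# (2, 4) Entropy = 0.92
```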

Examples for Computing Error

    Error(t) = 1 - max_i P(i|t)

C1 = 0, C2 = 6: P(C1) = 0, P(C2) = 1
  Error = 1 - max(0, 1) = 1 - 1 = 0

C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6
  Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6

C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6
  Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3

Comparison Among Splitting Criteria

[figure: for a 2-class problem, Gini index, entropy, and misclassification error plotted against p, the fraction of records in class 1; all three measures peak at p = 0.5 and vanish at p = 0 and p = 1]
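The 2-class comparison in the figure can be tabulated with the helpers from the previous sketches (this snippet and the helper name `error` are our additions):

```python
# Sketch of the 2-class comparison: all three impurity measures as a
# function of p = P(class 1), using gini() and entropy() defined above.
def error(counts):
    n = sum(counts)
    return 1.0 - max(counts) / n

for c1 in range(0, 7):
    counts = (c1, 6 - c1)
    print(f"p = {c1 / 6:.2f}  Gini = {gini(counts):.3f}  "
          f"Entropy = {entropy(counts):.3f}  Error = {error(counts):.3f}")
```

All three columns reach their maximum at p = 0.5 and drop to zero at p = 0 and p = 1, as the figure shows.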

Inducing a Decision Tree

There are many possible trees. How do we find the most compact one that is consistent with the data? The key to building a decision tree is which attribute to choose in order to branch. The heuristic is to choose the attribute whose split yields the minimum weighted GINI/Entropy.

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm), sketched in code below:
- The tree is constructed in a top-down, recursive manner.
- At the start, all the training examples are at the root.
- Attributes are categorical.
- Examples are partitioned recursively based on selected attributes.
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., GINI/Entropy).

Conditions for stopping partitioning:
- All examples for a given node belong to the same class.
- There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
- There are no examples left.
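To make the recursion concrete, here is a compact sketch of the basic algorithm for categorical attributes. It is an illustration written for this unit, not code from the slides; names such as `build_tree` and `weighted_gini`, and the tuple-based tree encoding, are our own choices.

```python
from collections import Counter, defaultdict

def gini_of(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, attr):
    """Weighted Gini of the multi-way split on a categorical attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[-1])      # class label is the last field
    n = len(rows)
    return sum(len(g) / n * gini_of(g) for g in groups.values())

def build_tree(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:              # stop: all examples in one class
        return labels[0]
    if not attrs:                          # stop: no attributes left ->
        return Counter(labels).most_common(1)[0][0]   # majority voting
    best = min(attrs, key=lambda a: weighted_gini(rows, a))   # greedy choice
    branches = defaultdict(list)
    for r in rows:
        branches[r[best]].append(r)
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(g, rest) for v, g in branches.items()})
```

Run on the `records` list from the first sketch as `build_tree(records, attrs=[0, 1, 2, 3])` (age, income, student, credit_rating), the greedy choice at the root is attribute 0, age, matching the tree shown earlier. The third stopping condition (no examples left) cannot arise here because branches are created only for attribute values that actually occur in the node.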

Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules.
- One rule is created for each path from the root to a leaf.
- Each attribute-value pair along a path forms a conjunction; the leaf node holds the class prediction.
- Rules are easier for humans to understand.

Example (from the buys_computer tree, one rule per leaf):
IF age = "<=30" AND student = "no"               THEN buys_computer = "no"
IF age = "<=30" AND student = "yes"              THEN buys_computer = "yes"
IF age = "31..40"                                THEN buys_computer = "yes"
IF age = ">40"  AND credit_rating = "excellent"  THEN buys_computer = "no"
IF age = ">40"  AND credit_rating = "fair"       THEN buys_computer = "yes"

Questions?

Why is the method used to generate a classification tree a heuristic and not a guaranteed method?
Hint: think of a situation where a is the best single attribute, but the combination of b and c would actually be better than any combination involving a. That is, knowing b and c you can classify, but knowing only a and b (or only a and c) you cannot. This shows that the attributes may not be independent.

How could we deal with this?
Hint: consider also combinations of attributes, not only a, b, c, but also ab, bc, ca ... What is a problem with this approach?
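Rule extraction is a straightforward walk over every root-to-leaf path. A short sketch (ours, not the slides'), reusing the nested-tuple trees produced by the `build_tree` sketch above:

```python
# Print one IF-THEN rule per root-to-leaf path of a (attr, branches) tree.
def extract_rules(node, conditions=()):
    if not isinstance(node, tuple):        # leaf: emit one finished rule
        conds = " AND ".join(conditions) or "TRUE"
        print(f"IF {conds} THEN class = {node}")
        return
    attr, branches = node
    for value, child in branches.items():  # extend the conjunction and recurse
        extract_rules(child, conditions + (f"attr[{attr}] = {value}",))
```

Calling `extract_rules(build_tree(records, attrs=[0, 1, 2, 3]))` prints one rule per leaf, in the same spirit as the five buys_computer rules above.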

Example

Attribute 1  Attribute 2  Attribute 3  Class
A            70           T            C1
A            90           T            C2
A            85           F            C2
A            95           F            C2
A            70           F            C1
B            90           T            C1
B            78           F            C1
B            65           T            C1
B            75           F            C1
C            80           T            C2
C            70           T            C2
C            80           F            C1
C            80           F            C1
C            96           F            C1

Example II

Height  Hair   Eyes   Class
Short   Blond  Blue   +
Tall    Blond  Brown  -
Tall    Red    Blue   +
Short   Dark   Blue   -
Tall    Dark   Blue   -
Tall    Blond  Blue   +
Tall    Dark   Brown  -
Short   Blond  Brown  -
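These two tables are exercises; a quick way to check a hand-built answer for Example II is to run the earlier `build_tree` and `extract_rules` sketches on it (the tuple encoding with the class label last is the same assumption as before):

```python
# Exercise check: induce a tree for Example II with the earlier sketches.
rows = [
    ("Short", "Blond", "Blue",  "+"),
    ("Tall",  "Blond", "Brown", "-"),
    ("Tall",  "Red",   "Blue",  "+"),
    ("Short", "Dark",  "Blue",  "-"),
    ("Tall",  "Dark",  "Blue",  "-"),
    ("Tall",  "Blond", "Blue",  "+"),
    ("Tall",  "Dark",  "Brown", "-"),
    ("Short", "Blond", "Brown", "-"),
]
tree = build_tree(rows, attrs=[0, 1, 2])   # 0=Height, 1=Hair, 2=Eyes
extract_rules(tree)
```

The greedy Gini choice selects Hair at the root (weighted Gini 0.25, versus 0.30 for Eyes and 0.47 for Height) and Eyes beneath the Blond branch, giving four rules in total.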

Tree Induction (recap)

Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. Remaining issue: determine when to stop splitting.

Stopping Criteria for Tree Induction
- Stop expanding a node when all the records belong to the same class.
- Stop expanding a node when all the records have similar attribute values.
- Early termination (to be discussed later).

Characteristics of Decision Tree Induction

Decision tree induction is a non-parametric approach to building classification models. In other words, it does not require any prior assumptions about the type of probability distribution satisfied by the class and the other attributes.

Finding an optimal decision tree is an NP-complete problem. Many decision tree algorithms employ a heuristic-based approach to guide their search in the vast hypothesis space. For example, the algorithm discussed in this unit uses a greedy, top-down, recursive partitioning strategy for growing a decision tree.

Characteristics of Decision Tree Induction (Cont'd)

Techniques developed for constructing decision trees are computationally inexpensive, making it possible to quickly construct models even when the training set size is very large. Furthermore, once a decision tree has been built, classifying a test record is extremely fast, with a worst-case complexity of O(w), where w is the maximum depth of the tree.

Decision trees, especially smaller-sized trees, are relatively easy to interpret.

Decision tree algorithms are quite robust to the presence of noise.

Characteristics of Decision Tree Induction (Cont'd)

The presence of redundant attributes does not adversely affect the accuracy of decision trees. An attribute is redundant if it is strongly correlated with another attribute in the data. One of the two redundant attributes will not be used for splitting once the other attribute has been chosen.

Studies have shown that the choice of impurity measure has little effect on the performance of decision tree induction algorithms.

Advantages of Decision Tree Based Classification
- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy is comparable to other classification techniques for many simple data sets