Data Mining Classification


Jingpeng Li

What is Classification?
Assigning an object to a certain class based on its similarity to previous examples of other objects. This can be done with reference to the original data, or based on a model of that data. E.g.: Me: "It's round, green, and edible." You: "It's an apple!"

Usual Examples
- Classifying transactions as genuine or fraudulent, e.g. credit card usage, insurance claims, cell phone calls
- Classifying prospects as good or bad customers
- Classifying engine faults by their symptoms

Certainty
As with most data mining solutions, a classification usually comes with a degree of certainty. It might be the probability of the object belonging to the class, or it might be some other measure of how closely the object resembles other examples from that class.

Techniques
- Non-parametric, e.g. k-nearest neighbour
- Mathematical models, e.g. neural networks
- Rule-based models, e.g. decision trees

Predictive / Definitive
Classification may indicate a propensity to act in a certain way, e.g. a prospect is likely to become a customer. This is predictive. Classification may instead indicate similarity to objects that are definitely members of a given class, e.g. small, round, green = apple. This is definitive.

Simple Worked Example
Risk of making a claim on a motor insurance policy. This is a predictive classification: they haven't made the claim yet, but do they look like other people who have? To keep it simple, let's look at just age and gender.

The Data

Age  Gender  Claim?
30   Female  No
31   Male    No
27   Male    No
20   Male    Yes
29   Female  No
32   Male    No
46   Male    No
45   Male    No
33   Male    No
25   Female  No
38   Female  No
21   Female  No
38   Female  No
42   Male    No
29   Male    No
37   Male    No
40   Female  No

[Figure: the data plotted by age and gender, with the claim and no-claim cases marked]

K-Nearest Neighbour
Performed on raw data. Count the number of other examples that are close; the winner is the most common class among them.

[Figure: the age/gender plot with a new person to classify marked]

Rule Based
If Gender = Male and Age < 30 then Claim
If Gender = Male and Age > 30 then No Claim
Etc.

[Figure: the age/gender plot with a new person to classify marked]
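To make this concrete, here is a minimal k-nearest-neighbour sketch in Python. It is not from the lecture: the distance function (age gap plus a fixed penalty for a gender mismatch) and the choice of k = 3 are illustrative assumptions of mine.

```python
from collections import Counter

# (age, gender, claim?) rows from the worked example above
data = [
    (30, "Female", "No"), (31, "Male", "No"),   (27, "Male", "No"),
    (20, "Male", "Yes"),  (29, "Female", "No"), (32, "Male", "No"),
    (46, "Male", "No"),   (45, "Male", "No"),   (33, "Male", "No"),
    (25, "Female", "No"), (38, "Female", "No"), (21, "Female", "No"),
    (38, "Female", "No"), (42, "Male", "No"),   (29, "Male", "No"),
    (37, "Male", "No"),   (40, "Female", "No"),
]

def knn_classify(age, gender, k=3):
    """Majority vote among the k closest examples in the raw data."""
    def distance(row):
        # Crude, hand-picked metric: age gap plus a penalty for gender mismatch
        return abs(row[0] - age) + (0 if row[1] == gender else 10)
    nearest = sorted(data, key=distance)[:k]
    return Counter(row[2] for row in nearest).most_common(1)[0][0]

print(knn_classify(age=22, gender="Male"))  # votes come from the youngest males
```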

Decision Trees
A good automatic rule-discovery technique is the decision tree. It produces a set of branching decisions that end in a classification. It works best on nominal attributes; numeric ones need to be split into bins.

A Decision Tree

Legs?
├─ 4 → Size?
│        ├─ Med   → Cat
│        └─ Small → Mouse
├─ 0 → Swims?
│        ├─ Y → Fish
│        └─ N → Snake
└─ 2 → Bird

Note: Not all attributes are used in all decisions.

Making a Classification
Each node represents a single variable, and each branch represents a value that variable can take. To classify a single example, start at the top of the tree and see which variable it represents. Follow the branch that corresponds to the value that variable takes in your example. Keep going until you reach a leaf, where your object is classified!

Tree Structure
There are lots of ways to arrange a decision tree. Does it matter which variables go where? Yes: you need to optimise the number of correct classifications, and you want to make the classification process as fast as possible.
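As a sketch of how this walk can be coded (my own illustration, not the lecturer's), the animal tree above fits in nested Python dicts, and classification just follows branches until it hits a leaf:

```python
# Internal nodes are (variable, {value: subtree}) pairs; leaves are class names.
tree = ("Legs", {
    "4": ("Size",   {"Med": "Cat",  "Small": "Mouse"}),
    "0": ("Swims?", {"Y":   "Fish", "N":     "Snake"}),
    "2": "Bird",
})

def classify(node, example):
    """Follow the branch matching the example's value at each node."""
    while isinstance(node, tuple):          # still at an internal node
        variable, branches = node
        node = branches[example[variable]]  # descend along the matching branch
    return node                             # reached a leaf: the class label

print(classify(tree, {"Legs": "4", "Size": "Small"}))  # -> Mouse
```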

A Tree Building Algorithm
Divide and conquer:
- Choose the variable that goes at the top of the tree
- Create a branch for each possible value
- For each branch, repeat the process until there are no more branches to make (i.e. stop when all the instances at the current branch are in the same class)
But how do you choose which variable to split on?

The ID3 Algorithm
Split on the variable that gives the greatest information gain. Information can be thought of as a measure of uncertainty; it is a measure based on the probability of something happening.

Information Example
If I pick a random card from a deck and you have to guess what it is, which would you rather be told: it is red (which has a probability of 0.5), or it is a picture card (which has a probability of 4/13 = 0.31)?

Calculating Information
The information associated with a single event is

    I(e) = -log2(p_e)

where p_e is the probability of event e occurring.

    I(Red) = -log2(0.5) = 1
    I(Picture card) = -log2(0.31) = 1.7

Average Information
The weighted average information across all possible values of a variable is called entropy. It is calculated as the sum of the probability of each possible event times its information value:

    H(X) = Σ_i P(x_i) I(x_i) = -Σ_i P(x_i) log2(P(x_i))

where the log is the base 2 log.

Entropy of IsPicture?

    I(Picture) = -log2(4/13) = 1.7
    I(Not Picture) = -log2(9/13) = 0.53
    H = 4/13 * 1.7 + 9/13 * 0.53 = 0.89

Entropy H(X) is a measure of uncertainty in variable X. The more even the distribution of X becomes, the higher the entropy gets.
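These numbers are easy to check; a minimal sketch (mine, not from the slides):

```python
from math import log2

def information(p):
    """Information of an event with probability p, in bits: I = -log2(p)."""
    return -log2(p)

def entropy(probs):
    """Weighted average information: H = sum of p * I(p) over all values."""
    return sum(p * information(p) for p in probs)

print(round(information(0.5), 2))       # I(Red) = 1.0
print(round(information(4 / 13), 2))    # I(Picture) = 1.7
print(round(entropy([4/13, 9/13]), 2))  # H(IsPicture) = 0.89
```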

Unfair Coin Entropy

[Figure: entropy of a biased coin plotted against the probability of heads, peaking at 1 bit for a fair coin]

The more even the distribution of X becomes, the higher the entropy gets.

Conditional Entropy
We now introduce conditional entropy, H(Outcome | Known): the uncertainty about the outcome, given that we know Known.

Information Gain
If we know H(Outcome), and we know H(Outcome | Input), we can calculate how much Input tells us about Outcome simply as

    Gain(Input) = H(Outcome) - H(Outcome | Input)

This is the information gain of Input.

Picking the Top Node
ID3 picks the top node of the tree by calculating the information gain of the output class for each input variable, and picking the one that removes the most uncertainty. It creates a branch for each value the chosen variable can take.
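As an illustration (the helper names are my own, not lecture code), the gain calculation looks like this in Python, applied to the motor-insurance data above with Gender as the input and Claim? as the outcome:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H of a list of class labels: -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Gain(attr) = H(target) - H(target | attr), estimated from the rows."""
    h_outcome = entropy([row[target] for row in rows])
    h_conditional = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row[target] for row in rows if row[attr] == value]
        h_conditional += len(subset) / len(rows) * entropy(subset)
    return h_outcome - h_conditional

# The 17 motor-insurance examples: 7 females (no claims), 10 males (one claim)
rows = [{"Gender": g, "Claim": c} for g, c in
        [("Female", "No")] * 7 + [("Male", "No")] * 9 + [("Male", "Yes")]]
print(round(information_gain(rows, "Gender", "Claim"), 3))
```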

Adding Branches
Branches are added by making the same information gain calculation for the data defined by the current branch's location in the tree. If all objects at the current leaf are in the same class, no more branching is needed. The algorithm also stops when all the data has been accounted for.

Person  Hair Length  Weight  Age  Class
Homer   0            250     36   M
Marge   10           150     34   F
Bart    2            90      10   M
Lisa    6            78      8    F
Maggie  4            20      1    F
Abe     1            170     70   M
Selma   8            160     41   F
Otto    10           180     38   M
Krusty  6            200     45   M
Comic   8            290     38   ?

For a set S containing p examples of one class and n of the other,

    Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

    Gain(A) = E(current set) - Σ (weighted E of each child set)

Here Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911.

Let us try splitting on Hair Length (test: Hair Length <= 5?):

    Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911

Let us try splitting on Weight (test: Weight <= 160?):

    Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900

Let us try splitting on Age (test: Age <= 40?):

    Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183

Of the three features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified, so we simply recurse on that branch. This time we find that we can split on Hair Length (Hair Length <= 2?), and we are done!
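A sketch (my own code, not the lecturer's) reproducing the three gain calculations and the recursion:

```python
from collections import Counter
from math import log2

# (hair length, weight, age, class) rows from the table above
people = [
    (0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
    (6, 78, 8, "F"),   (4, 20, 1, "F"),    (1, 170, 70, "M"),
    (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, test):
    """Parent entropy minus the weighted entropies of the two child sets."""
    yes = [r[3] for r in rows if test(r)]
    no = [r[3] for r in rows if not test(r)]
    children = (len(yes) / len(rows) * entropy(yes)
                + len(no) / len(rows) * entropy(no))
    return entropy([r[3] for r in rows]) - children

print(round(gain(people, lambda r: r[0] <= 5), 4))    # Hair Length: 0.0911
print(round(gain(people, lambda r: r[1] <= 160), 4))  # Weight:      0.59
print(round(gain(people, lambda r: r[2] <= 40), 4))   # Age:         0.0183

# Recurse on the impure Weight <= 160 branch: Hair Length <= 2 splits it cleanly
light = [r for r in people if r[1] <= 160]
print(round(gain(light, lambda r: r[0] <= 2), 4))     # 0.7219: children are pure
```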

We don't need to keep the data around, just the test conditions:

Weight <= 160?
├─ no  → Male
└─ yes → Hair Length <= 2?
          ├─ yes → Male
          └─ no  → Female

How would these people be classified? It is trivial to convert decision trees to rules.

Rules to Classify Males/Females
If Weight greater than 160, classify as Male
Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
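The rules transcribe directly into code; a minimal sketch (mine), applied here to the unclassified Comic row from the table:

```python
def classify_gender(weight, hair_length):
    """Direct transcription of the rules read off the final tree."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify_gender(weight=290, hair_length=8))  # Comic -> Male
```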

Other Classification Methods
You will meet a certain type of neural network in a later lecture; these too are good at classification. There are many, many, many other methods for building classification systems.