Performance Measures for Machine Learning





Performance Measures
- Accuracy
- Weighted (Cost-Sensitive) Accuracy
- Lift
- Precision/Recall
- F
- Break-Even Point
- ROC
- ROC Area

Accuracy
- Target: 0/1, -1/+1, True/False
- Prediction = f(inputs) = f(x): 0/1 or real-valued
- Threshold: f(x) > thresh => 1, else => 0, so threshold(f(x)) is 0/1
- accuracy = #right / #total
- p("correct"): p(threshold(f(x)) = target)

accuracy = \frac{1}{N} \sum_{i=1}^{N} \Big( 1 - \big(\mathrm{target}_i - \mathrm{threshold}(f(\vec{x}_i))\big)^2 \Big)

Since both target_i and threshold(f(x_i)) are 0/1, each term of the sum is 1 for a correct prediction and 0 for an incorrect one.
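A minimal sketch of this computation in Python (the names `targets`, `scores`, and `thresh` are illustrative, not from the slides):

```python
def threshold_pred(score, thresh=0.5):
    """threshold(f(x)): 1 if the score exceeds the threshold, else 0."""
    return 1 if score > thresh else 0

def accuracy(targets, scores, thresh=0.5):
    """#right / #total, via the slide's squared-difference form."""
    return sum(1 - (t - threshold_pred(s, thresh)) ** 2
               for t, s in zip(targets, scores)) / len(targets)

# 3 of the 4 thresholded predictions match the 0/1 targets.
print(accuracy([1, 0, 1, 0], [0.9, 0.4, 0.2, 0.1]))  # 0.75
```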

Confusion Matrix

              Predicted 1      Predicted 0
True 1        a (correct)      b (incorrect)
True 0        c (incorrect)    d (correct)

The threshold determines which column each case falls into.

accuracy = (a + d) / (a + b + c + d)
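A sketch of tallying the four cells from 0/1 targets and predictions (the helper name is ours, not the slides'):

```python
def confusion_counts(targets, preds):
    """Return (a, b, c, d) laid out as in the matrix above."""
    a = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 1)
    b = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 0)
    c = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 1)
    d = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 0)
    return a, b, c, d

a, b, c, d = confusion_counts([1, 0, 1, 0], [1, 0, 0, 0])
print((a + d) / (a + b + c + d))  # accuracy = 0.75
```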

Prediction Threshold

threshold > MAX(f(x)): all cases predicted 0

              Predicted 1    Predicted 0
True 1            0              b
True 0            0              d

(b + d) = total; accuracy = %False = %0's

threshold < MIN(f(x)): all cases predicted 1

              Predicted 1    Predicted 0
True 1            a              0
True 0            c              0

(a + c) = total; accuracy = %True = %1's

[Figure: distribution of predicted values with the optimal threshold marked; the data are 82% 0's and 18% 1's]

Threshold Demo

Problems with Accuracy
- Assumes equal cost for both kinds of errors: cost(b-type error) = cost(c-type error)
- Is 99% accuracy good? It can be excellent, good, mediocre, poor, or terrible; it depends on the problem.
- Is 10% accuracy bad? Not necessarily (e.g., information retrieval).
- BaseRate = accuracy of predicting the predominant class (on most problems, obtaining BaseRate accuracy is easy; see the sketch below)
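A hedged illustration of the BaseRate idea on a binary problem (the function name is ours):

```python
def base_rate(targets):
    """Accuracy obtained by always predicting the predominant class."""
    p = sum(targets) / len(targets)  # fraction of 1's
    return max(p, 1 - p)

# With 18% 1's, as in the earlier figure, always predicting 0
# already yields 82% accuracy.
print(base_rate([1] * 18 + [0] * 82))  # 0.82
```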

Percent Reduction in Error
- 80% accuracy = 20% error; if learning increases accuracy from 80% to 90%, error falls from 20% to 10%: a 50% reduction in error
- 99.90% to 99.99% accuracy = 90% reduction in error
- 50% to 75% accuracy = 50% reduction in error
- The same idea can be applied to many other measures
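The arithmetic behind these examples, as a small sketch:

```python
def percent_reduction_in_error(acc_old, acc_new):
    """(old error - new error) / old error, as a percentage."""
    err_old, err_new = 1 - acc_old, 1 - acc_new
    return 100 * (err_old - err_new) / err_old

print(percent_reduction_in_error(0.80, 0.90))      # 50.0
print(percent_reduction_in_error(0.9990, 0.9999))  # ~90.0
print(percent_reduction_in_error(0.50, 0.75))      # 50.0
```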

Costs (Error Weights)

              Predicted 1    Predicted 0
True 1           w_a            w_b
True 0           w_c            w_d

Often w_a = w_d = 0, with w_b and w_c nonzero and unequal.
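One common way to use these weights is to score a model by its total cost rather than raw accuracy; a sketch under that assumption, with hypothetical costs:

```python
def total_cost(counts, weights):
    """Sum each confusion-matrix cell count times its error weight.

    counts  -- (a, b, c, d) as in the matrix above
    weights -- (w_a, w_b, w_c, w_d); often w_a = w_d = 0
    """
    return sum(n * w for n, w in zip(counts, weights))

# A b-type error (missed positive) costs 10x a c-type error
# (false alarm); correct cells cost nothing.
print(total_cost((50, 5, 20, 925), (0.0, 10.0, 1.0, 0.0)))  # 70.0
```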


Lift
- Not interested in accuracy on the entire dataset
- Want accurate predictions for the top 5%, 10%, or 20% of the dataset; don't care about the remaining 95%, 90%, 80%, respectively
- Typical application: marketing
- lift(threshold) = (%positives above threshold) / (%dataset above threshold)
- Measures how much better the model is than random prediction on the fraction of the dataset predicted true (f(x) > threshold)

Lift

              Predicted 1    Predicted 0
True 1            a              b
True 0            c              d

lift = \frac{a / (a + b)}{(a + c) / (a + b + c + d)}
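A sketch of the same ratio in code, with counts chosen to reproduce the lift of 3.5 in the figure below:

```python
def lift(a, b, c, d):
    """(%positives above threshold) / (%dataset above threshold)."""
    pct_positives = a / (a + b)              # fraction of all 1's predicted 1
    pct_dataset = (a + c) / (a + b + c + d)  # fraction of dataset predicted 1
    return pct_positives / pct_dataset

# 70% of the positives captured in the top 20% of the dataset.
print(lift(a=14, b=6, c=6, d=74))  # 3.5
```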

[Figure: lift chart; lift = 3.5 if mailings are sent to 20% of the customers]

Lift and Accuracy do not always correlate well

[Figures: lift and accuracy for Problem 1 and Problem 2, with thresholds arbitrarily set at 0.5 for both lift and accuracy]

Precision and Recall
- Typically used in document retrieval
- Precision: how many of the returned documents are correct: precision(threshold)
- Recall: how many of the positives does the model return: recall(threshold)
- Precision/Recall curve: sweep thresholds

Precision/Recall

              Predicted 1    Predicted 0
True 1            a              b
True 0            c              d

precision = a / (a + c)
recall = a / (a + b)
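A sketch of tracing the precision/recall curve by a threshold sweep (helper names are illustrative):

```python
def pr_curve(targets, scores):
    """Sweep the threshold over observed scores; return (precision, recall) pairs."""
    points = []
    for thresh in sorted(set(scores), reverse=True):
        preds = [1 if s >= thresh else 0 for s in scores]
        a = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 1)
        b = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 0)
        c = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 1)
        points.append((a / (a + c), a / (a + b)))
    return points

print(pr_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
# [(1.0, 0.5), (0.5, 0.5), (0.666..., 1.0), (0.5, 1.0)]
```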


Summary Stats: F & Break-Even Point

precision = a / (a + c)
recall = a / (a + b)

F, the harmonic average of precision and recall:

F = \frac{2 \cdot precision \cdot recall}{precision + recall}

BreakEvenPoint: the point on the P/R curve where precision = recall.
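Both summary statistics in a short sketch; the break-even search below simply picks the curve point where precision and recall are closest, which is one reasonable reading of the definition:

```python
def f_measure(precision, recall):
    """Harmonic average of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def break_even_point(pr_points):
    """Point on the P/R curve where precision is closest to recall."""
    return min(pr_points, key=lambda pr: abs(pr[0] - pr[1]))

print(f_measure(0.5, 1.0))  # 0.666...
print(break_even_point([(1.0, 0.5), (0.7, 0.7), (0.5, 1.0)]))  # (0.7, 0.7)
```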

[Figure: precision/recall curves illustrating better vs. worse performance]

F and BreakEvenPoint do not always correlate well

[Figures: precision/recall curves for Problem 1 and Problem 2]

The same confusion matrix appears under several names:

              Predicted 1        Predicted 0
True 1        true positive      false negative
True 0        false positive     true negative

              Predicted 1    Predicted 0
True 1           TP             FN
True 0           FP             TN

              Predicted 1      Predicted 0
True 1           hits             misses
True 0        false alarms    correct rejections

              Predicted 1    Predicted 0
True 1        P(pr1|tr1)     P(pr0|tr1)
True 0        P(pr1|tr0)     P(pr0|tr0)

ROC Plot and ROC Area
- Receiver Operating Characteristic
- Developed in WWII to statistically model false positive and false negative detections of radar operators
- Better statistical foundations than most other measures
- Standard measure in medicine and biology
- Becoming more popular in ML

ROC Plot
- Sweep the threshold and plot:
  - TPR vs. FPR
  - Sensitivity vs. 1 - Specificity
  - P(predict true | true) vs. P(predict true | false)
- Sensitivity = a / (a + b) = Recall = the numerator of LIFT
- 1 - Specificity = 1 - d / (c + d) = c / (c + d)
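A sketch of tracing the ROC points with the same threshold sweep used for the P/R curve:

```python
def roc_points(targets, scores):
    """Sweep the threshold; return (FPR, TPR) pairs for the ROC plot."""
    points = [(0.0, 0.0)]  # threshold above every score: all predicted 0
    for thresh in sorted(set(scores), reverse=True):
        preds = [1 if s >= thresh else 0 for s in scores]
        a = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 1)
        b = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 0)
        c = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 1)
        d = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 0)
        points.append((1 - d / (c + d), a / (a + b)))  # (1-specificity, sensitivity)
    return points

print(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```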

[Figure: ROC plot; the diagonal line is random prediction]

Properties of ROC

ROC Area:
- 1.0: perfect prediction
- 0.9: excellent prediction
- 0.8: good prediction
- 0.7: mediocre prediction
- 0.6: poor prediction
- 0.5: random prediction
- <0.5: something is wrong!

Properties of ROC
- The slope is non-increasing
- Each point on the ROC curve represents a different tradeoff (cost ratio) between false positives and false negatives
- The slope of the line tangent to the curve defines the cost ratio
- ROC Area represents performance averaged over all possible cost ratios
- If two ROC curves do not intersect, one method dominates the other
- If two ROC curves intersect, one method is better for some cost ratios, and the other method is better for other cost ratios
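ROC Area can also be computed without plotting anything: it equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the Wilcoxon/Mann-Whitney rank statistic), with ties counting one half. A sketch of that equivalent form:

```python
def roc_area(targets, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_area([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```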

[Figures: ROC plots comparing Problem 1 and Problem 2]

Summary
- The measure you optimize makes a difference
- The measure you report makes a difference
- Use a measure appropriate for the problem and community
- Accuracy often is not sufficient or appropriate
- ROC is gaining popularity in the ML community
- Only accuracy generalizes directly to more than 2 classes!