p.1/1 Performance Metrics

The simplest performance metric is the model error, defined as the number of mistakes the model makes on a data set divided by the number of observations in the data set,

\[
\mathrm{err} = \frac{\text{number of mistakes}}{\text{total number of observations}}.
\]
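As a quick illustration, the error can be computed directly from a vector of predictions and labels. A minimal Python sketch with made-up labels:

```python
# Toy illustration (made-up labels): err = mistakes / observations.
predictions = [+1, -1, +1, +1, -1, +1, -1, -1, +1, -1]
labels      = [+1, -1, -1, +1, -1, +1, +1, -1, +1, -1]

mistakes = sum(1 for y_hat, y in zip(predictions, labels) if y_hat != y)
err = mistakes / len(labels)
print(err)  # 2 mistakes on 10 observations -> 0.2
```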

p.2/1 The Model Error

In order to define the model error formally we introduce the 0-1 loss function. This function compares the output of a model for a particular observation with the label of that observation: if the model commits a prediction error on the observation, the loss function returns a 1, otherwise it returns a 0. Formally, let (x, y) ∈ D be an observation, where D ⊂ R^n × {+1, −1}, and let f̂ : R^n → {+1, −1} be a model. We define the 0-1 loss function L : {+1, −1} × {+1, −1} → {0, 1} as

\[
L\big(y, \hat{f}(x)\big) =
\begin{cases}
0 & \text{if } y = \hat{f}(x),\\
1 & \text{if } y \neq \hat{f}(x).
\end{cases}
\]

Let D = {(x_1, y_1), ..., (x_l, y_l)} ⊂ R^n × {+1, −1}, and let the model f̂ and the loss function L be as defined above. Then we can write the model error as

\[
\mathrm{err}_D[\hat{f}] = \frac{1}{l} \sum_{i=1}^{l} L\big(y_i, \hat{f}(x_i)\big),
\]

where (x_i, y_i) ∈ D. The model error is the average loss of a model over a data set.
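The definitions above translate directly into code. A minimal Python sketch, where the data set and the model are made up purely for illustration:

```python
def zero_one_loss(y, y_hat):
    """0-1 loss: 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y == y_hat else 1

def model_error(data, model):
    """err_D[f]: the average 0-1 loss of `model` over `data`."""
    return sum(zero_one_loss(y, model(x)) for x, y in data) / len(data)

# Toy model that predicts the sign of the first feature
# (an assumption for illustration only).
model = lambda x: +1 if x[0] >= 0 else -1
data = [((2.0,), +1), ((-1.0,), -1), ((0.5,), -1), ((-3.0,), +1)]
print(model_error(data, model))  # wrong on 2 of 4 observations -> 0.5
```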

p.3/1 Model Accuracy

We can also characterize the performance of a model in terms of its accuracy,

\[
\mathrm{acc} = \frac{\text{number of correct predictions}}{\text{total number of observations}}.
\]

Again, we can use the 0-1 loss function to define this metric more concisely:

\[
\mathrm{acc}_D[\hat{f}]
= \frac{1}{l} \sum_{i=1}^{l} \Big(1 - L\big(y_i, \hat{f}(x_i)\big)\Big)
= 1 - \frac{1}{l} \sum_{i=1}^{l} L\big(y_i, \hat{f}(x_i)\big)
= 1 - \mathrm{err}_D[\hat{f}].
\]

That is, accuracy and error are complementary: acc_D[f̂] = 1 − err_D[f̂].
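The identity acc = 1 − err can be checked numerically. A small sketch with made-up labels and predictions:

```python
# Made-up labels and predictions, used only to check acc = 1 - err.
labels      = [+1, +1, -1, -1, +1]
predictions = [+1, -1, -1, +1, +1]

losses = [0 if y == y_hat else 1 for y, y_hat in zip(labels, predictions)]
err = sum(losses) / len(losses)                 # average loss
acc = sum(1 - L for L in losses) / len(losses)  # average "non-loss"
print(err, acc)  # 0.4 0.6
```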

p.4/1 Example

As an example of the above metrics, consider a model ĝ which commits 5 prediction errors when applied to a data set Q of 100 observations. We can compute the error as

\[
\mathrm{err}_Q[\hat{g}] = \frac{1}{100}(5) = 0.05.
\]

We can compute the accuracy of the model as

\[
\mathrm{acc}_Q[\hat{g}] = 1 - \mathrm{err}_Q[\hat{g}] = 1 - 0.05 = 0.95.
\]

p.5/1 Model Errors

Let (x, y) ∈ R^n × {+1, −1} be an observation and let f̂ : R^n → {+1, −1} be a model. Then we have the following four possibilities when the model is applied to the observation:

\[
\hat{f}(x) =
\begin{cases}
+1 & \text{if } y = +1, \text{ called the true positive},\\
-1 & \text{if } y = +1, \text{ called the false negative},\\
+1 & \text{if } y = -1, \text{ called the false positive},\\
-1 & \text{if } y = -1, \text{ called the true negative}.
\end{cases}
\]

This means that models can commit two types of model errors. Under certain circumstances it is important to distinguish these types of errors when evaluating a model. We use a confusion matrix to report these errors in an effective manner.
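The four cases reduce to a small lookup on the pair of labels. A Python sketch:

```python
def outcome(y, y_hat):
    """Name the outcome of one prediction for labels in {+1, -1}."""
    if y == +1:
        return "TP" if y_hat == +1 else "FN"
    return "FP" if y_hat == +1 else "TN"

print(outcome(+1, +1), outcome(+1, -1))  # TP FN
print(outcome(-1, +1), outcome(-1, -1))  # FP TN
```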

p.6/1 The Confusion Matrix

A confusion matrix for a binary classification model is a 2 × 2 table that displays the observed labels against the predicted labels of a data set.

    Observed (y) \ Predicted (ŷ)    +1                     -1
    +1                              True Positive (TP)     False Negative (FN)
    -1                              False Positive (FP)    True Negative (TN)

One way to visualize the confusion matrix is to consider that applying a model f̂ to an observation (x, y) gives us two labels. The first label, y, is due to the observation, and the second label, ŷ = f̂(x), is due to the prediction of the model. Therefore, an observation with the label pair (y, ŷ) is mapped onto the confusion matrix as follows:

    (+1, +1) → TP        (+1, -1) → FN
    (-1, +1) → FP        (-1, -1) → TN
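The mapping of label pairs onto the four cells can be sketched as a counting routine; the pairs below are invented for illustration:

```python
from collections import Counter

def confusion_matrix(pairs):
    """Count (y, y_hat) label pairs into the TP/FN/FP/TN cells."""
    cell = {(+1, +1): "TP", (+1, -1): "FN", (-1, +1): "FP", (-1, -1): "TN"}
    counts = Counter(cell[p] for p in pairs)
    return {c: counts.get(c, 0) for c in ("TP", "FN", "FP", "TN")}

pairs = [(+1, +1), (+1, -1), (-1, +1), (-1, -1), (+1, +1)]
print(confusion_matrix(pairs))  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 1}
```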

p.7/1 The Confusion Matrix

Example:

    Observed \ Predicted    +1    -1
    +1                      95     7
    -1                       4    94

A confusion matrix of a model applied to a set of 200 observations. On this set of observations the model commits 7 false negative errors and 4 false positive errors, in addition to the 95 true positive and 94 true negative predictions.
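Error and accuracy follow directly from the four cell counts. Recomputing this slide's numbers in Python:

```python
# Cell counts taken from the confusion matrix above.
TP, FN, FP, TN = 95, 7, 4, 94

total = TP + FN + FP + TN   # 200 observations
err = (FN + FP) / total     # 11 / 200 = 0.055
acc = (TP + TN) / total     # 189 / 200 = 0.945
print(err, acc)  # 0.055 0.945
```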

p.8/1 Example

Wisconsin Breast Cancer Dataset:

> library(e1071)
> wdbc.df <- read.csv("wdbc.csv")
> svm.model <- svm(diagnosis ~ ., data=wdbc.df, type="C-classification",
+                  kernel="linear", cost=1)
> predict <- fitted(svm.model)
> cm <- table(wdbc.df$diagnosis, predict)
> cm
   predict
      B   M
  B 355   2
  M   5 207
> err <- (cm[1,2] + cm[2,1])/length(predict) * 100
> err
[1] 1.230228

The model misclassifies 2 + 5 = 7 of the 569 observations, an error of about 1.23%.

p.9/1 Model Evaluation

Model evaluation is the process of finding an optimal model for the problem at hand. You guessed it: we are talking about optimization! Let

D = {(x_1, y_1), ..., (x_l, y_l)} ⊂ R^n × {+1, −1}

be our training data. Then we can write our parameterized model as

\[
\hat{f}_D[k, \lambda, C](x) = \mathrm{sign}\left( \sum_{i=1}^{l} \alpha_{C,i}\, y_i\, k[\lambda](x_i, x) - b \right),
\]

where (x_i, y_i) ∈ D. With this we can write our model error as

\[
\mathrm{err}_D\big[\hat{f}_D[k, \lambda, C]\big] = \frac{1}{l} \sum_{i=1}^{l} L\big(y_i, \hat{f}_D[k, \lambda, C](x_i)\big),
\]

where (x_i, y_i) ∈ D.

p.10/1 Training Error

The optimal training error is defined as

\[
\min_{k, \lambda, C} \mathrm{err}_D\big[\hat{f}_D[k, \lambda, C]\big]
= \min_{k, \lambda, C} \frac{1}{l} \sum_{i=1}^{l} L\big(y_i, \hat{f}_D[k, \lambda, C](x_i)\big).
\]

p.11/1 Training Error

Observation: the problem here is that we can always find a set of model parameters that makes the model complex enough to drive the training error down to zero. As a model evaluation criterion, the training error is therefore overly optimistic.
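A model that simply memorizes the training set makes this concrete. The sketch below uses a 1-nearest-neighbour predictor on made-up one-dimensional data; its training error is zero by construction, yet that says nothing about its performance on unseen data:

```python
def one_nn(train):
    """1-nearest-neighbour model: predict the label of the closest
    training point (1-d inputs, for simplicity)."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

train = [(0.0, +1), (1.0, -1), (2.0, +1), (3.0, -1)]
model = one_nn(train)

# Every training point is its own nearest neighbour, so no mistakes.
train_err = sum(model(x) != y for x, y in train) / len(train)
print(train_err)  # 0.0
```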

p.12/1 The Hold-Out Method

Here we split the training data D into a training set P and a test set Q such that

D = P ∪ Q and P ∩ Q = ∅.

The optimal training error is then computed as the optimization problem

\[
\min_{k, \lambda, C} \mathrm{err}_P\big[\hat{f}_P[k, \lambda, C]\big] = \mathrm{err}_P\big[\hat{f}_P[k^*, \lambda^*, C^*]\big].
\]

Here the optimal training error is obtained with the model f̂_P[k*, λ*, C*]. The optimal test error is computed as an optimization using Q as the test set,

\[
\min_{k, \lambda, C} \mathrm{err}_Q\big[\hat{f}_P[k, \lambda, C]\big] = \mathrm{err}_Q\big[\hat{f}_P[k^\bullet, \lambda^\bullet, C^\bullet]\big].
\]

The optimal test error is achieved by some model f̂_P[k•, λ•, C•], whose parameters are in general different from those that minimize the training error.
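The split itself is mechanical. A Python sketch with synthetic data (the 70/30 split ratio and the label-noise level are arbitrary choices for illustration):

```python
import random

random.seed(0)
# Synthetic data: label is the sign of x, flipped with probability 0.2.
xs = [random.uniform(-1, 1) for _ in range(20)]
D = [((x,), (+1 if x >= 0 else -1) * (-1 if random.random() < 0.2 else +1))
     for x in xs]

random.shuffle(D)
split = int(0.7 * len(D))
P, Q = D[:split], D[split:]   # D = P u Q, with P and Q disjoint

# Stand-in for a model fitted on P only (a fixed rule here, not a real fit).
model = lambda x: +1 if x[0] >= 0 else -1
test_err = sum(model(x) != y for x, y in Q) / len(Q)
print(len(P), len(Q), test_err)
```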

p.13/1 The Hold-Out Method

[Figure: training error and test error plotted against model complexity, with error on the vertical axis and model complexity on the horizontal axis; the training error decreases as model complexity grows, while the test error is high at both low and high complexity.]