Performance Measures in Data Mining


 Abner Mathews
 3 years ago
 Views:
Transcription
1 Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München Master Lab Course Data Mining, SS 2015, Jul 1st
2 Outline Item Set and Association Rule Weights Simple Measures Complex Measures Basic Performance Measures Complex Measures Performance Curves Regression Assessment Strategies
3 Item Set and Association Rule Weights Simple Measures Measures for Item Sets Various algorithms can yield frequent item sets. From frequent item sets c and c {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule). Support (of an item set): sup(x Y ) = sup(y X) = P(X Y ) how many times the item set is found in the database Confidence (of a rule): conf (X Y ) = P(Y X) = P(X and Y )/P(X) = sup(x Y )/sup(x)
4 Item Set and Association Rule Weights Simple Measures Measures for Item Sets Various algorithms can yield frequent item sets. From frequent item sets c and c {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule). Support (of an item set): sup(x Y ) = sup(y X) = P(X Y ) how many times the item set is found in the database Confidence (of a rule): conf (X Y ) = P(Y X) = P(X and Y )/P(X) = sup(x Y )/sup(x)
5 Item Set and Association Rule Weights Simple Measures Measures for Item Sets Various algorithms can yield frequent item sets. From frequent item sets c and c {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule). Support (of an item set): sup(x Y ) = sup(y X) = P(X Y ) how many times the item set is found in the database Confidence (of a rule): conf (X Y ) = P(Y X) = P(X and Y )/P(X) = sup(x Y )/sup(x)
6 Item Set and Association Rule Weights Complex Measures Measures for Item Sets cont. d Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting): Leverage (of an item set): lev(x Y ) = P(X and Y ) (P(X)P(Y )) Lift (of a rule): lift(x Y ) = lift(y X) = P(X Y )/(P(X)P(Y )) = conf (X Y )/sup(y ) = conf (Y X)/sup(X) Conviction (of a rule): Similar to Lift, but directed Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y. conviction(x Y ) = P(X)P( Y )/P(X Y ) = (1 sup(y )/(1 conf (X Y ))
7 Item Set and Association Rule Weights Complex Measures Measures for Item Sets cont. d Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting): Leverage (of an item set): lev(x Y ) = P(X and Y ) (P(X)P(Y )) Lift (of a rule): lift(x Y ) = lift(y X) = P(X Y )/(P(X)P(Y )) = conf (X Y )/sup(y ) = conf (Y X)/sup(X) Conviction (of a rule): Similar to Lift, but directed Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y. conviction(x Y ) = P(X)P( Y )/P(X Y ) = (1 sup(y )/(1 conf (X Y ))
8 Item Set and Association Rule Weights Complex Measures Measures for Item Sets cont. d Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting): Leverage (of an item set): lev(x Y ) = P(X and Y ) (P(X)P(Y )) Lift (of a rule): lift(x Y ) = lift(y X) = P(X Y )/(P(X)P(Y )) = conf (X Y )/sup(y ) = conf (Y X)/sup(X) Conviction (of a rule): Similar to Lift, but directed Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y. conviction(x Y ) = P(X)P( Y )/P(X Y ) = (1 sup(y )/(1 conf (X Y ))
9 Item Set and Association Rule Weights Complex Measures Supplementary Material JMeasure empirically observed accuracy of rule Crossentropy (measuring how good a distribution approximates another distribution) between the binary variables φ and θ with vs. without conditioning on event θ ( J(θ φ) = p(θ) p(φ θ)log p(φ θ) ) 1 p(φ θ) + (1 p(φ θ)) log p(φ) 1 p(φ)
10 Basic Performance Measures Basic Building Blocks for Performance Measures True Positive (TP): positive instances predicted as positive True Negative (TN): negative instances predicted as negative False Positive (FP): negative instances predicted as positive False Negative (FN): positive instances predicted as negative Confusion Matrix: Predicted a Predicted b Real a TP FN Real b FP TN
11 Basic Performance Measures Performance Measures Accuracy, acc = = Error rate, err = = TP + TN TP + FN + FP + TN Number of correct predictions Total number of predictions FN + FP TP + FN + FP + TN = 1 acc Number of wrong predictions Total number of predictions
12 Basic Performance Measures Performance Measures cont d True Positive Rate, TPR, Sensitivity = True Negative Rate, TNR, Specificity = False Positive Rate, FPR = False Negative Rate, FNR = FP TN + FP TP TP + FN FN TP + FN TN TN + FP
13 Basic Performance Measures Performance Measures cont d True Positive Rate, TPR, Sensitivity = True Negative Rate, TNR, Specificity = False Positive Rate, FPR = False Negative Rate, FNR = FP TN + FP TP TP + FN FN TP + FN TN TN + FP
14 Basic Performance Measures Performance Measures cont d True Positive Rate, TPR, Sensitivity = True Negative Rate, TNR, Specificity = False Positive Rate, FPR = False Negative Rate, FNR = FP TN + FP TP TP + FN FN TP + FN TN TN + FP
15 Basic Performance Measures Performance Measures cont d True Positive Rate, TPR, Sensitivity = True Negative Rate, TNR, Specificity = False Positive Rate, FPR = False Negative Rate, FNR = FP TN + FP TP TP + FN FN TP + FN TN TN + FP
16 Basic Performance Measures Performance Measures cont d F 1 measure = Precision, p = Recall, r = 2rp r + p = TP TP + FP TP TP + FN 2 TP 2 TP + FP + FN
17 Basic Performance Measures Performance Measures cont d F 1 measure = Precision, p = Recall, r = 2rp r + p = TP TP + FP TP TP + FN 2 TP 2 TP + FP + FN
18 Basic Performance Measures Performance Measures cont d F 1 measure = Precision, p = Recall, r = 2rp r + p = TP TP + FP TP TP + FN 2 TP 2 TP + FP + FN
19 Complex Measures Performance Curves Performance Curves "costs" of different error types are different prediction behaviour changes over the test set performance display in 2D different domains prefer different chart types
20 Complex Measures Performance Curves Lift Charts Taken from from marketing area to evaluate mailing success yaxis: number or percentage of responders xaxis: sample red diagonal: random lift green line: optimum lift
21 Complex Measures Performance Curves ROC Curves Receiver Operator Characteristics yaxis: TPR xaxis: FPR diagonal: random guessing (TPR=FPR) Taken from
22 Complex Measures Performance Curves Sensitivity vs. Specificity preferred in medicine yaxis: TPR xaxis: TNR (specificity) also frequently as ROC curve with 1  specificity Taken from
23 Complex Measures Performance Curves Recall Precision Curves Taken from preferred in information retrieval positives are the documents retrieved in response to a query true positives are documents really relevant to the query yaxis: precision xaxis: recall
24 Regression Error Measures for Regression Mean squared error, MSE = (p 1 a 1 ) (p n a n ) 2 n Root mean squared error, RMSE = (p1 a 1 ) (p n a n ) 2 n Mean absolute error, MAE = p 1 a p n a n n
25 Regression Error Measures for Regression Mean squared error, MSE = (p 1 a 1 ) (p n a n ) 2 n Root mean squared error, RMSE = (p1 a 1 ) (p n a n ) 2 n Mean absolute error, MAE = p 1 a p n a n n
26 Regression Error Measures for Regression Mean squared error, MSE = (p 1 a 1 ) (p n a n ) 2 n Root mean squared error, RMSE = (p1 a 1 ) (p n a n ) 2 n Mean absolute error, MAE = p 1 a p n a n n
27 Regression Relative Error Measures Relative squared error = (p 1 a 1 ) (p n a n ) 2 (a 1 ā) (a n ā) 2 Root relative squared error = (p 1 a 1 ) (p n a n ) 2 (a 1 ā) (a n ā) 2 Relative absolute error = p 1 a p n a n a 1 ā + + a n ā
28 Assessment Strategies General Problem each algorithm abstracts from observations (instances) the aspects kept and the aspect discarded differ between the learning scheme (inductive bias) this means also: information about individual instances are contained in the model, too individual instance information leads to overfitting
29 Assessment Strategies Solution Strategies use fresh date, i.e. instances not used for the training for very large numbers of instances: simple split in test and training set most common: 10fold cross validation LOOCV: Leave one out cross validation
Performance Measures for Machine Learning
Performance Measures for Machine Learning 1 Performance Measures Accuracy Weighted (CostSensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2 Accuracy Target: 0/1, 1/+1, True/False,
More informationEvaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
More informationData Mining Practical Machine Learning Tools and Techniques
Counting the cost Data Mining Practical Machine Learning Tools and Techniques Slides for Section 5.7 In practice, different types of classification errors often incur different costs Examples: Loan decisions
More informationCOMP9417: Machine Learning and Data Mining
COMP9417: Machine Learning and Data Mining A note on the twoclass confusion matrix, lift charts and ROC curves Session 1, 2003 Acknowledgement The material in this supplementary note is based in part
More informationCLASSIFICATION JELENA JOVANOVIĆ. Web:
CLASSIFICATION JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is classification? Binary and multiclass classification Classification algorithms Performance measures
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.unisb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationEvaluation in Machine Learning (1) Evaluation in machine learning. Aims. 14s1: COMP9417 Machine Learning and Data Mining.
Acknowledgements Evaluation in Machine Learning (1) 14s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 19, 2014 Material derived
More informationCOMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection.
COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection. Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15  ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.standrews.ac.uk twk@standrews.ac.uk Tom Kelsey ID505917AUC
More informationPerformance Metrics. number of mistakes total number of observations. err = p.1/1
p.1/1 Performance Metrics The simplest performance metric is the model error defined as the number of mistakes the model makes on a data set divided by the number of observations in the data set, err =
More informationEvaluation and Credibility. How much should we believe in what was learned?
Evaluation and Credibility How much should we believe in what was learned? Outline Introduction Classification with Train, Test, and Validation sets Handling Unbalanced Data; Parameter Tuning Crossvalidation
More informationTan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status
Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationHani Tamim, PhD Associate Professor Department of Internal Medicine American University of Beirut
Hani Tamim, PhD Associate Professor Department of Internal Medicine American University of Beirut A graphical plot which illustrates the performance of a binary classifier system as its discrimination
More informationCross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
More informationMachine Learning model evaluation. Luigi Cerulo Department of Science and Technology University of Sannio
Machine Learning model evaluation Luigi Cerulo Department of Science and Technology University of Sannio Accuracy To measure classification performance the most intuitive measure of accuracy divides the
More informationData Mining Practical Machine Learning Tools and Techniques
Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,
More informationEvaluation and CostSensitive Learning. CostSensitive Learning
1 J. Fürnkranz Evaluation and CostSensitive Learning Evaluation Holdout Estimates Crossvalidation Significance Testing Sign test ROC Analysis CostSensitive Evaluation ROC space ROC convex hull Rankers
More informationData Mining  Evaluation of Classifiers
Data Mining  Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationEvaluating Machine Learning Algorithms. Machine DIT
Evaluating Machine Learning Algorithms Machine Learning @ DIT 2 Evaluating Classification Accuracy During development, and in testing before deploying a classifier in the wild, we need to be able to quantify
More informationData Mining  The Next Mining Boom?
Howard Ong Principal Consultant Aurora Consulting Pty Ltd Abstract This paper introduces Data Mining to its audience by explaining Data Mining in the context of Corporate and Business Intelligence Reporting.
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationA General Framework for Mining ConceptDrifting Data Streams with Skewed Distributions
A General Framework for Mining ConceptDrifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at UrbanaChampaign IBM T. J. Watson Research Center
More informationThe many faces of ROC analysis in machine learning. Peter A. Flach University of Bristol, UK
The many faces of ROC analysis in machine learning Peter A. Flach University of Bristol, UK www.cs.bris.ac.uk/~flach/ Objectives After this tutorial, you will be able to [model evaluation] produce ROC
More informationAnalysis on Weighted AUC for Imbalanced Data Learning Through Isometrics
Journal of Computational Information Systems 8: (22) 37 378 Available at http://www.jofcis.com Analysis on Weighted AUC for Imbalanced Data Learning Through Isometrics Yuanfang DONG, 2, Xiongfei LI,, Jun
More informationAn analysis of suitable parameters for efficiently applying Kmeans clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying Kmeans clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
More informationData Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:
More informationOnline Performance Anomaly Detection with
ΘPAD: Online Performance Anomaly Detection with Tillmann Bielefeld 1 1 empuxa GmbH, Kiel KoSSESymposium Application Performance Management (Kieker Days 2012) November 29, 2012 @ Wissenschaftszentrum Kiel
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Daybyday Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationAn Approach to Detect Spam Emails by Using Majority Voting
An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H12 Islamabad, Pakistan Usman Qamar Faculty,
More informationPREDICTING SUCCESS IN THE COMPUTER SCIENCE DEGREE USING ROC ANALYSIS
PREDICTING SUCCESS IN THE COMPUTER SCIENCE DEGREE USING ROC ANALYSIS Arturo Fornés arforser@fiv.upv.es, José A. Conejero aconejero@mat.upv.es 1, Antonio Molina amolina@dsic.upv.es, Antonio Pérez aperez@upvnet.upv.es,
More informationROC Graphs: Notes and Practical Considerations for Data Mining Researchers
ROC Graphs: Notes and Practical Considerations for Data Mining Researchers Tom Fawcett 1 1 Intelligent Enterprise Technologies Laboratory, HP Laboratories Palo Alto Alexandre Savio (GIC) ROC Graphs GIC
More informationUsing Random Forest to Learn Imbalanced Data
Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationLecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationEvaluation in Machine Learning (2) Aims. Introduction
Acknowledgements Evaluation in Machine Learning (2) 15s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales Material derived from slides
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationClassifiers & Classification
Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University Email: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationMachine Learning. Topic: Evaluating Hypotheses
Machine Learning Topic: Evaluating Hypotheses Bryan Pardo, Machine Learning: EECS 349 Fall 2011 How do you tell something is better? Assume we have an error measure. How do we tell if it measures something
More informationHealth Care and Life Sciences
Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations Wen Zhu 1, Nancy Zeng 2, Ning Wang 2 1 K&L consulting services, Inc, Fort Washington,
More informationMeasuring the propensity to purchase Creating and interpreting the gain chart. Ricco RAKOTOMALALA
Measuring the propensity to purchase Creating and interpreting the gain chart Ricco RAKOTOMALALA Tutoriels Tanagra  http://tutorielsdatamining.blogspot.fr/ 1 Customer targeting process Promoting a new
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationA Decision Tree for Weather Prediction
BULETINUL UniversităŃii Petrol Gaze din Ploieşti Vol. LXI No. 1/2009 7782 Seria Matematică  Informatică  Fizică A Decision Tree for Weather Prediction Elia Georgiana Petre Universitatea PetrolGaze
More information1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing Email Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 22773878, Volume1, Issue6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationBUILDING CLASSIFICATION MODELS FROM IMBALANCED FRAUD DETECTION DATA
BUILDING CLASSIFICATION MODELS FROM IMBALANCED FRAUD DETECTION DATA Terence Yong Koon Beh 1, Swee Chuan Tan 2, Hwee Theng Yeo 3 1 School of Business, SIM University 1 yky2k@yahoo.com, 2 jamestansc@unisim.edu.sg,
More informationIMPACT OF COMPUTER ON ORGANIZATION
International Journal of Research in Engineering & Technology (IJRET) Vol. 1, Issue 1, June 2013, 16 Impact Journals IMPACT OF COMPUTER ON ORGANIZATION A. D. BHOSALE 1 & MARATHE DAGADU MITHARAM 2 1 Department
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationPerformance Evaluation Metrics for Software Fault Prediction Studies
Acta Polytechnica Hungarica Vol. 9, No. 4, 2012 Performance Evaluation Metrics for Software Fault Prediction Studies Cagatay Catal Istanbul Kultur University, Department of Computer Engineering, Atakoy
More informationA Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing
A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing Maryam Daneshmandi mdaneshmandi82@yahoo.com School of Information Technology Shiraz Electronics University Shiraz, Iran
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 20150305
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 20150305 Roman Kern (KTI, TU Graz) Ensemble Methods 20150305 1 / 38 Outline 1 Introduction 2 Classification
More informationCODE ASSESSMENT METHODOLOGY PROJECT (CAMP) Comparative Evaluation:
This document contains information exempt from mandatory disclosure under the FOIA. Exemptions 2 and 4 apply. CODE ASSESSMENT METHODOLOGY PROJECT (CAMP) Comparative Evaluation: Coverity Prevent 2.4.0 Fortify
More informationBusiness Case Development for Credit and Debit Card Fraud Re Scoring Models
Business Case Development for Credit and Debit Card Fraud Re Scoring Models Kurt Gutzmann Managing Director & Chief ScienAst GCX Advanced Analy.cs LLC www.gcxanalyacs.com October 20, 2011 www.gcxanalyacs.com
More informationCSC574  Computer and Network Security Module: Intrusion Detection
CSC574  Computer and Network Security Module: Intrusion Detection Prof. William Enck Spring 2013 1 Intrusion An authorized action... that exploits a vulnerability... that causes a compromise... and thus
More informationFRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANNBASED KNOWLEDGEDISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANNBASED KNOWLEDGEDISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
More informationT61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577
T61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or
More informationAddressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
More informationDiscovering Criminal Behavior by Ranking Intelligence Data
UNIVERSITY OF AMSTERDAM Faculty of Science Discovering Criminal Behavior by Ranking Intelligence Data by 5889081 A thesis submitted in partial fulfillment for the degree of Master of Science in the field
More informationLecture 5: Evaluation
Lecture 5: Evaluation Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk 1 Overview 1 Recap/Catchup
More informationROC analysis of classifiers in machine learning: A survey
Intelligent Data Analysis 17 (2013) 531 558 531 DOI 10.3233/IDA130592 IOS Press ROC analysis of classifiers in machine learning: A survey Matjaž Majnik and Zoran Bosnić Faculty of Computer and Information
More informationKeywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
More informationMining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
More informationApplication of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware
Application of Data Mining based Malicious Code Detection Techniques for Detecting new Spyware Cumhur Doruk Bozagac Bilkent University, Computer Science and Engineering Department, 06532 Ankara, Turkey
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationStock Market Forecasting Using Machine Learning Algorithms
Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationEvaluation of selected data mining algorithms implemented in Medical Decision Support Systems Kamila Aftarczuk
Master Thesis Software Engineering Thesis no: MSE200721 September 2007 Evaluation of selected data mining algorithms implemented in Medical Decision Support Systems Kamila Aftarczuk This thesis is submitted
More informationGetting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationThe Relationship Between PrecisionRecall and ROC Curves
Jesse Davis jdavis@cs.wisc.edu Mark Goadrich richm@cs.wisc.edu Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of WisconsinMadison, 2 West Dayton Street,
More informationData Mining as a tool to Predict the Churn Behaviour among Indian bank customers
Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers Manjit Kaur Department of Computer Science Punjabi University Patiala, India manjit8718@gmail.com Dr. Kawaljeet Singh University
More informationStatistical Validation and Data Analytics in ediscovery. Jesse Kornblum
Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?
More informationThe Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio MorenoGarcia
More informationTweaking Naïve Bayes classifier for intelligent spam detection
682 Tweaking Naïve Bayes classifier for intelligent spam detection Ankita Raturi 1 and Sunil Pranit Lal 2 1 University of California, Irvine, CA 92697, USA. araturi@uci.edu 2 School of Computing, Information
More informationTowards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial
More informationPredicting Deadline Transgressions Using Event Logs
Predicting Deadline Transgressions Using Event Logs Anastasiia Pika 1, Wil M. P. van der Aalst 2,1, Colin J. Fidge 1, Arthur H. M. ter Hofstede 1,2, and Moe T. Wynn 1 1 Queensland University of Technology,
More informationAnalysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
More informationData and Analysis. Informatics 1 School of Informatics, University of Edinburgh. Part III Unstructured Data. Ian Stark. StaffStudent Liaison Meeting
Inf1DA 2010 2011 III: 1 / 89 Informatics 1 School of Informatics, University of Edinburgh Data and Analysis Part III Unstructured Data Ian Stark February 2011 Inf1DA 2010 2011 III: 2 / 89 Part III Unstructured
More informationNonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
More informationData Mining Using SAS Enterprise Miner 7.1
Data Mining Using SAS Enterprise Miner 7.1 Lorne Rothman Lorne.rothman@sas.com Principal Statistician SAS Institute (Canada) Inc. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining The
More informationMachine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
More informationEdiscovery Taking Predictive Coding Out of the Black Box
Ediscovery Taking Predictive Coding Out of the Black Box Joseph H. Looby Senior Managing Director FTI TECHNOLOGY IN CASES OF COMMERCIAL LITIGATION, the process of discovery can place a huge burden on
More informationLecture 8: Signal Detection and Noise Assumption
ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,
More informationViviSight: A Sophisticated, Datadriven Business Intelligence Tool for Churn and Loan Default Prediction
ViviSight: A Sophisticated, Datadriven Business Intelligence Tool for Churn and Loan Default Prediction Barun Paudel 1, T.H. Gopaluwewa 1, M.R.De. Waas Gunawardena 1, W.C.H. Wijerathna 1, Rohan Samarasinghe
More informationUsing Machine Learning on Sensor Data
Journal of Computing and Information Technology  CIT 18, 2010, 4, 341 347 doi:10.2498/cit.1001913 341 Using Machine Learning on Sensor Data Alexandra Moraru 1,MarkoPesko 1,2, Maria Porcius 3, Carolina
More informationBinary and Ranked Retrieval
Binary and Ranked Retrieval Binary Retrieval RSV(d i,q j ) {0,1} Does not allow the user to control the magnitude of the output. In fact, for a given query, the system may return underdimensioned output
More informationROC Curve, Lift Chart and Calibration Plot
Metodološki zvezki, Vol. 3, No. 1, 26, 8918 ROC Curve, Lift Chart and Calibration Plot Miha Vuk 1, Tomaž Curk 2 Abstract This paper presents ROC curve, lift chart and calibration plot, three well known
More informationSelecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland email: jerzy.surma@gmail.com Mariusz Łapczyński
More informationIs there ethics in algorithms? Searching for ethics in contemporary technology. Martin Peterson
Is there ethics in algorithms? Searching for ethics in contemporary technology Martin Peterson www.martinpeterson.org m.peterson@tue.nl What is an algorithm? An algorithm is a finite it sequence of welldefined
More informationConsolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer
More informationMAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS
MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationLink Prediction Analysis in the Wikipedia Collaboration Graph
Link Prediction Analysis in the Wikipedia Collaboration Graph Ferenc Molnár Department of Physics, Applied Physics, and Astronomy Rensselaer Polytechnic Institute Troy, New York, 121 Email: molnaf@rpi.edu
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationA Decision Tree Classification Model for University Admission System
A Decision Tree Classification Model for University Admission ystem Abdul Fattah Mashat Faculty of Computing and Information Technology Jeddah, audi Arabia Mohammed M. Fouad Faculty of Informatics and
More informationBig Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set
More information