# Performance Measures for Machine Learning

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Performance Measures for Machine Learning 1

2 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall F Break Even Point ROC ROC Area 2

3 Accuracy Target: 0/1, -1/+1, True/False, Prediction = f(inputs) = f(x): 0/1 or Real Threshold: f(x) > thresh => 1, else => 0 threshold(f(x)): 0/1 #right / #total accuracy = Â i=1kn ( 1- (target i - threshold( f ( x r i )))) 2 p( correct ): p(threshold(f(x)) = target) N 3

4 Confusion Matrix Predicted 1 Predicted 0 correct True 0 True 1 a c b d incorrect threshold accuracy = (a+d) / (a+b+c+d) 4

5 Prediction Threshold Predicted 1 Predicted 0 True 0 True 1 0 b 0 d threshold > MAX(f(x)) all cases predicted 0 (b+d) = total accuracy = %False = %0 s Predicted 1 Predicted 0 True 0 True 1 a 0 c 0 threshold < MIN(f(x)) all cases predicted 1 (a+c) = total accuracy = %True = %1 s 5

6 optimal threshold 82% 0 s in data 18% 1 s in data 6

7 threshold demo 7

8 Problems with Accuracy Assumes equal cost for both kinds of errors cost(b-type-error) = cost (c-type-error) is 99% accuracy good? can be excellent, good, mediocre, poor, terrible depends on problem is 10% accuracy bad? information retrieval BaseRate = accuracy of predicting predominant class (on most problems obtaining BaseRate accuracy is easy) 8

9 Percent Reduction in Error 80% accuracy = 20% error suppose learning increases accuracy from 80% to 90% error reduced from 20% to 10% 50% reduction in error 99.90% to 99.99% = 90% reduction in error 50% to 75% = 50% reduction in error can be applied to many other measures 9

10 Costs (Error Weights) Predicted 1 Predicted 0 True 0 True 1 w a w c w b w d Often W a = W d = zero and W b W c zero 10

11 11

12 12

13 Lift not interested in accuracy on entire dataset want accurate predictions for 5%, 10%, or 20% of dataset don t care about remaining 95%, 90%, 80%, resp. typical application: marketing lift(threshold) = %positives > threshold % dataset > threshold how much better than random prediction on the fraction of the dataset predicted true (f(x) > threshold) 13

14 Lift Predicted 1 Predicted 0 True 0 True 1 a c b d lift = a (a + b) (a + c) (a + b + c + d) threshold 14

15 lift = 3.5 if mailings sent to 20% of the customers 15

16 Lift and Accuracy do not always correlate well Problem 1 Problem 2 (thresholds arbitrarily set at 0.5 for both lift and accuracy) 16

17 Precision and Recall typically used in document retrieval Precision: how many of the returned documents are correct precision(threshold) Recall: how many of the positives does the model return recall(threshold) Precision/Recall Curve: sweep thresholds 17

18 Precision/Recall Predicted 1 Predicted 0 True 0 True 1 a c b d PRECISION = a /(a + c) RECALL = a /(a + b) threshold 18

19 19

20 Summary Stats: F & BreakEvenPt PRECISION = a /(a + c) RECALL = a /(a + b) harmonic average of precision and recall F = 2 * (PRECISION RECALL) (PRECISION + RECALL) BreakEvenPoint = PRECISION = RECALL 20

21 better performance worse performance 21

22 F and BreakEvenPoint do not always correlate well Problem 1 Problem 2 22

23 Predicted 1 Predicted 0 Predicted 1 Predicted 0 True 0 True 1 true positive false positive false negative true negative True 0 True 1 TP FP FN TN Predicted 1 Predicted 0 Predicted 1 Predicted 0 True 0 True 1 hits false alarms misses correct rejections True 0 True 1 P(pr1 tr1) P(pr1 tr0) P(pr0 tr1) P(pr0 tr0) 23

24 ROC Plot and ROC Area Receiver Operator Characteristic Developed in WWII to statistically model false positive and false negative detections of radar operators Better statistical foundations than most other measures Standard measure in medicine and biology Becoming more popular in ML 24

25 ROC Plot Sweep threshold and plot TPR vs. FPR Sensitivity vs. 1-Specificity P(true true) vs. P(true false) Sensitivity = a/(a+b) = Recall = LIFT numerator 1 - Specificity = 1 - d/(c+d) 25

26 diagonal line is random prediction 26

27 Properties of ROC ROC Area: 1.0: perfect prediction 0.9: excellent prediction 0.8: good prediction 0.7: mediocre prediction 0.6: poor prediction 0.5: random prediction <0.5: something wrong! 27

28 Properties of ROC Slope is non-increasing Each point on ROC represents different tradeoff (cost ratio) between false positives and false negatives Slope of line tangent to curve defines the cost ratio ROC Area represents performance averaged over all possible cost ratios If two ROC curves do not intersect, one method dominates the other If two ROC curves intersect, one method is better for some cost ratios, and other method is better for other cost ratios 28

29 Problem 1 Problem 2 29

30 Problem 1 Problem 2 30

31 Problem 1 Problem 2 31

32 Summary the measure you optimize to makes a difference the measure you report makes a difference use measure appropriate for problem/community accuracy often is not sufficient/appropriate ROC is gaining popularity in the ML community only accuracy generalizes to >2 classes! 32

### Data Mining Practical Machine Learning Tools and Techniques

Counting the cost Data Mining Practical Machine Learning Tools and Techniques Slides for Section 5.7 In practice, different types of classification errors often incur different costs Examples: Loan decisions

### Evaluation & Validation: Credibility: Evaluating what has been learned

Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model

### Performance Measures in Data Mining

Performance Measures in Data Mining Common Performance Measures used in Data Mining and Machine Learning Approaches L. Richter J.M. Cejuela Department of Computer Science Technische Universität München

### The many faces of ROC analysis in machine learning. Peter A. Flach University of Bristol, UK

The many faces of ROC analysis in machine learning Peter A. Flach University of Bristol, UK www.cs.bris.ac.uk/~flach/ Objectives After this tutorial, you will be able to [model evaluation] produce ROC

### Machine Learning model evaluation. Luigi Cerulo Department of Science and Technology University of Sannio

Machine Learning model evaluation Luigi Cerulo Department of Science and Technology University of Sannio Accuracy To measure classification performance the most intuitive measure of accuracy divides the

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

### COMP9417: Machine Learning and Data Mining

COMP9417: Machine Learning and Data Mining A note on the two-class confusion matrix, lift charts and ROC curves Session 1, 2003 Acknowledgement The material in this supplementary note is based in part

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC

### Hani Tamim, PhD Associate Professor Department of Internal Medicine American University of Beirut

Hani Tamim, PhD Associate Professor Department of Internal Medicine American University of Beirut A graphical plot which illustrates the performance of a binary classifier system as its discrimination

### Evaluation in Machine Learning (1) Evaluation in machine learning. Aims. 14s1: COMP9417 Machine Learning and Data Mining.

Acknowledgements Evaluation in Machine Learning (1) 14s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 19, 2014 Material derived

### COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection.

COMP 551 Applied Machine Learning Lecture 6: Performance evaluation. Model assessment and selection. Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise

### Machine Learning (CS 567)

Machine Learning (CS 567) Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol Han (cheolhan@usc.edu)

### Data and Analysis. Informatics 1 School of Informatics, University of Edinburgh. Part III Unstructured Data. Ian Stark. Staff-Student Liaison Meeting

Inf1-DA 2010 2011 III: 1 / 89 Informatics 1 School of Informatics, University of Edinburgh Data and Analysis Part III Unstructured Data Ian Stark February 2011 Inf1-DA 2010 2011 III: 2 / 89 Part III Unstructured

### Health Care and Life Sciences

Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations Wen Zhu 1, Nancy Zeng 2, Ning Wang 2 1 K&L consulting services, Inc, Fort Washington,

### ROC Graphs: Notes and Practical Considerations for Data Mining Researchers

ROC Graphs: Notes and Practical Considerations for Data Mining Researchers Tom Fawcett 1 1 Intelligent Enterprise Technologies Laboratory, HP Laboratories Palo Alto Alexandre Savio (GIC) ROC Graphs GIC

### Evaluation and Credibility. How much should we believe in what was learned?

Evaluation and Credibility How much should we believe in what was learned? Outline Introduction Classification with Train, Test, and Validation sets Handling Unbalanced Data; Parameter Tuning Cross-validation

### CLASSIFICATION JELENA JOVANOVIĆ. Web:

CLASSIFICATION JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is classification? Binary and multiclass classification Classification algorithms Performance measures

### Evaluation and Cost-Sensitive Learning. Cost-Sensitive Learning

1 J. Fürnkranz Evaluation and Cost-Sensitive Learning Evaluation Hold-out Estimates Cross-validation Significance Testing Sign test ROC Analysis Cost-Sensitive Evaluation ROC space ROC convex hull Rankers

### Data Mining Practical Machine Learning Tools and Techniques

Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,

### Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

### Quality and Complexity Measures for Data Linkage and Deduplication

Quality and Complexity Measures for Data Linkage and Deduplication Peter Christen and Karl Goiser Department of Computer Science, Australian National University, Canberra ACT 0200, Australia {peter.christen,karl.goiser}@anu.edu.au

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### The validation of Credit Rating and Scoring Models

The validation of Credit Rating and Scoring Models raffaella.calabrese1@unimib.it University of Milano-Bicocca Swiss Statistics Meeting Geneva, Switzerland October 29th, 2009 Outline The validation process

### Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status

Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of

### Business Case Development for Credit and Debit Card Fraud Re- Scoring Models

Business Case Development for Credit and Debit Card Fraud Re- Scoring Models Kurt Gutzmann Managing Director & Chief ScienAst GCX Advanced Analy.cs LLC www.gcxanalyacs.com October 20, 2011 www.gcxanalyacs.com

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

### Online Performance Anomaly Detection with

ΘPAD: Online Performance Anomaly Detection with Tillmann Bielefeld 1 1 empuxa GmbH, Kiel KoSSE-Symposium Application Performance Management (Kieker Days 2012) November 29, 2012 @ Wissenschaftszentrum Kiel

### Introduction to Machine Learning

Introduction to Machine Learning Isabelle Guyon isabelle@clopinet.com What is Machine Learning? Learning algorithm Trained machine TRAINING DATA Answer Query What for? Classification Time series prediction

### Classifiers & Classification

Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

### Performance Metrics. number of mistakes total number of observations. err = p.1/1

p.1/1 Performance Metrics The simplest performance metric is the model error defined as the number of mistakes the model makes on a data set divided by the number of observations in the data set, err =

### CSC574 - Computer and Network Security Module: Intrusion Detection

CSC574 - Computer and Network Security Module: Intrusion Detection Prof. William Enck Spring 2013 1 Intrusion An authorized action... that exploits a vulnerability... that causes a compromise... and thus

### Confidence Intervals for the Area Under an ROC Curve

Chapter 261 Confidence Intervals for the Area Under an ROC Curve Introduction Receiver operating characteristic (ROC) curves are used to assess the accuracy of a diagnostic test. The technique is used

### Link Prediction Analysis in the Wikipedia Collaboration Graph

Link Prediction Analysis in the Wikipedia Collaboration Graph Ferenc Molnár Department of Physics, Applied Physics, and Astronomy Rensselaer Polytechnic Institute Troy, New York, 121 Email: molnaf@rpi.edu

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University

### Leak Detection Theory: Optimizing Performance with MLOG

Itron White Paper Water Loss Management Leak Detection Theory: Optimizing Performance with MLOG Rich Christensen Vice President, Research & Development 2009, Itron Inc. All rights reserved. Introduction

### PREDICTING SUCCESS IN THE COMPUTER SCIENCE DEGREE USING ROC ANALYSIS

PREDICTING SUCCESS IN THE COMPUTER SCIENCE DEGREE USING ROC ANALYSIS Arturo Fornés arforser@fiv.upv.es, José A. Conejero aconejero@mat.upv.es 1, Antonio Molina amolina@dsic.upv.es, Antonio Pérez aperez@upvnet.upv.es,

### Evaluation in Machine Learning (2) Aims. Introduction

Acknowledgements Evaluation in Machine Learning (2) 15s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales Material derived from slides

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Evaluating Machine Learning Algorithms. Machine DIT

Evaluating Machine Learning Algorithms Machine Learning @ DIT 2 Evaluating Classification Accuracy During development, and in testing before deploying a classifier in the wild, we need to be able to quantify

### T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

### Machine Learning. Topic: Evaluating Hypotheses

Machine Learning Topic: Evaluating Hypotheses Bryan Pardo, Machine Learning: EECS 349 Fall 2011 How do you tell something is better? Assume we have an error measure. How do we tell if it measures something

### Statistical foundations of machine learning

Machine learning p. 1/45 Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Machine learning p. 2/45

### Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

### Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

### Big Data in Education

Big Data in Education Alex J. Bowers, Ph.D. Associate Professor of Education Leadership Teachers College, Columbia University April 24, 2014 Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight

### A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

### Analysis on Weighted AUC for Imbalanced Data Learning Through Isometrics

Journal of Computational Information Systems 8: (22) 37 378 Available at http://www.jofcis.com Analysis on Weighted AUC for Imbalanced Data Learning Through Isometrics Yuanfang DONG, 2, Xiongfei LI,, Jun

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### 1. Classification problems

Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

### Evaluation of Diagnostic Tests

Biostatistics for Health Care Researchers: A Short Course Evaluation of Diagnostic Tests Presented ed by: Siu L. Hui, Ph.D. Department of Medicine, Division of Biostatistics Indiana University School of

### Evaluating Protein-protein Interaction Predictors with a Novel 3-Dimensional Metric

Evaluating Protein-protein Interaction Predictors with a Novel 3-Dimensional Metric Haohan Wang 1, Madhavi K. Ganapathiraju, PhD 2 1 Language Technologies Institute, School of Computer Science, Carnegie

### Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

### Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

### An Approach to Detect Spam Emails by Using Majority Voting

An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,

### False_ If there are no fixed costs, then average cost cannot be higher than marginal cost for all output levels.

LECTURE 10: SINGLE INPUT COST FUNCTIONS ANSWERS AND SOLUTIONS True/False Questions False_ When a firm is using only one input, the cost function is simply the price of that input times how many units of

### The Relationship Between Precision-Recall and ROC Curves

Jesse Davis jdavis@cs.wisc.edu Mark Goadrich richm@cs.wisc.edu Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 2 West Dayton Street,

### Measuring Lift Quality in Database Marketing

Measuring Lift Quality in Database Marketing Gregory Piatetsky-Shapiro Xchange Inc. One Lincoln Plaza, 89 South Street Boston, MA 2111 gps@xchange.com Sam Steingold Xchange Inc. One Lincoln Plaza, 89 South

### Data Mining - The Next Mining Boom?

Howard Ong Principal Consultant Aurora Consulting Pty Ltd Abstract This paper introduces Data Mining to its audience by explaining Data Mining in the context of Corporate and Business Intelligence Reporting.

### Fraud Detection for Online Retail using Random Forests

Fraud Detection for Online Retail using Random Forests Eric Altendorf, Peter Brende, Josh Daniel, Laurent Lessard Abstract As online commerce becomes more common, fraud is an increasingly important concern.

### Lecture 5: Evaluation

Lecture 5: Evaluation Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk 1 Overview 1 Recap/Catchup

### A Systematic Review of Fault Prediction approaches used in Software Engineering

A Systematic Review of Fault Prediction approaches used in Software Engineering Sarah Beecham Lero The Irish Software Engineering Research Centre University of Limerick, Ireland Tracy Hall Brunel University

### A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering Khurum Nazir Junejo, Mirza Muhammad Yousaf, and Asim Karim Dept. of Computer Science, Lahore University of Management Sciences

### ROC Graphs: Notes and Practical Considerations for Data Mining Researchers

ROC Graphs: Notes and Practical Considerations for Data Mining Researchers Tom Fawcett Intelligent Enterprise Technologies Laboratory HP Laboratories Palo Alto HPL-23-4 January 7 th, 23* E-mail: tom_fawcett@hp.com

### STATISTICA Scorecard

Formula Guide is a comprehensive tool dedicated for developing, evaluating, and monitoring scorecard models. For more information see TUTORIAL Developing Scorecards Using STATISTICA Scorecard [4]. is an

### ROC Curve, Lift Chart and Calibration Plot

Metodološki zvezki, Vol. 3, No. 1, 26, 89-18 ROC Curve, Lift Chart and Calibration Plot Miha Vuk 1, Tomaž Curk 2 Abstract This paper presents ROC curve, lift chart and calibration plot, three well known

### You buy a TV for \$1000 and pay it off with \$100 every week. The table below shows the amount of money you sll owe every week. Week 1 2 3 4 5 6 7 8 9

Warm Up: You buy a TV for \$1000 and pay it off with \$100 every week. The table below shows the amount of money you sll owe every week Week 1 2 3 4 5 6 7 8 9 Money Owed 900 800 700 600 500 400 300 200 100

### Using Random Forest to Learn Imbalanced Data

Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,

### Is there ethics in algorithms? Searching for ethics in contemporary technology. Martin Peterson

Is there ethics in algorithms? Searching for ethics in contemporary technology Martin Peterson www.martinpeterson.org m.peterson@tue.nl What is an algorithm? An algorithm is a finite it sequence of well-defined

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Using Static Code Analysis Tools for Detection of Security Vulnerabilities

Using Static Code Analysis Tools for Detection of Security Vulnerabilities Katerina Goseva-Popstajanova & Andrei Perhinschi Lane Deptartment of Computer Science and Electrical Engineering West Virginia

### Learning to Recognize Missing E-mail Attachments

Learning to Recognize Missing E-mail Attachments Technical Report TUD KE 2009-05 Marco Ghiglieri, Johannes Fürnkranz Knowledge Engineering Group, Technische Universität Darmstadt Knowledge Engineering

### Discovering Criminal Behavior by Ranking Intelligence Data

UNIVERSITY OF AMSTERDAM Faculty of Science Discovering Criminal Behavior by Ranking Intelligence Data by 5889081 A thesis submitted in partial fulfillment for the degree of Master of Science in the field

### Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum

Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?

### Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm.

Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling

### Anomaly detection. Problem motivation. Machine Learning

Anomaly detection Problem motivation Machine Learning Anomaly detection example Aircraft engine features: = heat generated = vibration intensity Dataset: New engine: (vibration) (heat) Density estimation

### Bayes Theorem & Diagnostic Tests Screening Tests

Bayes heorem & Screening ests Bayes heorem & Diagnostic ests Screening ests Some Questions If you test positive for HIV, what is the probability that you have HIV? If you have a positive mammogram, what

### Cost curves: An improved method for visualizing classifier performance

Mach Learn (2006) 65:95 130 DOI 10.1007/s10994-006-8199-5 Cost curves: An improved method for visualizing classifier performance Chris Drummond Robert C. Holte Received: 18 May 2005 / Revised: 23 November

### Problem Set #3 Answer Key

Problem Set #3 Answer Key Economics 305: Macroeconomic Theory Spring 2007 1 Chapter 4, Problem #2 a) To specify an indifference curve, we hold utility constant at ū. Next, rearrange in the form: C = ū

### E-discovery Taking Predictive Coding Out of the Black Box

E-discovery Taking Predictive Coding Out of the Black Box Joseph H. Looby Senior Managing Director FTI TECHNOLOGY IN CASES OF COMMERCIAL LITIGATION, the process of discovery can place a huge burden on

### ROC Analysis of Partitioning Method for Activity Recognition Using Two Microphones

ROC Analysis of Partitioning Method for Activity Recognition Using Two Microphones Jame A Ward, Paul Lukowicz and Gerhard Tröster Abstract. We present an analysis, using ROC curves, of a method for partitioning

### ROC Analysis of Partitioning Method for Activity Recognition Using Two Microphones

ROC Analysis of Partitioning Method for Activity Recognition Using Two Microphones JamieAWard, Paul Lukowicz and Gerhard Tröster Abstract. We present an analysis, using ROC curves, of a method for partitioning

### IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

http:// IDENTIFIC ATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student., CSE., Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,

### Vandalism Detection in Wikipedia: a Bag-of-Words Classifier Approach. ab422@cornell.edu November 11, 2009. Amit Belani

Vandalism Detection in Wikipedia: a Bag-of-Words Classifier Approach Amit Belani ab422@cornell.edu November 11, 2009 Abstract A bag-of-words based probabilistic classifier is trained using regularized

### GRADES 7, 8, AND 9 BIG IDEAS

Table 1: Strand A: BIG IDEAS: MATH: NUMBER Introduce perfect squares, square roots, and all applications Introduce rational numbers (positive and negative) Introduce the meaning of negative exponents for

### CS 5410 - Computer and Network Security: Intrusion Detection

CS 5410 - Computer and Network Security: Intrusion Detection Professor Kevin Butler Fall 2015 Locked Down You re using all the techniques we will talk about over the course of the semester: Strong access

### Model Building and Gains from Trade

Model Building and Gains from Trade Previously... Economics is the study of how people allocate their limited resources to satisfy their nearly unlimited wants. Scarcity refers to the limited nature of

### Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance

Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Chapter 21. More About Tests and Intervals. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 21 More About Tests and Intervals Copyright 2012, 2008, 2005 Pearson Education, Inc. Zero In on the Null Null hypotheses have special requirements. To perform a hypothesis test, the null must be

### Abstract. 1. Introduction. 1.1. Methodology

Fingerprint Recognition System Performance in the Maritime Environment Hourieh Fakourfar 1, Serge Belongie 2* 1 Department of Electrical and Computer Engineering, and 2 Department of Computer Science and

### Guido Sciavicco. 11 Novembre 2015

classical and new techniques Università degli Studi di Ferrara 11 Novembre 2015 in collaboration with dr. Enrico Marzano, CIO Gap srl Active Contact System Project 1/27 Contents What is? Embedded Wrapper

### Detecting Credit Card Fraud

Case Study Detecting Credit Card Fraud Analysis of Behaviometrics in an online Payment environment Introduction BehavioSec have been conducting tests on Behaviometrics stemming from card payments within

### Despite its emphasis on credit-scoring/rating model validation,

RETAIL RISK MANAGEMENT Empirical Validation of Retail Always a good idea, development of a systematic, enterprise-wide method to continuously validate credit-scoring/rating models nonetheless received

### Section 12.6: Directional Derivatives and the Gradient Vector

Section 26: Directional Derivatives and the Gradient Vector Recall that if f is a differentiable function of x and y and z = f(x, y), then the partial derivatives f x (x, y) and f y (x, y) give the rate

### Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification