An Analysis of Reliable Classifiers through ROC Isometrics


Stijn Vanderlooy (s.vanderlooy@cs.unimaas.nl), Ida G. Sprinkhuizen-Kuyper (kuyper@cs.unimaas.nl), Evgueni N. Smirnov (smirnov@cs.unimaas.nl)
MICC-IKAT, Universiteit Maastricht, PO Box 616, 6200 MD Maastricht, The Netherlands

Appearing in Proceedings of the ICML 2006 workshop on ROC Analysis in Machine Learning, Pittsburgh, USA, 2006. Copyright 2006 by the author(s)/owner(s).

Abstract

Reliable classifiers abstain from uncertain instance classifications. In this paper we extend our previous approach to construct reliable classifiers, which is based on isometrics in Receiver Operator Characteristic (ROC) space. We analyze the conditions to obtain a reliable classifier with higher performance than previously possible. Our results show that the approach is generally applicable to boost performance on each class simultaneously. Moreover, the approach is able to construct a classifier with at least a desired performance per class.

1. Introduction

Machine learning classifiers were applied to various classification problems. Nevertheless, only few classifiers were employed in domains with high misclassification costs, e.g., medical diagnosis and legal practice. In these domains it is desired to have classifiers that abstain from uncertain instance classifications such that a desired level of reliability is obtained. These classifiers are called reliable classifiers.

Recently, we proposed an easy-to-visualize approach to reliable instance classification (Vanderlooy et al., 2006). Classification performance is visualized as an ROC curve and a reliable classifier is constructed by skipping the part of the curve that represents instances difficult to classify. The transformation to the ROC curve of the reliable classifier was provided. An analysis showed when and where this new curve dominates the original one. If the underlying data of both curves have approximately equal class distributions, then dominance immediately results in performance increase.
However, in case of different class distributions and a performance measure that is class-distribution dependent, dominance of an ROC curve does not always guarantee an increase in performance. In this paper we analyze for which performance metrics the approach boosts performance on each class simultaneously. We restrict ourselves to widely used metrics characterized by rotating linear isometrics (Fürnkranz & Flach, 2005). Furthermore, skew sensitive metrics are used to generalize the approach to each possible scenario of error costs and class distributions.

This paper is organized as follows. Section 2 provides terminology and notation. Section 3 gives a brief background on ROC curves. Sections 4 and 5 introduce skew sensitive evaluation and isometrics, respectively. Section 6 shows how isometrics are used to design classifiers. Section 7 defines reliable classifiers and their visualization in ROC space. In Section 8 we provide our main contribution. Section 9 concludes the paper.

2. Terminology and Notation

We consider classification problems with two classes: positive (p) and negative (n). A discrete classifier is a mapping from instances to classes. Counts of true positives, false positives, true negatives and false negatives are denoted by TP, FP, TN, and FN, respectively. The number of positive instances is P = TP + FN. Similarly, N = TN + FP is the number of negative instances. From these counts the following statistics are derived:

tpr = TP / (TP + FN)
fpr = FP / (FP + TN)
tnr = TN / (TN + FP)
fnr = FN / (TP + FN)

True positive rate is denoted by tpr and true negative rate by tnr. False positive rate and false negative rate are denoted by fpr and fnr, respectively. Note that tnr = 1 − fpr and fnr = 1 − tpr. Most classifiers are rankers or scoring classifiers. They

output two positive values l(x|p) and l(x|n) that indicate the likelihood that an instance x is positive and negative, respectively. The score of an instance combines these values as follows:

l(x) = l(x|p) / l(x|n)   (1)

and can be used to rank instances from most likely positive to most likely negative (Lachiche & Flach, 2003).

3. ROC Curves

The performance of a discrete classifier can be represented by a point (fpr, tpr) in ROC space. Optimal performance is obtained in (0, 1). Points (0, 0) and (1, 1) represent classifiers that always predict the negative and positive class, respectively. The ascending diagonal connects these points and represents the strategy of random classification. A threshold on the score l(x) transforms a scoring classifier into a discrete one. Instances with a score higher than or equal to this threshold are classified as positive. The remaining instances are classified as negative. An ROC curve shows what happens with the corresponding confusion matrix for each possible threshold (Fawcett, 2003). The convex hull of the ROC curve (ROCCH) removes concavities.

Theorem 1 For any point (fpr, tpr) on an ROCCH a classifier can be constructed that has the performance represented by that point.

Provost and Fawcett (2001) prove this theorem. For simplicity of presentation, in the following we will assume that ROC curves are convex and all points can be obtained by a threshold.

4. Skew Sensitive Evaluation

The metrics tpr, fpr, tnr, and fnr evaluate performance on a single class. This follows from the confusion matrix since values are used from a single column. In most cases a metric is desired that indicates performance on both classes simultaneously. Unfortunately, such a metric assumes that the class distribution of the application domain is known and used in the test set. Accuracy is a well-known example. Provost et al. (1998) showed that classifier selection with this metric has two severe shortcomings with regard to class and error costs distributions.
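As a concrete illustration (function names and example data are ours, not from the paper), the rates of Section 2 and the ROC points obtained by sweeping a threshold over the scores l(x) can be sketched as follows:

```python
# Sketch: confusion-matrix rates (Section 2) and ROC points obtained by
# thresholding a scoring classifier. Names and example data are ours.

def rates(tp, fp, tn, fn):
    """Return (tpr, fpr, tnr, fnr); note tnr = 1 - fpr and fnr = 1 - tpr."""
    p, n = tp + fn, tn + fp          # P and N
    return tp / p, fp / n, tn / n, fn / p

def roc_points(scores, labels):
    """ROC points (fpr, tpr), one per distinct threshold on the score l(x).
    Instances with a score >= threshold are classified as positive."""
    pairs = sorted(zip(scores, labels), reverse=True)
    p = sum(labels)
    n = len(labels) - p
    points, tp, fp, prev = [], 0, 0, None
    for s, y in pairs:
        if s != prev:                # new threshold: record current point
            points.append((fp / n, tp / p))
            prev = s
        if y == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / n, tp / p))  # threshold below all scores: (1, 1)
    return points

pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
assert pts[0] == (0.0, 0.0) and pts[-1] == (1.0, 1.0)
```

The convex hull (ROCCH) of these points can then be taken with any standard 2D hull routine; the sketch stops at the raw curve.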
To overcome these problems, Flach (2003) considers class and error costs distributions as a parameter of performance metrics. Evaluation with these metrics is called skew sensitive evaluation. The parameter is called the skew ratio and expresses the relative importance of the negative versus the positive class:

c = (c(p, n) / c(n, p)) · (P(n) / P(p))   (2)

Here, c(p, n) and c(n, p) denote the costs of a false positive and a false negative, respectively [1]. The probabilities of a positive and a negative instance are denoted by P(p) = P / (P + N) and P(n) = N / (P + N), respectively. The class ratio is then P(n) / P(p) = N / P.

From Eq. 2 it is clear that we can cover all possible scenarios of class and cost distributions by a single value of c used as parameter in the performance metric. If c < 1 (c > 1), then the positive (negative) class is most important. In the following we assume without restriction that c is the ratio of negative to positive instances in the test set, i.e., c = N / P. The reader should keep in mind that our results are also valid for c = (c(p, n) / c(n, p)) · (N / P).

5. ROC Isometrics

Classifier performance is evaluated on both classes. We define a positive (negative) performance metric as a metric that measures performance on the positive (negative) classifications. The skew sensitive metrics used in this paper are summarized in Table 1. An explanation of these metrics follows.

ROC isometrics are collections of points in ROC space with the same value for a performance metric. Flach (2003) and Fürnkranz and Flach (2005) investigate isometrics to understand metrics. However, isometrics can also be used for the task of classifier selection and to construct reliable classifiers (see Section 6). Table 1 also shows the isometrics for the performance metrics. They are obtained by fixing the performance metric and rewriting its equation to that of a line in ROC space. Varying the value of the metric results in linear lines that rotate around a single point in which the metric is undefined.

5.1. Precision

Positive precision, prec_c^p, is defined as the proportion of true positives to the total number of positive classifications. The isometrics are linear lines that rotate around the origin (0, 0).

[1] Benefits of true positives and true negatives are incorporated by adding them to the corresponding errors. This operation normalizes the cost matrix such that the two values on the main diagonal are zero.
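For concreteness, the skew sensitive precisions and the precision isometric can be written out directly (a minimal sketch; the helper names are ours, the formulas are those of Section 5.1 and Table 1):

```python
# Sketch (function names ours): skew sensitive precisions and the
# prec_c^p-isometric, a line through the origin (0, 0).

def prec_pos(tpr, fpr, c):
    """Positive precision prec_c^p = tpr / (tpr + c * fpr)."""
    return tpr / (tpr + c * fpr)

def prec_neg(tpr, fpr, c):
    """Negative precision prec_c^n = tnr / (tnr + (1/c) * fnr)."""
    tnr, fnr = 1 - fpr, 1 - tpr
    return tnr / (tnr + fnr / c)

def prec_pos_isometric(fpr, value, c):
    """tpr on the prec_c^p-isometric for a fixed precision `value`."""
    return (value / (1 - value)) * c * fpr

# Every point on the isometric indeed has the fixed precision:
tpr = prec_pos_isometric(0.2, value=0.8, c=1.0)
assert abs(prec_pos(tpr, 0.2, 1.0) - 0.8) < 1e-9
```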

Table 1. Performance metrics and corresponding isometrics defined in terms of fpr, tpr, c = N/P, α ∈ R+, and m̂ = m/(P+N).

Pos. precision, prec_c^p = tpr / (tpr + c·fpr); isometric: tpr = (prec_c^p / (1 − prec_c^p)) · c·fpr

Neg. precision, prec_c^n = tnr / (tnr + (1/c)·fnr); isometric: tpr = ((1 − prec_c^n) / prec_c^n) · c·fpr + 1 − ((1 − prec_c^n) / prec_c^n) · c

Pos. F-measure, F_c,α^p = (1 + α²)·tpr / (α² + tpr + c·fpr); isometric: tpr = (F_c,α^p / (1 + α² − F_c,α^p)) · c·fpr + α²·F_c,α^p / (1 + α² − F_c,α^p)

Neg. F-measure, F_c,α^n = (1 + α²)·tnr / (α² + tnr + (1/c)·fnr); isometric: tpr = ((1 + α² − F_c,α^n) / F_c,α^n) · c·fpr + 1 + α²·c − ((1 + α² − F_c,α^n) / F_c,α^n) · c

Pos. gm-estimate, gm_c,m̂^p = (tpr + m̂) / (tpr + c·fpr + m̂·(1 + c)); isometric: tpr = (gm_c,m̂^p / (1 − gm_c,m̂^p)) · c·fpr + m̂·(gm_c,m̂^p·(1 + c) − 1) / (1 − gm_c,m̂^p)

Neg. gm-estimate, gm_c,m̂^n = (tnr + m̂) / (tnr + (1/c)·fnr + m̂·(1 + c)/c); isometric: tpr = ((1 − gm_c,m̂^n) / gm_c,m̂^n) · c·fpr + 1 − ((1 − gm_c,m̂^n) / gm_c,m̂^n) · c + m̂·((1 + c) − c / gm_c,m̂^n)

Figure 1. Precision isometrics in ROC space: solid lines are prec_1^p-isometrics and dashed lines are prec_1^n-isometrics.

Figure 2. F-measure isometrics in ROC space: solid lines are F_1,1^p-isometrics and dashed lines are F_1,1^n-isometrics.

The case of negative precision, prec_c^n, is similar. Corresponding isometrics rotate around point (1, 1). Figure 1 shows prec_c^p-isometrics and prec_c^n-isometrics for c = 1. In this and subsequent figures the value of the performance metric is varied from 0.1 to 0.9.

5.2. F-measure

Positive precision is maximized when all positive classifications are correct. To know if prec_c^p uses enough positive instances to be considered as reliable, it is combined with tpr. Note that prec_c^p and tpr are antagonistic, i.e., if prec_c^p goes up, then tpr usually goes down (and vice versa). Rijsbergen (1979) introduced the positive F-measure for the trade-off between these metrics:

F_c,α^p = (1 + α²) · prec_c^p · tpr / (α² · prec_c^p + tpr) = (1 + α²) · tpr / (α² + tpr + c·fpr)   (3)

where the parameter α indicates the importance given to prec_c^p relative to tpr. If α < 1 (α > 1) then tpr is less (more) important than prec_c^p. If α = 1, then both terms are equally important. The isometrics of F_c,α^p are linear lines rotating around (−α²/c, 0).
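The two forms of the positive F-measure in Eq. 3 can be checked numerically; a small sketch with our own helper names:

```python
# Sketch (helper names ours): both forms of the positive F-measure in Eq. 3.

def f_pos(tpr, fpr, c, alpha):
    """F_c,alpha^p = (1 + alpha^2) * tpr / (alpha^2 + tpr + c * fpr)."""
    return (1 + alpha**2) * tpr / (alpha**2 + tpr + c * fpr)

def f_pos_from_precision(tpr, fpr, c, alpha):
    """The same measure written as a trade-off between prec_c^p and tpr."""
    prec = tpr / (tpr + c * fpr)
    return (1 + alpha**2) * prec * tpr / (alpha**2 * prec + tpr)

# Both forms agree at any ROC point:
assert abs(f_pos(0.7, 0.2, 1.5, 1.0) - f_pos_from_precision(0.7, 0.2, 1.5, 1.0)) < 1e-12
```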
Therefore, they can be seen as a shifted version of the positive precision isometrics. The larger c and/or the smaller α, the smaller the difference with the prec_c^p-isometrics. Similar to F_c,α^p, the negative F-measure is a metric for the trade-off between prec_c^n and tnr. Its isometrics are a shifted version of the prec_c^n-isometrics and rotate around (1, 1 + α²·c). Figure 2 shows F_c,α^p-isometrics and F_c,α^n-isometrics for c = 1 and α = 1 in the relevant region (0, 1) × (0, 1) of ROC space.

5.3. Generalized m-estimate

The m-estimate computes a precision estimate assuming that m instances are a priori classified. One of the

main reasons why it is favored over precision is that it is less sensitive to noise and more effective in avoiding overfitting (Fürnkranz & Flach, 2005; Lavrac & Dzeroski, 1994, Chapters 8-10). This is especially true if the metric is used for the minority class when the class distribution is very skewed. The positive m-estimate assumes that m instances are a priori classified as positive. These instances are distributed according to the class distribution in the training set:

gm_c,m^p = (TP + m·P/(P + N)) / (TP + FP + m)   (4)

or equivalently:

gm_c,m^p = (tpr + m/(P + N)) / (tpr + c·fpr + m/P)   (5)

Figure 3. Generalized m-estimate isometrics in ROC space: solid lines are gm_1,0.1^p-isometrics and dashed lines are gm_1,0.1^n-isometrics.

To eliminate the absolute numbers P and N we define m̂ = m/(P + N) and obtain the formula in Table 1. Fürnkranz and Flach (2005) call this metric the positive gm-estimate (generalized m-estimate) since m̂ defines the rotation point of the isometrics (see below) [2]. The isometrics of the gm_c,m̂^p-estimate rotate around (−m̂, −m̂). If m̂ = 0, then we obtain prec_c^p-isometrics. For m̂ → ∞ the performance metric converges to 1/(1 + c) = P(p) and the corresponding isometric is the ascending diagonal. The case of the negative gm-estimate is similar. The rotation point of its isometrics is (1 + m̂, 1 + m̂). Figure 3 shows gm_1,0.1^p-isometrics and gm_1,0.1^n-isometrics for c = 1 and m̂ = 0.1. For simplicity of presentation, in the following the isometric of a positive (negative) performance metric is simply called a positive (negative) isometric.

6. Classifier Design through Isometrics

In Vanderlooy et al. (2006) we used precision isometrics as a tool to design classifiers. We generalize this approach to include all isometrics defined in Section 5. For a specific skew ratio, a positive isometric is built with a desired positive performance. By definition, the intersection point (fpr_a, tpr_a) with an ROCCH represents a classifier with this performance.
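Such an intersection point can be computed segment by segment along the hull; a small sketch under our own naming, with the isometric given in slope-intercept form tpr = slope·fpr + intercept (the hull and metric values below are hypothetical):

```python
# Sketch (ours): first intersection of an isometric line with an ROCCH,
# the hull given as a list of ROC points from (0, 0) to (1, 1).

def intersect_rocch(hull, slope, intercept):
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x1 == x0:                           # vertical hull segment
            y = slope * x0 + intercept
            if min(y0, y1) <= y <= max(y0, y1):
                return (x0, y)
            continue
        seg_slope = (y1 - y0) / (x1 - x0)
        if seg_slope == slope:                 # parallel: no single crossing
            continue
        x = (y0 - seg_slope * x0 - intercept) / (slope - seg_slope)
        if x0 <= x <= x1:
            return (x, slope * x + intercept)
    return None                                # no intersection (cf. Case 3)

# Example: a prec_1^n-isometric with prec_1^n = 0.8, i.e. tpr = 0.25*fpr + 0.75,
# against a toy hull.
hull = [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]
point = intersect_rocch(hull, 0.25, 0.75)
```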
Similarly, the intersection point (fpr_b, tpr_b) of a negative isometric and the ROCCH represents a classifier with the negative performance defined by that isometric. If we assume that the positive and negative isometrics intersect each other in the relevant region of ROC space, then three cases can be distinguished to construct the desired classifier (see Figure 4).

[2] The gm-estimate of Fürnkranz and Flach (2005) is more general than ours since they also vary the a priori class distribution in Eq. 5.

Figure 4. Location of intersection between a positive and a negative isometric: (a) Case 1, (b) Case 2, and (c) Case 3.

Case 1: the isometrics intersect on the ROCCH. The discrete classifier corresponding to this point has the performance defined by both isometrics. Theorem 1 guarantees that we can construct it. Therefore, the isometrics provide an approach to construct a classifier with a desired performance per class.

Case 2: the isometrics intersect below the ROCCH. This classifier can also be constructed. However, the classifiers corresponding to any point on the ROCCH between (fpr_b, tpr_b) and (fpr_a, tpr_a) have better performance.

Case 3: the isometrics intersect above the ROCCH. There is no classifier with the desired performances. To increase performance, instances between (fpr_a, tpr_a) and (fpr_b, tpr_b) are not classified. In case

of more than one intersection point for the positive (negative) isometric and the ROCCH, the intersection point with highest tpr (lowest fpr) is chosen such that fpr_a < fpr_b. Then, the number of unclassified instances is minimized. The resulting classifier is called a reliable classifier.

7. Reliable Instance Classification

A scoring classifier is almost never optimal: there exist negative instances with a higher score than some positive instances. A reliable classifier abstains from these uncertain instance classifications. It simulates the behavior of a human expert in fields with high error costs. For example, in medical diagnosis an expert does not state a possibly incorrect diagnosis, but she says "I do not know" and performs more tests.

Similar to Ferri and Hernández-Orallo (2004), we define a reliable classifier as a filtering mechanism with two thresholds a > b. An instance x is classified as positive if l(x) ≥ a. If l(x) ≤ b, then x is classified as negative. Otherwise, the instance is left unclassified. Unclassified instances can be rejected, passed to a human, or to another classifier (Ferri et al., 2004). Pietraszek (2005) chooses a and b to minimize expected cost, also considering the abstention costs. Here, we focus on performance on the classified instances.

Counts of unclassified positives and unclassified negatives are denoted by UP and UN, respectively. Unclassified positive rate and unclassified negative rate are then defined as follows:

upr = UP / (TP + FN + UP)   (6)
unr = UN / (FP + TN + UN)   (7)

We define thresholds a and b to correspond with points (fpr_a, tpr_a) and (fpr_b, tpr_b), respectively. The ROC curve of the reliable classifier is obtained by skipping the part between (fpr_a, tpr_a) and (fpr_b, tpr_b). By definition we have:

upr = tpr_b − tpr_a   (8)
unr = fpr_b − fpr_a   (9)

The transformation from the original ROC curve to that of the reliable classifier is given in Theorem 2.
Theorem 2 If the part between points (fpr_a, tpr_a) and (fpr_b, tpr_b) of an ROC curve is skipped with 0 < upr < 1 and 0 < unr < 1, then points (fpr_x, tpr_x) on this curve between (0, 0) and (fpr_a, tpr_a) are transformed into points (fpr'_x, tpr'_x) such that:

fpr'_x = fpr_x / (1 − unr),   tpr'_x = tpr_x / (1 − upr)   (10)

Points (fpr_x, tpr_x) between (fpr_b, tpr_b) and (1, 1) are transformed into points (fpr'_x, tpr'_x) such that:

fpr'_x = 1 − (1 − fpr_x) / (1 − unr),   tpr'_x = 1 − (1 − tpr_x) / (1 − upr)   (11)

The proof is in Vanderlooy et al. (2006). Note that the transformations of (fpr_a, tpr_a) and (fpr_b, tpr_b) are the same point on the new ROC curve.

Figure 5. ROCCH2 is obtained by not covering the part between (fpr_a, tpr_a) and (fpr_b, tpr_b) of ROCCH1. The length of the horizontal (vertical) line below ROCCH1 equals unr (upr).

Figure 5 shows an example of a transformation. The intersection points are obtained with precision isometrics for c = 1, prec_c^p = 0.93, and prec_c^n = 0.87.

Theorem 3 If the original ROC curve is convex, then the ROC curve obtained by not considering the points between (fpr_a, tpr_a) and (fpr_b, tpr_b) is also convex.

We proved this theorem in Vanderlooy et al. (2006). There, we also analyzed when and where the original ROCCH is dominated by that of the reliable classifier. Note that the underlying data of both ROCCHs can have a different class distribution when upr ≠ unr. For skew insensitive metrics or when upr = unr, dominance of an ROCCH will immediately result in performance increase. In the next Section 8 we analyze when the skew sensitive performance metrics in Table 1 can be boosted by abstention.

8. Effect on Performance

We defined (fpr_a, tpr_a) and (fpr_b, tpr_b) as intersection points of an ROCCH and a positive and negative isometric, respectively. The type of isometric defines the effect on the performance of the reliable classifier corresponding to (fpr'_a, tpr'_a) as defined in Theorem 2.
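The point transformation of Theorem 2 is straightforward to implement; a small sketch (naming and example points are ours):

```python
# Sketch (ours) of Theorem 2: mapping points of the original ROC curve onto
# the curve of the reliable classifier obtained by skipping the part
# between a = (fpr_a, tpr_a) and b = (fpr_b, tpr_b).

def transform(point, a, b):
    upr = b[1] - a[1]                  # Eq. 8
    unr = b[0] - a[0]                  # Eq. 9
    fpr, tpr = point
    if fpr <= a[0]:                    # between (0, 0) and a: Eq. 10
        return fpr / (1 - unr), tpr / (1 - upr)
    # between b and (1, 1): Eq. 11
    return 1 - (1 - fpr) / (1 - unr), 1 - (1 - tpr) / (1 - upr)

# As noted above, a and b map onto the same point of the new curve.
a, b = (0.1, 0.6), (0.3, 0.8)
pa, pb = transform(a, a, b), transform(b, a, b)
assert abs(pa[0] - pb[0]) < 1e-9 and abs(pa[1] - pb[1]) < 1e-9
```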

8.1. Precision

Theorem 4 provides an easy and computationally efficient approach to construct a classifier with a desired precision per class.

Theorem 4 If points (fpr_a, tpr_a) and (fpr_b, tpr_b) are defined by a prec_c^p-isometric and a prec_c^n-isometric respectively, then the point (fpr'_a, tpr'_a) has the precisions of both isometrics.

The proof of this theorem and also of the following theorems are included in the appendix. Since isometrics of skew sensitive performance metrics are used, the approach does not commit to costs and class distributions [3]. Thus, when the application domain changes, a new reliable classifier can be constructed from the original ROC curve only.

Theorem 4 together with the next Theorem 5 provides an approach to construct a classifier with desired accuracy. This approach overcomes the problems with accuracy explained in Section 4. From the proof it follows that if the precisions are not equal, then the accuracy is bounded by the smallest and largest precision.

Theorem 5 If point (fpr'_a, tpr'_a) has prec_c^p = prec_c^n, then the accuracy in this point equals the precisions.

8.2. F-measure

Theorem 6 shows that also the F-measure can be boosted on both classes if a part of an ROC curve is not covered. In this case, the resulting classifier has higher performance than defined by both isometrics. Figure 6 gives an example where positive (negative) performance is increased by approximately 5% (10%).

Theorem 6 If points (fpr_a, tpr_a) and (fpr_b, tpr_b) are defined by an F_c,α^p-isometric and an F_c,α^n-isometric respectively, then the point (fpr'_a, tpr'_a) has higher performance than defined by both isometrics.

8.3. Generalized m-estimate

To analyze the effect of abstention on the gm-estimate, we can consider the number of a priori classified instances m to be fixed or the parameter m̂ to be fixed. Consider the case when m is not changed after transformation. In this case upr and unr can change the distribution of a priori instances over the classes.
If upr < unr, then the distribution of these instances in the positive gm-estimate moves to the true positives, resulting in higher performance. For the negative gm-estimate, the distribution moves to the false negatives, resulting in lower performance. The case of upr > unr is the other way around. Therefore, an increase in performance on both classes is only possible iff upr = unr.

[3] Remember that, although our proofs use the simplest case c = N/P, the results are also valid for c = (c(p, n) / c(n, p)) · (N / P).

Figure 6. Designing with F-measure isometrics: F_2,1^p = 0.72 in (fpr_a, tpr_a) and F_2,1^n = 0.75 in (fpr_b, tpr_b). The reliable classifier represented by (fpr'_a, tpr'_a) has F_1.84,1^p = 0.7693 and F_1.84,1^n = 0.8597. The abstention is represented by upr = 0.1541 and unr = 0.2116.

For the case when m̂ is not changed after transformation, a similar reasoning results in improvement of the positive gm-estimate if upr ≤ unr and tpr_a ≥ fpr_a. The latter condition holds for all points on the ROCCH. Similarly, improvement in the negative gm-estimate occurs if upr ≥ unr and tpr_b ≥ fpr_b. Thus, we find the following theorems for the gm-estimate.

Theorem 7 If point (fpr_a, tpr_a) is defined by a gm_c,m̂^p-estimate isometric with m > 0 and if upr ≤ unr, then the point (fpr'_a, tpr'_a) has at least the positive performance defined by that isometric.

Theorem 8 If point (fpr_b, tpr_b) is defined by a gm_c,m̂^n-estimate isometric with m > 0 and if upr ≥ unr, then the point (fpr'_a, tpr'_a) has at least the negative performance defined by that isometric.

Corollary 1 If points (fpr_a, tpr_a) and (fpr_b, tpr_b) are defined by a gm_c,m̂^p-estimate isometric and a gm_c,m̂^n-estimate isometric respectively with m > 0 and if upr = unr, then the point (fpr'_a, tpr'_a) has at least the gm-estimates of both isometrics.

We suggest to use the gm-estimate for the minority class only and to use a normal precision for the majority class. From Theorems 7 and 8, if the minority

class is the positive (negative) class, then we need an abstention characterized by upr ≤ unr (upr ≥ unr). Figure 7 shows an example with fixed m and the negative class as minority class. Therefore, we want the gm_c,m̂^n-estimate isometric to cover a large part of ROC space, and consequently the condition upr ≥ unr is easily satisfied.

Figure 7. Designing with precision and gm-estimate isometrics: prec_0.3^p = 0.97 in (fpr_a, tpr_a) and gm_0.3,0.1^n = 0.55 in (fpr_b, tpr_b). The reliable classifier represented by (fpr'_a, tpr'_a) has prec_0.3^p = 0.97 and gm_0.34,0.18^n = 0.5584. The abstention is represented by upr = 0.4549 and unr = 0.3763.

9. Conclusions

A reliable classifier abstains from uncertain instance classifications. Benefits are significant in application domains with high error costs, e.g., medical diagnosis and legal practice. A classifier is transformed into a reliable one by not covering a part of its ROC curve. This part is defined by two isometrics indicating performance on a different class. In case of a classifier and corresponding reliable classifier, dominance of an ROC curve immediately represents an increase in performance if the underlying data of both curves have approximately equal class distributions. Since this assumption is too strong, we analyzed when performance can be boosted by abstention. We showed how to construct a (reliable) classifier with a desired precision per class. We did the same for accuracy. For the F-measure a classifier is obtained with at least the desired performance per class. To prevent a possible performance decrease with the gm-estimate, we propose to use it for the minority class and to use a normal precision for the majority class.

We may conclude that the proposed approach is able to boost performance on each class simultaneously. Benefits of the approach are numerous: it guarantees a classifier with an acceptable performance in domains with high error costs, it is efficient in terms of time and space, it is classifier independent, and it incorporates changing error costs and class distributions easily.

Acknowledgments

We thank the reviewers for useful comments and suggestions. This work was partially supported by the Dutch Organization for Scientific Research (NWO), grant nr: 634.000.435.

References

Fawcett, T. (2003). ROC graphs: Notes and practical considerations for researchers (Technical Report HPL-2003-4). HP Laboratories.

Ferri, C., Flach, P., & Hernández-Orallo, J. (2004). Delegating classifiers. Proceedings of the 1st International Workshop on ROC Analysis in Artificial Intelligence (ROCAI-2004) (pp. 37-44).

Ferri, C., & Hernández-Orallo, J. (2004). Cautious classifiers. Proceedings of the 1st International Workshop on ROC Analysis in Artificial Intelligence (ROCAI-2004) (pp. 27-36).

Flach, P. (2003). The geometry of ROC space: Understanding machine learning metrics through ROC isometrics. Proceedings of the 20th International Conference on Machine Learning (ICML-2003) (pp. 194-201).

Fürnkranz, J., & Flach, P. (2005). ROC 'n' rule learning: Towards a better understanding of covering algorithms. Machine Learning, 58, 39-77.

Lachiche, N., & Flach, P. (2003). Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves. Proceedings of the 20th International Conference on Machine Learning (ICML-2003) (pp. 416-423).

Lavrac, N., & Dzeroski, S. (1994). Inductive logic programming: Techniques and applications. Ellis Horwood, New York.

Pietraszek, T. (2005). Optimizing abstaining classifiers using ROC analysis. Proceedings of the 22nd International Conference on Machine Learning (ICML-2005) (pp. 665-672).

Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42, 203-231.

Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning (ICML-1998) (pp. 43-48).

Rijsbergen, C. V. (1979). Information retrieval. Department of Computer Science, University of Glasgow. 2nd edition.

Vanderlooy, S., Sprinkhuizen-Kuyper, I., & Smirnov, E. (2006). Reliable classifiers in ROC space. Proceedings of the 15th Annual Machine Learning Conference of Belgium and the Netherlands (BENELEARN-2006) (pp. 113-120).

A. Proofs

Proof of Theorem 4 The positive precisions in (fpr_a, tpr_a) and (fpr'_a, tpr'_a) are defined as follows:

prec_c^p(fpr_a, tpr_a) = tpr_a / (tpr_a + c·fpr_a)   (12)

prec_{c'}^p(fpr'_a, tpr'_a) = tpr'_a / (tpr'_a + c'·fpr'_a)   (13)

with c' = c·(1 − unr) / (1 − upr). Substitution of Eq. 10 in Eq. 13 results in Eq. 12. In a similar way, Eq. 11 is used to show that the negative precisions in (fpr_b, tpr_b) and (fpr'_b, tpr'_b) are the same. The theorem follows since (fpr'_b, tpr'_b) = (fpr'_a, tpr'_a).

Proof of Theorem 5 Since the positive precision and the negative precision in (fpr'_a, tpr'_a) are equal, we can write:

tpr'_a = a·(tpr'_a + c'·fpr'_a)   (14)

tnr'_a = a·(tnr'_a + (1/c')·fnr'_a)   (15)

with a = prec_{c'}^p = prec_{c'}^n. It follows that:

tpr'_a + c'·tnr'_a = a·(tpr'_a + c'·fpr'_a + c'·tnr'_a + fnr'_a)   (16)

or equivalently:

a = (tpr'_a + c'·tnr'_a) / (tpr'_a + c'·fpr'_a + c'·tnr'_a + fnr'_a)   (17)

and this is the accuracy with skew ratio c'.

Proof of Theorem 6 The positive F-measures in (fpr_a, tpr_a) and (fpr'_a, tpr'_a) are defined as follows:

F_c,α^p(fpr_a, tpr_a) = (1 + α²)·tpr_a / (α² + tpr_a + c·fpr_a)   (18)

F_{c',α}^p(fpr'_a, tpr'_a) = (1 + α²)·tpr'_a / (α² + tpr'_a + c'·fpr'_a)   (19)

Using Eq. 10 and c' = c·(1 − unr) / (1 − upr), the right-hand side of Eq. 19 becomes:

(1 + α²)·tpr_a / (α²·(1 − upr) + tpr_a + c·fpr_a)   (20)

It follows that F_{c',α}^p(fpr'_a, tpr'_a) > F_c,α^p(fpr_a, tpr_a) since 0 < upr < 1. The case of the negative F-measure is similar.
Proof of Theorem 7 The positive gm-estimates in (fpr_a, tpr_a) and (fpr'_a, tpr'_a) are defined as follows:

gm_c,m̂^p(fpr_a, tpr_a) = (tpr_a + m̂) / (tpr_a + c·fpr_a + m̂·(1 + c))   (21)

gm_{c',m̂'}^p(fpr'_a, tpr'_a) = (tpr'_a + m̂') / (tpr'_a + c'·fpr'_a + m̂'·(1 + c'))   (22)

with m̂' computed on the classified instances, and c' = c·(1 − unr) / (1 − upr).

Case 1: m is not changed after transformation. In this case we can write m̂' = m / (P·(1 − upr) + N·(1 − unr)). Substitution of Eq. 10 in Eq. 22 results in the following right-hand side:

(tpr_a + m·(1 − upr) / (P·(1 − upr) + N·(1 − unr))) / (tpr_a + c·fpr_a + m̂·(1 + c))   (23)

Clearly, gm_{c',m̂'}^p(fpr'_a, tpr'_a) ≥ gm_c,m̂^p(fpr_a, tpr_a) iff:

(1 − upr) / (P·(1 − upr) + N·(1 − unr)) ≥ 1 / (P + N)   (24)

This holds iff upr ≤ unr.

Case 2: m̂ is not changed after transformation. Substitution of Eq. 10 in Eq. 22 with fixed m̂ results in the following right-hand side:

(tpr_a + m̂·(1 − upr)) / (tpr_a + c·fpr_a + m̂·(1 − upr + c·(1 − unr)))   (25)

Straightforward computation results in gm_{c',m̂}^p(fpr'_a, tpr'_a) ≥ gm_c,m̂^p(fpr_a, tpr_a) iff:

m̂·(unr − upr) + (tpr_a·unr − fpr_a·upr) ≥ 0   (26)

This holds if upr ≤ unr and tpr_a ≥ fpr_a.

Proof of Theorem 8 The proof is similar to that of Theorem 7.
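The key step in the proof of Theorem 4 can also be checked numerically: substituting Eq. 10 into Eq. 13 with the transformed skew ratio c' recovers Eq. 12, so positive precision is preserved at the transformed point. A short illustration with values of our own choosing:

```python
# Numerical illustration (values ours) of the proof of Theorem 4.

c, upr, unr = 2.0, 0.15, 0.25
fpr_a, tpr_a = 0.10, 0.60

c_prime = c * (1 - unr) / (1 - upr)            # skew ratio after abstention
fpr_t = fpr_a / (1 - unr)                      # Eq. 10
tpr_t = tpr_a / (1 - upr)

prec_before = tpr_a / (tpr_a + c * fpr_a)      # Eq. 12
prec_after = tpr_t / (tpr_t + c_prime * fpr_t) # Eq. 13
assert abs(prec_before - prec_after) < 1e-12   # positive precision preserved
```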