
Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009, 30 June to 3 July 2009.

A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms

Claudia Regina Milaré 1, Gustavo E.A.P.A. Batista 1 and André C.P.L.F. Carvalho 1
1 Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo
emails: cmilare@gmail.com, gbatista@icmc.usp.br, andre@icmc.usp.br

Abstract

There is increasing interest in the application of Evolutionary Algorithms to induce classification rules. This hybrid approach can aid in areas where classical rule induction methods have not been completely successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when some classes heavily outnumber other classes. Frequently, classical Machine Learning classifiers are not able to learn in the presence of imbalanced data sets, outputting classifiers that always predict the most numerous classes. In this work we propose a novel hybrid approach to deal with this problem. We create several balanced data sets, each with all minority class cases and a random sample of majority class cases. These balanced data sets are given to classical Machine Learning systems, which output rule sets. The rule sets are combined into a pool of rules, and an evolutionary algorithm is used to build a classifier from this pool. This hybrid approach has an advantage over under-sampling, since it reduces the amount of discarded information, and an advantage over over-sampling, since it avoids overfitting. The approach was analyzed experimentally, and our experiments show that it can improve classification performance as measured by the area under the ROC curve.

Key words: Evolutionary Algorithms, Rule Subset Selection, Treatment of Imbalanced Classes.

1 Introduction

There is increasing interest in the application of evolutionary algorithms to induce classification rules. This hybrid approach can aid in areas where classical rule induction methods have not been completely successful. One example is the induction of classification rules in class-imbalanced domains. Imbalanced data occur when some classes heavily outnumber other classes. Frequently, classical Machine Learning inducers are not able to learn in the presence of imbalanced data sets. These inducers usually output classifiers that always predict the most numerous classes. However, in several domains the minority classes are the classes of interest, the ones associated with the highest costs. For instance, in the detection of fraudulent credit card transactions and phone calls, in the diagnosis of rare diseases, and in the prediction of weather events such as hoar-frosts, misclassifying a minority class example is more expensive than misclassifying a majority class example, and a classifier that simply outputs the majority class is completely useless.

In this work we propose a novel hybrid approach to deal with this problem. We believe the class imbalance problem is also a search problem; therefore, an evolutionary algorithm is used to better search the hypothesis space. Our approach creates several balanced data sets, each with all minority class cases and a random sample of majority class cases. These balanced data sets are given to classical Machine Learning systems, which output rule sets. The rule sets are combined into a pool of rules, and an evolutionary algorithm is used to select some of the rules from the pool in order to build a classifier.

Our method uses under-sampling to create the balanced data sets. Under-sampling eliminates majority class cases in order to create balanced data sets. A major drawback of under-sampling is that it can discard potentially useful data that could be important to the induction process.
Our method overcomes this limitation by creating several data samples, thereby increasing the probability that all the data is used for learning. Another method frequently used to treat imbalanced data sets is over-sampling, which artificially increases the number of minority class cases. A major drawback of this method is that it can increase the likelihood of overfitting, since it makes exact copies of the minority class examples. Our method avoids overfitting by limiting the number of rules in each classifier. More precisely, the classifiers created by our method have the same number of rules as the individual classifiers induced from the balanced data sets.

In this work we deal, without loss of generality, with two-class problems in which one of the classes significantly outnumbers the other. Concept-learning problems allow us to use the area under the ROC curve (AUC) [11] as the main metric to assess our results. The use of AUC is highly recommended in experiments with imbalanced classes, since metrics that are dependent on costs

and class distribution, such as accuracy and error rate, are misleading in imbalanced domains.

Our approach was experimentally analyzed in three application domains. Our experiments with C4.5Rules and Ripper, two well-known rule set induction algorithms, show that the proposed approach can significantly improve classification performance as measured by the area under the ROC curve.

This paper is organized as follows: Section 2 presents related work; Section 3 describes the proposed approach; Section 4 empirically evaluates the proposed approach on some application domains; and, finally, Section 5 concludes this paper and presents directions for future work.

2 Related Work

This section reviews related work on class imbalance and on methods to combine classifiers.

2.1 Class Imbalance

The predictive accuracy of several Machine Learning algorithms is strongly affected by the distribution of examples among the classes. Since most learning systems are designed under the assumption of well-balanced data sets, they fail to induce a classifier able to predict the minority class. These algorithms tend to assign new examples to the most frequent class. The investigation of alternatives to deal efficiently with the class imbalance problem is a very important research issue, since imbalanced data sets can be found in several domains. For example, in fraud detection in telephone calls [12] and credit card transactions [29], the number of legitimate transactions is usually much higher than the number of fraudulent transactions; in the analysis of insurance risk [25], only a few clients file insurance claims in a given time window; and in direct marketing [20], the number of respondents is usually very small (about 1%) for the majority of marketing campaigns. Several papers have analyzed the problem of learning from imbalanced data sets (for instance [24, 20, 19, 12, 18, 17, 4]).
Among the strategies followed by these works, three main approaches have been frequently used:

- Assigning different costs to misclassifications: higher costs are assigned to the minority classes;
- Random under-sampling: artificially balancing the training data by eliminating examples from the majority class;

- Random over-sampling: artificially balancing the training data by replicating examples from the minority class.

Our approach uses under-sampling as an intermediate step to create several balanced data sets. Other methods use a similar strategy to deal with class imbalance. For instance, [7] splits the examples from the majority class into several non-overlapping subsets with approximately the same number of examples as the minority class. Each subset is combined with all minority class examples, forming balanced data sets that are given to a learning algorithm. The classifiers obtained are integrated using stacking [30]. A similar approach is proposed in [21], in which AdaBoost [14] integrates the outputs of several classifiers induced from balanced data sets obtained with under-sampling. Our approach differs from previously published work in that we are interested in creating symbolic classifiers, i.e., classifiers that can be easily interpreted by humans. Although an ensemble can be built from several individual symbolic classifiers, the final classifier cannot be considered symbolic, since it cannot be easily interpreted.

2.2 Combining Classifiers

There are several papers in the literature describing different alternatives for combining knowledge bases. A straightforward approach is to use ensembles [23, 10]. An ensemble is composed of a set of individual classifiers whose predictions are combined to determine the label of a new instance. Frequently, an ensemble is more precise than its individual classifiers. Despite the gain in performance generally obtained by ensembles, the combination of symbolic classifiers usually results in a non-symbolic final classifier.

Two well-known ensemble methods are Bagging and Boosting. Bagging [6] is one of the oldest and simplest techniques for creating an ensemble of classifiers.
This technique uses majority vote to combine the predictions of the individual classifiers, assigning the most frequently predicted class as the final classification. Differently from Bagging, in Boosting [28] each training example is associated with a weight. This weight is related to the accuracy of the induced hypothesis on that particular example. One hypothesis is induced per iteration, and the weights associated with the examples may be modified at each iteration.

A second approach to combining knowledge bases is to integrate the knowledge generated by different classifiers into a single knowledge base and then use a rule selection method to create a classifier. In [26], the authors proposed an algorithm named ROCCER that selects rules based on their performance in the ROC space. Another technique that uses a similar approach is the

GARSS algorithm [3]. GARSS uses an Evolutionary Algorithm to select rules that maximize the AUC measure. Both ROCCER and GARSS were used in the context of associative classification. Other works using Evolutionary Algorithms to select rules from a large knowledge base can be found in [15, 5].

The general methodology of inducing several classifiers with sampling and integrating their knowledge into a final classifier was initially proposed by Fayyad, Djorgovski, and Weir [13]. This methodology was implemented in the RULLER system and used in the SKICAT project, whose objective was to catalog and analyze objects in digitized sky images. According to the authors, this methodology was able to generate a robust set of rules. Moreover, the induced classifier was more precise than astronomers in the classification of cosmic objects from photographs. The methodology adopted in the RULLER system was extended in the XRULLER system (extended RULLER) [2] to use knowledge induced by different algorithms.

3 Proposed Approach

The approach proposed in this work uses an Evolutionary Algorithm to select rules so as to maximize the AUC metric on problems with imbalanced classes. Evolutionary Algorithms are search algorithms based on natural selection and genetics [16]. They evolve sets of potential solutions (the population), usually encoded as sequences of bits (chromosomes). The evolution is performed by applying a set of transformations (usually the genetic operators crossover and mutation) and by evaluating the quality (fitness) of the solutions.

As previously mentioned, our approach is based on under-sampling. However, in order to reduce the probability of information loss due to discarded data, our approach creates several under-sampled training sets.
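This sample-creation step can be sketched as follows; this is a minimal illustration with hypothetical helper names, not the authors' implementation:

```python
import random

def make_balanced_samples(positives, negatives, n=100, seed=0):
    """Build n balanced training sets: each joins all minority (positive)
    examples with an equally sized random sample of majority (negative)
    examples, drawn without replacement."""
    rng = random.Random(seed)
    return [positives + rng.sample(negatives, k=len(positives))
            for _ in range(n)]

# Toy data: 5 minority vs. 95 majority examples.
pos = [("+", i) for i in range(5)]
neg = [("-", i) for i in range(95)]
balanced_sets = make_balanced_samples(pos, neg, n=100)
```

Each balanced set would then be handed to a rule inducer such as C4.5Rules or Ripper, and the resulting rules merged into a single pool with duplicates removed, as described next.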
Given a training set T = T+ ∪ T−, where T+ is the set of positive (minority) examples and T− is the set of negative (majority) examples, n random samples T−_1, ..., T−_n are drawn from T−. Each random sample has the same number of examples as the positive set, i.e., |T−_i| = |T+|, for 1 ≤ i ≤ n. In total, n balanced training sets T_i are created by joining T+ with each sample, i.e., T_i = T+ ∪ T−_i, 1 ≤ i ≤ n. The parameter n was set to 100 in our experiments.

Rule sets are induced from each training set T_i, the rules from all rule sets are integrated into a single pool of rules, and repeated rules are discarded. In our experiments, the Machine Learning algorithms C4.5Rules [27] and Ripper [9] were used to induce the rule sets.

The pool of rules is given as input to an Evolutionary Algorithm. A primary key, represented as a natural number, is associated with each rule in the pool, so each rule can be accessed independently by its key. Our method uses

the Pittsburgh approach to encode classifiers as chromosomes. Thus, an array of keys represents a chromosome, i.e., a rule set to be interpreted as a classifier.

In our implementation, the initial population is randomly composed of 40 chromosomes. The evaluation function is the AUC metric measured over the training examples. The selection method is fitness-proportionate selection, in which the number of times a chromosome is expected to reproduce is proportional to its fitness. A simple crossover operator was applied with probability 0.4. The mutation operator alters the value of randomly selected elements of a chromosome, exchanging a randomly selected rule for another rule; it was applied with probability 0.1. The mutation and crossover rates were chosen based on our previous experience with Evolutionary Algorithms [22]. The number of generations is limited to 20. Finally, our implementation uses an elitism operator for population replacement: the best chromosome of each population is preserved into the next generation.

As previously mentioned, each chromosome represents a classifier. We fixed the number of rules in each chromosome in order to avoid overfitting. As the fitness function is the AUC measured over the training set, allowing the chromosomes to grow freely would cause the Evolutionary Algorithm to create chromosomes with many rules that do not generalize well. In our experiments, we set the number of rules per chromosome separately for each data set. This number is approximately the mean number of rules of the classifiers induced over each T_i, and varies from 6 up to 20, depending on the data set. Therefore, the classifiers induced by our approach are as complex, in terms of number of rules, as the individual classifiers.

4 Experimental Evaluation

Experiments were carried out in order to evaluate the proposed approach.
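The evolutionary selection procedure described in Section 3 can be sketched as follows. This is a simplified, hypothetical implementation (Pittsburgh encoding, fitness-proportionate selection, one-point crossover, elitism); the fitness shown is a stand-in for the training-set AUC, and all helper names are assumptions, not the authors' code:

```python
import random

def evolve_rule_subset(pool_keys, fitness, size, pop_size=40, gens=20,
                       p_cross=0.4, p_mut=0.1, seed=0):
    """Pittsburgh-style GA: each chromosome is a fixed-length array of rule
    keys; selection is fitness-proportionate; the elite survives intact."""
    rng = random.Random(seed)
    popn = [[rng.choice(pool_keys) for _ in range(size)]
            for _ in range(pop_size)]
    for _ in range(gens):
        scores = [fitness(c) for c in popn]
        elite = popn[scores.index(max(scores))][:]
        total = sum(scores) or 1.0

        def pick():  # roulette-wheel (fitness-proportionate) selection
            r, acc = rng.uniform(0, total), 0.0
            for c, s in zip(popn, scores):
                acc += s
                if acc >= r:
                    return c[:]
            return popn[-1][:]

        nxt = [elite]
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            if rng.random() < p_cross:  # simple one-point crossover
                cut = rng.randrange(1, size)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for c in (a, b):
                if rng.random() < p_mut:  # mutation: swap in another rule
                    c[rng.randrange(size)] = rng.choice(pool_keys)
            nxt += [a, b]
        popn = nxt[:pop_size]
    scores = [fitness(c) for c in popn]
    return popn[scores.index(max(scores))]

# Stand-in fitness: the fraction of "useful" rules (keys below 50) in the
# chromosome; the paper instead evaluates AUC over the training examples.
pool = list(range(200))
best = evolve_rule_subset(pool, lambda c: sum(k < 50 for k in c) / len(c),
                          size=10)
```

Under this toy fitness, selection pressure and elitism steer the population toward chromosomes dominated by low-numbered keys, mirroring how AUC-based fitness steers the search toward rule subsets that rank minority examples well.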
The experiments were performed on three data sets: two benchmark data sets collected from the UCI repository [1] and one real-world data set used in [8]. These data sets correspond to classification problems covering different application domains, and all of them are imbalanced. Table 1 summarizes their main features: Identifier, the identification of the data set used in the text; #Examples, the total number of examples; #Attributes (quanti., quali.), the total number of attributes, as well as the numbers of continuous and nominal attributes; #Classes, the number of classes; Missing Data, whether there are missing values; and Majority Class Error, the expected error of a classifier that always outputs the majority class.

Each of the three data sets was divided into 10 pairs of training and test sets using random subsampling. The training examples are 70% of the original

data set and the test set the remaining 30%. Within each fold, 100 balanced data sets were created, each with all the minority class examples and a random sample of the majority class examples, as previously described. These balanced data sets were given to the symbolic Machine Learning algorithms C4.5Rules [27] and Ripper [9], run with their default parameters. The classification rules generated for each training set and by each algorithm were combined into a pool of rules, and an Evolutionary Algorithm was used to build a classifier from this pool. As described in Section 3, the chromosome size was defined according to the average size of the classifiers generated by C4.5Rules and Ripper. Table 2 shows the chromosome size used by the Evolutionary Algorithm for each combination of data set and inducer.

Table 1: Data sets summary description.

  Identifier     #Examples   #Attributes (quanti., quali.)   #Classes   Missing Data   Majority Class Error
  Abalone           4176      8 (7, 1)                           2          No               2.717%
  Mammography      11182      6 (6, 0)                           2          No               2.316%
  Page-blocks       5472     10 (10, 0)                          2          No              10.216%

Table 2: Chromosome sizes.

  Data Set      C4.5Rules   Ripper
  Abalone           20         6
  Mammography       10         6
  Page-blocks       15        10

Table 3 presents the results obtained with the C4.5Rules inducer. All results are mean AUC values calculated over the 10 pairs of training and test sets. The results are divided into three columns: column C4.5Rules presents the results obtained by the C4.5Rules inducer on the imbalanced data; column Under-sampling presents the results obtained by C4.5Rules on a balanced data set obtained by under-sampling the training set; and column EA-C4.5Rules presents the results obtained by the proposed approach. As can be observed, EA-C4.5Rules presents the highest mean AUC value for all data sets. Table 4 presents the results obtained with the Ripper inducer.
As in the previous table, column Ripper presents the results obtained by the Ripper inducer on the imbalanced data; column Under-sampling presents the results obtained by Ripper on a balanced data set obtained by under-sampling the training set; and column EA-Ripper presents the results obtained by the proposed approach.

Table 3: Average AUC score for the C4.5Rules inducer.

  Data Set      C4.5Rules       Under-sampling   EA-C4.5Rules
  Abalone       77.93 (02.23)   78.31 (02.27)    80.95 (00.91)
  Mammography   87.20 (03.70)   89.43 (02.55)    90.30 (02.87)
  Page-blocks   97.26 (01.42)   96.97 (00.84)    97.56 (01.09)

As can be observed, EA-Ripper also presents the highest mean AUC value for all data sets.

Table 4: Average AUC score for the Ripper inducer.

  Data Set      Ripper          Under-sampling   EA-Ripper
  Abalone       59.91 (02.49)   75.30 (03.88)    81.06 (02.84)
  Mammography   78.20 (02.68)   88.73 (01.82)    91.97 (02.68)
  Page-blocks   92.41 (01.31)   95.10 (00.80)    96.71 (00.47)

5 Conclusion and Future Work

This work presents a novel hybrid approach to deal with class imbalance that combines symbolic Machine Learning inducers and Evolutionary Algorithms. Our approach uses Evolutionary Algorithms to perform a more extensive search over the hypothesis space. In addition, our approach creates classifiers with the same number of rules as the classifiers induced from balanced data sets obtained by under-sampling. Therefore, our method is able to create classifiers that are more precise, in terms of area under the ROC curve, yet have the same complexity as the individual classifiers.

As future work, we plan to investigate new metrics for composing rules into a classifier. These metrics indicate which rule should be fired when multiple rules cover an example. They have a direct influence on the performance of the classifiers, and novel metrics designed for imbalanced classes can potentially improve classification on imbalanced data sets.

Acknowledgements. This research was partially supported by the Brazilian research agencies FAPESP and CNPq.

References

[1] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.

[2] J. A. Baranauskas and M. C. Monard. Combining symbolic classifiers from multiple inducers. Knowledge-Based Systems, 16(3):129-136, 2003.

[3] G. E. A. P. A. Batista, C. R. Milaré, R. C. Prati, and M. C. Monard. A comparison of methods for rule subset selection applied to associative classification. Inteligencia Artificial, (32):29-35, 2006.

[4] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations Newsletter, 6(1):20-29, 2004. Special issue on Learning from Imbalanced Datasets.

[5] F. C. Bernardini, R. C. Prati, and M. C. Monard. Evolving sets of symbolic classifiers into a single symbolic classifier using genetic algorithms. In International Conference on Hybrid Intelligent Systems, pages 525-530, Washington, DC, USA, 2008. IEEE Computer Society.

[6] L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.

[7] P. K. Chan and S. J. Stolfo. Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In International Conference on Knowledge Discovery and Data Mining, pages 164-168, 1998.

[8] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321-357, 2002.

[9] W. Cohen. Fast effective rule induction. In International Conference on Machine Learning, pages 115-123, 1995.

[10] T. G. Dietterich. Machine Learning Research: Four Current Directions. Technical report, Oregon State University, 1997.

[11] T. Fawcett. ROC graphs: Notes and practical considerations for researchers. Technical Report HPL-2003-4, HP Labs, 2004.

[12] T. Fawcett and F. J. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3):291-316, 1997.

[13] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth.
The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11):27-34, 1996.

[14] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.

[15] A. Ghosh and B. Nath. Multi-objective rule mining using genetic algorithms. Information Sciences, 163(1-3):123-133, 2004.

[16] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

[17] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429-449, 2002.

[18] M. Kubat, R. Holte, and S. Matwin. Machine learning for the detection of oil spills in satellite radar images. Machine Learning, 30:195-215, 1998.

[19] M. Kubat and S. Matwin. Addressing the curse of imbalanced training sets: One-sided selection. In International Conference on Machine Learning, pages 179-186. Morgan Kaufmann, 1997.

[20] C. X. Ling and C. Li. Data mining for direct marketing: Problems and solutions. In International Conference on Knowledge Discovery and Data Mining, pages 73-79, 1998.

[21] X. Y. Liu, J. Wu, and Z. H. Zhou. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2):539-550, 2009.

[22] C. R. Milaré, G. E. A. P. A. Batista, A. C. P. L. F. Carvalho, and M. C. Monard. Applying genetic and symbolic learning algorithms to extract rules from artificial neural networks. In Proc. Mexican International Conference on Artificial Intelligence, volume 2972 of LNAI, pages 833-843. Springer-Verlag, 2004.

[23] D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11:169-198, 1999.

[24] M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk. Reducing misclassification costs. In International Conference on Machine Learning, pages 217-225, 1994.

[25] E. P. D. Pednault, B. K. Rosen, and C. Apte.
Handling imbalanced data sets in insurance risk modeling. Technical Report RC-21731, IBM Research Report, March 2000.

[26] R. C. Prati and P. A. Flach. ROCCER: An algorithm for rule learning based on ROC analysis. In International Joint Conference on Artificial Intelligence (IJCAI 2005), pages 823-828, 2005.

[27] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, CA, 1993.

[28] R. E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197-227, 1990.

[29] S. J. Stolfo, D. W. Fan, W. Lee, A. L. Prodromidis, and P. K. Chan. Credit card fraud detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on AI Methods in Fraud and Risk Management, pages 83-90, 1997.

[30] D. H. Wolpert. Stacked generalization. Neural Networks, 5:241-259, 1992.