A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms
Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE, June, 1–3 July

A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms

Claudia Regina Milaré¹, Gustavo E.A.P.A. Batista¹ and André C.P.L.F. Carvalho¹

¹ Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo
E-mails: cmilare@gmail.com, gbatista@icmc.usp.br, andre@icmc.usp.br

Abstract

There is an increasing interest in the application of Evolutionary Algorithms to induce classification rules. This hybrid approach can help in areas where classical rule induction methods have not been completely successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when some classes heavily outnumber other classes. Frequently, classical Machine Learning classifiers are not able to learn in the presence of imbalanced data sets, outputting classifiers that always predict the most numerous classes. In this work we propose a novel hybrid approach to deal with this problem. We create several balanced data sets, each with all minority class cases and a random sample of majority class cases. These balanced data sets are given to classical Machine Learning systems, which output rule sets. The rule sets are combined into a pool of rules, and an evolutionary algorithm is used to build a classifier from this pool of rules. This hybrid approach has some advantages over under-sampling, since it reduces the amount of discarded information, and some advantages over over-sampling, since it avoids overfitting. The approach was analyzed experimentally, and our experiments show that it can improve classification performance measured as the area under the ROC curve.

Key words: Evolutionary Algorithms, Rule Subset Selection, Treatment of Imbalanced Classes.
1 Introduction

There is an increasing interest in the application of evolutionary algorithms to induce classification rules. This hybrid approach can help in areas where classical methods for rule induction have not been completely successful. One example is the induction of classification rules in class-imbalanced domains. Imbalanced data occur when some classes heavily outnumber other classes. Frequently, classical Machine Learning inducers are not able to learn in the presence of imbalanced data sets. These inducers usually output classifiers that always predict the most numerous classes. However, in several domains the minority classes are the classes of interest, the ones associated with the highest costs. For instance, in the detection of fraudulent credit card transactions and phone calls, the diagnosis of rare diseases, and the prediction of weather events such as hoar-frosts, misclassifying minority class examples is more expensive than misclassifying majority class examples, and a classifier that simply outputs the majority class is completely useless.

In this work we propose a novel hybrid approach to deal with this problem. We believe the class imbalance problem is also a search problem; therefore, an evolutionary algorithm is used to better search the hypothesis space. Our approach creates several balanced data sets, each with all minority class cases and a random sample of majority class cases. These balanced data sets are given to classical Machine Learning systems that output rule sets. The rule sets are combined into a pool of rules, and an evolutionary algorithm is used to select some of the rules from the pool in order to build a classifier.

Our method uses under-sampling to create several balanced data sets. Under-sampling eliminates majority class cases in order to create balanced data sets. A major drawback of under-sampling is that it can discard potentially useful data that could be important to the induction process.
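To make this drawback concrete, the following is a minimal sketch of plain random under-sampling (the helper name and toy data are hypothetical, not the paper's implementation): every majority example not drawn into the sample is simply thrown away.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Balance a two-class training set by discarding majority examples."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # keep only |minority| majority cases
    discarded = len(majority) - len(kept)       # everything else is lost to learning
    return kept + minority, discarded

# A 9:1 imbalance: 90 majority ("neg") cases, 10 minority ("pos") cases.
neg = [("neg", i) for i in range(90)]
pos = [("pos", i) for i in range(10)]
balanced, lost = random_undersample(neg, pos)
print(len(balanced), lost)  # 20 80: one balanced set discards 80 of the 90 majority cases
```

A single such sample keeps only a small fraction of the majority class, which is exactly the information loss discussed in the text.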
Our method overcomes this limitation by creating several data samples and, therefore, increasing the probability that all data is used for learning. Another method frequently used to treat imbalanced data sets is over-sampling. Over-sampling artificially increases the number of minority class cases. A major drawback of this method is that it supposedly increases the likelihood of overfitting, since it makes exact copies of the minority class examples. Our method avoids overfitting by limiting the number of rules in each classifier. More precisely, the classifiers created by our method have the same number of rules as the individual classifiers induced from balanced data. In this work we deal, without loss of generality, with two-class problems, where one of the classes significantly outnumbers the other. Concept-learning problems allow us to use the area under the ROC curve (AUC) [11] as the main metric to assess our results. The use of AUC is highly recommended in experiments with imbalanced classes, since metrics that depend on costs
and class distribution, such as accuracy and error rate, are misleading in imbalanced domains. Our approach was experimentally analyzed in three application domains. Our experiments with C4.5Rules and Ripper, two well-known rule set induction algorithms, show that the proposed approach can significantly improve classification performance measured as the area under the ROC curve.

This paper is organized as follows: Section 2 presents some related work; Section 3 describes the proposed approach; Section 4 empirically evaluates the proposed approach on some application domains; and finally, Section 5 concludes this paper and presents some directions for future work.

2 Related Work

This section reviews some related work on class imbalance and methods to combine classifiers.

2.1 Class Imbalance

The predictive accuracy of several Machine Learning algorithms is largely affected by the distribution of examples among the classes. Since most learning systems are designed under the assumption of well-balanced data sets, they fail to induce a classifier able to predict the minority class. These algorithms tend to assign new examples to the most frequent class. The investigation of alternatives to efficiently deal with the class imbalance problem is a very important research issue, since imbalanced data sets can be found in several domains. For example, in fraud detection in telephone calls [12] and credit card transactions [29], the number of legitimate transactions is usually much higher than the number of fraudulent ones; in the analysis of insurance risk [25], only a few clients ask for the insurance premium in a given time window; and in direct marketing [20], the number of respondents is usually very small (about 1%) for the majority of marketing campaigns. Several papers have analyzed the problem of learning from imbalanced data sets (for instance [24, 20, 19, 12, 18, 17, 4]).
Among the strategies followed by these works, three main approaches have been frequently used:

- Assigning different costs to misclassifications: higher costs are assigned to the minority classes;
- Random under-sampling: artificially balancing the training data by eliminating examples from the majority class;
- Random over-sampling: artificially balancing the training data by replicating examples from the minority class.

Our approach uses under-sampling as an intermediate step to create several balanced data sets. Other methods use a similar approach to deal with class imbalance. For instance, [7] splits the examples from the majority class into several non-overlapping subsets with approximately the same number of examples as the minority class. Each subset is combined with all minority class examples, forming balanced data sets that are given to a learning algorithm. The classifiers obtained are integrated using stacking [30]. A similar approach is proposed in [21], in which AdaBoost [14] integrates the outputs of several classifiers induced from balanced data sets obtained with under-sampling. Our approach differs from previously published work, since we are interested in creating symbolic classifiers, i.e., classifiers that can be easily interpreted by humans. Although ensembles can be built from several individual symbolic classifiers, the final classifier cannot be considered symbolic, since it cannot be easily interpreted.

2.2 Combining Classifiers

There are several papers in the literature describing different alternatives for the combination of knowledge bases. A straightforward approach to combining knowledge bases is to use ensembles [23, 10]. An ensemble is composed of a set of individual classifiers whose predictions are combined to determine the label of a new instance. Frequently, an ensemble is more accurate than its individual classifiers. Despite the gain in performance generally obtained by ensembles, the combination of symbolic classifiers usually results in a non-symbolic final classifier. Two well-known ensemble methods are Bagging and Boosting. Bagging [6] is one of the oldest and simplest techniques for creating an ensemble of classifiers.
This technique uses majority voting to combine the predictions of the individual classifiers, assigning the most frequently predicted class as the final classification. Differently from Bagging, in Boosting [28] each training example is associated with a weight. This weight is related to the accuracy of the induced hypothesis for that particular example. A hypothesis is induced per iteration, and the weights associated with each example may be modified. A second approach to combining knowledge bases is to integrate the knowledge generated by different classifiers into a single knowledge base and then use a rule selection method to create a classifier. In [26], the authors proposed an algorithm named ROCCER to select rules based on their performance in the ROC space. Another technique that uses a similar approach is the
GARSS algorithm [3]. GARSS uses an Evolutionary Algorithm to select rules that maximize the AUC measure. Both ROCCER and GARSS were used in the context of associative classification. Other works using Evolutionary Algorithms to select rules from a large knowledge base can be found in [15, 5].

The general methodology of inducing several classifiers with sampling and integrating the knowledge of these classifiers into a final classifier was initially proposed by Fayyad, Djorgovski, and Weir [13]. This methodology was implemented in the RULLER system and used in the SKICAT project, whose objective was to catalog and analyze objects in digitized sky images. According to the authors, this methodology was able to generate a robust set of rules. Moreover, the induced classifier was more accurate than astronomers in the classification of cosmic objects from photographs. The methodology adopted in the RULLER system was extended in the XRULLER system (Extended RULLER) [2] to use knowledge induced by different algorithms.

3 Proposed Approach

The approach proposed in this work uses an Evolutionary Algorithm to select rules in order to maximize the AUC metric for problems with imbalanced classes. Evolutionary Algorithms are search algorithms based on natural selection and genetics [16]. They evolve sets of potential solutions (populations), usually encoded as sequences of bits (chromosomes). The evolution is performed by applying a set of transformations (usually the genetic operators crossover and mutation) and by evaluating the quality (fitness) of the solutions. As previously mentioned, our approach is based on under-sampling. However, in order to reduce the probability of information loss due to discarded data, our approach creates several under-sampled training sets.
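This repeated-sampling step can be sketched as follows (a minimal sketch with hypothetical names and toy data; the rule induction itself, done with C4.5Rules or Ripper in this work, is not shown):

```python
import random

def balanced_training_sets(majority, minority, n=100, seed=0):
    """Create n balanced training sets: each contains ALL minority examples
    plus an equally sized random sample of majority examples, so across the
    n sets most majority data is likely to be used for learning at least once."""
    rng = random.Random(seed)
    return [minority + rng.sample(majority, len(minority)) for _ in range(n)]

neg = [("neg", i) for i in range(90)]   # majority class
pos = [("pos", i) for i in range(10)]   # minority class
train_sets = balanced_training_sets(neg, pos, n=100)

# Each balanced set would be handed to a rule inducer; the induced rule sets
# are then merged into one pool, discarding duplicate rules.
print(len(train_sets))                        # 100
print(all(len(t) == 20 for t in train_sets))  # True
```

Because every set keeps the whole minority class and draws a fresh majority sample, the chance that any particular majority example never contributes to learning shrinks rapidly as n grows.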
Given a training set T = T⁺ ∪ T⁻, where T⁺ is the set of positive (minority) examples and T⁻ is the set of negative (majority) examples, n random samples T⁻₁, ..., T⁻ₙ are drawn from T⁻. Each random sample has the same number of examples as the positive set, i.e., |T⁻ᵢ| = |T⁺|, 1 ≤ i ≤ n. In total, n balanced training sets Tᵢ are created by joining T⁺ with each T⁻ᵢ, i.e., Tᵢ = T⁺ ∪ T⁻ᵢ, 1 ≤ i ≤ n. The parameter n was set to 100 in our experiments. Rule sets are induced from each training set Tᵢ, the rules from all rule sets are integrated into a single pool of rules, and repeated rules are discarded. In our experiments, the Machine Learning algorithms C4.5Rules [27] and Ripper [9] were used to induce the rule sets. The pool of rules is given as input to an Evolutionary Algorithm. A primary key, represented as a natural number, is associated with each rule in the pool. Therefore, each rule can be accessed independently through its key. Our method uses
the Pittsburgh approach to encode classifiers as chromosomes. Thus, an array of keys is used to represent a chromosome, i.e., a rule set to be interpreted as a classifier. In our implementation, the initial population is composed of 40 randomly generated chromosomes. The evaluation function is the AUC metric measured over the training examples. The selection method is fitness-proportionate selection, in which the number of times a chromosome is expected to reproduce is proportional to its fitness. A simple crossover operator was applied with probability 0.4. The mutation operator alters the value of randomly selected elements of a chromosome; that is, it replaces a randomly selected rule with another rule from the pool. The mutation operator was applied with probability 0.1. Mutation and crossover rates were chosen based on our previous experience with Evolutionary Algorithms [22]. The number of generations is limited to 20. Finally, our implementation uses an elitism operator for population replacement: the best chromosome of each population is preserved into the next generation.

As previously mentioned, each chromosome represents a classifier. We have fixed the number of rules in each chromosome, aiming to avoid overfitting. As the fitness function is the AUC measured over the training set, allowing the chromosomes to grow causes the Evolutionary Algorithm to create chromosomes with many rules that do not generalize well. In our experiments, we set the number of rules in each chromosome differently for each data set. This number is approximately the mean number of rules of the classifiers induced over each Tᵢ, and varies from 6 to 20, depending on the data set. Therefore, the classifiers induced by our approach are as complex, in terms of number of rules, as the individual classifiers.

4 Experimental Evaluation

Experiments were carried out in order to evaluate the proposed approach.
The experiments were performed using two benchmark data sets, collected from the UCI repository [1], and one real-world data set used in [8]. These data sets are related to classification problems, covering different application domains. They are all imbalanced data sets. Table 1 summarizes the main features of these data sets:

- Identifier: identification of the data set used in the text;
- #Examples: the total number of examples;
- #Attributes (quanti., quali.): the total number of attributes, as well as the number of continuous and nominal attributes;
- #Classes: the number of classes;
- Missing Data: indicates whether there are missing values;
- Majority Class Error: the expected error of a classifier that always outputs the majority class.

Each of the three data sets was divided into 10 pairs of training and test sets using random subsampling. The training examples are 70% of the original
data set and the test set the remaining 30%. Within each fold, 100 balanced data sets were created with all minority class examples and a random sample of the majority class examples, as previously described. These balanced data sets were given to symbolic Machine Learning algorithms. The symbolic algorithms used in the experiments were C4.5Rules [27] and Ripper [9], both run with their default parameters. The classification rules generated for each training set and by each algorithm were combined into a pool of rules. An Evolutionary Algorithm was used to build a classifier from this pool of rules. As described in Section 3, the chromosome size was defined according to the average size of the classifiers generated by C4.5Rules and Ripper. Table 2 shows the chromosome sizes used by the Evolutionary Algorithm for each combination of data set and inducer.

Table 1: Data sets summary description.

Identifier     #Examples   #Attributes (quanti., quali.)   #Classes   Missing Data   Majority Class Error
Abalone                    (7, 1)                          2          No             2.717%
Mammography                (6, 0)                          2          No             2.316%
Page-blocks                (10, 0)                         2          No             10.216%

Table 2: Chromosome sizes.

Data Set       C4.5Rules   Ripper
Abalone        20          6
Mammography    10          6
Page-blocks

Table 3 presents the results obtained with the C4.5Rules inducer. All results are mean AUC values calculated over the 10 pairs of training and test sets. The results are divided into three columns: column C4.5Rules presents the results obtained by the C4.5Rules inducer over imbalanced data; column Under-sampling presents the results obtained by C4.5Rules over a balanced data set obtained by under-sampling the training set; and column EA-C4.5Rules presents the results obtained by the proposed approach. As can be observed, EA-C4.5Rules presents the highest mean AUC value for all data sets. Table 4 presents the results obtained with the Ripper inducer.
Similarly to the previous table, column Ripper presents the results obtained by the Ripper inducer over imbalanced data; column Under-sampling presents the results obtained by Ripper over a balanced data set obtained by under-sampling the training set; and column EA-Ripper presents the results obtained by the proposed approach.
Table 3: Average AUC score for the C4.5Rules inducer.

Data Set       C4.5Rules   Under-sampling   EA-C4.5Rules
Abalone        (02.23)     (02.27)          (00.91)
Mammography    (03.70)     (02.55)          (02.87)
Page-blocks    (01.42)     (00.84)          (01.09)

As can be observed, EA-Ripper also presents the highest mean AUC values for all data sets.

Table 4: Average AUC score for the Ripper inducer.

Data Set       Ripper      Under-sampling   EA-Ripper
Abalone        (02.49)     (03.88)          (02.84)
Mammography    (02.68)     (01.82)          (02.68)
Page-blocks    (01.31)     (00.80)          (00.47)

5 Conclusion and Future Work

This work presents a novel hybrid approach to deal with class imbalance that combines symbolic Machine Learning inducers and Evolutionary Algorithms. Our approach uses Evolutionary Algorithms to perform a more extensive search over the hypothesis space. In addition, it creates classifiers with the same number of rules as the classifiers induced from balanced data sets obtained by under-sampling. Therefore, our method is able to create classifiers that are more precise, in terms of area under the ROC curve, yet have the same complexity as the individual classifiers.

As future work, we plan to investigate new metrics to compose rules into a classifier. These metrics indicate which rule should be fired when multiple rules cover an example. They have a direct influence on the performance of classifiers, and novel metrics, designed for imbalanced classes, can potentially improve classification on imbalanced data sets.

Acknowledgements. This research was partially supported by the Brazilian research agencies FAPESP and CNPq.

References

[1] A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007.
[2] J. A. Baranauskas and M. C. Monard. Combining symbolic classifiers from multiple inducers. Knowledge-Based Systems, 16(3). Elsevier Science.

[3] G. E. A. P. A. Batista, C. R. Milaré, R. C. Prati, and M. C. Monard. A comparison of methods for rule subset selection applied to associative classification. Inteligencia Artificial, (32):29–35.

[4] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations Newsletter, 6(1):20–29. Special issue on Learning from Imbalanced Datasets.

[5] F. C. Bernardini, R. C. Prati, and M. C. Monard. Evolving sets of symbolic classifiers into a single symbolic classifier using genetic algorithms. In International Conference on Hybrid Intelligent Systems, Washington, DC, USA. IEEE Computer Society.

[6] L. Breiman. Bagging predictors. Machine Learning, 24(2).

[7] P. K. Chan and S. J. Stolfo. Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection. In International Conference on Knowledge Discovery and Data Mining.

[8] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16.

[9] W. Cohen. Fast effective rule induction. In International Conference on Machine Learning.

[10] T. G. Dietterich. Machine Learning Research: Four Current Directions. Technical report, Oregon State University.

[11] T. Fawcett. ROC graphs: Notes and practical considerations for researchers. Technical Report HPL, HP Labs.

[12] T. Fawcett and F. J. Provost. Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, 1(3).

[13] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11):27–34, 1996.
[14] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1).

[15] A. Ghosh and B. Nath. Multi-objective rule mining using genetic algorithms. Information Sciences, 163(1-3).

[16] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

[17] N. Japkowicz and S. Stephen. The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5).

[18] M. Kubat, R. Holte, and S. Matwin. Machine learning for the detection of oil spills in satellite radar images. Machine Learning, 30.

[19] M. Kubat and S. Matwin. Addressing the curse of imbalanced training sets: One-sided selection. In International Conference on Machine Learning. Morgan Kaufmann.

[20] C. X. Ling and C. Li. Data mining for direct marketing: Problems and solutions. In International Conference on Knowledge Discovery and Data Mining, pages 73–79.

[21] X. Y. Liu, J. Wu, and Z. H. Zhou. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2).

[22] C. R. Milaré, G. E. A. P. A. Batista, A. C. P. L. F. Carvalho, and M. C. Monard. Applying genetic and symbolic learning algorithms to extract rules from artificial neural networks. In Proc. Mexican International Conference on Artificial Intelligence, volume 2972 of LNAI. Springer-Verlag.

[23] D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11.

[24] M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk. Reducing misclassification costs. In International Conference on Machine Learning.

[25] E. P. D. Pednault, B. K. Rosen, and C. Apte. Handling imbalanced data sets in insurance risk modeling. Technical Report RC-21731, IBM Research Report, March 2000.
[26] R. C. Prati and P. A. Flach. ROCCER: An algorithm for rule learning based on ROC analysis. In International Joint Conference on Artificial Intelligence (IJCAI 2005).

[27] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, CA.

[28] R. E. Schapire. The strength of weak learnability. Machine Learning, 5(2).

[29] S. J. Stolfo, D. W. Fan, W. Lee, A. L. Prodromidis, and P. K. Chan. Credit card fraud detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on AI Methods in Fraud and Risk Management, pages 83–90.

[30] D. H. Wolpert. Stacked generalization. Neural Networks, 5, 1992.
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationA COMPARATIVE STUDY OF DECISION TREE ALGORITHMS FOR CLASS IMBALANCED LEARNING IN CREDIT CARD FRAUD DETECTION
International Journal of Economics, Commerce and Management United Kingdom Vol. III, Issue 12, December 2015 http://ijecm.co.uk/ ISSN 2348 0386 A COMPARATIVE STUDY OF DECISION TREE ALGORITHMS FOR CLASS
More informationMultiple Classifiers -Integration and Selection
1 A Dynamic Integration Algorithm with Ensemble of Classifiers Seppo Puuronen 1, Vagan Terziyan 2, Alexey Tsymbal 2 1 University of Jyvaskyla, P.O.Box 35, FIN-40351 Jyvaskyla, Finland sepi@jytko.jyu.fi
More informationOn the Class Imbalance Problem *
Fourth International Conference on Natural Computation On the Class Imbalance Problem * Xinjian Guo, Yilong Yin 1, Cailing Dong, Gongping Yang, Guangtong Zhou School of Computer Science and Technology,
More informationCombining Multiple Models Across Algorithms and Samples for Improved Results
Combining Multiple Models Across Algorithms and Samples for Improved Results Haleh Vafaie, PhD. Federal Data Corporation Bethesda, MD Hvafaie@feddata.com Dean Abbott Abbott Consulting San Diego, CA dean@abbottconsulting.com
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationSharing Classifiers among Ensembles from Related Problem Domains
Sharing Classifiers among Ensembles from Related Problem Domains Yi Zhang yi-zhang-2@uiowa.edu W. Nick Street nick-street@uiowa.edu Samuel Burer samuel-burer@uiowa.edu Department of Management Sciences,
More informationAddressing the Class Imbalance Problem in Medical Datasets
Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,
More informationMining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
More informationFeature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
More informationHow To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationInteractive Exploration of Decision Tree Results
Interactive Exploration of Decision Tree Results 1 IRISA Campus de Beaulieu F35042 Rennes Cedex, France (email: pnguyenk,amorin@irisa.fr) 2 INRIA Futurs L.R.I., University Paris-Sud F91405 ORSAY Cedex,
More informationPreprocessing Imbalanced Dataset Using Oversampling Approach
Journal of Recent Research in Engineering and Technology, 2(11), 2015, pp 10-15 Article ID J111503 ISSN (Online): 2349 2252, ISSN (Print):2349 2260 Bonfay Publications Research article Preprocessing Imbalanced
More informationMeta Learning Algorithms for Credit Card Fraud Detection
International Journal of Engineering Research and Development e-issn: 2278-67X, p-issn: 2278-8X, www.ijerd.com Volume 6, Issue 6 (March 213), PP. 16-2 Meta Learning Algorithms for Credit Card Fraud Detection
More informationComparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal
Comparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal Magdalena Graczyk 1, Tadeusz Lasota 2, Bogdan Trawiński 1, Krzysztof Trawiński 3 1 Wrocław University of Technology,
More informationExpert Systems with Applications
Expert Systems with Applications 36 (2009) 4626 4636 Contents lists available at ScienceDirect Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa Handling class imbalance in
More informationSample subset optimization for classifying imbalanced biological data
Sample subset optimization for classifying imbalanced biological data Pengyi Yang 1,2,3, Zili Zhang 4,5, Bing B. Zhou 1,3 and Albert Y. Zomaya 1,3 1 School of Information Technologies, University of Sydney,
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationAn Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
More informationA Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
More informationAn Analysis of Four Missing Data Treatment Methods for Supervised Learning
An Analysis of Four Missing Data Treatment Methods for Supervised Learning Gustavo E. A. P. A. Batista and Maria Carolina Monard University of São Paulo - USP Institute of Mathematics and Computer Science
More informationFinancial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms
Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms Johan Perols Assistant Professor University of San Diego, San Diego, CA 92110 jperols@sandiego.edu April
More informationSituational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods
Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods 2008 Sandia Workshop on Data Mining and Data Analysis David Cieslak, dcieslak@cse.nd.edu, http://www.nd.edu/~dcieslak/,
More informationLearning Decision Trees for Unbalanced Data
Learning Decision Trees for Unbalanced Data David A. Cieslak and Nitesh V. Chawla {dcieslak,nchawla}@cse.nd.edu University of Notre Dame, Notre Dame IN 46556, USA Abstract. Learning from unbalanced datasets
More informationA Comparative Assessment of the Performance of Ensemble Learning in Customer Churn Prediction
The International Arab Journal of Information Technology, Vol. 11, No. 6, November 2014 599 A Comparative Assessment of the Performance of Ensemble Learning in Customer Churn Prediction Hossein Abbasimehr,
More informationMining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
More informationCost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)
Cost Drivers of a Parametric Cost Estimation Model for Mining Projects (DMCOMO) Oscar Marbán, Antonio de Amescua, Juan J. Cuadrado, Luis García Universidad Carlos III de Madrid (UC3M) Abstract Mining is
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationGetting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise
More informationMinority Report in Fraud Detection: Classification of Skewed Data
ABSTRACT Minority Report in Fraud Detection: Classification of Skewed Data This paper proposes an innovative fraud detection method, built upon existing fraud detection research and Minority Report, to
More informationMining Life Insurance Data for Customer Attrition Analysis
Mining Life Insurance Data for Customer Attrition Analysis T. L. Oshini Goonetilleke Informatics Institute of Technology/Department of Computing, Colombo, Sri Lanka Email: oshini.g@iit.ac.lk H. A. Caldera
More informationData Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
More informationL25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna
More informationWelcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
More informationSVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
More informationExample application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
More informationUSING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY
USING DATA MINING FOR BANK DIRECT MARKETING: AN APPLICATION OF THE CRISP-DM METHODOLOGY Sérgio Moro and Raul M. S. Laureano Instituto Universitário de Lisboa (ISCTE IUL) Av.ª das Forças Armadas 1649-026
More informationReview of Ensemble Based Classification Algorithms for Nonstationary and Imbalanced Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. IX (Feb. 2014), PP 103-107 Review of Ensemble Based Classification Algorithms for Nonstationary
More informationFraud Detection for Online Retail using Random Forests
Fraud Detection for Online Retail using Random Forests Eric Altendorf, Peter Brende, Josh Daniel, Laurent Lessard Abstract As online commerce becomes more common, fraud is an increasingly important concern.
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationII. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
More informationOn the application of multi-class classification in physical therapy recommendation
RESEARCH Open Access On the application of multi-class classification in physical therapy recommendation Jing Zhang 1,PengCao 1,DouglasPGross 2 and Osmar R Zaiane 1* Abstract Recommending optimal rehabilitation
More informationColon cancer survival prediction using ensemble data mining on SEER data
Colon cancer survival prediction using ensemble data mining on SEER data Reda Al-Bahrani, Ankit Agrawal, Alok Choudhary Dept. of Electrical Engg. and Computer Science Northwestern University Evanston,
More informationData Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
More informationPredictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble
Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble Dr. Hongwei Patrick Yang Educational Policy Studies & Evaluation College of Education University of Kentucky
More informationA Comparative Evaluation of Meta-Learning Strategies over Large and Distributed Data Sets
A Comparative Evaluation of Meta-Learning Strategies over Large and Distributed Data Sets Andreas L. Prodromidis and Salvatore J. Stolfo Columbia University, Computer Science Dept., New York, NY 10027,
More informationHealthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationThe class imbalance problem in pattern classification and learning. V. García J.S. Sánchez R.A. Mollineda R. Alejo J.M. Sotoca
The class imbalance problem in pattern classification and learning V. García J.S. Sánchez R.A. Mollineda R. Alejo J.M. Sotoca Pattern Analysis and Learning Group Dept.de Llenguatjes i Sistemes Informàtics
More informationElectronic Payment Fraud Detection Techniques
World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 4, 137-141, 2012 Electronic Payment Fraud Detection Techniques Adnan M. Al-Khatib CIS Dept. Faculty of Information
More informationA Robust Method for Solving Transcendental Equations
www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. Al-Amin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,
More informationENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
More informationFeature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution
More informationClassification for Fraud Detection with Social Network Analysis
Classification for Fraud Detection with Social Network Analysis Miguel Pironet 1, Claudia Antunes 1, Pedro Moura 2, João Gomes 2 1 Instituto Superior Técnico - Alameda, Departamento de Engenharia Informática,
More informationPerformance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
More informationChoosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction
Choosing the Best Classification Performance Metric for Wrapper-based Software Metric Selection for Defect Prediction Huanjing Wang Western Kentucky University huanjing.wang@wku.edu Taghi M. Khoshgoftaar
More informationUsing One-Versus-All classification ensembles to support modeling decisions in data stream mining
Using One-Versus-All classification ensembles to support modeling decisions in data stream mining Patricia E.N. Lutu Department of Computer Science, University of Pretoria, South Africa Patricia.Lutu@up.ac.za
More informationHow To Detect Fraud With Reputation Features And Other Features
Detection Using Reputation Features, SVMs, and Random Forests Dave DeBarr, and Harry Wechsler, Fellow, IEEE Computer Science Department George Mason University Fairfax, Virginia, 030, United States {ddebarr,
More informationA Novel Ensemble Learning-based Approach for Click Fraud Detection in Mobile Advertising
A Novel Ensemble Learning-based Approach for Click Fraud Detection in Mobile Advertising Kasun S. Perera 1, Bijay Neupane 1, Mustafa Amir Faisal 2, Zeyar Aung 1, and Wei Lee Woon 1 1 Institute Center for
More informationA General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center
More information