Reducing multiclass to binary by coupling probability estimates


Bianca Zadrozny
Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA

Abstract

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with boosted naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

1 Introduction

The two most well-known approaches for reducing a multiclass classification problem to a set of binary classification problems are known as one-against-all and all-pairs. In the one-against-all approach, we train a classifier for each of the classes, using as positive examples the training examples that belong to that class, and as negatives all the other training examples. In the all-pairs approach, we train a classifier for each possible pair of classes, ignoring the examples that do not belong to the classes in question.

Although these two approaches are the most obvious, Allwein et al. [Allwein et al., 2000] have shown that there are many other ways in which a multiclass problem can be decomposed into a number of binary classification problems. We can represent each such decomposition by a code matrix M ∈ {−1, 0, +1}^(k×l), where k is the number of classes and l is the number of binary classification problems. If M(c, b) = +1 then the examples belonging to class c are considered to be positive examples for the binary classification problem b. Similarly, if M(c, b) = −1 the examples belonging to c are considered to be negative examples for b. Finally, if M(c, b) = 0 the examples belonging to c are not used in training a classifier for b.
For example, in the 3-class case, the all-pairs code matrix is

          b1   b2   b3
    c1    +1   +1    0
    c2    −1    0   +1
    c3     0   −1   −1

This approach for representing the decomposition of a multiclass problem into binary problems is a generalization of the Error-Correcting Output Codes (ECOC) scheme proposed by Dietterich and Bakiri [Dietterich and Bakiri, 1995]. The ECOC scheme does not allow zeros in the code matrix, meaning that all examples are used in each binary classification problem.

Orthogonal to the problem of choosing a code matrix for reducing multiclass to binary is the problem of classifying an example given the labels assigned by each binary classifier. Given an example x, Allwein et al. [Allwein et al., 2000] first create a vector v of length l containing the −1/+1 labels assigned to x by each binary classifier. Then, they compute the Hamming distance between v and each row of M, and find the row c that is closest to v according to this metric. The label c is then assigned to x. This method is called Hamming decoding. For the case in which the binary classifiers output a score whose magnitude is a measure of confidence in the prediction, they use a decoding approach that takes the scores into account when calculating the distance between v and each row of M, instead of using the Hamming distance. This method is called loss-based decoding. Allwein et al. [Allwein et al., 2000] present theoretical and experimental results indicating that this method is better than Hamming decoding.

However, both of these methods simply assign a class label to each example. They do not output class membership probability estimates P̂(C = c | X = x) for an example x. These probability estimates are important when the classification outputs are not used in isolation and must be combined with other sources of information, such as misclassification costs [Zadrozny and Elkan, 2001a] or the outputs of another classifier.

Given a code matrix M and a binary classification learning algorithm that outputs probability estimates, we would like to couple the estimates given by each binary classifier in order to obtain class membership probability estimates for the multiclass problem.
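To make the code-matrix representation concrete, the following sketch (not from the paper; the helper names and the use of NumPy are assumptions of this illustration) builds the one-against-all and all-pairs matrices, with rows indexed by class and columns by binary problem:

```python
import numpy as np

def one_against_all_matrix(k):
    """One column per class: that class is positive (+1),
    every other class is negative (-1)."""
    M = -np.ones((k, k), dtype=int)
    np.fill_diagonal(M, 1)
    return M

def all_pairs_matrix(k):
    """One column per unordered pair (i, j), i < j: class i is positive,
    class j is negative, and all other classes are 0 (their examples are
    ignored when training that binary classifier)."""
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    M = np.zeros((k, len(pairs)), dtype=int)
    for b, (i, j) in enumerate(pairs):
        M[i, b] = 1
        M[j, b] = -1
    return M
```

For k = 3, `all_pairs_matrix(3)` reproduces the 3-class all-pairs matrix of the example in the text, with columns corresponding to the pairs (1, 2), (1, 3) and (2, 3).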
Hastie and Tibshirani [Hastie and Tibshirani, 1998] describe a solution for obtaining probability estimates P̂(C = c | X = x) in the all-pairs case by coupling the pairwise probability estimates, which we describe in Section 2. In Section 3, we extend the method to arbitrary code matrices. In Section 4 we discuss the loss-based decoding approach in more detail and compare it mathematically to the method by Hastie and Tibshirani. In Section 5 we present experimental results.

2 Coupling pairwise probability estimates

We are given pairwise probability estimates r_ij(x) for every pair of classes i ≠ j, obtained by training a classifier using the examples belonging to class i as positives and the examples belonging to class j as negatives. We would like to couple these estimates to obtain a set of class membership probabilities p_i(x) = P(C = c_i | X = x) for each example x. The r_ij are related to the p_i according to

    r_ij(x) = P(C = c_i | C = c_i or C = c_j, X = x) = p_i(x) / (p_i(x) + p_j(x))

Since we additionally require that Σ_i p_i(x) = 1, there are k − 1 free parameters and k(k − 1)/2 constraints. This implies that there may not exist p_i satisfying these constraints.

Let n_ij be the number of training examples used to train the binary classifier that predicts r_ij. In order to find the best approximation r̂_ij(x) = p̂_i(x) / (p̂_i(x) + p̂_j(x)), Hastie and Tibshirani fit the Bradley-Terry model for paired comparisons [Bradley and Terry, 1952] by minimizing the average weighted Kullback-Leibler distance l(x) between r_ij(x) and
r̂_ij(x) for each x, given by

    l(x) = Σ_{i<j} n_ij [ r_ij(x) log( r_ij(x) / r̂_ij(x) ) + (1 − r_ij(x)) log( (1 − r_ij(x)) / (1 − r̂_ij(x)) ) ]

The algorithm is as follows:

1. Start with some guess for the p̂_i(x) and corresponding r̂_ij(x).
2. Repeat until convergence:
   (a) For each i = 1, 2, ..., k,

           p̂_i(x) ← p̂_i(x) · ( Σ_{j≠i} n_ij r_ij(x) ) / ( Σ_{j≠i} n_ij r̂_ij(x) )

   (b) Renormalize the p̂_i(x).
   (c) Recompute the r̂_ij(x).

Hastie and Tibshirani [Hastie and Tibshirani, 1998] prove that the Kullback-Leibler distance between r_ij(x) and r̂_ij(x) decreases at each step. Since this distance is bounded below by zero, the algorithm converges. At convergence, the r̂_ij are consistent with the p̂_i. The class predicted for each example x is ĉ(x) = argmax_i p̂_i(x).

Hastie and Tibshirani also prove that the p̂_i(x) are in the same order as the non-iterative estimates p̃_i(x) = Σ_{j≠i} r_ij(x) for each x. Thus, the p̃_i(x) are sufficient for predicting the most likely class for each example. However, as shown by Hastie and Tibshirani, they are not accurate probability estimates because they tend to underestimate the differences between the p̂_i(x) values.

3 Extending the Hastie-Tibshirani method to arbitrary code matrices

For an arbitrary code matrix M, instead of having pairwise probability estimates, we have an estimate r_b(x) for each column b of M, such that

    r_b(x) = P(C ∈ I_b | C ∈ I_b ∪ J_b, X = x) = Σ_{c∈I_b} p_c(x) / Σ_{c∈I_b∪J_b} p_c(x)

where I_b and J_b are the sets of classes for which M(c, b) = +1 and M(c, b) = −1, respectively. We would like to obtain a set of class membership probabilities p_i(x) for each example x compatible with the r_b(x) and subject to Σ_i p_i(x) = 1. In this case, the number of free parameters is k − 1 and the number of constraints is l, where l is the number of columns of the code matrix. Since for most code matrices l is greater than k − 1, in general there is no exact solution to this problem.
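The pairwise coupling algorithm of Section 2 can be sketched as follows for a single example (a hypothetical NumPy implementation, not the authors' code; a fixed iteration budget stands in for a proper convergence test):

```python
import numpy as np

def couple_pairwise(r, n, iters=200):
    """Hastie-Tibshirani pairwise coupling for one example x.

    r[i, j] holds the pairwise estimate r_ij(x); n[i, j] holds n_ij, the
    number of examples used to train the (i, j) classifier.  Returns the
    coupled class membership probability estimates p_hat."""
    k = r.shape[0]
    p = np.full(k, 1.0 / k)              # step 1: initial guess
    for _ in range(iters):               # step 2: repeat until convergence
        for i in range(k):
            # current r_hat_ij implied by p (step c of the previous pass)
            r_hat = p[:, None] / (p[:, None] + p[None, :])
            j = [c for c in range(k) if c != i]
            # step (a): multiplicative update for class i
            p[i] *= (n[i, j] * r[i, j]).sum() / (n[i, j] * r_hat[i, j]).sum()
        p /= p.sum()                     # step (b): renormalize
    return p
```

With consistent inputs r_ij = p_i / (p_i + p_j) and equal n_ij, the procedure recovers the underlying p_i; with inconsistent inputs it converges to the weighted Kullback-Leibler approximation described in the text.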
For this reason, we propose an algorithm analogous to the Hastie-Tibshirani method presented in the previous section to find the best approximate probability estimates p̂_i(x) such that

    r̂_b(x) = Σ_{c∈I_b} p̂_c(x) / Σ_{c∈I_b∪J_b} p̂_c(x)

and the Kullback-Leibler distance between r̂_b(x) and r_b(x) is minimized. Let n_b be the number of training examples used to train the binary classifier that corresponds to column b of the code matrix. The algorithm is as follows:

1. Start with some guess for the p̂_i(x) and corresponding r̂_b(x).
2. Repeat until convergence:
   (a) For each i = 1, 2, ..., k,

           p̂_i(x) ← p̂_i(x) · ( Σ_{b: M(i,b)=+1} n_b r_b(x) + Σ_{b: M(i,b)=−1} n_b (1 − r_b(x)) ) / ( Σ_{b: M(i,b)=+1} n_b r̂_b(x) + Σ_{b: M(i,b)=−1} n_b (1 − r̂_b(x)) )

   (b) Renormalize the p̂_i(x).
   (c) Recompute the r̂_b(x).

If the code matrix is the all-pairs matrix, this algorithm reduces to the original method by Hastie and Tibshirani.

Let B⁺_i be the set of matrix columns for which M(i, b) = +1 and B⁻_i be the set of matrix columns for which M(i, b) = −1. By analogy with the non-iterative estimates suggested by Hastie and Tibshirani, we can define non-iterative estimates

    p̃_i(x) = ( Σ_{b∈B⁺_i} r_b(x) + Σ_{b∈B⁻_i} (1 − r_b(x)) ) / ( |B⁺_i| + |B⁻_i| )

For the all-pairs code matrix, these estimates are the same as the ones suggested by Hastie and Tibshirani. However, for arbitrary matrices, we cannot prove that the non-iterative estimates predict the same class as the iterative estimates.

4 Loss-based decoding

In this section, we discuss how to apply the loss-based decoding method to classifiers that output class membership probability estimates. We also study the conditions under which this method predicts the same class as the Hastie-Tibshirani method, in the all-pairs case.

The loss-based decoding method [Allwein et al., 2000] requires that each binary classifier output a margin score satisfying two requirements. First, the score should be positive if the example is classified as positive, and negative if the example is classified as negative. Second, the magnitude of the score should be a measure of confidence in the prediction.

The method works as follows. Let f_b(x) be the margin score predicted by the classifier corresponding to column b of the code matrix for example x. For each row c of the code matrix M and for each example x, we compute the distance between f and M(c) as

    d_L(x, c) = Σ_{b=1}^{l} L( M(c, b) · f_b(x) )        (1)

where L is a loss function that is dependent on the nature of the binary classifier and M(c, b) = 0, +1 or −1. We then label each example x with the label c for which d_L(x, c) is minimized.
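Loss-based decoding itself is compact; a sketch (mine, not the paper's implementation; the function name and the exponential default are assumptions, though L(y) = e^(−y) is one of the losses used in the experiments of Section 5) might look like:

```python
import numpy as np

def loss_based_decode(M, f, loss=lambda y: np.exp(-y)):
    """Loss-based decoding: pick the code-matrix row c minimizing
    d_L(x, c) = sum_b L(M[c, b] * f_b(x)), as in Equation (1).

    M is the (k, l) code matrix; f is the vector of margin scores f_b(x)."""
    d = loss(M * np.asarray(f)).sum(axis=1)   # one distance per row/class
    return int(np.argmin(d))
```

For classifiers that output probabilities r_b(x), one would pass f = r − 1/2, following the shift described in the text below.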
If the binary classification learning algorithm outputs scores that are probability estimates, they do not satisfy the first requirement because the probability estimates are all between 0 and 1. However, we can transform the probability estimates r_b(x) output by each classifier into margin scores by subtracting 1/2 from the scores, so that we consider as positives the examples x for which r_b(x) is above 1/2, and as negatives the examples x for which r_b(x) is below 1/2.

We now prove a theorem that relates the loss-based decoding method to the Hastie-Tibshirani method, for a particular class of loss functions.

Theorem 1 The loss-based decoding method for all-pairs code matrices predicts the same class label as the iterative estimates p̂_i(x) given by Hastie and Tibshirani, if the loss function is of the form L(y) = −ay, for any a > 0.

Proof: We first show that, if the loss function is of the form L(y) = −ay, the loss-based decoding method predicts the same class label as the non-iterative estimates p̃_i(x), for the all-pairs code matrix.
[Table 1: Characteristics of the datasets used in the experiments (satimage, pendigits and soybean): number of training examples, test examples, attributes and classes; the numeric entries were not preserved.]

The non-iterative estimates p̃_c(x) are given by

    p̃_c(x) = ( Σ_{b∈B⁺_c} r_b(x) + Σ_{b∈B⁻_c} (1 − r_b(x)) ) / ( |B⁺_c| + |B⁻_c| )

where B⁺_c and B⁻_c are the sets of matrix columns for which M(c, b) = +1 and M(c, b) = −1, respectively. Considering that L(y) = −ay and f_b(x) = r_b(x) − 1/2, and eliminating the terms for which M(c, b) = 0, we can rewrite Equation 1 as

    d(x, c) = −a Σ_{b∈B⁺_c} ( r_b(x) − 1/2 ) − a Σ_{b∈B⁻_c} ( 1/2 − r_b(x) )

For the all-pairs code matrix the following relationship holds: |B⁺_c| + |B⁻_c| = k − 1, where k is the number of classes. So, the distance d(x, c) is

    d(x, c) = −a ( Σ_{b∈B⁺_c} r_b(x) + Σ_{b∈B⁻_c} (1 − r_b(x)) − (k − 1)/2 )

It is now easy to see that the class c which minimizes d(x, c) for example x also maximizes p̃_c(x). Furthermore, if d(x, i) < d(x, j) then p̃_i(x) > p̃_j(x), which means that the ranking of the classes for each example is the same. Since the non-iterative estimates p̃_c(x) are in the same order as the iterative estimates p̂_c(x), we can conclude that the Hastie-Tibshirani method is equivalent to the loss-based decoding method if L(y) = −ay, in terms of class prediction, for the all-pairs code matrix.

Allwein et al. do not consider loss functions of the form L(y) = −ay, and use nonlinear loss functions such as L(y) = e^(−y). In this case, the class predicted by loss-based decoding may differ from the one predicted by the method of Hastie and Tibshirani.

This theorem applies only to the all-pairs code matrix. For other matrices for which |B⁺_c| + |B⁻_c| is the same for every class c (such as the one-against-all matrix), we can prove that loss-based decoding (with L(y) = −ay) predicts the same class as the non-iterative estimates. However, in this case, the non-iterative estimates do not necessarily predict the same class as the iterative ones.

5 Experiments

We performed experiments using the following multiclass datasets from the UCI Machine Learning Repository [Blake and Merz, 1998]: satimage, pendigits and soybean. Table 1 summarizes the characteristics of each dataset.
The binary learning algorithm used in the experiments is boosted naive Bayes [Elkan, 1997], since this is a method that cannot be easily extended to handle multiclass problems directly. For all the experiments, we ran 10 rounds of boosting.
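The extended coupling algorithm of Section 3, which turns the binary probability estimates into the multiclass estimates evaluated below, can be sketched for a single example as follows (a hypothetical NumPy implementation with a fixed iteration budget, not the authors' code):

```python
import numpy as np

def couple_code_matrix(M, r, n, iters=200):
    """Extended Hastie-Tibshirani coupling for one example x.

    M    : (k, l) code matrix with entries in {-1, 0, +1}
    r[b] : estimate r_b(x) of P(C in I_b | C in I_b or J_b, X = x)
    n[b] : number of training examples for the classifier of column b
    Returns multiclass probability estimates p_hat of length k."""
    k, l = M.shape
    p = np.full(k, 1.0 / k)                   # 1. initial guess
    for _ in range(iters):                    # 2. repeat until convergence
        for i in range(k):
            pos = p @ np.clip(M, 0, 1)        # sum of p_c over I_b, per column
            tot = p @ np.abs(M)               # sum of p_c over I_b union J_b
            r_hat = pos / tot                 # current r_hat_b(x)
            Bp = M[i] == 1                    # columns where class i is positive
            Bm = M[i] == -1                   # columns where class i is negative
            num = (n[Bp] * r[Bp]).sum() + (n[Bm] * (1 - r[Bm])).sum()
            den = (n[Bp] * r_hat[Bp]).sum() + (n[Bm] * (1 - r_hat[Bm])).sum()
            p[i] *= num / den                 # (a) multiplicative update
        p /= p.sum()                          # (b) renormalize; (c) r_hat is
                                              #     recomputed at the loop top
    return p
```

For the all-pairs matrix this reduces to the pairwise coupling of Section 2, and for the one-against-all matrix each r_b(x) is simply an estimate of p_b(x).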
[Table 2: Test set results (error rate and MSE) on the satimage dataset, for loss-based decoding with L(y) = −y and with L(y) = e^(−y), and the (extended) Hastie-Tibshirani estimates (non-iterative and iterative), under the all-pairs, one-against-all and sparse code matrices, plus a multiclass naive Bayes baseline; the numeric entries were not preserved.]

We use three different code matrices for each dataset: all-pairs, one-against-all and a sparse random matrix. The sparse random matrices have ⌈15 log₂ k⌉ columns, and each element is 0 with probability 1/2 and −1 or +1 with probability 1/4 each. This is the same type of sparse random matrix used by Allwein et al. [Allwein et al., 2000]. In order to have good error-correcting properties, the Hamming distance ρ between each pair of rows in the matrix must be large. We select the matrix by generating 10,000 random matrices and selecting the one for which ρ is maximized, checking that each column has at least one −1 and one +1, and that the matrix does not have two identical columns.

We evaluate the performance of each method using two metrics. The first metric is the error rate obtained when we assign each example to the most likely class predicted by the method. This metric is sufficient if we are only interested in classifying the examples correctly and do not need accurate probability estimates of class membership. The second metric is squared error, defined for one example x as SE(x) = Σ_j ( t_j(x) − p_j(x) )², where p_j(x) is the probability estimated by the method for example x and class j, and t_j(x) is the true probability of class j for x. Since for most real-world datasets true labels are known, but not probabilities, t_j(x) is defined to be 1 if the label of x is j and 0 otherwise. We average the squared error over the test examples to obtain the mean squared error (MSE).
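The sparse-matrix selection described above might be implemented as follows (a sketch under assumptions of my own: invalid or duplicate columns are redrawn rather than the whole matrix being rejected, and ρ is taken as the plain Hamming distance over the three symbol values):

```python
import numpy as np

def sparse_random_matrix(k, n_candidates=10000, seed=0):
    """Generate n_candidates random sparse code matrices with
    ceil(15 * log2(k)) columns (entries 0 w.p. 1/2, -1 or +1 w.p. 1/4 each)
    and keep the one whose smallest pairwise row Hamming distance rho
    is largest."""
    rng = np.random.default_rng(seed)
    l = int(np.ceil(15 * np.log2(k)))

    def random_column():
        # redraw until the column has at least one +1 and one -1
        while True:
            col = rng.choice([-1, 0, 0, 1], size=k)
            if (col == 1).any() and (col == -1).any():
                return col

    best, best_rho = None, -1
    for _ in range(n_candidates):
        cols, seen = [], set()
        while len(cols) < l:                  # no two identical columns
            col = random_column()
            if tuple(col) not in seen:
                seen.add(tuple(col))
                cols.append(col)
        M = np.stack(cols, axis=1)
        rho = min((M[i] != M[j]).sum()        # smallest row-pair distance
                  for i in range(k) for j in range(i + 1, k))
        if rho > best_rho:
            best, best_rho = M, rho
    return best
```

With the default n_candidates this matches the 10,000-matrix selection budget reported in the text.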
The mean squared error is an adequate metric for assessing the accuracy of probability estimates [Zadrozny and Elkan, 2001]. This metric cannot be applied to the loss-based decoding method, since it does not produce probability estimates.

Table 2 shows the results of the experiments on the satimage dataset for each type of code matrix. As a baseline for comparison, we also show the results of applying multiclass naive Bayes to this dataset. We can see that the iterative Hastie-Tibshirani procedure (and its extension to arbitrary code matrices) succeeds in lowering the MSE significantly compared to the non-iterative estimates, which indicates that it produces probability estimates that are more accurate. In terms of error rate, the differences between methods are small. For one-against-all matrices, the iterative method performs consistently worse, while for sparse random matrices, it performs consistently better. Figure 1 shows how the MSE is lowered at each iteration of the Hastie-Tibshirani algorithm, for the three types of code matrices.

Table 3 shows the results of the same experiments on the pendigits and soybean datasets. Again, the MSE is significantly lowered by the iterative procedure, in all cases. For the soybean dataset, using the sparse random matrix, the iterative method again has a lower error rate than the other methods, which is even lower than the error rate using the all-pairs matrix. This is an interesting result, since in this case the all-pairs matrix has 171 columns (corresponding to 171 classifiers), while the sparse matrix has only 64 columns.
[Figure 1: Convergence of the MSE for the satimage dataset, for the all-pairs, one-against-all and sparse code matrices (MSE versus iteration).]

[Table 3: Test set results (error rate and MSE) on the pendigits and soybean datasets, for the same methods and code matrices as in Table 2; the numeric entries were not preserved.]

6 Conclusions

We have presented a method for producing class membership probability estimates for multiclass problems, given probability estimates for a series of binary problems determined by an arbitrary code matrix. Since research in designing optimal code matrices is still ongoing [Utschick and Weichselberger, 2001] [Crammer and Singer, 2000], it is important to be able to obtain class membership probability estimates from arbitrary code matrices.

In current research, the effectiveness of a code matrix is determined primarily by the classification accuracy. However, since many applications require accurate class membership probability estimates for each of the classes, it is important to also compare the different types of code matrices according to their ability to produce such estimates. Our extension of Hastie and Tibshirani's method is useful for this purpose.

Our method relies on the probability estimates given by the binary classifiers to produce the multiclass probability estimates. However, the probability estimates produced by boosted
naive Bayes are not calibrated probability estimates. An interesting direction for future work is to determine whether calibrating the probability estimates given by the binary classifiers improves the calibration of the multiclass probabilities.

References

[Allwein et al., 2000] Allwein, E. L., Schapire, R. E., and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1.

[Blake and Merz, 1998] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Department of Information and Computer Sciences, University of California, Irvine. mlearn/mlrepository.html.

[Bradley and Terry, 1952] Bradley, R. and Terry, M. (1952). Rank analysis of incomplete block designs, I: The method of paired comparisons. Biometrika.

[Crammer and Singer, 2000] Crammer, K. and Singer, Y. (2000). On the learnability and design of output codes for multiclass problems. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.

[Dietterich and Bakiri, 1995] Dietterich, T. G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2.

[Elkan, 1997] Elkan, C. (1997). Boosting and naive Bayesian learning. Technical Report CS97-557, University of California, San Diego.

[Hastie and Tibshirani, 1998] Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling. In Advances in Neural Information Processing Systems, volume 10. MIT Press.

[Utschick and Weichselberger, 2001] Utschick, W. and Weichselberger, W. (2001). Stochastic organization of output codes in multiclass learning problems. Neural Computation, 13(5).

[Zadrozny and Elkan, 2001a] Zadrozny, B. and Elkan, C. (2001a). Learning and making decisions when costs and probabilities are both unknown. In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining. ACM Press.
[Zadrozny and Elkan, 2001] Zadrozny, B. and Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers, Inc.
More informationA Game Theoretical Framework for Adversarial Learning
A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationUsing the Singular Value Decomposition
Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces
More informationCS 6220: Data Mining Techniques Course Project Description
CS 6220: Data Mining Techniques Course Project Description College of Computer and Information Science Northeastern University Spring 2013 General Goal In this project, you will have an opportunity to
More informationComparison of Kmeans and Backpropagation Data Mining Algorithms
Comparison of Kmeans and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationSummary/Review of Matrix Algebra. Matrix Algebra. Table Structure of Data
Summary/Review of Matrix Algera Introduction to Matrices Descriptors & ojects, Linear algera, Order Association Matrices R and Qmode Special Matrices Trace, Diagonal, Identity, Scalars, Transpose Vectors
More informationLinear Systems. Singular and Nonsingular Matrices. Find x 1, x 2, x 3 such that the following three equations hold:
Linear Systems Example: Find x, x, x such that the following three equations hold: x + x + x = 4x + x + x = x + x + x = 6 We can write this using matrixvector notation as 4 {{ A x x x {{ x = 6 {{ b General
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationDATA ANALYTICS USING R
DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data
More informationWhich Is the Best Multiclass SVM Method? An Empirical Study
Which Is the Best Multiclass SVM Method? An Empirical Study KaiBo Duan 1 and S. Sathiya Keerthi 2 1 BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 askbduan@ntu.edu.sg
More informationUsing Random Forest to Learn Imbalanced Data
Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,
More information10.1 Systems of Linear Equations: Substitution and Elimination
726 CHAPTER 10 Systems of Equations and Inequalities 10.1 Systems of Linear Equations: Sustitution and Elimination PREPARING FOR THIS SECTION Before getting started, review the following: Linear Equations
More informationA Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
More informationA Direct Numerical Method for Observability Analysis
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL 15, NO 2, MAY 2000 625 A Direct Numerical Method for Observability Analysis Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper presents an algebraic method
More informationOnesided Support Vector Regression for Multiclass Costsensitive Classification
Onesided Support Vector Regression for Multiclass Costsensitive Classification HanHsing Tu r96139@csie.ntu.edu.tw HsuanTien Lin htlin@csie.ntu.edu.tw Department of Computer Science and Information
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationCombining SVM classifiers for email antispam filtering
Combining SVM classifiers for email antispam filtering Ángela Blanco Manuel MartínMerino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing Email Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 22773878, Volume1, Issue6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationData Mining  Evaluation of Classifiers
Data Mining  Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationBenchmarking OpenSource Tree Learners in R/RWeka
Benchmarking OpenSource Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems
More informationAdvanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
More informationSoftware Reliability Measuring using Modified Maximum Likelihood Estimation and SPC
Software Reliaility Measuring using Modified Maximum Likelihood Estimation and SPC Dr. R Satya Prasad Associate Prof, Dept. of CSE Acharya Nagarjuna University Guntur, INDIA K Ramchand H Rao Dept. of CSE
More informationDomain Adaptation meets Active Learning
Domain Adaptation meets Active Learning Piyush Rai, Avishek Saha, Hal Daumé III, and Suresh Venkatasuramanian School of Computing, University of Utah Salt Lake City, UT 84112 {piyush,avishek,hal,suresh}@cs.utah.edu
More informationApplied Data Mining Analysis: A StepbyStep Introduction Using RealWorld Data Sets
Applied Data Mining Analysis: A StepbyStep Introduction Using RealWorld Data Sets http://info.salfordsystems.com/jsm2015ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationA Binary Recursive Gcd Algorithm
A Binary Recursive Gcd Algorithm Damien Stehlé and Paul Zimmermann LORIA/INRIA Lorraine, 615 rue du jardin otanique, BP 101, F5460 VillerslèsNancy, France, {stehle,zimmerma}@loria.fr Astract. The inary
More informationCounting Primes whose Sum of Digits is Prime
2 3 47 6 23 Journal of Integer Sequences, Vol. 5 (202), Article 2.2.2 Counting Primes whose Sum of Digits is Prime Glyn Harman Department of Mathematics Royal Holloway, University of London Egham Surrey
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 20150305
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 20150305 Roman Kern (KTI, TU Graz) Ensemble Methods 20150305 1 / 38 Outline 1 Introduction 2 Classification
More informationIn Defense of OneVsAll Classification
Journal of Machine Learning Research 5 (2004) 101141 Submitted 4/03; Revised 8/03; Published 1/04 In Defense of OneVsAll Classification Ryan Rifkin Honda Research Institute USA 145 Tremont Street Boston,
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK  michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationA Hybrid Algorithm for Solving the Absolute Value Equation
A Hybrid Algorithm for Solving the Absolute Value Equation Olvi L. Mangasarian Abstract We propose a hybrid algorithm for solving the NPhard absolute value equation (AVE): Ax x = b, where A is an n n
More informationGeneralized Inverse Computation Based on an Orthogonal Decomposition Methodology.
International Conference on Mathematical and Statistical Modeling in Honor of Enrique Castillo. June 2830, 2006 Generalized Inverse Computation Based on an Orthogonal Decomposition Methodology. Patricia
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationTOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM ThanhNghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.unisb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationMinimizing Probing Cost and Achieving Identifiability in Network Link Monitoring
Minimizing Proing Cost and Achieving Identifiaility in Network Link Monitoring Qiang Zheng and Guohong Cao Department of Computer Science and Engineering The Pennsylvania State University Email: {quz3,
More informationLinear Dependence Tests
Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks
More informationData Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
More informationApplied Mathematical Sciences, Vol. 7, 2013, no. 112, 55915597 HIKARI Ltd, www.mhikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 55915597 HIKARI Ltd, www.mhikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationBuilding Ensembles of Neural Networks with Classswitching
Building Ensembles of Neural Networks with Classswitching Gonzalo MartínezMuñoz, Aitor SánchezMartínez, Daniel HernándezLobato and Alberto Suárez Universidad Autónoma de Madrid, Avenida Francisco Tomás
More informationSUPPORT VECTOR MACHINE (SVM) is the optimal
130 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 Multiclass Posterior Probability Support Vector Machines Mehmet Gönen, Ayşe Gönül Tanuğur, and Ethem Alpaydın, Senior Member, IEEE
More information7 Gaussian Elimination and LU Factorization
7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method
More information5 Double Integrals over Rectangular Regions
Chapter 7 Section 5 Doule Integrals over Rectangular Regions 569 5 Doule Integrals over Rectangular Regions In Prolems 5 through 53, use the method of Lagrange multipliers to find the indicated maximum
More informationCLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
More informationClustering in Machine Learning. By: Ibrar Hussain Student ID:
Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning Kmeans Clustering Example of kmeans
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University Email: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More information*corresponding author
Key Engineering Materials Vol. 588 (2014) pp 249256 Online availale since 2013/Oct/11 at www.scientific.net (2014) Trans Tech Pulications, Switzerland doi:10.4028/www.scientific.net/kem.588.249 Remote
More information
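The code-matrix formulation above can be made concrete with a small sketch. The two standard decompositions correspond to specific matrices M with entries in {-1, 0, +1}: one-against-all uses a k x k matrix with +1 on the diagonal and -1 elsewhere, while all-pairs uses one column per unordered pair of classes. The function names below are ours, for illustration only.

```python
from itertools import combinations

def one_vs_all_matrix(k):
    """Code matrix for one-against-all: k classes, k binary problems.
    M[c][b] is +1 if class c is positive for binary problem b,
    and -1 otherwise (every example is used in every problem)."""
    return [[1 if c == b else -1 for b in range(k)] for c in range(k)]

def all_pairs_matrix(k):
    """Code matrix for all-pairs: one binary problem per pair (i, j).
    Class i is positive (+1), class j negative (-1), and all other
    classes are unused (0) for that problem."""
    pairs = list(combinations(range(k), 2))
    M = [[0] * len(pairs) for _ in range(k)]
    for b, (i, j) in enumerate(pairs):
        M[i][b] = 1
        M[j][b] = -1
    return M
```

For k = 3 classes, `one_vs_all_matrix(3)` yields 3 binary problems and `all_pairs_matrix(3)` yields C(3, 2) = 3 problems; each all-pairs column contains exactly one +1, one -1, and zeros for the classes whose examples are ignored.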