Reducing multiclass to binary by coupling probability estimates
Bianca Zadrozny
Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA
zadrozny@cs.ucsd.edu

Abstract

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension to arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with boosted naive Bayes show that our method produces calibrated class membership probability estimates, while having classification accuracy similar to that of loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

1 Introduction

The two most well-known approaches for reducing a multiclass classification problem to a set of binary classification problems are one-against-all and all-pairs. In the one-against-all approach, we train a classifier for each of the classes, using as positive examples the training examples that belong to that class and as negatives all the other training examples. In the all-pairs approach, we train a classifier for each possible pair of classes, ignoring the examples that do not belong to the two classes in question.

Although these two approaches are the most obvious, Allwein et al. [Allwein et al., 2000] have shown that there are many other ways in which a multiclass problem can be decomposed into a number of binary classification problems. We can represent each such decomposition by a k-by-l code matrix M, where k is the number of classes and l is the number of binary classification problems. If M(c, b) = +1, the examples belonging to class c are considered positive examples for binary problem b. Similarly, if M(c, b) = -1, the examples belonging to c are considered negative examples for b.
Finally, if M(c, b) = 0, the examples belonging to c are not used in training a classifier for b. For example, in the 3-class case, the all-pairs code matrix is

         b1   b2   b3
    c1   +1   +1    0
    c2   -1    0   +1
    c3    0   -1   -1

This approach for representing the decomposition of a multiclass problem into binary problems is a generalization of the Error-Correcting Output Codes (ECOC) scheme proposed by Dietterich and Bakiri [Dietterich and Bakiri, 1995]. The ECOC scheme does not allow zeros in the code matrix, meaning that all examples are used in each binary classification problem.

Orthogonal to the problem of choosing a code matrix for reducing multiclass to binary is the problem of classifying an example given the labels assigned by each binary classifier. Given an example x, Allwein et al. [Allwein et al., 2000] first create a vector v of length l containing the -1/+1 labels assigned to x by each binary classifier. They then compute the Hamming distance between v and each row of M, and find the row c that is closest to v according to this metric. The label c is then assigned to x. This method is called Hamming decoding. For the case in which the binary classifiers output a score whose magnitude is a measure of confidence in the prediction, they use instead a decoding approach that takes the scores into account when calculating the distance between v and each row of M. This method is called loss-based decoding. Allwein et al. present theoretical and experimental results indicating that loss-based decoding is better than Hamming decoding.

However, both of these methods simply assign a class label to each example; they do not output class membership probability estimates P̂(C = c | X = x) for an example x. These probability estimates are important when the classification outputs are not used in isolation and must be combined with other sources of information, such as misclassification costs [Zadrozny and Elkan, 2001a] or the outputs of another classifier.

Given a code matrix M and a binary classification learning algorithm that outputs probability estimates, we would like to couple the estimates given by each binary classifier in order to obtain class membership probability estimates for the multiclass problem.
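To make the two decoding schemes concrete, here is a small sketch (our own illustration, not from the paper) of the 3-class all-pairs code matrix together with Hamming and loss-based decoding; the convention of counting a distance of 1/2 for zero entries and the exponential loss L(y) = e^{-y} follow Allwein et al.:

```python
import math

# All-pairs code matrix for k = 3 classes: rows are classes, columns are
# binary problems; +1 = positive, -1 = negative, 0 = class not used.
M = [
    [+1, +1,  0],   # class 0: positive in (0 vs 1) and (0 vs 2)
    [-1,  0, +1],   # class 1: negative in (0 vs 1), positive in (1 vs 2)
    [ 0, -1, -1],   # class 2: negative in (0 vs 2) and (1 vs 2)
]

def hamming_decode(labels, M):
    """labels: one -1/+1 prediction per binary classifier.
    Picks the row of M closest in Hamming distance, counting 1/2
    for zero entries."""
    def d(c):
        return sum(0.5 if m == 0 else (0.0 if m == y else 1.0)
                   for m, y in zip(M[c], labels))
    return min(range(len(M)), key=d)

def loss_based_decode(scores, M, L=lambda y: math.exp(-y)):
    """scores: one real-valued margin score f_b(x) per binary classifier.
    Picks the row c minimizing sum_b L(M[c][b] * f_b(x))."""
    def d(c):
        return sum(L(m * f) for m, f in zip(M[c], scores))
    return min(range(len(M)), key=d)
```

For example, with labels [+1, +1, -1], Hamming decoding returns class 0, since row 0 disagrees with none of the nonzero entries and pays only 1/2 for its single zero entry.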
Hastie and Tibshirani [Hastie and Tibshirani, 1998] describe a solution for obtaining probability estimates P̂(C = c | X = x) in the all-pairs case by coupling the pairwise probability estimates, which we describe in Section 2. In Section 3, we extend the method to arbitrary code matrices. In Section 4 we discuss the loss-based decoding approach in more detail and compare it mathematically to the method by Hastie and Tibshirani. In Section 5 we present experimental results.

2 Coupling pairwise probability estimates

We are given pairwise probability estimates r_ij(x) for every pair of classes i ≠ j, obtained by training a classifier using the examples belonging to class i as positives and the examples belonging to class j as negatives. We would like to couple these estimates to obtain a set of class membership probabilities p_i(x) = P(C = i | X = x) for each example x. The r_ij are related to the p_i according to

    r_ij(x) = P(C = i | C = i or C = j, X = x) = p_i(x) / (p_i(x) + p_j(x))

Since we additionally require that Σ_i p_i(x) = 1, there are k - 1 free parameters and k(k-1)/2 constraints. This implies that there may not exist p_i satisfying all these constraints.

Let n_ij be the number of training examples used to train the binary classifier that predicts r_ij. In order to find the best approximation r̂_ij(x) = p̂_i(x) / (p̂_i(x) + p̂_j(x)), Hastie and Tibshirani fit the Bradley-Terry model for paired comparisons [Bradley and Terry, 1952] by minimizing the average weighted Kullback-Leibler distance l(x) between r_ij(x) and
r̂_ij(x) for each x, given by

    l(x) = Σ_{i<j} n_ij [ r_ij(x) log( r_ij(x) / r̂_ij(x) ) + (1 - r_ij(x)) log( (1 - r_ij(x)) / (1 - r̂_ij(x)) ) ]

The algorithm is as follows:

1. Start with some guess for the p̂_i(x) and corresponding r̂_ij(x).
2. Repeat until convergence:
   (a) For each i = 1, 2, ..., k:

           p̂_i(x) ← p̂_i(x) · [ Σ_{j≠i} n_ij r_ij(x) ] / [ Σ_{j≠i} n_ij r̂_ij(x) ]

   (b) Renormalize the p̂_i(x).
   (c) Recompute the r̂_ij(x).

Hastie and Tibshirani [Hastie and Tibshirani, 1998] prove that the Kullback-Leibler distance between r_ij(x) and r̂_ij(x) decreases at each step. Since this distance is bounded below by zero, the algorithm converges. At convergence, the r̂_ij are consistent with the p̂_i. The class predicted for each example x is ĉ(x) = argmax_i p̂_i(x).

Hastie and Tibshirani also prove that the p̂_i(x) are in the same order as the non-iterative estimates p̃_i(x) ∝ Σ_{j≠i} r_ij(x) for each x. Thus, the p̃_i(x) are sufficient for predicting the most likely class for each example. However, as shown by Hastie and Tibshirani, they are not accurate probability estimates, because they tend to underestimate the differences between the p̂_i(x) values.

3 Extending the Hastie-Tibshirani method to arbitrary code matrices

For an arbitrary code matrix M, instead of having pairwise probability estimates, we have an estimate r_b(x) for each column b of M, such that

    r_b(x) = P(C ∈ I_b | C ∈ I_b ∪ J_b, X = x) = Σ_{c ∈ I_b} p_c(x) / Σ_{c ∈ I_b ∪ J_b} p_c(x)

where I_b and J_b are the sets of classes for which M(c, b) = +1 and M(c, b) = -1, respectively. We would like to obtain a set of class membership probabilities p_i(x) for each example x compatible with the r_b(x) and subject to Σ_i p_i(x) = 1. In this case, the number of free parameters is k - 1 and the number of constraints is l, where l is the number of columns of the code matrix. Since for most code matrices l is greater than k - 1, in general there is no exact solution to this problem.
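For reference, the pairwise coupling procedure just described can be sketched as follows (the function names and the fixed iteration count are ours, and r̂ is refreshed once per sweep rather than after every single update, a common variant):

```python
def pairwise_couple(r, n, k, iters=200):
    """Couple pairwise estimates r[(i, j)] ~ p_i / (p_i + p_j), i < j,
    into class probabilities. n[(i, j)] is the number of training
    examples of the (i, j) classifier. A fixed iteration count stands
    in for a convergence test in this sketch."""
    def r_ij(i, j):
        return r[(i, j)] if i < j else 1.0 - r[(j, i)]
    def n_ij(i, j):
        return n[(i, j)] if i < j else n[(j, i)]
    p = [1.0 / k] * k                              # initial guess
    for _ in range(iters):
        rhat = {(i, j): p[i] / (p[i] + p[j])
                for i in range(k) for j in range(k) if i != j}
        for i in range(k):
            num = sum(n_ij(i, j) * r_ij(i, j) for j in range(k) if j != i)
            den = sum(n_ij(i, j) * rhat[(i, j)] for j in range(k) if j != i)
            p[i] *= num / den                      # multiplicative update
        s = sum(p)                                 # renormalize
        p = [pi / s for pi in p]
    return p
```

When the r_ij are consistent (i.e., derived from some true p), that p is a fixed point of the update and the iteration converges to it; otherwise it converges to the minimizer of the weighted Kullback-Leibler distance.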
For this reason, we propose an algorithm analogous to the Hastie-Tibshirani method presented in the previous section to find the best approximate probability estimates p̂_i(x) such that

    r̂_b(x) = Σ_{c ∈ I_b} p̂_c(x) / Σ_{c ∈ I_b ∪ J_b} p̂_c(x)

and the Kullback-Leibler distance between r̂_b(x) and r_b(x) is minimized. Let n_b be the number of training examples used to train the binary classifier that corresponds to column b of the code matrix. The algorithm is as follows:

1. Start with some guess for the p̂_i(x) and corresponding r̂_b(x).
2. Repeat until convergence:
   (a) For each i = 1, 2, ..., k:

           p̂_i(x) ← p̂_i(x) · [ Σ_{b: M(i,b)=+1} n_b r_b(x) + Σ_{b: M(i,b)=-1} n_b (1 - r_b(x)) ] / [ Σ_{b: M(i,b)=+1} n_b r̂_b(x) + Σ_{b: M(i,b)=-1} n_b (1 - r̂_b(x)) ]

   (b) Renormalize the p̂_i(x).
   (c) Recompute the r̂_b(x).

If the code matrix is the all-pairs matrix, this algorithm reduces to the original method by Hastie and Tibshirani.

Let B+_i be the set of matrix columns for which M(i, b) = +1 and B-_i be the set of matrix columns for which M(i, b) = -1. By analogy with the non-iterative estimates suggested by Hastie and Tibshirani, we can define non-iterative estimates

    p̃_i(x) = [ Σ_{b ∈ B+_i} r_b(x) + Σ_{b ∈ B-_i} (1 - r_b(x)) ] / ( |B+_i| + |B-_i| )

For the all-pairs code matrix, these estimates are equivalent to the ones suggested by Hastie and Tibshirani. However, for arbitrary matrices, we cannot prove that the non-iterative estimates predict the same class as the iterative estimates.

4 Loss-based decoding

In this section, we discuss how to apply the loss-based decoding method to classifiers that output class membership probability estimates. We also study the conditions under which this method predicts the same class as the Hastie-Tibshirani method in the all-pairs case.

The loss-based decoding method [Allwein et al., 2000] requires that each binary classifier output a margin score satisfying two requirements. First, the score should be positive if the example is classified as positive, and negative if the example is classified as negative. Second, the magnitude of the score should be a measure of confidence in the prediction. The method works as follows. Let f_b(x) be the margin score predicted by the classifier corresponding to column b of the code matrix for example x. For each row c of the code matrix M and for each example x, we compute the distance between the scores and row c as

    d_L(x, c) = Σ_{b=1}^{l} L( M(c, b) · f_b(x) )        (1)

where L is a loss function that depends on the nature of the binary classifier and M(c, b) = -1, 0 or +1. We then label each example x with the label c for which d_L(x, c) is minimized.
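The extended coupling algorithm of Section 3 can be sketched as below (an illustrative implementation under our own naming, with a fixed iteration count standing in for a convergence test):

```python
def couple_code_matrix(r, n, M, iters=200):
    """r[b]: probability estimate of the binary classifier for column b;
    n[b]: its number of training examples; M: k-by-l code matrix with
    entries in {-1, 0, +1}. Returns coupled class probabilities."""
    k, l = len(M), len(M[0])
    p = [1.0 / k] * k
    for _ in range(iters):
        # r-hat_b = (sum of p over positive classes of b) /
        #           (sum of p over all classes used by b)
        rhat = []
        for b in range(l):
            pos = sum(p[c] for c in range(k) if M[c][b] == +1)
            neg = sum(p[c] for c in range(k) if M[c][b] == -1)
            rhat.append(pos / (pos + neg))
        for i in range(k):
            num = sum(n[b] * (r[b] if M[i][b] == +1 else 1.0 - r[b])
                      for b in range(l) if M[i][b] != 0)
            den = sum(n[b] * (rhat[b] if M[i][b] == +1 else 1.0 - rhat[b])
                      for b in range(l) if M[i][b] != 0)
            p[i] *= num / den
        s = sum(p)
        p = [pi / s for pi in p]
    return p
```

On the all-pairs matrix this reduces to the pairwise procedure: for the 3-class all-pairs matrix with column estimates consistent with some underlying class probabilities, it recovers those probabilities.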
If the binary classification learning algorithm outputs scores that are probability estimates, these do not satisfy the first requirement, because probability estimates are all between 0 and 1. However, we can transform the probability estimate r_b(x) output by each classifier into a margin score by subtracting 1/2 from it, so that we consider as positive the examples x for which r_b(x) is above 1/2, and as negative the examples x for which r_b(x) is below 1/2.

We now prove a theorem that relates the loss-based decoding method to the Hastie-Tibshirani method for a particular class of loss functions.

Theorem 1 The loss-based decoding method for all-pairs code matrices predicts the same class label as the iterative estimates p̂_i(x) given by Hastie and Tibshirani, if the loss function is of the form L(y) = -ay, for any a > 0.

Proof: We first show that, if the loss function is of the form L(y) = -ay, the loss-based decoding method predicts the same class label as the non-iterative estimates p̃_i(x), for the all-pairs code matrix.
The non-iterative estimates p̃_c(x) are given by

    p̃_c(x) = [ Σ_{b ∈ B+_c} r_b(x) + Σ_{b ∈ B-_c} (1 - r_b(x)) ] / ( |B+_c| + |B-_c| )

where B+_c and B-_c are the sets of matrix columns for which M(c, b) = +1 and M(c, b) = -1, respectively. Considering that L(y) = -ay and f_b(x) = r_b(x) - 1/2, and eliminating the terms for which M(c, b) = 0, we can rewrite Equation 1 as

    d(x, c) = -a Σ_{b ∈ B+_c} ( r_b(x) - 1/2 ) - a Σ_{b ∈ B-_c} ( 1/2 - r_b(x) )
            = -a [ Σ_{b ∈ B+_c} r_b(x) + Σ_{b ∈ B-_c} ( 1 - r_b(x) ) ] + a ( |B+_c| + |B-_c| ) / 2

For the all-pairs code matrix the following relationship holds: |B+_c| + |B-_c| = k - 1, where k is the number of classes. So the distance d(x, c) is

    d(x, c) = -a [ Σ_{b ∈ B+_c} r_b(x) + Σ_{b ∈ B-_c} ( 1 - r_b(x) ) ] + a (k - 1) / 2

It is now easy to see that the class c that minimizes d(x, c) for example x also maximizes p̃_c(x). Furthermore, if d(x, i) < d(x, j) then p̃_i(x) > p̃_j(x), which means that the ranking of the classes for each example is the same. Since the non-iterative estimates p̃_c(x) are in the same order as the iterative estimates p̂_c(x), we can conclude that the Hastie-Tibshirani method is equivalent to the loss-based decoding method with L(y) = -ay, in terms of class prediction, for the all-pairs code matrix.

Allwein et al. do not consider loss functions of the form L(y) = -ay, and use instead non-linear loss functions such as L(y) = e^{-y}. In this case, the class predicted by loss-based decoding may differ from the one predicted by the method of Hastie and Tibshirani.

This theorem applies only to the all-pairs code matrix. For other matrices in which |B+_c| + |B-_c| is the same for every class c (such as the one-against-all matrix), we can prove that loss-based decoding (with L(y) = -ay) predicts the same class as the non-iterative estimates. However, in this case, the non-iterative estimates do not necessarily predict the same class as the iterative ones.

5 Experiments

We performed experiments using the following multiclass datasets from the UCI Machine Learning Repository [Blake and Merz, 1998]: satimage, pendigits and soybean. Table 1 summarizes the characteristics of each dataset.

    Dataset      #Training Examples   #Test Examples   #Attributes   #Classes
    satimage
    pendigits
    soybean

    Table 1: Characteristics of the datasets used in the experiments.
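The equivalence stated in Theorem 1 can be checked numerically; the sketch below (our own illustration) compares linear-loss decoding against the argmax of the non-iterative estimates on the 3-class all-pairs matrix:

```python
def non_iterative(r, M):
    """p~_c proportional to the sum of r_b over columns where class c is
    positive plus (1 - r_b) over columns where it is negative."""
    k, l = len(M), len(M[0])
    scores = [sum(r[b] if M[c][b] == +1 else 1.0 - r[b]
                  for b in range(l) if M[c][b] != 0)
              for c in range(k)]
    total = sum(scores)
    return [s / total for s in scores]

def linear_loss_decode(r, M, a=1.0):
    """Loss-based decoding with margin f_b = r_b - 1/2 and L(y) = -a*y."""
    k, l = len(M), len(M[0])
    def d(c):
        return sum(-a * M[c][b] * (r[b] - 0.5) for b in range(l))
    return min(range(k), key=d)

M = [[+1, +1, 0], [-1, 0, +1], [0, -1, -1]]   # all-pairs, k = 3
```

For any vector of pairwise probability estimates r, both procedures pick the same class, as the theorem asserts for all-pairs matrices.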
The binary learning algorithm used in the experiments is boosted naive Bayes [Elkan, 1997], since this is a method that cannot be easily extended to handle multiclass problems directly. For all the experiments, we ran 10 rounds of boosting.
    Method                                       Code Matrix        Error Rate   MSE
    Loss-based (L(y) = -y)                       All-pairs
    Loss-based (L(y) = e^{-y})                   All-pairs
    Hastie-Tibshirani (non-iterative)            All-pairs
    Hastie-Tibshirani (iterative)                All-pairs
    Loss-based (L(y) = -y)                       One-against-all
    Loss-based (L(y) = e^{-y})                   One-against-all
    Extended Hastie-Tibshirani (non-iterative)   One-against-all
    Extended Hastie-Tibshirani (iterative)       One-against-all
    Loss-based (L(y) = -y)                       Sparse
    Loss-based (L(y) = e^{-y})                   Sparse
    Extended Hastie-Tibshirani (non-iterative)   Sparse
    Extended Hastie-Tibshirani (iterative)       Sparse
    Multiclass Naive Bayes

    Table 2: Test set results on the satimage dataset.

We use three different code matrices for each dataset: all-pairs, one-against-all and a sparse random matrix. The sparse random matrices have ⌈15 log2(k)⌉ columns, and each element is 0 with probability 1/2 and -1 or +1 with probability 1/4 each. This is the same type of sparse random matrix used by Allwein et al. [Allwein et al., 2000]. In order to have good error-correcting properties, the Hamming distance ρ between each pair of rows in the matrix must be large. We select the matrix by generating 10,000 random matrices and choosing the one for which ρ is maximized, checking that each column has at least one +1 and one -1, and that the matrix does not have two identical columns.

We evaluate the performance of each method using two metrics. The first metric is the error rate obtained when we assign each example to the most likely class predicted by the method. This metric is sufficient if we are only interested in classifying the examples correctly and do not need accurate class membership probability estimates. The second metric is squared error, defined for one example x as

    SE(x) = Σ_j ( t_j(x) - p_j(x) )^2

where p_j(x) is the probability estimated by the method for example x and class j, and t_j(x) is the true probability of class j for x. Since for most real-world datasets true labels are known, but not class probabilities, t_j(x) is defined to be 1 if the label of x is j and 0 otherwise.
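The sparse-matrix selection procedure can be sketched as follows (function names are ours, and we simplify the row distance to a plain entry-wise Hamming count, including zero entries):

```python
import math
import random

def random_sparse_matrix(k, l, rng):
    """Candidate matrix: each entry is 0 with probability 1/2,
    -1 or +1 with probability 1/4 each."""
    return [[rng.choice((0, 0, -1, +1)) for _ in range(l)] for _ in range(k)]

def valid(M):
    """Every column needs at least one +1 and one -1, and no two
    columns may be identical."""
    cols = list(zip(*M))
    return (all(+1 in col and -1 in col for col in cols)
            and len(set(cols)) == len(cols))

def min_row_distance(M):
    """rho: the smallest Hamming distance between any pair of rows."""
    k = len(M)
    return min(sum(a != b for a, b in zip(M[i], M[j]))
               for i in range(k) for j in range(i + 1, k))

def select_sparse_matrix(k, tries=10000, seed=0):
    """Draw `tries` candidates with ceil(15 * log2(k)) columns and keep
    the valid one whose minimum pairwise row distance rho is largest."""
    rng = random.Random(seed)
    l = math.ceil(15 * math.log2(k))
    best, best_rho = None, -1
    for _ in range(tries):
        M = random_sparse_matrix(k, l, rng)
        if valid(M):
            rho = min_row_distance(M)
            if rho > best_rho:
                best, best_rho = M, rho
    return best
```

Note that for small k a candidate in which every column contains both signs is a rare draw, so many candidates are rejected; for k = 19 (soybean) the matrix has ⌈15 log2(19)⌉ = 64 columns, matching the column count reported for the sparse matrix below.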
We average the squared error over the test examples to obtain the mean squared error (MSE). The mean squared error is an adequate metric for assessing the accuracy of probability estimates [Zadrozny and Elkan, 2001]. This metric cannot be applied to the loss-based decoding method, since it does not produce probability estimates.

Table 2 shows the results of the experiments on the satimage dataset for each type of code matrix. As a baseline for comparison, we also show the results of applying multiclass naive Bayes to this dataset. We can see that the iterative Hastie-Tibshirani procedure (and its extension to arbitrary code matrices) lowers the MSE significantly compared to the non-iterative estimates, which indicates that it produces more accurate probability estimates. In terms of error rate, the differences between methods are small. For one-against-all matrices, the iterative method performs consistently worse, while for sparse random matrices it performs consistently better. Figure 1 shows how the MSE is lowered at each iteration of the Hastie-Tibshirani algorithm, for the three types of code matrices.

Table 3 shows the results of the same experiments on the pendigits and soybean datasets. Again, the MSE is significantly lowered by the iterative procedure in all cases. For the soybean dataset, using the sparse random matrix, the iterative method again has a lower error rate than the other methods, which is even lower than the error rate using the all-pairs matrix. This is an interesting result, since in this case the all-pairs matrix has 171 columns (corresponding to 171 classifiers), while the sparse matrix has only 64 columns.
Figure 1: Convergence of the MSE for the satimage dataset (MSE versus iteration, for the all-pairs, one-against-all and sparse code matrices).

                                                          pendigits          soybean
    Method                             Code Matrix        Error Rate   MSE   Error Rate   MSE
    Loss-based (L(y) = -y)             All-pairs
    Loss-based (L(y) = e^{-y})         All-pairs
    Hastie-Tibshirani (non-iterative)  All-pairs
    Hastie-Tibshirani (iterative)      All-pairs
    Loss-based (L(y) = -y)             One-against-all
    Loss-based (L(y) = e^{-y})         One-against-all
    Ext. Hastie-Tibshirani (non-it.)   One-against-all
    Ext. Hastie-Tibshirani (it.)       One-against-all
    Loss-based (L(y) = -y)             Sparse
    Loss-based (L(y) = e^{-y})         Sparse
    Ext. Hastie-Tibshirani (non-it.)   Sparse
    Ext. Hastie-Tibshirani (it.)       Sparse
    Multiclass Naive Bayes

    Table 3: Test set results on the pendigits and soybean datasets.

6 Conclusions

We have presented a method for producing class membership probability estimates for multiclass problems, given probability estimates for a series of binary problems determined by an arbitrary code matrix. Since research in designing optimal code matrices is ongoing [Utschick and Weichselberger, 2001] [Crammer and Singer, 2000], it is important to be able to obtain class membership probability estimates from arbitrary code matrices.

In current research, the effectiveness of a code matrix is determined primarily by classification accuracy. However, since many applications require accurate class membership probability estimates for each of the classes, it is important to also compare the different types of code matrices according to their ability to produce such estimates. Our extension of Hastie and Tibshirani's method is useful for this purpose.

Our method relies on the probability estimates given by the binary classifiers to produce the multiclass probability estimates. However, the probability estimates produced by Boosted
Naive Bayes are not calibrated probability estimates. An interesting direction for future work is to determine whether calibrating the probability estimates given by the binary classifiers improves the calibration of the multiclass probabilities.

References

[Allwein et al., 2000] Allwein, E. L., Schapire, R. E., and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1.

[Blake and Merz, 1998] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Department of Information and Computer Sciences, University of California, Irvine. mlearn/mlrepository.html.

[Bradley and Terry, 1952] Bradley, R. and Terry, M. (1952). Rank analysis of incomplete block designs, I: The method of paired comparisons. Biometrika.

[Crammer and Singer, 2000] Crammer, K. and Singer, Y. (2000). On the learnability and design of output codes for multiclass problems. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.

[Dietterich and Bakiri, 1995] Dietterich, T. G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2.

[Elkan, 1997] Elkan, C. (1997). Boosting and naive Bayesian learning. Technical Report CS97-557, University of California, San Diego.

[Hastie and Tibshirani, 1998] Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling. In Advances in Neural Information Processing Systems, volume 10. MIT Press.

[Utschick and Weichselberger, 2001] Utschick, W. and Weichselberger, W. (2001). Stochastic organization of output codes in multiclass learning problems. Neural Computation, 13(5).

[Zadrozny and Elkan, 2001a] Zadrozny, B. and Elkan, C. (2001a). Learning and making decisions when costs and probabilities are both unknown. In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining. ACM Press.
[Zadrozny and Elkan, 2001] Zadrozny, B. and Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers, Inc.
More information1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
More informationCounting Primes whose Sum of Digits is Prime
2 3 47 6 23 Journal of Integer Sequences, Vol. 5 (202), Article 2.2.2 Counting Primes whose Sum of Digits is Prime Glyn Harman Department of Mathematics Royal Holloway, University of London Egham Surrey
More informationWhich Is the Best Multiclass SVM Method? An Empirical Study
Which Is the Best Multiclass SVM Method? An Empirical Study Kai-Bo Duan 1 and S. Sathiya Keerthi 2 1 BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 askbduan@ntu.edu.sg
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationA General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st
More informationSoftware Reliability Measuring using Modified Maximum Likelihood Estimation and SPC
Software Reliaility Measuring using Modified Maximum Likelihood Estimation and SPC Dr. R Satya Prasad Associate Prof, Dept. of CSE Acharya Nagarjuna University Guntur, INDIA K Ramchand H Rao Dept. of CSE
More informationUsing Random Forest to Learn Imbalanced Data
Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,
More informationA Direct Numerical Method for Observability Analysis
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL 15, NO 2, MAY 2000 625 A Direct Numerical Method for Observability Analysis Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper presents an algebraic method
More informationA Binary Recursive Gcd Algorithm
A Binary Recursive Gcd Algorithm Damien Stehlé and Paul Zimmermann LORIA/INRIA Lorraine, 615 rue du jardin otanique, BP 101, F-5460 Villers-lès-Nancy, France, {stehle,zimmerma}@loria.fr Astract. The inary
More information5 Double Integrals over Rectangular Regions
Chapter 7 Section 5 Doule Integrals over Rectangular Regions 569 5 Doule Integrals over Rectangular Regions In Prolems 5 through 53, use the method of Lagrange multipliers to find the indicated maximum
More informationApplied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
More informationA Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
More informationDomain Adaptation meets Active Learning
Domain Adaptation meets Active Learning Piyush Rai, Avishek Saha, Hal Daumé III, and Suresh Venkatasuramanian School of Computing, University of Utah Salt Lake City, UT 84112 {piyush,avishek,hal,suresh}@cs.utah.edu
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Lecture 3: QR, least squares, linear regression Linear Algebra Methods for Data Mining, Spring 2007, University
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationCombining SVM classifiers for email anti-spam filtering
Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationAdvanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
More informationMinimizing Probing Cost and Achieving Identifiability in Network Link Monitoring
Minimizing Proing Cost and Achieving Identifiaility in Network Link Monitoring Qiang Zheng and Guohong Cao Department of Computer Science and Engineering The Pennsylvania State University E-mail: {quz3,
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationMonotonicity Hints. Abstract
Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology
More informationSUPPORT VECTOR MACHINE (SVM) is the optimal
130 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 Multiclass Posterior Probability Support Vector Machines Mehmet Gönen, Ayşe Gönül Tanuğur, and Ethem Alpaydın, Senior Member, IEEE
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationTOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
More information*corresponding author
Key Engineering Materials Vol. 588 (2014) pp 249-256 Online availale since 2013/Oct/11 at www.scientific.net (2014) Trans Tech Pulications, Switzerland doi:10.4028/www.scientific.net/kem.588.249 Remote
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationTHREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
More informationA Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms
Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009 30 June, 1 3 July 2009. A Hybrid Approach to Learn with Imbalanced Classes using
More informationα = u v. In other words, Orthogonal Projection
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationGeneral Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
More informationIn Defense of One-Vs-All Classification
Journal of Machine Learning Research 5 (2004) 101-141 Submitted 4/03; Revised 8/03; Published 1/04 In Defense of One-Vs-All Classification Ryan Rifkin Honda Research Institute USA 145 Tremont Street Boston,
More informationA Negative Result Concerning Explicit Matrices With The Restricted Isometry Property
A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property Venkat Chandar March 1, 2008 Abstract In this note, we prove that matrices whose entries are all 0 or 1 cannot achieve
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationVisualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
More informationData Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
More informationDUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More information[1] Diagonal factorization
8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationK-Means Clustering Tutorial
K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July
More informationSupport Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationActive Learning with Boosting for Spam Detection
Active Learning with Boosting for Spam Detection Nikhila Arkalgud Last update: March 22, 2008 Active Learning with Boosting for Spam Detection Last update: March 22, 2008 1 / 38 Outline 1 Spam Filters
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More information