Reducing multiclass to binary by coupling probability estimates

Bianca Zadrozny
Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA

Abstract

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

1 Introduction

The two most well-known approaches for reducing a multiclass classification problem to a set of binary classification problems are known as one-against-all and all-pairs. In the one-against-all approach, we train a classifier for each of the classes using as positive examples the training examples that belong to that class, and as negatives all the other training examples. In the all-pairs approach, we train a classifier for each possible pair of classes, ignoring the examples that do not belong to the classes in question.

Although these two approaches are the most obvious, Allwein et al. [Allwein et al., 2000] have shown that there are many other ways in which a multiclass problem can be decomposed into a number of binary classification problems. We can represent each such decomposition by a code matrix $M \in \{-1, 0, +1\}^{k \times l}$, where $k$ is the number of classes and $l$ is the number of binary classification problems. If $M(c, b) = +1$ then the examples belonging to class $c$ are considered to be positive examples for the binary classification problem $b$. Similarly, if $M(c, b) = -1$ the examples belonging to $c$ are considered to be negative examples for $b$. Finally, if $M(c, b) = 0$ the examples belonging to $c$ are not used in training a classifier for $b$.
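As a minimal sketch of the code-matrix representation just described, the following Python builds the one-against-all and all-pairs matrices and selects the training examples used by one column. The function names, the 0-based class indices, and the ordering of the all-pairs columns are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from itertools import combinations

def one_against_all(k):
    """k x k code matrix: column b has +1 for class b and -1 for every other class."""
    return 2 * np.eye(k, dtype=int) - 1

def all_pairs(k):
    """k x k(k-1)/2 code matrix with one column per pair of classes (i, j), i < j."""
    pairs = list(combinations(range(k), 2))
    M = np.zeros((k, len(pairs)), dtype=int)
    for b, (i, j) in enumerate(pairs):
        M[i, b] = 1    # class i supplies the positive examples
        M[j, b] = -1   # class j supplies the negative examples
    return M

def training_set_for_column(M, b, y):
    """Indices and -1/+1 labels of the examples used to train column b.

    y holds class indices 0..k-1; examples whose class has M[c, b] == 0
    are left out, as in the definition of the code matrix.
    """
    codes = M[np.asarray(y), b]
    used = np.flatnonzero(codes != 0)
    return used, codes[used]
```

For instance, `training_set_for_column(all_pairs(3), 0, [0, 1, 2, 1])` keeps the examples at positions 0, 1 and 3 and drops the example of the third class, which does not take part in the first pairwise problem.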
For example, in the 3-class case, the all-pairs code matrix is

$$
\begin{array}{c|rrr}
 & b_1 & b_2 & b_3 \\
\hline
c_1 & +1 & +1 & 0 \\
c_2 & -1 & 0 & +1 \\
c_3 & 0 & -1 & -1
\end{array}
$$

This approach for representing the decomposition of a multiclass problem into binary problems is a generalization of the Error-Correcting Output Codes (ECOC) scheme proposed by Dietterich and Bakiri [Dietterich and Bakiri, 1995]. The ECOC scheme does not allow zeros in the code matrix, meaning that all examples are used in each binary classification problem.

Orthogonal to the problem of choosing a code matrix for reducing multiclass to binary is the problem of classifying an example given the labels assigned by each binary classifier. Given an example $x$, Allwein et al. [Allwein et al., 2000] first create a vector $v$ of length $l$ containing the $-1$/$+1$ labels assigned to $x$ by each binary classifier. Then, they compute the Hamming distance between $v$ and each row of $M$, and find the row $c$ that is closest to $v$ according to this metric. The label $c$ is then assigned to $x$. This method is called Hamming decoding.

For the case in which the binary classifiers output a score whose magnitude is a measure of confidence in the prediction, they take the scores into account when calculating the distance between $v$ and each row of $M$, instead of using the Hamming distance. This method is called loss-based decoding. Allwein et al. [Allwein et al., 2000] present theoretical and experimental results indicating that this method is better than Hamming decoding.

However, both of these methods simply assign a class label to each example. They do not output class membership probability estimates $\hat{P}(C = c \mid X = x)$ for an example $x$. These probability estimates are important when the classification outputs are not used in isolation and must be combined with other sources of information, such as misclassification costs [Zadrozny and Elkan, 2001a] or the outputs of another classifier.

Given a code matrix $M$ and a binary classification learning algorithm that outputs probability estimates, we would like to couple the estimates given by each binary classifier in order to obtain class membership probability estimates for the multiclass problem.
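The Hamming decoding rule described above can be sketched in a few lines of Python. The text does not say how zero entries of the code matrix are counted; this sketch assumes each zero entry contributes 1/2 to the distance, so that a class is neither rewarded nor penalized for classifiers it took no part in. The function name is hypothetical.

```python
import numpy as np

def hamming_decode(M, v):
    """Return (closest row index, distances) for a vector v of -1/+1 labels.

    M is a (k, l) code matrix with entries in {-1, 0, +1}.
    Each column contributes 0 when M[c, b] == v[b], 1 when they have
    opposite signs, and 1/2 when M[c, b] == 0 (an assumption of this sketch).
    """
    M = np.asarray(M, dtype=float)
    v = np.asarray(v, dtype=float)
    dist = ((1.0 - M * v) / 2.0).sum(axis=1)
    return int(np.argmin(dist)), dist
```

The function returns only a label and a distance per class, which illustrates the limitation noted above: nothing in the output can be read as a class membership probability.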
Hastie and Tibshirani [Hastie and Tibshirani, 1998] describe a solution for obtaining probability estimates $\hat{P}(C = c \mid X = x)$ in the all-pairs case by coupling the pairwise probability estimates, which we describe in Section 2. In Section 3, we extend the method to arbitrary code matrices. In Section 4 we discuss the loss-based decoding approach in more detail and compare it mathematically to the method by Hastie and Tibshirani. In Section 5 we present experimental results.

2 Coupling pairwise probability estimates

We are given pairwise probability estimates $r_{ij}(x)$ for every pair of classes $i \neq j$, obtained by training a classifier using the examples belonging to class $i$ as positives and the examples belonging to class $j$ as negatives. We would like to couple these estimates to obtain a set of class membership probabilities $p_i(x) = P(C = c_i \mid X = x)$ for each example $x$. The $r_{ij}$ are related to the $p_i$ according to

$$r_{ij}(x) = P(C = i \mid C = i \lor C = j, X = x) = \frac{p_i(x)}{p_i(x) + p_j(x)}.$$

Since we additionally require that $\sum_i p_i(x) = 1$, there are $k - 1$ free parameters and $k(k-1)/2$ constraints. This implies that there may not exist $p_i$ satisfying these constraints.

Let $n_{ij}$ be the number of training examples used to train the binary classifier that predicts $r_{ij}$. In order to find the best approximation $\hat{r}_{ij}(x) = \hat{p}_i(x) / (\hat{p}_i(x) + \hat{p}_j(x))$, Hastie and Tibshirani fit the Bradley-Terry model for paired comparisons [Bradley and Terry, 1952] by minimizing the average weighted Kullback-Leibler distance $l(x)$ between $r_{ij}(x)$ and $\hat{r}_{ij}(x)$ for each $x$, given by

$$l(x) = \sum_{i < j} n_{ij} \left[ r_{ij}(x) \log \frac{r_{ij}(x)}{\hat{r}_{ij}(x)} + (1 - r_{ij}(x)) \log \frac{1 - r_{ij}(x)}{1 - \hat{r}_{ij}(x)} \right].$$

The algorithm is as follows:

1. Start with some guess for the $\hat{p}_i(x)$ and corresponding $\hat{r}_{ij}(x)$.
2. Repeat until convergence:
   (a) For each $i = 1, 2, \ldots, k$:
   $$\hat{p}_i(x) \leftarrow \hat{p}_i(x) \, \frac{\sum_{j \neq i} n_{ij} \, r_{ij}(x)}{\sum_{j \neq i} n_{ij} \, \hat{r}_{ij}(x)}$$
   (b) Renormalize the $\hat{p}_i(x)$.
   (c) Recompute the $\hat{r}_{ij}(x)$.

Hastie and Tibshirani [Hastie and Tibshirani, 1998] prove that the Kullback-Leibler distance between $r_{ij}(x)$ and $\hat{r}_{ij}(x)$ decreases at each step. Since this distance is bounded below by zero, the algorithm converges. At convergence, the $\hat{r}_{ij}$ are consistent with the $\hat{p}_i$. The class predicted for each example $x$ is $\hat{c}(x) = \arg\max_i \hat{p}_i(x)$.

Hastie and Tibshirani also prove that the $\hat{p}_i(x)$ are in the same order as the non-iterative estimates $\tilde{p}_i(x) = \sum_{j \neq i} r_{ij}(x)$ for each $x$. Thus, the $\tilde{p}_i(x)$ are sufficient for predicting the most likely class for each example. However, as shown by Hastie and Tibshirani, they are not accurate probability estimates because they tend to underestimate the differences between the $\hat{p}_i(x)$ values.

3 Extending the Hastie-Tibshirani method to arbitrary code matrices

For an arbitrary code matrix $M$, instead of having pairwise probability estimates, we have an estimate $r_b(x)$ for each column $b$ of $M$, such that

$$r_b(x) = P\Big( \bigvee_{c \in I} C = c \;\Big|\; \bigvee_{c \in I \cup J} C = c, \; X = x \Big) = \frac{\sum_{c \in I} p_c(x)}{\sum_{c \in I \cup J} p_c(x)}$$

where $I$ and $J$ are the sets of classes for which $M(\cdot, b) = 1$ and $M(\cdot, b) = -1$, respectively.

We would like to obtain a set of class membership probabilities $p_i(x)$ for each example $x$ compatible with the $r_b(x)$ and subject to $\sum_i p_i(x) = 1$. In this case, the number of free parameters is $k - 1$ and the number of constraints is $l + 1$, where $l$ is the number of columns of the code matrix. Since for most code matrices $l$ is greater than $k - 1$, in general there is no exact solution to this problem.
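The pairwise coupling iteration of Section 2 can be sketched as follows for a single example $x$. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the initial guess is uniform, the $\hat{r}_{ij}$ are recomputed from the current $\hat{p}$ before each class update, and every pairwise estimate is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def pairwise_coupling(r, n=None, tol=1e-10, max_iter=1000):
    """Couple pairwise estimates r[i, j] ~ P(C=i | C in {i, j}, x) into p_hat.

    r is a (k, k) array with r[j, i] = 1 - r[i, j]; the diagonal is ignored.
    n is a (k, k) symmetric array of training-set sizes n_ij (default: all 1).
    """
    r = np.asarray(r, dtype=float)
    k = r.shape[0]
    n = np.ones((k, k)) if n is None else np.asarray(n, dtype=float)
    off = ~np.eye(k, dtype=bool)            # off[i] selects all j != i
    p = np.full(k, 1.0 / k)                 # step 1: uniform initial guess
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(k):                  # step 2(a)
            r_hat_i = p[i] / (p[i] + p)     # current r_hat[i, j] for every j
            num = np.sum(n[i, off[i]] * r[i, off[i]])
            den = np.sum(n[i, off[i]] * r_hat_i[off[i]])
            p[i] *= num / den
        p /= p.sum()                        # step 2(b): renormalize
        if np.max(np.abs(p - p_old)) < tol: # stop when a full sweep changes nothing
            break
    return p

def non_iterative(r):
    """Non-iterative estimates: sum of r[i, j] over j != i (not normalized)."""
    r = np.asarray(r, dtype=float)
    return r.sum(axis=1) - np.diag(r)
```

When the $r_{ij}$ are exactly consistent with some probability vector, the iteration should recover that vector, while `non_iterative` returns values that rank the classes in the same order but are not themselves probabilities.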
For this reason, we propose an algorithm analogous to the Hastie-Tibshirani method presented in the previous section to find the best approximate probability estimates $\hat{p}_i(x)$ such that

$$\hat{r}_b(x) = \frac{\sum_{c \in I} \hat{p}_c(x)}{\sum_{c \in I \cup J} \hat{p}_c(x)}$$

and the Kullback-Leibler distance between $\hat{r}_b(x)$ and $r_b(x)$ is minimized. Let $n_b$ be the number of training examples used to train the binary classifier that corresponds to column $b$ of the code matrix. The algorithm is as follows:

1. Start with some guess for the $\hat{p}_i(x)$ and corresponding $\hat{r}_b(x)$.
2. Repeat until convergence:
   (a) For each $i = 1, 2, \ldots, k$:
   $$\hat{p}_i(x) \leftarrow \hat{p}_i(x) \, \frac{\sum_{b \,:\, M(i,b) = 1} n_b \, r_b(x) + \sum_{b \,:\, M(i,b) = -1} n_b \, (1 - r_b(x))}{\sum_{b \,:\, M(i,b) = 1} n_b \, \hat{r}_b(x) + \sum_{b \,:\, M(i,b) = -1} n_b \, (1 - \hat{r}_b(x))}$$
   (b) Renormalize the $\hat{p}_i(x)$.
   (c) Recompute the $\hat{r}_b(x)$.

If the code matrix is the all-pairs matrix, this algorithm reduces to the original method by Hastie and Tibshirani.

Let $B_i$ be the set of matrix columns for which $M(i, \cdot) = 1$ and $B_{-i}$ be the set of matrix columns for which $M(i, \cdot) = -1$. By analogy with the non-iterative estimates suggested by Hastie and Tibshirani, we can define non-iterative estimates

$$\tilde{p}_i(x) = \sum_{b \in B_i} r_b(x) + \sum_{b \in B_{-i}} (1 - r_b(x)).$$

For the all-pairs code matrix, these estimates are the same as the ones suggested by Hastie and Tibshirani. However, for arbitrary matrices, we cannot prove that the non-iterative estimates predict the same class as the iterative estimates.

4 Loss-based decoding

In this section, we discuss how to apply the loss-based decoding method to classifiers that output class membership probability estimates. We also study the conditions under which this method predicts the same class as the Hastie-Tibshirani method, in the all-pairs case.

The loss-based decoding method [Allwein et al., 2000] requires that each binary classifier output a margin score satisfying two requirements. First, the score should be positive if the example is classified as positive, and negative if the example is classified as negative. Second, the magnitude of the score should be a measure of confidence in the prediction.

The method works as follows. Let $f_b(x)$ be the margin score predicted by the classifier corresponding to column $b$ of the code matrix for example $x$. For each row $c$ of the code matrix $M$ and for each example $x$, we compute the distance between $f$ and $M(c, \cdot)$ as

$$d_L(x, c) = \sum_{b=1}^{l} L\big(M(c, b) \, f_b(x)\big) \qquad (1)$$

where $L$ is a loss function that is dependent on the nature of the binary classifier and $M(c, b) = 0$, $1$ or $-1$. We then label each example $x$ with the label $c$ for which $d_L$ is minimized.
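To make the two ways of using the binary outputs concrete, the sketch below implements the coupling update of Section 3 for an arbitrary code matrix and the loss-based distance of Equation 1, both for a single example. It is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the initial guess is uniform, $\hat{r}_b$ is recomputed before each class update, every class is assumed to appear in at least one column, and every column is assumed to contain at least one $+1$ and one $-1$.

```python
import numpy as np

def extended_coupling(M, r, n=None, tol=1e-10, max_iter=5000):
    """Couple per-column estimates r[b] into class probabilities p_hat.

    M: (k, l) code matrix with entries in {-1, 0, +1}.
    r: (l,) estimates r_b(x), assumed strictly between 0 and 1.
    n: (l,) training-set sizes n_b (default: all 1).
    """
    M = np.asarray(M)
    r = np.asarray(r, dtype=float)
    k, l = M.shape
    n = np.ones(l) if n is None else np.asarray(n, dtype=float)
    pos = (M == 1).astype(float)    # pos[c, b] = 1 when class c is positive for b
    neg = (M == -1).astype(float)   # neg[c, b] = 1 when class c is negative for b
    p = np.full(k, 1.0 / k)
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(k):
            # r_hat_b = sum of p over I divided by sum of p over I union J
            r_hat = (p @ pos) / (p @ (pos + neg))
            num = np.sum(n * (pos[i] * r + neg[i] * (1.0 - r)))
            den = np.sum(n * (pos[i] * r_hat + neg[i] * (1.0 - r_hat)))
            p[i] *= num / den
        p /= p.sum()                # renormalize once per sweep
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p

def loss_based_decode(M, f, loss):
    """Equation 1: d_L(x, c) = sum over b of L(M[c, b] * f[b]); returns (argmin, d)."""
    margins = np.asarray(M, dtype=float) * np.asarray(f, dtype=float)
    d = loss(margins).sum(axis=1)
    return int(np.argmin(d)), d
```

A call such as `loss_based_decode(M, f, lambda y: np.exp(-y))` uses an exponential loss; zero entries of `M` then contribute `L(0)` to every distance, exactly as the sum in Equation 1 prescribes.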
If the binary classification learning algorithm outputs scores that are probability estimates, they do not satisfy the first requirement because the probability estimates are all between 0 and 1. However, we can transform the probability estimates $r_b(x)$ output by each classifier $b$ into margin scores by subtracting $1/2$ from the scores, so that we consider as positives the examples $x$ for which $r_b(x)$ is above $1/2$, and as negatives the examples $x$ for which $r_b(x)$ is below $1/2$.

We now prove a theorem that relates the loss-based decoding method to the Hastie-Tibshirani method, for a particular class of loss functions.

Theorem 1 The loss-based decoding method for all-pairs code matrices predicts the same class label as the iterative estimates $\hat{p}_i(x)$ given by Hastie and Tibshirani, if the loss function is of the form $L(y) = -ay$, for any $a > 0$.

Proof: We first show that, if the loss function is of the form $L(y) = -ay$, the loss-based decoding method predicts the same class label as the non-iterative estimates $\tilde{p}_i(x)$, for the all-pairs code matrix.

The non-iterative estimates $\tilde{p}_c(x)$ are given by

$$\tilde{p}_c(x) = \sum_{b \in B_c} r_b(x) + \sum_{b \in B_{-c}} (1 - r_b(x))$$

where $B_c$ and $B_{-c}$ are the sets of matrix columns for which $M(c, b) = 1$ and $M(c, b) = -1$, respectively. Considering that $L(y) = -ay$ and $f_b(x) = r_b(x) - 1/2$, and eliminating the terms for which $M(c, b) = 0$, we can rewrite Equation 1 as

$$d(x, c) = -a \sum_{b \in B_c} \Big( r_b(x) - \frac{1}{2} \Big) + a \sum_{b \in B_{-c}} \Big( r_b(x) - \frac{1}{2} \Big).$$

For the all-pairs code matrix the following relationship holds: $\frac{1}{2}\big(|B_c| + |B_{-c}|\big) = \frac{k-1}{2}$, where $k$ is the number of classes. So, the distance $d(x, c)$ is

$$d(x, c) = -a \left[ \sum_{b \in B_c} r_b(x) + \sum_{b \in B_{-c}} (1 - r_b(x)) - \frac{k-1}{2} \right] = -a \left[ \tilde{p}_c(x) - \frac{k-1}{2} \right].$$

It is now easy to see that the class $c(x)$ which minimizes $d(x, c)$ for example $x$ also maximizes $\tilde{p}_c(x)$. Furthermore, if $d(x, i) < d(x, j)$ then $\tilde{p}_i(x) > \tilde{p}_j(x)$, which means that the ranking of the classes for each example is the same.

Since the non-iterative estimates $\tilde{p}_c(x)$ are in the same order as the iterative estimates $\hat{p}_c(x)$, we can conclude that the Hastie-Tibshirani method is equivalent to the loss-based decoding method if $L(y) = -ay$, in terms of class prediction, for the all-pairs code matrix.

Allwein et al. do not consider loss functions of the form $L(y) = -ay$, and use non-linear loss functions such as $L(y) = e^{-y}$. In this case, the class predicted by loss-based decoding may differ from the one predicted by the method by Hastie and Tibshirani.

This theorem applies only to the all-pairs code matrix. For other matrices in which $|B_c| + |B_{-c}|$ does not depend on $c$ (such as the one-against-all matrix), we can prove that loss-based decoding (with $L(y) = -ay$) predicts the same class as the non-iterative estimates. However, in this case, the non-iterative estimates do not necessarily predict the same class as the iterative ones.

5 Experiments

We performed experiments using the following multiclass datasets from the UCI Machine Learning Repository [Blake and Merz, 1998]: satimage, pendigits and soybean. Table 1 summarizes the characteristics of each dataset.

Table 1: Characteristics of the datasets used in the experiments. Columns: Dataset, #Training Examples, #Test Examples, #Attributes, #Classes. Rows: satimage, pendigits, soybean.
The binary learning algorithm used in the experiments is boosted naive Bayes [Elkan, 1997], since this is a method that cannot be easily extended to handle multiclass problems directly. For all the experiments, we ran 10 rounds of boosting.

Table 2: Test set results on the satimage dataset. Columns: Method, Code Matrix, Error Rate, MSE. Rows, for each of the All-pairs, One-against-all and Sparse code matrices: Loss-based ($L(y) = -y$), Loss-based ($L(y) = e^{-y}$), Hastie-Tibshirani (non-iterative) and Hastie-Tibshirani (iterative), in their extended form for the One-against-all and Sparse matrices; final row: Multiclass Naive Bayes.

We use three different code matrices for each dataset: all-pairs, one-against-all and a sparse random matrix. The sparse random matrices have $\lceil 15 \log_2 k \rceil$ columns, and each element is 0 with probability 1/2 and $-1$ or $+1$ with probability 1/4 each. This is the same type of sparse random matrix used by Allwein et al. [Allwein et al., 2000]. In order to have good error correcting properties, the Hamming distance $\rho$ between each pair of rows in the matrix must be large. We select the matrix by generating 10,000 random matrices and selecting the one for which $\rho$ is maximized, checking that each column has at least one $-1$ and one $+1$, and that the matrix does not have two identical columns.

We evaluate the performance of each method using two metrics. The first metric is the error rate obtained when we assign each example to the most likely class predicted by the method. This metric is sufficient if we are only interested in classifying the examples correctly and do not need accurate probability estimates of class membership.

The second metric is squared error, defined for one example $x$ as

$$SE(x) = \sum_j \big( t_j(x) - p_j(x) \big)^2,$$

where $p_j(x)$ is the probability estimated by the method for example $x$ and class $j$, and $t_j(x)$ is the true probability of class $j$ for $x$. Since for most real-world datasets true labels are known, but not probabilities, $t_j(x)$ is defined to be 1 if the label of $x$ is $j$ and 0 otherwise.
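The squared error just defined can be computed as in the sketch below, with $t_j(x)$ taken as a one-hot indicator of the known label; averaging it over a set of examples gives a mean squared error. The function names and the 0-based label encoding are assumptions made for illustration.

```python
import numpy as np

def squared_error(p, label):
    """SE(x) = sum_j (t_j(x) - p_j(x))^2, with t one-hot at the true label."""
    p = np.asarray(p, dtype=float)
    t = np.zeros_like(p)
    t[label] = 1.0
    return float(np.sum((t - p) ** 2))

def mean_squared_error(P, labels):
    """Average of SE(x) over examples; P is (n_examples, k), labels is (n_examples,)."""
    P = np.asarray(P, dtype=float)
    labels = np.asarray(labels)
    T = np.zeros_like(P)
    T[np.arange(len(labels)), labels] = 1.0
    return float(np.mean(np.sum((T - P) ** 2, axis=1)))
```

Under this definition a method that puts all probability mass on the correct class scores 0, and one that puts all mass on a single wrong class scores 2.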
We calculate the squared error for each $x$ and average over the examples to obtain the mean squared error (MSE). The mean squared error is an adequate metric for assessing the accuracy of probability estimates [Zadrozny and Elkan, 2001b]. This metric cannot be applied to the loss-based decoding method, since it does not produce probability estimates.

Table 2 shows the results of the experiments on the satimage dataset for each type of code matrix. As a baseline for comparison, we also show the results of applying multiclass Naive Bayes to this dataset. We can see that the iterative Hastie-Tibshirani procedure (and its extension to arbitrary code matrices) succeeds in lowering the MSE significantly compared to the non-iterative estimates, which indicates that it produces probability estimates that are more accurate. In terms of error rate, the differences between methods are small. For one-against-all matrices, the iterative method performs consistently worse, while for sparse random matrices, it performs consistently better. Figure 1 shows how the MSE is lowered at each iteration of the Hastie-Tibshirani algorithm, for the three types of code matrices.

Table 3 shows the results of the same experiments on the datasets pendigits and soybean. Again, the MSE is significantly lowered by the iterative procedure, in all cases. For the soybean dataset, using the sparse random matrix, the iterative method again has a lower error rate than the other methods, which is even lower than the error rate using the all-pairs matrix. This is an interesting result, since in this case the all-pairs matrix has 171 columns (corresponding to 171 classifiers), while the sparse matrix has only 64 columns.
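The column counts quoted above can be checked against the matrix definitions given earlier in this section, and the random-matrix selection can be sketched along the same lines. Several points below are assumptions rather than details from the paper: the number of classes $k = 19$ for soybean is inferred from $k(k-1)/2 = 171$; $15 \log_2 k$ is rounded up; $\rho$ is taken to be the smallest distance between any two rows, with each column contributing $(1 - uv)/2$; and invalid or duplicate columns are redrawn individually instead of rejecting whole matrices. All function names are hypothetical.

```python
import math
import numpy as np

def n_columns_all_pairs(k):
    return k * (k - 1) // 2

def n_columns_sparse(k):
    return math.ceil(15 * math.log2(k))     # rounding up is an assumption

def row_separation(M):
    """Smallest distance between two rows, counting (1 - u*v)/2 per column."""
    k = M.shape[0]
    return min(float(((1 - M[i] * M[j]) / 2).sum())
               for i in range(k) for j in range(i + 1, k))

def is_valid(M):
    """Every column has a +1 and a -1, and no two columns are identical."""
    has_both = bool(np.all((M == 1).any(axis=0) & (M == -1).any(axis=0)))
    distinct = len({tuple(col) for col in M.T.tolist()}) == M.shape[1]
    return has_both and distinct

def draw_matrix(k, l, rng):
    """Draw l distinct valid columns with P(0)=1/2, P(-1)=P(+1)=1/4 per entry."""
    if l > 3 ** k - 2 ** (k + 1) + 1:       # number of distinct valid columns
        raise ValueError("not enough distinct valid columns for this k")
    cols, seen = [], set()
    while len(cols) < l:
        col = rng.choice([0, -1, 1], size=k, p=[0.5, 0.25, 0.25])
        key = tuple(col.tolist())
        if (col == 1).any() and (col == -1).any() and key not in seen:
            seen.add(key)
            cols.append(col)
    return np.stack(cols, axis=1)

def sparse_random_matrix(k, trials=10_000, seed=0):
    """Keep the candidate with the largest row separation among `trials` draws."""
    rng = np.random.default_rng(seed)
    l = n_columns_sparse(k)
    best, best_rho = None, -1.0
    for _ in range(trials):
        M = draw_matrix(k, l, rng)
        rho = row_separation(M)
        if rho > best_rho:
            best, best_rho = M, rho
    return best, best_rho
```

With these definitions, `n_columns_all_pairs(19)` gives 171 and `n_columns_sparse(19)` gives 64, matching the counts in the text.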

Figure 1: Convergence of the MSE for the satimage dataset (x-axis: iteration; y-axis: MSE; curves: all-pairs, one-against-all, sparse).

Table 3: Test set results on the pendigits and soybean datasets. Columns: Method, Code Matrix, then Error Rate and MSE for pendigits and for soybean. Rows, for each of the All-pairs, One-against-all and Sparse code matrices: Loss-based ($L(y) = -y$), Loss-based ($L(y) = e^{-y}$), Hastie-Tibshirani (non-iterative) and Hastie-Tibshirani (iterative), in their extended form for the One-against-all and Sparse matrices; final row: Multiclass Naive Bayes.

6 Conclusions

We have presented a method for producing class membership probability estimates for multiclass problems, given probability estimates for a series of binary problems determined by an arbitrary code matrix.

Since research in designing optimal code matrices is still on-going [Utschick and Weichselberger, 2001] [Crammer and Singer, 2000], it is important to be able to obtain class membership probability estimates from arbitrary code matrices. In current research, the effectiveness of a code matrix is determined primarily by the classification accuracy. However, since many applications require accurate class membership probability estimates for each of the classes, it is important to also compare the different types of code matrices according to their ability to produce such estimates. Our extension of Hastie and Tibshirani's method is useful for this purpose.

Our method relies on the probability estimates given by the binary classifiers to produce the multiclass probability estimates. However, the probability estimates produced by Boosted Naive Bayes are not calibrated probability estimates. An interesting direction for future work is in determining whether the calibration of the probability estimates given by the binary classifiers improves the calibration of the multiclass probabilities.

References

[Allwein et al., 2000] Allwein, E. L., Schapire, R. E., and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1.

[Blake and Merz, 1998] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Department of Information and Computer Sciences, University of California, Irvine. mlearn/mlrepository.html.

[Bradley and Terry, 1952] Bradley, R. and Terry, M. (1952). Rank analysis of incomplete block designs, I: The method of paired comparisons. Biometrika.

[Crammer and Singer, 2000] Crammer, K. and Singer, Y. (2000). On the learnability and design of output codes for multiclass problems. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.

[Dietterich and Bakiri, 1995] Dietterich, T. G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2.

[Elkan, 1997] Elkan, C. (1997). Boosting and naive Bayesian learning. Technical Report CS97-557, University of California, San Diego.

[Hastie and Tibshirani, 1998] Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling. In Advances in Neural Information Processing Systems, volume 10. MIT Press.

[Utschick and Weichselberger, 2001] Utschick, W. and Weichselberger, W. (2001). Stochastic organization of output codes in multiclass learning problems. Neural Computation, 13(5).

[Zadrozny and Elkan, 2001a] Zadrozny, B. and Elkan, C. (2001a). Learning and making decisions when costs and probabilities are both unknown. In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining. ACM Press.

[Zadrozny and Elkan, 2001b] Zadrozny, B. and Elkan, C. (2001b). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers, Inc.


Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

arxiv:1506.04135v1 [cs.ir] 12 Jun 2015

arxiv:1506.04135v1 [cs.ir] 12 Jun 2015 Reducing offline evaluation bias of collaborative filtering algorithms Arnaud de Myttenaere 1,2, Boris Golden 1, Bénédicte Le Grand 3 & Fabrice Rossi 2 arxiv:1506.04135v1 [cs.ir] 12 Jun 2015 1 - Viadeo

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method 578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after

More information

Feature Selection with Decision Tree Criterion

Feature Selection with Decision Tree Criterion Feature Selection with Decision Tree Criterion Krzysztof Grąbczewski and Norbert Jankowski Department of Computer Methods Nicolaus Copernicus University Toruń, Poland kgrabcze,norbert@phys.uni.torun.pl

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

A. Break even analysis

A. Break even analysis Lecture (50 minutes) Eighth week lessons Function (continued) & Quadratic Equations (Divided into 3 lectures of 50 minutes each) a) Break even analysis ) Supply, Demand and market equilirium. c) Class

More information

Automatic Web Page Classification

Automatic Web Page Classification Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

More information

A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Using the Singular Value Decomposition

Using the Singular Value Decomposition Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces

More information

CS 6220: Data Mining Techniques Course Project Description

CS 6220: Data Mining Techniques Course Project Description CS 6220: Data Mining Techniques Course Project Description College of Computer and Information Science Northeastern University Spring 2013 General Goal In this project, you will have an opportunity to

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Summary/Review of Matrix Algebra. Matrix Algebra. Table Structure of Data

Summary/Review of Matrix Algebra. Matrix Algebra. Table Structure of Data Summary/Review of Matrix Algera Introduction to Matrices Descriptors & ojects, Linear algera, Order Association Matrices R- and Q-mode Special Matrices Trace, Diagonal, Identity, Scalars, Transpose Vectors

More information

Linear Systems. Singular and Nonsingular Matrices. Find x 1, x 2, x 3 such that the following three equations hold:

Linear Systems. Singular and Nonsingular Matrices. Find x 1, x 2, x 3 such that the following three equations hold: Linear Systems Example: Find x, x, x such that the following three equations hold: x + x + x = 4x + x + x = x + x + x = 6 We can write this using matrix-vector notation as 4 {{ A x x x {{ x = 6 {{ b General

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

Which Is the Best Multiclass SVM Method? An Empirical Study

Which Is the Best Multiclass SVM Method? An Empirical Study Which Is the Best Multiclass SVM Method? An Empirical Study Kai-Bo Duan 1 and S. Sathiya Keerthi 2 1 BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 askbduan@ntu.edu.sg

More information

Using Random Forest to Learn Imbalanced Data

Using Random Forest to Learn Imbalanced Data Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,

More information

10.1 Systems of Linear Equations: Substitution and Elimination

10.1 Systems of Linear Equations: Substitution and Elimination 726 CHAPTER 10 Systems of Equations and Inequalities 10.1 Systems of Linear Equations: Sustitution and Elimination PREPARING FOR THIS SECTION Before getting started, review the following: Linear Equations

More information

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode

A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY

More information

A Direct Numerical Method for Observability Analysis

A Direct Numerical Method for Observability Analysis IEEE TRANSACTIONS ON POWER SYSTEMS, VOL 15, NO 2, MAY 2000 625 A Direct Numerical Method for Observability Analysis Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper presents an algebraic method

More information

One-sided Support Vector Regression for Multiclass Cost-sensitive Classification

One-sided Support Vector Regression for Multiclass Cost-sensitive Classification One-sided Support Vector Regression for Multiclass Cost-sensitive Classification Han-Hsing Tu r96139@csie.ntu.edu.tw Hsuan-Tien Lin htlin@csie.ntu.edu.tw Department of Computer Science and Information

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Combining SVM classifiers for email anti-spam filtering

Combining SVM classifiers for email anti-spam filtering Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Benchmarking Open-Source Tree Learners in R/RWeka

Benchmarking Open-Source Tree Learners in R/RWeka Benchmarking Open-Source Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Software Reliability Measuring using Modified Maximum Likelihood Estimation and SPC

Software Reliability Measuring using Modified Maximum Likelihood Estimation and SPC Software Reliaility Measuring using Modified Maximum Likelihood Estimation and SPC Dr. R Satya Prasad Associate Prof, Dept. of CSE Acharya Nagarjuna University Guntur, INDIA K Ramchand H Rao Dept. of CSE

More information

Domain Adaptation meets Active Learning

Domain Adaptation meets Active Learning Domain Adaptation meets Active Learning Piyush Rai, Avishek Saha, Hal Daumé III, and Suresh Venkatasuramanian School of Computing, University of Utah Salt Lake City, UT 84112 {piyush,avishek,hal,suresh}@cs.utah.edu

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

A Binary Recursive Gcd Algorithm

A Binary Recursive Gcd Algorithm A Binary Recursive Gcd Algorithm Damien Stehlé and Paul Zimmermann LORIA/INRIA Lorraine, 615 rue du jardin otanique, BP 101, F-5460 Villers-lès-Nancy, France, {stehle,zimmerma}@loria.fr Astract. The inary

More information

Counting Primes whose Sum of Digits is Prime

Counting Primes whose Sum of Digits is Prime 2 3 47 6 23 Journal of Integer Sequences, Vol. 5 (202), Article 2.2.2 Counting Primes whose Sum of Digits is Prime Glyn Harman Department of Mathematics Royal Holloway, University of London Egham Surrey

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

In Defense of One-Vs-All Classification

In Defense of One-Vs-All Classification Journal of Machine Learning Research 5 (2004) 101-141 Submitted 4/03; Revised 8/03; Published 1/04 In Defense of One-Vs-All Classification Ryan Rifkin Honda Research Institute USA 145 Tremont Street Boston,

More information

Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

A Hybrid Algorithm for Solving the Absolute Value Equation

A Hybrid Algorithm for Solving the Absolute Value Equation A Hybrid Algorithm for Solving the Absolute Value Equation Olvi L. Mangasarian Abstract We propose a hybrid algorithm for solving the NP-hard absolute value equation (AVE): Ax x = b, where A is an n n

More information

Generalized Inverse Computation Based on an Orthogonal Decomposition Methodology.

Generalized Inverse Computation Based on an Orthogonal Decomposition Methodology. International Conference on Mathematical and Statistical Modeling in Honor of Enrique Castillo. June 28-30, 2006 Generalized Inverse Computation Based on an Orthogonal Decomposition Methodology. Patricia

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Minimizing Probing Cost and Achieving Identifiability in Network Link Monitoring

Minimizing Probing Cost and Achieving Identifiability in Network Link Monitoring Minimizing Proing Cost and Achieving Identifiaility in Network Link Monitoring Qiang Zheng and Guohong Cao Department of Computer Science and Engineering The Pennsylvania State University E-mail: {quz3,

More information

Linear Dependence Tests

Linear Dependence Tests Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Building Ensembles of Neural Networks with Class-switching

Building Ensembles of Neural Networks with Class-switching Building Ensembles of Neural Networks with Class-switching Gonzalo Martínez-Muñoz, Aitor Sánchez-Martínez, Daniel Hernández-Lobato and Alberto Suárez Universidad Autónoma de Madrid, Avenida Francisco Tomás

More information

SUPPORT VECTOR MACHINE (SVM) is the optimal

SUPPORT VECTOR MACHINE (SVM) is the optimal 130 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 Multiclass Posterior Probability Support Vector Machines Mehmet Gönen, Ayşe Gönül Tanuğur, and Ethem Alpaydın, Senior Member, IEEE

More information

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization 7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

More information

5 Double Integrals over Rectangular Regions

5 Double Integrals over Rectangular Regions Chapter 7 Section 5 Doule Integrals over Rectangular Regions 569 5 Doule Integrals over Rectangular Regions In Prolems 5 through 53, use the method of Lagrange multipliers to find the indicated maximum

More information

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

More information

Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning. By: Ibrar Hussain Student ID: Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

*corresponding author

*corresponding author Key Engineering Materials Vol. 588 (2014) pp 249-256 Online availale since 2013/Oct/11 at www.scientific.net (2014) Trans Tech Pulications, Switzerland doi:10.4028/www.scientific.net/kem.588.249 Remote

More information