Error Correcting Output Codes for multiclass classification: Application to two image vision problems
The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012)

Mohammad Ali Bagheri, Gholam Ali Montazer
Department of Information Technology, School of Engineering, Tarbiat Modares University, Tehran, Iran

Sergio Escalera
Centre de Visió per Computador, Campus UAB, Edifici O, Bellaterra, Barcelona, Spain

Abstract

Error-correcting output codes (ECOC) represent a powerful framework for dealing with multiclass classification problems by combining binary classifiers. A key factor affecting the performance of ECOC methods is the independence of the binary classifiers, without which the ECOC method would be ineffective. Despite its ability to handle problems with a relatively large number of classes, ECOC has been applied to few real-world problems. In this paper, we investigate the behavior of the ECOC approach on two image vision problems, logo recognition and shape classification, using decision trees and AdaBoost as the base learners. The results show that the ECOC method can improve classification performance in comparison with the classical multiclass approaches.

Index Terms

Error Correcting Output Codes (ECOC), logo recognition, shape categorization, multiclass classification, one-versus-one, one-versus-all.

I. INTRODUCTION

A common task in many real-world pattern recognition problems is to discriminate among instances that belong to multiple classes. The predominant approach to such problems is to recast the multiclass problem as a series of smaller binary classification tasks, an approach referred to as class binarization [1]. In this way, the two-class problems can be solved by binary classifiers, and the results can then be combined to provide a solution to the original multiclass problem. Among the proposed class binarization methods, three widely applied strategies are one-versus-one [2], one-versus-all [3], [4], and Error Correcting Output Codes (ECOC) [5]. In one-versus-all, the multiclass problem is decomposed into several binary problems: for each class, a binary classifier is trained to discriminate between the patterns of that class and the patterns of all remaining classes. In the one-versus-one approach, one classifier is trained for each possible pair of classes. In both approaches, the final classification prediction is based on a voting or committee procedure. ECOC, on the other hand, is a general framework for class binarization that enhances the generalization ability of binary classifiers [5]. The basis of the ECOC framework is to decompose a multiclass problem into a larger number of binary problems, so that each classifier is trained on a two-meta-class problem in which each meta-class consists of some combination of the original classes. The ECOC method can be broken down into two stages: encoding and decoding. The aim of the encoding stage is to design a discrete decomposition matrix (codematrix) for the given problem. Each row of the codematrix, called a codeword, is a sequence of bits representing one class, where each bit identifies the membership of that class to a particular classifier [6]. In the decoding stage, the final classification decision is obtained based on the outputs of the binary classifiers. Given an unlabeled test sample, each binary classifier casts a vote for one of the two meta-classes used in training.
This output vector is compared to each class codeword of the matrix, and the test sample is assigned to the class with the closest codeword according to some distance measure. Moreover, the ECOC ensemble has been shown to reduce the bias and variance errors of the base classifiers [7], [8], [9].

In terms of classification performance, Hsu and Lin [10] compared different approaches to multiclass SVM problems, including one-versus-one, one-versus-all, and DDAG. Using ten benchmark datasets, the authors claimed that the one-versus-one method is superior to the other approaches. However, their experiments are not conclusive: only 10 datasets are used, only two of them have more than seven classes, and only one classification algorithm is considered. García-Pedrajas and Ortiz-Boyer's prominent paper [1] later presented an in-depth critical assessment of the three basic multiclass methods. One of its main conclusions is that ECOC and one-versus-one are the best choices for powerful learners and for simpler learners, respectively. Despite the ability of the ECOC approach to handle problems with a relatively large number of classes, one-versus-one and one-versus-all are the two methods most commonly used in real-world applications, mainly because of their simplicity in comparison with the ECOC approach.

In this paper, we apply the ECOC approach to two image processing problems, logo recognition and shape classification, using two base learners. The results show that the ECOC method can improve classification performance in comparison with the classical multiclass approaches. The rest of this paper is organized as follows: Section 2 briefly reviews the three main class binarization methods. Section 3 describes our applications together with the experimental results and discussion. Finally, Section 4 concludes the paper.
II. RELATED WORK

The following notation is used throughout the paper: T = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} is a training set, where x_i ∈ R^n and each label y_i belongs to Y = {ω_1, ω_2, ..., ω_c}, with c the number of classes; h = {h_1, h_2, ..., h_L} is a set of L binary classifiers. The goal of a class binarization method is to take a feature vector x as input and assign it a class label from Y. As mentioned before, methods for multiclass problems can generally be categorized into three approaches:

One-versus-all (OVA): The one-versus-all method constructs c binary classifiers, one per class. The i-th classifier, h_i, is trained with the data from class i as positive instances and the data from all other classes as negative instances. A new instance is assigned to the class whose corresponding classifier output has the largest value, so the ensemble decision function is

y = \arg\max_{i \in \{1, \ldots, c\}} h_i(x)    (1)

One-versus-one (OVO): The one-versus-one method, also called pairwise classification, constructs c(c-1)/2 classifiers [11]. Classifier h_ij is trained using all data from class i as positive instances and all data from class j as negative instances, disregarding the remaining data. To classify a new instance x, each base classifier casts a vote for one of the two classes used in its training; the one-versus-one method then applies the majority voting scheme, labeling x with the class receiving the most votes. Ties are usually broken in favor of the larger class. More sophisticated combination methods have also been proposed [12], [13], [14]. Ko and Byun proposed a method combining the one-versus-all method with a modification of the one-versus-one method, using SVM as the base learner [15]. This method first obtains the two classes whose corresponding classifiers have the highest confidence, based on the outputs of all one-versus-all classifiers. In a recent paper [16], a very similar idea is presented under the name A&O. However, in both of these methods the learning algorithm that finds the two most likely classes is the same as the final classification algorithm; consequently, some classification errors are likely to be shared, arising from the limitations of the base learner on certain patterns.

Error Correcting Output Codes (ECOC): The basis of the ECOC framework consists of designing a codeword for each of the classes [5]. The method uses a matrix M of {-1, +1} values of size c x L, where L is the length of the codewords codifying the classes. This matrix is interpreted as a set of L binary learning problems, one for each column: each column corresponds to a binary classifier, called a dichotomizer h_j, which separates the set of classes into two meta-classes. An instance x belonging to class i is a positive instance for the j-th classifier if and only if M_ij = +1, and a negative instance if and only if M_ij = -1 [5]. Table I shows a possible binary coding matrix for a 4-class problem ω_1, ..., ω_4, with the respective codewords, using 6 dichotomizers h_1, ..., h_6.

TABLE I
A SAMPLE ECOC MATRIX

Class   h_1   h_2   h_3   h_4   h_5   h_6
ω_1     +1    -1    +1    -1    +1    +1
ω_2     -1    +1    -1    -1    +1    -1
ω_3     -1    -1    -1    +1    -1    +1
ω_4     -1    -1    +1    -1    -1    -1

In this table, each column is associated with a dichotomizer h_j, and each row is a unique codeword associated with an individual target class. For example, h_3 recognizes two meta-classes: original classes 1 and 4 form the first meta-class, and the other two form the second one.
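To make the encoding and decoding stages concrete, the following sketch trains one dichotomizer per column of the Table I matrix and labels a test sample by minimum Hamming distance to the codewords. It is an illustration in Python rather than the authors' MATLAB implementation; the choice of DecisionTreeClassifier as the dichotomizer and the assumption that the labels y are integers in {0, ..., 3} indexing the rows of M are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Coding matrix of Table I: rows are class codewords, columns are dichotomizers.
M = np.array([[+1, -1, +1, -1, +1, +1],
              [-1, +1, -1, -1, +1, -1],
              [-1, -1, -1, +1, -1, +1],
              [-1, -1, +1, -1, -1, -1]])

def ecoc_fit(X, y, M):
    """Train one dichotomizer per column; samples of class i get meta-label M[i, j]."""
    return [DecisionTreeClassifier().fit(X, M[y, j]) for j in range(M.shape[1])]

def ecoc_predict(X, dichotomizers, M):
    """Assign each sample to the class whose codeword is closest in Hamming distance."""
    out = np.column_stack([h.predict(X) for h in dichotomizers])  # entries in {-1, +1}
    dists = (out[:, None, :] != M[None, :, :]).sum(axis=2)        # (samples x classes)
    return dists.argmin(axis=1)
```

For instance, a test sample whose six dichotomizer outputs are (+1, -1, -1, -1, +1, +1) is at Hamming distance 1 from the codeword of ω_1 and at least distance 3 from the others, so it is labeled ω_1.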
When testing an unlabeled pattern x, each classifier outputs -1 or +1, producing an output code vector of length L. This output vector is compared to each codeword in the matrix, and the class whose codeword is closest to the output vector is chosen as the predicted class. The process of merging the outputs of the individual binary classifiers is usually called decoding. The most commonly used decoding method is the Hamming distance, which looks for the minimum distance between the prediction vector and the codewords. The ECOC method was later extended by Allwein et al. [17] to coding matrices with three values, {-1, 0, +1}, where the zero value means that a given class is not considered in the training phase of a particular classifier; in this way, a class can be omitted from the training of a particular binary classifier. These extended codes are called sparse random codes, while the standard binary codes are called dense random codes. Thanks to this unifying approach, the classical one-versus-one method can be represented as an ECOC matrix: the coding matrix has \binom{c}{2} columns, each corresponding to a pair of classes (ω_i, ω_j); such a column has +1 in row i, -1 in row j, and zeros in all other rows. Note that in both the dense and sparse coding styles a test codeword cannot contain the zero value, since the output of each dichotomizer is -1 or +1.

III. EXPERIMENTAL COMPARISON

A. Experimental settings

Before presenting the results, we first describe the experimental settings. We compared the dense and sparse ECOC designs with the classical multiclass classification methods OVO and OVA on two real-world machine vision applications. We considered random codes of 10 log2(c) bits for the dense design and 15 log2(c) bits for the sparse design [17]. The class of an instance in the ECOC schemes is chosen using Exponential Loss-Weighted (ELW) decoding [16]. Two base learners were chosen: Gentle AdaBoost with 50 runs of decision stumps [20], and a classification and regression tree (CART) with the Gini index as the split criterion. All experiments were implemented in MATLAB. For performance evaluation, we used 10-fold cross-validation to improve the reliability of the results. In order to have a fair comparison, the training and test sets of all methods were the same for each repetition of the experiments.
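The dense and sparse random designs can be sketched as follows. The column counts follow the 10 log2(c) and 15 log2(c) figures used above, and the sparse entry probabilities (0 with probability 1/2, each of -1 and +1 with probability 1/4) follow Allwein et al. [17]; the validity check is our simplified stand-in for the usual practice of drawing many random matrices and keeping the one with the best row separation.

```python
import numpy as np

rng = np.random.default_rng(0)

def _valid(M):
    # Each column must define two non-empty meta-classes,
    # and the rows must all be distinct codewords.
    two_sided = ((M == 1).any(axis=0) & (M == -1).any(axis=0)).all()
    return two_sided and len(np.unique(M, axis=0)) == M.shape[0]

def dense_random_code(c):
    """Dense design: {-1, +1} entries, about 10*log2(c) dichotomizers."""
    L = int(np.ceil(10 * np.log2(c)))
    while True:
        M = rng.choice([-1, 1], size=(c, L))
        if _valid(M):
            return M

def sparse_random_code(c):
    """Sparse design: {-1, 0, +1} entries, about 15*log2(c) dichotomizers;
    a zero removes that class from the dichotomizer's training set."""
    L = int(np.ceil(15 * np.log2(c)))
    while True:
        M = rng.choice([-1, 0, 1], size=(c, L), p=[0.25, 0.5, 0.25])
        if _valid(M):
            return M
```

For the 70-class MPEG7 problem described below, these designs yield roughly 62 (dense) and 92 (sparse) dichotomizers, far fewer than the 2415 classifiers required by OVO.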
Fig. 1. Some examples of labeled shapes in the MPEG7 dataset.

B. Image vision applications

1) Shape categorization: The first real application in our experiments was shape classification, for which we used the MPEG7 dataset¹. This dataset consists of C = 70 classes (bone, chicken, cellular phone, etc.) with 20 instances per class, for a total of 1400 object images. All samples were described using the Blurred Shape Model descriptor [18]. Figure 1 shows a couple of samples from some categories of this dataset.

¹ MPEG7 Repository dataset: latecki/

2) Logo recognition: The proposed approach was then applied to our logo recognition problem. The logo images were based on a database that contains pure pictorial logos (e.g., Logo60, Fig. 2), text-like logos (e.g., Logo30, Fig. 2), and text-graphics mixture logos (e.g., Logo10, Fig. 2). The complete dataset contains 105 images and was obtained from the database distributed by the Document Processing Group, Center for Automation Research, University of Maryland. The logos in the dataset have very different sizes, and the dataset provides only a single instance of each of the 105 individual logo classes; for each logo class, artificially degraded images were therefore generated using the noise models described below. The recognition process should be translation, scale, and rotation invariant, so three normalization steps were applied to the raw images before feature extraction. First, we shifted the image so that the centroid of the white pixel locations lies at the image center, which gives translational invariance. Second, we rotated the image around the centroid so that its major principal axis is aligned with the horizontal, which gives rotational invariance. Third, we resized the image so that the bounding box of the logo symbol has a fixed size.

Noise models: We investigated the robustness of the methods when the logos are corrupted by two different image degradation methods: 1) Gaussian noise, a global degradation, as shown in Fig. 3a, and 2) spot noise, a local degradation, as shown in Fig. 3b. For each method, we degraded the images in the database, varying the amount of degradation in equally spaced steps. We generated a set of 40 examples for each class of logo images by adding both Gaussian white noise with mean = [0, 0.1, 0.2, ..., 0.5] and var = [0, 0.01, ..., 0.05], and spot noise of different sizes (width = 10, 15, ..., 30 pixels).

Feature extraction: Generally, there are two approaches to shape description and feature extraction: contour-based and region-based [19]. Region-based methods consider all the pixels within a shape region to obtain the shape features, rather than using only boundary information as contour-based techniques do. Region-based methods are generally less sensitive to noise and variations, are more robust since they use all the available shape information, provide more accurate retrieval, and can cope well with shape defection, a common problem for contour-based shape representation methods. In our application, moreover, shape content is more important than contour features. In this paper, we used one of the most commonly used region-based shape descriptors, geometric moment invariants, which are built from the geometric moments

m_{pq} = \sum_x \sum_y x^p y^q f(x, y),  p, q = 0, 1, 2, ...    (2)
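As a sketch of Eq. (2), the raw geometric moments of a binary logo image f can be computed directly as below. The function is illustrative; the construction of the actual invariants from these raw moments (via central and normalized moment combinations) is omitted.

```python
import numpy as np

def raw_moments(f, max_order=3):
    """Raw geometric moments m_pq = sum_x sum_y x^p * y^q * f(x, y), as in Eq. (2)."""
    x = np.arange(f.shape[0], dtype=float)[:, None]  # x runs over rows
    y = np.arange(f.shape[1], dtype=float)[None, :]  # y runs over columns
    return {(p, q): float((x**p * y**q * f).sum())
            for p in range(max_order + 1) for q in range(max_order + 1)}
```

Note that the centroid used in the translation-normalization step above is exactly (m_10/m_00, m_01/m_00).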
In our experiments, we derived all combinations of the lower-order moments (p, q = 0, 1, 2, 3), which have the desirable property of being invariant under scaling, translation, and rotation.

C. Experimental results and analysis

The average accuracy of all considered methods over 20 runs is shown in Fig. 4 and Fig. 5 for the MPEG7 and Logo datasets, respectively. These figures show the improved ability of the ECOC method in terms of classification accuracy. To establish the superiority of the ECOC method, statistical analysis is necessary. Following the recommendations of Demsar [20], we use non-parametric tests, which are safer than parametric tests such as ANOVA and the t-test since they do not assume normal distributions or homogeneity of variance. In this study, we employ the Iman-Davenport test [21]. If there are statistically significant differences in classification performance, we can then proceed with the Nemenyi test [20] as a post-hoc test to compare all methods with each other. To do so, we first rank the competing methods for each dataset: the best-performing method receives rank 1, the second best rank 2, and so on. A method's mean rank is obtained by averaging its ranks across all experiments. We then use the Friedman test [18] to compare these mean ranks and decide whether to reject the null hypothesis, which states that all considered methods have equivalent performance. Iman and Davenport [21] found that the Friedman statistic is undesirably conservative and proposed a corrected measure. Applying this method, we can reject the null hypothesis and conclude that there are significant statistical differences among the rival methods. Further, to compare the rival methods with each other, we apply the Nemenyi test, as illustrated in Fig. 6 and Fig. 7.
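The ranking procedure just described can be sketched as follows, following Demsar [20]: compute the mean rank of each method, the Friedman statistic, its Iman-Davenport correction (distributed as F with k-1 and (k-1)(N-1) degrees of freedom), and the Nemenyi critical difference. The accuracy matrix and the q_alpha constant are illustrative assumptions; 2.569 is the tabulated studentized-range value for k = 4 methods at alpha = 0.05.

```python
import numpy as np
from scipy import stats

def rank_analysis(acc, q_alpha=2.569):
    """acc: (N experiments) x (k methods) accuracy matrix.
    Returns mean ranks, the Friedman statistic, the Iman-Davenport
    statistic, and the Nemenyi critical difference."""
    N, k = acc.shape
    ranks = np.apply_along_axis(lambda row: stats.rankdata(-row), 1, acc)  # rank 1 = best
    R = ranks.mean(axis=0)                                 # mean rank per method
    chi2 = 12.0 * N / (k * (k + 1)) * ((R ** 2).sum() - k * (k + 1) ** 2 / 4.0)
    f_id = (N - 1) * chi2 / (N * (k - 1) - chi2)           # Iman-Davenport correction
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))        # Nemenyi critical difference
    return R, chi2, f_id, cd
```

Two methods whose mean ranks differ by more than the critical difference are declared significantly different; this is exactly the bar-overlap test visualized in Fig. 6 and Fig. 7.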
Fig. 2. Some examples of logos in the database used in our experiments: (a) Logo60, (b) Logo30, (c) Logo10.

Fig. 3. Examples of noisy logo patterns derived by applying the Gaussian and spot noise models: (a) Gaussian noise, (b) spot noise.

Fig. 4. Average accuracy of the different methods on the MPEG7 dataset: (a) CART classifier.

In Fig. 6 and Fig. 7, the mean rank of each method is indicated by a square, and the horizontal bar across each square shows the critical difference. Two methods are significantly different if their average ranks differ by at least the critical difference value, i.e., if their horizontal bars do not overlap. The results in Fig. 4 and Fig. 5, along with the statistical tests presented in Fig. 6 and Fig. 7, indicate that in general the ECOC methods achieve the best performance among all strategies. Another finding is that the OVA method performs poorly in the present experiments for the considered classifiers, especially when the number of classes increases. In addition, the overall accuracy of all methods using a fixed classification algorithm generally decreases as the number of classes increases. This is expected, as the difficulty of the multiclass problem grows with the number of classes.

Fig. 5. Average accuracy of the different methods on the Logo dataset: (a) CART classifier.

Fig. 6. Comparison of the rival methods based on the Nemenyi test on the MPEG7 dataset: (a) CART classifier.

Fig. 7. Comparison of the rival methods based on the Nemenyi test on the Logo dataset: (a) CART classifier.

IV. CONCLUSIONS

In this paper, we investigated the behavior of Error Correcting Output Codes on two image vision problems, logo recognition and shape classification, using a CART decision tree and AdaBoost as the base learners. The results indicate that even though the ECOC approach has received much less attention in the applied literature than the classical OVO and OVA methods, it can significantly improve classification accuracy. A closer analysis of the results gives a clearer picture. Using the decision tree as the base learner, we found significant differences between ECOC and the classical methods for both the dense and sparse schemes; the ECOC methods are generally able to outperform OVO and OVA. Comparing the two ECOC schemes, both achieve similar performance, with only a slight difference between the dense and sparse designs on the current datasets.
Using AdaBoost as the base learner, we can see that the ECOC designs achieved the best accuracy results. One can also see that the relative performance of OVO and the dense and sparse ECOC designs tends to become closer as the number of classes increases, which is consistent with the findings of [1]. Note, however, that for a large number of classes the sub-linear number of classifiers of the dense and sparse ECOC designs is considerably lower than that of the OVO and OVA approaches.

REFERENCES

[1] N. García-Pedrajas and D. Ortiz-Boyer, "An empirical study of binary classifier fusion methods for multiclass classification," Information Fusion, vol. 12, no. 2, 2011.
[2] T. Hastie and R. Tibshirani, "Classification by pairwise coupling," in Advances in Neural Information Processing Systems 10 (NIPS 1997). Cambridge, MA: MIT Press, 1998.
[3] P. Clark and R. Boswell, "Rule induction with CN2: Some recent improvements," in Machine Learning - EWSL-91, ser. Lecture Notes in Computer Science, vol. 482, Y. Kodratoff, Ed. Springer Berlin/Heidelberg, 1991.
[4] R. Anand, K. Mehrotra, C. K. Mohan, and S. Ranka, "Efficient classification for multiclass problems using modular neural networks," IEEE Transactions on Neural Networks, vol. 6, no. 1, 1995.
[5] T. G. Dietterich and G. Bakiri, "Solving multiclass learning problems via error-correcting output codes," Journal of Artificial Intelligence Research, vol. 2, pp. 263-286, 1995.
[6] S. Escalera, O. Pujol, and P. Radeva, "Separability of ternary codes for sparse designs of error-correcting output codes," Pattern Recognition Letters, vol. 30, no. 3, 2009.
[7] E. B. Kong and T. G. Dietterich, "Error-correcting output coding corrects bias and variance," in Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell, Eds., 1995.
[8] E. B. Kong and T. G. Dietterich, "Why error-correcting output coding works with decision trees," Technical Report, Department of Computer Science, Oregon State University, Corvallis, OR.
[9] T. Windeatt and R. Ghaderi, "Coding and decoding strategies for multi-class learning problems," Information Fusion, vol. 4, no. 1, 2003.
[10] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, 2002.
[11] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: A stepwise procedure for building and training a neural network," in Neurocomputing: Algorithms, Architectures and Applications, J. Fogelman, Ed. Springer-Verlag, 1990.
[12] S.-H. Park and J. Fürnkranz, "Efficient pairwise classification," in Proceedings of the 18th European Conference on Machine Learning (ECML 2007), Warsaw, Poland, J. N. Kok, J. Koronacki, R. López de Mántaras, S. Matwin, D. Mladenić, and A. Skowron, Eds. Springer-Verlag, 2007.
[13] T.-F. Wu, C.-J. Lin, and R. C. Weng, "Probability estimates for multi-class classification by pairwise coupling," Journal of Machine Learning Research, vol. 5, pp. 975-1005, 2004.
[14] M. Bagheri, Q. Gao, and S. Escalera, "Efficient pairwise classification using local cross off strategy," in Proceedings of the 25th Canadian Conference on Artificial Intelligence, Toronto, Canada, 2012, to appear.
[15] J. Ko and H. Byun, "Binary classifier fusion based on the basic decomposition methods," in Multiple Classifier Systems, ser. Lecture Notes in Computer Science, vol. 2709, T. Windeatt and F. Roli, Eds. Springer Berlin/Heidelberg, 2003.
[16] N. García-Pedrajas and D. Ortiz-Boyer, "Improving multiclass pattern recognition by the combination of two strategies," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, 2006.
[17] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: A unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, pp. 113-141, 2000.
[18] S. Escalera, A. Fornés, O. Pujol, P. Radeva, G. Sánchez, and J. Lladós, "Blurred shape model for binary and grey-level symbol recognition," Pattern Recognition Letters, vol. 30, no. 15, 2009.
[19] D. Zhang and G. Lu, "Review of shape representation and description techniques," Pattern Recognition, vol. 37, no. 1, pp. 1-19, 2004.
[20] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1-30, 2006.
[21] R. L. Iman and J. M. Davenport, "Approximations of the critical region of the Friedman statistic," Communications in Statistics, vol. 9, no. 6, 1980.