Multi-class Classification: A Coding Based Space Partitioning
Sohrab Ferdowsi, Svyatoslav Voloshynovskiy, Marcin Gabryel, and Marcin Korytkowski

University of Geneva, Centre Universitaire d'Informatique, Battelle Bât. A, 7 route de Drize, 1227 Carouge, Switzerland
Institute of Computational Intelligence, Częstochowa University of Technology, Al. Armii Krajowej 36, 42-200 Częstochowa, Poland

Abstract. In this work we address the problem of multi-class classification in machine learning. In particular, we consider the coding approach, which converts a multi-class problem into several binary classification problems by mapping the multi-class labeled space into several partitioned binary labeled spaces through binary channel codes. By modeling this learning problem as a communication channel, these codes are meant to have error correcting capabilities and thus to improve classification performance. However, we argue that conventional coding schemes designed for communication systems do not treat the space partitioning problem optimally, because they are heedless of the partitioning behavior of the underlying binary classifiers. We discuss an approach which is optimal in terms of space partitioning and advocate it as a powerful tool for multi-class classification. We then review LDA, a known method for the multi-class case, and compare its performance with the proposed method. We run the experiments on synthetic data in several scenarios and then on a real database for face identification.

Keywords: Multi-Class Classification, Error Correcting Output Codes, Support Vector Machines, Linear Discriminant Analysis

1 Introduction

The general multi-class classification is a fundamentally important problem with many applications in different domains. Classification of alphabet letters in a handwritten document, phoneme segmentation in speech processing, face identification, and content identification of multimedia segments are among the application examples of this problem. Although studied for many years, it is still considered an open issue, and none of the many proposed methods achieves superior performance under all conditions for different applications. In binary classification, where there are only two classes, many well-developed methods exist to tackle different classification scenarios. However, they are naturally designed to address the binary case, and their extension to multi-class is either not possible or not straightforward. One way to accommodate
the multi-class case and still use the existing powerful binary algorithms is to break a multi-class problem into several binary problems and combine the results. This conversion can be done in several ways. Among the existing methods that convert the multi-class problem into several binary ones is the famous one-vs-all approach, which, in each binary classification problem, classifies one of the classes against all the rest. This requires running the binary classification algorithm M times. In fact, because the situation is not symmetric, these binary problems may be very different from each other in nature, which can be problematic in practice. Another method is to consider each pair of classes together. This method, known as the one-vs-one approach, requires M(M - 1)/2 binary classifications to capture all the combinations between the classes (both reductions are sketched in code at the end of this section). These methods, while workable for small M, seriously fail as M grows, because the number of binary classifications to be carried out becomes intractable. The fact that identifying M objects naturally requires log2 M binary descriptors has motivated extensive research in the field. A significant achievement was due to Dietterich [1], who made an analogy between the problem of classification and a communication channel and thus suggested considering it as a channel coding problem. This idea was followed by further research and study; however, no significant results have been reported. An essential element of Dietterich's approach is the selection of the coding matrix. A coding matrix design was proposed in [2] which was shown to be optimal in terms of space partitioning for the problem of content identification of objects. In this paper we relax the assumption of identification, where each class contains only one instance, and instead address the general problem of multi-class classification, where there is an arbitrary number of classes and an arbitrary number of training instances in each class. We compare it with Linear Discriminant Analysis (LDA), an efficient multi-class classification scheme, in different classification scenarios, and show that it can also be used efficiently for the general problem of multi-class classification. LDA is a powerful technique in statistics that aims at finding a set of transformed features in a reduced-dimension space such that they give the best discrimination among classes. The use of LDA in machine learning, although not popularized, has been extensively studied, at least for the binary case. An experimental study in [3] investigated the use of LDA for multi-class classification. It is shown that this method, despite certain limitations regarding its assumption on the distributions of classes, can serve as an efficient classification method also for the multi-class case. The paper is organized as follows: In Section 2 we study the problem of multi-class classification. We review the coding approach for multi-class classification and introduce a method to solve it optimally. We then consider the LDA method for multi-class classification. The experimental results are reported in Section 3. We finally summarize the paper in Section 4.
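To make the classifier counts concrete, here is a minimal sketch, ours rather than the paper's (the paper names no implementation), using scikit-learn's built-in reductions; the toy data and all parameter values are illustrative assumptions:

    from sklearn.datasets import make_blobs
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    M = 6  # number of classes (illustrative)
    X, y = make_blobs(n_samples=300, centers=M, n_features=5, random_state=0)

    # One-vs-all: one binary problem per class (class m against the rest).
    ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
    # One-vs-one: one binary problem per unordered pair of classes.
    ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)

    assert len(ovr.estimators_) == M                  # M classifiers
    assert len(ovo.estimators_) == M * (M - 1) // 2   # M(M-1)/2 classifiers

The assertions make the scaling explicit: the one-vs-one count grows quadratically in M, which is what renders both reductions intractable for large M.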
2 Multi-class Classification

In this section we first consider binary classification and the SVM approach to solve it. We then define the problem of multi-class classification and explain an important approach to solving this problem known as Error Correcting Output Codes (ECOC) [1]. Finally, we consider the design method introduced in [2] as the basis for our later discussions.

2.1 Binary Classification Using Support Vector Machines

Here we consider the general formulation of the Support Vector Machine (SVM) for the binary case. SVM is considered one of the most successful machine learning approaches for binary supervised classification. When the two classes are linearly separable, the SVM algorithm tries to find a separating hyperplane that has the maximum margin from the closest instances of the two classes. The closest instances from each class are also called support vectors. This criterion, under certain conditions, is proved to guarantee the best generalization performance. Concretely, given the training instances x(i) ∈ R^N, 1 ≤ i ≤ s, and their binary labels g(x(i)) ∈ {-1, +1}, we seek a hyperplane in R^N, characterized by w ∈ R^N and b, that correctly classifies all the examples while having the largest margin. This is formulated as the optimization problem (1). The objective function maximizes the margin, and the constraints ensure that the training instances are correctly classified, i.e., that their predicted labels are the same as their true labels.

minimize_{w,b}  (1/2) w^T w
subject to  g(x(i)) (w^T x(i) + b) ≥ 1,  i = 1, ..., s.    (1)

This optimization problem, and its other variants, are treated using quadratic programming techniques, and many powerful packages are designed to handle it in practice. For the case where the training instances are not linearly separable, the above formulation can be modified using the kernel technique. The idea is to use kernels, which are pre-designed nonlinear functions that map the input space to a higher-dimensional space. It could be the case that the instances in the new space are linearly separable. Therefore, (1) can be modified and used to find the best separating boundary between the instances in the higher-dimensional space. A useful way to gain insight into the spatial geometry of the instances is the Voronoi diagram. A Voronoi diagram is a way of partitioning the space into convex cells, where each cell contains exactly one instance from the given examples. The cells are chosen such that all the points in a cell are closer to the corresponding instance than to any other instance in space. Based on this definition, one can easily confirm that the SVM boundaries, when trained with enough parameters, should in fact follow the Voronoi cells of the set of support vectors. Figure 1 shows this phenomenon.
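As a concrete illustration of (1) and of the Voronoi observation above, the following sketch (our own; the paper prescribes no code) approximates the hard-margin problem with scikit-learn's SVC at a large penalty C and compares the learned rule with a nearest-support-vector rule; the data and parameters are assumptions for illustration:

    import numpy as np
    from scipy.spatial import cKDTree
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Two linearly separable clouds with labels in {-1, +1}.
    X = np.vstack([rng.normal(-3.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
    g = np.hstack([-np.ones(50), np.ones(50)])

    clf = SVC(kernel="linear", C=1e6).fit(X, g)  # large C approximates hard margin
    w, b = clf.coef_[0], clf.intercept_[0]
    print("min of g(x)(w^T x + b):", (g * (X @ w + b)).min())  # ~1 at the SVs

    # Voronoi view: predictions largely agree with the label of the nearest
    # support vector, as the boundary follows the support vectors' Voronoi cells.
    tree = cKDTree(clf.support_vectors_)
    sv_labels = g[clf.support_]
    probe = rng.normal(0.0, 3.0, (500, 2))
    nearest = sv_labels[tree.query(probe)[1]]
    print("agreement with nearest-support-vector rule:", (clf.predict(probe) == nearest).mean())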
Fig. 1. SVM boundaries (solid lines) following the Voronoi cells (dashed lines) of the support vectors.

It is important to mention that in all of these analyses we are using the l2-norm as the measure of distance. Other norms could also be studied and could offer advantages in applications where the distortions have a non-Gaussian character.

2.2 Multi-class Classification: Problem Formulation

A set of training instances is given, containing features x(i) ∈ R^N and their labels g(x(i)) belonging to any of the M classes. These labels are assumed to have been generated through an unknown hypothetical mapping function g : R^N → {1, 2, ..., M} that we want to approximate based on the given labeled training examples. The labels, unlike the common cases in machine learning, belong to a set of M members rather than only two classes. Because most of the existing classification algorithms are naturally designed for M = 2 classes, generalizing them to multi-class cases usually requires treating the problem as several binary classification problems.

2.3 Coding Approach for Multi-Class Classification

The main idea behind the Error Correcting Output Codes (ECOC) approach to the multi-class problem is to consider it as a communication system in which the identity of the correct output class for a given unlabeled example is transmitted over a hypothetical channel which, due to the imperfections of the training data, the errors in the learning process, and the non-ideal choice of features, is considered to be noisy [1]. Therefore, it makes sense to encode the classes using error correcting codes and transmit each of the bits through the channel, i.e., to run the learning algorithm, so that we are able to cope with the errors in each individual binary classifier. Figure 2 illustrates this idea.
Fig. 2. Communication system and its classification equivalent: E and D are the encoding and decoding stages in communication, respectively; T is the training procedure, C is the coding matrix, and b_y is the derived binary code for the test example.

Concretely, we randomly assign to each of the M classes a row of a coding matrix C (M × l). Then we run a binary learning algorithm on all the training samples for every column of C, so that we obtain l mappings from R^N, the original data space, to the one-dimensional binary space {0, 1}, or equivalently, one mapping rule from R^N to the l-dimensional binary space {0, 1}^l. Given a new unlabeled example y, we map it from R^N to the l-dimensional binary space through the same mapping rule learned above. We then compare this binary representation b_y with the items in the database, or equivalently the rows of C, by the minimum Hamming distance rule or any other relevant decoding method.

Coding Matrix Design  An important concern in this method is the choice of the coding matrix. Using techniques from coding theory, the main focus of researchers has been to design the elements of C optimally so as to combat the noise while keeping the rate of communication as close to 1 as possible, which implies a smaller number of binary classifications. To this end, most design strategies were in essence aiming to maximize the Hamming distance between the rows of C [4]. Another approach to the design of this matrix is to assign the elements randomly. It is shown that this random assignment approach can do as well as the previous coding-oriented designs [5, 6]. It is important to mention that in the design of channel codes, e.g., famous codes like BCH or LDPC, it is generally assumed that the channel noise, in our case the bit flippings of b_y due to the classification process, is i.i.d., or at least independent of the input codewords. In the case of classification, where we have a hypothetical channel representing the behavior of the learning algorithm, however, we can observe that these bit flippings, or equivalently the channel noise samples, are highly correlated with the input codewords. Therefore, many of the information-theoretic explanations based on notions like typicality are no longer valid. To address this issue, we consider this problem from a learning-theoretic viewpoint.
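The pipeline just described fits in a few lines; the following is an illustrative sketch (ours, with numpy and scikit-learn; the kernel choice and matrix size are assumptions), training one binary SVM per column of C and decoding by minimum Hamming distance:

    import numpy as np
    from sklearn.svm import SVC

    def ecoc_fit(X, y, C_mat):
        # One binary learner per column j, trained on the relabeled data C[y, j].
        return [SVC(kernel="rbf").fit(X, C_mat[y, j]) for j in range(C_mat.shape[1])]

    def ecoc_predict(classifiers, C_mat, Y):
        # b_y: the l predicted bits for each test example.
        B = np.column_stack([clf.predict(Y) for clf in classifiers])
        # Minimum Hamming distance to the rows of C.
        dists = (B[:, None, :] != C_mat[None, :, :]).sum(axis=2)
        return dists.argmin(axis=1)

    M, l = 16, 8
    rng = np.random.default_rng(0)
    C_mat = rng.integers(0, 2, size=(M, l))  # a random coding matrix, as in [5, 6]

Replacing the random draw of C_mat with the construction of the next paragraph yields the optimal partitioning discussed below.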
Optimal Coding  As mentioned earlier in Section 1, a coding design was suggested in [2] which was shown to be optimal in terms of space partitioning. An intuitive explanation for this fact is presented below.

Fig. 3. Instances of the original multi-class problem and the corresponding binary classifications: (a) instances (M = 8 and p = 5); (b) binary classifications: SVM boundaries (solid) and Voronoi regions (dashed).

Figure 3(a) shows the class instances and Figure 3(b) the corresponding decision boundaries of a binarized multi-class classification problem with M = 8 classes and p = 5 two-dimensional instances for every class. As is clear from Figure 3(b), and as discussed earlier in Section 2.1, the decision boundaries follow the outer Voronoi cells of each class. Notice that the choice of C dictates that, in each of the l classifications, some of the adjacent classes are grouped together and labeled as 0 and some are grouped and labeled as 1, and this labeling and grouping can change in each classification. Therefore, in order to design an optimal coding matrix, we ask the question: Which choice of the elements of C guarantees learning each of these relevant Voronoi regions at least once while avoiding redundant boundary learning? The first requirement links with having good performance via optimal class separability, and the latter implies keeping the number of columns (classifiers) as small as possible, i.e., having the maximal rate of communication. Equivalently, the optimal assignment of codewords should work as a unique and non-redundant encoding of each class, with the decision boundaries of the SVM classifier following the Voronoi regions between the corresponding classes. This optimal design happens when the rows of C are chosen simply as the modulo-2 (binary) equivalents of the class numbers. Therefore, as an example, in a 32-class problem the codewords will be of length l = log2 M = 5, the 1st class should be encoded as the binary string 00000, and the 6th class should be encoded as 00101.
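This construction requires no search; a small sketch (ours) builds the rows as the binary representations of the class indices and, anticipating the decoding argument below, recovers the class directly from the predicted bit string:

    import numpy as np

    def optimal_code_matrix(M):
        l = int(np.ceil(np.log2(M)))
        # Row m is the l-bit binary (modulo-2) representation of class index m.
        return np.array([[(m >> (l - 1 - j)) & 1 for j in range(l)]
                         for m in range(M)])

    C_mat = optimal_code_matrix(32)
    print(C_mat[0])  # [0 0 0 0 0] -> 1st class, codeword 00000
    print(C_mat[5])  # [0 0 1 0 1] -> 6th class, codeword 00101

    def decode(bits):
        # The codeword itself is the class index: a direct lookup, with
        # no Hamming-distance or channel-decoding computations.
        return int(np.dot(bits, 1 << np.arange(len(bits) - 1, -1, -1)))

    assert decode(C_mat[5]) == 5  # the 6th class in 1-indexed numbering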
Intuitively, considering any two adjacent classes, this approach assigns to each of them a codeword such that the two differ in at least one position. This satisfies our first objective, because it guarantees that each pair of rows of C that could be assigned to two adjacent classes has at least one position where the bit values are different. Therefore, the two classes will be mapped into different binary classes at least once, which implies that the decision boundary will cross between them at least once. With this approach, the second requirement is also fully satisfied, since the length of each codeword, i.e., the number of columns of C, equals l = log2 M, which is minimal, and the rate of communication is exactly 1. In essence, the optimality of this approach rests on the fact that we learn the relevant Voronoi regions optimally. In fact, learning the Voronoi regions of the support vectors is also equivalent to the maximum likelihood (ML) rule under a Gaussian assumption for the classes. It is important to mention that we are using SVM classifiers with nonlinear kernels that are properly tuned. Other learning rules do not necessarily learn these Voronoi regions accurately, and an improper choice of kernel parameters does not guarantee learning these regions properly. While the ML decoding rule requires knowledge of the Voronoi regions of all classes, in our approach these regions are distributed among the l binary classifiers and are learned implicitly. The fusion of the results of all binary classifiers equivalently produces the entire coverage of all classes defined by their Voronoi regions. Therefore, the result of this fused binary decoding scheme should coincide with the optimal ML decoder. It is also very important to mention that, given a test example to be classified, unlike any other method, the complexity of decoding in this approach comprises only the SVM functional evaluations and does not incur any Hamming distance computation or decoding computations as in LDPC or other coding methods. The reason is that a produced codeword directly refers to a storage point in memory, hence no computation at all. However, the SVM functional evaluations involved can be very costly if the nonlinear kernels are too complex.

2.4 Linear Discriminant Analysis

In this part, we review the Discriminant Analysis approach, a collection of techniques in statistical pattern recognition that try to find a set of discriminating features from the data by transforming them into a lower-dimensional space. With their small complexity, their powerful behaviour in many classification scenarios, and their theoretical importance, they are considered a good method of choice for many problems. Here we consider the famous LDA for the multi-class case.

Multi-class LDA Formulation  Given the training dataset X = [x(1), x(2), ..., x(s)]^T with instances x(i) ∈ R^N, labeled as belonging to any of the M classes c_i, the objective is to find a transformation W ∈ R^{N×(M-1)} such that it maps the data
into an (M-1)-dimensional space X̃ = XW in which the data can be discriminated more easily. Equivalently, it draws M - 1 hyperplanes between the M classes. Given a testing sample y, we transform it to the new space as ỹ = y^T W. We then find the transformed class center µ̃_i = µ_i^T W, where µ_i = (1/p_i) Σ_{x∈c_i} x, that has the smallest Euclidean distance to ỹ. The sample y is then classified to c_i. The transform W should be found such that in the transformed space the inter-class distances are the highest while the intra-class distances between the instances are the lowest, which is equivalent to having maximal distinguishability between the classes. Concretely, we define the intra-class scatter matrix for class c_i as:

Ŝ_i = Σ_{x∈c_i} (x - µ_i)(x - µ_i)^T    (2)

and the total intra-class scatter matrix as:

Ŝ_intra = Σ_{i=1}^{M} Ŝ_i.    (3)

The inter-class scatter matrix is defined as:

Ŝ_inter = Σ_{i=1}^{M} p_i (µ_i - µ)(µ_i - µ)^T,    (4)

where µ = (1/s) Σ_{i=1}^{M} p_i µ_i. Jointly maximizing the inter-class scatter of the data in the transformed space and minimizing their intra-class scatter is equivalent to solving the generalized eigenvalue problem in (5),

Ŝ_inter W = λ Ŝ_intra W.    (5)

Based on the W derived from Eq. (5), the test data y is classified as:

ĉ = argmin_i ‖y^T W - µ_i^T W‖,    (6)

which searches for the closest class center to the test example in the transformed domain.
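A compact numerical rendering of (2)-(6) is given below (our own sketch with numpy/scipy; the paper specifies no implementation, and the eigensolver choice is an assumption):

    import numpy as np
    from scipy.linalg import eigh

    def lda_fit(X, y, M):
        N = X.shape[1]
        mu = X.mean(axis=0)  # overall mean, mu = (1/s) sum_i p_i mu_i
        S_intra = np.zeros((N, N))
        S_inter = np.zeros((N, N))
        for i in range(M):
            Xi = X[y == i]
            mu_i = Xi.mean(axis=0)
            S_intra += (Xi - mu_i).T @ (Xi - mu_i)               # eqs. (2), (3)
            S_inter += len(Xi) * np.outer(mu_i - mu, mu_i - mu)  # eq. (4)
        # Generalized eigenvalue problem (5); eigh needs S_intra to be positive
        # definite, i.e., enough training instances per dimension (see below).
        vals, vecs = eigh(S_inter, S_intra)
        W = vecs[:, np.argsort(vals)[::-1][:M - 1]]  # top M-1 directions
        centers_t = np.vstack([X[y == i].mean(axis=0) for i in range(M)]) @ W
        return W, centers_t

    def lda_predict(Y, W, centers_t):
        # Eq. (6): nearest transformed class center in Euclidean distance.
        d = np.linalg.norm((Y @ W)[:, None, :] - centers_t[None, :, :], axis=2)
        return d.argmin(axis=1)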
LDA for Multi-Class Classification  LDA, along with its other variants, can be considered a reliable method for multi-class classification. The computational complexity of this approach can be significantly lower than that of other methods. However, there are certain limitations. Unlike the SVM classifier discussed above, which only considers the boundary instances, LDA assumes an underlying distribution for the data. Therefore, in order for the distribution estimation to be accurate, the number of training instances should not be small compared to the data dimension. This could be a serious limiting factor, as in many applications the training data are limited. Moreover, the underlying distribution under which the method is successful is assumed to be Gaussian. Therefore, deviations from Gaussianity can seriously affect the performance. We investigate this fact in our experimental results. It is also important to point out that LDA tries to maximize distinguishability and does not necessarily consider minimizing the training error. In the multi-class case, in the eigenvalue decomposition problem of (5), pairs of classes between which there are large distances completely dominate the decomposition. Therefore, in the transformed space, the classes with smaller distances will overlap [3].

3 Experimental Results

In this section we experimentally study the discussed approaches under different practical scenarios. Since there are many different parameters involved, especially in the multi-class case, comparing different algorithms is not trivial. The data dimension, the size of the training set, the way the instances of different classes are mixed with each other, and the underlying distributions of the classes can significantly affect the classification scenario. In particular, when we generalize from the binary to the multi-class case, these parameters and their relative values can radically change the scenario. Therefore, instead of using only the currently available databases, which are very limited in terms of covering different combinations of the parameters and thus capturing a meaningful generality of an arbitrary scenario, we also experimented with synthetic databases. This way we have the possibility to change these parameters arbitrarily. We first experiment with the case where we generate the data from a Gaussian distribution, where we can control the inter-class and intra-class variances. We then experiment with the case where each class is generated as a mixture of Gaussians. We finally report the result of the comparison on the ExtendedYaleB [7] database for face recognition.

3.1 Synthetic Data

We perform the experiments on the synthetic data in two parts. First we consider a Gaussian distribution for the classes. In order to imitate more realistic situations, we then experiment with cases where the classes follow mixtures of several Gaussians.

Gaussian Data  We generate the centers of the classes as x_c(i) ∈ R^N with X_c ~ N(0, σ²_inter I_N) for 1 ≤ i ≤ M. We then generate the j-th instance of class c_i as x(j) = x_c(i) + Z_intra with Z_intra ~ N(0, σ²_intra I_N). The test data are generated as Additive White Gaussian Noise with covariance matrix σ²_Z I_N added to the centers of each class.
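This generative model reads directly as code; the following is a sketch under our reading of it, with the noise variance tied to the SNR definition used in the next paragraph:

    import numpy as np

    def gaussian_data(M, N, p, var_inter, var_intra, rng):
        centers = rng.normal(0.0, np.sqrt(var_inter), size=(M, N))  # x_c(i)
        X = np.vstack([c + rng.normal(0.0, np.sqrt(var_intra), size=(p, N))
                       for c in centers])                           # x(j)
        y = np.repeat(np.arange(M), p)
        return centers, X, y

    def test_data(centers, p, snr_db, var_inter, rng):
        # AWGN around the class centers; sigma_Z^2 follows from
        # SNR = 10 log10(var_inter / var_z).
        var_z = var_inter / 10.0 ** (snr_db / 10.0)
        M, N = centers.shape
        Y = np.vstack([c + rng.normal(0.0, np.sqrt(var_z), size=(p, N))
                       for c in centers])
        return Y, np.repeat(np.arange(M), p)

    rng = np.random.default_rng(0)
    centers, X, y = gaussian_data(M=16, N=5, p=100, var_inter=10.0,
                                  var_intra=1.0, rng=rng)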
We measure the performance of the algorithms by accuracy, defined as the ratio of correctly classified items to the whole test set size. Figure 4(a) depicts the results of classification when the number of classes was chosen as M = 16, the data dimension as N = 5, and the inter-class and intra-class variances as σ²_inter = 10 and σ²_intra = 1, respectively. The independent variable was the SNR, defined as SNR = 10 log10(σ²_inter / σ²_Z). The comparison was made between four methods. The first method classifies a test item by its Euclidean distance to the centers of each of the classes, estimated from the training data. This approach, given the Gaussian assumption for each of the classes, is optimal in terms of the maximum-likelihood rule. The second method is the multi-class LDA. The third method is the discussed ECOC-based space partitioning with the optimal coding matrix design. The fourth method is ECOC with a randomly chosen matrix. For the last two methods, the number of classifiers was l = 4 and the binary classifiers were SVMs with Gaussian kernels. The number of training and test instances per class was chosen as 100 in all experiments. Figure 4(b) is the result of the same experiment but with a different ratio of inter-class to intra-class variance; they were chosen as σ²_inter = 1 and σ²_intra = 1.

Fig. 4. Accuracy vs. SNR for Gaussian data (M = 16, N = 5), comparing Distance to Class Means, LDA, Optimal ECOC Partitioning, and Random ECOC Partitioning: (a) σ²_inter = 10; (b) σ²_inter = 1.

As can be seen from Figure 4(b), because the instances of each class are heavily scattered among the instances of the neighboring classes, the underlying binary SVMs with Gaussian kernels tend to hugely overtrain on these training data, and thus fail to generalize to the test set. It is important to mention that the good behavior of the first two methods is due to the unrealistic Gaussian setup considered. We will see in the next experiments that this performance can radically decrease as we change the data distribution to more realistic ones.
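The first baseline and the accuracy measure, continuing the sketch above, are a few lines (ours, for illustration):

    import numpy as np

    def nearest_center_predict(Y, centers_hat):
        # Euclidean distance to the class centers estimated from the training
        # data: the ML rule under the Gaussian assumption.
        d = np.linalg.norm(Y[:, None, :] - centers_hat[None, :, :], axis=2)
        return d.argmin(axis=1)

    def accuracy(y_hat, y_true):
        # Ratio of correctly classified items to the test set size.
        return float((y_hat == y_true).mean())

    # Centers are estimated as the per-class means of the training set:
    # centers_hat = np.vstack([X[y == i].mean(axis=0) for i in range(M)])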
Gaussian Mixtures  In this part, every class is assumed to have 5 Gaussian clouds from which the instances are generated (see the sketch at the end of this subsection). The class centers are generated as before. The centers of the Gaussian clouds for class i, centered on x_c(i), are generated from another Gaussian distribution with covariance matrix σ²_cloud I_N. The class instances are then generated as Gaussians centered on the cloud centers with variance σ²_intra. Figure 5(a) is the result of the experiment with M = 16 classes of dimension N = 5. The variances were σ²_inter = 1, σ²_cloud = 1, and σ²_intra = 1, and in each of the 5 clouds there were 5 training and 5 test examples.

Fig. 5. Accuracy vs. SNR for mixture-of-Gaussians data (M = 16, N = 5), comparing Distance to Class Means, LDA, Optimal ECOC Partitioning, and Random ECOC Partitioning: (a) σ²_cloud = 1; (b) σ²_cloud = 10.

Figure 5(b) is the result of the same experiment with more scattered clouds, i.e., σ²_cloud was changed to 10. As can be seen from Figure 5(b), the first two methods are clearly sub-optimal for this scenario. However, the discussed ECOC-based space partitioning method copes with it successfully. Also, in all of the experiments it is clearly seen that random coding, when the number of its binary classifiers is small, cannot provide good performance. To resolve this, and also to avoid the fluctuating results seen in all the figures, one should add more columns to the coding matrix, or equivalently increase the number of binary classifications, while the optimal coding approach achieves the best performance with the least number of binary classifications. This optimality is of course limited and dictated by the choice of the underlying classifiers.
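The mixture-of-Gaussians generation referenced at the start of this subsection, sketched under the same assumptions as the earlier snippets:

    import numpy as np

    def mixture_data(M, N, n_clouds, p_cloud, var_inter, var_cloud,
                     var_intra, rng):
        centers = rng.normal(0.0, np.sqrt(var_inter), size=(M, N))
        X, y = [], []
        for i, c in enumerate(centers):
            # Cloud centers for class i are drawn around its class center.
            clouds = c + rng.normal(0.0, np.sqrt(var_cloud), size=(n_clouds, N))
            for cl in clouds:
                X.append(cl + rng.normal(0.0, np.sqrt(var_intra),
                                         size=(p_cloud, N)))
                y.append(np.full(p_cloud, i))
        return centers, np.vstack(X), np.hstack(y)

    rng = np.random.default_rng(1)
    centers, X, y = mixture_data(M=16, N=5, n_clouds=5, p_cloud=5,
                                 var_inter=1.0, var_cloud=1.0,
                                 var_intra=1.0, rng=rng)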
3.2 Face Identification

To compare the performance of the discussed methods on real data, we used the ExtendedYaleB [7] database for face identification. There were 28 people; from each of them, 500 face images were taken under different lighting conditions. We randomly chose 50 images from each person for training and 50 for testing. The images were resized to 128 × 128 pixels and then directly vectorized. Random Projections were used to reduce the dimension of the features from N = 16384 to L = 50. The instances were then fed to the same four methods. For the last two methods, l = ⌈log2 28⌉ = 5 binary SVMs with Gaussian kernels were used. Table 1 illustrates the results.

Method   | Euclidean Distance to Class Means | LDA | Optimal ECOC Partitioning | Random ECOC Partitioning
Accuracy |                                   |     |                           |

Table 1. Accuracy of different methods on the ExtendedYaleB database.

4 Summary

In this paper we studied the problem of multi-class classification. We considered a general approach, based on coding theory in communication systems, that converts a multi-class problem into several binary problems by assigning a codeword to each class. We argued that conventional channel codes are not optimal for this problem, as they do not take into account the partitioning properties of the underlying binary classifiers. We then proposed an approach for coding and showed that, if proper binary classifiers are used, it optimally partitions the multi-labeled space into several binary spaces. We tested the performance of the proposed methodology on different datasets. We first generated synthetic data that can simulate different real scenarios and then compared the performances on a real face identification database. The method was shown to be superior under the more realistic classification scenarios. However, it was seen that because complex binary SVMs were used, one should be careful to avoid over-training of the underlying classifiers.

Acknowledgments  The work presented in this paper was supported by a grant from Switzerland through the Swiss Contribution to the enlarged European Union (PSPB-5/00).
References

1. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research (JAIR) 2 (1995) 263–286
2. Ferdowsi, S., Voloshynovskiy, S., Kostadinov, D.: Content identification: binary content fingerprinting versus binary content encoding. In: Proceedings of SPIE Photonics West, Electronic Imaging, Media Forensics and Security V, San Francisco, USA (January 2014)
3. Li, T., Zhu, S., Ogihara, M.: Using discriminant analysis for multi-class classification: an experimental investigation. Knowledge and Information Systems 10(4) (2006)
4. Murphy, K.P.: Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series). The MIT Press (2012)
5. James, G., Hastie, T.: The error coding method and PICTs. Journal of Computational and Graphical Statistics 7 (1998) 377–387
6. Voloshynovskiy, S., Koval, O., Beekhof, F., Holotyak, T.: Information-theoretic multiclass classification based on binary classifiers. Journal of Signal Processing Systems 65 (2011)
7. Lee, K., Ho, J., Kriegman, D.: Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence 27(5) (2005) 684–698