A Neural Support Vector Network Architecture with Adaptive Kernels

Pascal Vincent & Yoshua Bengio
Département d'informatique et recherche opérationnelle
Université de Montréal
C.P. Succ. Centre-Ville, Montréal, Québec, Canada, H3C 3J7
{vincentp,bengioy}@iro.umontreal.ca

Submission to IJCNN 2000

Abstract

In the Support Vector Machines (SVM) framework, the positive-definite kernel can be seen as representing a fixed similarity measure between two patterns, and a discriminant function is obtained by taking a linear combination of the kernels computed at training examples called support vectors. Here we investigate learning architectures in which the kernel functions can be replaced by more general similarity measures that can have arbitrary internal parameters. The training criterion used in SVMs is not appropriate for this purpose, so we adopt the simple criterion that is generally used when training neural networks for classification tasks. Several experiments are performed which show that such Neural Support Vector Networks perform similarly to SVMs while requiring significantly fewer support vectors, even when the similarity measure has no internal parameters.

1 Introduction

Many pattern recognition algorithms are based on the notion of a similarity measure, and generalization is obtained by assigning the same class to similar patterns. In the Support Vector Machines (SVM) framework [3, 11], the positive-definite kernel represents a kind of fixed similarity measure between two patterns, and a discriminant function is obtained by taking a linear combination of the kernels computed at training examples called support vectors. In this paper we investigate learning architectures in which the kernel functions can be replaced by more general similarity measures with arbitrary internal parameters that may be optimized jointly with the weights assigned to the support vectors.

Recent work studies adapting a positive-definite kernel based on geometrical considerations after a first SVM optimization run [1], and [7] investigates ways of using a fixed but not necessarily positive-definite similarity matrix with SVMs. There is also much previous work on learning similarity measures: e.g., [8] adapts the scale of each dimension in a Euclidean K-nearest-neighbor classifier, and [4, 2] use a convolutional neural network to learn a similarity measure (for signature verification and fingerprint recognition, respectively).

In this paper, we consider the minimization of the sum of margin losses over the training examples, where the margin is the signed distance of the discriminant function output to the decision surface (as in AdaBoost [10]). We call this type of architecture a Neural Support Vector Network (NSVN). To allow adaptation of parameters inside the kernel, we minimize the squared loss with respect to the hyperbolic tangent of the discriminant function. This criterion, often used to train neural networks for classification, can also be framed as a criterion for maximizing this margin. Our experiments with NSVNs suggest that such architectures can perform similarly to SVMs while requiring significantly fewer support vectors, even when the similarity measure has no internal parameters.

2 Support Vector Machines and Motivations

We consider pattern classification tasks with training data $\{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$ is the class associated with input pattern $x_i$.
SVMs [3, 11] use a discriminant function of the following form:

$$f_{\alpha,b}(x) = \sum_{(x_i, y_i) \in sv} \alpha_i \, y_i \, K(x, x_i) + b \qquad (1)$$

where $sv$, the set of support vectors, is a subset of the training patterns, and $\mathrm{sign}(f(x))$ gives the class for any pattern $x$. Parameters $\alpha$ and $b$ are learned by the SVM algorithm, which also finds the set of support vectors, in that it brings the $\alpha_i$ of all non-support-vectors down to 0.
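As a concrete illustration, equation (1) can be evaluated directly once the support vectors, their labels and the parameters $\alpha$ and $b$ are given. The following minimal NumPy sketch (not from the paper; names and the choice of the Gaussian kernel defined in the next paragraph are illustrative assumptions) computes the discriminant and the predicted class.

    import numpy as np

    def gaussian_kernel(x1, x2, sigma):
        # Gaussian (RBF) kernel: K_sigma(x1, x2) = exp(-||x1 - x2||^2 / sigma^2)
        return np.exp(-np.sum((x1 - x2) ** 2) / sigma ** 2)

    def discriminant(x, support_vectors, support_labels, alpha, b, sigma):
        # f(x) = sum_i alpha_i * y_i * K(x, x_i) + b   (equation 1)
        k = np.array([gaussian_kernel(x, s, sigma) for s in support_vectors])
        return float(np.dot(alpha * support_labels, k) + b)

    def classify(x, support_vectors, support_labels, alpha, b, sigma):
        # The predicted class is the sign of the discriminant function.
        return 1 if discriminant(x, support_vectors, support_labels, alpha, b, sigma) >= 0 else -1

The same form of decision function is shared by the NSVN architecture introduced in Section 3; only the way the parameters (and possibly the kernel itself) are learned differs.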
[Figure 1: The Neural Support Vector Network Architecture (the dotted path is used only during training). The figure shows an input vector x passed through kernels K with shared parameters θ, each centered on one of the support vectors, producing the mapped input vector; a linear classifier with parameters α and b (output = α·x̃ + b) then yields f(x); the true class y of x enters the cost function being optimized.]

The kernel K is a function from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$ that must verify Mercer's conditions, which amounts to being positive definite. For example, we have used the Gaussian or RBF kernel $K_\sigma(x_1, x_2) = e^{-\|x_1 - x_2\|^2 / \sigma^2}$. Mercer's conditions are necessary and sufficient for the existence of an implicit mapping $\Phi$ from the input space to an induced Hilbert space, called the kernel-feature space or $\Phi$-space, such that the kernel actually computes a dot product in this $\Phi$-space: $K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$. This kernel trick allows the straightforward extension of the dot-product based SVM algorithm, originally designed for finding a margin-maximizing linear decision surface (hyperplane) in input space, to finding a margin-maximizing linear decision surface in $\Phi$-space, which typically corresponds to a non-linear decision surface in input space [3].

The margin that SVM learning maximizes is defined as the orthogonal Euclidean distance between the separating hyperplane and the nearest of the positive and negative examples, and is motivated by theoretical results that link it to bounds on the generalization error [11]. SVM learning amounts to solving a constrained quadratic programming problem in $\alpha$, the details of which can be found in [3] and, for the soft-margin error-tolerating extension which adds a complexity control parameter C, in [6]. Typically, a range of values for C and for the parameters of the kernel are tried and decided upon according to performance on a validation set. This approach is a serious limiting factor for research on complex kernels with more than one or two parameters. Yet experiments [5] show that the choice of an appropriate K and parameters can be critical.

Unfortunately, the mathematical formulation of SVMs does not easily allow incorporating trainable adaptive kernels. In particular, it is not clear whether the theoretical considerations underlying SVM training still hold for kernels with parameters that are not kept fixed during the optimization. Also, the positive definiteness constraint limits the choice of possible kernel functions. All these considerations led us to the design of the architecture described in the following section.

3 Neural Support Vector Networks

Figure 1 shows the general architecture of a Neural Support Vector Network (NSVN). SVMs and NSVNs yield decision functions f of the same form (see equation 1). Consequently, when using the same fixed K, the function space F being considered by both algorithms is nearly identical (in NSVN training we don't usually constrain the $\alpha_i$'s to be positive, thus we allow a support-vector-centered kernel to vote against its class). However, in NSVN training, K does not have to be a fixed positive-definite kernel: it can be any scalar function of two input objects, and may have parameters that can be learned.

Computing the discriminant function f on a point x can be seen as a two-stage process:

1. Map input x into $\tilde{x}$ by computing the values of all support-vector-centered similarity measures for x:
$\tilde{x} = \Psi(x) = (K_\theta(x, x_1), K_\theta(x, x_2), \ldots, K_\theta(x, x_m))$
where $\{x_1, x_2, \ldots, x_m\} = sv$ is the set of m support vectors and $\theta$ the parameters of the similarity measure. Let us call $\Psi$-space the resulting space.
2. Apply a linear classifier to $\tilde{x}$: $f_{\theta,\alpha,b}(x) = \langle \alpha, \tilde{x} \rangle + b = \langle \alpha, \Psi(x) \rangle + b$.

Suppose first that we are given a fixed set of support vectors sv. Finding a decision function $f \in F$ of the form of equation 1 amounts to constructing an appropriate linear classifier in $\Psi$-space. Notice the difference with SVM training, which finds a margin-maximizing linear classifier in $\Phi$-space. While the existence of an implicit $\Phi$-space requires a positive-definite kernel, and $\Phi$ may be unknown analytically, a $\Psi$-space can be associated with any K (even a non-symmetric one) for a given set of support vectors, and is defined, precisely, by its mapping.

There are many possible algorithms for constructing a reasonable linear classifier; for instance, a linear SVM could be used (see [7]). In this paper, however, we limit our study to algorithms based on the backpropagation of error gradients. These allow error gradients to be propagated back to the parameters $\theta$ of a parameterized similarity measure $K_\theta$ and to adapt them on the fly, which was one of our primary design goals.

Formally, for a training set S of input/output pairs (x, y), with $x \in X$ and $y \in \{-1, +1\}$, a gradient-based training algorithm chooses the parameters of the decision function $f_{\theta,\alpha,b} \in F$ that minimize an empirical error defined as the sum of the losses $Q(f_{\theta,\alpha,b}(x), y)$ incurred on each pattern $(x, y) \in S$, with $Q: \mathbb{R} \times \{-1, +1\} \to \mathbb{R}$. Training consists in searching for $(\theta, \alpha, b)$ which minimize this empirical loss:

$$f_{\theta,\alpha,b} = \arg\min_{\theta,\alpha,b} \sum_{(x_i, y_i) \in S} Q(f_{\theta,\alpha,b}(x_i), y_i)$$

3.1 Margin Loss Functions

With NSVNs, we are no longer maximizing the geometrically inspired SVM margin. [10] use another definition of margin and relate AdaBoost's good performance to the maximization of this margin. The support-vector form of decision functions can, to some extent, be framed as a linear combination of weak classifiers à la AdaBoost, each application of the kernel to a support point providing one possible weak classifier. Formally, having a parameterized decision function f(x) whose sign determines the decided class, we define the individual margin of a given sample $x_i$ with true class $y_i \in \{-1, +1\}$ as $m_i = y_i f(x_i)$, from which we define a loss function $c(y f(x)) = c(m) = Q(f_{\theta,\alpha,b}(x), y)$. The algorithm searches for the parameters $(\theta, \alpha, b)$ that minimize $\sum_{(x_i, y_i) \in S} c(m_i)$, where c(m) is the margin loss function.

[9] compare the performance of several voting methods that were shown to optimize a margin loss function. AdaBoost uses an exponential margin loss function $e^{-m}$ [10], LogitBoost uses $\log_2(1 + e^{-2m})$, and Doom II [9] approximates a theoretically motivated margin loss with $1 - \tanh(m)$.

[Figure 2: Loss functions of the margin m = y f(x) (horizontal axis). Left: the loss functions used in AdaBoost, LogitBoost and Doom II. Right: the classical squared error criterion (with ±1 target), and the squared error after tanh squashing with ±0.65 target, expressed as functions of the margin m.]

As can be seen in Figure 2 (left), all these functions encourage large positive margins and differ mainly in how they penalize large negative ones. In particular, $1 - \tanh(m)$ won't penalize outliers to excess, and proved to work better, especially in the case of label noise [9].
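The loss functions compared in Figure 2 can all be written directly as functions of the margin m = y f(x). The following NumPy sketch (not from the paper; a plain transcription of the formulas quoted above, with the 0.65 target of the tanh-squashed squared error treated as a fixed constant) makes the comparison explicit.

    import numpy as np

    def adaboost_loss(m):
        # Exponential margin loss used by AdaBoost: exp(-m).
        return np.exp(-m)

    def logitboost_loss(m):
        # LogitBoost margin loss: log2(1 + exp(-2m)).
        return np.log2(1.0 + np.exp(-2.0 * m))

    def doom2_loss(m):
        # Doom II margin loss: 1 - tanh(m); bounded, so outliers are not over-penalized.
        return 1.0 - np.tanh(m)

    def squared_loss(m):
        # Classical squared error (f(x) - y)^2 rewritten as a margin loss: (1 - m)^2.
        return (1.0 - m) ** 2

    def squared_loss_after_tanh(m, target=0.65):
        # Squared error after tanh squashing with +-0.65 targets: (0.65 - tanh(m))^2.
        return (target - np.tanh(m)) ** 2

Evaluating these over a range of m reproduces the qualitative picture of Figure 2: all decrease as the margin grows, but only the bounded ones ($1 - \tanh(m)$ and the tanh-squashed squared error) level off for very negative margins, while the exponential, logistic and plain squared losses keep growing.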
These margin loss functions are problematic when the parameters allow arbitrary scaling of the discriminant function f: scaling f does not change the decision function, yet the parameters could grow indefinitely simply to increase the margins.
For the 3 previously mentioned voting methods the parameters $\alpha_i$ are constrained, so the problem does not appear. Yet, while it makes perfect sense to constrain $\sum_i \alpha_i = 1$ for instance (AdaBoost), how to constrain kernel parameters, or even just b, is much less clear. Our experiments suggest, however, that the well-known squared loss functions $(f(x) - y)^2$ and $(\tanh(f(x)) - 0.65\, y)^2$, often used in neural networks, perform rather well even without constrained parameters. It is interesting to express them as margin loss functions to see why:

Squared loss: $(f(x) - y)^2 = (1 - m)^2$
Squared loss after tanh: $(\tanh(f(x)) - 0.65\, y)^2 = (0.65 - \tanh(m))^2$

Both are illustrated in Figure 2 (right). Notice that the squared loss after tanh has a shape very similar to the margin loss function used in Doom II, except that it slightly increases for large positive margins, which is why it behaves well with unconstrained parameters.

3.2 Choice of the Set of Support Vectors

So far in our discussion, we assumed that we were given an appropriate set sv of support vectors; we have not yet discussed how to choose such a set. The SVM algorithm considers all training data and automatically chooses a subset as its support vectors by driving the corresponding $\alpha_i$'s down to 0. On the contrary, a simple unconstrained empirical error minimization procedure that considered all the data would be unlikely to lead to many zero $\alpha$'s. There are several ways to address this issue:

1. Add a regularization term to the loss function that pushes down the $\alpha_i$'s and lets the algorithm choose its support set by itself, for instance a penalty term $\lambda \sum_i |\alpha_i|$, where $\lambda$ allows some control on the number of support vectors.
2. Use a heuristic to choose the support vectors. For instance, we could use the support vectors returned by an SVM, or use geometric considerations such as picking points that are closest to points of the other class.
3. Pick m support points at random, m being seen as a capacity control parameter.

One difference with some classical RBF networks is that the support vectors are training examples (not free parameters). Another is that our ultimate goal is to learn, simultaneously with the $\alpha$'s, a similarity measure (which may be more sophisticated than a Mahalanobis distance) that is applied to all the support vectors (whereas in RBF networks there may generally be a different variance for each cluster).

4 Experimental Results

In a first series of experiments we compared the performance of NSVN (picking sv at random) and SVM when using the same fixed kernel. The objective of the experiment was to see whether the training criterion that we had set up could learn the parameters correctly, in the sense of giving a decision function that performs comparably to SVMs.

The experiments were performed with 5 data sets, four of which were obtained from the UCI Machine Learning database. All involve only 2-way classification. Each data set was split into three approximately equal subsets for training (choosing the parameters), validation (controlling capacity, either with the box constraint C for SVMs or with the number of support vectors m for NSVNs), and out-of-sample testing. The breast cancer (200 train validation test) and diabetes (200 train validation test) data were normalized according to the input values in the training set. The ionosphere data (100 train validation test) was used as is. For each of these we used a Gaussian RBF kernel with $\sigma$ chosen roughly with a few validation runs on the validation set.
The Corel data (296 train validation test) consists of 7x7x7 smoothed histogram counts (raised to the power 1/2) for the 3 colors of images from the Corel data set [5]. The task was the discrimination of two types of images. For this data set, we used a kernel of the form $\exp(-\|x - x'\| / \sigma)$, as suggested in [5].

For the Wisconsin Breast Cancer database, we also checked how the algorithms performed when 20% label noise (labels flipped with a probability of 20%) had been added to the training set.

We ran SVM optimizations for several values of the complexity-control parameter C. Similarly, we ran NSVN optimizations for several values of m, each time trying 10 different random choices of m support vectors and keeping the one that minimized the average empirical loss $(\tanh(f(x)) - 0.65\, y)^2$. For both algorithms we retained the run with the lowest error on the validation set; the results of the comparative experiments are given in Table 1.
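To make this training procedure concrete, the sketch below (a minimal illustration, not the authors' implementation) jointly updates $\alpha$, b and a single adaptive kernel parameter, the Gaussian width $\sigma$, by batch gradient descent on the squared-error-after-tanh criterion, and wraps this in the selection scheme described above: several random support sets of size m are tried and the one with the lowest average empirical loss is kept. The function names, learning rate and iteration count are illustrative assumptions, and the loss gradient is written out by hand, since the paper only specifies that error gradients are backpropagated.

    import numpy as np

    def psi(X, S, sigma):
        # Psi mapping: Gaussian kernel values between each input and each support vector.
        d2 = ((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)   # squared distances, shape (n, m)
        return np.exp(-d2 / sigma ** 2), d2

    def train_nsvn(X, y, S, y_s, sigma0=1.0, lr=0.01, n_iter=500):
        # Jointly fit (alpha, b, sigma) by gradient descent on sum of
        # (tanh(f(x)) - 0.65*y)^2, with f(x) = sum_j alpha_j y_j K_sigma(x, s_j) + b.
        alpha = np.zeros(len(S))
        b, sigma = 0.0, sigma0
        for _ in range(n_iter):
            K, d2 = psi(X, S, sigma)                        # (n, m)
            f = K @ (alpha * y_s) + b                       # discriminant values, (n,)
            t = np.tanh(f)
            dQ_df = 2.0 * (t - 0.65 * y) * (1.0 - t ** 2)   # dLoss/df for each example
            grad_alpha = (dQ_df[:, None] * K * y_s[None, :]).sum(axis=0)
            grad_b = dQ_df.sum()
            dK_dsigma = K * 2.0 * d2 / sigma ** 3           # elementwise dK/dsigma
            grad_sigma = (dQ_df[:, None] * dK_dsigma * (alpha * y_s)[None, :]).sum()
            alpha -= lr * grad_alpha
            b -= lr * grad_b
            sigma -= lr * grad_sigma
        loss = ((np.tanh(psi(X, S, sigma)[0] @ (alpha * y_s) + b) - 0.65 * y) ** 2).mean()
        return alpha, b, sigma, loss

    def select_random_support_run(X, y, m, n_trials=10, rng=None):
        # Try several random support sets of size m and keep the run with the
        # lowest average empirical loss, as in the NSVN experiments described above.
        rng = np.random.default_rng(rng)
        best = None
        for _ in range(n_trials):
            idx = rng.choice(len(X), size=m, replace=False)
            alpha, b, sigma, loss = train_nsvn(X, y, X[idx], y[idx])
            if best is None or loss < best[-1]:
                best = (idx, alpha, b, sigma, loss)
        return best

In a full experiment, the size m of the random support set would then itself be chosen according to the error on the validation set, playing the capacity-control role that C plays for the SVM.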
Note that we also give the average error on the joined test and validation sets, which is a bit less noisy (but slightly biased, although the bias should be similar for both SVM and NSVN).

    Data set              SVM valid.+test error   NSVN valid.+test error
    Breast Cancer         3.5%                    3.7%
    Noisy Breast Cancer   3.7%                    4.1%
    Diabetes              24.1%                   24.3%
    Ionosphere            5.9%                    5.9%
    Corel                 11.3%                   11.3%

Table 1: Comparison of number of support vectors and error rates obtained with SVM and NSVN.

As can be seen, NSVN performs comparably to SVM, often with far fewer support vectors.

Our next experiment aimed at showing that the NSVN architecture was indeed able to learn a useful similarity measure K that is not necessarily positive-definite, together with the weights of the support vectors. For this, we used a traditional multilayer perceptron (MLP) with one hidden layer, sigmoid activation, and a single output unit as our similarity measure $K(x_1, x_2)$. The input layer receives the concatenation of the two inputs $x_1$ and $x_2$, and the MLP performs the similarity computation, which can be stated as

$$K_{w_0,b_0,w_1,b_1}(x_1, x_2) = \tanh\big(b_1 + w_1 \cdot \mathrm{sigmoid}(b_0 + w_0 \cdot (x_1, x_2))\big)$$

This was tried on the Breast Cancer data, using 3 hidden units and the same training procedure as previously described (different values of m, 10 random trials each). It achieved 3.0% error on the test set and 3.1% error on the combined validation and test set, with 30 support vectors. This shows that, although the parameters of the similarity measure were initialized at random, NSVN training was able to find values appropriate for the requested classification task.

5 Conclusion and Future Work

We have proposed a new architecture that is inspired by SVMs but is driven by the objective of being able to learn a similarity function that is not necessarily a positive-definite kernel. In the process, we have uncovered a link between the loss functions typically used in neural network training and the kind of margin cost functions optimized by AdaBoost and similar algorithms, and outlined the differences with the geometrically inspired margin maximized by SVM learning. Moreover, we have shown experimentally that both approaches perform comparably in terms of expected errors, which may suggest that the support-vector kind of architecture (which determines the form of the discriminant functions that are considered) may be responsible for their good performance, and not only the particular kind of margin maximization that is used.

Several experiments on classification data sets showed that the proposed algorithm, when used with the same fixed kernel, performs comparably to SVM, often with substantially fewer support vectors (chosen at random!), which is in itself interesting, as it allows an equally substantial improvement in terms of speed. More importantly, we have defined a framework that opens the way to the exploration of more interesting adaptive similarity measures than the fixed positive-definite kernels typically used with SVMs. Trainable parametric similarity measures can now be used that were designed to incorporate prior knowledge specific to the task at hand (such as those proposed in [4, 2]). A large number of open questions remain, though, in particular regarding the merits of various margin cost functions and the way to choose the set of support vectors.
Acknowledgements

The authors would like to thank Olivier Chapelle, Patrick Haffner and Léon Bottou for helpful discussions, as well as the NSERC Canadian funding agency and the IRIS network for support.
References

[1] S. Amari and S. Wu. Improving support vector machine classifiers by modifying kernel functions. Neural Networks, to appear.
[2] P. Baldi and Y. Chauvin. Neural networks for fingerprint recognition. Neural Computation, 5(3).
[3] B. Boser, I. Guyon, and V. Vapnik. An algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, Pittsburgh.
[4] J. Bromley, J. Benz, L. Bottou, I. Guyon, L. Jackel, Y. LeCun, C. Moore, E. Sackinger, and R. Shah. Signature verification using a siamese time delay neural network. In Advances in Pattern Recognition Systems using Neural Network Technologies. World Scientific, Singapore.
[5] O. Chapelle, P. Haffner, and V. Vapnik. SVMs for histogram-based image classification. IEEE Transactions on Neural Networks, accepted, special issue on Support Vectors.
[6] C. Cortes and V. Vapnik. Soft margin classifiers. Machine Learning, 20.
[7] T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. In 12th Annual Conference on Neural Information Processing Systems (NIPS 98).
[8] D. G. Lowe. Similarity metric learning for a variable-kernel classifier. Neural Computation, 7(1):72-85.
[9] L. Mason, J. Baxter, P. Bartlett, and M. Frean. Boosting algorithms as gradient descent. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12. The MIT Press. Accepted for publication.
[10] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, to appear.
[11] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
