# Fβ Support Vector Machines


$$
\text{s.t.}\quad
\begin{cases}
y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i & i = 1..n\\
\xi_i \ge 0 & i = 1..n
\end{cases}
$$

where $w$ and $b$ are the parameters of the hyperplane. The $\Phi(\cdot)$ term introduced in the objective function penalizes solutions presenting many training errors. For any feasible solution $(w, b, \xi)$, misclassified training examples have an associated slack value of at least 1. The situation is illustrated in figure 1. Hence, it seems natural to choose as penalization function $\Phi(\cdot)$ a function counting the number of slacks greater than or equal to 1. Unfortunately, optimizing such a function combined with the margin criterion turns out to be a mixed-integer problem known to be NP-hard [15]. In practice, two approximations of the counting function are commonly used: $\Phi(\xi) = \sum_{i=1}^{n} \xi_i$ (1-norm) and $\Phi(\xi) = \sum_{i=1}^{n} \xi_i^2$ (2-norm). These approximations present two peculiarities: 1) slacks related to examples lying inside the margin may be counted as errors even though such examples are correctly classified; 2) examples with a slack value greater than 1 may contribute more than one error. However, these approximations are computationally attractive, as the problem remains convex and quadratic and is consequently solvable in polynomial time. In the sequel, we focus on the 2-norm alternative.

Fig. 1. Soft-margin SVM and associated slacks.

Computing the preceding approximations separately for each class label bounds the elements of the contingency matrix.

Proposition 1: Let $(w, b, \xi)$ be a solution satisfying the constraints of OP1. The following bounds hold for the elements of the contingency matrix computed on the training set:

$$
\#TP \ge n_+ - \sum_{i : y_i = +1} \xi_i^2 \qquad
\#FP \le \sum_{i : y_i = -1} \xi_i^2 \qquad
\#FN \le \sum_{i : y_i = +1} \xi_i^2 \qquad
\#TN \ge n_- - \sum_{i : y_i = -1} \xi_i^2
$$

These bounds will be called the slack estimates of the contingency matrix. It should be noted that they could also have been formulated using the 1-norm approximation.

B. The Fβ parametrization

Let us introduce a parametrization of SVM in which a regularized Fβ criterion is optimized.
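As a concrete illustration, the slack estimates of Proposition 1 can be computed directly from a candidate hyperplane. The following NumPy sketch (the function name and the toy data below are our own) evaluates the slacks as max(0, 1 − y_i(⟨w, x_i⟩ + b)) and derives the four per-class bounds:

```python
import numpy as np

def slack_estimates(w, b, X, y):
    """Slack estimates of the contingency matrix (2-norm version).

    Misclassified examples have slack >= 1, so summing squared slacks
    per class upper-bounds #FN (positives) and #FP (negatives)."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    pos, neg = (y == +1), (y == -1)
    fn_bound = np.sum(xi[pos] ** 2)   # upper bound on #FN
    fp_bound = np.sum(xi[neg] ** 2)   # upper bound on #FP
    tp_bound = pos.sum() - fn_bound   # lower bound on #TP
    tn_bound = neg.sum() - fp_bound   # lower bound on #TN
    return tp_bound, fp_bound, fn_bound, tn_bound
```

On a toy set with one misclassified positive example of slack 1.5, the #FN bound is 2.25: the single error is counted as 2.25 errors, which illustrates the second peculiarity noted above.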
The Fβ function can be expanded using the definitions of precision and recall as:

$$
F_\beta = \frac{(\beta^2 + 1)\,\pi\rho}{\beta^2 \pi + \rho}
       = \frac{(\beta^2 + 1)\,\#TP}{(\beta^2 + 1)\,\#TP + \beta^2\,\#FN + \#FP}
$$

The optimal value of $F_\beta$ (namely 1) is obtained by minimizing $\beta^2\,\#FN + \#FP$. Replacing #FN and #FP by their slack estimates and integrating this into the objective function leads to the following optimization problem:

OP2:

$$
\text{Minimize}\quad W(w, b, \xi) = \frac{1}{2}\|w\|^2
 + C\Big[\beta^2 \sum_{i : y_i = +1} \xi_i^2 + \sum_{i : y_i = -1} \xi_i^2\Big]
$$
$$
\text{s.t.}\quad
\begin{cases}
y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i & i = 1..n\\
\xi_i \ge 0 & i = 1..n
\end{cases}
$$

The relative importance of the Fβ criterion with respect to the margin can be tuned using the regularization constant C. Since the slack estimates for #FP and #FN are upper bounds, OP2 is based on a pessimistic estimation of Fβ. OP2 can be seen as an instance of the SVM parametrization considering two kinds of slacks with associated regularization constants C₊ and C₋ [21], [13]. In our case, the regularization constants derive from the β value, i.e. C₊ = Cβ² and C₋ = C. It should be pointed out that when β = 1, OP2 is equivalent to the traditional 2-norm soft-margin SVM problem.

The optimization of the Fβ criterion is closely related to the problem of training an SVM on an imbalanced dataset. When the prior of one class is far larger than that of the other, the classifier obtained by standard SVM training is likely to act as the trivial acceptor/rejector (i.e. a classifier always predicting +1, respectively −1). To avoid this inconvenience, some authors [21] have introduced different penalties for the different classes using C₊ and C₋. This method has been applied to control the sensitivity¹ of the model. However, no automatic procedure has been proposed to choose the regularization constants with respect to the user specifications. Recently, this technique has been improved by artificially oversampling the minority class [1]. Other authors [2] have proposed to select a unique regularization constant C through a bootstrap procedure.
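In practice, the C₊ = Cβ², C₋ = C scheme can be approximated with off-the-shelf SVM implementations that support per-class costs. A minimal scikit-learn sketch (our own; note that scikit-learn's SVC penalizes 1-norm rather than 2-norm slacks, so this only mimics OP2 in the spirit of the per-class costs of [21]):

```python
import numpy as np
from sklearn.svm import SVC

def fbeta_svm(X, y, beta, C=1.0):
    """Per-class costs C+ = C * beta^2 (positives) and C- = C (negatives).

    class_weight multiplies C for each class, so beta > 1 penalizes
    false negatives more heavily, favouring recall as in OP2."""
    clf = SVC(C=C, kernel="linear", class_weight={1: beta**2, -1: 1.0})
    return clf.fit(X, y)
```

Setting beta = 1 recovers a standard soft-margin SVM, mirroring the equivalence noted above.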
This constant is then used as a starting point for tuning C₊ and C₋ on a validation set.

IV. MODEL SELECTION ACCORDING TO Fβ

In the preceding section, we proposed a parametrization of SVM enabling the user to formulate his specifications with the β parameter. In addition, the remaining hyperparameters, i.e. the regularization constant and the kernel parameters, must be selected. In the case of SVM, model selection can be made using the statistical properties of the optimal hyperplane, thus avoiding the need to perform cross-validation. Indeed,

¹ The sensitivity is the rate of true positive examples and is equivalent to recall.

several bounds on the leave-one-out (loo) error rate can be directly derived from the parameters of the optimal hyperplane expressed in dual form [20], [14], [10]. A practical evaluation of several of these bounds has recently been proposed in [7]. Moreover, Chapelle, Vapnik et al. [4] have shown that the hyperplane dual parameters are differentiable with respect to the hyperparameters. This allows the use of gradient-based techniques for model selection [4], [5]. In this section, we propose a gradient-based algorithm automatically selecting C and the width σ of a Gaussian kernel² according to the generalization Fβ score.

A. The generalization Fβ loss function

It has been proved by Vapnik [19] that for an example $(x_i, y_i)$ producing a loo error, $4\alpha_i R^2 \ge 1$ holds, where R is the radius of the smallest sphere enclosing all the training examples and $\alpha_i$ is the i-th dual parameter of the optimal hyperplane. This inequality was originally formulated for the hard-margin case. However, it can be applied to the 2-norm soft-margin SVM, as the latter can be seen as a hard-margin problem with a transformed kernel [6], [13]. Using the preceding inequality, one can build an estimator of the generalization Fβ score of a given model. Alternatively, it is possible to formulate a loss function following the reasoning developed in section III-B:

$$
L_{F_\beta}(\alpha, R) = 4R^2\Big(\beta^2 \sum_{i : y_i = +1} \alpha_i + \sum_{i : y_i = -1} \alpha_i\Big)
$$

In the algorithm proposed in section IV-B, the model parameters are selected by minimizing the $L_{F_\beta}(\cdot,\cdot)$ loss function.

B. The model selection algorithm

We introduce here an algorithm performing automatic model selection according to the Fβ criterion. It selects the model by performing a gradient descent of the Fβ loss function over the set of hyperparameters. For the sake of clarity, C and σ are gathered in a single vector θ. The model selection algorithm is sketched hereafter.
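Under our reading of the loss above (the split of the summation by class label is reconstructed from the derivative given in section IV-C), the loss is a class-weighted sum of the dual parameters scaled by 4R². A small NumPy sketch (the function name is ours):

```python
import numpy as np

def fbeta_loss(alpha, y, R, beta):
    """Generalization F_beta loss: 4 R^2 (beta^2 * sum(alpha+) + sum(alpha-)).

    Each example producing a loo error satisfies 4 * alpha_i * R^2 >= 1,
    so this quantity upper-bounds a weighted count of loo errors, with
    potential false negatives (positives) weighted by beta^2."""
    a_pos = alpha[y == +1].sum()
    a_neg = alpha[y == -1].sum()
    return 4.0 * R**2 * (beta**2 * a_pos + a_neg)
```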
² $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$

```
Algorithm Fβ-MODELSELECTION
  Input:  training set Tr = (x_1, y_1), ..., (x_n, y_n)
          initial values θ_0 for the hyperparameters
          precision parameter ε
  Output: optimal hyperparameters θ*
          SVM optimal solution α* using θ*

  α_0 ← trainFβSVM(Tr, θ_0)
  (R, λ)_0 ← smallestSphereRadius(Tr, θ_0)
  repeat
      θ_{t+1} ← updateHyperparameters(θ_t, α_t, R_t, λ_t)
      α_{t+1} ← trainFβSVM(Tr, θ_{t+1})
      (R, λ)_{t+1} ← smallestSphereRadius(Tr, θ_{t+1})
      t ← t + 1
  until |L_Fβ(α_t, R_t) − L_Fβ(α_{t−1}, R_{t−1})| < ε
  return {θ_t, α_t}
```

The trainFβSVM function solves OP3, the dual problem of OP2, which has the same form as the dual hard-margin problem [15]:

OP3:

$$
\text{Maximize}\quad W(\alpha) = -\frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j\, k'(x_i, x_j) + \sum_{i=1}^{n} \alpha_i
$$
$$
\text{s.t.}\quad
\begin{cases}
\sum_{i=1}^{n} \alpha_i y_i = 0\\
\alpha_i \ge 0 & i = 1..n
\end{cases}
$$

with a transformed kernel:

$$
k'(x_i, x_j) =
\begin{cases}
k(x_i, x_j) + \delta_{ij} \cdot \frac{1}{C\beta^2} & \text{if } y_i = +1\\[2pt]
k(x_i, x_j) + \delta_{ij} \cdot \frac{1}{C} & \text{if } y_i = -1
\end{cases}
$$

where $\delta_{ij}$ is the Kronecker delta and $k(\cdot,\cdot)$ is the original kernel function. The radius of the smallest sphere enclosing all the examples, computed by the smallestSphereRadius function, is obtained by taking the square root of the optimal objective value of the following optimization problem [15]:

OP4:

$$
\text{Maximize}\quad W(\lambda) = \sum_{i=1}^{n} \lambda_i\, k'(x_i, x_i) - \sum_{i,j=1}^{n} \lambda_i \lambda_j\, k'(x_i, x_j)
$$
$$
\text{s.t.}\quad
\begin{cases}
\sum_{i=1}^{n} \lambda_i = 1\\
\lambda_i \ge 0 & i = 1..n
\end{cases}
$$

The optimization problems OP3 and OP4 can be solved in time polynomial in n, e.g. using an interior point method [17]. Furthermore, the solution to OP3, respectively OP4, at a given iteration can be used as a good starting point for the next iteration.
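The transformed kernel amounts to a per-class modification of the Gram matrix diagonal. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def transformed_gram(K, y, C, beta):
    """Transformed kernel of OP3: add delta_ij / (C * beta^2) on the diagonal
    for positive examples and delta_ij / C for negative ones, turning the
    2-norm soft-margin problem into an equivalent hard-margin problem."""
    Kt = K.astype(float).copy()
    diag = np.where(y == +1, 1.0 / (C * beta**2), 1.0 / C)
    Kt[np.diag_indices_from(Kt)] += diag
    return Kt
```

Off-diagonal entries are left untouched; only the diagonal is regularized, which is why the dual OP3 keeps the form of a hard-margin problem.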

At each iteration, the hyperparameters can be updated by means of a gradient step:

$$
\theta_{t+1} = \theta_t - \eta \cdot \frac{\partial L_{F_\beta}}{\partial \theta}
$$

where η > 0 is the updating rate. However, second-order methods often provide faster convergence, which is valuable since two optimization problems have to be solved at each iteration. For this reason, the updateHyperparameters function relies on the BFGS algorithm [8], a quasi-Newton optimization technique. The time complexity of the updateHyperparameters function is O(n³), since it is dominated by the inversion of a possibly n × n matrix (see section IV-C). The derivatives of the Fβ loss function with respect to the hyperparameters are detailed in the next section. The algorithm is iterated until the Fβ loss function no longer changes by more than ε.

C. Derivatives of the Fβ loss function

The derivatives of the transformed kernel function with respect to the hyperparameters are given by:

$$
\frac{\partial k'(x_i, x_j)}{\partial C} =
\begin{cases}
-1/(C^2\beta^2) & \text{if } i = j \text{ and } y_i = +1\\
-1/C^2 & \text{if } i = j \text{ and } y_i = -1\\
0 & \text{otherwise}
\end{cases}
$$

$$
\frac{\partial k'(x_i, x_j)}{\partial \sigma^2} = k(x_i, x_j)\,\frac{\|x_i - x_j\|^2}{2\sigma^4}
$$

The derivatives of the squared radius can then be obtained by applying lemma 2 of Chapelle, Vapnik et al. [4]:

$$
\frac{\partial R^2}{\partial \theta} = \sum_{i=1}^{n} \lambda_i \frac{\partial k'(x_i, x_i)}{\partial \theta}
 - \sum_{i,j=1}^{n} \lambda_i \lambda_j \frac{\partial k'(x_i, x_j)}{\partial \theta}
$$

where θ ∈ {C, σ²}. The derivation of the hyperplane dual parameters proposed in [4] follows:

$$
\frac{\partial (\alpha, b)}{\partial \theta} = -H^{-1}\frac{\partial H}{\partial \theta}(\alpha, b)^T,
\qquad
H = \begin{pmatrix} K^Y & y \\ y^T & 0 \end{pmatrix}
$$

where $K^Y$ is the kernel matrix with label-scaled entries and y is the vector of example labels. The $\partial H / \partial \theta$ matrix is derived using the preceding kernel function derivatives. It should be stressed that only examples corresponding to support vectors have to be considered in the above formula. Finally, the derivative of $L_{F_\beta}(\cdot,\cdot)$ with respect to a hyperparameter θ is given by:

$$
\frac{\partial L_{F_\beta}(\alpha, R)}{\partial \theta}
= 4\frac{\partial R^2}{\partial \theta}\Big(\beta^2\sum_{i : y_i = +1}\alpha_i + \sum_{i : y_i = -1}\alpha_i\Big)
+ 4R^2\Big(\beta^2\sum_{i : y_i = +1}\frac{\partial \alpha_i}{\partial \theta} + \sum_{i : y_i = -1}\frac{\partial \alpha_i}{\partial \theta}\Big)
$$

V. EXPERIMENTS

We performed several experiments to assess the performance of the Fβ parametrization and the model selection algorithm. First, the Fβ parametrization was tested with positive and negative data in R¹⁰ drawn from two largely overlapping normal distributions. The priors for the positive and negative classes were respectively 0.3 and 0.7. It is usually more difficult to obtain a good recall when data are unbalanced in this way. Experiments were carried out using training sets of 600 examples, a fixed test set of 1,000 examples and a linear kernel. A comparison between the Fβ parametrization and the 2-norm soft-margin SVM with C = 1 is displayed in figure 2. For each β considered, the training data were resampled 10 times in order to produce averaged results. In this setting, our parametrization obtained better Fβ scores than the standard soft-margin SVM, especially when a high recall was requested. The second part of figure 2 presents the evolution of precision, recall and the Fβ score for different β values.

Fig. 2. The Fβ parametrization tested with artificially generated data. Top: comparison between the standard 2-norm soft-margin SVM and the Fβ parametrization. Bottom: evolution of precision, recall and of the Fβ score according to different β values.

Afterwards, our parametrization was tested using several class priors. The experimental setup was unchanged except for the class priors used while generating the training and test data. Figure 3 shows the evolution of the Fβ score obtained by our parametrization and by the 2-norm soft-margin SVM using several class priors. For the standard 2-norm soft-margin SVM, one notes that the effect of the priors is particularly important when positive examples are few in number and a high recall is requested. In this setting, our parametrization outperformed the standard 2-norm soft-margin SVM by more than 0.1.

The model selection algorithm was first tested with data
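The kernel derivatives used in section IV-C are easy to sanity-check numerically. A small NumPy sketch (our own) compares the analytic derivative of the Gaussian Gram matrix with respect to σ² against a central finite difference:

```python
import numpy as np

def gaussian_gram(X, sigma2):
    """Gram matrix of the Gaussian kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma2))

def dgram_dsigma2(X, sigma2):
    """Analytic derivative: dk/d(sigma^2) = k(x_i, x_j) * ||x_i - x_j||^2 / (2 sigma^4)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma2)) * sq / (2.0 * sigma2**2)
```

Agreement between the two confirms the formula, since sigma^4 is just (sigma^2)^2 when differentiating with respect to the variable sigma^2.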

Fig. 3. The Fβ parametrization tested using artificially generated data with several class priors. Top: Fβ scores obtained on the test set using the Fβ parametrization. Bottom: Fβ scores obtained on the test set using the standard 2-norm soft-margin SVM.

Fig. 4. The Fβ model selection algorithm tested with artificially generated data and with β = 2. Top: the evolution of the Fβ loss function during the gradient descent. Bottom: the related values of precision, recall and Fβ score on independent test data.

generated as in the previous paragraph. The hyperparameters C and σ were initialized to 1 and the precision parameter ε was set to [...]. Our objective was to investigate the relation between the minimization of the Fβ loss function and the Fβ score obtained on unknown test data. Figure 4 shows the evolution of the Fβ loss function during the gradient descent, using β = 2. The associated precision, recall and Fβ scores on test data are displayed in the bottom part of figure 4. Even if the optima of the Fβ loss function and of the Fβ score do not match exactly, one can observe that good Fβ scores were obtained when the Fβ loss function was low. After 35 iterations, the classifier obtained an Fβ score close to 0.9 with the hyperparameters C = 4.33 and σ = [...].

The model selection algorithm was then compared to the Radius-Margin (RM) based algorithm [4] using the Diabetes dataset [3]. This dataset contains 500 positive examples and 268 negative examples. It was randomly split into a training set and a test set, each one containing 384 examples. In this setting, it is usually more difficult to obtain a classifier with a high precision. The same initial conditions as before were used. The RM based algorithm selects the model parameters of the 2-norm soft-margin SVM according to the RM estimator of the generalization error rate. It should be pointed out that when β = 1, both methods are equivalent since the same function is optimized.
The comparison is illustrated in the first part of figure 5. As expected, our method provided better results when β moves away from the value 1. The influence of the β parameter on precision, recall and the Fβ score can be observed in the second part of figure 5.

VI. CONCLUSION

We introduced in this paper Fβ SVMs, a new parametrization of support vector machines. It allows the user to formulate his specifications in terms of Fβ, a classical IR measure. Experiments illustrate the benefits of this approach over a standard SVM when precision and recall are of unequal importance. Besides, we extended the results of Chapelle, Vapnik et al. [4] based on the Radius-Margin (RM) bound in order to automatically select the model hyperparameters according to the generalization Fβ score. We proposed an algorithm which performs a gradient descent of the Fβ loss function over the set of hyperparameters. To do so, the partial derivatives of the Fβ loss function with respect to these hyperparameters have been formally derived. Our experiments on real-life data show the advantages of this method compared to the RM based algorithm when the Fβ evaluation criterion is considered.

Our future work includes improvements to the model selection algorithm in order to deal with larger training sets. Indeed, it is possible to use a sequential optimization method [11] in the smallestSphereRadius function and chunking

Fig. 5. The Fβ model selection algorithm tested with the Diabetes dataset. Top: comparison between the Fβ model selection algorithm and the radius-margin based method. Bottom: evolution of precision, recall and of the Fβ score according to different β values.

techniques [9], [15] in the trainFβSVM function. This typically allows solving problems with more than 10⁴ variables. Moreover, we believe that the inverse matrix H⁻¹ can be computed incrementally during the chunking iterations, using the Schur inversion formula for block matrices [12].

ACKNOWLEDGMENT

This work is partially supported by the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture (F.R.I.A.) under grant reference F3/5/5-MCF/FC.

REFERENCES

[1] Rehan Akbani, Stephen Kwek, and Nathalie Japkowicz. Applying support vector machines to imbalanced datasets. In Proceedings of the 15th European Conference on Machine Learning (ECML), pages 39–50, Pisa, Italy, 2004.
[2] S. Amerio, D. Anguita, I. Lazzizzera, S. Ridella, F. Rivieccio, and R. Zunino. Model selection in top quark tagging with a support vector classifier. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2004), Budapest, Hungary, July 2004.
[3] C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
[4] Olivier Chapelle, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131–159, 2002.
[5] Kai-Min Chung, Wei-Chun Kao, Chia-Liang Sun, Li-Lun Wang, and Chih-Jen Lin. Radius margin bounds for support vector machines with the RBF kernel. Neural Computation, 15(11), 2003.
[6] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[7] Kaibo Duan, S. Sathiya Keerthi, and Aun Neow Poo. Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing, 51:41–59, 2003.
[8] R. Fletcher and M. J. D. Powell. A rapidly convergent descent method for minimization. Computer Journal, 6:163–168, 1963.
[9] T. Joachims. Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, 1999.
[10] Thorsten Joachims. Estimating the generalization performance of a SVM efficiently. In Pat Langley, editor, Proceedings of ICML-00, 17th International Conference on Machine Learning, Stanford, US, 2000. Morgan Kaufmann Publishers, San Francisco, US.
[11] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Transactions on Neural Networks, 11(1):124–136, January 2000.
[12] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics, 2000.
[13] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[14] B. Schölkopf, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Generalization bounds via eigenvalues of the Gram matrix. Technical report, Australian National University, February 1999. Submitted to COLT99.
[15] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[16] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.
[17] R. Vanderbei. LOQO: An interior point code for quadratic programming. Technical Report SOR 94-15, Princeton University, 1994.
[18] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
[19] Vladimir Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
[20] Vladimir Vapnik and Olivier Chapelle. Bounds on error expectation for support vector machines. Neural Computation, 12(9), 2000.
[21] K. Veropoulos, N. Cristianini, and C. Campbell. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI99), Stockholm, Sweden, 1999.


### Maximum Margin Clustering

Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin

### Machine Learning in FX Carry Basket Prediction

Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines

### SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH

1 SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH Y, HONG, N. GAUTAM, S. R. T. KUMARA, A. SURANA, H. GUPTA, S. LEE, V. NARAYANAN, H. THADAKAMALLA The Dept. of Industrial Engineering,

### Transform-based Domain Adaptation for Big Data

Transform-based Domain Adaptation for Big Data Erik Rodner University of Jena Judy Hoffman Jeff Donahue Trevor Darrell Kate Saenko UMass Lowell Abstract Images seen during test time are often not from

### On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers

On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 fbach@cs.berkeley.edu Abstract

### Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

### Support Vector Machines

Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric

### Evaluation & Validation: Credibility: Evaluating what has been learned

Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### WE DEFINE spam as an e-mail message that is unwanted basically

1048 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999 Support Vector Machines for Spam Categorization Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir

### A Support Vector Method for Multivariate Performance Measures

Thorsten Joachims Cornell University, Dept. of Computer Science, 4153 Upson Hall, Ithaca, NY 14853 USA tj@cs.cornell.edu Abstract This paper presents a Support Vector Method for optimizing multivariate

### The Set Covering Machine

Journal of Machine Learning Research 3 (2002) 723-746 Submitted 12/01; Published 12/02 The Set Covering Machine Mario Marchand School of Information Technology and Engineering University of Ottawa Ottawa,

### Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

### Open-Set Face Recognition-based Visitor Interface System

Open-Set Face Recognition-based Visitor Interface System Hazım K. Ekenel, Lorant Szasz-Toth, and Rainer Stiefelhagen Computer Science Department, Universität Karlsruhe (TH) Am Fasanengarten 5, Karlsruhe

### Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

### Tree based ensemble models regularization by convex optimization

Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000

### Table 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.

Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

### Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations

ISSN: 2278-181 Vol. 2 Issue 9, September - 213 Behavior Analysis of SVM Based Spam Filtering Using Various Kernel Functions and Data Representations Author :Sushama Chouhan Author Affiliation: MTech Scholar

### Early defect identification of semiconductor processes using machine learning

STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew

### An Efficient Implementation of an Active Set Method for SVMs

Journal of Machine Learning Research 7 (2006) 2237-2257 Submitted 10/05; Revised 2/06; Published 10/06 An Efficient Implementation of an Active Set Method for SVMs Katya Scheinberg Mathematical Science

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

### A Study on SMO-type Decomposition Methods for Support Vector Machines

1 A Study on SMO-type Decomposition Methods for Support Vector Machines Pai-Hsuen Chen, Rong-En Fan, and Chih-Jen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan cjlin@csie.ntu.edu.tw

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Online Active Learning Methods for Fast Label-Efficient Spam Filtering

Online Active Learning Methods for Fast Label-Efficient Spam Filtering D. Sculley Department of Computer Science Tufts University, Medford, MA USA dsculley@cs.tufts.edu ABSTRACT Active learning methods

### Equity forecast: Predicting long term stock price movement using machine learning

Equity forecast: Predicting long term stock price movement using machine learning Nikola Milosevic School of Computer Science, University of Manchester, UK Nikola.milosevic@manchester.ac.uk Abstract Long

### Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

### Adapting Codes and Embeddings for Polychotomies

Adapting Codes and Embeddings for Polychotomies Gunnar Rätsch, Alexander J. Smola RSISE, CSL, Machine Learning Group The Australian National University Canberra, 2 ACT, Australia Gunnar.Raetsch, Alex.Smola

### Support Vector Clustering

Journal of Machine Learning Research 2 (2) 25-37 Submitted 3/4; Published 2/ Support Vector Clustering Asa Ben-Hur BIOwulf Technologies 23 Addison st. suite 2, Berkeley, CA 9474, USA David Horn School

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Combining SVM Classifiers Using Genetic Fuzzy Systems Based on AUC for Gene Expression Data Analysis

Combining SVM Classifiers Using Genetic Fuzzy Systems Based on AUC for Gene Expression Data Analysis Xiujuan Chen 1, Yichuan Zhao 2, Yan-Qing Zhang 1, and Robert Harrison 1 1 Department of Computer Science,

### Review of Computer Engineering Research WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM. Geeta R.B.* Shobha R.B.

Review of Computer Engineering Research journal homepage: http://www.pakinsight.com/?ic=journal&journal=76 WEB PAGES CATEGORIZATION BASED ON CLASSIFICATION & OUTLIER ANALYSIS THROUGH FSVM Geeta R.B.* Department

### ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization

ME128 Computer-ided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation

### Online Learning in Biometrics: A Case Study in Face Classifier Update

Online Learning in Biometrics: A Case Study in Face Classifier Update Richa Singh, Mayank Vatsa, Arun Ross, and Afzel Noore Abstract In large scale applications, hundreds of new subjects may be regularly

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Building MLP networks by construction

University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2000 Building MLP networks by construction Ah Chung Tsoi University of

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl

### Density Level Detection is Classification

Density Level Detection is Classification Ingo Steinwart, Don Hush and Clint Scovel Modeling, Algorithms and Informatics Group, CCS-3 Los Alamos National Laboratory {ingo,dhush,jcs}@lanl.gov Abstract We

### Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets

Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets Ricardo Ramos Guerra Jörg Stork Master in Automation and IT Faculty of Computer Science and Engineering

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### Introduction to Machine Learning

Introduction to Machine Learning Isabelle Guyon isabelle@clopinet.com What is Machine Learning? Learning algorithm Trained machine TRAINING DATA Answer Query What for? Classification Time series prediction

### Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,