Regression Using Support Vector Machines: Basic Foundations


Technical Report, December 2004

Aly Farag and Refaat M. Mohamed
Computer Vision and Image Processing Laboratory
Electrical and Computer Engineering Department
University of Louisville, Louisville, KY 40292
Support Vector Machines (SVM) were developed by Vapnik [1] to solve the classification problem, but recently SVM have been successfully extended to regression and density estimation problems [2]. SVM are gaining popularity due to many attractive features and promising empirical performance. For instance, the formulation of SVM density estimation employs the Structural Risk Minimization (SRM) principle, which has been shown to be superior to the traditional Empirical Risk Minimization (ERM) principle employed by conventional learning algorithms (e.g. neural networks) [3]. SRM minimizes an upper bound on the generalization error, as opposed to ERM, which minimizes the error on the training data. This difference makes SVM more attractive for statistical learning applications.

The traditional formulation of the SVM density estimation problem raises a quadratic optimization problem of the same size as the training data set. This computationally demanding optimization problem prevents the SVM from being the default choice of the pattern recognition community [4]. Several approaches have been introduced to circumvent this shortcoming of SVM learning. These include simpler optimization criteria for SVM design (e.g. the kernel Adatron [5]), specialized QP algorithms such as the conjugate gradient method, decomposition techniques (which break down the large QP problem into a series of smaller QP subproblems), the sequential minimal optimization (SMO) algorithm and its various extensions [6], Nyström approximations [7], greedy Bayesian methods [8], and the chunking algorithm [9]. Recently, active learning has become a popular paradigm for reducing the sample complexity of large-scale learning tasks (e.g. [10-12]). In active learning, instead of learning from random samples, the learner has the ability to select its own training data. This is done iteratively, and the output of one step is used to select the examples for the next step.

This tutorial presents the mathematical foundations of the SVM regression algorithm. It then presents a new learning algorithm which uses Mean Field (MF) theory. MF methods provide efficient approximations which are able to cope with the complexity of probabilistic data models [13]. MF methods replace the intractable task of computing high-dimensional sums and integrals by the much easier problem of solving a system of linear equations. The regression problem is formulated so that the MF method can be used to approximate the learning procedure in a way that avoids the quadratic programming optimization. The proposed approach is suitable for high-dimensional regression problems, and several experimental examples are presented.
1 Problem Statement and Some Basic Principles

The regression problem can be stated as follows: given a training data set $D = \{(y_i, t_i),\ i = 1, 2, \ldots, n\}$ of input vectors $y_i$ and associated targets $t_i$, the goal is to fit a function $g(y)$ which approximates the relation inherent in the data set points and can be used later on to infer the output $t$ for a new input data point $y$. Any practical regression algorithm has a loss function $L(t, g(y))$, which describes how the estimated function deviates from the true one. Many forms of the loss function can be found in the literature: e.g. linear, quadratic, exponential, etc. In this tutorial, Vapnik's loss function is used, which is known as the $\varepsilon$-insensitive loss function and is defined as:

$$L(t, g(y)) = \begin{cases} 0 & \text{if } |t - g(y)| \le \varepsilon \\ |t - g(y)| - \varepsilon & \text{otherwise} \end{cases} \tag{1}$$

Figure 1: The soft margin loss function.

where $\varepsilon > 0$ is a predefined constant which controls the noise tolerance. With the $\varepsilon$-insensitive loss function, the goal is to find a $g(y)$ that has at most $\varepsilon$ deviation from the actually obtained targets $t_i$ for all the training data and is, at the same time, as flat as possible. In other words, the regression algorithm does not care about errors as long as they are less than $\varepsilon$, but it will not accept any deviation larger than this.
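As a quick illustration (not part of the original report), the following Python sketch implements the loss of Eq. (1); the function name and the NumPy vectorization are my own choices.

```python
import numpy as np

def eps_insensitive_loss(t, g_y, eps):
    """Vapnik's eps-insensitive loss, Eq. (1).

    Deviations smaller than eps cost nothing; larger deviations
    are penalized linearly by the amount they exceed eps.
    """
    return np.maximum(np.abs(t - g_y) - eps, 0.0)

# Example: with eps = 0.5, a residual of 0.3 is free, 1.2 costs 0.7.
print(eps_insensitive_loss(np.array([1.0, 2.0]), np.array([1.3, 0.8]), 0.5))
# -> [0.  0.7]
```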
For pedagogical reasons, the following discussion begins by describing the case of linear functions $g$, taking the form:

$$g(y) = w \cdot y + b \tag{2}$$

where $w \in Y$, $Y$ is the input space, $b \in \mathbb{R}$, and $w \cdot y$ is the dot product of the vectors $w$ and $y$.

2 Classical Formulation of the Regression Problem

As stated before, the goal of a regression algorithm is to fit a flat function to the data points. Flatness in the case of Eq. (2) means that one seeks a small $w$. One way to ensure this flatness is to minimize the norm, i.e. $\|w\|^2$. Thus, the regression problem can be written as a convex optimization problem:

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \tag{3}$$
$$\text{subject to} \quad \begin{cases} t_i - (w \cdot y_i + b) \le \varepsilon \\ (w \cdot y_i + b) - t_i \le \varepsilon \end{cases} \tag{4}$$

The implied assumption in Eq. (4) is that such a function $g$ actually exists that approximates all pairs $(y_i, t_i)$ with $\varepsilon$ precision, or in other words, that the convex optimization problem is feasible. Sometimes, however, this may not be the case, or we may also want to allow for some errors. Analogously to the soft margin loss function [14], which was adapted to SVMs by Vapnik [15], slack variables $\zeta_i, \zeta_i^*$ can be introduced to cope with the otherwise infeasible constraints of the optimization problem in Eq. (4). Hence the formulation stated in [15] is attained:

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\zeta_i + \zeta_i^*) \tag{5}$$
$$\text{subject to} \quad \begin{cases} t_i - (w \cdot y_i + b) \le \varepsilon + \zeta_i \\ (w \cdot y_i + b) - t_i \le \varepsilon + \zeta_i^* \\ \zeta_i, \zeta_i^* \ge 0 \end{cases} \tag{6}$$

The constant $C > 0$ determines the trade-off between the flatness of $g$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. This corresponds to dealing with the so-called $\varepsilon$-insensitive loss function described before.
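Before moving to the dual, here is a minimal sketch of the primal problem of Eqs. (5)-(6), assuming the CVXPY modeling library and synthetic data; the names Y, t, C, and eps are illustrative choices, not from the report.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
Y = rng.standard_normal((n, d))             # input vectors y_i as rows
t = Y @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

C, eps = 10.0, 0.1
w = cp.Variable(d)
b = cp.Variable()
zeta = cp.Variable(n, nonneg=True)          # slack above the tube
zeta_s = cp.Variable(n, nonneg=True)        # slack below the tube

# Eq. (5): flatness term plus penalty on tube violations.
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(zeta + zeta_s))
# Eq. (6): each target must lie within eps of the fit, up to slack.
constraints = [t - (Y @ w + b) <= eps + zeta,
               (Y @ w + b) - t <= eps + zeta_s]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```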
As shown in Fig. 1, only the points outside the shaded region contribute to the cost insofar as the deviations are penalized in a linear fashion. It turns out that in most cases the optimization problem of Eq. (6) can be solved more easily in its dual formulation. Moreover, the dual formulation provides the key to extending the SVM machinery to nonlinear functions. Hence, a standard dualization method utilizing Lagrange multipliers is described next.

2.1 Dual Problem and Quadratic Programming

The minimization problem in Eq. (6) is called the primal objective function. The key idea of the dual problem is to construct a Lagrange function from the primal objective function and the corresponding constraints by introducing a dual set of variables. It can be shown that the Lagrange function has a saddle point with respect to the primal and dual variables at the solution (for details see e.g. [16], [17]). The primal objective function and its constraints are transformed into the Lagrange function as follows:

$$L = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\zeta_i + \zeta_i^*) - \sum_{i=1}^{n}(\lambda_i \zeta_i + \lambda_i^* \zeta_i^*) - \sum_{i=1}^{n}\alpha_i\,(\varepsilon + \zeta_i - t_i + w \cdot y_i + b) - \sum_{i=1}^{n}\alpha_i^*\,(\varepsilon + \zeta_i^* + t_i - w \cdot y_i - b) \tag{7}$$

Here $L$ is the Lagrangian and $\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^*$ are Lagrange multipliers. Hence, the dual variables in Eq. (7) have to satisfy the positivity constraints:

$$\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^* \ge 0 \tag{8}$$

It follows from the saddle point condition that the partial derivatives of $L$ with respect to the primal variables $(w, b, \zeta_i, \zeta_i^*)$ have to vanish for optimality (note that $\alpha_i^{(*)}$ refers to both $\alpha_i$ and $\alpha_i^*$):

$$\partial_b L = \sum_{i=1}^{n}(\alpha_i^* - \alpha_i) = 0 \tag{9}$$
$$\partial_w L = w - \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,y_i = 0 \tag{10}$$
$$\partial_{\zeta_i^{(*)}} L = C - \alpha_i^{(*)} - \lambda_i^{(*)} = 0 \tag{11}$$
Substituting from Eqs. (9), (10), and (11) into Eq. (7) yields the dual optimization problem:

$$\text{maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(y_i \cdot y_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} t_i(\alpha_i - \alpha_i^*)$$
$$\text{subject to} \quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C] \tag{12}$$

In deriving Eq. (12), the dual variables $\lambda_i, \lambda_i^*$ are eliminated through the condition in Eq. (11), which can be reformulated as $\lambda_i^{(*)} = C - \alpha_i^{(*)}$. Eq. (10) can be rewritten as $w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,y_i$, thus:

$$g(y) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)(y_i \cdot y) + b \tag{13}$$

This is the so-called Support Vector Machines regression expansion, i.e. $w$ can be completely described as a linear combination of the training patterns $y_i$. In a sense, the complexity of a function's representation by SVs is independent of the dimensionality of the input space $Y$ and depends only on the number of SVs. Moreover, the complete algorithm can be described in terms of dot products between the data. Even when evaluating $g(y)$, the value of $w$ does not need to be computed explicitly. These observations will come in handy for the formulation of a nonlinear extension.

2.2 Support Vectors

The Karush-Kuhn-Tucker (KKT) conditions [18, 19] are the basis of the Lagrangian solution. These conditions state that at the solution point the product between the dual variables and the constraints has to vanish, i.e.:

$$\alpha_i\,(\varepsilon + \zeta_i - t_i + w \cdot y_i + b) = 0, \qquad \alpha_i^*\,(\varepsilon + \zeta_i^* + t_i - w \cdot y_i - b) = 0 \tag{14}$$
$$(C - \alpha_i)\,\zeta_i = 0, \qquad (C - \alpha_i^*)\,\zeta_i^* = 0 \tag{15}$$
Several useful conclusions can be drawn from these conditions. Firstly, only samples $(y_i, t_i)$ with corresponding $\alpha_i^{(*)} = C$ lie outside the $\varepsilon$-insensitive tube. Secondly, $\alpha_i \alpha_i^* = 0$, i.e. there can never be a set of dual variables $\alpha_i, \alpha_i^*$ which are both simultaneously nonzero. This allows us to conclude that:

$$\varepsilon - t_i + w \cdot y_i + b \ge 0 \quad \text{and} \quad \zeta_i = 0 \qquad \text{if } \alpha_i < C \tag{16}$$
$$\varepsilon - t_i + w \cdot y_i + b \le 0 \qquad \text{if } \alpha_i > 0 \tag{17}$$

A final note has to be made regarding the sparsity of the SVM expansion. From Eq. (14) it follows that the Lagrange multipliers may be nonzero only for $|t_i - g(y_i)| \ge \varepsilon$; in other words, for all samples inside the $\varepsilon$-tube (i.e. the shaded region in Fig. 1) the $\alpha_i, \alpha_i^*$ vanish: for $|t_i - g(y_i)| < \varepsilon$ the second factor in Eq. (14) is nonzero, hence $\alpha_i, \alpha_i^*$ have to be zero for the KKT conditions to be satisfied. Therefore, there is a sparse expansion of $w$ in terms of $y_i$ (i.e. not all $y_i$ are needed to describe $w$). The training samples that come with nonvanishing coefficients are called Support Vectors.

2.3 Computing b

There are many ways to compute the value of $b$ in Eq. (13). One such way can be found in [20]:

$$b = -\frac{1}{2}\,w \cdot (y_r + y_s) \tag{19}$$

where $y_r$ and $y_s$ are support vectors (i.e. input vectors which have a nonzero value of $\alpha_i$ or $\alpha_i^*$, respectively).

3 Nonlinear Regression: The Kernel Trick

The next step is to make the SVM algorithm nonlinear. This could, for instance, be achieved by simply preprocessing the training patterns $y_i$ by a map $\Psi: Y \to I$ into some feature space $I$, as described in [1], and then applying the standard SVM regression algorithm. Here is a brief look at an example given in [1].

Example 1 (Quadratic features in $\mathbb{R}^2$)
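As an end-to-end illustration of the kernel substitution, the following sketch trains a kernel SVR directly from the dual problem of Eq. (12), with the dot products $y_i \cdot y_j$ replaced by a kernel $k(y_i, y_j)$, and predicts via the expansion of Eq. (13). It again assumes CVXPY and synthetic data; the RBF kernel choice, the jitter term, and the recovery of $b$ from the KKT conditions (via Eq. (16) with equality, rather than via Eq. (19)) are illustrative assumptions, not the report's procedure.

```python
import cvxpy as cp
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) (an assumed choice)."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 60
Y = rng.uniform(-3, 3, (n, 1))
t = np.sin(Y[:, 0]) + 0.05 * rng.standard_normal(n)

C, eps = 10.0, 0.1
K = rbf(Y, Y) + 1e-8 * np.eye(n)        # jitter keeps K numerically PSD

a = cp.Variable(n)                       # alpha
a_s = cp.Variable(n)                     # alpha*
d = a - a_s
# Dual objective of Eq. (12), with the Gram matrix of dot products
# replaced by the kernel matrix K.
obj = cp.Maximize(-0.5 * cp.quad_form(d, K) - eps * cp.sum(a + a_s) + t @ d)
cons = [a >= 0, a <= C, a_s >= 0, a_s <= C, cp.sum(d) == 0]
cp.Problem(obj, cons).solve()

coef = a.value - a_s.value               # expansion coefficients of Eq. (13)
sv = np.abs(coef) > 1e-5                 # support vectors: nonzero multipliers

# b from a sample with 0 < alpha_r < C: by Eq. (15) zeta_r = 0, so Eq. (16)
# holds with equality and b = t_r - eps - sum_j coef_j k(y_j, y_r).
r = np.argmax((a.value > 1e-5) & (a.value < C - 1e-5))
b = t[r] - eps - coef @ K[:, r]

def g(y_new):
    """SVR prediction, Eq. (13) with the kernel substituted for y_i . y."""
    return rbf(Y, y_new).T @ coef + b

print(f"{sv.sum()} support vectors; g(0) ~ {g(np.zeros((1, 1)))[0]:.3f}")
```

Note that the prediction never forms $w$ explicitly, in line with the observation after Eq. (13) that the whole algorithm can be expressed through dot products (here, kernel evaluations) alone.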