Regression Using Support Vector Machines: Basic Foundations

Technical Report, December 2004
Aly Farag and Refaat M. Mohamed
Computer Vision and Image Processing Laboratory
Electrical and Computer Engineering Department
University of Louisville, Louisville, KY 40292

Support Vector Machines (SVM) were developed by Vapnik [1] to solve the classification problem, but SVM have recently been extended successfully to regression and density estimation problems [2]. SVM are gaining popularity due to many attractive features and promising empirical performance. For instance, the SVM formulation employs the Structural Risk Minimization (SRM) principle, which has been shown to be superior to the traditional Empirical Risk Minimization (ERM) principle used in conventional learning algorithms such as neural networks [3]. SRM minimizes an upper bound on the generalization error, whereas ERM minimizes the error on the training data; this difference makes SVM more attractive for statistical learning applications.

The traditional formulation of SVM learning leads to a quadratic optimization problem whose size equals that of the training data set. This computationally demanding optimization problem has prevented SVM from becoming the default choice of the pattern recognition community [4]. Several approaches have been introduced to circumvent this shortcoming. These include simpler optimization criteria for SVM design (e.g. the kernel Adatron [5]), specialized QP algorithms such as the conjugate gradient method, decomposition techniques (which break the large QP problem into a series of smaller QP sub-problems), the sequential minimal optimization (SMO) algorithm and its various extensions [6], Nyström approximations [7], greedy Bayesian methods [8], and the chunking algorithm [9]. Recently, active learning has become a popular paradigm for reducing the sample complexity of large-scale learning tasks (e.g. [10-12]). In active learning, instead of learning from random samples, the learner can select its own training data; this is done iteratively, and the output of one step is used to select the examples for the next step.

This tutorial presents the mathematical foundations of the SVM regression algorithm. It then presents a new learning algorithm based on Mean Field (MF) theory. MF methods provide efficient approximations that can cope with the complexity of probabilistic data models [13]; they replace the intractable task of computing high-dimensional sums and integrals with the much easier problem of solving a system of linear equations. The regression problem is formulated so that the MF method can approximate the learning procedure in a way that avoids quadratic programming. The approach is suitable for high-dimensional regression problems, and several experimental examples are presented.

1 Problem Statement and Some Basic Principles

The regression problem can be stated as follows: given a training data set $D = \{(y_i, t_i),\ i = 1, 2, \ldots, n\}$ of input vectors $y_i$ and associated targets $t_i$, the goal is to fit a function $g(y)$ that approximates the relation inherent in the data set and can later be used to infer the output $t$ for a new input point $y$. Any practical regression algorithm has a loss function $L(t, g(y))$, which describes how the estimated function deviates from the true one. Many forms of loss function can be found in the literature: linear, quadratic, exponential, etc. In this tutorial, Vapnik's loss function is used, known as the $\varepsilon$-insensitive loss function and defined as

$$
L(t, g(y)) =
\begin{cases}
0 & \text{if } |t - g(y)| \le \varepsilon \\
|t - g(y)| - \varepsilon & \text{otherwise}
\end{cases}
\tag{1}
$$

Figure 1: The soft margin loss function.

where $\varepsilon > 0$ is a predefined constant that controls the noise tolerance. With the $\varepsilon$-insensitive loss function, the goal is to find a $g(y)$ that has at most $\varepsilon$ deviation from the actually obtained targets $t_i$ for all the training data and is, at the same time, as flat as possible. In other words, the regression algorithm does not care about errors as long as they are less than $\varepsilon$, but it will not accept any deviation larger than this.
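
To make Eq. (1) concrete, here is a minimal sketch of the loss in Python/NumPy; the function name and the value of $\varepsilon$ are illustrative choices, not taken from the report:

    import numpy as np

    def eps_insensitive_loss(t, g_y, eps=0.1):
        """Vapnik's epsilon-insensitive loss of Eq. (1): zero inside the
        eps-tube, linear outside it."""
        return np.maximum(np.abs(t - g_y) - eps, 0.0)

    # A deviation of 0.05 is ignored; a deviation of 0.3 is penalized by 0.3 - 0.1 = 0.2
    print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3])))  # [0.  0.2]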

For pedagogical reasons, the following discussion begins with the case of linear functions $g$, taking the form

$$
g(y) = w \cdot y + b
\tag{2}
$$

where $w \in Y$, $Y$ is the input space, $b \in \mathbb{R}$, and $w \cdot y$ denotes the dot product of the vectors $w$ and $y$.

2 Classical Formulation of the Regression Problem

As stated before, the goal of a regression algorithm is to fit a flat function to the data points. Flatness in the case of Eq. (2) means that one seeks a small $w$. One way to ensure this is to minimize the norm $\|w\|^2$. Thus, the regression problem can be written as the convex optimization problem

$$
\text{minimize} \quad \tfrac{1}{2}\|w\|^2
\tag{3}
$$
$$
\text{subject to} \quad
\begin{aligned}
t_i - (w \cdot y_i + b) &\le \varepsilon \\
(w \cdot y_i + b) - t_i &\le \varepsilon
\end{aligned}
\tag{4}
$$

The implicit assumption in Eq. (4) is that a function $g$ actually exists that approximates all pairs $(y_i, t_i)$ with $\varepsilon$ precision, or in other words, that the convex optimization problem is feasible. Sometimes this may not be the case, or we may also want to allow for some errors. Analogously to the soft margin loss function [14], which Vapnik adapted to SVM [15], slack variables $\zeta_i, \zeta_i^*$ can be introduced to cope with otherwise infeasible constraints of the optimization problem in Eq. (4). Hence the formulation stated in [15] is obtained:

$$
\text{minimize} \quad \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)
\tag{5}
$$
$$
\text{subject to} \quad
\begin{aligned}
t_i - (w \cdot y_i + b) &\le \varepsilon + \zeta_i \\
(w \cdot y_i + b) - t_i &\le \varepsilon + \zeta_i^* \\
\zeta_i, \zeta_i^* &\ge 0
\end{aligned}
\tag{6}
$$

The constant $C > 0$ determines the trade-off between the flatness of $g$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. This corresponds to the $\varepsilon$-insensitive loss function described before.
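
The soft-margin primal in Eqs. (5)-(6) is a small quadratic program. The sketch below states it directly with the cvxpy modeling library; cvxpy is an assumption of this illustration (the report does not prescribe a solver), and the variable names simply mirror the text:

    import numpy as np
    import cvxpy as cp  # illustrative solver choice, not prescribed by the report

    def linear_svr_primal(Y, t, C=1.0, eps=0.1):
        """Sketch of the primal problem of Eqs. (5)-(6) for a linear g(y) = w.y + b."""
        n, d = Y.shape
        w = cp.Variable(d)
        b = cp.Variable()
        zeta = cp.Variable(n, nonneg=True)       # zeta_i
        zeta_star = cp.Variable(n, nonneg=True)  # zeta_i*

        objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(zeta + zeta_star))
        constraints = [
            t - (Y @ w + b) <= eps + zeta,       # t_i - (w.y_i + b) <= eps + zeta_i
            (Y @ w + b) - t <= eps + zeta_star,  # (w.y_i + b) - t_i <= eps + zeta_i*
        ]
        cp.Problem(objective, constraints).solve()
        return w.value, b.value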

As shown in Fig. 1, only the points outside the shaded region contribute to the cost, insofar as the deviations are penalized in a linear fashion. It turns out that in most cases the optimization problem in Eq. (6) can be solved more easily in its dual formulation. Moreover, the dual formulation provides the key to extending the SVM to nonlinear functions. Hence, a standard dualization method utilizing Lagrange multipliers is described next.

2.1 Dual Problem and Quadratic Programming

The minimization problem in Eq. (6) is called the primal objective function. The key idea of the dual problem is to construct a Lagrange function from the primal objective function and the corresponding constraints by introducing a dual set of variables. It can be shown that the Lagrange function has a saddle point with respect to the primal and dual variables at the solution (for details see e.g. [16], [17]). The primal objective function with its constraints is transformed into the Lagrange function as follows:

$$
L = \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)
  - \sum_{i=1}^{n} (\lambda_i \zeta_i + \lambda_i^* \zeta_i^*)
  - \sum_{i=1}^{n} \alpha_i \left(\varepsilon + \zeta_i - t_i + w \cdot y_i + b\right)
  - \sum_{i=1}^{n} \alpha_i^* \left(\varepsilon + \zeta_i^* + t_i - w \cdot y_i - b\right)
\tag{7}
$$

Here $L$ is the Lagrangian and $\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^*$ are Lagrange multipliers. Hence the dual variables in Eq. (7) have to satisfy positivity constraints:

$$
\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^* \ge 0.
\tag{8}
$$

It follows from the saddle point condition that the partial derivatives of $L$ with respect to the primal variables $(w, b, \zeta_i, \zeta_i^*)$ have to vanish at optimality (here $\alpha_i^{(*)}$ refers to both $\alpha_i$ and $\alpha_i^*$, and likewise for $\zeta_i^{(*)}$ and $\lambda_i^{(*)}$):

$$
\partial_b L = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i) = 0
\tag{9}
$$
$$
\partial_w L = w - \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i = 0
\tag{10}
$$
$$
\partial_{\zeta_i^{(*)}} L = C - \alpha_i^{(*)} - \lambda_i^{(*)} = 0
\tag{11}
$$
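
The substitution in the next step relies on several terms canceling; the intermediate bookkeeping below is filled in here for readability and is not printed in the report:

$$
C\sum_{i=1}^{n}(\zeta_i+\zeta_i^*) - \sum_{i=1}^{n}(\lambda_i\zeta_i+\lambda_i^*\zeta_i^*) - \sum_{i=1}^{n}(\alpha_i\zeta_i+\alpha_i^*\zeta_i^*)
= \sum_{i=1}^{n}\big[(C-\lambda_i-\alpha_i)\,\zeta_i + (C-\lambda_i^*-\alpha_i^*)\,\zeta_i^*\big] = 0
$$

by Eq. (11), and the $b$ terms contribute $-b\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0$ by Eq. (9). What remains involves only $w$, $\varepsilon$, $t_i$, and the dot products $y_i \cdot y_j$ once $w$ is replaced via Eq. (10), since $\tfrac{1}{2}\|w\|^2 - w \cdot \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\, y_i = -\tfrac{1}{2}\|w\|^2$.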

Substituting Eqs. (9), (10), and (11) into Eq. (7) yields the dual optimization problem:

$$
\text{maximize} \quad
-\tfrac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(y_i \cdot y_j)
- \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
+ \sum_{i=1}^{n} t_i (\alpha_i - \alpha_i^*)
$$
$$
\text{subject to} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C]
\tag{12}
$$

In deriving Eq. (12), the dual variables $\lambda_i, \lambda_i^*$ are eliminated through the condition in Eq. (11), which can be reformulated as $\lambda_i^{(*)} = C - \alpha_i^{(*)}$. Eq. (10) can be rewritten as $w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i$, and thus

$$
g(y) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)(y_i \cdot y) + b
\tag{13}
$$

This is the so-called Support Vector regression expansion: $w$ can be completely described as a linear combination of the training patterns $y_i$. In a sense, the complexity of the function's representation by support vectors is independent of the dimensionality of the input space $Y$ and depends only on the number of support vectors. Moreover, the complete algorithm can be described in terms of dot products between the data; even when evaluating $g(y)$, the value of $w$ does not need to be computed explicitly. These observations will come in handy for the formulation of a nonlinear extension.

2.2 Support Vectors

The Karush-Kuhn-Tucker (KKT) conditions [18, 19] are the basis of the Lagrangian solution. These conditions state that at the solution point the product between the dual variables and the constraints has to vanish, i.e.

$$
\alpha_i \left(\varepsilon + \zeta_i - t_i + w \cdot y_i + b\right) = 0, \qquad
\alpha_i^* \left(\varepsilon + \zeta_i^* + t_i - w \cdot y_i - b\right) = 0
\tag{14}
$$
$$
(C - \alpha_i)\, \zeta_i = 0, \qquad
(C - \alpha_i^*)\, \zeta_i^* = 0
\tag{15}
$$
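
A numerical sketch helps connect Eqs. (12)-(15): solving the dual QP (here with cvxpy, again an assumed solver not prescribed by the report, and with a small ridge added to keep the Gram matrix numerically positive semidefinite) yields the coefficients $(\alpha_i - \alpha_i^*)$, most of which turn out to be zero exactly as the KKT conditions predict:

    import numpy as np
    import cvxpy as cp  # illustrative solver choice, not prescribed by the report

    def linear_svr_dual(Y, t, C=1.0, eps=0.1):
        """Sketch of the dual QP of Eq. (12) for the linear (dot-product) case."""
        n = Y.shape[0]
        K = Y @ Y.T + 1e-8 * np.eye(n)   # Gram matrix y_i . y_j, ridged for numerical PSD-ness
        a = cp.Variable(n)               # alpha_i
        a_star = cp.Variable(n)          # alpha_i*
        diff = a - a_star

        objective = cp.Maximize(
            -0.5 * cp.quad_form(diff, K)
            - eps * cp.sum(a + a_star)
            + cp.sum(cp.multiply(t, diff))
        )
        constraints = [cp.sum(diff) == 0, a >= 0, a <= C, a_star >= 0, a_star <= C]
        cp.Problem(objective, constraints).solve()

        coef = a.value - a_star.value                   # (alpha_i - alpha_i*)
        w = Y.T @ coef                                  # w recovered as in Eqs. (10) and (13)
        support = np.flatnonzero(np.abs(coef) > 1e-6)   # samples with nonvanishing coefficients
        return w, coef, support

Only the indices in support contribute to the expansion in Eq. (13); the discussion below makes this sparsity argument precise.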

Several useful conclusions can be drawn from these conditions. First, only samples $(y_i, t_i)$ with corresponding $\alpha_i^{(*)} = C$ lie outside the $\varepsilon$-insensitive tube. Second, $\alpha_i \alpha_i^* = 0$, i.e. there can never be a pair of dual variables $\alpha_i, \alpha_i^*$ that are both simultaneously nonzero. This allows us to conclude that

$$
\varepsilon - t_i + w \cdot y_i + b \ge 0 \quad \text{and} \quad \zeta_i = 0 \qquad \text{if } \alpha_i < C
\tag{16}
$$
$$
\varepsilon - t_i + w \cdot y_i + b \le 0 \qquad \text{if } \alpha_i > 0
\tag{17}
$$

with the analogous conditions holding for $\alpha_i^*$.

A final note has to be made regarding the sparsity of the SVM expansion. From Eq. (14) it follows that the Lagrange multipliers may be nonzero only for $|g(y_i) - t_i| \ge \varepsilon$; in other words, for all samples inside the $\varepsilon$-tube (the shaded region in Fig. 1) the $\alpha_i, \alpha_i^*$ vanish: for $|g(y_i) - t_i| < \varepsilon$ the second factor in Eq. (14) is nonzero, hence $\alpha_i, \alpha_i^*$ have to be zero for the KKT conditions to be satisfied. Therefore there is a sparse expansion of $w$ in terms of $y_i$ (i.e. not all $y_i$ are needed to describe $w$). The training samples with nonvanishing coefficients are called Support Vectors.

2.3 Computing b

There are many ways to compute the value of $b$ in Eq. (13). One such way can be found in [20]:

$$
b = -\tfrac{1}{2}\, w \cdot (y_r + y_s)
\tag{19}
$$

where $y_r$ and $y_s$ are support vectors (i.e. any input vectors with a nonzero value of $\alpha_i$ or $\alpha_i^*$, respectively).

3 Nonlinear Regression: The Kernel Trick

The next step is to make the SVM algorithm nonlinear. This could be achieved, for instance, by simply preprocessing the training patterns $y_i$ with a map $\Psi: Y \rightarrow I$ into some feature space $I$, as described in [1], and then applying the standard SVM regression algorithm. Here is a brief look at an example given in [1].

Example 1 (Quadratic features in $\mathbb{R}^2$)
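
The details of Example 1 fall on pages not reproduced here. As a rough illustration of the idea behind the kernel trick (not the report's own derivation), the sketch below checks that an explicit quadratic feature map $\Psi: \mathbb{R}^2 \rightarrow \mathbb{R}^3$ and the kernel $k(y, z) = (y \cdot z)^2$ produce the same inner products, so the dot products needed in Eqs. (12) and (13) can be evaluated without ever forming $\Psi(y)$ explicitly:

    import numpy as np

    def quad_features(y):
        """One common explicit quadratic feature map Psi: R^2 -> R^3."""
        y1, y2 = y
        return np.array([y1**2, np.sqrt(2.0) * y1 * y2, y2**2])

    def quad_kernel(y, z):
        """Kernel giving the same inner product without mapping explicitly."""
        return float(np.dot(y, z)) ** 2

    y = np.array([1.0, 2.0])
    z = np.array([0.5, -1.0])
    print(np.dot(quad_features(y), quad_features(z)))  # 2.25
    print(quad_kernel(y, z))                           # 2.25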
