On Convergence Rate of Concave-Convex Procedure
Ian En-Hsu Yen, Po-Wei Wang, Nanyun Peng (Johns Hopkins University, Baltimore, MD), Shou-De Lin

Abstract

The Concave-Convex Procedure (CCCP) has been widely used to solve nonconvex d.c. (difference of convex functions) programs that arise in learning problems such as sparse support vector machines (SVM), transductive SVM, and sparse principal component analysis (PCA). Although the global convergence behavior of CCCP has been well studied, its convergence rate is still an open problem. Most d.c. programs in machine learning involve constraints or a nonsmooth objective function, which rules out a convergence analysis via a differentiable map. In this paper, we approach the problem in a different manner by connecting CCCP with the more general block coordinate descent method. We show that the recent convergence result [1] for coordinate gradient descent on nonconvex, nonsmooth problems also applies to exact alternating minimization. This implies that the convergence rate of CCCP is at least linear when, in the d.c. program, the nonsmooth part is piecewise-linear and the smooth part is strictly convex quadratic. Many d.c. programs in the SVM literature fall into this case.

1 Introduction

The Concave-Convex Procedure is a popular method for optimizing d.c. (difference of convex functions) programs of the form

    min_x   u(x) − v(x)
    s.t.    f_i(x) ≤ 0,  i = 1, …, p
            g_j(x) = 0,  j = 1, …, q                                  (1)

where u(x), v(x), and f_i(x) are convex functions and g_j(x) are affine functions defined on R^n. Suppose v(x) is (piecewise) differentiable. The Concave-Convex Procedure iteratively solves a sequence of convex programs obtained by linearizing the concave part:

    x^{t+1} ∈ arg min_x   u(x) − ∇v(x^t)^T x
    s.t.    f_i(x) ≤ 0,  i = 1, …, p
            g_j(x) = 0,  j = 1, …, q                                  (2)

This procedure was originally proposed by Yuille et al. [2] to deal with unconstrained d.c. programs with smooth objective functions.
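To make iteration (2) concrete, the following minimal sketch (an illustration only, not code from the paper) runs CCCP on a hypothetical unconstrained toy problem with u(x) = ½‖x‖² (strictly convex quadratic) and v(x) = max_i (a_i^T x + b_i) (convex piecewise-linear); the arrays A, b and the helper names are made up for the example. Each step linearizes the concave part −v at x^t with a subgradient and solves the resulting convex subproblem in closed form.

```python
import numpy as np

# Hypothetical toy d.c. program: minimize u(x) - v(x) with
#   u(x) = 0.5 * ||x||^2             (strictly convex quadratic)
#   v(x) = max_i (a_i^T x + b_i)     (convex piecewise-linear)
A = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])   # rows are the a_i
b = np.array([0.0, 1.0, 0.5])

def v_subgradient(x):
    """A subgradient of v at x: the slope a_k of an active piece."""
    k = np.argmax(A @ x + b)
    return A[k]

def cccp(x0, iters=50):
    x = x0.copy()
    for _ in range(iters):
        g = v_subgradient(x)       # linearize the concave part -v at x^t
        # Convex subproblem (2): minimize 0.5*||x||^2 - g^T x, whose
        # unconstrained minimizer is x = g.
        x = g
    return x

x_star = cccp(np.zeros(2))
print("CCCP iterate:", x_star)
print("objective u(x) - v(x):", 0.5 * x_star @ x_star - np.max(A @ x_star + b))
```

In this toy setting each subproblem has a closed-form solution; with constraints present, each step would instead solve the convex program in (2).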
Nevertheless, many d.c. programs in machine learning come with constraints or nonsmooth functions. For example, [4] and [5] propose using a concave function to approximate the ℓ0-norm for sparse PCA and SVM feature selection, respectively, which results in d.c. programs with convex constraints. In [11] and [12], R. Collobert et al. formulate ramp-loss and transductive SVMs as d.c. programs with nonsmooth u(x), v(x). Other examples, such as [9] and [13], use (1) with piecewise-linear u(x), v(x) to handle a nonconvex tighter bound [9] and hidden variables [13] in structural SVM.

Though CCCP is extensively used in machine learning, its convergence behavior has not been fully understood. Yuille et al. give an analysis of CCCP's global convergence in the original paper [2], which, however, is not complete [14]. The global convergence of CCCP is proved in [10] and [14] via different approaches. However, as [14] points out, the convergence rate of CCCP is still an open problem. [6] and [7] have analyzed the local convergence behavior of the Majorization-Minimization algorithm, of which CCCP is a special case, by treating (2) as a differentiable map x^{t+1} = M(x^t), but that analysis applies only to the unconstrained, differentiable version of the d.c. program, so it cannot be used for the examples mentioned above.

In this paper, we approach the problem in a different way by connecting CCCP with the more general block coordinate descent method. We show that the recent convergence result [1] for coordinate gradient descent on nonconvex, nonsmooth problems also applies to block coordinate descent. Since CCCP, and more generally the Majorization-Minimization algorithm, are special cases of block coordinate descent, this connection provides a simple way to prove the convergence rate of CCCP. In particular, we show that the sequence {x^t}_{t=0}^∞ produced by (2) converges at least linearly to a stationary point of (1) if the nonsmooth parts of u(x) and v(x) are convex piecewise-linear and the smooth part of u(x) is strictly convex quadratic. Many d.c. programs in the SVM literature fall into this case, such as those of ramp-loss SVM [12], transductive SVM [11], and structural SVM with hidden variables [13] or a nonconvex tighter bound [9].

2 Majorization Minimization as Block Coordinate Descent

In this section, we show that CCCP is a special case of the Majorization-Minimization algorithm, and thus can be viewed as block coordinate descent on a surrogate function. We then introduce an alternative formulation of algorithm (2), for the case that v(x) is piecewise-linear, that is better suited to convergence analysis.

The Majorization-Minimization (MM) principle is as follows. Suppose our goal is to minimize f(x) over Ω ⊆ R^n. We first construct a majorization function g(x, y) over Ω × Ω such that

    f(x) ≤ g(x, y),  ∀ x, y ∈ Ω
    f(x) = g(x, x),  ∀ x ∈ Ω                                          (3)

The MM algorithm then minimizes an upper bound of f(x) at each iteration,

    x^{t+1} ∈ arg min_{x ∈ Ω} g(x, x^t)                               (4)

until a fixed point of (4) is reached. Since the minimum of g(x^t, y) over y is attained at y = x^t, (4) can be seen as block coordinate descent on g(x, y) with the two blocks x and y.

CCCP is a special case of (4) with f(x) = u(x) − v(x) and g(x, y) = u(x) − ∇v(y)^T (x − y) − v(y). Condition (3) holds because, by convexity of v(x), the first-order Taylor approximation is less than or equal to v(x), with equality when y = x. Thus we have

    f(x) = u(x) − v(x) ≤ u(x) − ∇v(y)^T (x − y) − v(y) = g(x, y),  ∀ x, y ∈ Ω
    f(x) = g(x, x),  ∀ x ∈ Ω                                          (5)

However, when v(x) is piecewise-linear, the term −∇v(y)^T (x − y) − v(y) is discontinuous and nonconvex.
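The discontinuity is easy to see numerically. In the hypothetical one-dimensional example below (not from the paper), v(y) = |y| = max(y, −y); as y crosses the kink at 0 the subgradient flips sign, so the surrogate term −∇v(y)(x − y) − v(y) jumps, even though v itself is continuous.

```python
# For v(y) = |y| = max(y, -y), a subgradient is sign(y) (take +1 at y = 0).
def v(y):
    return abs(y)

def v_sub(y):
    return 1.0 if y >= 0 else -1.0

def surrogate_term(x, y):
    """The term -v'(y)*(x - y) - v(y) that appears in g(x, y)."""
    return -v_sub(y) * (x - y) - v(y)

x = 2.0
for y in (-1e-6, 1e-6):
    print(f"y = {y:+.0e}:  -v'(y)(x - y) - v(y) = {surrogate_term(x, y):+.6f}")
# The value jumps from about +2 (y just below 0) to about -2 (y just above 0),
# so g(x, y) is discontinuous in y when v is piecewise-linear.
```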
For convergence analysis, we instead express v(x) as max_{i=1,…,m} (a_i^T x + b_i), with ∇v(x) = a_k piecewise, where k = arg max_i (a_i^T x + b_i). Thus we can express the d.c. program (1) in the equivalent form

    min_{x ∈ R^n, d ∈ R^m}   u(x) − Σ_{i=1}^m d_i (a_i^T x + b_i)
    s.t.    Σ_{i=1}^m d_i = 1,  d_i ≥ 0,  i = 1, …, m
            f_i(x) ≤ 0,  i = 1, …, p
            g_j(x) = 0,  j = 1, …, q                                  (6)

Alternately minimizing (6) between x and d yields the same CCCP algorithm (2).¹

¹ Let S = arg max_i (a_i^T x^t + b_i). When minimizing (6) over d, an optimal solution sets d_k = 1/|S| for k ∈ S and d_j = 0 for j ∉ S. When minimizing over x, the problem becomes x^{t+1} = arg min_x (u(x) − a_S^T x − b_S) = arg min_x (u(x) − ∇v(x^t)^T x) subject to the constraints in (1), where a_S, b_S are the averages of a_k, b_k over k ∈ S, and ∇v(x^t) = a_S is a subgradient of v(x) at x^t.
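Continuing the toy example from the Introduction (again a hypothetical sketch, not the paper's code), the alternating minimization of (6) can be written out directly: the d-step puts mass uniformly on the maximizing pieces S as in footnote 1, and the x-step minimizes the strictly convex quadratic subproblem, reproducing the CCCP iterates of (2).

```python
import numpy as np

# Same hypothetical toy problem: u(x) = 0.5*||x||^2, v(x) = max_i (a_i^T x + b_i).
A = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])
b = np.array([0.0, 1.0, 0.5])

def alternating_minimization(x0, iters=50):
    x = x0.copy()
    for _ in range(iters):
        # d-step: minimize -sum_i d_i*(a_i^T x + b_i) over the simplex.
        # An optimal d spreads its mass uniformly over the maximizing set S.
        c = A @ x + b
        S = np.flatnonzero(np.isclose(c, c.max()))
        d = np.zeros(len(c))
        d[S] = 1.0 / len(S)
        # x-step: minimize 0.5*||x||^2 - sum_i d_i*(a_i^T x + b_i),
        # whose minimizer is x = A^T d (the averaged slope a_S, a subgradient of v).
        x = A.T @ d
    return x

print("alternating minimization of (6):", alternating_minimization(np.zeros(2)))
# The iterates coincide with those of the CCCP sketch in the Introduction.
```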
This formulation replaces the discontinuous, nonconvex term −∇v(y)^T (x − y) − v(y) with the quadratic term −Σ_{i=1}^m d_i (a_i^T x + b_i), so (6) fits the form of nonsmooth problem studied in [1], namely minimizing a function F(x) = f(x) + c P(x) with c > 0, where f(x) is smooth and P(x) is convex and nonsmooth. In the next section, we give a convergence result for CCCP by showing that alternating minimization of (6) yields the same sequence {x^t, d^t}_{t=0}^∞ as a special case of the Coordinate Gradient Descent method proposed in [1].

3 Convergence Theorem

Lemma 3.1. Consider the problem

    min_{x, y}   F(x, y) = f(x, y) + c P(x, y)                        (7)

where f(x, y) is smooth and P(x, y) is nonsmooth, convex, lower semi-continuous, and separable in x and y. The sequence {x^t, y^t}_{t=0}^∞ produced by the alternating minimization

    x^{t+1} = arg min_x F(x, y^t)                                     (8)
    y^{t+1} = arg min_y F(x^{t+1}, y)                                 (9)

(a) converges to a stationary point of (7) if, in (8) and (9) or their equivalent problems², the objective functions have smooth parts f(x), f(y) that are strictly convex quadratic;
(b) with at least linear convergence rate if, in addition to the assumptions in (a), f(x, y) is quadratic (possibly nonconvex) and P(x, y) is polyhedral.

Proof. Let the smooth parts of the objective functions in (8) and (9), or their equivalent problems, be f(x) and f(y). We can define a special case of the Coordinate Gradient Descent (CGD) method proposed in [1] that yields the same sequence {x^t, y^t}_{t=0}^∞ as the alternating minimization: at each iteration k of the CGD algorithm, use the Hessian matrix H^k_{xx} = ∇²f(x) ≻ 0 for block J = x and H^k_{yy} = ∇²f(y) ≻ 0 for block J = y, with exact line search (which satisfies the Armijo rule)³. By Theorem 1 of [1], the sequence {x^t, y^t}_{t=0}^∞ converges to a stationary point of (7). If we further assume that f(x, y) is quadratic and P(x, y) is polyhedral, then by Theorems 2 and 4 of [1] the convergence rate of {x^t, y^t}_{t=0}^∞ is at least linear.

² By equivalent, we mean that the two problems have the same optimal solution.
³ Note that in [1] the Hessian matrix H^k affects the CGD algorithm only through H^k_{JJ}, so we can always obtain a positive-definite H^k when H^k_{JJ} is positive definite by setting, outside of H^k_{JJ}, the diagonal elements to 1 and the off-diagonal elements to 0.

Lemma 3.1 provides a basis for the convergence of (6), and thus of the CCCP algorithm (2). However, when minimizing over d, the objective in (6) is linear rather than strictly convex quadratic, as required by Lemma 3.1. The next lemma shows that this subproblem nevertheless has an equivalent strictly convex quadratic problem with the same solution.

Lemma 3.2. There exist ϵ_0 > 0 and d* such that the problem

    arg min_{d ∈ R^m}   −c^T d + (ϵ/2) ‖d‖²
    s.t.    Σ_{i=1}^m d_i = 1,  d_i ≥ 0,  i = 1, …, m                 (10)

has the same optimal solution d* for every ϵ ∈ [0, ϵ_0).
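Before the formal proof, a quick numerical check of the lemma (a sketch under the formulation of (10) above, not code from the paper): minimizing −c^T d + (ϵ/2)‖d‖² over the simplex is equivalent to the Euclidean projection of c/ϵ onto the simplex, so for a hypothetical c the regularized solution can be compared with the ϵ = 0 solution, which spreads its mass uniformly over S = arg max_i c_i.

```python
import numpy as np

def project_simplex(u):
    """Euclidean projection of u onto {d : d >= 0, sum(d) = 1}."""
    s = np.sort(u)[::-1]
    css = np.cumsum(s)
    rho = np.max(np.flatnonzero(s - (css - 1.0) / np.arange(1, len(u) + 1) > 0))
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(u - theta, 0.0)

def solve_regularized(c, eps):
    """argmin_d  -c^T d + (eps/2)*||d||^2  over the simplex  =  projection of c/eps."""
    return project_simplex(c / eps)

c = np.array([3.0, 1.0, 3.0, 0.0])            # S = {0, 2}, c_max = 3
S = np.flatnonzero(np.isclose(c, c.max()))
d0 = np.zeros(len(c)); d0[S] = 1.0 / len(S)   # eps = 0 solution: uniform on S

eps0 = len(S) * np.min(np.delete(c.max() - c, S))   # threshold from Lemma 3.2
print("eps_0 =", eps0)
print("eps below threshold:", solve_regularized(c, 0.5 * eps0))   # equals d0
print("eps above threshold:", solve_regularized(c, 4.0 * eps0))   # differs from d0
```

With these numbers the threshold is ϵ_0 = 4: below it the regularized solution stays at (0.5, 0, 0.5, 0), while above it the regularization spreads mass onto the remaining coordinates.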
Proof. Let S = arg max_i (c_i) and c_max = max_i (c_i). For ϵ = 0, an optimal solution d* is obtained by setting d*_k = 1/|S| for k ∈ S and d*_j = 0 for j ∉ S. Let α, β be the Lagrange multipliers of the equality and nonnegativity constraints in (10), respectively. By the KKT conditions, the solution (d*, α*, β*) must satisfy −c + α* e − β* = 0, with β*_k = 0 for k ∈ S, α* = c_max, and β*_j = c_max − c_j for j ∉ S, where e = [1, …, 1]^T ∈ R^m. When ϵ > 0, the KKT conditions for (10) differ only in the stationarity equation ϵ d − c + α e − β = 0, which can be satisfied by setting β'_k = 0 for k ∈ S, α' = α* − ϵ/|S|, and β'_j = β*_j − ϵ/|S| for j ∉ S. If ϵ/|S| < β*_j = c_max − c_j for every j ∉ S, then (d*, α', β') still satisfies the KKT conditions. In other words, d* is still optimal if ϵ < ϵ_0, where ϵ_0 = |S| · min_{j ∉ S} (c_max − c_j) > 0.

When minimizing (6) over d, the problem is of the form (10) with ϵ = 0 and c_i^t = a_i^T x^t + b_i, i = 1, …, m. Lemma 3.2 says that we can always obtain an equivalent strictly convex quadratic problem by choosing a positive ϵ^t < |S| · min_{j ∉ S} (c^t_max − c^t_j). We are now ready to give the main theorem.

Theorem 3.3. The Concave-Convex Procedure (2) converges to a stationary point of (1) at an at least linear rate if the nonsmooth parts of u(x) and v(x) are convex piecewise-linear, the smooth part of u(x) is strictly convex quadratic, and the domain defined by f_i(x) ≤ 0, g_j(x) = 0 is polyhedral.

Proof. Write u(x) − v(x) = f_u(x) + P_u(x) − v(x), where f_u(x) is strictly convex quadratic and P_u(x), v(x) are convex piecewise-linear. We formulate CCCP as an alternating minimization of (6) between x and d. The constraints f_i(x) ≤ 0 and g_j(x) = 0 form a polyhedral domain, so we encode them as the polyhedral, lower semi-continuous indicator function P_dom(x), equal to 0 if f_i(x) ≤ 0 and g_j(x) = 0, and to ∞ otherwise. In the same way, the simplex constraints on d are encoded as a polyhedral, lower semi-continuous function P_dom(d). Then (6) can be expressed as min_{x, d} F(x, d) = f(x, d) + P(x, d), where f(x, d) = f_u(x) − Σ_{i=1}^m d_i (a_i^T x + b_i) is quadratic and P(x, d) = P_u(x) + P_dom(x) + P_dom(d) is polyhedral, lower semi-continuous, and separable in x and d. When alternately minimizing F(x, d), the smooth part of the subproblem min_x F(x, d^t) is strictly convex quadratic, and the subproblem min_d F(x^{t+1}, d) has, by Lemma 3.2, an equivalent strictly convex quadratic problem. Hence the alternating minimization of F(x, d), that is, the CCCP algorithm (2), converges at least linearly to a stationary point of (1) by Lemma 3.1.

To our knowledge, this is the first result on the convergence rate of CCCP for nonsmooth problems. In particular, it shows that the CCCP algorithms used in [9], [11], [12], and [13] for structural SVM with a tighter bound, transductive SVM, ramp-loss SVM, and structural SVM with hidden variables have at least linear convergence rate.

Acknowledgments

This work was supported by the National Science Council and Intel Corporation under Grants NSC I and 101R7501.

References

[1] P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 2009.
[2] A. L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15:915–936, 2003.
[3] L. Wang, X. Shen, and W. Pan. On transductive support vector machines. In J. Verducci, X. Shen, and J. Lafferty, editors, Prediction and Discovery. American Mathematical Society.
[4] B. K. Sriperumbudur, D. A. Torres, and G. R. G. Lanckriet. Sparse eigen methods by d.c. programming. In Proceedings of the 24th Annual International Conference on Machine Learning.
[5] J. Neumann, C. Schnorr, and G. Steidl. Combined SVM-based feature selection and classification.
Machine Learning, 61:129–150.
[6] X.-L. Meng. Discussion on "Optimization transfer using surrogate objective functions". Journal of Computational and Graphical Statistics, 9(1):35–43.
[7] R. Salakhutdinov, S. Roweis, and Z. Ghahramani. On the convergence of bound optimization algorithms. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, pages 509–516.
[8] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 556–562. MIT Press, Cambridge.
[9] C. B. Do, Q. V. Le, C. H. Teo, O. Chapelle, and A. J. Smola. Tighter bounds for structured estimation. In Advances in Neural Information Processing Systems 21. To appear.
[10] T. Pham Dinh and L. T. Hoai An. Convex analysis approach to d.c. programming: Theory, algorithms and applications. Acta Mathematica Vietnamica, 22(1):289–355.
[11] R. Collobert, F. Sinz, J. Weston, and L. Bottou. Large scale transductive SVMs. Journal of Machine Learning Research, 7:1687–1712.
[12] R. Collobert, J. Weston, and L. Bottou. Trading convexity for scalability. In ICML 2006, 23rd International Conference on Machine Learning, Pittsburgh, USA, 2006.
[13] C.-N. J. Yu and T. Joachims. Learning structural SVMs with latent variables. In International Conference on Machine Learning (ICML), 2009.
[14] B. Sriperumbudur and G. Lanckriet. On the convergence of the concave-convex procedure. In Neural Information Processing Systems.
