Online Convex Optimization
E0 370 Statistical Learning Theory                          Lecture 19: Oct 22, 2013
Lecturer: Shivani Agarwal                                   Scribe: Aadirupa

1 Introduction

In this lecture we shall look at a fairly general setting of online convex optimization which, as we shall see, encompasses some of the online learning problems we have seen so far as special cases. A standard offline convex optimization problem involves a convex set $\Omega \subseteq \mathbb{R}^n$ and a fixed convex cost function $c : \Omega \to \mathbb{R}$; the goal is to find a point $x \in \Omega$ that minimizes $c(x)$ over $\Omega$. An online convex optimization problem also involves a convex set $\Omega \subseteq \mathbb{R}^n$, but proceeds in trials: on each trial $t$, the player/algorithm must choose a point $x_t \in \Omega$, after which a convex cost function $c_t : \Omega \to \mathbb{R}$ is revealed by the environment/adversary, and a loss of $c_t(x_t)$ is incurred for the trial; the goal is to minimize the total loss incurred over a sequence of $T$ trials, relative to the best single point in $\Omega$ in hindsight.

    Online Convex Optimization
    Convex set $\Omega \subseteq \mathbb{R}^n$
    For $t = 1, \ldots, T$:
        Play $x_t \in \Omega$
        Receive cost function $c_t : \Omega \to \mathbb{R}$
        Incur loss $c_t(x_t)$

We will denote the total loss incurred by an algorithm $A$ on $T$ trials as
$$ L_T[A] = \sum_{t=1}^T c_t(x_t), $$
and the total loss of a fixed point $x \in \Omega$ as
$$ L_T[x] = \sum_{t=1}^T c_t(x); $$
the regret of $A$ w.r.t. the best fixed point in $\Omega$ is then simply $L_T[A] - \inf_{x \in \Omega} L_T[x]$.

Let us consider which of the online learning problems we have seen earlier can be viewed as instances of online convex optimization (OCO):

1. Online linear regression (GD analysis). Here one selects a weight vector $w_t \in \mathbb{R}^n$ and incurs a loss given by $c_t(w_t) = \ell(y_t, w_t^\top x_t)$, which is assumed to be convex in its second argument and is therefore convex in $w_t$; however, the total loss is compared only against the best fixed weight vector in $\{u \in \mathbb{R}^n : \|u\|_2 \le U\}$ for some $U > 0$. Call this latter set $\Omega$. Clearly, if the weight vectors $w_t$ are also selected to be in $\Omega$, e.g. by Euclidean projection onto $\Omega$ (which in this case amounts to a simple normalization), then this becomes an instance of OCO.

2. Online linear regression (EG analysis). Here one selects $w_t \in \Delta_n$ (where $\Delta_n$ denotes the probability simplex in $\mathbb{R}^n$) and incurs a loss given by $c_t(w_t) = \ell(y_t, w_t^\top x_t)$, which is assumed to be convex in its second argument and is therefore convex in $w_t$; the total loss is also compared against the best fixed weight vector in $\Delta_n$. Thus this can be viewed as an instance of OCO with $\Omega = \Delta_n$.
3. Online prediction with expert advice. Here one selects a probability vector $\hat{p}_t \in \Delta_n$ and incurs a loss $c_t(\hat{p}_t) = \ell(y_t, \hat{p}_t^\top \xi_t)$, which is assumed to be convex in its second argument and is therefore convex in $\hat{p}_t$. However, the total loss is compared only against the best vector in $\{e_i \in \Delta_n : i \in [n]\}$, where $e_i$ denotes a unit vector with 1 in the $i$-th position and 0 elsewhere. In general, the smallest loss over this set is not the same as the smallest loss over $\Delta_n$, and therefore this is not an instance of OCO.

4. Online decision/allocation among experts. Here one selects a probability vector $\hat{p}_t \in \Delta_n$ and incurs a linear loss $c_t(\hat{p}_t) = \hat{p}_t^\top l_t$, which is clearly convex in $\hat{p}_t$. The total loss is again compared against the best vector in $\{e_i \in \Delta_n : i \in [n]\}$ as above, but in this case comparison against this set is actually equivalent to comparison against $\Delta_n$, since
$$ \inf_{\hat{p} \in \Delta_n} L_T[\hat{p}] = \inf_{\hat{p} \in \Delta_n} \sum_{t=1}^T \hat{p}^\top l_t = \inf_{\hat{p} \in \Delta_n} \hat{p}^\top \bigg( \sum_{t=1}^T l_t \bigg) = \min_{i \in [n]} e_i^\top \bigg( \sum_{t=1}^T l_t \bigg) = \min_{i \in [n]} \sum_{t=1}^T e_i^\top l_t, $$
a linear function over the simplex being minimized at a vertex. Thus online allocation with a linear loss as above can be viewed as an instance of OCO with $\Omega = \Delta_n$.

2 Online Gradient Descent Algorithm

The online gradient descent (OGD) algorithm [7] generalizes the basic gradient descent algorithm used for standard offline optimization problems, and can be described as follows:

    Algorithm Online Gradient Descent (OGD)
    Parameter: $\eta > 0$
    Initialize: $x_1 \in \Omega$
    For $t = 1, \ldots, T$:
        Play $x_t \in \Omega$
        Receive cost function $c_t : \Omega \to \mathbb{R}$
        Incur loss $c_t(x_t)$
        Update:
            $\tilde{x}_{t+1} = x_t - \eta \nabla c_t(x_t)$
            $x_{t+1} = P_\Omega(\tilde{x}_{t+1})$, where $P_\Omega(x) = \operatorname{argmin}_{x' \in \Omega} \|x - x'\|_2$ (Euclidean projection of $x$ onto $\Omega$)

Theorem 2.1 (Zinkevich, 2003). Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set with $\|x - x_1\|_2 \le D$ $\forall x \in \Omega$. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of convex cost functions with bounded (sub)gradients, $\|\nabla c_t(x)\|_2 \le G$ $\forall t$, $\forall x \in \Omega$. Then
$$ L_T[\mathrm{OGD}_\eta] - \inf_{x \in \Omega} L_T[x] \le \frac{1}{2} \bigg( \frac{D^2}{\eta} + \eta G^2 T \bigg). $$
Moreover, setting $\eta = \frac{D}{G\sqrt{T}}$ to minimize the right-hand side of the above bound (assuming $D$, $G$ and $T$, or upper bounds on them, are known in advance), we have
$$ L_T[\mathrm{OGD}_\eta] - \inf_{x \in \Omega} L_T[x] \le DG\sqrt{T}. $$

Notes. Before proving the above theorem, we note the following:

1. The analysis of the GD algorithm for online regression can be viewed as a special case of the above, with $D = U$ and $G = ZR$ (see Lecture 17 notes).

2. With $\eta$ as above (which depends on knowledge of $D$, $G$ and $T$), one gets an $O(\sqrt{T})$ regret bound. Such an $O(\sqrt{T})$ regret bound (with worse constants) can also be obtained by using a time-varying parameter $\eta_t \propto 1/\sqrt{t}$; in fact this was the version contained in the original paper of Zinkevich [7].
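To make the OGD updates concrete, here is a minimal Python sketch (an illustration, not part of the original notes): it assumes $\Omega$ is a Euclidean ball of radius $U$, for which the projection is the simple rescaling mentioned in Example 1, and it treats each cost function as supplied via its gradient. The function names and the squared-loss usage example are assumptions made for the sketch.

```python
import numpy as np

def project_l2_ball(x, radius):
    """Euclidean projection P_Omega onto {u : ||u||_2 <= radius}:
    rescale x onto the sphere if it lies outside the ball."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def ogd(grad_fns, x1, eta, project):
    """Online gradient descent: play x_t, observe c_t (via its
    gradient at x_t), then take a projected gradient step."""
    x = np.array(x1, dtype=float)
    plays = []
    for grad in grad_fns:                # one cost function per trial
        plays.append(x.copy())           # play x_t
        x = project(x - eta * grad(x))   # x_{t+1} = P_Omega(x_t - eta * grad c_t(x_t))
    return plays

# Illustrative run: online linear regression with squared loss
# c_t(w) = (y_t - w . x_t)^2, whose gradient is -2 (y_t - w . x_t) x_t.
rng = np.random.default_rng(0)
T, n, U = 500, 5, 1.0
data = [(rng.normal(size=n), rng.normal()) for _ in range(T)]
grads = [lambda w, x=x, y=y: -2.0 * (y - w @ x) * x for (x, y) in data]
plays = ogd(grads, np.zeros(n), eta=0.05,  # in theory, eta = D/(G sqrt(T))
            project=lambda v: project_l2_ball(v, U))
total_loss = sum((y - w @ x) ** 2 for w, (x, y) in zip(plays, data))
```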
Proof of Theorem 2.1. The proof is similar to that of the regret bound we obtained for the GD algorithm for online regression in Lecture 17. Let $x \in \Omega$. We have $\forall t \in [T]$:
$$ c_t(x_t) - c_t(x) \le \nabla c_t(x_t)^\top (x_t - x) \qquad \text{by convexity of } c_t $$
$$ = \frac{1}{\eta} (x_t - \tilde{x}_{t+1})^\top (x_t - x) $$
$$ = \frac{1}{2\eta} \Big( \|x - x_t\|_2^2 - \|x - \tilde{x}_{t+1}\|_2^2 + \|x_t - \tilde{x}_{t+1}\|_2^2 \Big) $$
$$ \le \frac{1}{2\eta} \Big( \|x - x_t\|_2^2 - \|x - x_{t+1}\|_2^2 \Big) + \frac{\eta}{2} \|\nabla c_t(x_t)\|_2^2, $$
where the last step uses $\|x - x_{t+1}\|_2 \le \|x - \tilde{x}_{t+1}\|_2$ (since $x_{t+1}$ is the Euclidean projection of $\tilde{x}_{t+1}$ onto the convex set $\Omega$ containing $x$) and $\|x_t - \tilde{x}_{t+1}\|_2 = \eta \|\nabla c_t(x_t)\|_2$. Summing over $t = 1, \ldots, T$, we thus get
$$ \sum_{t=1}^T \big( c_t(x_t) - c_t(x) \big) \le \frac{1}{2\eta} \Big( \|x - x_1\|_2^2 - \|x - x_{T+1}\|_2^2 \Big) + \frac{\eta}{2} G^2 T \le \frac{1}{2} \bigg( \frac{D^2}{\eta} + \eta G^2 T \bigg). $$
It is easy to see that the right-hand side is minimized at $\eta = \frac{D}{G\sqrt{T}}$; substituting this back in the above inequality gives the desired result.

2.1 Logarithmic regret bound for OGD when cost functions are strongly convex (using time-varying parameter $\eta_t$)

The above result gives an $O(\sqrt{T})$ regret bound for the OGD algorithm in general. When the cost functions $c_t$ are known to be $\alpha$-strongly convex, one can in fact obtain an $O(\ln T)$ regret bound with a slight variant of the OGD algorithm that uses a suitable time-varying parameter $\eta_t$ [3]:

Theorem 2.2 (Hazan et al., 2007). Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set with $\|x - x_1\|_2 \le D$ $\forall x \in \Omega$. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of $\alpha$-strongly convex cost functions with bounded (sub)gradients, $\|\nabla c_t(x)\|_2 \le G$ $\forall t$, $\forall x \in \Omega$. Then the OGD algorithm with time-varying parameter $\eta_t = \frac{1}{\alpha t}$ satisfies
$$ L_T[\mathrm{OGD}_{\eta_t = 1/(\alpha t)}] - \inf_{x \in \Omega} L_T[x] \le \frac{G^2}{2\alpha} (\ln T + 1). $$

Proof. Let $x \in \Omega$. Proceeding similarly as before, we have $\forall t \in [T]$:
$$ c_t(x_t) - c_t(x) \le \nabla c_t(x_t)^\top (x_t - x) - \frac{\alpha}{2} \|x_t - x\|_2^2 \qquad \text{by } \alpha\text{-strong convexity of } c_t $$
$$ = \frac{1}{\eta_t} (x_t - \tilde{x}_{t+1})^\top (x_t - x) - \frac{\alpha}{2} \|x_t - x\|_2^2 $$
$$ = \frac{1}{2\eta_t} \Big( \|x - x_t\|_2^2 - \|x - \tilde{x}_{t+1}\|_2^2 + \|x_t - \tilde{x}_{t+1}\|_2^2 \Big) - \frac{\alpha}{2} \|x_t - x\|_2^2 $$
$$ \le \frac{1}{2\eta_t} \Big( \|x - x_t\|_2^2 - \|x - x_{t+1}\|_2^2 \Big) + \frac{\eta_t}{2} \|\nabla c_t(x_t)\|_2^2 - \frac{\alpha}{2} \|x_t - x\|_2^2. $$
Summing over $t = 1, \ldots, T$ (with the convention $1/\eta_0 = 0$) then gives
$$ \sum_{t=1}^T \big( c_t(x_t) - c_t(x) \big) \le \frac{1}{2} \sum_{t=1}^T \bigg( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha \bigg) \|x - x_t\|_2^2 + \frac{G^2}{2} \sum_{t=1}^T \eta_t = \frac{G^2}{2\alpha} \sum_{t=1}^T \frac{1}{t} \le \frac{G^2}{2\alpha} (\ln T + 1), $$
where the middle equality uses the fact that the choice $\eta_t = \frac{1}{\alpha t}$ makes $\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha = 0$ for every $t$.
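The strongly convex variant changes only the step-size schedule. Here is a minimal sketch (again an illustration under the same conventions as the earlier snippet; `project` is any projection onto $\Omega$):

```python
import numpy as np

def ogd_time_varying(grad_fns, x1, alpha, project):
    """OGD with eta_t = 1/(alpha * t), the schedule of Theorem 2.2;
    for alpha-strongly convex costs this gives O(ln T) regret."""
    x = np.array(x1, dtype=float)
    plays = []
    for t, grad in enumerate(grad_fns, start=1):
        plays.append(x.copy())                          # play x_t
        x = project(x - (1.0 / (alpha * t)) * grad(x))  # shrinking step size
    return plays
```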
3 Online Mirror Descent Algorithm

The online mirror descent (OMD) algorithm, described for example in [6], generalizes the basic mirror descent algorithm used for standard offline optimization problems [4, 5, 1], much as OGD generalizes basic gradient descent. The OMD framework allows one to obtain $O(\sqrt{T})$ regret bounds for cost functions with subgradients bounded not only in the Euclidean norm (as assumed by the OGD analysis), but in any suitable norm.

Let $\tilde{\Omega} \subseteq \mathbb{R}^n$ be a convex set containing $\Omega$ (i.e. $\Omega \subseteq \tilde{\Omega}$), and let $F : \tilde{\Omega} \to \mathbb{R}$ be a strictly convex and differentiable function. Let $B_F : \tilde{\Omega} \times \tilde{\Omega} \to \mathbb{R}$ denote the Bregman divergence associated with $F$; recall that for all $u, v \in \tilde{\Omega}$, this is defined as
$$ B_F(u, v) = F(u) - F(v) - (u - v)^\top \nabla F(v). $$
Then the OMD algorithm using the set $\tilde{\Omega}$ and function $F$ can be described as follows:

    Algorithm Online Mirror Descent (OMD)
    Parameters: $\eta > 0$; convex set $\tilde{\Omega} \supseteq \Omega$; strictly convex, differentiable function $F : \tilde{\Omega} \to \mathbb{R}$
    Initialize: $x_1 = \operatorname{argmin}_{x \in \Omega} F(x)$
    For $t = 1, \ldots, T$:
        Play $x_t \in \Omega$
        Receive cost function $c_t : \Omega \to \mathbb{R}$
        Incur loss $c_t(x_t)$
        Update:
            $\nabla F(\tilde{x}_{t+1}) = \nabla F(x_t) - \eta \nabla c_t(x_t)$   (assume this yields $\tilde{x}_{t+1} \in \tilde{\Omega}$)
            $x_{t+1} = P^F_\Omega(\tilde{x}_{t+1})$, where $P^F_\Omega(x) = \operatorname{argmin}_{x' \in \Omega} B_F(x', x)$ (Bregman projection of $x$ onto $\Omega$)

Below we will obtain a regret bound for OMD (with appropriate choice of the function $F$) for cost functions with subgradients bounded in an arbitrary norm. Before we do so, let us recall the notion of a dual norm. Specifically, let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. Then the corresponding dual norm is defined as
$$ \|u\|_* = \sup_{x \in \mathbb{R}^n : \|x\| \le 1} x^\top u. $$

Theorem 3.1. Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of convex cost functions with bounded subgradients, $\|\nabla c_t(x)\| \le G$ $\forall t$, $\forall x \in \Omega$, where $\|\cdot\|$ is an arbitrary norm. Let $\tilde{\Omega} \supseteq \Omega$ be a convex set and $F : \tilde{\Omega} \to \mathbb{R}$ be a strictly convex, differentiable function such that (a) $F(x) - F(x_1) \le D^2$ $\forall x \in \Omega$; and (b) the restriction of $F$ to $\Omega$ is $\alpha$-strongly convex w.r.t. $\|\cdot\|_*$, the dual norm of $\|\cdot\|$. Then
$$ L_T[\mathrm{OMD}_\eta] - \inf_{x \in \Omega} L_T[x] \le \frac{1}{\eta} \bigg( D^2 + \frac{\eta^2 G^2 T}{2\alpha} \bigg). $$
Moreover, setting $\eta = \frac{D}{G} \sqrt{\frac{2\alpha}{T}}$ to minimize the right-hand side of the above bound (assuming $D$, $G$ and $T$, or upper bounds on them, are known in advance), we have
$$ L_T[\mathrm{OMD}_\eta] - \inf_{x \in \Omega} L_T[x] \le DG \sqrt{\frac{2T}{\alpha}}. $$
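Before turning to the proof, it may help to recall two standard dual-norm pairs (standard facts, not spelled out in the notes above): the Euclidean norm is self-dual, while the $\ell_1$ and $\ell_\infty$ norms are dual to each other:
$$ (\|\cdot\|_2)_* = \|\cdot\|_2, \qquad (\|\cdot\|_1)_* = \|\cdot\|_\infty, \qquad (\|\cdot\|_\infty)_* = \|\cdot\|_1. $$
For instance, $\sup_{\|x\|_\infty \le 1} x^\top u$ is attained at $x_i = \operatorname{sign}(u_i)$, giving $\sum_i |u_i| = \|u\|_1$. These are exactly the pairings that appear in the examples at the end of this section.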
Proof. Let $x \in \Omega$. Proceeding as before, we have $\forall t \in [T]$:
$$ c_t(x_t) - c_t(x) \le \nabla c_t(x_t)^\top (x_t - x) \qquad \text{by convexity of } c_t $$
$$ = \frac{1}{\eta} \big( \nabla F(x_t) - \nabla F(\tilde{x}_{t+1}) \big)^\top (x_t - x) $$
$$ = \frac{1}{\eta} \Big( B_F(x, x_t) - B_F(x, \tilde{x}_{t+1}) + B_F(x_t, \tilde{x}_{t+1}) \Big) $$
$$ \le \frac{1}{\eta} \Big( B_F(x, x_t) - B_F(x, x_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1}) + B_F(x_t, \tilde{x}_{t+1}) \Big), $$
where the last step uses the generalized Pythagorean inequality for the Bregman projection $x_{t+1} = P^F_\Omega(\tilde{x}_{t+1})$, namely $B_F(x, \tilde{x}_{t+1}) \ge B_F(x, x_{t+1}) + B_F(x_{t+1}, \tilde{x}_{t+1})$ for $x \in \Omega$. Summing over $t = 1, \ldots, T$, we thus get
$$ \sum_{t=1}^T \big( c_t(x_t) - c_t(x) \big) \le \frac{1}{\eta} \Big( B_F(x, x_1) - B_F(x, x_{T+1}) \Big) + \frac{1}{\eta} \sum_{t=1}^T \Big( B_F(x_t, \tilde{x}_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1}) \Big). $$
Now, since $x_1 = \operatorname{argmin}_{x' \in \Omega} F(x')$, first-order optimality gives $(x - x_1)^\top \nabla F(x_1) \ge 0$ for all $x \in \Omega$, and therefore
$$ B_F(x, x_1) \le F(x) - F(x_1) \le D^2. $$
Also, we have
$$ B_F(x_t, \tilde{x}_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1}) = F(x_t) - F(x_{t+1}) - (x_t - x_{t+1})^\top \nabla F(\tilde{x}_{t+1}) $$
$$ \le (x_t - x_{t+1})^\top \nabla F(x_t) - \frac{\alpha}{2} \|x_t - x_{t+1}\|_*^2 - (x_t - x_{t+1})^\top \nabla F(\tilde{x}_{t+1}) \qquad \text{by } \alpha\text{-strong convexity of } F \text{ on } \Omega $$
$$ = \eta \nabla c_t(x_t)^\top (x_t - x_{t+1}) - \frac{\alpha}{2} \|x_t - x_{t+1}\|_*^2 $$
$$ \le \eta G \|x_t - x_{t+1}\|_* - \frac{\alpha}{2} \|x_t - x_{t+1}\|_*^2 \qquad \text{by H\"older's inequality} $$
$$ \le \frac{\eta^2 G^2}{2\alpha}, \qquad \text{since } \frac{\eta^2 G^2}{2\alpha} + \frac{\alpha}{2} \|x_t - x_{t+1}\|_*^2 \ge \eta G \|x_t - x_{t+1}\|_*. $$
Thus we get
$$ \sum_{t=1}^T \big( c_t(x_t) - c_t(x) \big) \le \frac{1}{\eta} \bigg( D^2 + \frac{\eta^2 G^2 T}{2\alpha} \bigg). $$
It is easy to verify that the right-hand side is minimized at $\eta = \frac{D}{G} \sqrt{\frac{2\alpha}{T}}$; substituting this back in the above inequality gives the desired result.

Examples. Examples of OMD algorithms include the following:

1. OGD. Here $\tilde{\Omega} = \mathbb{R}^n$ and $F(x) = \frac{1}{2} \|x\|_2^2$, so that the Bregman projection reduces to the Euclidean projection. Note that $F$ is 1-strongly convex (w.r.t. $\|\cdot\|_2$, which is its own dual) on $\mathbb{R}^n$ and therefore on any $\Omega \subseteq \mathbb{R}^n$.

2. Exponential weights. Here $\Omega = a\Delta_n$ for some $a > 0$; $\tilde{\Omega} = \mathbb{R}^n_{++}$; and $F(x) = \sum_{i=1}^n x_i \ln x_i - \sum_{i=1}^n x_i$. Note that $F$ is strictly convex on $\tilde{\Omega}$ and 1-strongly convex (w.r.t. $\|\cdot\|_1$) on $\Omega = \Delta_n$.
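To see Example 2 in action, here is a minimal Python sketch (an illustration, not from the notes) of OMD with the negative-entropy $F$ on the simplex (the case $a = 1$): the mirror map is $\nabla F(x) = \ln x$ componentwise, so the update is multiplicative, and the Bregman (KL) projection onto $\Delta_n$ reduces to normalization.

```python
import numpy as np

def omd_neg_entropy(grad_fns, n, eta):
    """OMD on the probability simplex with F(x) = sum_i x_i ln x_i - sum_i x_i.
    grad F(x) = ln x, so the mirror step ln x~ = ln x_t - eta * g gives the
    multiplicative update x~ = x_t * exp(-eta * g); the Bregman (KL)
    projection back onto the simplex is a normalization."""
    x = np.full(n, 1.0 / n)  # x_1 = argmin of F over the simplex is uniform
    plays = []
    for grad in grad_fns:
        plays.append(x.copy())               # play x_t
        x = x * np.exp(-eta * grad(x))       # inverse mirror map applied to the step
        x = x / x.sum()                      # KL projection onto the simplex
    return plays

# With linear losses c_t(p) = p . l_t (so the gradient is just l_t), this is
# the exponential-weights update for online allocation among n experts.
losses = np.random.default_rng(1).uniform(size=(100, 3))
plays = omd_neg_entropy([lambda p, l=l: l for l in losses], n=3, eta=0.1)
```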
Exercise. Suppose the cost functions $c_t$ are all $\beta$-strongly convex for some $\beta > 0$ and have subgradients bounded in some arbitrary norm $\|\cdot\|$ as above. Can you obtain a logarithmic ($O(\ln T)$) regret bound for the OMD algorithm in this case, as we did for the OGD algorithm? Why or why not?

More details on online convex optimization, including missing details in some of the above proofs, can be found for example in [2].

4 Next Lecture

In the next lecture, we will consider how algorithms for online supervised learning problems such as online regression and classification can be used in a batch setting, and will see how regret bounds for online learning algorithms can be converted to generalization error bounds for the resulting batch algorithms.

References

[1] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167-175, 2003.

[2] Sébastien Bubeck. Introduction to online optimization. Lecture notes, Princeton University, 2011.

[3] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.

[4] A. Nemirovski. Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody, 15, 1979.

[5] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, 1983.

[6] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University of Jerusalem, 2007.

[7] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), 2003.