Online Convex Optimization


 Mark Parker
 2 years ago
 Views:
Transcription
1 E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting of online convex optimization which, as we shall see, encompasses some of the online learning problems we have seen so far as special cases. A standard offline convex optimization problem involves a convex set Ω R n and a fixed convex cost function c : Ω R; the goal is to find a point x Ω that minimizes cx over Ω. An online convex optimization problem also involves a convex set Ω R n, but proceeds in trials: at each trial t, the player/algorithm must choose a point x t Ω, after which a convex cost function c t : Ω R is revealed by the environment/adversary, and a loss of c t x t is incurred for the trial; the goal is to minimize the total loss incurred over a sequence of trials, relative to the best single point in Ω in hindsight. Online Convex Optimization Convex set Ω R n For t 1,..., : Play x t Ω Receive cost function c t : Ω R Incur loss c t x t We will denote the total loss incurred by an algorithm A on trials as and the total loss of a fixed point x Ω as L [A] c t x t, L [x] c t x ; the regret of A w.r.t. the best fixed point in Ω is then simply L [A] inf x Ω L [x]. Let us consider which of the online learning problems we have seen earlier can be viewed as instances of online convex optimization OCO: 1. Online linear regression GD analysis. Here one selects a weight vector w t R n, and incurs a loss given by c t w t l y tw t x t, which is assumed to be convex in its argument, and is therefore convex in w t ; however the total loss is compared against the best fixed weight vector in only { u R n : u 2 U } for some U > 0. Call this latter set Ω. Clearly, if the weight vectors w t are also selected to be in Ω, e.g. by Euclidean projection onto Ω which in this case amounts to a simple normalization, then this becomes an instance of OCO. 2. Online linear regression EG analysis. Here one selects w t n, and incurs a loss given by c t w t l y tw t x t, which is assumed to be convex in its argument, and is therefore convex in w t ; the total loss is also compared against the best fixed weight vector in n. hus this can be viewed as an instance of OCO with Ω n. 1
2 2 Online Convex Optimization 3. Online prediction from expert advice. Here one selects a probability vector ˆp t n, and incurs a loss c t ˆp t l y tˆp t ξ t, which is assumed to be convex in its argument, and is therefore convex in ˆp t. However the total loss is compared only against the best vector in { e i n : i [n] }, where e i denotes a unit vector with 1 in the ith position and 0 elsewhere. In general, the smallest loss in this set is not the same as the smallest loss in n, and therefore this is not an instance of OCO. 4. Online decision/allocation among experts. Here one selects a probability vector ˆp t n, and incurs a linear loss c t ˆp t ˆp t l t, which is clearly convex in ˆp t. he total loss is again compared against the best vector in { e i n : i [n] } as above, but in this case, comparison against this set is actually equivalent to comparison against n, since inf L [ˆp] ˆp n inf ˆp n ˆp l t inf ˆp ˆp n l t min i [n] e i l t min e i l t. i [n] hus online allocation with a linear loss as above can be viewed as an instance of OCO with Ω n. 2 Online Gradient Descent Algorithm he online gradient descent OGD algorithm [7] generalizes the basic gradient descent algorithm used for standard offline optimization problems, and can be described as follows: Algorithm Online Gradient Descent OGD Parameter: η > 0 Initialize: x 1 Ω For t 1,..., : Play x t Ω Receive cost function c t : Ω R Incur loss c t x t Update: x t+1 x t η c t x t x t+1 P Ω x t+1, where P Ω x argmin x x 2 Euclidean projection of x onto Ω x Ω heorem 2.1 Zinkevich, Let Ω R n be a closed, nonempty, convex set with x x 1 2 D x Ω. Let c t : Ω R t 1, 2,... be any sequence of convex cost functions with bounded subgradients, c t x 2 G t, x Ω. hen D G [ ] L OGDη inf L [x] 1 D 2 x Ω 2 η + η G2. Moreover, setting η upper bounds on them, are known in advance, we have to minimize the righthand side of the above bound assuming D, G and, or L [ OGDη ] inf x Ω L [x] DG. Notes. Before proving the above theorem, we note the following: 1. he analysis of the GD algorithm for online regression can be viewed as a special case of the above, with D U and G ZR 2 see Lecture 17 notes. 2. With η as above which depends on knowledge of D, G and, one gets an O regret bound. Such an O regret bound with worse constants can also be obtained by using a timevarying parameter η t 1/ t t in fact this was the version contained in the original paper of Zinkevich [7].
3 Online Convex Optimization 3 Proof of heorem 2.1. he proof is similar to that of the regret bound we obtained for the GD algorithm for online regression in Lecture 17. Let x Ω. We have t [ ]: ct x t c t x c t x t x t x, by convexity of c t x t x t+1 x t x Summing over t 1..., we thus get η x x t 2 2 x x t x t x t x x t 2 2 x x t x t x t x x t 2 2 x x t η 2 c t x t 2 2 ct x t c t x 1 x x x x η 2 G 2 1 D 2 + η 2 G 2. It is easy to see that the righthand side is minimized at η inequality gives the desired result. D G ; substituting this back in the above 2.1 Logarithmic regret bound for OGD when cost functions are strongly convex using timevarying parameter η t he above result gives an O regret bound for the OGD algorithm in general. When the cost functions c t are known to be αstrongly convex, one can in fact obtain an Oln regret bound with a slight variant of the OGD algorithm that uses a suitable timevarying parameter η t [3]: heorem 2.2 Hazan et al, Let Ω R n be a closed, nonempty, convex set with x x 1 2 D x Ω. Let c t : Ω R t 1, 2,... be any sequence of αstrongly convex cost functions with bounded subgradients, c t x t 2 G t, x Ω. hen the OGD algorithm with timevarying parameter η t 1 αt satisfies L [OGD η t 1 αt ] inf L [x] G2 ln + 1. x Ω Proof. Let x Ω. Proceeding similarly as before, we have t [ ]: ct x t c t x c t x t x t x α 2 xt x 2 2, by αstrong convexity of c t x t x t+1 x t x α η t 2 xt x x x t 2 2 x x t x t x t αη t x t x 2 2 t 1 x x t 2 2 x x t x t x t αη t x t x 2 2 t 1 x x t 2 2 x x t η 2 t c t x t 2 2 αη t x t x 2 2. t Summing over t 1... then gives ct x t c t x G2 2 G2 η t α x x η ln t2 1 1 α x x t 2 2 η t η t 1
4 4 Online Convex Optimization 3 Online Mirror Descent Algorithm he online mirror descent OMD algorithm, described for example in [6], generalizes the basic mirror descent algorithm used for standard offline optimization problems [4, 5, 1] much as OGD generalizes basic gradient descent. he OMD framework allows one to obtain O regret bounds for cost functions with bounded subgradients not only in Euclidean norm assumed by the OGD analysis, but in any suitable norm. Let Ω R n be a convex set containing Ω i.e. Ω Ω, and let F : Ω R be a strictly convex and differentiable function. Let B F : Ω Ω R denote the Bregman divergence associated with F ; recall that for all u, v Ω, this is defined as B F u, v F u F v u v F v. hen the OMD algorithm using the set Ω and function F can be described as follows: Algorithm Online Mirror Descent OMD Parameters: η > 0; convex set Ω Ω; strictly convex, differentiable function F : Ω R Initialize: x 1 argmin F x x Ω For t 1,..., : Play x t Ω Receive cost function c t : Ω R Incur loss c t x t Update: F x t+1 F x t η c t x t x t+1 P F Ω xt+1, where P F Ω Assume this yields x t+1 Ω x argmin B F x, x Bregman projection of x onto Ω x Ω Below we will obtain a regret bound for OMD with appropriate choice of the function F for cost functions with subgradients bounded in an arbitrary norm. Before we do so, let us recall the notion of a dual norm. Specifically, let be any norm defined on a closed convex set Ω R n. hen the corresponding dual norm is defined as u sup x Ω: x 1 x u 2 sup x u 1 2 x 2. x Ω heorem 3.1. Let Ω R n be a closed, nonempty, convex set. Let c t : Ω R t 1, 2,... be any sequence of convex cost functions with bounded subgradients c t x t G t, x Ω where is an arbitrary norm. Let Ω Ω be a convex set and F : Ω R be a strictly convex function such that a F x F x 1 D 2 x Ω; and b the restriction of F to Ω is αstrongly convex w.r.t., the dual norm of. hen [ ] L OMDη inf L [x] 1 D 2 + η2 G 2. x Ω η Moreover, setting η D G α, or upper bounds on them, are known in advance, we have to minimize the righthand side of the above bound assuming D, G, and L [ OMDη ] inf x Ω L [x] DG 2 α.
5 Online Convex Optimization 5 Proof. Let x Ω. Proceeding as before, we have t [ ]: ct x t c t x c t x t x t x, by convexity of c t F x t F x t+1 x t x η B F x, x t B F x, x t+1 + B F x t, x t+1 1 η Summing over t 1..., we thus get 1 η B F x, x t B F x, x t+1 B F x t+1, x t+1 + B F x t, x t+1. ct x t c t x 1 B F x, x 1 B F x, x η η Now, it can be shown that Also, we have hus we get B F x t, x t+1 B F x t+1, x t+1 B F x, x 1 F x F x 1 D 2. F x t F x t+1 F x t+1 x t x t+1 B F x t, x t+1 B F x t+1, x t+1. F x t x t x t+1 α 2 xt x t+1 2 F x t+1.x t x t+1, by αstrong convexity of F on Ω η c t x t x t x t+1 α 2 xt x t+1 2 ηg x t x t+1 α 2 xt x t+1 2, by Hölders inequality η 2 G 2, η2 G 2. since η2 G 2 + α 2 xt x t+1 2 ηg x t x t+1 ct x t c t x 1 η It is easy to verify that the righthand side is minimized at η D G inequality gives the desired result. Examples. Examples of OMD algorithms include the following: D 2 + η2 G 2. ηg α 2 xt x t ; substituting this back in the above 1. OGD. Here Ω R n and F x 1 2 x 2 2. Note that F is 1strongly convex on R n and therefore on any Ω R n. 2. Exponential weights. Here Ω a n for some a > 0; Ω R n ++; and F x n i1 x i ln x n i1 x i. Note that F is strictly convex on Ω and 1strongly convex on Ω n.
6 6 Online Convex Optimization Exercise. Suppose the cost functions c t are all βstrongly convex for some β > 0 and have subgradients bounded in some arbitrary norm as above. Can you obtain a logarithmic Oln regret bound for the OMD algorithm in this case, as we did for the OGD algorithm? Why or why not? More details on online convex optimization, including missing details in some of the above proofs, can be found for example in [2]. 4 Next Lecture In the next lecture, we will consider how algorithms for online supervised learning problems such as online regression and classification can be used in a batch setting, and will see how regret bounds for online learning algorithms can be converted to generalization error bounds for the resulting batch algorithms. References [1] Amir Beck and Marc eboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 313: , [2] Sébastion Bubeck. Introduction to online optimization. Lecture Notes, Princeton University, [3] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69: , [4] A. Nemirovski. Efficient methods for largescale convex optimization problems. Ekonomika i Matematicheskie Metody, 15, [5] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, [6] Shai ShalevShwartz. Online Learning: heory, Algorithms, and Applications. PhD thesis, he Hebrew University of Jerusalem, [7] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning ICML, 2003.
Adaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More information1 Introduction. 2 Prediction with Expert Advice. Online Learning 9.520 Lecture 09
1 Introduction Most of the course is concerned with the batch learning problem. In this lecture, however, we look at a different model, called online. Let us first compare and contrast the two. In batch
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationTable 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.
Online PassiveAggressive Algorithms Koby Crammer Ofer Dekel Shai ShalevShwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationLecture Notes on Online Learning DRAFT
Lecture Notes on Online Learning DRAFT Alexander Rakhlin These lecture notes contain material presented in the Statistical Learning Theory course at UC Berkeley, Spring 08. Various parts of these notes
More informationIntroduction to Online Optimization
Princeton University  Department of Operations Research and Financial Engineering Introduction to Online Optimization Sébastien Bubeck December 14, 2011 1 Contents Chapter 1. Introduction 5 1.1. Statistical
More informationOptimal Strategies and Minimax Lower Bounds for Online Convex Games
Optimal Strategies and Minimax Lower Bounds for Online Convex Games Jacob Abernethy UC Berkeley jake@csberkeleyedu Alexander Rakhlin UC Berkeley rakhlin@csberkeleyedu Abstract A number of learning problems
More informationOnline Learning. 1 Online Learning and Mistake Bounds for Finite Hypothesis Classes
Advanced Course in Machine Learning Spring 2011 Online Learning Lecturer: Shai ShalevShwartz Scribe: Shai ShalevShwartz In this lecture we describe a different model of learning which is called online
More informationMind the Duality Gap: Logarithmic regret algorithms for online optimization
Mind the Duality Gap: Logarithmic regret algorithms for online optimization Sham M. Kakade Toyota Technological Institute at Chicago sham@ttic.org Shai ShalevShartz Toyota Technological Institute at
More informationWeek 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / 2184189 CesaBianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
More informationOnline Learning of Optimal Strategies in Unknown Environments
1 Online Learning of Optimal Strategies in Unknown Environments Santiago Paternain and Alejandro Ribeiro Abstract Define an environment as a set of convex constraint functions that vary arbitrarily over
More informationThe pnorm generalization of the LMS algorithm for adaptive filtering
The pnorm generalization of the LMS algorithm for adaptive filtering Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology
More information1 Portfolio Selection
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # Scribe: Nadia Heninger April 8, 008 Portfolio Selection Last time we discussed our model of the stock market N stocks start on day with
More informationFurther Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1
Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing
More informationGI01/M055 Supervised Learning Proximal Methods
GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators
More information10. Proximal point method
L. Vandenberghe EE236C Spring 201314) 10. Proximal point method proximal point method augmented Lagrangian method MoreauYosida smoothing 101 Proximal point method a conceptual algorithm for minimizing
More informationStrongly Adaptive Online Learning
Amit Daniely Alon Gonen Shai ShalevShwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance
More informationBig Data  Lecture 1 Optimization reminders
Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data  Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
More informationAdaptive Bound Optimization for Online Convex Optimization
Adaptive Bound Optimization for Online Convex Optimization H. Brendan McMahan Google, Inc. mcmahan@google.com Matthew Streeter Google, Inc. mstreeter@google.com Abstract We introduce a new online convex
More informationConvex analysis and profit/cost/support functions
CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m
More informationLoad balancing of temporary tasks in the l p norm
Load balancing of temporary tasks in the l p norm Yossi Azar a,1, Amir Epstein a,2, Leah Epstein b,3 a School of Computer Science, Tel Aviv University, Tel Aviv, Israel. b School of Computer Science, The
More informationOnline Learning meets Optimization in the Dual
Online Learning meets Optimization in the Dual Shai ShalevShwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., The Hebrew University, Jerusalem 91904, Israel 2 Google Inc., 1600 Amphitheater
More informationIntroduction to Convex Optimization for Machine Learning
Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning
More informationExample 4.1 (nonlinear pendulum dynamics with friction) Figure 4.1: Pendulum. asin. k, a, and b. We study stability of the origin x
Lecture 4. LaSalle s Invariance Principle We begin with a motivating eample. Eample 4.1 (nonlinear pendulum dynamics with friction) Figure 4.1: Pendulum Dynamics of a pendulum with friction can be written
More informationFoundations of Machine Learning OnLine Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning OnLine Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationConsistency of Surrogate Risk Minimization Methods for Binary Classification using Strongly Proper Losses
E0 370 Statistical Learning Theory Lecture 13 Sep 4, 013) Consistency of Surrogate Risk Minimization Methods for Binary Classification using Strongly Proper Losses Lecturer: Shivani Agarwal Scribe: Rohit
More informationA DriftingGames Analysis for Online Learning and Applications to Boosting
A DriftingGames Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationProjectionfree Online Learning
Elad Hazan Technion  Israel Inst. of Tech. Satyen Kale IBM T.J. Watson Research Center ehazan@ie.technion.ac.il sckale@us.ibm.com Abstract The computational bottleneck in applying online learning to massive
More informationOnline Convex Programming and Generalized Infinitesimal Gradient Ascent
Online Convex Programming and Generalized Infinitesimal Gradient Ascent Martin Zinkevich February 003 CMUCS03110 School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 Abstract Convex
More informationt := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).
1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction
More informationOnline Lazy Updates for Portfolio Selection with Transaction Costs
Proceedings of the TwentySeventh AAAI Conference on Artificial Intelligence Online Lazy Updates for Portfolio Selection with Transaction Costs Puja Das, Nicholas Johnson, and Arindam Banerjee Department
More informationLecture 12 Basic Lyapunov theory
EE363 Winter 200809 Lecture 12 Basic Lyapunov theory stability positive definite functions global Lyapunov stability theorems Lasalle s theorem converse Lyapunov theorems finding Lyapunov functions 12
More informationOnline Learning, Stability, and Stochastic Gradient Descent
Online Learning, Stability, and Stochastic Gradient Descent arxiv:1105.4701v3 [cs.lg] 8 Sep 2011 September 9, 2011 Tomaso Poggio, Stephen Voinea, Lorenzo Rosasco CBCL, McGovern Institute, CSAIL, Brain
More informationEfficient Algorithms for Online Game Playing and Universal Portfolio Management
Electronic Colloquium on Computational Complexity, Report No. 33 (2006) Efficient Algorithms for Online Game Playing and Universal Portfolio Management Amit Agarwal Elad Hazan Abstract A natural algorithmic
More informationOnline Algorithms: Learning & Optimization with No Regret.
Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the
More informationApproximating Semidefinite Programs in Sublinear Time
Approximating Semidefinite Programs in Sublinear Time Dan Garber Technion  Israel Institute of Technology Haifa 3000 Israel dangar@cs.technion.ac.il Elad Hazan Technion  Israel Institute of Technology
More informationOnline Learning and Online Convex Optimization. Contents
Foundations and Trends R in Machine Learning Vol. 4, No. 2 (2011) 107 194 c 2012 S. ShalevShwartz DOI: 10.1561/2200000018 Online Learning and Online Convex Optimization By Shai ShalevShwartz Contents
More informationLINEAR PROGRAMMING WITH ONLINE LEARNING
LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA EMAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA
More informationALMOST COMMON PRIORS 1. INTRODUCTION
ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationChapter 15 Introduction to Linear Programming
Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2014 WeiTa Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of
More informationS = k=1 k. ( 1) k+1 N S N S N
Alternating Series Last time we talked about convergence of series. Our two most important tests, the integral test and the (limit) comparison test, both required that the terms of the series were (eventually)
More informationCONVEX optimization forms the backbone of many
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 5, MAY 2012 3235 InformationTheoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization Alekh Agarwal, Peter L Bartlett, Member,
More informationMath 317 HW #5 Solutions
Math 317 HW #5 Solutions 1. Exercise 2.4.2. (a) Prove that the sequence defined by x 1 = 3 and converges. x n+1 = 1 4 x n Proof. I intend to use the Monotone Convergence Theorem, so my goal is to show
More informationTD(0) Leads to Better Policies than Approximate Value Iteration
TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract
More informationSequences and Convergence in Metric Spaces
Sequences and Convergence in Metric Spaces Definition: A sequence in a set X (a sequence of elements of X) is a function s : N X. We usually denote s(n) by s n, called the nth term of s, and write {s
More informationSparse Online Learning via Truncated Gradient
Sparse Online Learning via Truncated Gradient John Langford Yahoo! Research jl@yahooinc.com Lihong Li Department of Computer Science Rutgers University lihong@cs.rutgers.edu Tong Zhang Department of Statistics
More informationA note on infinite systems of linear inequalities in Rn
Carnegie Mellon University Research Showcase @ CMU Department of Mathematical Sciences Mellon College of Science 1973 A note on infinite systems of linear inequalities in Rn Charles E. Blair Carnegie Mellon
More informationDefinition 11.1. Given a graph G on n vertices, we define the following quantities:
Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationmax cx s.t. Ax c where the matrix A, cost vector c and right hand side b are given and x is a vector of variables. For this example we have x
Linear Programming Linear programming refers to problems stated as maximization or minimization of a linear function subject to constraints that are linear equalities and inequalities. Although the study
More informationStanford University CS261: Optimization Handout 6 Luca Trevisan January 20, In which we introduce the theory of duality in linear programming.
Stanford University CS261: Optimization Handout 6 Luca Trevisan January 20, 2011 Lecture 6 In which we introduce the theory of duality in linear programming 1 The Dual of Linear Program Suppose that we
More informationMathematical Methods of Engineering Analysis
Mathematical Methods of Engineering Analysis Erhan Çinlar Robert J. Vanderbei February 2, 2000 Contents Sets and Functions 1 1 Sets................................... 1 Subsets.............................
More informationThe Relative Worst Order Ratio for OnLine Algorithms
The Relative Worst Order Ratio for OnLine Algorithms Joan Boyar 1 and Lene M. Favrholdt 2 1 Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark, joan@imada.sdu.dk
More informationStochastic gradient methods for machine learning
Stochastic gradient methods for machine learning Francis Bach INRIA  Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt  April 2013 Context Machine
More informationEfficient Projections onto the l 1 Ball for Learning in High Dimensions
Efficient Projections onto the l 1 Ball for Learning in High Dimensions John Duchi Google, Mountain View, CA 94043 Shai ShalevShwartz Toyota Technological Institute, Chicago, IL, 60637 Yoram Singer Tushar
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
More informationMultiArmed Bandit Problems
Machine Learning Coms4771 MultiArmed Bandit Problems Lecture 20 Multiarmed Bandit Problems The Setting: K arms (or actions) Each time t, each arm i pays off a bounded realvalued reward x i (t), say
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationx if x 0, x if x < 0.
Chapter 3 Sequences In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRALab,
More informationOptimal Onlinelist Batch Scheduling
Optimal Onlinelist Batch Scheduling Jacob Jan Paulus a,, Deshi Ye b, Guochuan Zhang b a University of Twente, P.O. box 217, 7500AE Enschede, The Netherlands b Zhejiang University, Hangzhou 310027, China
More informationN E W S A N D L E T T E R S
N E W S A N D L E T T E R S 73rd Annual William Lowell Putnam Mathematical Competition Editor s Note: Additional solutions will be printed in the Monthly later in the year. PROBLEMS A1. Let d 1, d,...,
More informationMinimize subject to. x S R
Chapter 12 Lagrangian Relaxation This chapter is mostly inspired by Chapter 16 of [1]. In the previous chapters, we have succeeded to find efficient algorithms to solve several important problems such
More information(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationLS.6 Solution Matrices
LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions
More informationTrading regret rate for computational efficiency in online learning with limited feedback
Trading regret rate for computational efficiency in online learning with limited feedback Shai ShalevShwartz TTIC Hebrew University Online Learning with Limited Feedback Workshop, 2009 June 2009 Shai
More informationCOS 511: Foundations of Machine Learning. Rob Schapire Lecture #14 Scribe: Qian Xi March 30, 2006
COS 511: Foundations of Machine Learning Rob Schapire Lecture #14 Scribe: Qian Xi March 30, 006 In the previous lecture, we introduced a new learning model, the Online Learning Model. After seeing a concrete
More informationOperations Research and Financial Engineering. Courses
Operations Research and Financial Engineering Courses ORF 504/FIN 504 Financial Econometrics Professor Jianqing Fan This course covers econometric and statistical methods as applied to finance. Topics
More informationNonlinear Programming Methods.S2 Quadratic Programming
Nonlinear Programming Methods.S2 Quadratic Programming Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard A linearly constrained optimization problem with a quadratic objective
More informationSeparation Properties for Locally Convex Cones
Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam
More informationA Potentialbased Framework for Online Multiclass Learning with Partial Feedback
A Potentialbased Framework for Online Multiclass Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science
More information1 Norms and Vector Spaces
008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)
More informationAlgorithms for sustainable data centers
Algorithms for sustainable data centers Adam Wierman (Caltech) Minghong Lin (Caltech) Zhenhua Liu (Caltech) Lachlan Andrew (Swinburne) and many others IT is an energy hog The electricity use of data centers
More informationThe Simplex Method. yyye
Yinyu Ye, MS&E, Stanford MS&E310 Lecture Note #05 1 The Simplex Method Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/
More informationCS229T/STAT231: Statistical Learning Theory (Winter 2015)
CS229T/STAT231: Statistical Learning Theory (Winter 2015) Percy Liang Last updated Wed Oct 14 2015 20:32 These lecture notes will be updated periodically as the course goes on. Please let us know if you
More informationConvex Programming Tools for Disjunctive Programs
Convex Programming Tools for Disjunctive Programs João Soares, Departamento de Matemática, Universidade de Coimbra, Portugal Abstract A Disjunctive Program (DP) is a mathematical program whose feasible
More informationWe shall turn our attention to solving linear systems of equations. Ax = b
59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system
More informationNumerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
More informationStochastic gradient methods for machine learning
Stochastic gradient methods for machine learning Francis Bach INRIA  Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt  April 2013 Context Machine
More information4.1 Introduction  Online Learning Model
Computational Learning Foundations Fall semester, 2010 Lecture 4: November 7, 2010 Lecturer: Yishay Mansour Scribes: Elad Liebman, Yuval Rochman & Allon Wagner 1 4.1 Introduction  Online Learning Model
More informationCourse 221: Analysis Academic year , First Semester
Course 221: Analysis Academic year 200708, First Semester David R. Wilkins Copyright c David R. Wilkins 1989 2007 Contents 1 Basic Theorems of Real Analysis 1 1.1 The Least Upper Bound Principle................
More informationLecture 7: Finding Lyapunov Functions 1
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.243j (Fall 2003): DYNAMICS OF NONLINEAR SYSTEMS by A. Megretski Lecture 7: Finding Lyapunov Functions 1
More informationEfficient Learning using ForwardBackward Splitting
Efficient Learning using ForwardBackward Splitting John Duchi University of California Berkeley jduchi@cs.berkeley.edu Yoram Singer Google singer@google.com Abstract We describe, analyze, and experiment
More informationLecture 3: UCB Algorithm, WorstCase Regret Bound
IEOR 8100001: Learning and Optimization for Sequential Decision Making 01/7/16 Lecture 3: UCB Algorithm, WorstCase Regret Bound Instructor: Shipra Agrawal Scribed by: Karl Stratos = Jang Sun Lee 1 UCB
More information{f 1 (U), U F} is an open cover of A. Since A is compact there is a finite subcover of A, {f 1 (U 1 ),...,f 1 (U n )}, {U 1,...
44 CHAPTER 4. CONTINUOUS FUNCTIONS In Calculus we often use arithmetic operations to generate new continuous functions from old ones. In a general metric space we don t have arithmetic, but much of it
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More information1 Polyhedra and Linear Programming
CS 598CSC: Combinatorial Optimization Lecture date: January 21, 2009 Instructor: Chandra Chekuri Scribe: Sungjin Im 1 Polyhedra and Linear Programming In this lecture, we will cover some basic material
More informationThe Expectation Maximization Algorithm A short tutorial
The Expectation Maximiation Algorithm A short tutorial Sean Borman Comments and corrections to: emtut at seanborman dot com July 8 2004 Last updated January 09, 2009 Revision history 2009009 Corrected
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationCompletely Positive Cone and its Dual
On the Computational Complexity of Membership Problems for the Completely Positive Cone and its Dual Peter J.C. Dickinson Luuk Gijben July 3, 2012 Abstract Copositive programming has become a useful tool
More informationClassification/Decision Trees (II)
Classification/Decision Trees (II) Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Right Sized Trees Let the expected misclassification rate of a tree T be R (T ).
More informationFollow the Leader with Dropout Perturbations
JMLR: Worshop and Conference Proceedings vol 35: 26, 204 Follow the Leader with Dropout Perturbations Tim van Erven Département de Mathématiques, Université ParisSud, France Wojciech Kotłowsi Institute
More informationAdaptive Subgradient Methods for Online Learning and Stochastic Optimization
Adaptive Subgradient Methods Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi Computer Science Division University of California, Berkeley Berkeley, CA 94720 USA
More informationChapter 6 Finite sets and infinite sets. Copyright 2013, 2005, 2001 Pearson Education, Inc. Section 3.1, Slide 1
Chapter 6 Finite sets and infinite sets Copyright 013, 005, 001 Pearson Education, Inc. Section 3.1, Slide 1 Section 6. PROPERTIES OF THE NATURE NUMBERS 013 Pearson Education, Inc.1 Slide Recall that denotes
More informationDistributed Machine Learning and Big Data
Distributed Machine Learning and Big Data Sourangshu Bhattacharya Dept. of Computer Science and Engineering, IIT Kharagpur. http://cse.iitkgp.ac.in/~sourangshu/ August 21, 2015 Sourangshu Bhattacharya
More informationDate: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
More informationAccelerated Parallel Optimization Methods for Large Scale Machine Learning
Accelerated Parallel Optimization Methods for Large Scale Machine Learning Haipeng Luo Princeton University haipengl@cs.princeton.edu Patrick Haffner and JeanFrançois Paiement AT&T Labs  Research {haffner,jpaiement}@research.att.com
More informationLecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
More information