Online Convex Optimization


E0 370 Statistical Learning Theory                                        Lecture 19 (Oct 22, 2013)
Lecturer: Shivani Agarwal                                                 Scribe: Aadirupa

1 Introduction

In this lecture we shall look at a fairly general setting of online convex optimization which, as we shall see, encompasses some of the online learning problems we have seen so far as special cases. A standard offline convex optimization problem involves a convex set $\Omega \subseteq \mathbb{R}^n$ and a fixed convex cost function $c : \Omega \to \mathbb{R}$; the goal is to find a point $x \in \Omega$ that minimizes $c(x)$ over $\Omega$. An online convex optimization problem also involves a convex set $\Omega \subseteq \mathbb{R}^n$, but proceeds in trials: at each trial $t$, the player/algorithm must choose a point $x_t \in \Omega$, after which a convex cost function $c_t : \Omega \to \mathbb{R}$ is revealed by the environment/adversary, and a loss of $c_t(x_t)$ is incurred for the trial; the goal is to minimize the total loss incurred over a sequence of $T$ trials, relative to the best single point in $\Omega$ in hindsight.

Online Convex Optimization
  Convex set $\Omega \subseteq \mathbb{R}^n$
  For $t = 1, \ldots, T$:
    Play $x_t \in \Omega$
    Receive cost function $c_t : \Omega \to \mathbb{R}$
    Incur loss $c_t(x_t)$

We will denote the total loss incurred by an algorithm $A$ on $T$ trials as
$$L_T[A] = \sum_{t=1}^T c_t(x_t),$$
and the total loss of a fixed point $x \in \Omega$ as
$$L_T[x] = \sum_{t=1}^T c_t(x);$$
the regret of $A$ w.r.t. the best fixed point in $\Omega$ is then simply $L_T[A] - \inf_{x \in \Omega} L_T[x]$.

Let us consider which of the online learning problems we have seen earlier can be viewed as instances of online convex optimization (OCO):

1. Online linear regression (GD analysis). Here one selects a weight vector $w_t \in \mathbb{R}^n$, and incurs a loss given by $c_t(w_t) = \ell(y_t, w_t^\top x_t)$, where $\ell$ is assumed to be convex in its second argument, and is therefore convex in $w_t$; however the total loss is compared against the best fixed weight vector only in $\{u \in \mathbb{R}^n : \|u\|_2 \le U\}$ for some $U > 0$. Call this latter set $\Omega$. Clearly, if the weight vectors $w_t$ are also selected to be in $\Omega$, e.g. by Euclidean projection onto $\Omega$ (which in this case amounts to a simple normalization), then this becomes an instance of OCO.

2. Online linear regression (EG analysis). Here one selects $w_t \in \Delta_n$, and incurs a loss given by $c_t(w_t) = \ell(y_t, w_t^\top x_t)$, where $\ell$ is assumed to be convex in its second argument, and is therefore convex in $w_t$; the total loss is also compared against the best fixed weight vector in $\Delta_n$. Thus this can be viewed as an instance of OCO with $\Omega = \Delta_n$.

3. Online prediction from expert advice. Here one selects a probability vector $\hat{p}_t \in \Delta_n$, and incurs a loss $c_t(\hat{p}_t) = \ell(y_t, \hat{p}_t^\top \xi_t)$, where $\ell$ is assumed to be convex in its second argument, and is therefore convex in $\hat{p}_t$. However the total loss is compared only against the best vector in $\{e_i \in \Delta_n : i \in [n]\}$, where $e_i$ denotes a unit vector with 1 in the $i$-th position and 0 elsewhere. In general, the smallest loss over this set is not the same as the smallest loss over $\Delta_n$, and therefore this is not an instance of OCO.

4. Online decision/allocation among experts. Here one selects a probability vector $\hat{p}_t \in \Delta_n$, and incurs a linear loss $c_t(\hat{p}_t) = \hat{p}_t^\top \ell_t$, which is clearly convex in $\hat{p}_t$. The total loss is again compared against the best vector in $\{e_i \in \Delta_n : i \in [n]\}$ as above, but in this case, comparison against this set is actually equivalent to comparison against $\Delta_n$, since
$$\inf_{\hat{p} \in \Delta_n} L_T[\hat{p}] \;=\; \inf_{\hat{p} \in \Delta_n} \sum_{t=1}^T \hat{p}^\top \ell_t \;=\; \inf_{\hat{p} \in \Delta_n} \hat{p}^\top \Big(\sum_{t=1}^T \ell_t\Big) \;=\; \min_{i \in [n]} e_i^\top \Big(\sum_{t=1}^T \ell_t\Big) \;=\; \min_{i \in [n]} \sum_{t=1}^T e_i^\top \ell_t \;=\; \min_{i \in [n]} L_T[e_i].$$
Thus online allocation with a linear loss as above can be viewed as an instance of OCO with $\Omega = \Delta_n$.

2 Online Gradient Descent Algorithm

The online gradient descent (OGD) algorithm [7] generalizes the basic gradient descent algorithm used for standard offline optimization problems, and can be described as follows:

Algorithm Online Gradient Descent (OGD)
  Parameter: $\eta > 0$
  Initialize: $x_1 \in \Omega$
  For $t = 1, \ldots, T$:
    Play $x_t \in \Omega$
    Receive cost function $c_t : \Omega \to \mathbb{R}$
    Incur loss $c_t(x_t)$
    Update:
      $\tilde{x}_{t+1} = x_t - \eta \, \nabla c_t(x_t)$
      $x_{t+1} = P_\Omega(\tilde{x}_{t+1})$, where $P_\Omega(x) = \arg\min_{x' \in \Omega} \|x - x'\|_2$ (Euclidean projection of $x$ onto $\Omega$)

Theorem 2.1 (Zinkevich, 2003). Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set with $\|x - x_1\|_2 \le D$ $\forall x \in \Omega$. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of convex cost functions with bounded (sub)gradients, $\|\nabla c_t(x)\|_2 \le G$ $\forall t, \forall x \in \Omega$. Then
$$L_T[\mathrm{OGD}_\eta] - \inf_{x \in \Omega} L_T[x] \;\le\; \frac{1}{2}\Big(\frac{D^2}{\eta} + \eta G^2 T\Big).$$
Moreover, setting $\eta = \frac{D}{G\sqrt{T}}$ to minimize the right-hand side of the above bound (assuming $D$, $G$ and $T$, or upper bounds on them, are known in advance), we have
$$L_T[\mathrm{OGD}_\eta] - \inf_{x \in \Omega} L_T[x] \;\le\; DG\sqrt{T}.$$

Notes. Before proving the above theorem, we note the following:

1. The analysis of the GD algorithm for online regression can be viewed as a special case of the above, with $D$ and $G$ instantiated in terms of $U$, $Z$ and $R$ (see Lecture 17 notes).

2. With $\eta$ as above (which depends on knowledge of $D$, $G$ and $T$), one gets an $O(\sqrt{T})$ regret bound. Such an $O(\sqrt{T})$ regret bound (with worse constants) can also be obtained by using a time-varying parameter $\eta_t = 1/\sqrt{t}$; in fact this was the version contained in the original paper of Zinkevich [7].
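To make the update concrete, here is a minimal NumPy sketch of the OGD box above. It is not part of the original notes: the function names (`ogd`, `project_l2_ball`) are illustrative, and $\Omega$ is taken to be a Euclidean ball centred at the origin so that the projection $P_\Omega$ has a closed form, as in the online regression example.

```python
import numpy as np

def project_l2_ball(x, radius):
    """Euclidean projection P_Omega onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def ogd(grad_fns, x1, eta, project):
    """Online gradient descent: play x_t, then x_{t+1} = P_Omega(x_t - eta * grad c_t(x_t)).

    grad_fns : sequence of callables; grad_fns[t](x) returns a (sub)gradient of c_t at x
    x1       : starting point in Omega
    eta      : fixed step size, e.g. eta = D / (G * sqrt(T)) as in Theorem 2.1
    project  : Euclidean projection onto Omega
    """
    x = np.asarray(x1, dtype=float)
    plays = []
    for grad in grad_fns:
        plays.append(x.copy())            # play x_t and (implicitly) incur c_t(x_t)
        x = project(x - eta * grad(x))    # gradient step, then project back onto Omega
    return plays

# Toy usage: squared-loss costs c_t(w) = (y_t - w.x_t)^2 over the ball of radius 1.
rng = np.random.default_rng(0)
T, n = 100, 5
X, y = rng.normal(size=(T, n)), rng.normal(size=T)
grads = [lambda w, xt=xt, yt=yt: -2.0 * (yt - w @ xt) * xt for xt, yt in zip(X, y)]
plays = ogd(grads, x1=np.zeros(n), eta=0.05, project=lambda v: project_l2_ball(v, 1.0))
```

Per Theorem 2.1, a run of this kind with $\eta = D/(G\sqrt{T})$ incurs regret at most $DG\sqrt{T}$ against the best fixed point in the ball, provided the subgradients are indeed bounded by $G$ over $\Omega$.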

Proof of Theorem 2.1. The proof is similar to that of the regret bound we obtained for the GD algorithm for online regression in Lecture 17. Let $x \in \Omega$. We have $\forall t \in [T]$:
$$\begin{aligned}
c_t(x_t) - c_t(x) &\le \nabla c_t(x_t)^\top (x_t - x), \quad \text{by convexity of } c_t \\
&= \tfrac{1}{\eta}\,(x_t - \tilde{x}_{t+1})^\top (x_t - x) \\
&= \tfrac{1}{2\eta}\big(\|x - x_t\|_2^2 - \|x - \tilde{x}_{t+1}\|_2^2 + \|x_t - \tilde{x}_{t+1}\|_2^2\big) \\
&\le \tfrac{1}{2\eta}\big(\|x - x_t\|_2^2 - \|x - x_{t+1}\|_2^2\big) + \tfrac{\eta}{2}\|\nabla c_t(x_t)\|_2^2 \\
&\le \tfrac{1}{2\eta}\big(\|x - x_t\|_2^2 - \|x - x_{t+1}\|_2^2\big) + \tfrac{\eta}{2}\,G^2.
\end{aligned}$$
Summing over $t = 1, \ldots, T$, we thus get
$$\sum_{t=1}^T \big(c_t(x_t) - c_t(x)\big) \;\le\; \frac{1}{2\eta}\big(\|x - x_1\|_2^2 - \|x - x_{T+1}\|_2^2\big) + \frac{\eta}{2}\,G^2 T \;\le\; \frac{1}{2}\Big(\frac{D^2}{\eta} + \eta G^2 T\Big).$$
It is easy to see that the right-hand side is minimized at $\eta = \frac{D}{G\sqrt{T}}$; substituting this back in the above inequality gives the desired result.

2.1 Logarithmic regret bound for OGD when cost functions are strongly convex (using time-varying parameter $\eta_t$)

The above result gives an $O(\sqrt{T})$ regret bound for the OGD algorithm in general. When the cost functions $c_t$ are known to be $\alpha$-strongly convex, one can in fact obtain an $O(\ln T)$ regret bound with a slight variant of the OGD algorithm that uses a suitable time-varying parameter $\eta_t$ [3]:

Theorem 2.2 (Hazan et al., 2007). Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set with $\|x - x_1\|_2 \le D$ $\forall x \in \Omega$. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of $\alpha$-strongly convex cost functions with bounded (sub)gradients, $\|\nabla c_t(x)\|_2 \le G$ $\forall t, \forall x \in \Omega$. Then the OGD algorithm with time-varying parameter $\eta_t = \frac{1}{\alpha t}$ satisfies
$$L_T\big[\mathrm{OGD}_{\eta_t = \frac{1}{\alpha t}}\big] - \inf_{x \in \Omega} L_T[x] \;\le\; \frac{G^2}{2\alpha}\,(\ln T + 1).$$

Proof. Let $x \in \Omega$. Proceeding similarly as before, we have $\forall t \in [T]$:
$$\begin{aligned}
c_t(x_t) - c_t(x) &\le \nabla c_t(x_t)^\top (x_t - x) - \tfrac{\alpha}{2}\|x_t - x\|_2^2, \quad \text{by } \alpha\text{-strong convexity of } c_t \\
&= \tfrac{1}{\eta_t}\,(x_t - \tilde{x}_{t+1})^\top (x_t - x) - \tfrac{\alpha}{2}\|x_t - x\|_2^2 \\
&\le \tfrac{1}{2\eta_t}\big(\|x - x_t\|_2^2 - \|x - x_{t+1}\|_2^2\big) + \tfrac{\eta_t}{2}\|\nabla c_t(x_t)\|_2^2 - \tfrac{\alpha}{2}\|x_t - x\|_2^2.
\end{aligned}$$
Summing over $t = 1, \ldots, T$ then gives (with the convention $\frac{1}{\eta_0} = 0$)
$$\sum_{t=1}^T \big(c_t(x_t) - c_t(x)\big) \;\le\; \frac{1}{2}\sum_{t=1}^T \|x - x_t\|_2^2 \Big(\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha\Big) + \frac{G^2}{2}\sum_{t=1}^T \eta_t \;=\; \frac{G^2}{2\alpha}\sum_{t=1}^T \frac{1}{t} \;\le\; \frac{G^2}{2\alpha}\,(\ln T + 1),$$
since with $\eta_t = \frac{1}{\alpha t}$ each term $\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha$ equals zero.
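The variant analyzed in Theorem 2.2 changes only the step size. A minimal sketch (again not from the notes; the names are illustrative, and any Euclidean projection onto $\Omega$ can be supplied) is:

```python
import numpy as np

def ogd_strongly_convex(grad_fns, x1, alpha, project):
    """OGD with the time-varying step size eta_t = 1 / (alpha * t) of Theorem 2.2,
    intended for alpha-strongly convex cost functions.

    grad_fns : sequence of callables returning a (sub)gradient of c_t at a point
    x1       : starting point in Omega
    alpha    : strong-convexity parameter of the costs
    project  : Euclidean projection onto Omega
    """
    x = np.asarray(x1, dtype=float)
    plays = []
    for t, grad in enumerate(grad_fns, start=1):
        plays.append(x.copy())                 # play x_t
        eta_t = 1.0 / (alpha * t)              # step size shrinks as 1/(alpha t)
        x = project(x - eta_t * grad(x))       # gradient step, then project onto Omega
    return plays
```

Under the assumptions of Theorem 2.2, this version incurs regret at most $\frac{G^2}{2\alpha}(\ln T + 1)$, i.e. logarithmic rather than $O(\sqrt{T})$.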

3 Online Mirror Descent Algorithm

The online mirror descent (OMD) algorithm, described for example in [6], generalizes the basic mirror descent algorithm used for standard offline optimization problems [4, 5, 1] much as OGD generalizes basic gradient descent. The OMD framework allows one to obtain $O(\sqrt{T})$ regret bounds for cost functions with subgradients bounded not only in the Euclidean norm (assumed by the OGD analysis), but in any suitable norm.

Let $\Omega' \subseteq \mathbb{R}^n$ be a convex set containing $\Omega$ (i.e. $\Omega \subseteq \Omega'$), and let $F : \Omega' \to \mathbb{R}$ be a strictly convex and differentiable function. Let $B_F : \Omega' \times \Omega' \to \mathbb{R}$ denote the Bregman divergence associated with $F$; recall that for all $u, v \in \Omega'$, this is defined as
$$B_F(u, v) = F(u) - F(v) - (u - v)^\top \nabla F(v).$$
Then the OMD algorithm using the set $\Omega'$ and function $F$ can be described as follows:

Algorithm Online Mirror Descent (OMD)
  Parameters: $\eta > 0$; convex set $\Omega' \supseteq \Omega$; strictly convex, differentiable function $F : \Omega' \to \mathbb{R}$
  Initialize: $x_1 = \arg\min_{x \in \Omega} F(x)$
  For $t = 1, \ldots, T$:
    Play $x_t \in \Omega$
    Receive cost function $c_t : \Omega \to \mathbb{R}$
    Incur loss $c_t(x_t)$
    Update:
      $\nabla F(\tilde{x}_{t+1}) = \nabla F(x_t) - \eta \, \nabla c_t(x_t)$  (assume this yields $\tilde{x}_{t+1} \in \Omega'$)
      $x_{t+1} = P_\Omega^F(\tilde{x}_{t+1})$, where $P_\Omega^F(x) = \arg\min_{x' \in \Omega} B_F(x', x)$ (Bregman projection of $x$ onto $\Omega$)

Below we will obtain a regret bound for OMD (with an appropriate choice of the function $F$) for cost functions with subgradients bounded in an arbitrary norm. Before we do so, let us recall the notion of a dual norm. Specifically, let $\|\cdot\|$ be any norm on $\mathbb{R}^n$. Then the corresponding dual norm is defined as
$$\|u\|_* = \sup_{x \,:\, \|x\| \le 1} x^\top u$$
(for example, the dual of $\|\cdot\|_2$ is $\|\cdot\|_2$ itself).

Theorem 3.1. Let $\Omega \subseteq \mathbb{R}^n$ be a closed, non-empty, convex set. Let $c_t : \Omega \to \mathbb{R}$ ($t = 1, 2, \ldots$) be any sequence of convex cost functions with bounded subgradients, $\|\nabla c_t(x)\| \le G$ $\forall t, \forall x \in \Omega$, where $\|\cdot\|$ is an arbitrary norm. Let $\Omega' \supseteq \Omega$ be a convex set and $F : \Omega' \to \mathbb{R}$ be a strictly convex, differentiable function such that (a) $F(x) - F(x_1) \le D^2$ $\forall x \in \Omega$; and (b) the restriction of $F$ to $\Omega$ is $\alpha$-strongly convex w.r.t. $\|\cdot\|_*$, the dual norm of $\|\cdot\|$. Then
$$L_T[\mathrm{OMD}_\eta] - \inf_{x \in \Omega} L_T[x] \;\le\; \frac{1}{\eta}\Big(D^2 + \frac{\eta^2 G^2 T}{2\alpha}\Big).$$
Moreover, setting $\eta = \frac{D}{G}\sqrt{\frac{2\alpha}{T}}$ to minimize the right-hand side of the above bound (assuming $D$, $G$ and $T$, or upper bounds on them, are known in advance), we have
$$L_T[\mathrm{OMD}_\eta] - \inf_{x \in \Omega} L_T[x] \;\le\; DG\sqrt{\frac{2T}{\alpha}}.$$
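Before proving this, here is a minimal sketch of the OMD update box above, purely for orientation and not part of the notes. The interface is an assumption of this sketch: `grad_F` is $\nabla F$, `grad_F_inv` maps a dual vector back to the point of $\Omega'$ with that gradient (assumed to exist), and `bregman_project` performs the Bregman projection $P_\Omega^F$.

```python
import numpy as np

def omd(grad_fns, x1, eta, grad_F, grad_F_inv, bregman_project):
    """Online mirror descent:
       grad_F(x~_{t+1}) = grad_F(x_t) - eta * grad c_t(x_t);  x_{t+1} = P^F_Omega(x~_{t+1})."""
    x = np.asarray(x1, dtype=float)
    plays = []
    for grad in grad_fns:
        plays.append(x.copy())                 # play x_t
        theta = grad_F(x) - eta * grad(x)      # gradient step taken in the dual (mirror) space
        x_tilde = grad_F_inv(theta)            # map back to a point of Omega' (assumed to exist)
        x = bregman_project(x_tilde)           # Bregman projection onto Omega
    return plays
```

Choosing $F(x) = \frac{1}{2}\|x\|_2^2$ with $\Omega' = \mathbb{R}^n$ makes `grad_F` and `grad_F_inv` the identity and the Bregman projection the Euclidean one, recovering OGD; the exponential-weights choice is sketched after the examples below.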

Proof. Let $x \in \Omega$. Proceeding as before, we have $\forall t \in [T]$:
$$\begin{aligned}
c_t(x_t) - c_t(x) &\le \nabla c_t(x_t)^\top (x_t - x), \quad \text{by convexity of } c_t \\
&= \tfrac{1}{\eta}\big(\nabla F(x_t) - \nabla F(\tilde{x}_{t+1})\big)^\top (x_t - x) \\
&= \tfrac{1}{\eta}\big(B_F(x, x_t) - B_F(x, \tilde{x}_{t+1}) + B_F(x_t, \tilde{x}_{t+1})\big) \\
&\le \tfrac{1}{\eta}\big(B_F(x, x_t) - B_F(x, x_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1}) + B_F(x_t, \tilde{x}_{t+1})\big).
\end{aligned}$$
Summing over $t = 1, \ldots, T$, we thus get
$$\sum_{t=1}^T \big(c_t(x_t) - c_t(x)\big) \;\le\; \frac{1}{\eta}\big(B_F(x, x_1) - B_F(x, x_{T+1})\big) + \frac{1}{\eta}\sum_{t=1}^T \big(B_F(x_t, \tilde{x}_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1})\big).$$
Now, it can be shown that
$$B_F(x, x_1) \;\le\; F(x) - F(x_1) \;\le\; D^2.$$
Also, we have
$$\begin{aligned}
B_F(x_t, \tilde{x}_{t+1}) - B_F(x_{t+1}, \tilde{x}_{t+1}) &= F(x_t) - F(x_{t+1}) - (x_t - x_{t+1})^\top \nabla F(\tilde{x}_{t+1}) \\
&\le \nabla F(x_t)^\top (x_t - x_{t+1}) - \tfrac{\alpha}{2}\|x_t - x_{t+1}\|_*^2 - (x_t - x_{t+1})^\top \nabla F(\tilde{x}_{t+1}), \quad \text{by } \alpha\text{-strong convexity of } F \text{ on } \Omega \\
&= \big(\nabla F(x_t) - \nabla F(\tilde{x}_{t+1})\big)^\top (x_t - x_{t+1}) - \tfrac{\alpha}{2}\|x_t - x_{t+1}\|_*^2 \\
&= \eta \, \nabla c_t(x_t)^\top (x_t - x_{t+1}) - \tfrac{\alpha}{2}\|x_t - x_{t+1}\|_*^2 \\
&\le \eta G \|x_t - x_{t+1}\|_* - \tfrac{\alpha}{2}\|x_t - x_{t+1}\|_*^2, \quad \text{by H\"older's inequality} \\
&\le \frac{\eta^2 G^2}{2\alpha}, \quad \text{since } \frac{\eta^2 G^2}{2\alpha} + \frac{\alpha}{2}\|x_t - x_{t+1}\|_*^2 \ge \eta G \|x_t - x_{t+1}\|_*.
\end{aligned}$$
Thus we get
$$\sum_{t=1}^T \big(c_t(x_t) - c_t(x)\big) \;\le\; \frac{1}{\eta}\Big(D^2 + \frac{\eta^2 G^2 T}{2\alpha}\Big).$$
It is easy to verify that the right-hand side is minimized at $\eta = \frac{D}{G}\sqrt{\frac{2\alpha}{T}}$; substituting this back in the above inequality gives the desired result.

Examples. Examples of OMD algorithms include the following:

1. OGD. Here $\Omega' = \mathbb{R}^n$ and $F(x) = \frac{1}{2}\|x\|_2^2$. Note that $F$ is 1-strongly convex on $\mathbb{R}^n$ and therefore on any $\Omega \subseteq \mathbb{R}^n$.

2. Exponential weights. Here $\Omega = a\Delta_n$ for some $a > 0$; $\Omega' = \mathbb{R}^n_{++}$; and $F(x) = \sum_{i=1}^n x_i \ln x_i - \sum_{i=1}^n x_i$. Note that $F$ is strictly convex on $\Omega'$ and 1-strongly convex (w.r.t. $\|\cdot\|_1$) on $\Delta_n$.
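As an illustration of example 2, here is a hedged sketch (not from the notes) of the exponential-weights instance of OMD with $a = 1$, i.e. $\Omega = \Delta_n$. With the negative-entropy $F$ above, the unnormalized update multiplies each coordinate by $\exp(-\eta g_i)$, and the Bregman (KL) projection onto the simplex reduces to renormalization; the function names are illustrative.

```python
import numpy as np

def exponential_weights(grad_fns, n, eta):
    """OMD with Omega = Delta_n and F(x) = sum_i x_i ln x_i - sum_i x_i (example 2 above)."""
    x = np.full(n, 1.0 / n)                   # x_1 = argmin over Delta_n of F is the uniform vector
    plays = []
    for grad in grad_fns:
        plays.append(x.copy())                # play x_t
        y = x * np.exp(-eta * grad(x))        # coordinate-wise multiplicative (mirror) step
        x = y / y.sum()                       # Bregman (KL) projection onto Delta_n = renormalize
    return plays

# Toy usage with linear allocation losses c_t(p) = p . l_t, whose gradient is l_t:
rng = np.random.default_rng(0)
loss_vectors = rng.uniform(0.0, 1.0, size=(100, 5))
plays = exponential_weights([lambda p, l=l: l for l in loss_vectors], n=5, eta=0.1)
```

With linear losses bounded in $\|\cdot\|_\infty$ (whose dual is $\|\cdot\|_1$), this recovers the familiar multiplicative-weights style update for online allocation among experts.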

Exercise. Suppose the cost functions $c_t$ are all $\beta$-strongly convex for some $\beta > 0$ and have subgradients bounded in some arbitrary norm $\|\cdot\|$ as above. Can you obtain a logarithmic ($O(\ln T)$) regret bound for the OMD algorithm in this case, as we did for the OGD algorithm? Why or why not?

More details on online convex optimization, including missing details in some of the above proofs, can be found for example in [2].

4 Next Lecture

In the next lecture, we will consider how algorithms for online supervised learning problems such as online regression and classification can be used in a batch setting, and will see how regret bounds for online learning algorithms can be converted to generalization error bounds for the resulting batch algorithms.

References

[1] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3), 2003.

[2] Sébastien Bubeck. Introduction to online optimization. Lecture Notes, Princeton University, 2011.

[3] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69, 2007.

[4] A. Nemirovski. Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody, 15, 1979.

[5] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, 1983.

[6] Shai Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University of Jerusalem, 2007.

[7] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), 2003.
