# Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization


## Transcription

**Algorithm 1** Bandit Online Linear Optimization

1. Input: $\eta > 0$, $\vartheta$-self-concordant $R$
2. Let $x_1 = \arg\min_{x \in K} [R(x)]$.
3. **for** $t = 1$ to $T$ **do**
4. Let $\{e_1, \ldots, e_n\}$ and $\{\lambda_1, \ldots, \lambda_n\}$ be the set of eigenvectors and eigenvalues of $\nabla^2 R(x_t)$.
5. Choose $i_t$ uniformly at random from $\{1, \ldots, n\}$ and $\varepsilon_t = \pm 1$ with probability $1/2$.
6. Predict $y_t = x_t + \varepsilon_t \lambda_{i_t}^{-1/2} e_{i_t}$.
7. Observe the gain $f_t^\top y_t \in \mathbb{R}$.
8. Define $\tilde f_t := n\,(f_t^\top y_t)\,\varepsilon_t \lambda_{i_t}^{1/2} e_{i_t}$.
9. Update $x_{t+1} = \arg\min_{x \in K} \left[ \eta \sum_{s=1}^{t} \tilde f_s^\top x + R(x) \right]$.
10. **end for**

Shortest Path problem. The precise analysis of our algorithm is given in Section 8. Finally, in Section 9 we spell out how to implement the algorithm with only one iteration of the Damped Newton method per time step.

The following theorem is the main result of this paper (see Section 5 for the definition of a $\vartheta$-self-concordant barrier).

**Theorem 1** Let $K$ be a convex set and $R$ be a $\vartheta$-self-concordant barrier on $K$. Let $u$ be any vector in $K' = K_{1/\sqrt{T}}$. Suppose we have the property that $|f_t^\top x| \le 1$ for any $x \in K$. Setting $\eta = \frac{\sqrt{\vartheta \log T}}{4n\sqrt{T}}$, the regret of Algorithm 1 is bounded as

$$E \sum_{t=1}^{T} f_t^\top y_t \le \min_{u \in K'} E\left( \sum_{t=1}^{T} f_t^\top u \right) + 16 n \sqrt{\vartheta T \log T}$$

whenever $T > 8 \vartheta \log T$. The expected regret over the original set $K$ is within an additive $O(\sqrt{nT})$ factor from the above guarantee, as implied by Lemma 8 in the Appendix.

### 4 Regularization Algorithms and Bregman Divergences

As our algorithm is clearly based on a regularization framework, we now state a general result for the performance of any algorithm minimizing the regularized empirical loss. We call this method Follow the Regularized Leader, and we defer the proof of the regret bound to the Appendix. A similar analysis for convex loss functions can be found in [5], Chapter 11. We remark that the use of Bregman divergences in the context of online learning goes back at least to Kivinen and Warmuth [12].

Let $\tilde f_1, \ldots, \tilde f_T \in \mathbb{R}^n$ be any sequence of vectors. Suppose $x_{t+1}$ is obtained as

$$x_{t+1} = \arg\min_{x \in K} \underbrace{\left[ \eta \sum_{s=1}^{t} \tilde f_s^\top x + R(x) \right]}_{\Phi_t(x)} \tag{4}$$

for some strictly convex differentiable function $R$.
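Update (4), combined with the sampling steps of Algorithm 1, can be sketched in Python for a toy case. The setting is an assumption made here, not taken from the paper: $K = [0,1]^n$ with the logarithmic barrier $R(x) = -\sum_i (\log x_i + \log(1 - x_i))$. Its Hessian is diagonal, so the eigenvectors are the coordinate axes and the minimization in (4) splits into one-dimensional problems that bisection can solve. The names `hessian_diag`, `ftrl_coordinate`, and `bandit_olo` are hypothetical.

```python
import math
import random


def hessian_diag(x):
    """Diagonal of the Hessian of R(x) = -sum(log x_i + log(1 - x_i))."""
    return [1.0 / xi ** 2 + 1.0 / (1.0 - xi) ** 2 for xi in x]


def ftrl_coordinate(c, tol=1e-12):
    """Minimize c*x - log(x) - log(1 - x) over (0, 1).

    The derivative c - 1/x + 1/(1 - x) is increasing in x, so bisection
    on its sign finds the unique minimizer.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c - 1.0 / mid + 1.0 / (1.0 - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bandit_olo(loss_vectors, eta, rng):
    """Run the sketch; returns (total observed loss, list of played points)."""
    n = len(loss_vectors[0])
    cumulative = [0.0] * n      # running sum of the estimates from step 8
    x = [0.5] * n               # step 2: minimizer of R over the box
    total, plays = 0.0, []
    for f in loss_vectors:
        lam = hessian_diag(x)                         # step 4: eigenvalues
        i = rng.randrange(n)                          # step 5: direction
        eps = rng.choice((-1.0, 1.0))                 # step 5: sign
        y = list(x)
        y[i] += eps / math.sqrt(lam[i])               # step 6: play y_t
        value = sum(fj * yj for fj, yj in zip(f, y))  # step 7: only feedback
        total += value
        plays.append(y)
        cumulative[i] += n * value * eps * math.sqrt(lam[i])  # step 8
        x = [ftrl_coordinate(eta * c) for c in cumulative]    # step 9
    return total, plays
```

For this barrier, a sum of $2n$ logarithmic terms, the standard barrier parameter is $\vartheta = 2n$, so following Theorem 1 one would call the sketch with `eta = math.sqrt(2 * n * math.log(T)) / (4 * n * math.sqrt(T))`.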
We denote $\Phi_0(x) = R(x)$ and $\Phi_t(x) = \Phi_{t-1}(x) + \eta \tilde f_t^\top x$. We will assume $R$ approaches infinity at the boundary of $K$ so that the unconstrained minimization problem will have a unique solution within $K$. We have the following bound on the performance of such an algorithm.

**Lemma 2** For any $u \in K$, the algorithm defined by (4) enjoys the following regret guarantee

$$\eta \sum_{t=1}^{T} \tilde f_t^\top (x_t - u) \le D_R(u, x_1) + \sum_{t=1}^{T} D_R(x_t, x_{t+1}) \le D_R(u, x_1) + \eta \sum_{t=1}^{T} \tilde f_t^\top (x_t - x_{t+1})$$

for any sequence $\{\tilde f_t\}$.

In addition, we state a useful result that bounds the true regret based on the regret against the estimated functions $\tilde f_t$.

**Lemma 3** Suppose that, for $t = 1, \ldots, T$, $\tilde f_t$ is such that $E \tilde f_t = f_t$ and $y_t$ is such that $E y_t = x_t$. Suppose that we have the following regret bound:

$$\sum_{t=1}^{T} \tilde f_t^\top x_t \le \min_{u \in K} \sum_{t=1}^{T} \tilde f_t^\top u + C.$$

Then the expected regret satisfies

$$E\left( \sum_{t=1}^{T} f_t^\top y_t \right) \le \min_{u \in K} E\left( \sum_{t=1}^{T} f_t^\top u \right) + C.$$

### 5 Self-concordant Functions and the Dikin ellipsoid

Interior-point methods are arguably one of the greatest achievements in the field of Convex Optimization in the past two decades. These iterative polynomial-time algorithms for Convex Optimization find the solution by adding a barrier function to the objective and solving the unconstrained minimization problem. The rough idea is to gradually reduce the weight of the barrier function as one approaches the solution. The construction of barrier functions for general convex sets has been studied extensively, and we refer the reader to [16, 4] for a thorough treatment of the subject. To be more precise, most of the results of this section can be found in [15], pages 22-23, as well as in the aforementioned texts.

#### 5.1 Definitions and Properties

**Definition 4** A self-concordant function $R : \operatorname{int} K \to \mathbb{R}$ is a $C^3$ convex function such that

$$|D^3 R(x)[h, h, h]| \le 2 \left( D^2 R(x)[h, h] \right)^{3/2}.$$

Here, the third-order differential is defined as

$$D^3 R(x)[h_1, h_2, h_3] := \left. \frac{\partial^3}{\partial t_1 \partial t_2 \partial t_3} \right|_{t_1 = t_2 = t_3 = 0} R(x + t_1 h_1 + t_2 h_2 + t_3 h_3).$$
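Definition 4 can be checked numerically on a concrete barrier. The sketch below assumes the standard logarithmic barrier $R(x) = -\sum_i \log(b_i - a_i^\top x)$ of a polytope $\{x : a_i^\top x \le b_i\}$, an example chosen here for illustration. Writing $r_i = a_i^\top h / (b_i - a_i^\top x)$, its differentials along $h$ have the closed forms $D^2 R(x)[h,h] = \sum_i r_i^2$ and $D^3 R(x)[h,h,h] = 2 \sum_i r_i^3$. The function names are hypothetical.

```python
import random


def barrier_differentials(A, b, x, h):
    """Second and third directional differentials of R(x) = -sum(log(b_i - a_i.x))."""
    d2, d3 = 0.0, 0.0
    for a_i, b_i in zip(A, b):
        slack = b_i - sum(aij * xj for aij, xj in zip(a_i, x))
        if slack <= 0.0:
            raise ValueError("x must lie in the interior of the polytope")
        ratio = sum(aij * hj for aij, hj in zip(a_i, h)) / slack
        d2 += ratio ** 2
        d3 += 2.0 * ratio ** 3
    return d2, d3


def is_self_concordant_at(A, b, x, h, rel_tol=1e-9):
    """Check |D^3 R[h,h,h]| <= 2 (D^2 R[h,h])^(3/2), up to rounding error."""
    d2, d3 = barrier_differentials(A, b, x, h)
    return abs(d3) <= 2.0 * d2 ** 1.5 * (1.0 + rel_tol)
```

For the one-dimensional barrier $-\log x$ the inequality holds with equality, which is why the constant 2 in the definition cannot be lowered for this family.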

within $K$ and thus, in particular, so are the points $x_t \pm \lambda_i^{-1/2} e_i$ for each $i$. Assuming the point $y_t := x_t + \lambda_j^{-1/2} e_j$ was sampled and we received the value $f_t^\top y_t$, we then construct the estimate

$$\tilde f_t := n \lambda_j^{1/2} (f_t^\top y_t)\, e_j.$$

Notice it is crucial that we scale by $\lambda_j^{1/2}$, the inverse $\ell_2$ distance between $x_t$ and $y_t$, to ensure that $\tilde f_t$ is unbiased. On the other hand, we see that the divergence is approximately computed as

$$D_R(x_t, x_{t+1}) \approx \eta^2\, \tilde f_t^\top (\nabla^2 R)^{-1} \tilde f_t = \eta^2 n^2 (f_t^\top y_t)^2 \lambda_j \left( e_j^\top (\nabla^2 R)^{-1} e_j \right) = \eta^2 n^2 (f_t^\top y_t)^2.$$

As an interesting and important aside, a necessary requirement of the above analysis is that we construct our estimates $\tilde f_t$ from the eigendirections $e_j$. To see this, imagine that one eigenvalue $\lambda_1$ is very large, while another, $\lambda_2$, is small. This corresponds to a thin and long Dikin ellipsoid, which would occur near a flat boundary. Suppose that instead of eigendirections, we sample at an angle between them. With the thin ellipsoid the sampled points are still close in $\ell_2$ distance, implying that $\tilde f_t$ will be large in both eigendirections. However, the inverse Hessian will only annihilate one of these directions.

### 7 Application to the online shortest path problem

Because of its appealing structure, the online shortest path problem is one of the best studied problems in online optimization. Takimoto and Warmuth [20], and later Kalai and Vempala [11], gave efficient algorithms for the full information setting. Awerbuch and Kleinberg [2] were the first to give an efficient algorithm with $O(T^{2/3})$ regret in the partial information (bandit) setting. The recent work of Dani et al. [6] implies an $O(m^{3/2}\sqrt{T})$-regret algorithm, where $m = |E|$ is the number of edges in the graph. Turning to Algorithm 1, we notice that whenever $K$ is defined by linear constraints, $R$ is defined in a straightforward way (see Section 5.2). As we show below, the online shortest path is an optimization problem on such a set, and we obtain an efficient $O(m^{3/2}\sqrt{T})$-regret algorithm.
Formally, the bandit shortest path problem is defined as the following repeated game: given a directed graph $G = (V, E)$ and a source-sink pair $s, t \in V$, at each time step $t = 1$ to $T$,

- Player chooses a path $p_t \in P_{s,t}$, where $P_{s,t} \subseteq \{E\}^{|V|}$ is the set of all $s,t$-paths in the graph.
- Adversary independently chooses weights on the edges of the graph, $f_t \in \mathbb{R}^m$.
- Player suffers and observes loss, which is the weighted length of the chosen path, $\sum_{e \in p_t} f_t(e)$.

The problem is transformed into an instance of bandit linear optimization by associating each path with a vector $x \in \{0, 1\}^{|E|}$, where $x(i)$ indicates the presence of the $i$th edge. The loss is then defined through the dot product $f^\top x$. Define the set $K$ as the convex hull of the set of paths. It is well known that this set is the set of flows in the graph and can be defined using $O(m)$ constraints: positivity constraints and conservation of in-flow and out-flow for every vertex other than the source and sink (which have unit out-flow and in-flow, respectively). Theorem 1 implies that Algorithm 1 attains $O(m^{3/2}\sqrt{T})$ regret for the bandit linear optimization problem over this set $K$.

However, an astute reader would notice that with this definition of $K$, the algorithm produces a flow $y_t \in K$, not necessarily a path, at each round. The loss suffered by the online player is $f_t^\top y_t$ and the game is specified differently from the bandit shortest path. However, it is easy to convert this flow algorithm into a randomized online shortest path algorithm: according to the standard flow decomposition theorem (see e.g. [17]), a given flow in the graph can be decomposed into a distribution over at most $m + 1$ paths in polynomial time. Hence, given a flow $y_t \in K$, one can obtain an unbiased estimator for $f_t^\top y_t$ by choosing a path according to the distribution of the decomposition, and estimating $f_t^\top y_t$ by the length of this path. In fact, we have the following general statement.
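The flow-to-path conversion described above can be sketched as follows. This is a minimal greedy decomposition, assuming the flow is acyclic, has unit value, and satisfies conservation at every inner vertex; it is not necessarily the procedure of [17]. The names `decompose_flow` and `sample_path` and the dictionary representation of a flow are hypothetical.

```python
import random


def decompose_flow(flow, source, sink, tol=1e-12):
    """Split a unit s-t flow into weighted paths.

    flow maps each edge (u, v) to a nonnegative amount; the flow is
    assumed to be acyclic and to satisfy conservation at inner vertices.
    Returns a list of (path, weight) pairs, each path a list of edges.
    """
    residual = dict(flow)
    paths = []
    while True:
        # Walk from the source along edges that still carry flow.
        path, node = [], source
        while node != sink:
            step = next((e for e, amount in residual.items()
                         if e[0] == node and amount > tol), None)
            if step is None:
                break
            path.append(step)
            node = step[1]
        if node != sink:
            return paths            # no flow left to route
        weight = min(residual[e] for e in path)
        for e in path:
            residual[e] -= weight   # the bottleneck edge drops to zero
        paths.append((path, weight))


def sample_path(paths, rng):
    """Draw one path with probability proportional to its weight."""
    weights = [w for _, w in paths]
    return rng.choices([p for p, _ in paths], weights=weights, k=1)[0]
```

Each iteration zeroes at least one edge, so the loop ends after at most $m$ paths. Because the path weights rebuild the flow edge by edge, the expected length of a sampled path equals $f_t^\top y_t$; sampling a path this way is one instance of the randomized prediction covered by the general statement that follows.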
**Proposition 1** Suppose that, having computed $y_t$ in step 6 of Algorithm 1, we predict a random $\bar y_t \in K$ such that $E \bar y_t = y_t$, and in step 7 observe $f_t^\top \bar y_t$. If we use this observed value instead of $f_t^\top y_t$ in step 8, the expected regret of the modified algorithm is the same as that of Algorithm 1.

The proposition implies that the modified algorithm attains low regret for games defined over discrete sets of possible predictions for the player. This is achieved by working with the convex hull of the discrete set while predicting in the original set. In particular, the modification allows us to predict a legal path while the algorithm works with the set of flows. The proof of Proposition 1 is straightforward: following closely the proof of Theorem 1, we observe that the value $f_t^\top y_t$ is used in only two places. The first is in Equation (9), where it is upper-bounded by 1, and the second is in the proof of the fact that $\tilde f_t$ is unbiased.

### 8 Proof of the regret bound

#### 8.1 Unbiasedness

First, we show that $E \tilde f_t = f_t$. Condition on the choice of $i_t$ and average over the choice of $\varepsilon_t$:

$$E_{\varepsilon_t} \tilde f_t = \frac{1}{2}\, n \left( f_t^\top \left( x_t + \lambda_{i_t}^{-1/2} e_{i_t} \right) \right) \lambda_{i_t}^{1/2} e_{i_t} - \frac{1}{2}\, n \left( f_t^\top \left( x_t - \lambda_{i_t}^{-1/2} e_{i_t} \right) \right) \lambda_{i_t}^{1/2} e_{i_t} = n (f_t^\top e_{i_t})\, e_{i_t}.$$

Hence,

$$E \tilde f_t = n \left( E_{i_t}\, e_{i_t} e_{i_t}^\top \right) f_t = f_t.$$

Furthermore, $E y_t = x_t$.
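The unbiasedness computation can be verified by exact enumeration in a small case. The sketch below assumes the box $[0,1]^n$ with the barrier $-\sum_i (\log x_i + \log(1 - x_i))$, a choice made here because its Hessian is diagonal and the eigenvectors are the coordinate axes. It averages the estimate and the played point over all $2n$ equally likely draws of $(i_t, \varepsilon_t)$. The function name is hypothetical.

```python
import math


def expected_estimate(f, x):
    """Exact mean of the estimate and of the played point over all (i, eps)."""
    n = len(x)
    lam = [1.0 / xi ** 2 + 1.0 / (1.0 - xi) ** 2 for xi in x]
    prob = 1.0 / (2 * n)                 # each (i, eps) pair is equally likely
    mean_estimate = [0.0] * n
    mean_play = [0.0] * n
    for i in range(n):
        for eps in (-1.0, 1.0):
            y = list(x)
            y[i] += eps / math.sqrt(lam[i])
            value = sum(fj * yj for fj, yj in zip(f, y))  # observed f.y
            # The estimate is supported on coordinate i only.
            mean_estimate[i] += prob * n * value * eps * math.sqrt(lam[i])
            for j in range(n):
                mean_play[j] += prob * y[j]
    return mean_estimate, mean_play
```

Calling `expected_estimate([0.3, -0.2, 0.5], [0.2, 0.5, 0.9])` should return the loss vector and the point themselves, up to rounding error.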

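#### 8.2 Closeness of the next minimum

We now use the properties of the Dikin ellipsoids mentioned in the previous section.

**Lemma 6** The next minimizer $x_{t+1}$ is close to $x_t$:

$$x_{t+1} \in W_{4n\eta}(x_t).$$

Proof: Recall that

$$x_{t+1} = \arg\min_{x \in K} \Phi_t(x) \quad \text{and} \quad x_t = \arg\min_{x \in K} \Phi_{t-1}(x),$$

where $\Phi_t(x) = \eta \sum_{s=1}^{t} \tilde f_s^\top x + R(x)$. Since $\nabla \Phi_{t-1}(x_t) = 0$, we conclude that $\nabla \Phi_t(x_t) = \eta \tilde f_t$. Consider any point $z \in W_{1/2}(x_t)$. It can be written as $z = x_t + \alpha u$ for some vector $u$ such that $\|u\|_{x_t} = 1$ and $\alpha \in (-\frac{1}{2}, \frac{1}{2})$. Expanding,

$$\Phi_t(z) = \Phi_t(x_t + \alpha u) = \Phi_t(x_t) + \alpha \nabla \Phi_t(x_t)^\top u + \tfrac{1}{2} \alpha^2 u^\top \nabla^2 \Phi_t(\xi) u = \Phi_t(x_t) + \alpha \eta \tilde f_t^\top u + \tfrac{1}{2} \alpha^2 u^\top \nabla^2 \Phi_t(\xi) u$$

for some $\xi$ on the path between $x_t$ and $x_t + \alpha u$. Let us check where the optimum of the RHS is obtained. Setting the derivative with respect to $\alpha$ to zero, we obtain

$$|\alpha^*| = \frac{\eta |\tilde f_t^\top u|}{u^\top \nabla^2 \Phi_t(\xi) u} = \frac{\eta |\tilde f_t^\top u|}{u^\top \nabla^2 R(\xi) u}.$$

The fact that $\xi$ is on the line from $x_t$ to $x_t + \alpha u$ implies that $\|\xi - x_t\|_{x_t} \le \|\alpha u\|_{x_t} < \frac{1}{2}$. Hence, by Eq. (5),

$$\nabla^2 R(\xi) \succeq (1 - \|\xi - x_t\|_{x_t})^2\, \nabla^2 R(x_t) \succ \tfrac{1}{4} \nabla^2 R(x_t).$$

Thus $u^\top \nabla^2 R(\xi) u > \frac{1}{4} \|u\|_{x_t}^2 = \frac{1}{4}$, and hence $|\alpha^*| < 4 \eta |\tilde f_t^\top u|$. Recall that $\tilde f_t = n (f_t^\top y_t) \varepsilon_t \lambda_{i_t}^{1/2} e_{i_t}$, and so $\tilde f_t^\top u$ is maximized or minimized when $u$ is a unit vector (with respect to $\|\cdot\|_{x_t}$) in the direction of $e_{i_t}$, i.e. $u = \pm \lambda_{i_t}^{-1/2} e_{i_t}$. We conclude that

$$|\tilde f_t^\top u| \le n |f_t^\top y_t| \le n \tag{9}$$

and $|\alpha^*| < 4 n \eta < \frac{1}{2}$ by our choice of $\eta$ and $T$. We conclude that the local optimum $\arg\min_{z \in W_{1/2}(x_t)} \Phi_t(z)$ is strictly inside $W_{4n\eta}(x_t)$, and since $\Phi_t$ is convex, the global optimum is

$$x_{t+1} = \arg\min_{z \in K} \Phi_t(z) \in W_{4n\eta}(x_t).$$

Figure 1: The Dikin ellipsoid $W_1(x_t)$ at $x_t$. The next minimum is guaranteed to lie in its scaled version $W_{4n\eta}(x_t)$.

#### 8.3 Proof of Theorem 1

We are now ready to prove the regret bound for Algorithm 1. Since $x_{t+1} \in W_{4n\eta}(x_t)$, we invoke Eq. (6) at $x = x_t$ and $z - x = h = x_{t+1} - x_t$:

$$h^\top \left( \nabla R(x_{t+1}) - \nabla R(x_t) \right) \le \frac{\|h\|_{x_t}^2}{1 - \|h\|_{x_t}}.$$

Observe that $x_{t+1} \in W_{4n\eta}(x_t)$ implies $\|h\|_{x_t} < 4 n \eta$. The proof of Lemma 2 (Equation (12) in the Appendix) reveals that $\nabla R(x_t) - \nabla R(x_{t+1}) = \eta \tilde f_t$. We have

$$\tilde f_t^\top (x_t - x_{t+1}) = \eta^{-1} h^\top \left( \nabla R(x_{t+1}) - \nabla R(x_t) \right) \le \eta^{-1} \frac{\|h\|_{x_t}^2}{1 - \|h\|_{x_t}} \le \frac{16 n^2 \eta}{1 - 4 n \eta} \le 32 n^2 \eta. \tag{10}$$

By Lemma 2, for any $u \in K_{1/\sqrt{T}}$,

$$\sum_{t=1}^{T} \tilde f_t^\top (x_t - u) \le \eta^{-1} D_R(u, x_1) + \sum_{t=1}^{T} \tilde f_t^\top (x_t - x_{t+1}) \le \eta^{-1} D_R(u, x_1) + 32 n^2 \eta T = \eta^{-1} (R(u) - R(x_1)) + 32 n^2 \eta T \le \frac{1}{\eta} (2 \vartheta \log T) + 32 n^2 \eta T,$$

where the first equality follows since $\nabla R(x_1) = 0$, by the choice of $x_1$; the last inequality follows from Equation (7). Balancing with $\eta = \frac{\sqrt{\vartheta \log T}}{4 n \sqrt{T}}$, we get

$$\sum_{t=1}^{T} \tilde f_t^\top (x_t - u) \le 16 n \sqrt{\vartheta T \log T}$$

for any $u$ in the scaled set $K'$. Using Lemma 3, which we prove below, we obtain the statement of Theorem 1.

#### 8.4 Expected Regret

Note that it is not $\tilde f_t^\top x_t$ that the algorithm should be incurring, but rather $f_t^\top y_t$. However, it is easy to see that these are equal in expectation.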
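The balancing step in the proof of Theorem 1 above can be made explicit. The calculation below is a routine one added for the reader, not part of the original proof; $g$ is a name introduced here for the right-hand side of the final bound, viewed as a function of $\eta$.

```latex
\begin{aligned}
g(\eta) &= \frac{2\vartheta \log T}{\eta} + 32 n^2 \eta T, \\
g'(\eta) = 0
  \;&\Longleftrightarrow\; \eta^2 = \frac{2\vartheta \log T}{32 n^2 T}
     = \frac{\vartheta \log T}{16 n^2 T}
  \;\Longleftrightarrow\; \eta = \frac{\sqrt{\vartheta \log T}}{4 n \sqrt{T}}, \\
g\!\left(\frac{\sqrt{\vartheta \log T}}{4 n \sqrt{T}}\right)
  &= 8 n \sqrt{\vartheta T \log T} + 8 n \sqrt{\vartheta T \log T}
   = 16 n \sqrt{\vartheta T \log T}.
\end{aligned}
```

With this choice, $4 n \eta = \sqrt{\vartheta \log T / T}$, which is below $\frac{1}{2}$ as soon as $T > 4 \vartheta \log T$, so the condition $T > 8 \vartheta \log T$ of Theorem 1 is enough for the step $4 n \eta < \frac{1}{2}$ used in Lemma 6.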

The functions $\hat f_t$ computed by the above algorithm are unbiased estimates of $f_t$ constructed by sampling eigenvectors of $\nabla^2 R(z_t)$. Define the Follow The Regularized Leader solutions

$$\hat x_{t+1} \equiv \arg\min_{x \in K} \Psi_t(x)$$

on the new functions $\hat f_t$. The sequence $\{\hat x_t, \hat f_t\}$ is different from the sequence $\{x_t, \tilde f_t\}$ generated by Algorithm 1. However, the same regret bound can be proved for the new algorithm. The only difference from the proof for Algorithm 1 is in the fact that the $\hat f_t$ are estimated using the Hessian at $z_t$, not $\hat x_t$. However, as we show next, $z_t$ is very close to $\hat x_t$, and therefore the Hessians are within a factor of 2 by Equation (5), leading to a slightly worse constant for the regret.

**Lemma 7** It holds that for all $t$,

$$\lambda^2(\Psi_t, z_t) \le 4 n^2 \eta^2.$$

Proof: The proof is by induction on $t$. For $t = 1$ the result is true because $x_1$ is chosen to minimize $R$. Suppose the statement holds for $t - 1$. By definition,

$$\lambda^2(\Psi_t, z_t) = \nabla \Psi_t(z_t)^\top [\nabla^2 \Psi_t(z_t)]^{-1} \nabla \Psi_t(z_t) = \nabla \Psi_t(z_t)^\top [\nabla^2 R(z_t)]^{-1} \nabla \Psi_t(z_t).$$

Note that $\nabla \Psi_t(z_t) = \nabla \Psi_{t-1}(z_t) + \eta \hat f_t$. Using $(x + y)^\top A (x + y) \le 2 x^\top A x + 2 y^\top A y$ we obtain

$$\tfrac{1}{2} \lambda^2(\Psi_t, z_t) \le \nabla \Psi_{t-1}(z_t)^\top [\nabla^2 R(z_t)]^{-1} \nabla \Psi_{t-1}(z_t) + \eta^2 \hat f_t^\top [\nabla^2 R(z_t)]^{-1} \hat f_t = \lambda^2(\Psi_{t-1}, z_t) + \eta^2 \hat f_t^\top [\nabla^2 R(z_t)]^{-1} \hat f_t.$$

The first term can be bounded by fact (B) and using the induction hypothesis,

$$\lambda^2(\Psi_{t-1}, z_t) \le 4 \lambda^4(\Psi_{t-1}, z_{t-1}) \le 64 n^4 \eta^4. \tag{11}$$

As for the second term, $\hat f_t^\top [\nabla^2 R(z_t)]^{-1} \hat f_t \le n^2$ because of the way $\hat f_t$ is defined and since $|f_t^\top y_t| \le 1$ by assumption. Combining the results,

$$\lambda^2(\Psi_t, z_t) \le 128 n^4 \eta^4 + 2 n^2 \eta^2 \le 4 n^2 \eta^2$$

using the definition of $\eta$ in Theorem 1 and $T$ large enough. This proves the induction step.

Note that Equation (11) with the choice of $\eta$ and $T$ large enough implies $\lambda^2(\Psi_{t-1}, z_t) \ll \frac{1}{2}$. Using this together with the above Lemma and facts (B) and (C), we conclude that

$$\|z_t - \hat x_t\|_{\hat x_t} \le 2 \lambda(\Psi_{t-1}, z_t) \le 4 \lambda^2(\Psi_{t-1}, z_{t-1}) \le 16 n^2 \eta^2.$$

We observe that $\hat x_t$ and $z_t$ are very close in the local distance. This implies closeness in $\ell_2$ distance as well.
Indeed, the square roots of the inverse eigenvalues $\lambda_i^{-1/2}$, being the distances from $\hat x_t$ to the corresponding radii of the Dikin ellipsoid, can be at most $D$. Thus, $\nabla^2 R \succeq D^{-2} I$ and thus

$$\|z_t - \hat x_t\|_2 \le D\, \|z_t - \hat x_t\|_{\hat x_t} \le 16 D n^2 \eta^2.$$

As we proved, it requires only one Damped Newton update to maintain the sequence $z_t$, which are $O(1/T)$ close to $\hat x_t$. Hence,

$$|f_t^\top (z_t - \hat x_t)| \le \|f_t\| \cdot \|z_t - \hat x_t\| = O(1/T).$$

Therefore, for any $u \in K$,

$$E \sum_{t=1}^{T} f_t^\top (y_t - u) = E \sum_{t=1}^{T} \hat f_t^\top (z_t - u) = E \sum_{t=1}^{T} \hat f_t^\top (\hat x_t - u) + E \sum_{t=1}^{T} \hat f_t^\top (z_t - \hat x_t) = E \sum_{t=1}^{T} \hat f_t^\top (\hat x_t - u) + E \sum_{t=1}^{T} f_t^\top (z_t - \hat x_t) = E \sum_{t=1}^{T} \hat f_t^\top (\hat x_t - u) + O(1).$$

A slight modification of the proofs of Section 8 leads to an $O(\sqrt{T})$ bound on the expected regret of the sequence $\{\hat x_t\}$.

**Acknowledgments.** We would like to thank Peter Bartlett for numerous illuminating discussions. We gratefully acknowledge the support of DARPA under grant FA and NSF under grant DMS

### References

[1] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77.

[2] Baruch Awerbuch and Robert D. Kleinberg. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In STOC '04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 45-53, New York, NY, USA. ACM.

[3] P. Bartlett, V. Dani, T. Hayes, S. Kakade, A. Rakhlin, and A. Tewari. High-probability bounds for the regret of bandit online linear optimization. In submission to COLT.

[4] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, volume 2 of MPS/SIAM Series on Optimization. SIAM, Philadelphia.

[5] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press.

[6] Varsha Dani, Thomas Hayes, and Sham Kakade. The price of bandit information for online optimization. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.
