Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
Journal of Machine Learning Research (2012). Submitted 8/11; Revised 3/12; Published 9/12

Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

Mehrdad Mahdavi, Rong Jin, Tianbao Yang
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA

Editor: Shie Mannor

Abstract

In this paper we propose efficient algorithms for solving constrained online convex optimization problems. Our motivation stems from the observation that most algorithms proposed for online convex optimization require a projection onto the convex set K from which the decisions are made. While the projection is straightforward for simple shapes (e.g., the Euclidean ball), for arbitrary complex sets it is the main computational challenge and may be inefficient in practice. In this paper, we consider an alternative online convex optimization problem. Instead of requiring that decisions belong to K for all rounds, we only require that the constraints, which define the set K, be satisfied in the long run. By turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm which achieves an O(√T) regret bound and an O(T^{3/4}) bound on the violation of constraints. Then, we modify the algorithm in order to guarantee that the constraints are satisfied in the long run. This gain is achieved at the price of an O(T^{3/4}) regret bound. Our second algorithm is based on the mirror prox method (Nemirovski, 2005) for solving variational inequalities and achieves an O(T^{2/3}) bound for both regret and the violation of constraints when the domain K can be described by a finite number of linear constraints. Finally, we extend the results to the setting where we only have partial access to the convex set K and propose a multipoint bandit feedback algorithm with the same bounds in expectation as our first algorithm.

Keywords: online convex optimization, convex-concave optimization, bandit feedback, variational inequality

1. Introduction

Online convex optimization has recently emerged as a primitive framework for designing efficient algorithms for a wide variety of machine learning applications (Cesa-Bianchi and Lugosi, 2006). In general, an online convex optimization problem can be formulated as a repeated game between a learner and an adversary: at each iteration t, the learner first presents a solution x_t ∈ K, where K ⊆ R^d is a convex domain representing the solution space; it then receives a convex function f_t(x): K → R_+ and suffers the loss f_t(x_t) for the submitted solution x_t. The objective of the learner is to generate a sequence of solutions x_t ∈ K, t = 1, 2, ..., T, that minimizes the regret R_T defined as

© 2012 Mehrdad Mahdavi, Rong Jin and Tianbao Yang.
R_T = Σ_{t=1}^T f_t(x_t) − min_{x∈K} Σ_{t=1}^T f_t(x).   (1)

Regret measures the difference between the cumulative loss of the learner's strategy and the minimum possible loss had the sequence of loss functions been known in advance and the learner could choose the best fixed action in hindsight. When R_T is sub-linear in the number of rounds T, that is, o(T), we call the solution Hannan consistent (Cesa-Bianchi and Lugosi, 2006), implying that the learner's average per-round loss approaches the average per-round loss of the best fixed action in hindsight. It is noticeable that the performance bound must hold for any sequence of loss functions, and in particular if the sequence is chosen adversarially.

Many successful algorithms have been developed over the past decade to minimize the regret in online convex optimization. The problem was initiated in the remarkable work of Zinkevich (2003), which presents an algorithm based on gradient descent with projection that guarantees a regret of O(√T) when the set K is convex and the loss functions are Lipschitz continuous within the domain K. In Hazan et al. (2007) and Shalev-Shwartz and Kakade (2008), algorithms with logarithmic regret bounds were proposed for strongly convex loss functions. In particular, the algorithm in Hazan et al. (2007) is based on the online Newton step and covers the general class of exp-concave loss functions. Notably, the simple gradient based algorithm also achieves an O(log T) regret bound for strongly convex loss functions with an appropriately chosen step size. Bartlett et al. (2007) generalizes the results in previous works to the setting where the algorithm can adapt to the curvature of the loss functions without any prior information. A modern view of these algorithms casts the problem as the task of following the regularized leader (Rakhlin, 2009). In Abernethy et al. (2009), using game-theoretic analysis, it has been shown that both the O(√T) bound for Lipschitz continuous loss functions and the O(log T) bound for strongly convex loss functions are tight in the minimax sense.

Examining the existing algorithms, most of the techniques require a projection step at each iteration in order to get back to the feasible region. For the performance of these online algorithms, the computational cost of the projection step is of crucial importance. To motivate the setting addressed in this paper, let us first examine a popular online learning algorithm for minimizing the regret R_T based on the online gradient descent (OGD) method (Zinkevich, 2003). At each iteration t, after receiving the convex function f_t(x), the learner computes the gradient ∇f_t(x_t) and updates the solution x_t by solving the following optimization problem

x_{t+1} = Π_K(x_t − η∇f_t(x_t)) = argmin_{x∈K} ‖x − x_t + η∇f_t(x_t)‖²,   (2)

where Π_K(·) denotes the projection onto K and η > 0 is a predefined step size. Despite the simplicity of the OGD algorithm, the computational cost per iteration is crucial for its applicability. For general convex domains, solving the optimization problem in (2) is an offline convex optimization problem by itself and can be computationally expensive. For example, when one envisions a positive semidefinite cone in applications such as distance metric learning and matrix completion, the full eigen-decomposition of a matrix is required to project the updated solution back onto the cone. Recently, several efficient algorithms have been developed for projection onto specific domains, for example, the l_1 ball (Duchi et al., 2008; Liu and Ye, 2009); however, when the domain K is complex, the projection step is a more involved task and may be computationally burdensome.
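As a concrete illustration of the update (2), the sketch below performs a single projected OGD step; the projection is written for the simple case K = {x : ‖x‖₂ ≤ R}, where it has a closed form, which is exactly the situation in which projection is cheap. The function names and the choice of domain are illustrative assumptions, not code from the paper.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x||_2 <= radius} (closed form)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def ogd_step(x_t, grad_t, eta, project=project_l2_ball):
    """One OGD update x_{t+1} = Pi_K(x_t - eta * grad f_t(x_t)), as in (2)."""
    return project(x_t - eta * grad_t)
```

For domains without such a closed-form projection, the `project` argument would itself have to solve a constrained optimization problem, which is the cost the paper seeks to avoid.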
To tackle the computational challenge arising from the projection step, we consider an alternative online learning problem. Instead of requiring x_t ∈ K, we only require the constraints, which define the convex domain K, to be satisfied in the long run. Then, the online learning problem becomes the task of finding a sequence of solutions x_t, t ∈ [T], that minimizes the regret defined in (1) under the long term constraint, that is, Σ_{t=1}^T x_t / T ∈ K. We refer to this problem as online learning with long term constraints. In other words, instead of solving the projection problem in (2) on each round, we allow the learner to make decisions at some iterations which do not belong to the set K, but the overall sequence of chosen decisions must obey the constraints at the end at a vanishing rate.

From a different perspective, the proposed online optimization with long term constraints setup is reminiscent of regret minimization with side constraints, or constrained regret minimization, addressed in Mannor and Tsitsiklis (2006) and motivated by applications in wireless communication. In regret minimization with side constraints, beyond minimizing regret, the learner has some side constraints that need to be satisfied on average over all rounds. Unlike our setting, in learning with side constraints the set K is controlled by the adversary and can vary arbitrarily from trial to trial. It has been shown that if the convex set is affected by both decisions and loss functions, the minimax optimal regret is generally unattainable online (Mannor et al., 2009). One interesting application of constrained regret minimization is multi-objective online classification, where the learner aims at simultaneously optimizing more than one classification performance criterion. In the simple two-objective online classification considered in Bernstein et al. (2010), the goal of the online classifier is to maximize the average true positive classification rate with an additional performance guarantee in terms of the false positive rate. Following the Neyman-Pearson risk, the intuitive approach to this problem is to optimize one criterion (i.e., maximize the true positive rate) subject to an explicit constraint on the other objective (i.e., the false positive rate) that needs to be satisfied on average over the sequence of decisions. The constrained regret matching (CRM) algorithm, proposed in Bernstein et al. (2010), efficiently solves this problem by relaxing the objective under mild assumptions on the single-stage constraint. The main idea of the CRM algorithm is to incorporate the penalty that the learner should pay to satisfy the constraint into the objective (i.e., the true positive rate) by subtracting a positive constant at each decision step. It has been shown that the CRM algorithm asymptotically satisfies the average constraint (i.e., the false positive rate) provided that the relaxation constant is above a certain threshold.

Finally, it is worth mentioning that the proposed setting can be used in certain classes of online learning such as online-to-batch conversion (Cesa-Bianchi et al., 2004), where it is sufficient to guarantee that the constraints are satisfied in the long run. More specifically, under the assumption that the received examples are i.i.d. samples, the solution for batch learning is obtained by averaging the solutions over all the trials. As a result, if the long term constraint is satisfied, it is guaranteed that the average solution will belong to the domain K.
In this paper, we describe and analyze a general framework for solving online convex optimization with long term constraints. We first show that a direct application of OGD fails to achieve both a sub-linear bound on the violation of constraints and an O(√T) bound on the regret. Then, by turning the problem into an online convex-concave optimization problem, we propose an efficient algorithm which is an adaptation of OGD for online learning with long term constraints. The proposed algorithm achieves the same O(√T) regret bound as the general setting and an O(T^{3/4}) bound on the violation of constraints. We show that by using a simple trick we can turn the proposed method into an algorithm which exactly satisfies the constraints in the long run, at the price of an O(T^{3/4}) regret bound.
When the convex domain K can be described by a finite number of linear constraints, we propose an alternative algorithm based on the mirror prox method (Nemirovski, 2005), which achieves an O(T^{2/3}) bound for both regret and the violation of constraints. Our framework also handles the case where we do not have full access to the domain K except through a limited number of oracle evaluations. In the full-information version, the decision maker can observe the entire convex domain K, whereas in a partial-information (a.k.a. bandit) setting the decision maker may only observe the cost of the constraints defining the domain K at a limited number of points. We show that we can generalize the proposed OGD based algorithm to this setting by accessing the value oracle for the domain K at only two points, which achieves the same bounds in expectation as the case with full knowledge of the domain K. In summary, the present work makes the following contributions:

- A general theorem showing that, in the online setting, a simple penalty based method attains a linear bound O(T) on either the regret or the long term violation of the constraints, and fails to achieve sub-linear bounds for both at the same time.
- A convex-concave formulation of online convex optimization with long term constraints, and an efficient algorithm based on OGD that attains a regret bound of O(T^{1/2}) and an O(T^{3/4}) violation of the constraints.
- A modified OGD based algorithm for online convex optimization with long term constraints that has no constraint violation but an O(T^{3/4}) regret bound.
- An algorithm for online convex optimization with long term constraints based on the mirror prox method that achieves O(T^{2/3}) regret and constraint violation.
- A multipoint bandit version of the basic algorithm with an O(T^{1/2}) regret bound and an O(T^{3/4}) violation of the constraints in expectation, obtained by accessing the value oracle for the convex set K at two points.

The remainder of the paper is structured as follows. In Section 3, we first examine a simple penalty based strategy and show that it fails to attain sub-linear bounds for both regret and long term violation of the constraints. Then, we formulate regret minimization as an online convex-concave optimization problem and apply the OGD algorithm to solve it. Our first algorithm allows the constraints to be violated in a controlled way. It is then modified to have the constraints exactly satisfied in the long run. Section 4 presents our second algorithm, which is an adaptation of the mirror prox method. Section 5 generalizes the online convex optimization with long term constraints problem to the setting where we only have partial access to the convex domain K. Section 6 concludes the work with a list of open questions.

2. Notation and Setting

Before proceeding, we define the notation used throughout the paper and state the assumptions made for the analysis of the algorithms. Vectors are shown by lower case bold letters, such as x ∈ R^d. Matrices are indicated by upper case letters such as A, and their pseudoinverse is represented by A†. We use [m] as a shorthand for the set of integers {1, 2, ..., m}. Throughout the paper we denote by ‖·‖ and ‖·‖_1 the l_2 (Euclidean) norm and the l_1 norm, respectively. We use E and E_t to
denote the expectation and the conditional expectation with respect to all randomness in the first t − 1 trials, respectively.

To facilitate our analysis, we assume that the domain K can be written as an intersection of a finite number of convex constraints, that is, K = {x ∈ R^d : g_i(x) ≤ 0, i ∈ [m]}, where g_i(·), i ∈ [m], are Lipschitz continuous functions. Like many other works on online convex optimization, such as Flaxman et al. (2005), we assume that K is a bounded domain, that is, there exist constants R > 0 and r < 1 such that K ⊆ R·B₀ and r·B₀ ⊆ K, where B₀ denotes the unit l_2 ball centered at the origin. For ease of notation, we use B = R·B₀.

We focus on the problem of online convex optimization, in which the goal is to achieve a low regret with respect to a fixed decision on a sequence of loss functions. The difference between the setting considered here and general online convex optimization is that, in our setting, instead of requiring x_t ∈ K, or equivalently g_i(x_t) ≤ 0, i ∈ [m], for all t ∈ [T], we only require the constraints to be satisfied in the long run, namely Σ_{t=1}^T g_i(x_t) ≤ 0, i ∈ [m]. Then, the problem becomes finding a sequence of solutions x_t, t ∈ [T], that minimizes the regret defined in (1) under the long term constraints Σ_{t=1}^T g_i(x_t) ≤ 0, i ∈ [m]. Formally, we would like to solve the following optimization problem online,

min_{x_1,...,x_T ∈ B}  Σ_{t=1}^T f_t(x_t) − min_{x∈K} Σ_{t=1}^T f_t(x)
s.t.  Σ_{t=1}^T g_i(x_t) ≤ 0,  i ∈ [m].   (3)

For simplicity, we will focus on a finite-horizon setting where the number of rounds T is known in advance. This condition can be relaxed under certain conditions, using standard techniques (see, e.g., Cesa-Bianchi and Lugosi, 2006). Note that in (3), (i) the solutions come from the ball B ⊇ K instead of K, and (ii) the constraint functions are fixed and given in advance. Like most online learning algorithms, we assume that both the loss functions and the constraint functions are Lipschitz continuous, that is, there exist constants L_f and L_g such that |f_t(x) − f_t(x')| ≤ L_f ‖x − x'‖ and |g_i(x) − g_i(x')| ≤ L_g ‖x − x'‖ for any x, x' ∈ B, i ∈ [m]. For simplicity of analysis, we use G = max{L_f, L_g} and

F = max_{t∈[T]} max_{x,x'∈K} f_t(x) − f_t(x') ≤ 2 L_f R,
D = max_{i∈[m]} max_{x∈B} g_i(x) ≤ L_g R.

Finally, we define the notion of a Bregman divergence. Let φ(·) be a strictly convex function defined on a convex set K. The Bregman divergence between x and x' is defined as B_φ(x, x') = φ(x) − φ(x') − (x − x')ᵀ∇φ(x'), which measures how much the function φ(·) deviates at x from its linear approximation at x'.

3. Online Convex Optimization with Long Term Constraints

In this section we present and analyze our gradient descent based algorithms for the online convex optimization problem with long term constraints. We first describe an algorithm which is allowed to violate the constraints and then, by applying a simple trick, we propose a variant of the first algorithm which exactly satisfies the constraints in the long run.
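Before turning to the algorithms, the two quantities that problem (3) asks us to control, the regret against the best fixed point in K and the cumulative violation of each constraint, can be made concrete with the short sketch below; the best fixed comparator x_star is passed in rather than computed, and all names are illustrative.

```python
import numpy as np

def evaluate(plays, losses, constraints, x_star):
    """plays: list of played points x_t; losses: list of callables f_t;
    constraints: list of callables g_i; x_star: best fixed point in K.
    Returns the regret and the per-constraint cumulative violations from (3)."""
    regret = sum(f(x) for f, x in zip(losses, plays)) - sum(f(x_star) for f in losses)
    violations = [sum(g(x) for x in plays) for g in constraints]
    return regret, violations
```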
6 MAHDAVI, JIN AND YANG Before we state our forulation and algoriths, let us review a few alternative techniques that do not need explicit projection. A straightforward approach is to introduce an appropriate selfconcordant barrier function for the given convex set K and add it to the objective function such that the barrier diverges at the boundary of the set. hen we can interpret the resulting optiization proble, on the odified objective functions, as an unconstrained iniization proble that can be solved without projection steps. Following the analysis in Abernethy et al. 2012), with an appropriately designed procedure for updating solutions, we could guarantee a regret bound of O ) without the violation of constraints. A siilar idea is used in Abernethy et al. 2008) for online bandit learning and in Narayanan and Rakhlin 2010) for a rando walk approach for regret iniization which, in fact, translates the issue of projection into the difficulty of sapling. Even for linear Lipschitz cost functions, the rando walk approach requires sapling fro a Gaussian distribution with covariance given by the Hessian of the self-concordant barrier of the convex set K that has the sae tie coplexity as inverting a atrix. he ain liitation with these approaches is that they require coputing the Hessian atrix of the objective function in order to guarantee that the updated solution stays within the given doain K. his liitation akes it coputationally unattractive when dealing with high diensional data. In addition, except for well known cases, it is often unclear how to efficiently construct a self-concordant barrier function for a general convex doain. An alternative approach for online convex optiization with long ter constraints is to introduce a penalty ter in the loss function that penalizes the violation of constraints. More specifically, we can define a new loss function ˆf t ) as ˆf t x)= f t x)+δ [g i x)] +, 4) where [z] + = ax0,1 z) and δ>0 is a fixed positive constant used to penalize the violation of constraints. We then run the standard OGD algorith to iniize the odified loss function ˆf t ). he following theore shows that this siple strategy fails to achieve sub-linear bound for both regret and the long ter violation of constraints at the sae tie. heore 1 Given δ > 0, there always exists a sequence of loss functions { f t x)} and a constraint function gx) such that either f tx t ) in gx) 0 f tx)=o) or [gx t)] + = O) holds, where{x t } is the sequence of solutions generated by the OGD algorith that iniizes the odified loss functions given in 4). We defer the proof to Appendix A along with a siple analysis of the OGD when applied to the odified functions in 4). he analysis shows that in order to obtain O ) regret bound, linear bound on the long ter violation of the constraints is unavoidable. he ain reason for the failure of using odified loss function in 4) is that the weight constant δ is fixed and independent fro the sequence of solutions obtained so far. In the next subsection, we present an online convex-concave forulation for online convex optiization with long ter constraints, which explicitly addresses the liitation of 4) by autoatically adjusting the weight constant based on the violation of the solutions obtained so far. As entioned before, our general strategy is to turn online convex optiization with long ter constraints into a convex-concave optiization proble. 
Instead of generating a sequence of solutions that satisfies the long term constraints, we first consider an online optimization strategy that
7 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS allows the violation of constraints on soe rounds in a controlled way. We then odify the online optiization strategy to obtain a sequence of solutions that obeys the long ter constraints. Although the online convex optiization with long ter constraints is clearly easier than the standard online convex optiization proble, it is straightforward to see that optial regret bound for online optiization with long ter constraints should be on the order of O ), no better than the standard online convex optiization proble. 3.1 An Efficient Algorith with O ) Regret Bound and O 3/4 ) Bound on the Violation of Constraints he intuition behind our approach stes fro the observation that the constrained optiization proble in x K f tx) is equivalent to the following convex-concave optiization proble in ax x B λ R + f t x)+ λ i g i x), 5) where λ = λ 1,...,λ ) is the vector of Lagrangian ultipliers associated with the constraints g i ),,..., and belongs to the nonnegative orthantr +. o solve the online convex-concave optiization proble, we extend the gradient based approach for variational inequality Neirovski, 1994) to 5). o this end, we consider the following regularized convex-concave function as L t x,λ)= f t x)+ { λ i g i x) δη 2 λ2 i }, 6) where δ>0 is a constant whose value will be decided by the analysis. Note that in 6), we introduce a regularizer δηλ 2 i /2 to prevent λ i fro being too large. his is because, when λ i is large, we ay encounter a large gradient for x because of x L t x,λ) λ i g i x), leading to unstable solutions and a poor regret bound. Although we can achieve the sae goal by restricting λ i to a bounded doain, using the quadratic regularizer akes it convenient for our analysis. Algorith 1 shows the detailed steps of the proposed algorith. Unlike standard online convex optiization algoriths that only update x, Algorith 1 updates both x and λ. In addition, unlike the odified loss function in 4) where the weights for constraints{g i x) 0} are fixed, Algorith 1 autoatically adjusts the weights {λ i } based on {g ix)}, the violation of constraints, as the gae proceeds. It is this property that allows Algorith 1 to achieve sub-linear bound for both regret and the violation of constraints. o analyze Algorith 1, we first state the following lea, the key to the ain theore on the regret bound and the violation of constraints. Lea 2 Let L t, ) be the function defined in 6) which is convex in its first arguent and concave in its second arguent. hen for any x,λ) B R + we have L t x t,λ) L t x,λ t ) 1 x x t 2 + λ λ t 2 x x t+1 2 λ λ t+1 2 ) + η 2 xl t x t,λ t ) 2 + λ L t x t,λ t ) 2 ). Proof Following the analysis of Zinkevich 2003), convexity of L t,λ) iplies that L t x t,λ t ) L t x,λ t ) x t x) x L t x t,λ t ) 7) 2509
8 MAHDAVI, JIN AND YANG Algorith 1 Gradient based Online Convex Optiization with Long er Constraints 1: Input: constraints g i x) 0,i [], step size η, and constant δ>0 2: Initialization: x 1 = 0 and λ 1 = 0 3: for t = 1,2,..., do 4: Subit solution x t 5: Receive the convex function f t x) and experience loss f t x t ) 6: Copute x L t x t,λ t )= f t x t )+ λi t g i x t ) and λi L t x t,λ t )=g i x t ) ηδλt i 7: Update x t and λ t by 8: end for x t+1 = Π B x t η x L t x t,λ t )) λ t+1 = Π [0,+ ) λ t + η λ L t x t,λ t )) and by concavity of L t x, ) we have Cobining the inequalities 7) and 8) results in L t x t,λ) L t x t,λ t ) λ λ t ) λ L t x t,λ t ). 8) L t x t,λ) L t x,λ t ) x t x) x L t x t,λ t ) λ λ t ) λ L t x t,λ t ). 9) Using the update rule for x t+1 in ters of x t and expanding, we get x x t+1 2 x x t 2 x t x) x L t x t,λ t )+η 2 x L t x t,λ t ) 2, 10) where the first inequality follows fro the nonexpansive property of the projection operation. Expanding the inequality for λ λ t+1 2 in ters of λ t and plugging back into the 9) with 10) establishes the desired inequality. Proposition 3 Let x t and λ t,t [] be the sequence of solutions obtained by Algorith 1. hen for any x B and λ R +, we have L t x t,λ) L t x,λ t ) 11) R2 + λ 2 + η 2 +1)G 2 + 2D 2) + η 2 +1)G 2 + 2δ 2 η 2) λ t 2. Proof We first bound the gradient ters in the right hand side of Lea 2. Using the inequality a 1 + a ,a n ) 2 na a a2 n), we have x L t x t,λ t ) 2 +1)G 2 1+ λ t 2) and λ L t x t,λ t ) 2 2D 2 + δ 2 η 2 λ t 2 ). In Lea 2, by adding the inequalities of all iterations, and using the fact x R we coplete the proof. he following theore bounds the regret and the violation of the constraints in the long run for Algorith
9 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS heore 4 Define a=r +1)G 2 + 2D 2. Set η=r 2 /[a ]. Assue is large enough such that 2 +1) 1. Choose δ such that δ +1)G 2 +2δ 2 η 2. Let x t,t [] be the sequence of solutions obtained by Algorith 1. hen for the optial solution x = in x K f tx) we have f t x t ) f t x ) a = O 1/2 ), and g i x t ) 2 F + a ) δr 2 a + a ) R 2 = O 3/4 ). Proof We begin by expanding 11) using 6) and rearranging the ters to get { } [ f t x t ) f t x)]+ λ i i x t ) λtg g i i x) δη 2 λ 2 δη 2 + η 2 λ t 2 + R2 + λ 2 + η 2 +1)G 2 + 2δ 2 η 2) λ t 2. +1)G 2 + 2D 2) Since δ +1)G 2 +2δ 2 η 2, we can drop the λ t 2 ters fro both sides of the above inequality and obtain { δη [ f t x t ) f t x)]+ λ i i x t ) g 2 + ) } λ 2 i λ i tg i x)+ R2 + η 2 +1)G 2 + 2D 2 ) ). he left hand side of above inequality consists of two ters. he first ter basically easures the difference between the cuulative loss of the Algorith 1 and the optial solution and the second ter includes the constraint functions with corresponding Lagrangian ultipliers which will be used to bound the long ter violation of the constraints. By taking axiization for λ over the range 0,+ ), we get f t x t ) f t x )+ [ f t x t ) f t x)]+ {[ g i x t ) ] 2 } + 2δη + /η) λtg i i x) R2 + η 2 +1)G 2 + 2D 2 ) ). Since x K, we have g i x ) 0,i [], and the resulting inequality becoes [ ] 2 g ix t ) + 2δη + /η) R2 + η 2 +1)G 2 + 2D 2 ) ). he stateent of the first part of the theore follows by using the expression for η. he second part is proved by substituting the regret bound by its lower bound as f tx t ) f t x ) F. 2511
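Putting steps 4-7 of Algorithm 1 together, one primal-dual round can be sketched as follows; the gradient of f_t, the constraint values g_i(x_t), and their gradients are supplied by the caller, B is taken to be the l_2 ball of radius R, and the function names are placeholders rather than the authors' code.

```python
import numpy as np

def algorithm1_step(x_t, lam_t, grad_f, g_vals, grads_g, eta, delta, R):
    """One round of Algorithm 1 on L_t(x, lam) = f_t(x) + sum_i (lam_i g_i(x) - delta*eta*lam_i^2/2).
    grad_f = grad f_t(x_t); g_vals[i] = g_i(x_t); grads_g[i] = grad g_i(x_t)."""
    # gradient w.r.t. x: grad f_t(x_t) + sum_i lam_i * grad g_i(x_t)
    grad_x = grad_f + sum(l * gg for l, gg in zip(lam_t, grads_g))
    # gradient w.r.t. lam_i: g_i(x_t) - eta*delta*lam_i
    grad_lam = np.array(g_vals) - eta * delta * lam_t
    # descend in x and project onto the ball B of radius R
    x_next = x_t - eta * grad_x
    nrm = np.linalg.norm(x_next)
    if nrm > R:
        x_next *= R / nrm
    # ascend in lam and project onto the nonnegative orthant
    lam_next = np.maximum(lam_t + eta * grad_lam, 0.0)
    return x_next, lam_next
```

The dual step is what distinguishes this update from the fixed-penalty scheme in (4): the multipliers lam grow whenever constraints are violated and shrink through the quadratic regularizer otherwise.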
10 MAHDAVI, JIN AND YANG Reark 5 We observe that the introduction of quadratic regularizer δη λ 2 /2 allows us to turn the expression λ i g ix t ) into [ g ix t ) ] 2, leading to the bound for the violation of the constraints. In addition, the quadratic regularizer defined in ters of λ allows us to work with un- + bounded λ because it cancels the contribution of the λ t ters fro the loss function and the bound on the gradients x L t x,λ). Note that the constraint for δ entioned in heore 4 is equivalent to 2 1/+1)+ 1/+1)+ δ +1) 2 8G 2 η2 +1) 2 8G 2 η 2 4η 2, 12) fro which, when is large enough i.e., η is sall enough), we can siply set δ = 2+1)G 2 that will obey the constraint in 12). By investigating Lea 2, it turns out that the boundedness of the gradients is essential to obtain bounds for Algorith 1 in heore 4. Although, at each iteration, λ t is projected onto ther +, since K is a copact set and functions f t x) and g i x),i [] are convex, the boundedness of the functions iplies that the gradients are bounded Bertsekas et al., 2003, Proposition 4.2.3). 3.2 An Efficient Algorith with O 3/4 ) Regret Bound and without Violation of Constraints In this subsection we generalize Algorith 1 such that the constrained are satisfied in a long run. o create a sequence of solutions{x t,t []} that satisfies the long ter constraints g ix t ) 0,i [], we ake two odifications to Algorith 1. First, instead of handling all of the constraints, we consider a single constraint defined as gx) = ax i [] g i x). Apparently, by achieving zero violation for the constraint gx) 0, it is guaranteed that all of the constraints g i ),i [] are also satisfied in the long ter. Furtherore, we change Algorith 1 by odifying the definition of L t, ) as L t x,λ)= f t x)+λgx)+γ) ηδ 2 λ2, 13) where γ > 0 will be decided later. his odification is equivalent to considering the constraint gx) γ, a tighter constraint than gx) 0. he ain idea behind this odification is that by using a tighter constraint in our algorith, the resulting sequence of solutions will satisfy the long ter constraint gx t) 0, even though the tighter constraint is violated in any trials. Before proceeding, we state a fact about the Lipschitz continuity of the function gx) in the following proposition. Proposition 6 Assue that functions g i ),i [] are Lipschitz continuous with constant G. hen, function gx)=ax i [] g i x) is Lipschitz continuous with constant G, that is, gx) gx ) G x x for any x B and x B. 2512
11 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS Proof First, we rewrite gx)=ax i [] g i x) as gx)=ax α α ig i x) where is the - siplex, that is, ={α R +; α i = 1}. hen, we have gx) gx ) = ax α α i g i x) ax α α i g i x ) ax α i g i x) α i g i x ) α ax α α i g i x) g i x ) G x x, where the last inequality follows fro the Lipschitz continuity of g i x),i []. o obtain a zero bound on the violation of constraints in the long run, we ake the following assuption about the constraint function gx). Assuption 1 Let K K be the convex set defined as K ={x R d : gx)+γ 0} where γ 0. We assue that the nor of the gradient of the constraint function gx) is lower bounded at the boundary of K, that is, in gx)+γ=0 gx) σ. A direct consequence of Assuption 1 is that by reducing the doain K to K, the optial value of the constrained optiization proble in x K fx) does not change uch, as revealed by the following theore. heore 7 Let x and x γ be the optial solutions to the constrained optiization probles defined as in gx) 0 fx) and in gx) γ fx), respectively, where fx)= f tx) and γ 0. We have fx ) fx γ ) G σ γ. Proof We note that the optiization proble in gx) γ fx)=in gx) γ f tx), can also be written in the iniax for as fx γ )=in ax x B λ R + f t x)+λgx)+γ), 14) where we use the fact that K K B. We denote by x γ and λ γ the optial solutions to 14). We have fx γ )=in ax x B λ R + = in x B f t x)+λgx)+γ) f t x)+λ γ gx)+γ) f t x )+λ γ gx )+γ) f t x )+λ γ γ, 2513
12 MAHDAVI, JIN AND YANG where the second equality follows the definition of the x γ and the last inequality is due to the optiality of x, that is, gx ) 0. o bound fx γ ) fx ), we need to bound λ γ. Since x γ is the iniizer of 14), fro the optiality condition we have f t x γ )=λ γ gx γ ). 15) By setting v = f tx γ ), we can siplify 15) as λ γ gx γ ) = v. Fro the KK optiality condition Boyd and Vandenberghe, 2004), if gx γ )+γ<0 then we have λ γ = 0; otherwise according to Assuption 1 we can bound λ γ by λ γ v gx γ ) G σ. We coplete the proof by applying the fact fx ) fx γ ) fx )+λ γ γ. As indicated by heore 7, when γ is sall, we expect the difference between two optial values fx ) and fx γ ) to be sall. Using the result fro heore 7, in the following theore, we show that by running Algorith 1 on the odified convex-concave functions defined in 13), we are able to obtain an O 3/4 ) regret bound and zero bound on the violation of constraints in the long run. heore 8 Set a=2r/ 2G 2 + 3D 2 + b 2 ), η=r 2 /[a ], and δ=4g 2. Let x t,t [] be the sequence of solutions obtained by Algorith 1 with functions defined in 13) with γ=b 1/4 and b=2 FδR 2 a 1 + ar 2 ). Let x be the optial solution to in x K f tx). With sufficiently large, that is, F a, and under Assuption 1, we have x t,t [] satisfy the global constraint gx t) 0 and the regretr is bounded by R = f t x t ) f t x ) a + b σ G 3/4 = O 3/4 ). Proof Let x γ be the optial solution to in gx) γ f tx). Siilar to the proof of heore 4 when applied to functions in 13) we have f t x t ) δη 2 f t x)+λ λ 2 t + R2 + λ 2 gx t )+γ) + η 2 λ t ) 2G 2 + 3D 2 + γ 2 ) ) + η 2 gx)+γ) δη 2 λ2 2G 2 + 3δ 2 η 2) λt. 2 By setting δ 2G 2 + 3δ 2 η 2 which is satisfied by δ=4g 2, we cancel the ters including λ t fro the right hand side of above inequality. By axiizing for λ over the range0,+ ) and noting that γ b, for the optial solution x γ, we have [ ft x t ) f t x γ ) ] + [ ] 2 gx t)+γ + R2 2δη + 1/η) + η 2 2G 2 + 3D 2 + b 2 ) ), 2514
13 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS which, by optiizing for η and applying the lower bound for the regret as f tx t ) f t x γ ) F, yields the following inequalities and gx t ) f t x t ) f t x γ ) a 16) 2 F + a ) δr 2 a + a ) R 2 γ, 17) for the regret and the violation of the constraint, respectively. Cobining 16) with the result of heore 7 results in f tx γ ) f tx )+a +G/σ)γ. By choosing γ = b 1/4 we attain the desired regret bound as f t x t ) f t x ) a + bg σ 3/4 = O 3/4 ). o obtain the bound on the violation of constraints, we note that in 17), when is sufficiently large, that is, F a, we have gx t) 2 FδR 2 a 1 + ar 2 ) 3/4 b 3/4. Choosing b = 2 FδR 2 a 1 + ar 2 ) 3/4 guarantees the zero bound on the violation of constraints as claied. 4. A Mirror Prox Based Approach he bound for the violation of constraints for Algorith 1 is unsatisfactory since it is significantly worse than O ). In this section, we pursue a different approach that is based on the irror prox ethod in Neirovski 2005) to iprove the bound for the violation of constraints. he basic idea is that solving 5) can be reduced to the proble of approxiating a saddle pointx,λ) B [0, ) by solving the associated variational inequality. We first define an auxiliary function Fx,λ) as Fx,λ)= { λ i g i x) δη 2 λ2 i In order to successfully apply the irror prox ethod, we follow the fact that any convex doain can be written as an intersection of linear constraints, and ake the following assuption: Assuption 2 We assue that g i x),i [] are linear, that is, K ={x R d : g i x)=x a i b i 0,i []} where a i R d is a noralized vector with a i =1 and b i R. }. he following proposition shows that under Assuptions 2, the function Fx, λ) has Lipschitz continuous gradient, a basis for the application of the irror prox ethod. Proposition 9 Under Assuption 2, Fx,λ) has Lipschitz continuous gradient, that is, x Fx,λ) x Fx,λ ) 2 + λ Fx,λ) λ Fx,λ ) 2 2+δ 2 η 2 ) x x 2 + λ λ 2 ). 2515
14 MAHDAVI, JIN AND YANG Algorith 2 Prox Method with Long er Constraints 1: Input: constraints g i x) 0,i [], step size η, and constant δ 2: Initialization: z 1 = 0 and µ 1 = 0 3: for t = 1,2,..., do 4: Copute the solution for x t and λ t as x t = Π B z t η x Fz t,µ t )) λ t = Π [0,+ ) µ t + η λ Fz t,µ t )) 5: Subit solution x t 6: Receive the convex function f t x) and experience loss { f t x t ) } 7: Copute L t x,λ)= f t x)+fx,λ)= f t x)+ λ i g i x) δη 2 λ2 i 8: Update z t and µ t by 9: end for z t+1 = Π B z t η x L t x t,λ t )) µ t+1 = Π [0,+ ) µ t + η λ L t x t,λ t )) Proof Since = x Fx,λ) x Fx,λ ) 2 + λ Fx,λ) λ Fx,λ ) 2 λ i λ i)a i 2+ a i x x 2 )+δηλ i λ i ) A λ λ ) Ax x ) 2 + 2δ 2 η 2 λ λ 2 2σ 2 axa) x x 2 +σ 2 axa)+2δ 2 η 2 ) λ λ 2. σ ax A)= we have σ 2 axa), leading to the desired result. λ ax AA ) raa ), Algorith 2 shows the detailed steps of the irror prox based algorith for online convex optiization with long ter constraints defined in 5). Copared to Algorith 1, there are two key features of Algorith 2. First, it introduces auxiliary variables z t and µ t besides the variables x t and λ t. At each iteration t, it first coputes the solutions x t and λ t based on the auxiliary variables z t and µ t ; it then updates the auxiliary variables based on the gradients coputed fro x t and λ t. Second, two different functions are used for updatingx t,λ t ) andz t,µ t ): function Fx,λ) is used for coputing the solutions x t and λ t, while function L t x,λ) is used for updating the auxiliary variables z t and µ t. Our analysis is based on the Lea 3.1 fro Neirovski 2005) which is restated here for copleteness. Lea 10 Let Bx,x ) be a Bregan distance function that has odulus α with respect to a nor, that is, Bx,x ) α x x 2 /2. Given u B, a, and b, we set w=argin x B a x u)+bx,u), u + = arginb x u)+bx,u). x B 2516
15 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS hen for any x B and η>0, we have ηb w x) Bx,u) Bx,u + )+ η2 2α a b 2 α [ w u 2 + w u + 2]. 2 We equip B [0,+ ) with the nor defined as z,µ) 2 = z 2 + µ 2, 2 where 2 is the Euclidean nor defined separately for each doain. It is iediately seen that the Bregan distance function defined as Bz t,µ t,z t+1,µ t+1 )= 1 2 z t z t µ t µ t+1 2 is α=1 odules with respect to the nor. o analyze the irror prox algorith, we begin with a siple lea which is the direct application of Lea 10 when applied to the updating rules of Algorith 3. Lea 11 If η+δ 2 η 2 ) 1 4 holds, we have L t x t,λ) L t x,λ t ) x z t 2 x z t λ µ t 2 λ µ t+1 2 Proof o apply Lea 10, we define u, w, u +, a and b as follows Using Leas 2 and 10, we have u=z t,µ t ),u + =z t+1,µ t+1 ),w=x t,λ t ), + η f t x t ) 2. a= x Fz t,µ t ), λ Fz t,µ t )),b= x L t x t,λ t ), λ L t x t,λ t )). L t x t,λ) L t x,λ t ) x z t 2 x z t+1 2 λ µ t 2 λ µ t+1 2 η { x Fz t,µ 2 t ) x L t x t,λ t ) 2 + λ Fz t,µ t ) λ L t x t,λ t ) 2} } {{ } I 1 { zt x t 2 + µ 2 t λ t 2}. } {{ } II By expanding the gradient ters and applying the inequalitya+b) 2 2a 2 +b 2 ), we upper bound I) as: I)= η 2 {2 f tx t ) x Fz t,µ t ) x Fx t,λ t ) 2 + λ Fz t,µ t ) λ Fx t,λ t ) 2 } η f t x t ) 2 + η { x Fz t,µ t ) x Fx t,λ t ) 2 + λ Fx t,λ t ) λ Fx t,λ t ) 2} η f t x t ) 2 + +δ 2 η 2 ) { z t x t 2 + µ t λ t 2}, 18) 2517
16 MAHDAVI, JIN AND YANG where the last inequality follows fro Proposition 9. Cobining II) with 18) results in L t x t,λ) L t x,λ t ) x z t 2 x z t+1 2 λ µ t 2 λ µ t+1 2 η f t x t ) δ 2 η 2 ) 1 ) { zt x t 2 + µ 2 t λ t 2 2}. We coplete the proof by rearranging the ters and setting η+δ 2 η 2 ) 1 4. heore 12 Set η= 1/3 and δ= 2/3. Let x t,t [] be the sequence of solutions obtained by Algorith 2. hen for 164+1) 3 we have f t x t ) f t x ) O 2/3 ) and g i x t ) O 2/3 ). Proof Siilar to the proof of heore 4, by suing the bound in Lea 11 for all rounds t = 1,,, and taking axiization for λ we have the following inequality for any x K, [ f t x t ) f t x )]+ [ g i x t ) ] 2 + 2δη + /η) R2 + η 2 G2. By setting δ= 1 η and using the fact that f tx t ) f t x ) F we have: and [ f t x t ) f t x)] R2 + η 2 G2 g i x t ) 1+ ) R η ) 2 η + η G2 + F. Substituting the stated value for η, we get the desired bounds as entioned in the theore. Note that the condition η+δ 2 η 2 ) 1 4 in Lea 11 is satisfied for the stated values of η and δ as long as 164+1) 3. Using the sae trick as heore 8, by introducing appropriate γ, we will be able to establish the solutions that exactly satisfy the constraints in the long run with an O 2/3 ) regret bound as shown in the following corollary. In the case when all the constraints are linear, that is, g i x) = a i x b i,i [], Assuption 1 is siplified into the following condition, in α i a i σ, 19) α where is a diensional siplex, that is, = {α R + : α i = 1}. his is because gx) = ax α α ig i x) and as a result, the sub)gradient of gx) can always be written as gx) = α i g i x) = α ia i where α. As an illustrative exaple, consider the case when the nor vectors a i,i [] are linearly independent. In this case the condition entioned in 19) obviously holds which indicates that the assuption does not liit the applicability of the proposed algorith. 2518
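With linear constraints g_i(x) = a_iᵀx − b_i (Assumption 2), one round of Algorithm 2 can be sketched as below; A stacks the vectors a_i row-wise, the gradient of f_t is supplied as a callable, and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def proj_ball(x, R):
    """Euclidean projection onto the ball of radius R."""
    n = np.linalg.norm(x)
    return x if n <= R else (R / n) * x

def algorithm2_round(z, mu, A, b, grad_f, eta, delta, R):
    """One round of the mirror prox method (Algorithm 2) for K = {x : Ax - b <= 0}."""
    # gradients of F(x, lam) = sum_i (lam_i (a_i.x - b_i) - delta*eta*lam_i^2/2) at (z, mu)
    grad_x_F = A.T @ mu
    grad_l_F = A @ z - b - delta * eta * mu
    # extragradient step: the point (x_t, lam_t) that is actually played
    x_t = proj_ball(z - eta * grad_x_F, R)
    lam_t = np.maximum(mu + eta * grad_l_F, 0.0)
    # gradients of L_t(x, lam) = f_t(x) + F(x, lam) at (x_t, lam_t)
    grad_x_L = grad_f(x_t) + A.T @ lam_t
    grad_l_L = A @ x_t - b - delta * eta * lam_t
    # update the auxiliary variables (z, mu)
    z_next = proj_ball(z - eta * grad_x_L, R)
    mu_next = np.maximum(mu + eta * grad_l_L, 0.0)
    return x_t, lam_t, z_next, mu_next
```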
17 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS Corollary 13 Let η = δ = 1/3. Let x t,t [] be the sequence of solutions obtained by Algorith 2 with γ = b 1/3 and b = 2 F. With sufficiently large, that is, F R 2 1/3 + G 2 2/3, under Assuptions 2 and condition in 19), we have x t,t [] satisfy the global constraints g ix t ) 0,i [] and the regretr is bounded by R = f t x t ) f t x ) R2 2 1/3 G G ) F 2/3 = O 2/3 ). σ he proof is siilar to that of heore 8 and we defer it to Appendix B. As indicated by Corollary 13, for any convex doain defined by a finite nuber of halfspaces, that is, Polyhedral set, one can easily replace the projection onto the Polyhedral set with the ball containing the Polyhedral at the price of satisfying the constraints in the long run and achieving O 2/3 ) regret bound. 5. Online Convex Optiization with Long er Constraints under Bandit Feedback for Doain We now turn to extending the gradient based convex-concave optiization algorith discussed in Section 3 to the setting where the learner only receives partial feedback for constraints. More specifically, the exact definition of the doain K is not exposed to the learner, only that the solution is within a ball B. Instead, after receiving a solution x t, the oracle will present the learner with the convex loss function f t x) and the axiu violation of the constraints for x t, that is, gx t )= ax i [] g i x t ). We reind that the function gx) defined in this way is Lipschitz continuous with constant G as proved in Proposition 6. In this setting, the convex-concave function defined in 6) becoes as L t x,λ)= f t x)+λgx) δη/2)λ 2. he entioned setting is closely tied to the bandit online convex optiization. In the bandit setting, in contrast to the full inforation setting, only the cost of the chosen decision i.e., the incurred loss f t x t )) is revealed to the algorith, not the function itself. here is a rich body of literature that deals with the bandit online convex optiization. In the seinal papers of Flaxan et al. 2005) and Awerbuch and Kleinberg 2004) it has been shown that one could design algoriths with O 3/4 ) regret bound even in the bandit setting where only evaluations of the loss functions are revealed at a single point. If we specialize to the online bandit optiization of linear loss functions, Dani et al. 2007) proposed an inefficient algorith with O ) regret bound and Abernethy et al. 2008) obtained O log) bound by an efficient algorith if the convex set adits an efficiently coputable self-concordant barrier. For general convex loss functions, Agarwal et al. 2010) proposed optial algoriths in a new bandit setting, in which ultiple points can be queried for the cost values. By using ultiple evaluations, they showed that the odified online gradient descent algorith can achieve O ) regret bound in expectation. Algorith 3 gives a coplete description of the proposed algorith under the bandit setting, which is a slight odification of Algorith 1. Algorith 3 accesses the constraint function gx) at two points. o facilitate the analysis, we define L t x,λ)= f t x)+λĝx) ηδ 2 λ2, 2519
18 MAHDAVI, JIN AND YANG Algorith 3 Multipoint Bandit Online Convex Optiization with Long er Constraints 1: Input: constraint gx), step size η, constant δ>0, exploration paraeter ζ>0, and shrinkage coefficient ξ 2: Initialization: x 1 = 0 and λ 1 = 0 3: for t = 1,2,..., do 4: Subit solution x t 5: Select unit vector u t uniforly at rando 6: Query gx) at points x t +ζu t and [ x t ζu t and incur average of] the as violation of constraints 7: Copute g x,t = f t x t )+λ d t 2ζ gx t+ ζu t ) gx t ζu t ))u t 8: Copute g λ,t = 1 2 gx t+ ζu t )+gx t ζu t )) ηδλ t 9: Receive the convex function f t x) and experience loss f t x t ) 10: Update x t and λ t by 11: end for x t+1 = Π 1 ξ)b x t η g x,t ) λ t+1 = Π [0,+ ) λ t + η g λ,t ) where ĝx) is the soothed version of gx) defined as ĝx)=e v S [ d ζ gx+ζv)v] at point x t wheres denotes the unit sphere centered at the origin. Note that ĝx) is Lipschitz continuous with the sae constant G, and it is always differentiable even though gx) is not in our case. Since we do not have access to the function ĝ ) to copute x Lx,λ), we need a way to estiate its gradient at point x t. Our gradient estiation closely follows the idea in Agarwal et al. 2010) by querying gx) function at two points. he ain advantage of using two points to estiate the gradient with respect to one point gradient estiation used in Flaxan et al. 2005) is that the forer has a bounded nor which is independent of ζ and leads to iproved regret bounds. he gradient estiators for x Lx t,λ t )= fx t )+λ t ĝx t ) and λ Lx t,λ t )=ĝx t ) δηλ t in Algorith 3 are coputed by evaluating the gx) function at two rando points around x t as [ ] d g x,t = f t x t )+λ t 2ζ gx t+ ζu t ) gx t ζu t ))u t and g λ,t = 1 2 gx t+ ζu t )+gx t ζu t )) ηδλ t, where u t is chosen uniforly at rando fro the surface of the unit sphere. Using Stock s theore, 1 Flaxan et al. 2005) showed that 2ζ gx t+ ζu t ) gx t ζu t ))u t is a conditionally unbiased estiate of the gradient of ĝx) at point x t. o ake sure that randoized points around x t live inside the convex doain B, we need to stay away fro the boundary of the set such that the ball of radius ζ around x t is contained in B. In particular, in Flaxan et al. 2005) it has been shown that for any x 1 ξ)b and any unit vector u it holds that x+ζu) B as soon as ζ [0,ξr]. In order to facilitate the analysis of the Algorith 3, we define the convex-concave function H t, ) as H t x,λ)= L ) ) t x,λ)+ g x,t x Lx t,λ t ) x+ g λ,t λ Lx t,λ t ) λ. 20) 2520
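The two-point estimates used in steps 6-8 of Algorithm 3 can be sketched as below; the constraint oracle g, the gradient of f_t at x_t, and the random number generator are supplied by the caller, lam_t is the scalar multiplier for the single constraint g(x) ≤ 0, and the names are illustrative.

```python
import numpy as np

def bandit_gradient_estimates(x_t, lam_t, grad_f, g_oracle, zeta, eta, delta, rng):
    """Two-point gradient estimates for Algorithm 3. grad_f = grad f_t(x_t);
    g_oracle(x) returns g(x); rng is a numpy Generator."""
    d = x_t.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                    # uniform random unit direction
    g_plus = g_oracle(x_t + zeta * u)         # two queries to the constraint oracle
    g_minus = g_oracle(x_t - zeta * u)
    # estimate of grad_x L(x_t, lam_t) = grad f_t(x_t) + lam_t * grad ghat(x_t)
    g_x = grad_f + lam_t * (d / (2.0 * zeta)) * (g_plus - g_minus) * u
    # estimate of grad_lam L(x_t, lam_t) = ghat(x_t) - eta*delta*lam_t
    g_lam = 0.5 * (g_plus + g_minus) - eta * delta * lam_t
    return g_x, g_lam
```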
19 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS It is easy to check that x Hx t,λ t )= g x,t and λ Hx t,λ t )= g λ,t. By defining functions H t x,λ), Algorith 3 reduces to Algorith 1 by doing gradient descent on functions H t x,λ) except the projection is ade onto the set1 ξ)b instead of B. We begin our analysis by reproducing Proposition 3 for functions H t, ). Lea 14 If the Algorith 1 is perfored over convex set K with functions H t defined in 20), then for any x K we have H t x t,λ) H t x,λ t ) R2 + λ ηd 2 + G 2 ) + ηd 2 G 2 + η 2 δ 2 ) Proof We have x H t x t,λ t ) = g x,t and λ H t x t,λ t ) = g λ,t. It is straightforward to show that 1 2ζ gx t + ζu t ) gx t ζu t ))u t has nor bounded by Gd Agarwal et al., 2010). So, the nor of gradients are bounded as g x,t 2 2 2G2 + d 2 G 2 λ 2 t) and g λ,t 2 2 2D2 + η 2 δ 2 λ 2 t). Using Lea 2, by adding for all rounds we get the desired inequality. he following theore gives the regret bound and the expected violation of the constraints in the long run for Algorith 3. heore 15 Let c= D 2 + G 2 2R+ 2D δr )+ D GD r + 1) r. Set η=r/ 2D 2 + G 2 ). Choose δ such that δ 2d 2 G 2 + η 2 δ 2 ). Let ζ= δ and ξ= ζ r. Let x t,t [] be the sequence of solutions obtained by Algorith 3. We then have f t x t ) f t x) GD + c = O 1/2 ), and r [ ] δr E gx t ) Gδ D 2 + G 2 ) ) R GD + c + F) = O 3/4 ). D 2 + G 2 r Proof Using Lea 2 for the functions L t, ) and H t, ) we have and also L t x t,λ) L t x,λ t ) x t x) x L t x t,λ t ) λ λ t ) λ L t x t,λ t ), H t x t,λ) H t x,λ t ) x t x) g x,t λ λ t ) g λ,t. Subtracting the preceding inequalities, taking expectation, and suing for all t fro 1 to we get ] E L t x t,λ) L t x,λ t ) [ =E [ +E H t x t,λ) H t x,λ t ) [ ] x t x) x L t x t,λ t ) E t [ g xt,t])+λ t λ) λ L t x t,λ t ) E t [ g λt,t]) λ 2 t. ]. 21) 2521
20 MAHDAVI, JIN AND YANG Next we provide an upper bound on the difference between the gradients of two functions. First, E t [ g x,t ]= x L t x t,λ t ), so g x,t is an unbiased estiator of x L t x t,λ t ). Considering the update rule for λ t+1 we have λ t+1 1 η 2 δ) λ t +ηd which iplies that λ t D δη for all t. So we obtain λ t λ) λ L t x t,λ t ) E t [ g λt,t]) [ ] λ t λ E t λ L t x t,λ t ) g λt,t 2 D 1 δη 2 gx t+ ζu t )+gx t ζu t )) ĝx t ) DG δη ζ u t DG ζ, 22) δη where the last inequality follows fro Lipschitz property of the functions gx) and ĝx) with the sae constant G. Cobining the inequalities 21) and 22) and using Lea 14, we have [ E L t x t,λ) L t x,λ t ) ] R2 + λ 2 + ηd 2 + G 2 ) + ηd 2 G 2 + η 2 δ 2 ) By expanding the right hand side of above inequality, we obtain [ ] [ ] [ f t x t ) f t 1 ξ)x)]+λe ĝx t ) E ĝ1 ξ)x) λ t ηδ R2 + λ 2 + ηd 2 + G 2 ) + ηd 2 G 2 + η 2 δ 2 ) λ 2 t + DGζ δη. By choosing δ 2d 2 G 2 + η 2 δ 2 ) we cancel λ 2 t ters fro both sides and have 2 λ2 + ηδ 2 [ ] [ ] [ f t x t ) f t 1 ξ)x)]+λe ĝx t ) E ĝ1 ξ)x) λ t ηδ 2 λ2 R2 + λ 2 λ 2 t + DGζ δη. λt 2 + ηd 2 + G 2 ) + DGζ. 23) δη By convexity and Lipschitz property of f t x) and gx) we have f t 1 ξ)x) 1 ξ) f t x)+ξf t 0) f t x)+dgξ, 24) gx) ĝx)+gζ, and ĝ1 ξ)x) g1 ξ)x)+gζ gx)+gζ+dgξ. 25) Plugging 24) and 25) back into 23), for any optial solution x K we get [ ] [ f t x t ) f t x)]+λe gx t ) ηδ 2 λ2 λgζ R2 + λ 2 + ηd 2 + G 2 ) + DGζ δη + DGξ +DGξ+Gζ) λ t. 26) Considering the fact that λ t D δη we have λ t D δη. Plugging back into the 26) and rearranging the ters we have [ ] [ f t x t ) f t x)]+λe gx t ) ηδ 2 λ2 λgζ λ2 + ηd2 + G 2 ) + DGζ + DGξ +DGξ+Gζ)D δη δη. R2 2522
21 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS By setting ξ= ζ r and ζ= 1 we get [ f t x t ) f t x)] R2 + ηd2 + G 2 ) + DGζ δη + ζdg + D r r + 1)ζDG δη, which gives the entioned regret bound by optiizing for η. Maxiizing for λ over the range 0,+ ) and using f tx t ) f t x ) F, yields the following inequality for the violation of constraints [ E [ gx t) ] Gζ 4δη/2+1/) ] 2 + DG r + c + F. Plugging in the stated values of paraeters copletes the proof. Note that δ = 4d 2 G 2 obeys the condition specified in the theore. 6. Conclusion In this study we have addressed the proble of online convex optiization with constraints, where we only need the constraints to be satisfied in the long run. In addition to the regret bound which is the ain tool in analyzing the perforance of general online convex optiization algoriths, we defined the bound on the violation of constraints in the long ter which easures the cuulative violation of the solutions fro the constraints for all rounds. Our setting is applied to solving online convex optiization without projecting the solutions onto the coplex convex doain at each iteration, which ay be coputationally inefficient for coplex doains. Our strategy is to turn the proble into an online convex-concave optiization proble and apply online gradient descent algorith to solve it. We have proposed efficient algoriths in three different settings; the violation of constraints is allowed, the constraints need to be exactly satisfied, and finally we do not have access to the target convex doain except it is bounded by a ball. Moreover, for doains deterined by linear constraints, we used the irror prox ethod, a siple gradient based algorith for variational inequalities, and obtained an O 2/3 ) bound for both regret and the violation of the constraints. Our work leaves open a nuber of interesting directions for future work. In particular it would be interesting to see if it is possible to iprove the bounds obtained in this paper, i.e., getting an O ) bound on the regret and better bound than O 3/4 ) on the violation of constraints for general convex doains. Proving optial lower bounds for the proposed setting also reains as an open question. Also, it would be interesting to consider strongly convex loss or constraint functions. Finally, relaxing the assuption we ade to exactly satisfy the constraints in the long run is an interesting proble to be investigated. Acknowledgents he authors would like to thank the Action Editor and three anonyous reviewers for their constructive coents and helpful suggestions on the original version of this paper. his work was sup- 2523
22 MAHDAVI, JIN AND YANG ported in part by National Science Foundation IIS ) and Office of Navy Research Award N ). Appendix A. Proof of heore 1 We first show that when δ < 1, there exists a loss function and a constraint function such that the violation of constraint is linear in. o see this, we set f t x)=w x,t [] and gx)=1 w x. Assue we start with an infeasible solution, that is, gx 1 ) > 0 or x 1 w<1. Given the solution x t obtained at tth trial, using the standard gradient descent approach, we have x t+1 = x t η1 δ)w. Hence, if x t w<1, since we have x t+1 w<x t w<1, if we start with an infeasible solution, all the solutions obtained over the trails will violate the constraint gx) 0, leading to a linear nuber of violation of constraints. Based on this analysis, we assue δ>1 in the analysis below. Given a strongly convex loss function fx) with odulus γ, we consider a constrained optiization proble given by in fx), gx) 0 which is equivalent to the following unconstrained optiization proble in x fx)+λ[gx)] +, where λ 0 is the Lagrangian ultiplier. Since we can always scale fx) to ake λ 1/2, it is safe to assue λ 1/2<δ. Let x and x a be the optial solutions to the constrained optiization probles argin gx) 0 fx) and argin fx)+δ[gx)] +, respectively. We choose fx) such that x fx ) > 0, which leads to x a x. his holds because according to the first order optiality condition, we have fx )= λ gx ), fx a )= δ gx ), and therefore fx ) fx a ) when λ<δ. Define = fx a ) fx ). Since γ x a x 2 /2 due to the strong convexity of fx), we have >0. Let {x t } be the sequence of solutions generated by the OGD algorith that iniizes the odified loss function fx)+δ[gx)] +. We have As a result, we have fx t )+δ[gx t )] + in fx)+δ[gx)] + x = fx a )+δ[gx a )] + ) fx a )+λ[gx a )] + ) = fx )+λ[gx )] + )+ fx a )+λ[gx a )] + fx ) λ[gx )]) in fx)+. gx) 0 fx t )+δ[gx t )] + in fx)=o), gx) 0 iplying that either the regret fx t) fx ) or the violation of the constraints [gx)] + is linear in. 2524
23 ONLINE CONVEX OPIMIZAION WIH LONG ERM CONSRAINS o better understand the perforance of penalty based approach, here we analyze the perforance of the OGD in solving the online optiization proble in 3). he algorith is analyzed using the following lea fro Zinkevich 2003). Lea 16 Let x 1,x 2,...,x be the sequence of solutions obtained by applying OGD on the sequence of bounded convex functions f 1, f 2,..., f. hen, for any solution x K we have f t x t ) f t x ) R2 + η 2 f t x t ) 2. We apply OGD to functions ˆf t x), t [] defined in 4), that is, instead of updating the solution based on the gradient of f t x), we update the solution by the gradient of ˆf t x). Using Lea 16, by expanding the functions ˆf t x) based on 4) and considering the fact that [g ix )] 2 + = 0, we get f t x t ) f t x )+ δ 2 [g i x)] 2 + R2 + η 2 Fro the definition of ˆf t x), the nor of the gradient ˆf t x t ) is bounded as follows ˆf t x) 2 = f t x)+δ ˆf t x t ) 2. 27) [g i x)] + g i x) 2 2G 2 1+δ 2 D 2 ), 28) where the inequality holds because a 1 + a 2 ) 2 2a a2 2 ). By substituting 28) into the 27) we have: f t x t ) f t x )+ δ 2 [g i x t )] 2 + R2 + ηg2 1+δ 2 D 2 ). 29) Since [ ] 2 + is a convex function, fro Jensen s inequality and following the fact that f tx t ) f t x ) F, we have: δ 2 ] 2 g i x t ) [ + δ 2 [g i x t )] 2 + R2 + ηg2 1+δ 2 D 2 ) + F. By iniizing the right hand side of 29) with respect to η, we get the regret bound as f t x t ) and the bound for the violation of constraints as f t x ) RG 21+δ 2 D 2 ) = Oδ ) 30) g i x t ) R 2 + ηg2 1+δ 2 D 2 ) + F ) 2 δ = O 1/4 δ 1/2 + δ 1/2 ). 31) Exaining the bounds obtained in 30) and 31), it turns out that in order to recover O ) regret bound, we need to set δ to be a constant, leading to O) bound for the violation of constraints in the long run, which is not satisfactory at all. 2525
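For completeness, the penalty-based OGD baseline analyzed above can be sketched as one update step, using the subgradient expression that appears in (28); the simple ball projection and all function names are illustrative assumptions.

```python
import numpy as np

def penalty_ogd_step(x_t, grad_f, g_vals, grads_g, delta, eta, R):
    """One OGD step on the penalized loss, with subgradient
    grad f_t(x) + delta * sum_i [g_i(x)]_+ * grad g_i(x), cf. (28)."""
    grad = grad_f.copy()
    for gv, gg in zip(g_vals, grads_g):
        grad += delta * max(gv, 0.0) * gg     # only violated constraints contribute
    x_next = x_t - eta * grad
    nrm = np.linalg.norm(x_next)
    return x_next if nrm <= R else (R / nrm) * x_next
```

As the analysis above indicates, no fixed choice of delta in this scheme yields sub-linear bounds on both the regret and the cumulative violation simultaneously, which is what motivates the adaptive multipliers of Algorithm 1.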
24 MAHDAVI, JIN AND YANG Appendix B. Proof of Corollary 13 Let x γ be the optial solution to in gx) γ f tx). Siilar to the proof of heore 12, we have [ ft x t ) f t x γ ) ] + [ gx t )+γ ] 2 + R2 2δη + 1/η) + η 2 G2. Using the stated values for the paraeters η=δ= 1/3, and applying the fact that f tx t ) f t x γ ) F we obtain, and f t x t ) f t x γ ) R2 2 1/3 + G2 2 2/3 32) [ gx t )+γ Fro heore 7, we have the bound ] R 2 1/3 + G 2 2/3 + F ) 1/3. 33) f t x γ ) f t x )+ G γ. 34) σ Cobining inequalities 32) and 34) with substituting the stated value of γ = b 1/3 yields the regret bound as desired. o obtain the bound for the violation of the constraints, fro 33) we have gx t ) 2 R 2 1/3 + G 2 2/3 + F ) 1/3 b 2/3. For sufficiently large values of, that is, F R 2 1/3 +G 2 2/3 we can siplify above inequality as gx t) 2 F 2/3 b 2/3. By setting b=2 F the zero bound on the violation of constraints is guaranteed. References Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Copeting in the dark: An efficient algorith for bandit linear optiization. In COL, pages , Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optial regret through iniax duality. In COL, Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Interior-point ethods for full-inforation and bandit online learning. IEEE ransactions on Inforation heory, 587): , Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optial algoriths for online convex optiization with ulti-point bandit feedback. In COL, pages 28 40, Baruch Awerbuch and Robert D. Kleinberg. Adaptive routing with end-to-end feedback: distributed learning and geoetric approaches. In SOC, pages 45 53,