Mind the Duality Gap: Logarithmic regret algorithms for online optimization


Sham M. Kakade, Toyota Technological Institute at Chicago
Shai Shalev-Shwartz, Toyota Technological Institute at Chicago

Abstract

We describe a primal-dual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for Follow-The-Leader and for the gradient descent algorithm proposed in Hazan et al. [2006]. We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework for generalized strongly convex functions.

1 Introduction

In recent years, online regret minimizing algorithms have become widely used and empirically successful algorithms for many machine learning problems. Notable examples include efficient learning algorithms for structured prediction and ranking problems [Collins, 2002, Crammer et al., 2006]. Most of these empirically successful algorithms are based on algorithms which are tailored to general convex functions, whose regret is O(√T). Rather recently, there is a growing body of work providing online algorithms for strongly convex loss functions, with regret guarantees that are only O(log T). These algorithms have the potential to be highly applicable, since many machine learning optimization problems are in fact strongly convex, either with strongly convex loss functions (e.g. log loss, square loss) or, indirectly, via strongly convex regularizers (e.g. L2 or KL based regularization). Note that in this latter case, the loss function itself may only be just convex, but a strongly convex regularizer effectively makes this a strongly convex optimization problem (e.g. the SVM optimization problem uses the hinge loss with L2 regularization).
The aim of this paper is to provide a template for deriving a wider class of regret-minimizing algorithms for online strongly convex programming. Online convex optimization takes place in a sequence of consecutive rounds. At each round, the learner predicts a vector w_t ∈ S ⊆ R^n, and the environment responds with a convex loss function, l_t : S → R. The goal of the learner is to minimize the difference between his cumulative loss and the cumulative loss of the optimal fixed vector,

  Σ_{t=1}^T l_t(w_t) − min_{w∈S} Σ_{t=1}^T l_t(w).

This is termed regret since it measures how sorry the learner is, in retrospect, not to have predicted the optimal vector. Roughly speaking, the family of regret minimizing algorithms (for general convex functions) can be seen as varying on two axes, the style and the aggressiveness of the update. In addition to the relative simplicity of online algorithms, the empirical successes are also due to having these two knobs to tune for the problem at hand (which determine the nature of the regret bound). By style, we mean updates which favor either rotational invariance (such as gradient descent like update rules) or sparsity (like the multiplicative updates). Of course there is a much richer family here, including the L_p updates. By the aggressiveness of the update, we mean how much the algorithm moves its decision to be consistent with the most recent loss functions. For example, the perceptron algorithm makes no update

when there is no error. In contrast, there is a family of algorithms which more aggressively update when there is a margin mistake. These algorithms are shown to have improved performance (see for example the experimental study in Shalev-Shwartz and Singer [2007b]). While historically much of the analysis of these algorithms has been done on a case by case basis, in retrospect, the proof techniques have become somewhat boilerplate, which has led to a growing body of work to unify these analyses (see Cesa-Bianchi and Lugosi [2006] for a review). Perhaps the most unified view of these algorithms is the primal-dual framework of Shalev-Shwartz and Singer [2006], Shalev-Shwartz [2007], for which the gamut of these algorithms can be largely viewed as special cases. Two aspects are central in providing this unification. First, the framework works with a complexity function, which determines the style of algorithm and the nature of the regret guarantee (if this function is the L2 norm, then one obtains gradient like updates, and if this function is the KL-distance, then one obtains multiplicative updates). Second, the algorithm maintains both primal and dual variables. Here, the primal objective function is Σ_{t=1}^T l_t(w) (where l_t is the loss function provided at round t), and one can construct a dual objective function D_t(·), which only depends on the loss functions l_1, l_2, ..., l_{t−1}. The algorithm works by incrementally increasing the dual objective value (in an online manner), which can be done since each D_t is only a function of the previous loss functions. By weak duality, this can be seen as decreasing the duality gap. The level of aggressiveness is seen to be how fast the algorithm is attempting to increase the dual objective value. This paper focuses on extending the duality framework for online convex programming to the case of strongly convex functions. This analysis provides a more unified and intuitive view of the extant algorithms for online strongly convex programming.
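Before proceeding, the online protocol from the introduction can be made concrete in a few lines. This is an illustrative sketch only (the 1-D square losses and the trivial learner below are hypothetical stand-ins, not anything from the paper):

```python
# Sketch of the online convex optimization protocol: at each round the
# learner predicts w_t, then sees a loss l_t and pays l_t(w_t).  Regret
# compares the learner's cumulative loss to the best fixed w in hindsight.
# Hypothetical 1-D example with square losses l_t(w) = (w - mu_t)^2.

def regret(predictions, mus):
    cum_loss = sum((w - mu) ** 2 for w, mu in zip(predictions, mus))
    # the best fixed w for square losses is the mean of the mu_t
    best_w = sum(mus) / len(mus)
    best_loss = sum((best_w - mu) ** 2 for mu in mus)
    return cum_loss - best_loss

# A trivial learner that always predicts 0 still has finite regret here.
mus = [1.0, 2.0, 3.0, 4.0]
print(regret([0.0] * len(mus), mus))  # prints 25.0
```

The rest of the paper is about learners whose regret grows only logarithmically in T when the l_t are strongly convex.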
An important observation we make is that any σ-strongly convex loss function can be rewritten as l_i(w) = f(w) + g_i(w), where f is a fixed σ-strongly convex function (i.e. f does not depend on i), and g_i is a convex function. Therefore, after t online rounds, the amount of intrinsic strong convexity we have in the primal objective Σ_{i=1}^t l_i(w) is at least σt. In particular, this explains the learning rate of 1/(σt) proposed in the gradient descent algorithm of Hazan et al. [2006]. Indeed, we show that our framework includes the gradient descent algorithm of Hazan et al. [2006] as an important special case, in which the aggressiveness level is minimal. At the most aggressive end, our framework yields the Follow-The-Leader algorithm. Furthermore, the template algorithm serves as a vehicle for deriving new algorithms (which enjoy logarithmic regret guarantees).

The remainder of the paper is outlined as follows. We first provide background on convex duality. As a warmup, in Section 3, we present an intuitive primal-dual analysis of Follow-The-Leader (FTL), when f is the Euclidean norm. This naturally leads to a more general primal-dual algorithm (for which FTL is a special case), which we present in Section 4. Next, we further generalize our algorithmic framework to include strongly convex complexity functions f with respect to arbitrary norms. We note that the introduction of a complexity function was already provided in Shalev-Shwartz and Singer [2007a], but the analysis is rather specialized and does not have a knob which can tune the aggressiveness of the algorithm. Finally, in Sec. 6 we conclude with a side-by-side comparison of our algorithmic framework for strongly convex functions and the framework for (non-strongly) convex functions given in Shalev-Shwartz [2007].

2 Mathematical Background

We denote scalars with lower case letters (e.g. w and λ), and vectors with bold face letters (e.g. w and λ). The inner product between vectors x and w is denoted by ⟨x, w⟩.
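The decomposition l_i = f + g_i above can be checked concretely on the square loss (an arbitrary numeric illustration, not the paper's): l(w) = (w − μ)² is 2-strongly convex, f(w) = w² carries all of the strong convexity and does not depend on μ, and the remainder g(w) = −2μw + μ² is affine, hence convex.

```python
# Decompose l(w) = (w - mu)^2 as f(w) + g(w) with f(w) = w^2 (2-strongly
# convex, independent of mu) and g(w) = -2*mu*w + mu^2 (affine, convex).
mu = 3.0
f = lambda w: w ** 2
g = lambda w: -2.0 * mu * w + mu ** 2
l = lambda w: (w - mu) ** 2

# The decomposition is exact at every point we try.
for w in [-2.0, 0.0, 0.5, 4.0]:
    assert abs(l(w) - (f(w) + g(w))) < 1e-12

# Strong convexity of f with sigma = 2: the defining inequality holds
# with equality for f(w) = w^2.
sigma, a, u, v = 2.0, 0.3, -1.0, 2.0
lhs = f(a * u + (1 - a) * v)
rhs = a * f(u) + (1 - a) * f(v) - (sigma / 2) * a * (1 - a) * (u - v) ** 2
assert abs(lhs - rhs) < 1e-12
```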
To simplify our notation, given a sequence of vectors λ_1, ..., λ_t or a sequence of scalars σ_1, ..., σ_t, we use the shorthand λ_{1:t} = Σ_{i=1}^t λ_i and σ_{1:t} = Σ_{i=1}^t σ_i. Sets are designated by upper case letters (e.g. S). The set of non-negative real numbers is denoted by R_+. For any k ≥ 1, the set of integers {1, ..., k} is denoted by [k]. A norm of a vector x is denoted by ‖x‖. The dual norm is defined as ‖λ‖_* = sup{⟨x, λ⟩ : ‖x‖ ≤ 1}. For example, the Euclidean norm, ‖x‖_2 = ⟨x, x⟩^{1/2}, is dual to itself, and the L1 norm, ‖x‖_1 = Σ_i |x_i|, is dual to the L∞ norm, ‖x‖_∞ = max_i |x_i|.

FOR t = 1, 2, ..., T:
  Define w_t = −(1/σ_{1:(t−1)}) λ^t_{1:(t−1)}
  Receive a function l_t(w) = (σ_t/2)‖w‖² + g_t(w) and suffer loss l_t(w_t)
  Update the dual variables s.t. the following holds:
    (λ^{t+1}_1, ..., λ^{t+1}_t) ∈ argmax_{λ_1,...,λ_t} D_{t+1}(λ_1, ..., λ_t)

Figure 1: A primal-dual view of Follow-The-Leader. Here the algorithm's decision w_t is the best decision with respect to the previous losses. This presentation exposes the implicit role of the dual variables. Slightly abusing notation, λ^1_{1:0} = 0, so that w_1 = 0. See text.

We next recall a few definitions from convex analysis. A function f is σ-strongly convex if
  f(αu + (1−α)v) ≤ α f(u) + (1−α) f(v) − (σ/2) α(1−α) ‖u − v‖²_2 .
In Sec. 5 we generalize the above definition to arbitrary norms. If a function f is σ-strongly convex, then the function g(w) = f(w) − (σ/2)‖w‖² is convex. The Fenchel conjugate of a function f : S → R is defined as
  f*(θ) = sup_{w∈S} ⟨w, θ⟩ − f(w).
If f is closed and convex, then the Fenchel conjugate of f* is f itself (a function is closed if for all α > 0 the level set {w : f(w) ≤ α} is a closed set). It is straightforward to verify that the function f(w) = ½‖w‖²_2 is conjugate to itself. The definition of f* also implies that for c > 0 we have (c f)*(θ) = c f*(θ/c). A vector λ is a sub-gradient of a function f at w if for all w′ ∈ S we have that f(w′) − f(w) ≥ ⟨w′ − w, λ⟩. The differential set of f at w, denoted ∂f(w), is the set of all sub-gradients of f at w. If f is differentiable at w, then ∂f(w) consists of a single vector which amounts to the gradient of f at w and is denoted by ∇f(w). The Fenchel-Young inequality states that for any w and θ we have that f(w) + f*(θ) ≥ ⟨w, θ⟩. Sub-gradients play an important role in the definition of the Fenchel conjugate. In particular, the following lemma, whose proof can be found in Borwein and Lewis [2006], states that if λ ∈ ∂f(w) then the Fenchel-Young inequality holds with equality.

Lemma 1 Let f be a closed and convex function and let ∂f(w) be its differential set at w. Then, for all λ ∈ ∂f(w), we have f(w) + f*(λ) = ⟨λ, w⟩.
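Lemma 1 and the Fenchel-Young inequality are easy to verify numerically for the self-conjugate function f(w) = ½w², whose only subgradient at w is w itself (an illustrative check with arbitrary test points, not part of the paper):

```python
# Fenchel-Young for f(w) = 0.5 * w^2, which is its own conjugate:
# f(w) + f*(theta) >= <w, theta>, with equality iff theta is a
# (sub)gradient of f at w, i.e. theta = w (Lemma 1).
f = lambda w: 0.5 * w ** 2
f_star = lambda theta: 0.5 * theta ** 2   # conjugate of f is f itself

w = 1.7
for theta in [-1.0, 0.0, 1.7, 3.0]:
    gap = f(w) + f_star(theta) - w * theta
    assert gap >= -1e-12                  # Fenchel-Young inequality
    if theta == w:                        # equality at theta = grad f(w)
        assert abs(gap) < 1e-12
```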
We make use of the following variant of Fenchel duality (see the appendix for more details):
  max_{λ_1,...,λ_T} −f*(−λ_{1:T}) − Σ_{t=1}^T g*_t(λ_t) ≤ min_{w∈S} f(w) + Σ_{t=1}^T g_t(w). (1)

3 Warmup: A Primal-Dual View of Follow-The-Leader

In this section, we provide a dual analysis for the FTL algorithm. The dual view of FTL will help us to derive a family of logarithmic regret algorithms for online convex optimization with strongly convex functions. Recall that the FTL algorithm is defined as follows:
  w_t = argmin_w Σ_{i=1}^{t−1} l_i(w). (2)
For each i ∈ [t−1] define g_i(w) = l_i(w) − (σ_i/2)‖w‖², where σ_i is the largest scalar such that g_i is still a convex function. The assumption that l_i is σ-strongly convex guarantees that σ_i ≥ σ. We can

therefore rewrite the objective function on the right-hand side of Eq. (2) as
  P_t(w) = (σ_{1:(t−1)}/2)‖w‖² + Σ_{i=1}^{t−1} g_i(w), (3)
(recall that σ_{1:(t−1)} = Σ_{i=1}^{t−1} σ_i). The Fenchel dual optimization problem (see Sec. 2) is to maximize the following dual objective function:
  D_t(λ_1, ..., λ_{t−1}) = −(1/(2σ_{1:(t−1)})) ‖λ_{1:(t−1)}‖² − Σ_{i=1}^{t−1} g*_i(λ_i). (4)
Let (λ^t_1, ..., λ^t_{t−1}) be the maximizer of D_t. The relation between the optimal dual variables and the optimal primal vector is given by (see again Sec. 2)
  w_t = −(1/σ_{1:(t−1)}) λ^t_{1:(t−1)}. (5)
Throughout this section we assume that strong duality holds (i.e. Eq. (1) holds with equality). See the appendix for sufficient conditions. In particular, under this assumption, we have that the above setting for w_t is in fact a minimizer of the primal objective, since (λ^t_1, ..., λ^t_{t−1}) maximizes the dual objective (see the appendix). The primal-dual view of Follow-The-Leader is presented in Figure 1. Denote
  Δ_t = D_{t+1}(λ^{t+1}_1, ..., λ^{t+1}_t) − D_t(λ^t_1, ..., λ^t_{t−1}). (6)
To analyze the FTL algorithm, we first note that (by strong duality)
  Σ_{t=1}^T Δ_t = D_{T+1}(λ^{T+1}_1, ..., λ^{T+1}_T) = min_w P_{T+1}(w) = min_w Σ_{t=1}^T l_t(w). (7)
Second, the fact that (λ^{t+1}_1, ..., λ^{t+1}_t) is the maximizer of D_{t+1} implies that for any λ we have
  Δ_t ≥ D_{t+1}(λ^t_1, ..., λ^t_{t−1}, λ) − D_t(λ^t_1, ..., λ^t_{t−1}). (8)
The following central lemma shows that there exists λ such that the right-hand side of the above is sufficiently large.

Lemma 2 Let (λ_1, ..., λ_{t−1}) be an arbitrary sequence of vectors. Denote w = −(1/σ_{1:(t−1)}) λ_{1:(t−1)}, let v ∈ ∂l_t(w), and let λ = v − σ_t w. Then, λ ∈ ∂g_t(w) and
  D_{t+1}(λ_1, ..., λ_{t−1}, λ) − D_t(λ_1, ..., λ_{t−1}) = l_t(w) − (1/(2σ_{1:t})) ‖v‖².

Proof We prove the lemma for the case t > 1. The case t = 1 can be proved similarly. Since l_t(w) = (σ_t/2)‖w‖² + g_t(w) and v ∈ ∂l_t(w), we have that λ ∈ ∂g_t(w). Denote Δ_t = D_{t+1}(λ_1, ..., λ_{t−1}, λ) − D_t(λ_1, ..., λ_{t−1}). Simple algebraic manipulations yield
  Δ_t = −(1/(2σ_{1:t})) ‖λ_{1:(t−1)} + λ‖² − g*_t(λ) + (1/(2σ_{1:(t−1)})) ‖λ_{1:(t−1)}‖²
      = (‖λ_{1:(t−1)}‖²/2) (1/σ_{1:(t−1)} − 1/σ_{1:t}) − (1/σ_{1:t}) ⟨λ_{1:(t−1)}, λ⟩ − (1/(2σ_{1:t})) ‖λ‖² − g*_t(λ)
      = (σ_{1:(t−1)}/σ_{1:t}) ((σ_t/2)‖w‖² + ⟨w, λ⟩) − (1/(2σ_{1:t})) ‖λ‖² − g*_t(λ)
      = [(σ_t/2)‖w‖² + ⟨w, λ⟩ − g*_t(λ)] − (1/(2σ_{1:t})) ‖σ_t w + λ‖²,
where the third equality uses w = −(1/σ_{1:(t−1)}) λ_{1:(t−1)} and the fourth follows by expanding ‖σ_t w + λ‖². Denote A = (σ_t/2)‖w‖² + ⟨w, λ⟩ − g*_t(λ) and B = (1/(2σ_{1:t})) ‖σ_t w + λ‖², so that Δ_t = A − B. Since λ ∈ ∂g_t(w), Lemma 1 thus implies that ⟨w, λ⟩ − g*_t(λ) = g_t(w).
Therefore, A = l_t(w). Next, we note that B = (1/(2σ_{1:t})) ‖σ_t w + λ‖². We have thus shown that Δ_t = l_t(w) − (1/(2σ_{1:t})) ‖σ_t w + λ‖². Plugging the definition of λ into the above we conclude our proof.

Combining Lemma 2 with Eq. (7) and Eq. (8) we obtain the following:

FOR t = 1, 2, ..., T:
  Define w_t = −(1/σ_{1:(t−1)}) λ^t_{1:(t−1)}
  Receive a function l_t(w) = (σ_t/2)‖w‖² + g_t(w) and suffer loss l_t(w_t)
  Update (λ^{t+1}_1, ..., λ^{t+1}_t) s.t. the following holds for some λ′_t ∈ ∂g_t(w_t):
    D_{t+1}(λ^{t+1}_1, ..., λ^{t+1}_t) ≥ D_{t+1}(λ^t_1, ..., λ^t_{t−1}, λ′_t)

Figure 2: A primal-dual algorithmic framework for online convex optimization. Again, w_1 = 0.

Corollary 1 Let l_1, ..., l_T be a sequence of functions such that for all t ∈ [T], l_t is σ_t-strongly convex. Assume that the FTL algorithm runs on this sequence and for each t ∈ [T], let v_t be in ∂l_t(w_t). Then,
  Σ_{t=1}^T l_t(w_t) − min_w Σ_{t=1}^T l_t(w) ≤ Σ_{t=1}^T (1/(2σ_{1:t})) ‖v_t‖². (9)
Furthermore, let L = max_t ‖v_t‖ and assume that for all t ∈ [T], σ_t ≥ σ. Then, the regret is bounded by (L²/(2σ)) (log(T) + 1).

If we are dealing with the square loss, l_t(w) = ‖w − μ_t‖²_2 (where nature chooses μ_t), then it is straightforward to see that Eq. (8) holds with equality, and this leads to the previous regret bound holding with equality. This equality is the underlying reason that the FTL strategy is a minimax strategy (see Abernethy et al. [2008] for a proof of this claim).

4 A Primal-Dual Algorithm for Online Strongly Convex Optimization

In the previous section, we provided a dual analysis for FTL. Here, we extend the analysis and derive a more general algorithmic framework for online optimization. We start by examining the analysis of the FTL algorithm. We first make the important observation that Lemma 2 is not specific to the FTL algorithm and in fact holds for any configuration of dual variables. Consider an arbitrary sequence of dual variables, (λ^2_1), (λ^3_1, λ^3_2), ..., (λ^{T+1}_1, ..., λ^{T+1}_T), and denote Δ_t as in Eq. (6). Using weak duality, we can replace the equality in Eq. (7) with the following inequality that holds for any sequence of dual variables:
  Σ_{t=1}^T Δ_t = D_{T+1}(λ^{T+1}_1, ..., λ^{T+1}_T) ≤ min_w P_{T+1}(w) = min_w Σ_{t=1}^T l_t(w). (10)
A summary of the algorithmic framework is given in Fig. 2. The following theorem, a direct corollary of the previous equation and Lemma 2, shows that all instances of the framework achieve logarithmic regret.
Theorem 1 Let l_1, ..., l_T be a sequence of functions such that for all t ∈ [T], l_t is σ_t-strongly convex. Then, any algorithm that can be derived from Fig. 2 satisfies
  Σ_{t=1}^T l_t(w_t) − min_w Σ_{t=1}^T l_t(w) ≤ Σ_{t=1}^T (1/(2σ_{1:t})) ‖v_t‖², (11)
where v_t ∈ ∂l_t(w_t).

Proof Let Δ_t be as defined in Eq. (6). The last condition in the algorithm implies that
  Δ_t ≥ D_{t+1}(λ^t_1, ..., λ^t_{t−1}, v_t − σ_t w_t) − D_t(λ^t_1, ..., λ^t_{t−1}).
The proof follows directly by combining the above with Eq. (10) and Lemma 2.

We conclude this section by deriving several algorithms from the framework.
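The logarithmic bound of Corollary 1 and Theorem 1 can be exercised on a toy instance (a sketch under assumptions: random 1-D square losses and horizon T = 200, neither taken from the paper). For l_t(w) = (w − μ_t)², the leader after t − 1 rounds is the mean of μ_1, ..., μ_{t−1}, so FTL is a running mean, and its regret stays below (L²/(2σ))(log T + 1):

```python
import math
import random

random.seed(0)
T = 200
mus = [random.uniform(-1.0, 1.0) for _ in range(T)]

# FTL on l_t(w) = (w - mu_t)^2: the leader after t-1 rounds is the mean
# of mu_1..mu_{t-1}, with w_1 = 0 by convention.
preds, s = [], 0.0
for t in range(T):
    preds.append(s / t if t > 0 else 0.0)
    s += mus[t]

cum = sum((w - mu) ** 2 for w, mu in zip(preds, mus))
best_w = s / T
best = sum((best_w - mu) ** 2 for mu in mus)
regret = cum - best

# sigma = 2 and |l_t'(w_t)| = 2|w_t - mu_t| <= L with L = 4 here,
# since both w_t and mu_t lie in [-1, 1].
sigma, L = 2.0, 4.0
bound = (L ** 2) / (2 * sigma) * (math.log(T) + 1)
assert regret <= bound
print(regret, bound)
```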

Example 1 (Follow-The-Leader) As we have shown in Sec. 3, the FTL algorithm (Fig. 1) is equivalent to optimizing the dual variables at each online round. This update clearly satisfies the condition in Fig. 2 and is therefore a special case.

Example 2 (Gradient-Descent) Following Hazan et al. [2006], Bartlett et al. [2007] suggested the following update rule for differentiable strongly convex functions:
  w_{t+1} = w_t − (1/σ_{1:t}) ∇l_t(w_t). (12)
Using a simple inductive argument, it is possible to show that the above update rule is equivalent to the following update rule of the dual variables:
  (λ^{t+1}_1, ..., λ^{t+1}_t) = (λ^t_1, ..., λ^t_{t−1}, ∇l_t(w_t) − σ_t w_t). (13)
Clearly, this update rule satisfies the condition in Fig. 2. Therefore our framework encompasses this algorithm as a special case.

Example 3 (Online Coordinate-Dual-Ascent) The FTL and the Gradient-Descent updates are two extreme cases of our algorithmic framework. The former makes the largest possible increase of the dual while the latter makes the smallest possible increase that still satisfies the sufficient dual increase requirement. Intuitively, the FTL method should have smaller regret as it consumes more of its potential earlier in the optimization process. However, its computational complexity is large as it requires a full blown optimization procedure at each online round. A possible compromise is to fully optimize the dual objective but only with respect to a small number of dual variables. In the extreme case, we optimize only with respect to the last dual variable. Formally, we let
  λ^{t+1}_i = λ^t_i for i < t,  and  λ^{t+1}_t = argmax_{λ_t} D_{t+1}(λ^t_1, ..., λ^t_{t−1}, λ_t).
Clearly, the above update satisfies the requirement in Fig. 2 and therefore enjoys a logarithmic regret bound. The computational complexity of performing this update is often small as we optimize over a single dual vector.
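As a concrete illustration of Example 2 (a hypothetical sketch, not the authors' code), take square losses l_t(w) = (w − μ_t)², for which σ_t = 2 and σ_{1:t} = 2t. The update of Eq. (12) becomes w_{t+1} = w_t − (w_t − μ_t)/t, an incremental running mean, so on this particular loss family the gradient-descent and FTL extremes coincide:

```python
# Gradient descent with learning rate 1/sigma_{1:t} on l_t(w) = (w - mu_t)^2.
# grad l_t(w) = 2*(w - mu_t), sigma_t = 2, so sigma_{1:t} = 2t and the
# update is w <- w - (w - mu_t)/t: exactly the running mean of the mu_t.
mus = [4.0, -2.0, 6.0, 0.0, 7.0]   # arbitrary illustrative targets
w = 0.0                            # w_1 = 0
for t, mu in enumerate(mus, start=1):
    grad = 2.0 * (w - mu)
    w = w - grad / (2.0 * t)

assert abs(w - sum(mus) / len(mus)) < 1e-9
print(w)   # approximately the mean of the mu_t
```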
Occasionally, it is possible to obtain a closed-form solution of the optimization problem, and in these cases the computational complexity of the coordinate-dual-ascent update is identical to that of the gradient-descent method.

5 Generalized strongly convex functions

In this section, we extend our algorithmic framework to deal with generalized strongly convex functions. We first need the following generalized definition of strong convexity.

Definition 1 A continuous function f is σ-strongly convex over a convex set S with respect to a norm ‖·‖ if S is contained in the domain of f and for all v, u ∈ S and α ∈ [0, 1] we have
  f(α v + (1−α) u) ≤ α f(v) + (1−α) f(u) − (σ/2) α(1−α) ‖v − u‖². (14)

It is straightforward to show that the function f(w) = ½‖w‖²_2 is strongly convex with respect to the Euclidean norm. Less trivial examples are given below.

Example 4 The function f(w) = Σ_{i=1}^n w_i log(w_i/(1/n)) is strongly convex over the probability simplex, S = {w ∈ R^n_+ : ‖w‖_1 = 1}, with respect to the L1 norm. Its conjugate function is f*(θ) = log((1/n) Σ_{i=1}^n exp(θ_i)).

Example 5 For q ∈ (1, 2), the function f(w) = (1/(2(q−1))) ‖w‖²_q is strongly convex over S = R^n with respect to the L_q norm. Its conjugate function is f*(θ) = (1/(2(p−1))) ‖θ‖²_p, where p = (1 − 1/q)^{−1}.

For proofs, see for example Shalev-Shwartz [2007]. In the appendix, we list several important properties of strongly convex functions. In particular, the Fenchel conjugate of a strongly convex function is differentiable.
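The conjugate pair in Example 4 can be verified numerically (an illustrative check with arbitrary θ, not from the paper): the supremum defining f*(θ) over the simplex is attained at the softmax point w_i ∝ exp(θ_i), where it equals log((1/n) Σ_i exp(θ_i)), and no other point on the simplex does better.

```python
import math
import random

random.seed(1)
n = 4
theta = [random.uniform(-2.0, 2.0) for _ in range(n)]

def f(w):
    # relative entropy to the uniform distribution: sum_i w_i log(w_i/(1/n))
    return sum(wi * math.log(wi * n) for wi in w if wi > 0)

# Claimed conjugate value: log((1/n) * sum_i exp(theta_i)).
f_star = math.log(sum(math.exp(t) for t in theta) / n)

# The maximizer of <w, theta> - f(w) over the simplex is the softmax of theta.
z = sum(math.exp(t) for t in theta)
w_star = [math.exp(t) / z for t in theta]
val_at_softmax = sum(wi * ti for wi, ti in zip(w_star, theta)) - f(w_star)
assert abs(val_at_softmax - f_star) < 1e-9

# No random point on the simplex exceeds the claimed supremum.
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    w = [r / sum(raw) for r in raw]
    assert sum(wi * ti for wi, ti in zip(w, theta)) - f(w) <= f_star + 1e-9
```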

INPUT: A strongly convex function f
FOR t = 1, 2, ..., T:
  1. Define w_t = ∇f*(−(1/√t) λ^t_{1:(t−1)})
  2. Receive a function l_t
  3. Suffer loss l_t(w_t)
  4. Update (λ^{t+1}_1, ..., λ^{t+1}_t) s.t. there exists λ′_t ∈ ∂l_t(w_t) with
     D_{t+1}(λ^{t+1}_1, ..., λ^{t+1}_t) ≥ D_{t+1}(λ^t_1, ..., λ^t_{t−1}, λ′_t)

INPUT: A σ-strongly convex function f
FOR t = 1, 2, ..., T:
  1. Define w_t = ∇f*(−(1/σ_{1:t}) λ^t_{1:(t−1)})
  2. Receive a function l_t = σ_t f + g_t
  3. Suffer loss l_t(w_t)
  4. Update (λ^{t+1}_1, ..., λ^{t+1}_t) s.t. there exists λ′_t ∈ ∂g_t(w_t) with
     D_{t+1}(λ^{t+1}_1, ..., λ^{t+1}_t) ≥ D_{t+1}(λ^t_1, ..., λ^t_{t−1}, λ′_t)

Figure 3: Primal-dual template algorithms for general online convex optimization (left) and online strongly convex optimization (right). Here a_{1:t} = Σ_{i=1}^t a_i, and for notational convenience, we implicitly assume that a_{1:0} = 0. See text for description.

Consider the case where for all t, l_t can be written as σ_t f + g_t, where f is 1-strongly convex with respect to some norm ‖·‖ and g_t is a convex function. We also make the simplifying assumption that σ_t is known to the forecaster before he defines w_t. For each round t, we now define the primal objective to be
  P_t(w) = σ_{1:(t−1)} f(w) + Σ_{i=1}^{t−1} g_i(w). (15)
The dual objective is (see again Sec. 2)
  D_t(λ_1, ..., λ_{t−1}) = −σ_{1:(t−1)} f*(−(1/σ_{1:(t−1)}) λ_{1:(t−1)}) − Σ_{i=1}^{t−1} g*_i(λ_i). (16)
An algorithmic framework for online optimization in the presence of general strongly convex functions is given on the right-hand side of Fig. 3. The following theorem provides a logarithmic regret bound for the algorithmic framework given on the right-hand side of Fig. 3.

Theorem 2 Let l_1, ..., l_T be a sequence of functions such that for all t ∈ [T], l_t = σ_t f + g_t, where f is strongly convex w.r.t. a norm ‖·‖ and g_t is a convex function. Then, any algorithm that can be derived from Fig. 3 (right) satisfies
  Σ_{t=1}^T l_t(w_t) − min_w Σ_{t=1}^T l_t(w) ≤ Σ_{t=1}^T (1/(2σ_{1:t})) ‖v_t‖²_*, (17)
where v_t ∈ ∂g_t(w_t) and ‖·‖_* is the norm dual to ‖·‖. The proof of the theorem is given in Sec. B.

6 Summary

In this paper, we extended the primal-dual algorithmic framework for general convex functions from Shalev-Shwartz and Singer [2006], Shalev-Shwartz [2007] to strongly convex functions. The template algorithms are outlined in Fig.
3. The left algorithm is the primal-dual algorithm for general convex functions from Shalev-Shwartz and Singer [2006], Shalev-Shwartz [2007]. Here, f is the complexity function, (λ^t_1, ..., λ^t_t) are the dual variables at time t, and D_t(·) is the dual objective

function at time t (which is a lower bound on the primal value). The function ∇f* is the gradient of the conjugate function of f, which can be viewed as a projection of the dual variables back into the primal space. At the least aggressive extreme, in order to obtain √T regret, it is sufficient to set λ^{t+1}_i (for all i ≤ t) to be a subgradient of the loss l_i at w_i. We then recover an online mirror descent algorithm [Beck and Teboulle, 2003, Grove et al., 2001, Kivinen and Warmuth, 1997], which specializes to gradient descent when f is the squared 2-norm or to the exponentiated gradient descent algorithm when f is the relative entropy. At the most aggressive extreme, where D_t is maximized at each round, we have Follow the Regularized Leader, which is w_t = argmin_w Σ_{i=1}^{t−1} l_i(w) + √t f(w). Intermediate algorithms can also be devised, such as the passive-aggressive algorithms [Crammer et al., 2006, Shalev-Shwartz, 2007].

The right algorithm in Figure 3 is our new contribution for strongly convex functions. Any σ-strongly convex loss function can be decomposed into l_t = σf + g_t, where g_t is convex. The algorithm for strongly convex functions is different in two ways. First, the effective learning rate is now 1/σ_{1:t} rather than 1/√t (see Step 1 in both algorithms). Second, more subtly, the condition on the dual variables (in Step 4) is only determined by the subgradient of g_t, rather than the subgradient of l_t. At the most aggressive end of the spectrum, where D_t is maximized at each round, we have the Follow the Leader (FTL) algorithm: w_t = argmin_w Σ_{i=1}^{t−1} l_i(w). At the least aggressive end, we have the gradient descent algorithm of Hazan et al. [2006] (which uses learning rate 1/σ_{1:t}). Furthermore, we provide algorithms which lie in between these two extremes; it is these algorithms which have the potential for most practical impact. Empirical observations suggest that algorithms which most aggressively close the duality gap tend to perform most favorably [Crammer et al., 2006, Shalev-Shwartz and Singer, 2007b].
However, at the FTL extreme, this is often computationally prohibitive to implement (as one must solve a full blown optimization problem at each round). Our template algorithm suggests a natural compromise, which is to optimize the dual objective but only with respect to a small number of dual variables (say, the most current dual variable); we coin this algorithm online coordinate-dual-ascent. In fact, it is sometimes possible to obtain a closed-form solution of this optimization problem, so that the computational complexity of the coordinate-dual-ascent update is identical to that of a vanilla gradient-descent method. This variant update still enjoys a logarithmic regret bound.

References

J. Abernethy, P. Bartlett, A. Rakhlin, and A. Tewari. Optimal strategies and minimax lower bounds for online convex games. In Proceedings of the Annual Conference on Computational Learning Theory, 2008.
P. L. Bartlett, E. Hazan, and A. Rakhlin. Adaptive online gradient descent. In Advances in Neural Information Processing Systems 21, 2007.
A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 2003.
J. Borwein and A. Lewis. Convex Analysis and Nonlinear Optimization. Springer, 2006.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Conference on Empirical Methods in Natural Language Processing, 2002.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive aggressive algorithms. Journal of Machine Learning Research, 7:551-585, 2006.
A. J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3), 2001.
E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization.
In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, 2006.
J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-64, 1997.
S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, The Hebrew University, 2007.
S. Shalev-Shwartz and Y. Singer. Convex repeated games and Fenchel duality. In Advances in Neural Information Processing Systems 20, 2006.
S. Shalev-Shwartz and Y. Singer. Logarithmic regret algorithms for strictly convex repeated games. Technical report, The Hebrew University, 2007a.
S. Shalev-Shwartz and Y. Singer. A unified algorithmic approach for efficient online label ranking. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2007b.

A Fenchel duality

Consider the optimization problem
  inf_{w∈S} f(w) + Σ_{t=1}^T g_t(w).
An equivalent problem is
  inf_{w_0, w_1, ..., w_T} f(w_0) + Σ_{t=1}^T g_t(w_t)  s.t. w_0 ∈ S and ∀t ∈ [T], w_t = w_0. (18)
Introducing T vectors λ_1, ..., λ_T, where each λ_t ∈ R^n is a vector of Lagrange multipliers for the equality constraint w_t = w_0, we obtain the following Lagrangian:
  L(w_0, w_1, ..., w_T, λ_1, ..., λ_T) = f(w_0) + Σ_{t=1}^T g_t(w_t) + Σ_{t=1}^T ⟨λ_t, w_0 − w_t⟩.
The dual problem is the task of maximizing the following dual objective value:
  D(λ_1, ..., λ_T) = inf_{w_0∈S, w_1, ..., w_T} L(w_0, w_1, ..., w_T, λ_1, ..., λ_T)
    = −sup_{w_0∈S} (⟨w_0, −Σ_t λ_t⟩ − f(w_0)) − Σ_t sup_{w_t} (⟨w_t, λ_t⟩ − g_t(w_t))
    = −f*(−Σ_{t=1}^T λ_t) − Σ_{t=1}^T g*_t(λ_t),
where f*, g*_1, ..., g*_T are the Fenchel conjugate functions of f, g_1, ..., g_T. Therefore, the generalized Fenchel dual problem is
  sup_{λ_1,...,λ_T} −f*(−λ_{1:T}) − Σ_{t=1}^T g*_t(λ_t). (19)
Note that when T = 1 the above duality is the so-called Fenchel duality [Borwein and Lewis, 2006]. The weak duality theorem tells us that the primal objective upper bounds the dual objective:
  sup_{λ_1,...,λ_T} −f*(−λ_{1:T}) − Σ_t g*_t(λ_t) ≤ inf_{w∈S} f(w) + Σ_{t=1}^T g_t(w).
A sufficient condition for equality to hold (i.e. strong duality) is that f is a strongly convex function, g_1, ..., g_T are convex functions, and the intersection of the domains of g_1, ..., g_T is polyhedral. Assume that strong duality holds, let (λ*_1, ..., λ*_T) be a maximizer of the dual objective function, and let (w*_0, ..., w*_T) be a minimizer of the problem given in Eq. (18). Then, optimality conditions imply that
  (w*_0, ..., w*_T) = argmin_{w_0, ..., w_T} L(w_0, ..., w_T, λ*_1, ..., λ*_T).
See for example Boyd and Vandenberghe [2004]. In particular, the above implies that
  w*_0 = argmax_{w_0∈S} ⟨w_0, −Σ_t λ*_t⟩ − f(w_0) = ∇f*(−λ*_{1:T}),
where in the last equality we used Lemma 1. Since w*_0 is also a minimizer of our original problem, we obtain the primal-dual link function w* = ∇f*(−λ*_{1:T}).
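The duality above can be illustrated numerically in one dimension (a hedged sketch: f(w) = ½w², linear g_t(w) = v_t·w, and the slopes below are arbitrary choices). Since g*_t(λ) is 0 at λ = v_t and +∞ elsewhere, the dual collapses to −f*(−Σ_t v_t), which matches the primal infimum, and the primal-dual link returns the primal minimizer:

```python
# f(w) = 0.5*w^2, so f*(theta) = 0.5*theta^2 and grad f*(theta) = theta.
# Linear g_t(w) = v_t * w has conjugate 0 at lambda = v_t, +inf elsewhere.
vs = [0.7, -1.3, 2.1]          # arbitrary linear-loss slopes
s = sum(vs)

# Primal: inf_w f(w) + sum_t g_t(w), attained at w = -s.
primal = 0.5 * s ** 2 + s * (-s)          # = -0.5*s^2

# Dual: the conjugates force lambda_t = v_t, leaving -f*(-lambda_{1:T}).
dual = -0.5 * (-s) ** 2                   # = -0.5*s^2

assert abs(primal - dual) < 1e-12         # strong duality holds here
w_star = -s                               # primal-dual link: grad f*(-lambda_{1:T})
# w_star indeed minimizes the primal: the derivative w + s vanishes at w = -s.
assert abs(w_star + s) < 1e-12
```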

B Proof of Thm. 2

We first use the following property of the Fenchel conjugate of strongly convex functions. The proof of this lemma follows from Lemma 18 in Shalev-Shwartz [2007].

Lemma 3 Let f be a σ-strongly convex function over S with respect to a norm ‖·‖, and let f* be the Fenchel conjugate of f. Then, f* is differentiable and for all θ_1, θ_2 ∈ R^n we have
  f*(θ_1 + θ_2) − f*(θ_1) ≤ ⟨∇f*(θ_1), θ_2⟩ + (1/(2σ)) ‖θ_2‖²_* .

We also need the following technical lemma.

Lemma 4 Assume f is strongly convex, let a, b > 0, and let w_b = ∇f*(θ/b). Then,
  a f*(θ/a) − b f*(θ/b) ≥ (b − a) f(w_b).

Proof Since f is strongly convex we know that f* is differentiable. Using Lemma 1 we have
  f*(θ/b) = ⟨w_b, θ/b⟩ − f(w_b).
The definition of f* now implies that
  f*(θ/a) = max_w ⟨w, θ/a⟩ − f(w) ≥ ⟨w_b, θ/a⟩ − f(w_b).
Therefore,
  a f*(θ/a) − b f*(θ/b) ≥ (⟨w_b, θ⟩ − a f(w_b)) − (⟨w_b, θ⟩ − b f(w_b)) = (b − a) f(w_b),
which concludes our proof.

Next, we show that the gradient descent update rule yields a sufficient increase of the dual objective.

Lemma 5 Let (λ_1, ..., λ_{t−1}) be an arbitrary sequence of vectors. Denote w = ∇f*(−(1/σ_{1:t}) λ_{1:(t−1)}) and let λ ∈ ∂g_t(w). Then,
  D_{t+1}(λ_1, ..., λ_{t−1}, λ) − D_t(λ_1, ..., λ_{t−1}) ≥ l_t(w) − (1/(2σ_{1:t})) ‖λ‖²_* .

Proof Denote Δ_t = D_{t+1}(λ_1, ..., λ_{t−1}, λ) − D_t(λ_1, ..., λ_{t−1}). Since σ_{1:t} f is σ_{1:t}-strongly convex, we can apply Lemma 3 to its conjugate, θ ↦ σ_{1:t} f*(θ/σ_{1:t}), to get that
  Δ_t = −σ_{1:t} f*(−(λ_{1:(t−1)} + λ)/σ_{1:t}) + σ_{1:(t−1)} f*(−λ_{1:(t−1)}/σ_{1:(t−1)}) − g*_t(λ)
      ≥ −σ_{1:t} f*(−λ_{1:(t−1)}/σ_{1:t}) + ⟨w, λ⟩ − (1/(2σ_{1:t})) ‖λ‖²_* + σ_{1:(t−1)} f*(−λ_{1:(t−1)}/σ_{1:(t−1)}) − g*_t(λ)
      = [σ_{1:(t−1)} f*(−λ_{1:(t−1)}/σ_{1:(t−1)}) − σ_{1:t} f*(−λ_{1:(t−1)}/σ_{1:t})] + [⟨w, λ⟩ − g*_t(λ)] − (1/(2σ_{1:t})) ‖λ‖²_* ,
where the inequality uses ∇f*(−λ_{1:(t−1)}/σ_{1:t}) = w. Denote the first bracketed term by A and the second by B. Since λ ∈ ∂g_t(w), we get from Lemma 1 that B = g_t(w). Next, we use Lemma 4 (with θ = −λ_{1:(t−1)}, a = σ_{1:(t−1)}, and b = σ_{1:t}, so that w_b = w) to get that A ≥ (σ_{1:t} − σ_{1:(t−1)}) f(w) = σ_t f(w). Thus, A + B ≥ σ_t f(w) + g_t(w) = l_t(w), and this concludes our proof.

The proof of Thm. 2 now easily follows.

Proof [of Thm. 2] Denote Δ_t = D_{t+1}(λ^{t+1}_1, ..., λ^{t+1}_t) − D_t(λ^t_1, ..., λ^t_{t−1}) and note that Eq. (10) still holds. The definition of the update in Fig. 3 and Lemma 5 imply that there exists v_t ∈ ∂g_t(w_t) such that Δ_t ≥ l_t(w_t) − (1/(2σ_{1:t})) ‖v_t‖²_*. Summing over t and combining with Eq. (10) we conclude our proof.


Lecture Notes on Online Learning DRAFT Lecture Notes on Online Learning DRAFT Alexander Rakhlin These lecture notes contain material presented in the Statistical Learning Theory course at UC Berkeley, Spring 08. Various parts of these notes

More information

The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

More information

The p-norm generalization of the LMS algorithm for adaptive filtering

The p-norm generalization of the LMS algorithm for adaptive filtering The p-norm generalization of the LMS algorithm for adaptive filtering Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology

More information

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725 Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T

More information

ALMOST COMMON PRIORS 1. INTRODUCTION

ALMOST COMMON PRIORS 1. INTRODUCTION ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type

More information

Introduction to Online Learning Theory

Introduction to Online Learning Theory Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

More information

10. Proximal point method

10. Proximal point method L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing

More information

17.3.1 Follow the Perturbed Leader

17.3.1 Follow the Perturbed Leader CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

More information

Date: April 12, 2001. Contents

Date: April 12, 2001. Contents 2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........

More information

Strongly Adaptive Online Learning

Strongly Adaptive Online Learning Amit Daniely Alon Gonen Shai Shalev-Shwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Online Lazy Updates for Portfolio Selection with Transaction Costs

Online Lazy Updates for Portfolio Selection with Transaction Costs Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence Online Lazy Updates for Portfolio Selection with Transaction Costs Puja Das, Nicholas Johnson, and Arindam Banerjee Department

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Adaptive Bound Optimization for Online Convex Optimization

Adaptive Bound Optimization for Online Convex Optimization Adaptive Bound Optimization for Online Convex Optimization H. Brendan McMahan Google, Inc. mcmahan@google.com Matthew Streeter Google, Inc. mstreeter@google.com Abstract We introduce a new online convex

More information

12. Inner Product Spaces

12. Inner Product Spaces 1. Inner roduct Spaces 1.1. Vector spaces A real vector space is a set of objects that you can do to things ith: you can add to of them together to get another such object, and you can multiply one of

More information

Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

More information

Vectors Math 122 Calculus III D Joyce, Fall 2012

Vectors Math 122 Calculus III D Joyce, Fall 2012 Vectors Math 122 Calculus III D Joyce, Fall 2012 Vectors in the plane R 2. A vector v can be interpreted as an arro in the plane R 2 ith a certain length and a certain direction. The same vector can be

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric

More information

DUOL: A Double Updating Approach for Online Learning

DUOL: A Double Updating Approach for Online Learning : A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University

More information

Notes from Week 1: Algorithms for sequential prediction

Notes from Week 1: Algorithms for sequential prediction CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking

More information

OPTIMIZING WEB SERVER'S DATA TRANSFER WITH HOTLINKS

OPTIMIZING WEB SERVER'S DATA TRANSFER WITH HOTLINKS OPTIMIZING WEB SERVER'S DATA TRANSFER WIT OTLINKS Evangelos Kranakis School of Computer Science, Carleton University Ottaa,ON. K1S 5B6 Canada kranakis@scs.carleton.ca Danny Krizanc Department of Mathematics,

More information

Online Learning and Online Convex Optimization. Contents

Online Learning and Online Convex Optimization. Contents Foundations and Trends R in Machine Learning Vol. 4, No. 2 (2011) 107 194 c 2012 S. Shalev-Shwartz DOI: 10.1561/2200000018 Online Learning and Online Convex Optimization By Shai Shalev-Shwartz Contents

More information

α = u v. In other words, Orthogonal Projection

α = u v. In other words, Orthogonal Projection Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

More information

Online Learning with Switching Costs and Other Adaptive Adversaries

Online Learning with Switching Costs and Other Adaptive Adversaries Online Learning with Switching Costs and Other Adaptive Adversaries Nicolò Cesa-Bianchi Università degli Studi di Milano Italy Ofer Dekel Microsoft Research USA Ohad Shamir Microsoft Research and the Weizmann

More information

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing

More information

Online Classification on a Budget

Online Classification on a Budget Online Classification on a Budget Koby Crammer Computer Sci. & Eng. Hebrew University Jerusalem 91904, Israel kobics@cs.huji.ac.il Jaz Kandola Royal Holloway, University of London Egham, UK jaz@cs.rhul.ac.uk

More information

Several Views of Support Vector Machines

Several Views of Support Vector Machines Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Duality of linear conic problems

Duality of linear conic problems Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least

More information

Efficient Projections onto the l 1 -Ball for Learning in High Dimensions

Efficient Projections onto the l 1 -Ball for Learning in High Dimensions Efficient Projections onto the l 1 -Ball for Learning in High Dimensions John Duchi Google, Mountain View, CA 94043 Shai Shalev-Shwartz Toyota Technological Institute, Chicago, IL, 60637 Yoram Singer Tushar

More information

4.6 Linear Programming duality

4.6 Linear Programming duality 4.6 Linear Programming duality To any minimization (maximization) LP we can associate a closely related maximization (minimization) LP. Different spaces and objective functions but in general same optimal

More information

CONTINUED FRACTIONS AND PELL S EQUATION. Contents 1. Continued Fractions 1 2. Solution to Pell s Equation 9 References 12

CONTINUED FRACTIONS AND PELL S EQUATION. Contents 1. Continued Fractions 1 2. Solution to Pell s Equation 9 References 12 CONTINUED FRACTIONS AND PELL S EQUATION SEUNG HYUN YANG Abstract. In this REU paper, I will use some important characteristics of continued fractions to give the complete set of solutions to Pell s equation.

More information

1 Portfolio mean and variance

1 Portfolio mean and variance Copyright c 2005 by Karl Sigman Portfolio mean and variance Here we study the performance of a one-period investment X 0 > 0 (dollars) shared among several different assets. Our criterion for measuring

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication)

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication) Schooling, Political Participation, and the Economy Online Supplementary Appendix: Not for Publication) Filipe R. Campante Davin Chor July 200 Abstract In this online appendix, we present the proofs for

More information

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where. Introduction Linear Programming Neil Laws TT 00 A general optimization problem is of the form: choose x to maximise f(x) subject to x S where x = (x,..., x n ) T, f : R n R is the objective function, S

More information

Convex Programming Tools for Disjunctive Programs

Convex Programming Tools for Disjunctive Programs Convex Programming Tools for Disjunctive Programs João Soares, Departamento de Matemática, Universidade de Coimbra, Portugal Abstract A Disjunctive Program (DP) is a mathematical program whose feasible

More information

Lecture 2: The SVM classifier

Lecture 2: The SVM classifier Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function

More information

TD(0) Leads to Better Policies than Approximate Value Iteration

TD(0) Leads to Better Policies than Approximate Value Iteration TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract

More information

Mathematical finance and linear programming (optimization)

Mathematical finance and linear programming (optimization) Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may

More information

Online Learning and Competitive Analysis: a Unified Approach

Online Learning and Competitive Analysis: a Unified Approach Online Learning and Competitive Analysis: a Unified Approach Shahar Chen Online Learning and Competitive Analysis: a Unified Approach Research Thesis Submitted in partial fulfillment of the requirements

More information

The Multiplicative Weights Update method

The Multiplicative Weights Update method Chapter 2 The Multiplicative Weights Update method The Multiplicative Weights method is a simple idea which has been repeatedly discovered in fields as diverse as Machine Learning, Optimization, and Game

More information

Online Learning of Optimal Strategies in Unknown Environments

Online Learning of Optimal Strategies in Unknown Environments 1 Online Learning of Optimal Strategies in Unknown Environments Santiago Paternain and Alejandro Ribeiro Abstract Define an environment as a set of convex constraint functions that vary arbitrarily over

More information

Optimal energy trade-off schedules

Optimal energy trade-off schedules Optimal energy trade-off schedules Neal Barcelo, Daniel G. Cole, Dimitrios Letsios, Michael Nugent, Kirk R. Pruhs To cite this version: Neal Barcelo, Daniel G. Cole, Dimitrios Letsios, Michael Nugent,

More information

The Advantages and Disadvantages of Online Linear Optimization

The Advantages and Disadvantages of Online Linear Optimization LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA E-MAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization Adaptive Subgradient Methods Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi Computer Science Division University of California, Berkeley Berkeley, CA 94720 USA

More information

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen (für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained

More information

Duality in Linear Programming

Duality in Linear Programming Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow

More information

Proximal mapping via network optimization

Proximal mapping via network optimization L. Vandenberghe EE236C (Spring 23-4) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:

More information

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

More information

Robust Median Reversion Strategy for On-Line Portfolio Selection

Robust Median Reversion Strategy for On-Line Portfolio Selection Proceedings of the Tenty-Third International Joint Conference on Artificial Intelligence Robust Median Reversion Strategy for On-Line Portfolio Selection Dingjiang Huang,2,3, Junlong Zhou, Bin Li 4, Steven

More information

Optimization Theory for Large Systems

Optimization Theory for Large Systems Optimization Theory for Large Systems LEON S. LASDON CASE WESTERN RESERVE UNIVERSITY THE MACMILLAN COMPANY COLLIER-MACMILLAN LIMITED, LONDON Contents 1. Linear and Nonlinear Programming 1 1.1 Unconstrained

More information

On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

Introduction to Online Optimization

Introduction to Online Optimization Princeton University - Department of Operations Research and Financial Engineering Introduction to Online Optimization Sébastien Bubeck December 14, 2011 1 Contents Chapter 1. Introduction 5 1.1. Statistical

More information

A Quantitative Approach to the Performance of Internet Telephony to E-business Sites

A Quantitative Approach to the Performance of Internet Telephony to E-business Sites A Quantitative Approach to the Performance of Internet Telephony to E-business Sites Prathiusha Chinnusamy TransSolutions Fort Worth, TX 76155, USA Natarajan Gautam Harold and Inge Marcus Department of

More information

Randomization Approaches for Network Revenue Management with Customer Choice Behavior

Randomization Approaches for Network Revenue Management with Customer Choice Behavior Randomization Approaches for Network Revenue Management with Customer Choice Behavior Sumit Kunnumkal Indian School of Business, Gachibowli, Hyderabad, 500032, India sumit kunnumkal@isb.edu March 9, 2011

More information

The Relative Worst Order Ratio for On-Line Algorithms

The Relative Worst Order Ratio for On-Line Algorithms The Relative Worst Order Ratio for On-Line Algorithms Joan Boyar 1 and Lene M. Favrholdt 2 1 Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark, joan@imada.sdu.dk

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

Pacific Journal of Mathematics

Pacific Journal of Mathematics Pacific Journal of Mathematics GLOBAL EXISTENCE AND DECREASING PROPERTY OF BOUNDARY VALUES OF SOLUTIONS TO PARABOLIC EQUATIONS WITH NONLOCAL BOUNDARY CONDITIONS Sangwon Seo Volume 193 No. 1 March 2000

More information

1 Norms and Vector Spaces

1 Norms and Vector Spaces 008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

Leveraging Multipath Routing and Traffic Grooming for an Efficient Load Balancing in Optical Networks

Leveraging Multipath Routing and Traffic Grooming for an Efficient Load Balancing in Optical Networks Leveraging ultipath Routing and Traffic Grooming for an Efficient Load Balancing in Optical Netorks Juliana de Santi, André C. Drummond* and Nelson L. S. da Fonseca University of Campinas, Brazil Email:

More information

z 0 and y even had the form

z 0 and y even had the form Gaussian Integers The concepts of divisibility, primality and factoring are actually more general than the discussion so far. For the moment, we have been working in the integers, which we denote by Z

More information

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the

More information

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.

More information

Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom. Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.com This paper contains a collection of 31 theorems, lemmas,

More information

LARGE CLASSES OF EXPERTS

LARGE CLASSES OF EXPERTS LARGE CLASSES OF EXPERTS Csaba Szepesvári University of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 31, 2006 OUTLINE 1 TRACKING THE BEST EXPERT 2 FIXED SHARE FORECASTER 3 VARIABLE-SHARE

More information

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation 6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation Daron Acemoglu and Asu Ozdaglar MIT November 2, 2009 1 Introduction Outline The problem of cooperation Finitely-repeated prisoner s dilemma

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number

More information

24. The Branch and Bound Method

24. The Branch and Bound Method 24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no

More information

Online Convex Programming and Generalized Infinitesimal Gradient Ascent

Online Convex Programming and Generalized Infinitesimal Gradient Ascent Online Convex Programming and Generalized Infinitesimal Gradient Ascent Martin Zinkevich February 003 CMU-CS-03-110 School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 Abstract Convex

More information

Basic Linear Algebra

Basic Linear Algebra Basic Linear Algebra by: Dan Sunday, softsurfer.com Table of Contents Coordinate Systems 1 Points and Vectors Basic Definitions Vector Addition Scalar Multiplication 3 Affine Addition 3 Vector Length 4

More information

CONVEX optimization forms the backbone of many

CONVEX optimization forms the backbone of many IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 5, MAY 2012 3235 Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization Alekh Agarwal, Peter L Bartlett, Member,

More information

Lecture 4: BK inequality 27th August and 6th September, 2007

Lecture 4: BK inequality 27th August and 6th September, 2007 CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

More information

We shall turn our attention to solving linear systems of equations. Ax = b

We shall turn our attention to solving linear systems of equations. Ax = b 59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Gambling Systems and Multiplication-Invariant Measures

Gambling Systems and Multiplication-Invariant Measures Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

Trading regret rate for computational efficiency in online learning with limited feedback

Trading regret rate for computational efficiency in online learning with limited feedback Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai

More information