Online Learning and Competitive Analysis: a Unified Approach


Shahar Chen


Online Learning and Competitive Analysis: a Unified Approach

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Shahar Chen

Submitted to the Senate of the Technion, Israel Institute of Technology
Iyar 5775, Haifa, April 2015


This research was carried out under the supervision of Prof. Seffi Naor and Dr. Niv Buchbinder, in the Department of Computer Science. Some results in this thesis have been published as articles by the author and research collaborators in conferences and journals during the course of the author's doctoral research period, the most up-to-date versions being:

Niv Buchbinder, Shahar Chen, Anupam Gupta, Viswanath Nagarajan, and Joseph Naor. Online packing and covering framework with convex objectives. CoRR.

Niv Buchbinder, Shahar Chen, and Joseph Naor. Competitive algorithms for restricted caching and matroid caching. In Algorithms, ESA: Annual European Symposium, Wroclaw, Poland, September 8-10, Proceedings.

Niv Buchbinder, Shahar Chen, and Joseph Naor. Competitive analysis via regularization. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014.

Niv Buchbinder, Shahar Chen, Joseph Naor, and Ohad Shamir. Unified algorithms for online learning and competitive analysis. In COLT: The Twenty-fifth Annual Conference on Learning Theory.

Acknowledgements

I would like to thank Seffi and Niv, my advisors, who wisely led me and graciously guided me throughout the years of my work. It has been a fascinating period, and I highly appreciate the good fortune of having had the chance to work with you. The generous financial help of Irwin and Joan Jacobs, the Zeff Fellowship, and the Technion is gratefully acknowledged.


Contents

List of Figures
Abstract
Abbreviations and Notations
1 Introduction
2 Preliminaries
  Linear Programming and Convex Programming
  Lagrangian Duality and Optimality Conditions
  Approximation Algorithms using Linear Programming
  Matroids and Submodular Functions
  Matroids
  Submodular Functions
  Brief Introduction to Online Computation
3 Competitiveness via Regularization
  Introduction
  Online Regularization Algorithm
  Analysis
  General Covering Constraints with Variable Upper Bounds
  Online Set Cover with Service Cost
4 Unified Algorithms for Online Learning and Competitive Analysis
  Introduction
  Preliminaries: Online Learning and Competitive Analysis
  Algorithms and Results
  Proofs and Algorithm Derivation: the Experts/MTS Case
  Proofs and Algorithm Derivation: the Matroid Case

5 Restricted Caching and Matroid Caching
  Introduction
  Definitions and Problem Formulation
  Main Algorithm
  Rounding the Fractional Solution Online
  A Lower Bound on the Auxiliary Graph Diameter
  Special Cases of Restricted Caching
  Concluding Remarks
6 Online Packing and Covering Framework with Convex Objectives
  Introduction
  Techniques and Chapter Outline
  The General Framework
  The Algorithm
  Monotone Online Maximization
  Applications
  l_p-norm of Packing Constraints
  Online Set Cover with Multiple Costs
  Profit Maximization with Nonseparable Production Costs
Hebrew Abstract

List of Figures

3.1 Primal and dual LP formulations for the online covering problem
The primal and dual LP formulations for the MTS problem
The primal and dual LP formulations for the matroid problem
(n, l)-companion cache
The primal and dual LP formulations for the matroid caching problem
Uniform decomposition into spanning trees of the initial fractional solution
Decomposition into spanning trees of the updated fractional solution


Abstract

Online learning and competitive analysis are two widely studied frameworks for online decision-making settings. Despite the frequent similarity of the problems they study, there are significant differences in their assumptions, goals and techniques, hindering a unified analysis and richer interplay between the two. In this research we provide several contributions in this direction. First, we provide a single unified algorithm which, by parameter tuning, interpolates between optimal regret for learning from experts in online learning and optimal competitive ratio for the metrical task systems (MTS) problem in competitive analysis, improving on the results of Blum and Burch. The algorithm also allows us to obtain new regret bounds against drifting experts, which might be of independent interest. Moreover, our approach allows us to go beyond experts/MTS, obtaining similar unifying results for structured action sets and combinatorial experts, whenever the setting has a certain matroid structure. A complementary direction of our research tries to borrow various learning techniques, specifically focusing on the online convex optimization domain, in order to obtain new results in the competitive analysis framework. We show how regularization, a fundamental method in machine learning and particularly in the field of online learning, can be applied to obtain new results in the area of competitive analysis. We also show how convex conjugacy and Fenchel duality, other powerful techniques used in online convex optimization and learning, can be used in the competitive analysis setting, allowing us to cope with a richer world of online optimization problems.


Abbreviations and Notations

LP : Linear program
P : A primal minimum program
D : A dual maximum program
ΔP, ΔD : The change in the cost of the primal and dual programs, respectively
G = (V, E) : Graph with set of vertices V and set of edges E
opt : The cost of the optimal offline solution
Δ(u ∥ v) : Relative entropy with respect to u and v
E : Ground set
n : Number of elements in the ground set
I : Collection of independent sets (a subset of 2^E)
M : Matroid
r_M : Matroid rank function
γ_M : Matroid density
c_M : Matroid circumference
B_M : The bases polytope corresponding to M
P_M : The independent sets polytope corresponding to M
P^ss_M : The spanning sets polytope corresponding to M
M* : The dual matroid of M
f* : The convex conjugate of the function f
∇_l f(x) : The l-th coordinate of the gradient of f at point x


Chapter 1

Introduction

Online learning, in its decision-theoretic formulation, captures the problem of a decision-maker who iteratively needs to make decisions in the face of future uncertainty. In each round, the decision-maker picks a certain action from an action set, and then suffers a cost associated with that action. The cost vector is not known in advance, and might even be chosen by an adversary with full knowledge of the decision-maker's strategy. The performance is typically measured in terms of the regret, namely the difference between the total accumulated cost and the cost of an arbitrary fixed policy from some comparison class. Non-trivial algorithms usually attain regret which is sublinear in the number of rounds.

While online learning is a powerful and compelling framework, with deep connections to statistical learning, it also has some shortcomings. In particular, it is well recognized that regret against a fixed policy is often too weak, especially when the environment changes over time and thus no single policy is always good. This has led to several papers (e.g., [HW98, HS09, CMEDV10, RST11]) which discuss performance with respect to stronger notions of regret, such as adaptive regret or tracking the best expert. A related shortcoming of online learning is that it does not capture well problems with states, where costs depend on the decision-maker's current configuration as well as on past actions. Consider, for instance, the problem of allocating jobs to servers in an online fashion. Clearly, the time it takes to process jobs strongly depends on the system state, such as its overall load, determined by all previous allocation decisions. The notion of regret does not capture this setting well, since it measures the regret with respect to a fixed policy, while assuming that at each step this policy faces the exact same costs.
Thus, one might desire algorithms for a much more ambitious framework, where we need to compete against arbitrary policies, including an optimal offline policy which has access to future unknown costs, and where we can model states. Such problems have been intensively studied in the field of competitive analysis (for a detailed background, see [BEY98]). In such a framework, attaining sublinear regret is hopeless in general. Instead, the main measure used is the competitive ratio, which bounds the ratio of the total cost of the decision-maker and the

total cost of an optimal offline policy, in a worst-case sense. This usually provides a weaker performance guarantee than online learning, but with respect to a much stronger optimality criterion.

While problems studied under these two frameworks are often rather similar, there has not been much research on general connections between the two. The main reason for this situation (other than social factors stemming from the separate communities studying them) lies in some crucial differences in the modeling assumptions. For example, in order to model the notion of state, competitive analysis usually assumes a movement cost for switching between states. In the online learning framework, this would be equivalent to having an additional cost associated with switching actions between rounds. Another difference is that in competitive analysis one assumes 1-lookahead, i.e., the decision-maker knows the cost vector in the current round. In contrast, online learning has 0-lookahead, and the decision-maker does not know the cost vector of the current round until a decision is made. Such differences, as stated in [CBL06, p. 3], have so far prevented the derivation of a general theory allowing a unified analysis of both types of problems.

In this work, we attempt to connect the two fields, online learning and competitive analysis. Our attempt can be classified into two lines of action. The first line of action is to exploit the similarities between the frameworks to provide a unified algorithmic approach that attains both optimal regret and an optimal competitive ratio for a large class of problems in both fields. Chapter 4 addresses this issue. The second line of action is to bridge the analytical gap between the two fields. That is, as the two communities have worked separately, different tools and methods have evolved in one field without much attention from the other.
Our research tries to borrow various techniques, especially from the learning domain, in order to obtain new results in the competitive analysis framework. Chapter 3 and Chapter 6 address this issue.

Our Contribution

In Chapter 3 we provide a framework for designing competitive online algorithms using regularization, a widely used technique in online learning, particularly in online convex optimization. In our new framework we exhibit a general competitive deterministic algorithm for generating a fractional solution that satisfies a time-varying set of online covering and precedence constraints. This framework allows us to incorporate both service costs over time and setup costs into a host of applications. We then provide a competitive randomized algorithm for the online set cover problem with service cost. This model allows sets to be both added and deleted over time from a solution.

Chapter 4 adapts the regularization approach studied in Chapter 3 to introduce a single unified algorithm which, by parameter tuning, interpolates between optimal regret for learning from experts in online learning and optimal competitive ratio for the metrical task systems (MTS) problem in competitive analysis, improving on previous results. The algorithm also

allows us to obtain new regret bounds against drifting experts, which might be of independent interest. Moreover, our approach allows us to go beyond experts/MTS, obtaining similar unifying results for structured action sets and combinatorial experts, whenever the setting has a certain matroid structure.

In Chapter 5 we exploit the techniques introduced in Chapter 4 to study the online restricted caching problem, where each memory item can be placed in only a restricted subset of cache locations. We solve this problem through a more general online caching problem, in which the cache architecture is subject to matroid constraints. Our main result is a polynomial-time approximation algorithm for the matroid caching problem, which guarantees an O(log² k)-approximation for any restricted cache of size k, independently of its structure. In addition, we study the (n, l)-companion caching problem, defined by [BETW01] as a special case of restricted caching, and prove that our algorithm achieves an optimal competitive factor of O(log n + log l), improving on previous results of [FMS02].

Chapter 6 considers online fractional covering problems with a convex objective, where the covering constraints arrive over time. We also consider the corresponding dual online packing problems with a concave objective. We provide an online primal-dual framework for both classes of problems with a competitive ratio depending on certain monotonicity and smoothness parameters of the objective function f, which match or improve on guarantees for some special classes of functions f considered previously. This framework extends the primal-dual linear programming techniques developed in competitive analysis, using the notion of convex conjugacy and Fenchel duality, well-studied techniques in online convex optimization.


Chapter 2

Preliminaries

2.1 Linear Programming and Convex Programming

A mathematical program, or a mathematical optimization problem, is a problem of minimizing or maximizing a function over a feasible set of constraints. More formally, we define a mathematical program in the following form:

  min f_0(x)
  subject to:                                  (2.1)
  f_j(x) ≥ b_j, for any 1 ≤ j ≤ m.

Here, the vector x = (x_1, ..., x_n) is the optimization variable of the problem, the function f_0 : R^n → R is the objective function, and the constraints are defined by m constraint functions f_j : R^n → R, j = 1, ..., m, and m constants b_1, ..., b_m. When minimization is considered we usually refer to the problem as the primal problem, denoted by P.

An important class of mathematical optimization problems is linear optimization problems. Optimization problem (2.1) is called a linear program (LP) if the functions f_0, ..., f_m are linear, i.e., satisfy

  f_j(αx + βy) = αf_j(x) + βf_j(y),            (2.2)

for all x, y ∈ R^n and α, β ∈ R. In our discussion, we usually consider the following linear

program formulation:

  (P): min Σ_{i=1}^n c_i x_i
  subject to:                                  (2.3)
  Σ_{i=1}^n a_{ij} x_i ≥ b_j, for any 1 ≤ j ≤ m,
  x_i ≥ 0, for any 1 ≤ i ≤ n.

It is well known that any linear program can be formulated in this way. The sparsity of a linear constraint refers to the number of non-zero coefficients in the latter formulation.

Another, more general, class of mathematical optimization problems is linear programs with a convex objective. In this work, we refer to optimization problem (2.1) as such if the constraint functions are linear and the objective function f_0 is convex, i.e., satisfies

  f_0(αx + βy) ≤ αf_0(x) + βf_0(y),            (2.4)

for all x, y ∈ R^n and α, β ∈ R with α + β = 1 and α, β ≥ 0. If strict inequality holds in (2.4) whenever x ≠ y and 0 < α, β < 1, then we say that the objective function f_0 is strictly convex.

2.1.1 Lagrangian Duality and Optimality Conditions

Given problem (2.1), the Lagrangian L : R^n × R^m → R is defined as

  L(x, λ) = f_0(x) − Σ_{j=1}^m λ_j (f_j(x) − b_j),

where the vector λ = (λ_1, ..., λ_m) contains the Lagrangian dual variables. The idea in Lagrangian duality is to use the Lagrangian in order to bound any feasible solution, and in particular the optimal solution, of optimization problem (2.1). To do so, we define the Lagrangian dual function g : R^m → R as the minimum of the Lagrangian L over x:

  g(λ) = inf_x L(x, λ) = inf_x ( f_0(x) − Σ_{j=1}^m λ_j (f_j(x) − b_j) ).

Weak Duality: The weak duality property states that for any optimization problem, its Lagrangian dual function yields a lower bound on its optimal value. More formally,

Theorem 2.1. Let p* denote the optimal value of the primal optimization problem (2.1), and

let g denote the corresponding dual function. Then for any λ ≥ 0, we have g(λ) ≤ p*.

Proof. Suppose that x is a feasible solution for the primal problem. This immediately implies that f_j(x) ≥ b_j and λ_j ≥ 0, for any 1 ≤ j ≤ m. Then we have

  Σ_{j=1}^m λ_j (f_j(x) − b_j) ≥ 0

and, therefore,

  g(λ) = inf_{x'} L(x', λ) ≤ L(x, λ) = f_0(x) − Σ_{j=1}^m λ_j (f_j(x) − b_j) ≤ f_0(x).

Since the latter inequality holds for any feasible x, the theorem follows.

Theorem 2.1 states that given a primal optimization problem, for any λ ≥ 0 the dual function gives a lower bound on the optimal value of the problem. In order to obtain the best lower bound using Lagrangian duality, we formulate the following Lagrangian dual problem, denoted by D:

  max g(λ_1, ..., λ_m)
  subject to:                                  (2.5)
  λ_j ≥ 0, for any 1 ≤ j ≤ m.

Lagrangian duality plays an important role in the area of mathematical optimization, and particularly in linear and convex optimization. In the case of linear optimization, problem (2.5) yields the dual program D corresponding to linear program P in (2.3):

  (D): max Σ_{j=1}^m b_j y_j
  subject to:                                  (2.6)
  Σ_{j=1}^m a_{ij} y_j ≤ c_i, for any 1 ≤ i ≤ n,
  y_j ≥ 0, for any 1 ≤ j ≤ m.

A special subclass of linear programs consists of programs in which the coefficients a_{ij}, b_j and c_i in (2.3) and (2.6) are all nonnegative. In this case the primal program is called a covering problem and the dual program is called a packing problem. Capturing various applications and classical optimization problems, this is an important subclass in the study of approximation

algorithms, see e.g. [Vaz01]. In the case of LPs with a convex objective, one can also develop problem (2.5) to obtain a more specific formulation based on the convex conjugate function of f_0. We refer the reader to Chapter 6 for further details, since this is where convex duality is used.

Strong Duality: When the gap between the optimal value of the primal problem (2.1) and the optimal value of the dual problem (2.5) is zero, we say that strong duality holds. It turns out that when the objective function f_0 is convex, and some basic conditions on the constraints hold, then we have strong duality. Specifically, for linear programs with a convex objective the following theorem holds.

Theorem 2.2. A primal linear program with a convex objective has a finite optimal solution if and only if its dual program has a finite optimal solution. In this case, the values of the optimal solutions of the primal and dual programs are equal.

Optimality Conditions: Let x* be an optimal solution for problem (2.1), and let λ* be an optimal solution for the dual problem. Then, if strong duality holds, the following conditions, called the Karush-Kuhn-Tucker (KKT) conditions, are satisfied:

  f_j(x*) ≥ b_j,                      j = 1, ..., m    (2.7)
  λ*_j ≥ 0,                           j = 1, ..., m    (2.8)
  λ*_j (f_j(x*) − b_j) = 0,           j = 1, ..., m    (2.9)
  ∇f_0(x*) − Σ_{j=1}^m λ*_j ∇f_j(x*) = 0               (2.10)

The first two conditions follow immediately from the feasibility of the primal and dual solutions. To obtain the last two conditions we use strong duality and get

  f_0(x*) = g(λ*) = inf_x ( f_0(x) − Σ_{j=1}^m λ*_j (f_j(x) − b_j) )
          ≤ f_0(x*) − Σ_{j=1}^m λ*_j (f_j(x*) − b_j)
          ≤ f_0(x*).

We conclude that the last two inequalities hold with equality. Condition (2.9), also known as complementary slackness, follows from the last inequality (now an equality), since each term in the sum Σ_{j=1}^m λ*_j (f_j(x*) − b_j) is nonnegative. Since the first inequality holds with equality, x* minimizes L(x, λ*) over x, and therefore the gradient of L with respect to x must vanish at x*, i.e., Condition (2.10) follows.

For convex problems, the KKT conditions are also sufficient to ensure optimality. That is, any primal-dual pair of solutions satisfying the above conditions is primal and dual optimal with zero optimality gap. For a comprehensive survey on convex programming and optimization see, e.g., [BV04].

2.1.2 Approximation Algorithms using Linear Programming

Many interesting optimization problems can be formulated as integer programs, i.e., mathematical programs in which the optimization variables are assigned integral values, x_1, ..., x_n ∈ Z. Unfortunately, adding an integrality restriction often makes the problem hard. A way of handling this hardness is by relaxing the formulation, i.e., removing the integrality restriction and allowing a fractional assignment to the variables. Thus, the optimal solution of a relaxation of a minimization problem is a lower bound on the optimal solution of the problem. The ratio between the optimal solutions is called the integrality gap of the relaxation. As a result, linear programming has become a very influential tool for the design and analysis of approximation algorithms and online algorithms. Given an integral optimization problem, we formulate a linear relaxation of the problem. Next, we solve (possibly approximately) the linear relaxation, obtaining a fractional solution. Finally, we apply a procedure which rounds the fractional solution (often using randomness) to obtain a feasible solution for the original problem. We shall demonstrate this extremely useful technique in the following chapters.
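The covering/packing LPs, weak duality, and the integrality gap can all be seen on one toy instance that is not taken from the thesis: vertex cover on a triangle. The covering LP admits the fractional solution (1/2, 1/2, 1/2) of cost 1.5, a matching packing (dual) solution certifies via weak duality that 1.5 is the LP optimum, while any integral cover costs 2.

```python
from itertools import product

# Toy illustration (not from the thesis): vertex cover on a triangle.
# Primal covering LP: min x0+x1+x2  s.t.  x_u + x_v >= 1 for every edge, x >= 0.
edges = [(0, 1), (1, 2), (0, 2)]

def primal_feasible(x):
    return all(x[u] + x[v] >= 1 for u, v in edges) and all(xi >= 0 for xi in x)

def dual_feasible(y):
    # Dual packing LP: max sum_e y_e  s.t.  sum of y_e over edges at vertex i <= 1.
    loads = [sum(ye for (u, v), ye in zip(edges, y) if i in (u, v)) for i in range(3)]
    return all(load <= 1 for load in loads) and all(ye >= 0 for ye in y)

# A fractional primal solution of cost 1.5 and a dual solution of value 1.5.
# By weak duality every dual value lower-bounds every primal value, so the
# LP optimum is exactly 1.5.
x_frac = (0.5, 0.5, 0.5)
y = (0.5, 0.5, 0.5)
assert primal_feasible(x_frac) and dual_feasible(y)

# Best integral cover, by brute force over {0,1}^3: two vertices are needed.
opt_int = min(sum(x) for x in product([0, 1], repeat=3) if primal_feasible(x))
print(sum(x_frac), sum(y), opt_int)   # 1.5 1.5 2, an integrality gap of 4/3
```

Here the ratio opt_int / LP-opt = 2 / 1.5 = 4/3 is exactly the integrality gap of this relaxation on this instance.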
For further information on approximation techniques we refer the reader to [Vaz01].

2.2 Matroids and Submodular Functions

2.2.1 Matroids

Matroids are extremely useful combinatorial objects that capture many natural collections of subsets, such as sparse subsets, forests in graphs, linearly independent sets in vector spaces, and sets of nodes in legal matchings of a given graph. Let E be a finite set and let I be a non-empty collection of subsets of E. M = (E, I) is called a matroid if I satisfies:

- for every S_1 ⊆ S_2, if S_2 ∈ I then also S_1 ∈ I;
- if S_1, S_2 ∈ I and |S_1| > |S_2|, then there exists an element e ∈ S_1 \ S_2 such that S_2 ∪ {e} ∈ I.

The latter property is called the exchange property. Given a matroid M = (E, I), we refer to E as the ground set, and every subset S ∈ I is called independent (any other subset is dependent). For S ⊆ E, a subset B of S is called a base of S if B is a maximal independent subset of S. A well-known fact is that for any subset S of E, any two bases of S have the same size, called the rank of S and denoted by r(S). For example, s-sparse subsets are the bases of an s-uniform matroid, where r(E) = s. Spanning trees in a connected graph G = (V, E) are bases of a graphic matroid whose ground set is E and whose collection I consists of all subsets of E that form a forest; its rank is r(E) = |V| − 1.

A subset of E is called spanning if it contains a base of E. Thus, bases are the inclusion-wise minimal spanning sets and the only independent spanning sets. Each matroid M = (E, I) is associated with a dual matroid M* = (E, I*), where I* = {I ⊆ E : E \ I is a spanning set of M}. This means that the bases of M* are precisely the complements of the bases of M, implying (M*)* = M. Moreover, for any S ⊆ E the rank function of the dual matroid satisfies r*(S) = |S| + r(E \ S) − r(E).

The density of a matroid M, γ(M), is defined as max_{S ⊆ E, S ≠ ∅} {|S| / r(S)}. For example, the density of the s-uniform matroid is n/s. The density of a graphic matroid (spanning trees in a graph G = (V, E)) is max_{S ⊆ V, |S| > 1} {|E(S)| / (|S| − 1)}, where E(S) is the set of edges in the subgraph induced by the vertices of S.

A circuit C in a matroid M is defined as an inclusion-wise minimal dependent set, that is, C \ {e} ∈ I for every e ∈ C. The circumference of a matroid M, c(M), is the cardinality of the largest circuit in M. For example, the circumference of an s-uniform matroid is s + 1, and the circumference of a graphic matroid in a graph G = (V, E) is the length of the longest simple cycle in it. A subset F of E is called nonseparable if every pair of elements in F lies in a common circuit; otherwise there is a partition of F into non-empty sets F_1 and F_2 with r(F) = r(F_1) + r(F_2).
See [HW69, Whi35] for further details.

2.2.2 Submodular Functions

A set function is a function f : 2^E → R which assigns a value to every subset of E. f is called submodular if

  f(S_1) + f(S_2) ≥ f(S_1 ∪ S_2) + f(S_1 ∩ S_2),    (2.11)

for all subsets S_1, S_2 of E. Similarly, f is supermodular if it satisfies (2.11) with the opposite inequality sign. Furthermore, f is called nondecreasing if f(S_2) ≤ f(S_1) for any S_2 ⊆ S_1 ⊆ E, and f is called normalized if f(∅) = 0. Matroids are closely related to submodularity, since the rank function of any matroid is submodular, nondecreasing and normalized. For a thorough survey of results on matroids we refer the reader to [Sch03, Law76].
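The definitions above can be exercised by brute force on a tiny example that is not taken from the thesis: the graphic matroid of a triangle, whose ground set consists of the 3 edges and where a set is independent iff it contains no cycle (i.e., is not the full edge set). The sketch checks the exchange property, the submodularity of the rank function, the dual rank formula, and the density.

```python
from itertools import combinations

# Toy check (not from the thesis): the graphic matroid of a triangle graph.
E = frozenset([0, 1, 2])          # ground set = the 3 edges of the triangle

def independent(S):
    # Only the full edge set contains a cycle; any <= 2 edges form a forest.
    return frozenset(S) != E

subsets = [frozenset(c) for r in range(4) for c in combinations(E, r)]

def rank(S):
    # r(S) = size of a maximum independent subset of S (brute force).
    return max(len(I) for I in subsets if I <= frozenset(S) and independent(I))

# Exchange property: independent S1, S2 with |S1| > |S2| => some e extends S2.
for S1 in subsets:
    for S2 in subsets:
        if independent(S1) and independent(S2) and len(S1) > len(S2):
            assert any(independent(S2 | {e}) for e in S1 - S2)

# The rank function is submodular: r(S1) + r(S2) >= r(S1 u S2) + r(S1 n S2).
for S1 in subsets:
    for S2 in subsets:
        assert rank(S1) + rank(S2) >= rank(S1 | S2) + rank(S1 & S2)

def dual_rank(S):
    # Dual rank formula: r*(S) = |S| + r(E \ S) - r(E).
    S = frozenset(S)
    return len(S) + rank(E - S) - rank(E)

density = max(len(S) / rank(S) for S in subsets if S)
print(rank(E), dual_rank(E), density)   # 2 1 1.5
```

The printed density 3/2 agrees with the graphic-matroid formula max |E(S)|/(|S| − 1): the triangle has 3 edges on 3 vertices.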

2.3 Brief Introduction to Online Computation

Most of this thesis deals with online algorithms and online optimization problems. We therefore start with an introduction to online computation. Since the theory of online computation, and this thesis as well, considers various models and settings, we keep this introduction general and brief. In Chapter 3 and Chapter 4 we further elaborate on the settings that we consider.

An online algorithm must respond to a sequence of events or requests by producing a sequence of decisions. Each decision is made based on the history of past events and decisions, but without knowledge of the future. The decisions made by the algorithm generate a cost or a profit, which the algorithm tries to minimize or to maximize, respectively. We consider several online settings, and follow previous work in adopting the popular notions of competitive ratio and regret to evaluate the performance of our online algorithms under these settings.

Let opt(I) be the cost of the optimal feasible solution for a sequence of events denoted by I.¹ An online algorithm is said to be c-competitive for a minimization problem if, for every sequence of events I, the algorithm generates a cost of at most c · opt(I) + d, where d is independent of the event sequence. Analysis of online algorithms with respect to this measure is referred to as competitive analysis. For maximization, the definition of competitiveness is analogous. When considering a maximization problem, a c-competitive algorithm is guaranteed to return a solution with profit of at least opt(I)/c − d, where opt(I) is the maximum profit solution, and d is independent of the event sequence. The second performance measure that we use is regret. For minimization, an algorithm obtains a regret of h if it is guaranteed to return a solution with cost at most opt(I) + h.
An equivalent way to view an online problem is as a game between an online player and a malicious adversary. The online player follows an online algorithm on an input that is created by the adversary. Knowing the strategy of the online player, the adversary produces the worst possible input. In other words, the adversary constructs a sequence of events that produces bad results (expensive cost, or low profit) for the online player, but at the same time good results for an optimal offline strategy.

It is also possible to consider competitiveness and regret when the online algorithm uses randomization. In this work, we only consider models where the adversary knows the algorithm and the probability distribution the algorithm uses to make its random decisions. The adversary is not aware, however, of the actual random choices made by the algorithm throughout its execution. This kind of adversary is called an oblivious adversary. In case randomization is allowed, the expected cost or profit of the algorithm is compared against the optimal solution opt(I). In Chapter 4 we show connections between these two performance measures in various settings.

¹ The notion of an optimal feasible solution may differ from one online setting to the other. See the following chapters for more details.
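The minimization definition above can be made concrete with a small numeric sketch; the per-instance costs below are hypothetical, purely to illustrate how the ratio c is extracted from (algorithm cost, optimum cost) pairs.

```python
# Toy check (not from the thesis) of c-competitiveness for minimization:
# ALG is c-competitive if ALG(I) <= c * opt(I) + d for every instance I,
# with d independent of the instance.
def is_c_competitive(costs, c, d):
    """costs: list of (alg_cost, opt_cost) pairs, one per instance."""
    return all(alg <= c * opt + d for alg, opt in costs)

# Hypothetical per-instance costs of some online algorithm vs. the offline optimum.
costs = [(3, 2), (10, 4), (7, 3)]

# With d = 0, the smallest c that works is the worst-case ratio over instances.
ratio = max(alg / opt for alg, opt in costs)
print(ratio, is_c_competitive(costs, ratio, 0))   # 2.5 True
```

Note the role of the additive term d: an algorithm may fail to be c-competitive with d = 0 yet satisfy the definition once a constant d, independent of the input, is allowed.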


Chapter 3

Competitiveness via Regularization

In this chapter we provide a framework for designing competitive online algorithms using regularization, a widely used technique in online learning, particularly in online convex optimization. An online algorithm that uses regularization serves requests by computing a solution, at each step, to an objective function involving a smooth convex regularization function. Applying the technique of regularization allows us to obtain new results in the domain of competitive analysis.

3.1 Introduction

Competitive analysis and online learning are two important research fields that study the problem of a decision-maker who iteratively needs to make decisions in the face of uncertainty. A typical online problem proceeds in rounds, where in each round an online algorithm is given a request and needs to serve it. We propose a general setting for studying online problems by letting the request sequence be a convex set that varies over time. To be more specific, in each round t ∈ {1, ..., T}, a feasible convex region P_t ⊆ R^n is revealed along with a service cost vector c_t. The online algorithm needs to choose a feasible point y_t ∈ P_t and move from y_{t−1} to y_t. The cost of the algorithm at round t is the sum of the service cost ⟨c_t, y_t⟩ and the movement cost ∥y_t − y_{t−1}∥_1.¹ The goal is to find an online algorithm which is competitive with respect to an offline solution minimizing Σ_{t=1}^T ⟨c_t, y_t⟩ + Σ_{t=1}^T ∥y_t − y_{t−1}∥_1. This setting captures many important problems in online computation, e.g., caching [ST85], online covering [AAA+03, BN05], the allocation problem [BBMN11], and hyperplane chasing [FL93]. Consider, for example, metrical task systems (MTS) [BLS92] and online set cover [AAA+03]. In the online set cover problem we are given a set system defined over a universe of elements. The elements appear one by one and need to be covered upon arrival.
To formulate the problem within our general setting, the feasible region is initially the whole space.² Upon arrival of element t, the convex set P_t is defined to be the intersection of P_{t−1} and the covering constraint

¹ More generally, we allow a movement cost of Σ_{i=1}^n w_i |y_{i,t} − y_{i,t−1}|.
² Without loss of generality, y_0 is initialized to be the origin.

corresponding to element t. Covering element t by set s means increasing y_s from 0 to 1. There is no service cost in the online set cover problem. In MTS, the feasible convex region remains {x ∈ R^n_+ : Σ_{i=1}^n x_i = 1} throughout all rounds; however, in each round t a new service cost vector c_t is given. Thus, both service and movement costs are incurred, yet the feasible region is always defined by a single fixed covering constraint.

We proceed to define the online set cover problem with service cost, which generalizes both online set cover and MTS. The basic setting is the same as in online set cover, where in each round a subset of the elements needs to be covered. Each chosen set pays an opening cost, as in set cover; it also pays a service cost in each of the rounds in which it is open. This means that it can be beneficial for an online algorithm to both add and delete sets from the cover throughout its execution, while paying a movement cost for that. This setting captures both service costs that should be paid as long as sets (facilities) remain open, and a fully dynamic environment in which both sets and elements arrive and depart over time. Cloud computing is an example of a practical setting captured by the online set cover problem with service cost. The covering constraints indicate the number of servers that are needed in certain regions; the movement cost corresponds to the cost of turning servers on and off, and the service cost corresponds to the energy consumption of the servers.

Let us now turn our attention to the area of online learning, and in particular to the domain of online convex optimization. In an online convex optimization problem, a bounded closed convex feasible set P ⊆ R^n is given as input, and in each round t ∈ {1, ..., T} a loss vector c_t is revealed.³ The online algorithm picks a point y_t ∈ P and the loss incurred is ⟨c_t, y_{t−1}⟩.
The performance of the online algorithm is usually measured via the notion of regret, defined as Σ_{t=1}^T ⟨c_t, y_{t−1}⟩ − min_{x∈P} Σ_{t=1}^T ⟨c_t, x⟩. We further elaborate on this setting in Chapter 4. For additional details and techniques see, for example, [Zin03, CBL06, HKKA06, FKM05]. An important technique used in online convex optimization is regularization. Generally speaking, regularization is achieved by adding a smooth convex function to a given objective function and then greedily solving the new online problem, in order to obtain good regret bounds. The goal of regularization is to stabilize the solution, that is, to avoid drastic shifts in the solution from round to round. Regularization in online learning appears in the literature at least as early as [KW97] and [Gor99], and more modern analyses can be found in [CBL06, Rak09]. Coming back to the world of competitive analysis, it is clear that any online algorithm for our general setting must try to mimic the configurations of an optimal offline solution on one hand, while minimizing the movement cost on the other hand. That is, in a sense, an online algorithm must maintain stability. Therefore, our goal is to apply a regularization approach, similarly to the way it is applied in online learning, in order to obtain good competitive bounds.

³ More generally, a convex loss function f_t is given.
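As a concrete illustration of the regret notion just defined, the following small sketch (our own, not part of the thesis) computes the regret of a sequence of plays when P is the probability simplex; in that case the benchmark min_{x∈P} Σ_t ⟨c_t, x⟩ is attained at a vertex, i.e., at the best single expert in hindsight.

```python
# Illustrative sketch: regret against the best fixed point of the simplex.
# plays[t] is the distribution y_{t-1} used in round t; costs[t] is c_t.

def regret(plays, costs):
    algo_loss = sum(sum(y_i * c_i for y_i, c_i in zip(y, c))
                    for y, c in zip(plays, costs))
    n = len(costs[0])
    # over the simplex, a linear loss is minimized at a single expert
    best_fixed = min(sum(c[i] for c in costs) for i in range(n))
    return algo_loss - best_fixed

costs = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5]] * 3
print(regret(uniform, costs))  # uniform pays 1.5, best expert pays 1.0 -> 0.5
```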

Our Results

We provide a novel framework for designing competitive online algorithms that uses regularization together with a primal-dual analysis. In this framework, the online algorithm's output in each round is chosen to be the solution to an optimization problem involving a smooth convex regularization function. The analysis of the competitive factor is based on recent online primal-dual LP techniques developed in competitive analysis (see the survey [BN09a]). We exhibit an online algorithm which is obtained from this framework for the case where P_t is defined by covering and precedence constraints⁴, and prove bounds on its competitive ratio. Our main result is:

Theorem 3.1. For any ε > 0, there is an O((1 + ε) log(1 + k/ε))-competitive online algorithm if P_t is a covering and precedence polytope, where k is the maximal sparsity of the covering constraints.

The proof of the theorem is simple and elegant. We start with the KKT optimality conditions of the regularized problem in each round. These conditions imply a simple construction for a dual solution to the original online problem (i.e., before regularization), which can thus serve as a lower bound on the optimal offline solution. The theorem yields an alternative algorithm and proof for many previously studied fundamental online problems, e.g., caching, MTS on a weighted star, online set cover, online connectivity, the allocation problem, and more. The importance of Theorem 3.1 is that it allows us to be competitive against a combination of both service and movement costs, while satisfying multiple constraints. Thus, we obtain competitive algorithms for online problems which previously did not seem within reach of polylogarithmic competitive factors. One example is the online set cover problem with service cost mentioned earlier. Note that even though the formulations of classic online problems (e.g., online set cover, caching, etc.)
contain multiple constraints, no service cost is incurred over time. The only exceptions are MTS and finely-competitive paging [BBK99], but both have a formulation that consists of a single fixed covering constraint. Another example is fractional shortest path with time-varying traffic loads, a problem studied extensively in the online learning community with respect to regret minimization. In this problem there is a graph together with a source node s and a target node t. An online algorithm needs to maintain a unit flow between s and t. In each round, new edge costs are revealed, and the online algorithm is allowed to fractionally increase and decrease capacities on the edges so as to maintain the unit flow between s and t. Thus, both movement and service costs are incurred here. A feasible solution is defined by multiple constraints corresponding to the cuts separating s from t. Unlike the learning variant, our online algorithm also pays a cost for increasing the capacities of the edges, capturing the fact that modifying a solution over time incurs an additional cost.

⁴ A precedence constraint is of the form x ≤ y.

The next issue we consider is rounding online a fractional solution to the online set cover problem with service cost. Recall that this problem captures both online set cover and MTS, which are rounded by very different techniques. Online set cover is rounded by adding each set to the cover independently, taking advantage of the fact that variables can only increase. In contrast, rounding MTS uses strong dependence between the variables, taking advantage of the fact that there is only a single fixed constraint. We utilize recent ideas for rounding fractional solutions via exponential clocks (see [BNS13]), thus allowing us to unify both independent and dependent choices into a single rounding scheme. We obtain the following tight result.

Theorem 3.2. There is a randomized algorithm which is O(log S_max · log m)-competitive for the online set cover problem with service costs, where m denotes the number of sets and S_max denotes the maximal set size.

Related Work: There are several works that discuss the connection between competitive analysis and online learning, such as [BB97, BBK99, BBN10, ABL+13]. We defer the full overview to Chapter 4. Abernethy et al. [ABBS10] discuss competitive-analysis algorithms using a regularized work function, yet limited to the MTS setting.

3.2 Online Regularization Algorithm

In this section we develop our main algorithm, which is based on regularization. The algorithm is given in each round a new polyhedron P_t ⊆ R^n, defined by a set of covering constraints, and a cost vector c_t ∈ R^n_+. The goal is to minimize Σ_{t=1}^T ⟨c_t, y_t⟩ + Σ_{t=1}^T Σ_{i=1}^n w_i |y_{i,t} − y_{i,t−1}|, where in each round t, y_t ∈ P_t. The algorithm is conceptually very simple, and is based on solving in each round a convex optimization problem with a regularized objective function which involves both the previous point y_{t−1} and the current cost vector c_t. Thus, our solution in each step is determined greedily and independently of rounds prior to t − 1.
The convex objective function is obtained by trading off relative entropy, plus a linear term, with the movement cost.

Algorithm 3.1 Regularization Algorithm
parameters: ε > 0, η = ln(1 + n/ε).
initialize y_{i,0} = 0 for all i = 1, ..., n.
for t = 1, 2, ..., T do
    let c_t ∈ R^n_+ be the cost vector and let P_t be the feasible set of solutions at time t.
    solve the following convex program (P) to obtain y_t:
        y_t = arg min_{x ∈ P_t} { ⟨c_t, x⟩ + (1/η) Σ_{i=1}^n w_i [ (x_i + ε/n) ln( (x_i + ε/n) / (y_{i,t−1} + ε/n) ) − x_i ] }
end for
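To make the regularized step concrete, here is a sketch (our own illustration, not code from the thesis) of one round of Algorithm 3.1 in the special case of a single covering constraint Σ_i x_i ≥ 1 with unit weights w_i = 1 (the MTS-like setting). The KKT conditions of the regularized program give the closed form x_i(a) = max(0, (y_i + ε/n)·e^{η(a − c_i)} − ε/n), where a ≥ 0 is the multiplier of the covering constraint; we find a by bisection so that the constraint holds with equality (or take a = 0 if it is already slack).

```python
import math

def regularized_step(y, c, eps=0.1):
    """One round of the regularization algorithm for sum_i x_i >= 1, w_i = 1."""
    n = len(y)
    eta = math.log(1 + n / eps)
    delta = eps / n

    def x_of(a):
        # closed-form stationarity solution for a given multiplier a
        return [max(0.0, (yi + delta) * math.exp(eta * (a - ci)) - delta)
                for yi, ci in zip(y, c)]

    if sum(x_of(0.0)) >= 1.0:
        return x_of(0.0)              # constraint slack: a = 0
    lo, hi = 0.0, max(c) + 1.0        # at a = max(c)+1 each x_i >= delta*(e^eta - 1) = 1
    for _ in range(100):              # bisection on the multiplier a
        mid = (lo + hi) / 2
        if sum(x_of(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return x_of(hi)

y = [1.0, 0.0, 0.0]                   # previous point
c = [5.0, 0.0, 0.0]                   # it became expensive to stay on coordinate 1
x = regularized_step(y, c)
print(abs(sum(x) - 1.0) < 1e-6, x[0] < y[0])  # prints: True True
```

Note how the entropic term keeps the update multiplicative in (y_i + ε/n): mass moves smoothly away from the expensive coordinate rather than jumping, which is exactly the stability the regularizer is meant to provide.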

The relative entropy function, Δ(w‖u) = Σ_i [ w_i ln(w_i/u_i) + u_i − w_i ], is widely used as a regularizer in online learning problems that involve ℓ1-norm constraints, such as maintaining a distribution over a ground set of elements. We note that since in each round the objective function is convex and P_t is a convex set, the program (P) is solvable in polynomial time using standard convex optimization techniques, such as interior-point methods [NN94].

3.2.1 Analysis

We next analyze the algorithm, thus proving Theorem 3.1. First, we formulate the offline problem as a linear program, and also write its dual, which will serve as a lower bound. To demonstrate the ideas we assume that in round t, P_t is defined by m_t covering constraints of the form Σ_{i∈S_{j,t}} y_{i,t} ≥ 1. In Section 3.2.2 we show how to deal with the more general cases in which we have either precedence constraints, or each constraint is of the form Σ_{i∈S_{j,t}} y_{i,t} ≥ r_{j,t}, where r_{j,t} ∈ N and 0 ≤ y_{i,t} ≤ 1 for every i. Without loss of generality, we assume that both the optimal solution and our algorithm pay only for increasing the variables; thus the movement cost from y_{t−1} to y_t is equal to Σ_{i=1}^n w_i max{0, y_{i,t} − y_{i,t−1}}. The problem formulation appears in Figure 3.1. Let k denote the maximal sparsity of the covering constraints, i.e., k = max{ |S_{j,t}| : 1 ≤ t ≤ T, 1 ≤ j ≤ m_t }. Our proof is based on deriving the KKT optimality conditions of the regularized problem in each round. The conditions define dual variables that are then carefully plugged into the dual formulation in Figure 3.1 to yield a feasible dual solution. This, in turn, yields a lower bound on the performance of the online algorithm.
(P)  min  Σ_{t=1}^T Σ_{i=1}^n c_{i,t} y_{i,t} + Σ_{t=1}^T Σ_{i=1}^n w_i z_{i,t}
     s.t.  Σ_{i∈S_{j,t}} y_{i,t} ≥ 1                                ∀ t ≥ 1 and 1 ≤ j ≤ m_t
           z_{i,t} ≥ y_{i,t} − y_{i,t−1}                            ∀ t ≥ 1 and 1 ≤ i ≤ n
           z_{i,t}, y_{i,t} ≥ 0                                     ∀ t ≥ 1 and i

(D)  max  Σ_{t=1}^T Σ_{j=1}^{m_t} a_{j,t}
     s.t.  b_{i,t} ≤ w_i                                            ∀ t ≥ 1 and 1 ≤ i ≤ n
           b_{i,t+1} ≤ b_{i,t} + c_{i,t} − Σ_{j: i∈S_{j,t}} a_{j,t}  ∀ t and 1 ≤ i ≤ n
           a_{j,t}, b_{i,t} ≥ 0                                     ∀ t ≥ 1 and i, j

Figure 3.1: Primal and dual LP formulations for the online covering problem.

KKT optimality conditions. In each round, Algorithm 3.1 solves a convex program (P) with m_t covering constraints. We define a nonnegative Lagrangian variable a_{j,t} for each covering constraint in round t ∈ {1, ..., T}. The KKT optimality conditions define the following

relationship between the optimal values of y_t and a_{j,t}:

∀ 1 ≤ j ≤ m_t:  Σ_{i∈S_{j,t}} y_{i,t} − 1 ≥ 0,    (3.1)
∀ 1 ≤ j ≤ m_t:  a_{j,t} ( Σ_{i∈S_{j,t}} y_{i,t} − 1 ) = 0,    (3.2)
∀ 1 ≤ i ≤ n:  c_{i,t} + (w_i/η) ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) ) − Σ_{j: i∈S_{j,t}} a_{j,t} ≥ 0,    (3.3)
∀ 1 ≤ i ≤ n:  y_{i,t} ( c_{i,t} + (w_i/η) ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) ) − Σ_{j: i∈S_{j,t}} a_{j,t} ) = 0.    (3.4)

Proof of Theorem 3.1. We first construct a dual solution to the offline problem (D) using the values that are obtained by the KKT optimality conditions. We then show that the primal and dual solutions obtained are feasible, and finally we prove that the dual we constructed can pay for both the movement and the service cost of the online algorithm. To construct the dual we simply assign the same dual value obtained by the optimality conditions to a_{j,t} in (D), and we define b_{i,t+1} = (w_i/η) ln( (1 + ε/n) / (y_{i,t} + ε/n) ). We claim next that the primal and dual solutions are feasible. From Condition (3.1), all the primal covering constraints are satisfied, and by setting z_{i,t} = max{0, y_{i,t} − y_{i,t−1}} we get a feasible primal solution. To this end, note that for any t and i,

b_{i,t+1} − b_{i,t} = −(w_i/η) ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) ) ≤ c_{i,t} − Σ_{j: i∈S_{j,t}} a_{j,t},

where the inequality follows from (3.3). Also, 0 ≤ b_{i,t+1} = (w_i / ln(1 + n/ε)) ln( (1 + ε/n) / (y_{i,t} + ε/n) ) ≤ w_i, which follows since 0 ≤ y_{i,t} ≤ 1; additionally, a_{j,t} ≥ 0 as it is a Lagrangian dual variable of a covering constraint.

Bounding the movement cost at time t: Let M_t be the movement cost at time t. As indicated, we charge both our algorithm and OPT only for increasing the fractional values of the elements. We get,

M_t = Σ_{i: y_{i,t} > y_{i,t−1}} w_i ( y_{i,t} − y_{i,t−1} )
    ≤ η Σ_{i: y_{i,t} > y_{i,t−1}} ( y_{i,t} + ε/n ) · (w_i/η) ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) )    (3.5)
    = η Σ_{i: y_{i,t} > y_{i,t−1}} ( y_{i,t} + ε/n ) ( Σ_{j: i∈S_{j,t}} a_{j,t} − c_{i,t} ),    (3.6)

where Inequality (3.5) follows since a − b ≤ a ln(a/b) for any a, b > 0, and Equality (3.6) follows from Condition (3.4), since if y_{i,t} > y_{i,t−1} then also y_{i,t} > 0. Since c_{i,t}, y_{i,t} and a_{j,t} are nonnegative, we then get,

M_t ≤ η Σ_{i=1}^n ( y_{i,t} + ε/n ) Σ_{j: i∈S_{j,t}} a_{j,t} = η Σ_{j=1}^{m_t} a_{j,t} ( Σ_{i∈S_{j,t}} y_{i,t} + (ε/n)|S_{j,t}| ) ≤ η ( 1 + εk/n ) Σ_{j=1}^{m_t} a_{j,t}.    (3.7)

Inequality (3.7) follows from Condition (3.2). Summing over all times t, we get that the total movement cost is at most η(1 + εk/n) times the value of (D).

Bounding the service cost: Let S be the service cost paid by the algorithm. We rely on the following property, which follows from Jensen's inequality, to bound S.

Lemma (Log sum inequality, [CT91]). For any nonnegative numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

Σ_{i=1}^n a_i log( a_i / b_i ) ≥ ( Σ_{i=1}^n a_i ) log( Σ_{i=1}^n a_i / Σ_{i=1}^n b_i ),

with equality if and only if a_i / b_i is constant. We get,

S = Σ_{t=1}^T Σ_{i=1}^n c_{i,t} y_{i,t}
  = Σ_{t=1}^T Σ_{j=1}^{m_t} a_{j,t} Σ_{i∈S_{j,t}} y_{i,t} − (1/η) Σ_{t=1}^T Σ_{i=1}^n w_i y_{i,t} ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) )    (3.8)
  = Σ_{t=1}^T Σ_{j=1}^{m_t} a_{j,t} − (1/η) Σ_{i=1}^n w_i { Σ_{t=1}^T ( y_{i,t} + ε/n ) ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) ) − (ε/n) Σ_{t=1}^T ln( (y_{i,t} + ε/n) / (y_{i,t−1} + ε/n) ) }    (3.9)
  ≤ D − (1/η) Σ_{i=1}^n w_i { ( Σ_{t=1}^T ( y_{i,t} + ε/n ) ) ln( Σ_{t=1}^T ( y_{i,t} + ε/n ) / Σ_{t=1}^T ( y_{i,t−1} + ε/n ) ) − (ε/n) ln( (y_{i,T} + ε/n) / (y_{i,0} + ε/n) ) }    (3.10)
  ≤ D.    (3.11)

Equality (3.8) follows from Condition (3.4). Equality (3.9) follows from Condition (3.2). Inequality (3.10) follows by a telescopic sum and the log sum inequality. Inequality (3.11) follows since y_{i,0} = 0, so for every i,

(ε/n) ln( (y_{i,T} + ε/n) / (y_{i,0} + ε/n) ) = −( y_{i,0} + ε/n ) ln( (y_{i,0} + ε/n) / (y_{i,T} + ε/n) ) ≤ y_{i,T} − y_{i,0},

and,

Σ_{t=1}^T ( y_{i,t} + ε/n ) ln( Σ_{t=1}^T ( y_{i,t} + ε/n ) / Σ_{t=1}^T ( y_{i,t−1} + ε/n ) ) ≥ Σ_{t=1}^T ( y_{i,t} + ε/n ) − Σ_{t=1}^T ( y_{i,t−1} + ε/n ) = y_{i,T} − y_{i,0},

both because a − b ≤ a ln(a/b) for any a, b ≥ 0. Hence, by choosing ε′ = εn/k as the parameter of the algorithm, one can conclude that the total cost is at most 1 + (1 + ε) ln(1 + k/ε) times the value of (D).

3.2.2 General Covering Constraints with Variable Upper Bounds

The latter proof also holds in the more general case where in round t, P_t is defined by m_t covering constraints of the form Σ_{i∈S_{j,t}} y_{i,t} ≥ r_{j,t}, where r_{j,t} ∈ N and 0 ≤ y_{i,t} ≤ 1 for every 1 ≤ i ≤ n. This captures settings like weighted paging, as well as more involved generalizations. To see this, we note that we can replace every box constraint by a set of knapsack cover (KC) inequalities, as suggested by [CFLP00]. Given a covering LP, the KC-inequalities for a particular covering constraint Σ_{i∈s} x_i ≥ r are defined as follows: for any subset s′ ⊆ s of variables, the maximum possible contribution of the variables in s′ to the constraint is |s′|, and if |s′| < r then a contribution of at least r − |s′| must come from the variables in s \ s′. Therefore, for every s′ ⊆ s with |s′| < r we get the valid inequality Σ_{i∈s\s′} x_i ≥ r − |s′|. By adding the KC constraints, the original box constraint becomes unnecessary: consider the first round t in which a variable y_{i,t} exceeds 1. For every constraint set S_{j,t} that contains i we know that Σ_{l∈S_{j,t}\{i}} y_{l,t} ≥ r_{j,t} − 1, and thus we may reduce y_{i,t} to 1, satisfying all constraints. This reduces both the value of the service cost and the value of the relative entropy (since y_{i,t−1} ≤ 1), contradicting the minimality of each step in the regularized problem. The rest of the analysis follows along the same lines as above, except for Inequality (3.7), where we now get M_t ≤ η(1 + εk/n) Σ_{j=1}^{m_t} r_{j,t} a_{j,t}, which is clearly bounded by η(1 + εk/n) D_t, since r_{j,t} ≥ 1 for all constraints, even after adding the KC-inequalities. A similar proof also holds for the case where we are given a fixed set of precedence constraints of the form x ≤ y, in addition to the varying covering constraints.
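The knapsack-cover inequalities described above are mechanical to enumerate; the following sketch (our own illustration, not from the thesis) generates them for a single covering constraint.

```python
from itertools import combinations

# Sketch of the KC-inequalities for the constraint sum_{i in s} x_i >= r:
# every subset s' of s with |s'| < r yields the valid inequality
#   sum_{i in s \ s'} x_i >= r - |s'|.

def kc_inequalities(s, r):
    """Yield (remaining_variables, rhs) pairs for the constraint sum(s) >= r."""
    s = tuple(s)
    for size in range(min(r, len(s) + 1)):      # only subsets with |s'| < r
        for sp in combinations(s, size):
            remaining = tuple(i for i in s if i not in sp)
            yield remaining, r - size

ineqs = list(kc_inequalities(['x1', 'x2', 'x3'], 2))
print(len(ineqs))   # 1 inequality for s' = empty set plus 3 singletons -> 4
print(ineqs[0])     # (('x1', 'x2', 'x3'), 2)
```

Of course, the number of inequalities is exponential in general; in LP-based algorithms one typically separates over them rather than listing them all, and the sketch is only meant to make the definition concrete.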
Such constraints appear, for example, in standard facility location formulations, or in the allocation problem [BBMN11]. Here, we obtain an additional KKT condition, as well as new dual variables which correspond to the precedence constraints. It is relatively easy to show that by assigning the new dual variables obtained by the KKT conditions to their corresponding variables in (D), we obtain a dual solution that can pay for both the movement and the service cost of the online algorithm.

3.3 Online Set Cover with Service Cost

In this section we show how to round a fractional solution to the online set cover problem with service cost. The problem statement is as follows. We are given a set of elements E = {e_1, e_2, ..., e_n}, and a family of subsets S = {s_1, ..., s_m}, where each s_i ⊆ E. At each time t ∈ {1, ..., T} we

have a set of elements E_t ⊆ E that the algorithm should cover, and a service cost c_{s,t} on each set s ∈ S. The algorithm pays the sum of service costs of the sets that are taken at time t. Also, the algorithm pays one unit for allocating each additional set that is not in the solution at time t − 1. It is easy to see that the problem can be formulated and solved fractionally using Algorithm 3.1. Next, we show how to round the fractional solution online. We present a randomized rounding algorithm for the fractional solution which is based on exponential clocks. A random variable X is distributed according to the exponential distribution with rate λ if it has density f_X(x) = λe^{−λx} for every x ≥ 0, and f_X(x) = 0 otherwise. We denote this by X ~ exp(λ). Exponential clocks are simply competing independent exponential random variables. An exponential clock wins a competition if it has the smallest value among all participating exponential clocks. The rounding is as follows.

Algorithm 3.2 Rounding Algorithm
1: parameter: α ≥ 0
2: for each s ∈ S, choose an i.i.d. random variable Z_s ~ exp(1).
3: for each e ∈ E, choose an i.i.d. random variable Z_e ~ exp(1).
4: at any time t, let y_{s,t} denote the current fractional value of s.
5: for t = 1, 2, ..., T do
6:     let A_t = { s ∈ S : Z_s / y_{s,t} < α }.⁵
7:     let B_t = ∪_{e∈E} { s : s = arg min_{s′: e∈s′} { Z_{s′} / y_{s′,t} }, and Z_s / y_{s,t} < Z_e / max{0, 1 − Σ_{s′: e∈s′} y_{s′,t}} }.
8:     output A_t ∪ B_t.
9: end for

First, we observe that the algorithm covers all elements in E_t at time t. This is true since for each such element e ∈ E_t, Σ_{s: e∈s} y_{s,t} ≥ 1, and thus B_t always contains a set s that covers e. We next prove the following theorem, which bounds the performance of the algorithm.

Theorem 3.3. The expected cost of the solution is O(log S_max) times the cost of the fractional solution, by choosing α = log(S_max), where S_max = max{ |s| : s ∈ S }.⁶
Since the fractional solution is O(log m)-competitive with respect to the optimal solution, we get that the integral algorithm is O(log S_max · log m)-competitive, thus proving Theorem 3.2.

Proof. We use the following well known properties of the exponential distribution:
1. If X ~ exp(λ) and c > 0, then X/c ~ exp(λc).
2. Let X_1, ..., X_k be independent random variables with X_i ~ exp(λ_i):
(a) min{X_1, ..., X_k} ~ exp(λ_1 + ... + λ_k).
(b) Pr[ X_i ≤ min_{j≠i} X_j ] = λ_i / (λ_1 + ... + λ_k).

⁵ In any case of division by 0, we assume the value to be ∞.
⁶ S_max does not have to be known in advance. We may set it to be the maximal set size known so far.
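A single round of the exponential-clocks rounding can be sketched as follows (our own simplified illustration, not code from the thesis): each set s gets a clock Z_s ~ exp(1), A collects the sets with Z_s / y_s < α, and B adds, for each element, the set with the smallest scaled clock among the sets containing it. For simplicity we omit the element clocks Z_e, i.e., we assume every element must be covered (y_{e,t} = 0, so Z_e / y_{e,t} = ∞ and the winning set always beats the element).

```python
import random

def round_once(sets, y, alpha, rng):
    """sets: name -> covered elements; y: name -> fractional value."""
    clocks = {s: rng.expovariate(1.0) for s in sets}
    scaled = {s: clocks[s] / y[s] if y[s] > 0 else float('inf') for s in sets}
    A = {s for s in sets if scaled[s] < alpha}
    elements = {e for s in sets for e in sets[s]}
    # for each element, the set with the smallest scaled clock among those containing it
    B = {min((s for s in sets if e in sets[s]), key=lambda s: scaled[s])
         for e in elements}
    return A | B

rng = random.Random(0)
sets = {'s1': {'a', 'b'}, 's2': {'b', 'c'}, 's3': {'c'}}
y = {'s1': 0.6, 's2': 0.4, 's3': 0.0}
out = round_once(sets, y, alpha=2.0, rng=rng)
covered = set().union(*(sets[s] for s in out))
print(covered == {'a', 'b', 'c'})  # every element is covered by some chosen set
```

In this simplified form the B-step alone already guarantees coverage in every round, regardless of the random clocks; the A-step and the α-threshold matter for bounding the expected movement and service cost, as in the analysis below.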

For simplicity we use y_s instead of y_{s,t} when t is clear from the context. Let y_{e,t} = max{0, 1 − Σ_{s: e∈s} y_{s,t}}. Note that by the guarantee of the fractional solution, y_{e,t} = 0 for any e ∈ E_t.

Bounding the movement cost: We can break down the movement in the fractional solution between rounds t − 1 and t into a sequence of at most m small steps. First we take all the sets whose fractional value has increased, S^+ = {s ∈ S : y_{s,t} > y_{s,t−1}}, and increase their variables, one set at a time. Next, we take all the sets whose fractional value has decreased, S^− = {s ∈ S : y_{s,t} < y_{s,t−1}}, and decrease their variables, one set at a time. Note that the total fractional movement remains the same, and the solution remains feasible throughout all steps. Moreover, the movement in Algorithm 3.2 at each round t is bounded by the total movement incurred during these update steps, since the initial output at time t − 1 and the final output at time t are identical, but there is a chance for sets to be added and then removed during the intermediate steps. From now on, we assume that at each round we perform a single small step. We would like to show that every step that incurs a fractional movement Δ_s for some set s incurs an expected increase of at most (α + 2|s|(α + 1)e^{−α})Δ_s in the randomized solution. Let Y_{e,s} denote the minimal clock of element e, other than that of s: Y_{e,s} = min{ min_{s′≠s: e∈s′} { Z_{s′} / y_{s′} }, Z_e / y_e }. By the exponential distribution properties we have Y_{e,s} ~ exp(λ), where λ = Σ_{s′≠s: e∈s′} y_{s′} + y_e ≥ 1 − y_s.

Decreasing variables: assume set s decreased its value in the fractional solution from y_{s,t−1} to y_{s,t} = y_{s,t−1} − Δ_s. As a result, every e ∈ s possibly increased its variable y_e by at most Δ_s. Let us first increase these variables: it is easy to see that no set can be newly selected due to this change. Next, we decrease the value of y_s. The set s cannot be selected due to this change, unless it was already selected in the previous round; however, other sets might turn minimal because of Line 7.
That is, some set s′ ≠ s may join B_t, where s′ ∉ A_{t−1} ∪ B_{t−1}. Let us bound the expected number of such sets:

E[ |B_t \ (B_{t−1} ∪ A_{t−1})| ]
= Σ_{e∈s} Pr[ new set added to B_t due to element e ]
= Σ_{e∈s} Pr[ Z_s / y_{s,t−1} ≤ Y_{e,s} < Z_s / (y_{s,t−1} − Δ_s), and Y_{e,s} ≥ α ]
= Σ_{e∈s} ∫_{x=α}^∞ f_{Y_{e,s}}(x) · Pr[ Z_s / y_{s,t−1} ≤ x < Z_s / (y_{s,t−1} − Δ_s) ] dx
= Σ_{e∈s} ∫_{x=α}^∞ λ e^{−λx} ( e^{−(y_{s,t−1} − Δ_s)x} − e^{−y_{s,t−1}x} ) dx
= Σ_{e∈s} [ ( λ / (y_{s,t−1} − Δ_s + λ) ) e^{−α(y_{s,t−1} − Δ_s + λ)} − ( λ / (y_{s,t−1} + λ) ) e^{−α(y_{s,t−1} + λ)} ].

Recall that λ = Σ_{s′≠s: e∈s′} y_{s′,t−1} + y_{e,t}. Therefore, y_{s,t−1} − Δ_s + λ ≥ 1, and the latter expression is maximized when y_{s,t−1} − Δ_s = 0 and λ = 1, implying,

( λ / (y_{s,t−1} − Δ_s + λ) ) e^{−α(y_{s,t−1} − Δ_s + λ)} − ( λ / (y_{s,t−1} + λ) ) e^{−α(y_{s,t−1} + λ)}
≤ e^{−α} − ( 1 / (Δ_s + 1) ) e^{−α(Δ_s + 1)}
= e^{−α} ( 1 − e^{−αΔ_s} / (1 + Δ_s) )
≤ e^{−α} (α + 1) Δ_s / (1 + Δ_s)
≤ (α + 1) Δ_s e^{−α},

where the last two inequalities follow since e^x ≥ x + 1. Hence, the total expected number of newly selected sets is bounded by |s| (α + 1) e^{−α} Δ_s.

Increasing variables: Assume set s increased its value from y_{s,t−1} to y_{s,t} = y_{s,t−1} + Δ_s. As a result, every e ∈ s possibly decreased its variable y_e by at most Δ_s. Let us first increase the value of y_s. It is easy to see that no additional set other than s can be selected due to this change. The probability of s being selected in this round due to Line 6 is

Pr[ s ∈ A_t and s ∉ A_{t−1} ] = Pr[ Z_s / (y_{s,t−1} + Δ_s) < α ≤ Z_s / y_{s,t−1} ] = e^{−α y_{s,t−1}} − e^{−α (y_{s,t−1} + Δ_s)} = e^{−α y_{s,t−1}} ( 1 − e^{−α Δ_s} ) ≤ α Δ_s.

The probability of s being selected at time t due to Line 7 and not Line 6 is

Pr[ s ∈ B_t, and s ∉ B_{t−1} ∪ A_t ]
≤ Σ_{e∈s} Pr[ Z_s / (y_{s,t−1} + Δ_s) ≤ Y_{e,s} < Z_s / y_{s,t−1}, and Y_{e,s} ≥ α ]
= Σ_{e∈s} ∫_{x=α}^∞ f_{Y_{e,s}}(x) · Pr[ Z_s / (y_{s,t−1} + Δ_s) ≤ x < Z_s / y_{s,t−1} ] dx
= Σ_{e∈s} ∫_{x=α}^∞ λ e^{−λx} ( e^{−y_{s,t−1}x} − e^{−(y_{s,t−1} + Δ_s)x} ) dx
= Σ_{e∈s} [ ( λ / (y_{s,t−1} + λ) ) e^{−α(y_{s,t−1} + λ)} − ( λ / (y_{s,t−1} + Δ_s + λ) ) e^{−α(y_{s,t−1} + Δ_s + λ)} ].

We note that y_{s,t−1} + λ ≥ 1. The latter expression is maximized when y_{s,t−1} = 0 and λ = 1, implying,

( λ / (y_{s,t−1} + λ) ) e^{−α(y_{s,t−1} + λ)} − ( λ / (y_{s,t−1} + Δ_s + λ) ) e^{−α(y_{s,t−1} + Δ_s + λ)}
≤ e^{−α} − ( 1 / (1 + Δ_s) ) e^{−α(1 + Δ_s)}
= e^{−α} ( 1 − e^{−αΔ_s} / (1 + Δ_s) )
≤ e^{−α} (α + 1) Δ_s / (1 + Δ_s)
≤ (α + 1) Δ_s e^{−α},

where the last two inequalities follow since e^x ≥ x + 1. Next, we decrease the variable y_e of every e ∈ s, each by at most Δ_s. As a result, other sets may turn minimal because of Line 7. Each such variable affects the competition only in one element, so the expected number of selections due to its decrease is bounded by (α + 1) Δ_s e^{−α}, and the total expected number of selections is bounded by |s| (α + 1) Δ_s e^{−α}. We proved that every decrease Δ_s in set s imposes an expected increase of at most |s| (α + 1) e^{−α} Δ_s in the integral solution. In addition, every increase Δ_s in set s imposes an expected increase of at most α Δ_s + 2 |s| (α + 1) e^{−α} Δ_s in the integral solution. Since the total fractional decrease is bounded by the total increase, then by choosing α = ln(S_max), the total movement in the randomized algorithm is at most 4 ln(S_max) + 3 times the fractional movement.

Bounding the service cost: We claim that at any round, Algorithm 3.2 outputs every set s ∈ S with probability at most (α + |s| e^{−α}) y_{s,t}. Note that the algorithm is memoryless, in the sense that the output at each round depends only on the current fractional values. Therefore, we may view the current rounded solution as an offline rounding of the current fractional solution. Similarly to [BNS13], we can prove that the claim holds. As a result, the expected service cost of the randomized solution is at most α + |s| e^{−α} times the service cost of the fractional solution, yielding a (ln(S_max) + 1)-approximation by choosing α = ln(S_max).
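The final choice of α is easy to sanity-check numerically (a small sketch of our own): with α = ln(S_max), the per-set factor α + |s| e^{−α} from the service-cost bound equals ln(S_max) + 1 for the largest sets, and is smaller for every smaller set.

```python
import math

# Factor in the selection-probability bound (alpha + |s| * e^{-alpha}) * y_s.
def factor(alpha, set_size):
    return alpha + set_size * math.exp(-alpha)

s_max = 64
alpha = math.log(s_max)
# for |s| = S_max the factor is exactly ln(S_max) + 1
print(abs(factor(alpha, s_max) - (math.log(s_max) + 1)) < 1e-12)  # True
print(factor(alpha, 4) < factor(alpha, s_max))                    # True
```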

Chapter 4

Unified Algorithms for Online Learning and Competitive Analysis

In this chapter we introduce a single unified algorithm which, by parameter tuning, interpolates between optimal regret for learning from experts in online learning and an optimal competitive ratio for the metrical task systems (MTS) problem in competitive analysis, improving on previous results. The algorithm also allows us to obtain new regret bounds against drifting experts, which might be of independent interest. Moreover, our approach allows us to go beyond experts/MTS, obtaining similar unifying results for structured action sets and combinatorial experts, whenever the setting has a certain matroid structure.

4.1 Introduction

Online learning and competitive analysis are two widely studied frameworks for online decision-making settings. While problems studied under these two frameworks are often rather similar, there has not been much research on general connections between the two. We note that one particular setting, known as learning from experts in the online learning framework and metrical task systems (MTS) with a uniform metric in the competitive analysis framework, has been jointly studied in [BB00]. In particular, the latter paper showed how certain algorithms, based on tuning some parameters, were able to interpolate between a reasonable regret bound and a reasonable competitive ratio. The interpolation was performed using the notion of α-unfair competitive ratio, which forces the policy we compete with to pay α times more for the movement cost. In the limit, α goes to infinity, and thus the competing policy becomes essentially static, and the setting becomes reminiscent of online learning. While these are important and interesting results, they are specific to the setting of experts/MTS.
In modern online learning, learning from experts is now known to be a very special case of much more general settings, such as combinatorial experts (see Chapter 5 in [CBL06]) and online convex optimization. Thus, a natural question is whether unifying analysis and

algorithms exist in such cases as well.

Our Results

We contribute to the joint study of the frameworks of online learning and competitive analysis by providing a novel unified algorithmic approach, based on the regularization technique introduced in Chapter 3, along with a primal-dual analysis. First, we show that in the experts/MTS setting, our algorithm attains both optimal regret and an optimal competitive ratio (unlike the results in [BB00], which do not obtain optimal competitive ratios), as well as optimal results for settings in between, such as shifting and drifting experts. The regret bound for drifting experts is new, to the best of our knowledge, and might be of independent interest. Furthermore, we show how our approach can be applied to more general, structured learning/competitive analysis settings which satisfy matroid constraints. Matroids are extremely useful combinatorial objects that have played an important role in combinatorial optimization since the pioneering work of Edmonds in the 1970s [Edm70, Edm71], and they naturally capture structured action sets such as spanning trees and sparse subsets. In the context of online convex optimization, our results may be viewed as online learning over the matroid base polytope. As in the experts/MTS case, we also get regret bounds against actions which shift or drift a limited amount. Moreover, this can be done in a fine-grained way which respects the problem structure (e.g., competing with spanning trees where only a bounded number of individual edges can change over time). Our algorithms are straightforward, and the various performance guarantees are all obtained just by tuning two parameters. In fact, our online algorithms have an equivalent form, based on recent primal-dual LP techniques developed in competitive analysis. The alternative algorithms are more explicit, and use multiplicative updates.
This equivalence emphasizes, once again, the strong connections between the fields of online learning and competitive analysis. In this chapter we focus on the regularization approach; we shall study the multiplicative updates approach in Chapter 5. A key technical feature of our algorithms is that we shift weights by an additive constant before applying regularization or multiplicative updates, thus deviating from the standard approach taken in multiplicative updates [LW94, FS97], weight sharing algorithms [HW98], and online learning algorithms involving regularization, such as Follow-the-Regularized-Leader (see the book by Shalev-Shwartz [SS11] and the survey by Hazan [Haz09]). As a result, in intermediate steps weights can have negative values, a situation uncommon in both approximation and online algorithms. We emphasize that although some of the settings we discuss might also be treatable by more conventional online learning tools, we obtain the relevant algorithms naturally from our framework, rather than requiring a case-by-case construction (which is common for online learning over structured sets; see [KWK10]). Overall, we hope that our work on combining online learning and competitive analysis

provides a step towards bringing these two rich and mature fields closer together. We also hope that the tools we develop may lead to practical algorithms which combine the advantages of both worlds: on one hand, the practical performance and usefulness of online learning, and on the other hand, the robustness to highly dynamic and state-dependent environments of competitive analysis.

Related Work: There are several works related to ours, other than [BB00], which we have already discussed. However, to the best of our knowledge, none of them attempts to provide a single algorithmic approach connecting online learning with competitive analysis. For example, [BBN10] show an analysis of experts and the unfair MTS problem using a primal-dual approach similar to ours. However, a different algorithm and analysis is applied to each of the problems, and the algorithms are considerably more complex and do not scale as well to the more general setting of matroids. [BCK02] discuss algorithms for decision making on lists and trees, for both a competitive analysis setting and an online learning setting, and show how they can be combined using the hedge algorithm [FS97] to provide simultaneous guarantees. Papers such as [BBK99] and [ABBS10] discuss competitive-analysis algorithms derived using tools from online learning, e.g., regularization. Other works attempt to strengthen the standard regret framework of online learning, such as learning with global cost functions [EDKMM09] and using more adaptive notions of regret, as discussed above. The matroid settings that we consider partially overlap with those of [KWK10], which were studied in the standard online learning framework. For these settings, we obtain similar optimal results for online learning without the need for case-by-case constructions, and again get an interpolation between online learning and competitive analysis.

Organization: The rest of the chapter is organized as follows.
We first introduce the frameworks of online learning and competitive analysis in Section 4.2. Then, we present the unified algorithms and specify our results in Section 4.3. In Section 4.4 we explain how we derived the algorithm for the simpler case of experts/MTS, and analyze it. The algorithm derivation in the matroid case is conceptually similar but technically more complex, and is provided in Section 4.5.

4.2 Preliminaries: Online Learning and Competitive Analysis

We begin by describing online learning and competitive analysis, as applied to the settings we consider. To facilitate our unified analysis, we will strive to use the same notation and terminology for both settings, sometimes using conventions from one to describe the other. Online learning in the experts setting proceeds in T rounds. We consider a finite action set E, where |E| = n. In the beginning of each round t, the decision-maker maintains a distribution vector y_{t−1} over E, which can be seen as a randomized policy for picking one out of n experts at that round. Then, a cost vector c_t is revealed, and the decision-maker incurs the expected cost ⟨y_{t−1}, c_t⟩. The vector c_t may be generated in an arbitrary, possibly adversarial way, and we only assume that each of its entries is bounded in [0, 1] (which can be easily relaxed by scaling). The

decision-maker then chooses a new vector y_t for the next round. The goal of the decision-maker is to minimize regret, defined as

Σ_{t=1}^T ⟨y_{t−1}, c_t⟩ − Σ_{t=1}^T ⟨y*, c_t⟩,  where y* = arg min_{y ≥ 0, ‖y‖_1 = 1} Σ_{t=1}^T ⟨y, c_t⟩.

For this bound to be non-trivial, we expect a regret which grows sublinearly with T. A more ambitious goal studied in the literature (e.g., [HW98]) is tracking the best expert, or regret against shifting experts. In the latter case, we wish to minimize Σ_{t=1}^T ⟨y_{t−1}, c_t⟩ − Σ_{t=1}^T ⟨y*_{t−1}, c_t⟩, where y*_0, ..., y*_{T−1} is the best sequence of distributions which changes at most k times (i.e., y*_i ≠ y*_{i+1} for at most k values of i). In this work, we will in fact study a more general framework, which we call drifting experts, in which the regret is measured against the optimal sequence y*_0, ..., y*_{T−1} such that Σ_{t=1}^T (1/2) ‖y*_t − y*_{t−1}‖_1 ≤ k. This generalizes shifting experts, since any k-shifting sequence is also a k-drifting sequence. We are not familiar with existing explicit results in the literature for drifting experts. In the more general framework that we consider here, rather than just picking single elements of E, we assume that the decision-maker can pick subsets of E from a family of subsets I which has some structure. Such settings were considered in several online learning papers, such as [KV05] and [KWK10]. For example, consider web advertising, where we can place exactly s ads on some website at any given timepoint, out of n ads overall. This can be naturally modeled as an online learning problem, where I is the family of all subsets of E of size s, and we want to compete against the set of the best s ads in hindsight. As another example, consider online learning of spanning trees, which is relevant in the context of communication networks. In that case, E is a set of edges in a graph, and C is the convex hull of all subsets of edges which form a spanning tree.
The goal in these settings is to minimize regret with respect to the best single element $y \in C$ in hindsight, namely
\[
\sum_{t=1}^{T} \langle y_{t-1}, c_t \rangle - \min_{y \in C} \sum_{t=1}^{T} \langle y, c_t \rangle.
\]
It turns out that the latter two settings, the basic experts setting, as well as many other settings, satisfy a matroid structure. Matroids are extremely useful combinatorial objects.1 We refer the reader to Section 2.2 for a formal definition and a brief description of the important matroid properties; for further details see, e.g., [Sch03]. Given a matroid $M = (E, \mathcal{I})$, recall that a subset $B$ of $E$ is called a base of $E$ if $B$ is a maximal independent subset of $E$. A well-known fact is that all matroid bases have the same size, called the rank of $E$, denoted by $r(E)$, or $r$ for short. The base polytope of a matroid $M$ is defined as the convex hull of the incidence vectors of the bases of $M$. We refer to this polytope as $B(M)$. We focus on algorithms which work over bases of matroids, interpolating online learning and

1 For instance, they play a crucial role in the analysis of greedy algorithms, and have deep connections to submodular functions, which have recently gained popularity in machine learning.

competitive analysis, and obtaining results in intermediate settings, such as competing against shifting and drifting targets. For computational efficiency, our algorithms maintain a solution $y_t \in B(M)$ rather than a distribution over the possibly exponentially large $\mathcal{I}$. This is known as a fractional solution. Since all vertices of $B(M)$ are matroid bases, any such fractional solution always corresponds to a valid distribution over the bases. In case an integral matroid base is required, we can use the fractional solution to actually sample from a consistent distribution over the bases of the matroid. Such a procedure is known as rounding. An example of a relevant rounding technique is pipage rounding, which is fast and easy to implement (see [CCPV11] for a description). We remark that techniques such as pipage rounding are only applicable when considering learning scenarios in which there are no switching costs. The case in which the algorithm also pays for movement is considerably more involved. We address this issue in Chapter 5.

We now turn to describe the general matroid setting in the competitive analysis framework. We first note that the analogue of the experts setting is known as the metrical task system (MTS) problem on a uniform metric, first formulated in [BLS92]. MTS abstracts many important online decision problems, e.g., process migration. In the online setting, the decision-maker sequentially needs to choose a vector $y_t$ in a high-dimensional simplex and incur costs depending on arbitrarily-chosen cost vectors. However, there are some important differences. First, the decision-maker pays a movement cost for changing from $y_{t-1}$ to $y_t$, which equals $\frac{1}{2}\|y_t - y_{t-1}\|_1$, and not only a cost depending on $c_t$ (known as the service cost). Second, the service cost incurred in round $t$ is defined to be $\langle y_t, c_t \rangle$, and not $\langle y_{t-1}, c_t \rangle$. In other words, the decision-maker is allowed to first see the cost vector $c_t$, and only then choose the new vector $y_t$ and pay accordingly.
This is called 1-lookahead. In contrast, in the experts setting the decision-maker first pays the cost $\langle y_{t-1}, c_t \rangle$ and only then chooses a vector $y_t$. This is called 0-lookahead. We decompose the total cost paid by the decision-maker into the service cost $S_1$ (with 1-lookahead) and the movement cost $M$ as follows:
\[
S_1 = \sum_{t=1}^{T} \langle y_t, c_t \rangle, \qquad
M = \sum_{t=1}^{T} \tfrac{1}{2}\|y_t - y_{t-1}\|_1 .
\]
To motivate these notions, we note that in the context of, say, MTS, one thinks of $y_t$ as a distribution over the $n$ possible states the algorithm might be in, of $\frac{1}{2}\|y_t - y_{t-1}\|_1$ as the cost associated with changing that state, and of $c_t$ as specifying the cost of processing a task in each of the $n$ states. Because of the movement cost, the ability to get the cost $c_t$ in advance does not trivialize the problem. To allow comparison to the experts setting, we also define
\[
S_0 = \sum_{t=1}^{T} \langle y_{t-1}, c_t \rangle
\]
as the service cost of an algorithm whose action at round $t$ does not depend on $c_t$. The framework

naturally extends to the context of matroids: the decision-maker needs to maintain over time a base of a matroid $M = (E, \mathcal{I})$. Another important difference, in comparison with the online learning framework, is the performance measure. In competitive analysis the goal is not to compete against the best fixed element in $B(M)$, but rather against the optimal offline sequence $y^*_1, \ldots, y^*_T$, which is a solution to
\[
\min_{y_t \in B(M),\ t=1,\ldots,T} \ \sum_{t=1}^{T} \langle y_t, c_t \rangle + \sum_{t=1}^{T} \tfrac{1}{2}\|y_t - y_{t-1}\|_1 .
\]
In other words, $y^*_1, \ldots, y^*_T$ is the optimal sequence of the decision-maker's choices, had she known all the cost vectors in advance and could have solved the problem offline. Clearly, this is a much more ambitious goal than minimizing the regret with respect to a fixed $y^*$. We let
\[
S^*_1 = \sum_{t=1}^{T} \langle y^*_t, c_t \rangle, \qquad
M^* = \sum_{t=1}^{T} \tfrac{1}{2}\|y^*_t - y^*_{t-1}\|_1
\]
denote the service cost and the movement cost of this optimal sequence, and let $\mathrm{opt} = S^*_1 + M^*$ denote its total cost. The competitive ratio is then defined as the minimal $c \ge 1$ such that for any sequence of cost vectors, $S_1 + M \le c \cdot \mathrm{opt} + d$, where $d$ is a constant independent of $T$. In competitive analysis, $c$ is usually strictly greater than one, and is independent of $T$. For example, in the MTS setting the attainable competitive ratio is known to be $\Theta(\ln n)$ [BLS92].

A crucial refinement of the competitive ratio, which we use for providing a unified analysis of the two settings, is the notion of the $\alpha$-unfair competitive ratio, for $\alpha \ge 1$. This notion modifies the sequence $y^*_1, \ldots, y^*_T$ we compete against. Rather than defining it as the sequence minimizing $\sum_{t=1}^{T} \langle y_t, c_t \rangle + \sum_{t=1}^{T} \frac{1}{2}\|y_t - y_{t-1}\|_1$, we define it as the solution to:
\[
\min_{y_t \in B(M),\ t=1,\ldots,T} \ \sum_{t=1}^{T} \langle y_t, c_t \rangle + \alpha \sum_{t=1}^{T} \tfrac{1}{2}\|y_t - y_{t-1}\|_1 .
\]
The optimal cost of the above is denoted by $\mathrm{opt}(\alpha)$. In words, the sequence we compete against pays $\alpha$ times more than the decision-maker for movement. The case $\alpha = 1$ corresponds to the standard competitive analysis setting.
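The cost decomposition above is easy to compute directly. A minimal sketch (hypothetical helper names), for a trajectory $y_0, \ldots, y_T$ and cost vectors $c_1, \ldots, c_T$; for distributions and costs in $[0,1]$ it can also be used to check the relation $S_0 \le S_1 + M$:

```python
def service_1(seq, costs):
    # 1-lookahead service cost: sum_t <y_t, c_t>, with seq = [y_0, ..., y_T]
    return sum(sum(y * c for y, c in zip(seq[t], costs[t - 1]))
               for t in range(1, len(seq)))

def service_0(seq, costs):
    # 0-lookahead service cost: sum_t <y_{t-1}, c_t>
    return sum(sum(y * c for y, c in zip(seq[t - 1], costs[t - 1]))
               for t in range(1, len(seq)))

def movement(seq):
    # movement cost: sum_t (1/2) * ||y_t - y_{t-1}||_1
    return sum(0.5 * sum(abs(a - b) for a, b in zip(seq[t], seq[t - 1]))
               for t in range(1, len(seq)))
```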
For $\alpha > 1$ the setting becomes easier, because it encourages the competing sequence to move less. In the limit $\alpha \to \infty$, the optimal sequence necessarily satisfies $y^*_1 = \ldots = y^*_T$, and the setting becomes reminiscent of online learning, where we compare ourselves against a fixed $y^*$ (although the 1-lookahead and the movement cost features remain). The $\alpha$-unfair competitive ratio was proposed in [BKRS00], and was used to show connections between online learning and competitive analysis for experts/MTS in [BB00].

To facilitate our regret bounds for $k$-drifting sequences, we let $\mathrm{opt}_k$ denote the cost of the

best $k$-drifting sequence of valid vectors in $B(M)$, i.e., the $k$-drifting sequence which minimizes $\sum_{t=1}^{T} \langle y^*_{t-1}, c_t \rangle$. It is easy to verify that
\[
\mathrm{opt}(\alpha) \le \mathrm{opt}_k + \alpha k,
\]
since the optimal $k$-drifting solution incurs a cost of at most $\mathrm{opt}_k + \alpha k$ in the $\alpha$-unfair setting. Another simple observation, based on the boundedness of $c_t$, is that
\[
S_0 = \sum_{t=1}^{T} \langle y_{t-1}, c_t \rangle \le \sum_{t=1}^{T} \langle y_t, c_t \rangle + \sum_{t=1}^{T} \tfrac{1}{2}\|y_t - y_{t-1}\|_1 = S_1 + M.
\]
Combining these two, we get the following useful observation relating the online learning and competitive analysis settings:2

Observation 4.1. Suppose we have an algorithm in the $\alpha$-unfair setting whose total cost is at most $c \cdot \mathrm{opt}(\alpha) + d$. Then we have an online learning algorithm with total cost
\[
S_0 \le S_1 + M \le c \cdot \mathrm{opt}(\alpha) + d \le c \cdot \mathrm{opt}_k + c\alpha k + d.
\]

4.3 Algorithms and Results

We first present Algorithm 4.1 for the experts/MTS setting. The main idea behind it is to apply multiplicative updates to a shifted value of the weights. As we have seen in the previous section, this can be easily achieved via regularization: in each round $t$ we maintain $n$ variables $y_{1,t}, \ldots, y_{n,t}$ denoting the current distribution over servers/experts. Consider the cost vector $(c_{1,t}, \ldots, c_{n,t})$ at round $t$. Then all we need to do is solve a regularized convex program.

Algorithm 4.1 Experts/MTS Algorithm (regularized formulation)
Parameters: $\alpha \ge 1$, $\eta > 0$
Initialize $y_{i,0} = \frac{1}{n}$ for all $i = 1, \ldots, n$.
for $t = 1, 2, \ldots$ do
  Let $(c_{1,t}, \ldots, c_{n,t})$ be the cost vector at time $t$, and let $P$ be the probability simplex.
  Solve the following convex program to obtain $y_t$:
  \[
  y_t = \arg\min_{x \in P} \left\{ \langle c_t, x \rangle + \frac{1}{\eta} \sum_{i=1}^{n} \left[ \left(x_i + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{x_i + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} - x_i \right] \right\}. \quad (4.2)
  \]
end for

The use of regularization spares us the need to explicitly normalize the variables after every multiplicative update. In fact, Algorithm 4.1 has an equivalent explicit form using the classic multiplicative updates approach, as shown in Algorithm 4.2. Given the cost vector

2 An observation of a similar flavor was also given in [BB00].

$(c_{1,t}, \ldots, c_{n,t})$ at round $t$, we apply a shifted multiplicative update to the weights; that is, we set the new variables so that for every $i$,
\[
y_{i,t} + \frac{1}{e^{\eta\alpha}-1} = \left( y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1} \right) e^{-\eta c_{i,t}}.
\]
Then, we carefully fix the shifted weights $\left(y_{1,t} + \frac{1}{e^{\eta\alpha}-1}, \ldots, y_{n,t} + \frac{1}{e^{\eta\alpha}-1}\right)$ to ensure they induce a valid distribution.

Algorithm 4.2 Experts/MTS Algorithm (learning-style formulation)
Parameters: $\alpha \ge 1$, $\eta > 0$
Initialize $y_{i,0} = \frac{1}{n}$ for all $i = 1, \ldots, n$.
for $t = 1, 2, \ldots$ do
  Let $(c_{1,t}, \ldots, c_{n,t})$ be the cost vector at time $t$.
  Find the smallest value $a_t$ (e.g., using binary search) such that $\sum_{i=1}^{n} y_{i,t} = 1$, where
  \[
  y_{i,t} = \max\left\{ 0,\ \left( y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1} \right) e^{-\eta (c_{i,t} - a_t)} - \frac{1}{e^{\eta\alpha}-1} \right\}.
  \]
end for

We prove the following theorem.

Theorem 4.2. For any $\alpha \ge 1$, $\eta > 0$, Algorithm 4.1 attains
\[
S_1 \le \mathrm{opt}(\alpha) + \frac{\ln n}{\eta}, \quad (4.3)
\]
\[
M \le \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) \left( \eta\,\mathrm{opt}(\alpha) + \ln n \right). \quad (4.4)
\]
In particular, for $\alpha \to \infty$ (regret against a fixed distribution), by Observation 4.1 we get
\[
S_0 \le S_1 + M \le (1+\eta)\,\mathrm{opt} + \frac{\ln n}{\eta} + \ln n. \quad (4.5)
\]
By setting3 $\alpha = \ln(n)/\eta$ and using Observation 4.1, we also obtain
\[
S_0 \le (1+3\eta)\,\mathrm{opt}_k + \frac{(k+1)\ln n}{\eta} + 3(k+1)\ln n. \quad (4.6)
\]

Let us try to understand the bounds in the theorem. For Equation (4.4), if we set $\alpha = 1$ and $\eta = \ln n + \ln\ln n$, we get the best known bound for MTS on uniform metrics [BBN10, ABBS10]. In particular, the bound is better than that obtained by the analysis of [BB00], who also interpolate between experts and MTS. For Equation (4.5), if we set $\eta = \sqrt{\ln(n)/\mathrm{opt}}$, then our analysis yields a virtually optimal regret bound of $2\sqrt{\mathrm{opt}\ln n} + \ln n$ for the experts setting. Moreover, it is not hard to see that when $\alpha \to \infty$, our algorithm reduces to the canonical multiplicative updates algorithm (see [CBL06]). Equation (4.6) is a regret bound with respect

3 This value is chosen for simplicity, and is not the tightest possible.

to the optimal $k$-drifting sequence. Setting $\eta = \sqrt{(k+1)\ln(n)/(3\,\mathrm{opt}_k)}$, we get an asymptotically optimal regret of less than
\[
2\sqrt{3(k+1)\ln(n)\,\mathrm{opt}_k} + 3(k+1)\ln n
\]
for this problem, which matches the lower bound for $k$-shifting experts. The lower bound can be derived, for example, by dividing the $T$ rounds into $k$ disjoint epochs of $T/k$ rounds each, and using the standard regret lower bound construction in each one of them; see also [CBL06, Corollary 4.2 and Section 5.2]. We emphasize that while there exist previous results for the case of shifting experts, here we provide an algorithm and analysis for the strictly more general setting of drifting experts.4 We note that although $\mathrm{opt}$ and $\mathrm{opt}_k$ may not be known in advance in order to tune $\eta$, one can use a standard doubling trick to circumvent this, or obtain bounds in which these quantities are replaced by the number of rounds $T$ [CBL06].

The general case of a matroid $M = (E, \mathcal{I})$ is handled by Algorithm 4.3, which works similarly to Algorithm 4.1. The algorithm maintains a vector $y_t \in B(M)$ over the elements of $E$. Initially, we pick $y_0$ to be a vector in $B(M)$ such that $\sum_{e \in E} -\ln\left(y_{0,e} + \frac{1}{e^{\eta\alpha}-1}\right)$ is minimized. It turns out that such an initialization ensures that $\min_{e \in E} \{y_{0,e}\} \ge \frac{1}{\gamma_M}$, where $\gamma_M$ is the matroid density, and this is the best possible lower bound (see Claim 4.5). Then, in each round all we have to do is solve a regularized convex program, which implicitly encapsulates an update step that decreases the value $y_{e,t}$, followed by a non-trivial normalization.

Algorithm 4.3 Matroid Algorithm (regularized formulation)
Parameters: $\alpha \ge 1$, $\eta > 0$
Start with a fractional base $y_0 \in B(M)$ such that $y_0 = \arg\min_{x \in B(M)} \sum_{e \in E} -\ln\left(x_e + \frac{1}{e^{\eta\alpha}-1}\right)$.
for $t = 1, 2, \ldots$ do
  Let $(c_{1,t}, \ldots, c_{n,t})$ be the current cost vector.
  Solve the following convex program to obtain $y_t$:
  \[
  y_t = \arg\min_{x \in B(M)} \left\{ \langle c_t, x \rangle + \frac{1}{\eta} \sum_{e \in E} \left[ \left(x_e + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{x_e + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} - x_e \right] \right\}. \quad (4.7)
  \]
end for

The performance guarantee of the algorithm is provided below. We note that it is a natural generalization of Theorem 4.2, as the experts setting corresponds to a matroid with $r(E) = 1$ and $\gamma_M = n$.

Theorem 4.3. For a matroid $M = (E, \mathcal{I})$, and any $\alpha \ge 1$, $\eta > 0$, Algorithm 4.3 attains

4 There do exist results for regret against drifting targets in the $\ell_2$ norm [Zin03]. However, these results do not require a significant change in the algorithm. In contrast, the standard multiplicative updates algorithm can be shown to fail against $\ell_1$ drift, so a new algorithm is indeed required.

\[
S_1 \le \mathrm{opt}(\alpha) + \frac{r(E)\ln\gamma_M}{\eta}, \quad (4.8)
\]
\[
M \le \left(1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1}\right)\left(\eta\,\mathrm{opt}(\alpha) + r(E)\ln\gamma_M\right). \quad (4.9)
\]
For $\alpha \to \infty$ (regret against a fixed distribution), by Observation 4.1 we get
\[
S_0 \le S_1 + M \le (1+\eta)\,\mathrm{opt} + \frac{r(E)\ln\gamma_M}{\eta} + r(E)\ln\gamma_M. \quad (4.10)
\]
By setting $\alpha = \ln(n - r(E) + 1)/\eta$ and using Observation 4.1,
\[
S_0 \le (1+3\eta)\,\mathrm{opt}_k + \frac{(2k + r(E))\ln(n - r(E) + 1)}{\eta} + 3(k + r(E))\ln(n - r(E) + 1). \quad (4.11)
\]

For Equation (4.10), if we set $\eta = \sqrt{r(E)\ln(\gamma_M)/\mathrm{opt}}$, then our analysis yields a regret bound of
\[
2\sqrt{r(E)\ln(\gamma_M)\,\mathrm{opt}} + r(E)\ln\gamma_M .
\]
For example, for $s$-sparse subsets this corresponds to $O\left(\sqrt{s\ln(n/s)\,\mathrm{opt}} + s\ln(n/s)\right)$, and for spanning trees over $|E|$ edges and $|V|$ vertices we get $O\left(\sqrt{|V|\ln(|E|/|V| + 1)\,\mathrm{opt}} + |V|\ln(|E|/|V| + 1)\right)$. This corresponds to the results of [KWK10]; moreover, our latter result is for spanning trees over general graphs rather than complete graphs. Equation (4.11) provides a version for $k$-drifting sequences. Setting $\eta = \sqrt{(k + r(E))\ln(n - r(E) + 1)/\mathrm{opt}_k}$, we get a regret of less than
\[
4\sqrt{(k + r(E))\ln(n - r(E) + 1)\,\mathrm{opt}_k} + 3(k + r(E))\ln(n - r(E) + 1)
\]
for this problem. Since the drift is measured with respect to the $\ell_1$ norm over $B(M)$, it naturally captures the structure of the problem. In particular, the drift is measured with respect to changes in individual elements in the $s$-subsets, or in individual edges in the spanning trees.

4.4 Proofs and Algorithm Derivation: the Experts/MTS Case

In this section we explain how we derive and analyze our algorithms, focusing on the simpler case of experts/MTS (Algorithm 4.1 and Theorem 4.2). The derivation is based on a primal-dual linear programming analysis. It starts from a very

(P)  $\min \ \sum_{t=1}^{T} \sum_{i=1}^{n} c_{i,t}\, y_{i,t} + \sum_{t=1}^{T} \sum_{i=1}^{n} \alpha\, z_{i,t}$
  s.t.  $\sum_{i=1}^{n} y_{i,t} = 1$  for all $t \ge 0$
        $z_{i,t} \ge y_{i,t} - y_{i,t-1}$  for all $t \ge 1$ and expert $i$
        $z_{i,t},\ y_{i,t} \ge 0$  for all $t \ge 1$ and $i$

(D)  $\max \ \sum_{t=0}^{T} a_t$
  s.t.  $a_0 + b_{i,1} \le 0$  for all $i$ ($t = 0$)
        $b_{i,t+1} \le b_{i,t} + c_{i,t} - a_t$  for all $t \ge 1$ and expert $i$
        $0 \le b_{i,t} \le \alpha$  for all $t$ and expert $i$

Figure 4.1: The primal and dual LP formulations for the MTS problem.

simple LP formulation (Figure 4.1) of the optimal offline $\alpha$-unfair solution. Note that in order to charge for $\alpha \sum_{t=1}^{T} \frac{1}{2}\|y_t - y_{t-1}\|_1$, it suffices to charge only for increasing coordinates. Thus, we charge both the optimal solution and our algorithm for increasing variables. Figure 4.1 also contains a description of the dual program (D). This program plays a central role in our analysis. We denote by $D$ the value of the dual program. It is well known that $D$ is a lower bound on the value of any primal solution.

The analysis of Algorithm 4.1 is based on deriving the KKT optimality conditions of the regularized problem in each round. As demonstrated in the previous chapter, the primal LP constraints define dual variables that are then carefully plugged into the dual formulation in Figure 4.1 to construct a feasible dual solution. This, in turn, yields a lower bound on the performance of the online algorithm, which eventually leads to Equation (4.4) in Theorem 4.2. The other bounds in the theorem are simple corollaries obtained by a direct calculation.

KKT optimality conditions. In each round, Algorithm 4.1 solves a convex program (4.2) with a single equality constraint. We define a Lagrangian variable $a_t$ for that constraint in round $t \in \{1, \ldots, T\}$. The KKT optimality conditions define the following relationship between the optimal values of $y_t$ and $a_t$.
\[
\sum_{i=1}^{n} y_{i,t} - 1 = 0, \quad (4.12)
\]
\[
\forall\, 1 \le i \le n: \quad c_{i,t} + \frac{1}{\eta} \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} - a_t \ge 0, \quad (4.13)
\]
\[
\forall\, 1 \le i \le n: \quad y_{i,t} \left( c_{i,t} + \frac{1}{\eta} \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} - a_t \right) = 0. \quad (4.14)
\]

We first construct a dual solution to the offline problem (D) using the values obtained from the KKT optimality conditions. We then show that the primal and dual solutions obtained are feasible, and finally we prove that the dual we constructed can pay for both the movement and the service cost of the online algorithm. To construct the dual, we simply assign to $a_t$ in (D) the value obtained from the optimality conditions, and we define
\[
b_{i,t+1} = \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}.
\]
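The KKT characterization can be checked numerically. The sketch below (hypothetical helper names) computes one round of the explicit update of Algorithm 4.2, with the normalizer $a_t$ found by binary search, and then evaluates the gradient expressions appearing in (4.13)–(4.14):

```python
import math

def mts_step(y_prev, c, eta, alpha):
    """One round of the learning-style update (Algorithm 4.2): a shifted
    multiplicative update, truncated at zero and renormalized via a_t."""
    delta = 1.0 / (math.exp(eta * alpha) - 1.0)
    w = [(y + delta) * math.exp(-eta * ci) for y, ci in zip(y_prev, c)]

    def total(a):
        return sum(max(0.0, wi * math.exp(eta * a) - delta) for wi in w)

    lo, hi = 0.0, 1.0
    while total(hi) < 1.0:      # total(a) is nondecreasing in a
        hi *= 2.0
    for _ in range(200):        # binary search for the normalizer a_t
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < 1.0 else (lo, mid)
    a = 0.5 * (lo + hi)
    y = [max(0.0, wi * math.exp(eta * a) - delta) for wi in w]
    return y, a

def kkt_residuals(y_prev, y, a, c, eta, alpha):
    """The left-hand sides of (4.13); they must be >= 0, and = 0 whenever
    y_{i,t} > 0 (complementary slackness, condition (4.14))."""
    delta = 1.0 / (math.exp(eta * alpha) - 1.0)
    return [ci + math.log((yi + delta) / (yp + delta)) / eta - a
            for yi, yp, ci in zip(y, y_prev, c)]
```

Experts that are truncated to zero have a strictly positive residual, while every expert kept at positive weight satisfies the gradient condition with equality.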

Primal (P) is feasible: By definition, both the initialization of $y_0$ and the updates $y_t$ are feasible. By setting $z_{i,t} = \max\{0, y_{i,t} - y_{i,t-1}\}$ we obtain a feasible primal solution.

Dual (D) is feasible: Since initially $y_{i,0} = \frac{1}{n}$, we can set for each $i$,
\[
b_{i,1} = \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{\frac{1}{n} + \frac{1}{e^{\eta\alpha}-1}} = \alpha - \frac{1}{\eta}\ln\frac{e^{\eta\alpha}+n-1}{n}, \qquad
a_0 = -\alpha + \frac{1}{\eta}\ln\frac{e^{\eta\alpha}+n-1}{n},
\]
and we get that the first dual constraint is satisfied. Also, by the definition of the dual variables, note that for any $t \ge 1$ and $i$,
\[
b_{i,t+1} - b_{i,t} = -\frac{1}{\eta} \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} \le c_{i,t} - a_t,
\]
where the inequality follows from (4.13). Finally, $0 \le b_{i,t+1} = \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}} \le \alpha$, which follows since $0 \le y_{i,t} \le 1$.

Primal-dual relation: Let $D_t$ be the change in the cost of the dual solution at time $t$. We bound the cost of the algorithm in each iteration by the change in the cost of the dual.

Bounding the movement cost at time $t$: Let $M_t$ be the movement cost at time $t$. As indicated, we charge our algorithm and $\mathrm{opt}(\alpha)$ only for increasing the fractional value of the elements. We get,
\[
M_t = \sum_{i:\, y_{i,t} > y_{i,t-1}} (y_{i,t} - y_{i,t-1})
\le \sum_{i:\, y_{i,t} > y_{i,t-1}} \left(y_{i,t} + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} \quad (4.15)
\]
\[
= \eta \sum_{i:\, y_{i,t} > y_{i,t-1}} \left(y_{i,t} + \frac{1}{e^{\eta\alpha}-1}\right) (a_t - c_{i,t}) \quad (4.16)
\]
\[
\le \eta \sum_{i=1}^{n} \left(y_{i,t} + \frac{1}{e^{\eta\alpha}-1}\right) a_t \quad (4.17)
\]
\[
= \eta \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) a_t \le \eta \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) D_t. \quad (4.18)
\]
Inequality (4.15) follows as $a - b \le a\ln(a/b)$ for any $a, b > 0$. Equality (4.16) follows from Condition (4.14), since if $y_{i,t} > y_{i,t-1}$ then also $y_{i,t} > 0$. Inequality (4.17) follows as $y_{i,t}$, $c_{i,t}$ and $a_t$ are nonnegative. Inequality (4.18) follows from Condition (4.12), since the change in the dual objective at time $t \ge 1$ is $D_t = a_t$. Summing over all times $t$

we get,
\[
M \le \eta \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) \sum_{t=1}^{T} D_t
= \eta \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) (D - a_0)
= \eta \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) \left( D + \alpha - \frac{1}{\eta}\ln\frac{e^{\eta\alpha}+n-1}{n} \right).
\]
Since $\eta\alpha - \ln\frac{e^{\eta\alpha}+n-1}{n} \le \eta\alpha - \ln\frac{e^{\eta\alpha}}{n} = \ln n$, we conclude that
\[
M \le \left(1 + \frac{n}{e^{\eta\alpha}-1}\right) (\eta D + \ln n).
\]

Bounding the service cost: We are now ready to bound the service cost $S_1$ paid by the algorithm.
\[
S_1 = \sum_{t=1}^{T} \langle y_t, c_t \rangle = \sum_{t=1}^{T} \sum_{i=1}^{n} c_{i,t}\, y_{i,t}
= \sum_{t=1}^{T} \sum_{i=1}^{n} y_{i,t} \left( a_t - \frac{1}{\eta} \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} \right) \quad (4.19)
\]
\[
= \sum_{t=1}^{T} a_t - \frac{1}{\eta} \sum_{t=1}^{T} \sum_{i=1}^{n} \left(y_{i,t} + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} + \frac{1}{\eta(e^{\eta\alpha}-1)} \sum_{i=1}^{n} \sum_{t=1}^{T} \ln \frac{y_{i,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,t-1} + \frac{1}{e^{\eta\alpha}-1}} \quad (4.20)
\]
\[
\le \sum_{t=1}^{T} a_t - \frac{1}{\eta} \sum_{i=1}^{n} (y_{i,T} - y_{i,0}) + \frac{1}{\eta(e^{\eta\alpha}-1)} \sum_{i=1}^{n} \ln \frac{y_{i,T} + \frac{1}{e^{\eta\alpha}-1}}{y_{i,0} + \frac{1}{e^{\eta\alpha}-1}} \quad (4.21)
\]
\[
\le \sum_{t=1}^{T} a_t = D - a_0 = D + \alpha - \frac{1}{\eta}\ln\frac{e^{\eta\alpha}+n-1}{n} \le D + \frac{\ln n}{\eta}. \quad (4.22)
\]
Equality (4.19) follows from Condition (4.14). Equality (4.20) follows from Condition (4.12). Inequality (4.21) follows by a telescoping sum, and because $a - b \le a\ln(a/b)$ for any $a, b \ge 0$. Inequality (4.22) follows since $\sum_{i=1}^{n} y_{i,0} = \sum_{i=1}^{n} y_{i,T} = 1$, and since $y_0$ minimizes the expression $\sum_{i=1}^{n} -\ln\left(x_i + \frac{1}{e^{\eta\alpha}-1}\right)$ over the simplex, as $-\ln\left(x + \frac{1}{e^{\eta\alpha}-1}\right)$ is convex in $x$. Since $D \le \mathrm{opt}(\alpha)$, the bounds (4.3) and (4.4) in Theorem 4.2 follow.
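As a sanity check of this primal-dual analysis, the following sketch (hypothetical names) runs the explicit update for $T$ random rounds, accumulates the dual value $D = \sum_{t \ge 0} a_t$ with $a_0$ set as in the dual feasibility argument above, and verifies both the per-round movement bound from (4.18) and the final service-cost bound $S_1 \le D + \ln(n)/\eta$:

```python
import math, random

def run_and_check(n=5, T=40, eta=0.7, alpha=1.0, seed=0):
    rng = random.Random(seed)
    delta = 1.0 / (math.exp(eta * alpha) - 1.0)
    y = [1.0 / n] * n
    s1 = 0.0
    # a_0 = -alpha + (1/eta) * ln((e^{eta*alpha} + n - 1) / n)
    dual = -alpha + math.log((math.exp(eta * alpha) + n - 1) / n) / eta
    for _ in range(T):
        c = [rng.random() for _ in range(n)]
        w = [(yi + delta) * math.exp(-eta * ci) for yi, ci in zip(y, c)]
        total = lambda a: sum(max(0.0, wi * math.exp(eta * a) - delta)
                              for wi in w)
        lo, hi = 0.0, 1.0
        while total(hi) < 1.0:
            hi *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if total(mid) < 1.0 else (lo, mid)
        a = 0.5 * (lo + hi)
        y_new = [max(0.0, wi * math.exp(eta * a) - delta) for wi in w]
        move = sum(max(0.0, b - d) for b, d in zip(y_new, y))
        # per-round movement bound (4.18): M_t <= eta * (1 + n*delta) * a_t
        assert move <= eta * (1.0 + n * delta) * a + 1e-6
        s1 += sum(yi * ci for yi, ci in zip(y_new, c))
        dual += a
        y = y_new
    # service-cost bound (4.22): S_1 <= D + ln(n)/eta
    assert s1 <= dual + math.log(n) / eta + 1e-6
    return s1, dual
```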

(P)  $\min \ \sum_{t=1}^{T} \sum_{e \in E} c_{e,t}\, y_{e,t} + \alpha \sum_{t=1}^{T} \sum_{e \in E} z_{e,t}$
  s.t.  $\sum_{e \in S} y_{e,t} \le r(S)$  for all $t \ge 0$ and $S \subseteq E$
        $\sum_{e \in E} y_{e,t} \ge r(E)$  for all $t \ge 0$
        $z_{e,t} \ge y_{e,t} - y_{e,t-1}$  for all $t \ge 1$ and $e \in E$
        $z_{e,t} \ge 0$  for all $t \ge 1$ and $e \in E$

(D)  $\max \ \sum_{t=0}^{T} \left[ r(E)\, a_t - \sum_{S \subseteq E} r(S)\, a_{S,t} \right]$
  s.t.  $a_0 - \sum_{S:\, e \in S} a_{S,0} + b_{e,1} \le 0$  for all $e \in E$ ($t = 0$)
        $b_{e,t+1} \le b_{e,t} + c_{e,t} - a_t + \sum_{S:\, e \in S} a_{S,t}$  for all $t \ge 1$ and $e \in E$
        $0 \le b_{e,t} \le \alpha$  for all $t$ and $e \in E$
        $a_t,\ a_{S,t} \ge 0$  for all $t$ and $S \subseteq E$

Figure 4.2: The primal and dual LP formulations for the Matroid problem.

4.5 Proofs and Algorithm Derivation: the Matroid Case

In this section we analyze Algorithm 4.3, which works for the general matroid setting, and prove Theorem 4.3. We start from an LP formulation (Figure 4.2) of the optimal $\alpha$-unfair solution, together with the dual program (D). Note that the first two primal constraints are a characterization of the matroid base polytope (see [Sch03], Chapter 40), and ensure that in each round $y_t \in B(M)$. For convenience, we replace the matroid rank constraint $\sum_{e \in E} y_{e,t} = r(E)$ with two inequality constraints. In addition, we note that the nonnegativity constraint $y_{e,t} \ge 0$ is redundant, since for every $e, t$ the matroid rank constraints on the subsets $E$ and $E \setminus \{e\}$ imply
\[
y_{e,t} = \sum_{e' \in E} y_{e',t} - \sum_{e' \in E\setminus\{e\}} y_{e',t} \ge r(E) - r(E \setminus \{e\}) \ge 0.
\]
As in the case of experts/MTS (the uniform matroid), Algorithm 4.3 has an equivalent explicit form using the classic multiplicative updates approach. The idea is to apply a shifted multiplicative update to the weights, followed by a sequence of up to $n$ normalization steps. The normalization phase is more complex than in the experts setting, as the matroid polytope consists of an exponential number of constraints; however, it can be carried out thanks to the special properties of matroids. For all that, we shall analyze the algorithm here using the regularization approach. In Chapter 5, we introduce the explicit version of multiplicative updates over matroid constraints, along with a corresponding analysis.

The analysis of Algorithm 4.3 is based on deriving the KKT optimality conditions of the regularized problem in each round.
The primal LP constraints define dual variables that are carefully plugged into the dual formulation in Figure 4.2 to construct a feasible dual solution. This, in turn, yields a lower bound on the performance of the online algorithm, which eventually leads to Inequalities (4.8) and (4.9) in Theorem 4.3. The other bounds in the theorem are simple corollaries obtained by a direct calculation.

KKT optimality conditions. In each round, Algorithm 4.3 solves a convex program (4.7) with a packing constraint for every $S \subseteq E$, and an additional covering constraint. Respectively,

we define a Lagrangian variable $a_{S,t}$ for each packing constraint in round $t \in \{1, \ldots, T\}$, and a Lagrangian variable $a_t$ for the covering constraint. The KKT optimality conditions define the following relationship between the optimal values of $y_t$ and $a_{S,t}, a_t$.
\[
\sum_{e \in E} y_{e,t} - r(E) = 0, \quad (4.23)
\]
\[
\forall S \subseteq E: \quad \sum_{e \in S} y_{e,t} - r(S) \le 0, \quad (4.24)
\]
\[
\forall S \subseteq E: \quad a_t \ge 0, \quad a_{S,t} \ge 0, \quad (4.25)
\]
\[
\forall S \subseteq E: \quad a_{S,t} \left( \sum_{e \in S} y_{e,t} - r(S) \right) = 0, \quad (4.26)
\]
\[
\forall e \in E: \quad c_{e,t} + \frac{1}{\eta} \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} - a_t + \sum_{S:\, e \in S} a_{S,t} = 0, \quad (4.27)
\]
\[
a_t \left( \sum_{e \in E} y_{e,t} - r(E) \right) = 0. \quad (4.28)
\]

For our analysis, we need the following properties of matroids.

Claim 4.4. For any matroid $M = (E, \mathcal{I})$ and any nonempty set $S \subseteq E$, we have $\frac{|S|}{n - r(E) + 1} \le r(S)$; moreover, if $r(E \setminus S) < r(E)$, then
\[
\frac{|S|}{n - r(E) + 1} \le r(E) - r(E \setminus S) \le r(S).
\]

Proof. If $r(E \setminus S) = r(E)$, then $E \setminus S$ contains a base of $M$, hence $|E \setminus S| \ge r(E)$ and $|S| \le n - r(E)$, so
\[
\frac{|S|}{n - r(E) + 1} < 1 \le r(S).
\]
If $r(E \setminus S) < r(E)$, write $x = r(E) - r(E \setminus S) \ge 1$ and note that $x \le |S|$. Since $|E \setminus S| \ge r(E \setminus S)$, we have $n - r(E) + 1 \ge |S| - x + 1$, and hence
\[
\frac{|S|}{n - r(E) + 1} \le \frac{|S|}{|S| - x + 1} \le x = r(E) - r(E \setminus S) \quad (4.30)
\]
\[
\le r(S), \quad (4.31)
\]
where Inequality (4.30) follows as $(k-1)/(k-x) \le x$ for any $1 \le x \le k-1$ (applied with $k = |S| + 1$). Inequality (4.31) follows by the submodularity of the matroid rank function.
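The elementary inequality used in (4.30), and Claim 4.4 in the special case of the $s$-uniform matroid (where $r(S) = \min\{|S|, s\}$), can be checked numerically; a minimal sketch with hypothetical helper names:

```python
def check_key_inequality():
    # (k-1)/(k-x) <= x for all 1 <= x <= k-1, used in Inequality (4.30);
    # it is equivalent to (x-1)*(k-1-x) >= 0 on that range.
    for k in range(2, 40):
        for j in range(0, 101):
            x = 1.0 + (k - 2) * j / 100.0   # grid over [1, k-1]
            assert (k - 1) / (k - x) <= x + 1e-9
    return True

def check_claim_uniform(n, s):
    # Claim 4.4 for the s-uniform matroid on n elements:
    # |S| / (n - r(E) + 1) <= r(S) = min(|S|, s) for every nonempty S.
    for size in range(1, n + 1):
        assert size / (n - s + 1) <= min(size, s) + 1e-9
    return True
```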

Claim 4.5. Let $y_0$ be the fractional base of matroid $M = (E, \mathcal{I})$ which minimizes the expression $\sum_{e \in E} -\ln\left(y_{e,0} + \frac{1}{e^{\eta\alpha}-1}\right)$. Then for each $e \in E$:
\[
y_{e,0} \ge \frac{1}{\gamma_M}.
\]

Proof. Assume for contradiction that there exists $e \in E$ such that $y_{e,0} < \frac{1}{\gamma_M}$, and let $T$ denote the minimal tight set which contains $e$, with respect to $y_0$. A set $S \subseteq E$ is tight with respect to a fractional solution $y$ if $\sum_{f \in S} y_f = r(S)$, and it is well known that if $S_1$ and $S_2$ are tight, then so are $S_1 \cap S_2$ and $S_1 \cup S_2$. The proof follows immediately from the submodularity of the matroid rank function:
\[
r(S_1 \cap S_2) + r(S_1 \cup S_2) \ge \sum_{f \in S_1 \cap S_2} y_f + \sum_{f \in S_1 \cup S_2} y_f = \sum_{f \in S_1} y_f + \sum_{f \in S_2} y_f = r(S_1) + r(S_2) \ge r(S_1 \cap S_2) + r(S_1 \cup S_2).
\]
The existence of a minimal tight set $T$ is guaranteed by the latter property and the fact that $E$ is tight, by simply taking the intersection of all tight sets which contain $e$. By the definition of the matroid density $\gamma_M$, we have that $y_{e,0} < \frac{1}{\gamma_M} \le \frac{r(T)}{|T|}$. Then, since $\sum_{f \in T} y_{f,0} = r(T)$, there exists $e' \in T$ such that $y_{e',0} > \frac{r(T)}{|T|} > y_{e,0}$. Therefore, we can increase $y_{e,0}$ by a small amount $\varepsilon$ and decrease $y_{e',0}$ by a small amount $\varepsilon$, and remain in the matroid base polytope. This is true as all of the matroid constraints remain satisfied: for any $T'$ such that $e' \in T'$ and $e \notin T'$, the value of $\sum_{f \in T'} y_{f,0}$ can only decrease, implying that its corresponding matroid constraint remains satisfied. For any $T'$ such that $e, e' \in T'$, the value of $\sum_{f \in T'} y_{f,0}$ remains unchanged, implying that the corresponding matroid constraint is satisfied. Finally, for any $T'$ such that $e \in T'$ and $e' \notin T'$, the set $T'$ is not tight (otherwise $T' \cap T \subsetneq T$ would be tight, in contradiction to the minimality of $T$), so after increasing the value of $y_{e,0}$ by a sufficiently small $\varepsilon$, the corresponding matroid constraint remains satisfied.
We note that the function $-\ln\left(x + \frac{1}{e^{\eta\alpha}-1}\right)$ is strictly convex in $x$ when $x > 0$, and thus by choosing $\varepsilon \le \frac{1}{2}\left(y_{e',0} - y_{e,0}\right)$ and increasing $y_{e,0}$ and decreasing $y_{e',0}$ accordingly, we obtain a new feasible solution for which the value of $\sum_{e \in E} -\ln\left(y_{e,0} + \frac{1}{e^{\eta\alpha}-1}\right)$ strictly decreases. This contradicts the fact that $y_0$ minimizes the latter expression.

Claim 4.6. For any matroid $M = (E, \mathcal{I})$ and element $e \in E$, the Lagrangian dual variables satisfy $a_t - \sum_{S:\, e \in S} a_{S,t} \ge 0$.

Proof. Every $y \in B(M)$ satisfies $\sum_{e \in E} y_e = r(E)$. Therefore, we can replace each matroid packing constraint in $B(M)$ with a covering constraint to obtain the following alternative

formulation of $B(M)$:
\[
\sum_{e \in E} y_e = r(E), \quad (4.32)
\]
\[
\forall S \subseteq E: \quad \sum_{e \in S} y_e \ge r(E) - r(E \setminus S). \quad (4.33)
\]
Observe that for $S = E$, Constraint (4.33) requires $\sum_{e \in E} y_e \ge r(E)$, while Constraint (4.32) requires equality. Next, we prove that when minimizing the objective function (4.7) in Algorithm 4.3 over all $x$ satisfying (4.32)–(4.33), Constraint (4.32) is redundant. That is, minimizing (4.7) subject to Constraints (4.33) alone gives a solution $y$ which satisfies Equation (4.32), and since (4.7) is strictly convex, $y$ is unique and equals the solution obtained in Algorithm 4.3. As a result, we can view the optimization problem in each iteration $t \ge 1$ as convex minimization subject solely to covering constraints, and by defining a Lagrangian variable $\beta_{S,t}$ for each covering constraint we derive the following KKT optimality conditions:
\[
\forall S \subseteq E: \quad \sum_{e \in S} y_{e,t} - \left( r(E) - r(E \setminus S) \right) \ge 0, \quad (4.34)
\]
\[
\forall S \subseteq E: \quad \beta_{S,t} \ge 0, \quad (4.35)
\]
\[
\forall S \subseteq E: \quad \beta_{S,t} \left( \sum_{e \in S} y_{e,t} - r(E) + r(E \setminus S) \right) = 0, \quad (4.36)
\]
\[
\forall e \in E: \quad c_{e,t} + \frac{1}{\eta} \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} - \sum_{S:\, e \in S} \beta_{S,t} = 0. \quad (4.37)
\]
Hence, combining Conditions (4.27) and (4.35), (4.37), we conclude that for any $e \in E$,
\[
a_t - \sum_{S:\, e \in S} a_{S,t} = \sum_{S:\, e \in S} \beta_{S,t} \ge 0.
\]
To prove property (4.32), we denote by $S^* \subseteq E$ the subset of all tight elements with respect to $y_t$; that is, $e \in S^*$ if there exists $S \subseteq E$ with $e \in S$ for which Constraint (4.33) holds with equality. Since the matroid rank function is submodular, the function $\hat{r}(T) = r(E) - r(E \setminus T)$ is supermodular. Hence, similarly to the statement in the proof of Claim 4.5, it is easy to see that if two subsets $T_1$ and $T_2$ are tight, then so are $T_1 \cap T_2$ and $T_1 \cup T_2$; in particular, $S^*$ is tight, implying $\sum_{e \in S^*} y_{e,t} = r(E) - r(E \setminus S^*)$. Furthermore, by the minimality of $y_t$, we note that $y_{e,t} \le y_{e,t-1}$ for any $e \notin S^*$. This is true since otherwise, as $e$ is not tight, we could decrease $y_{e,t}$ by a sufficiently small amount without violating any covering constraint, and consequently decrease the objective value, since the expression $c_{e,t} x_e + \frac{1}{\eta}\left[\left(x_e + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{x_e + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} - x_e\right]$ is

strictly increasing in $x_e$ for any $x_e > y_{e,t-1}$. Therefore we get,
\[
\sum_{e \in E} y_{e,t} = \sum_{e \in S^*} y_{e,t} + \sum_{e \in E \setminus S^*} y_{e,t}
= r(E) - r(E \setminus S^*) + \sum_{e \in E \setminus S^*} y_{e,t}
\le r(E) - r(E \setminus S^*) + \sum_{e \in E \setminus S^*} y_{e,t-1}
\le r(E) - r(E \setminus S^*) + r(E \setminus S^*) = r(E),
\]
as required (here $S^*$ is the set of all tight elements with respect to $y_t$).

We now turn to derive Theorem 4.3. As in the case of Theorem 4.2, we focus on proving Inequalities (4.8) and (4.9). We first construct a dual solution to the offline problem (D) using the values that are obtained from the KKT optimality conditions. We then show that the primal and dual solutions obtained are feasible, and finally we prove that the dual we constructed can pay for both the movement and the service cost of the online algorithm. To construct the dual, we simply assign the values obtained from the optimality conditions to $a_t, a_{S,t}$ in (D), and we define
\[
b_{e,t+1} = \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}.
\]

Primal (P) is feasible: By construction, the algorithm's initialization assigns a feasible $y_0$, and $y_t$ is feasible by Conditions (4.23), (4.24). Setting $z_{e,t} = \max\{0, y_{e,t} - y_{e,t-1}\}$ yields a feasible primal solution.

Dual (D) is feasible: By Claim 4.5, initially $y_{e,0} \ge \frac{1}{\gamma_M}$, implying
\[
b_{e,1} \le \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{\frac{1}{\gamma_M} + \frac{1}{e^{\eta\alpha}-1}} = \alpha - \frac{1}{\eta} \ln \frac{e^{\eta\alpha} + \gamma_M - 1}{\gamma_M}.
\]
Therefore, we can set $a_{E,0} = \alpha - \frac{1}{\eta} \ln \frac{e^{\eta\alpha} + \gamma_M - 1}{\gamma_M}$, and $a_0 = a_{S,0} = 0$ for all $S \ne E$, and we get that the first dual constraint is satisfied. Also, by the definition of the dual variables, we note that for any $t \ge 1$ and $e$,
\[
b_{e,t+1} - b_{e,t} = -\frac{1}{\eta} \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} = c_{e,t} - a_t + \sum_{S:\, e \in S} a_{S,t},
\]
where the last equality follows from (4.27). Finally, $0 \le b_{e,t+1} = \frac{1}{\eta} \ln \frac{1 + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}} \le \alpha$, which follows since $0 \le y_{e,t} \le 1$, and $a_t, a_{S,t} \ge 0$ by Condition (4.25).

Primal-dual relation: Let $D_t$ be the change in the cost of the dual solution at time $t$. We bound the cost of the algorithm in each iteration by the change in the cost of the dual.

Bounding the movement cost at time $t$: Let $M_t$ be the movement cost at time $t$. As we charge our algorithm only for increasing the fractional value of the elements, we get,
\[
M_t = \sum_{e:\, y_{e,t} > y_{e,t-1}} (y_{e,t} - y_{e,t-1})
\le \sum_{e:\, y_{e,t} > y_{e,t-1}} \left(y_{e,t} + \frac{1}{e^{\eta\alpha}-1}\right) \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} \quad (4.38)
\]
\[
= \eta \sum_{e:\, y_{e,t} > y_{e,t-1}} \left(y_{e,t} + \frac{1}{e^{\eta\alpha}-1}\right) \left( a_t - \sum_{S:\, e \in S} a_{S,t} - c_{e,t} \right) \quad (4.39)
\]
\[
\le \eta \sum_{e \in E} y_{e,t} \left( a_t - \sum_{S:\, e \in S} a_{S,t} \right) + \frac{\eta}{e^{\eta\alpha}-1} \sum_{e \in E} \left( a_t - \sum_{S:\, e \in S} a_{S,t} \right) \quad (4.40)
\]
\[
= \eta \left( r(E)\, a_t - \sum_{S \subseteq E} r(S)\, a_{S,t} \right) + \frac{\eta}{e^{\eta\alpha}-1} \left( n\, a_t - \sum_{S \subseteq E} |S|\, a_{S,t} \right) \quad (4.41)
\]
\[
\le \eta \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) D_t. \quad (4.42)
\]
Inequality (4.38) follows as $a - b \le a \ln(a/b)$ for any $a, b > 0$. Equality (4.39) follows from Condition (4.27). Inequality (4.40) follows as $y_{e,t}$ and $c_{e,t}$ are nonnegative, and by Claim 4.6. Equality (4.41) follows from Conditions (4.23) and (4.26). Inequality (4.42) follows since
\[
n\, a_t - \sum_{S \subseteq E} |S|\, a_{S,t} \le (n - r(E) + 1) \left( r(E)\, a_t - \sum_{S \subseteq E} r(S)\, a_{S,t} \right),
\]
which follows as,
\[
\frac{n\, a_t - \sum_{S \subseteq E} |S|\, a_{S,t}}{n - r(E) + 1}
= \frac{n}{n - r(E) + 1} \left( a_t - \sum_{S \subseteq E} a_{S,t} \right) + \sum_{S \subseteq E} \frac{|E \setminus S|}{n - r(E) + 1}\, a_{S,t}
\]
\[
\le r(E) \left( a_t - \sum_{S \subseteq E} a_{S,t} \right) + \sum_{S \subseteq E} \left( r(E) - r(S) \right) a_{S,t} = r(E)\, a_t - \sum_{S \subseteq E} r(S)\, a_{S,t}, \quad (4.43)
\]
where Inequality (4.43) is implied by Claim 4.4 and Claim 4.6. Thus,
\[
M \le \sum_{t=1}^{T} \eta \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) D_t
= \eta \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) \left( D - r(E)\, a_0 + \sum_{S \subseteq E} r(S)\, a_{S,0} \right),
\]

and by simplifying we get,
\[
M \le \eta \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) D + \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) r(E) \left( \eta\alpha - \ln \frac{e^{\eta\alpha} + \gamma_M - 1}{\gamma_M} \right)
\]
\[
= \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) \left( \eta D + r(E) \ln \frac{\gamma_M\, e^{\eta\alpha}}{e^{\eta\alpha} + \gamma_M - 1} \right)
\le \left( 1 + \frac{n - r(E) + 1}{e^{\eta\alpha}-1} \right) \left( \eta D + r(E) \ln \gamma_M \right).
\]

Bounding the service cost: Finally, we wish to bound the service cost $S_1$ paid by the algorithm. First,
\[
S_1 = \sum_{t=1}^{T} \langle y_t, c_t \rangle = \sum_{t=1}^{T} \sum_{e \in E} c_{e,t}\, y_{e,t}
= \sum_{t=1}^{T} \sum_{e \in E} y_{e,t} \left( a_t - \sum_{S:\, e \in S} a_{S,t} \right) - \frac{1}{\eta} \sum_{t=1}^{T} \sum_{e \in E} y_{e,t} \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}},
\]
where the last equality follows from Condition (4.27). Next, by Conditions (4.23) and (4.26) we have,
\[
S_1 = \sum_{t=1}^{T} D_t - \frac{1}{\eta} \sum_{t=1}^{T} \sum_{e \in E} \left( y_{e,t} + \frac{1}{e^{\eta\alpha}-1} \right) \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}} + \frac{1}{\eta(e^{\eta\alpha}-1)} \sum_{e \in E} \sum_{t=1}^{T} \ln \frac{y_{e,t} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,t-1} + \frac{1}{e^{\eta\alpha}-1}},
\]
and using the fact that $a - b \le a \ln(a/b)$ for any $a, b \ge 0$, and telescopic summation, we get,
\[
S_1 \le \sum_{t=1}^{T} D_t - \frac{1}{\eta} \sum_{e \in E} \left( y_{e,T} - y_{e,0} \right) + \frac{1}{\eta(e^{\eta\alpha}-1)} \sum_{e \in E} \ln \frac{y_{e,T} + \frac{1}{e^{\eta\alpha}-1}}{y_{e,0} + \frac{1}{e^{\eta\alpha}-1}}. \quad (4.44)
\]
Simplifying, we get,
\[
S_1 \le \sum_{t=1}^{T} D_t = D - r(E)\, a_0 + \sum_{S \subseteq E} r(S)\, a_{S,0} = D + r(E) \left( \alpha - \frac{1}{\eta} \ln \frac{e^{\eta\alpha} + \gamma_M - 1}{\gamma_M} \right) \quad (4.45)
\]
\[
\le D + \frac{r(E) \ln \gamma_M}{\eta}.
\]

Inequality (4.44) follows by telescopic summation. Inequality (4.45) follows since $\sum_{e \in E} y_{e,0} = \sum_{e \in E} y_{e,T} = r(E)$, and by the definition of $y_0$ as the minimizer of the expression $\sum_{e \in E} -\ln\left(x_e + \frac{1}{e^{\eta\alpha}-1}\right)$ over all fractional bases of $M$. Since $D \le \mathrm{opt}(\alpha)$, Inequalities (4.8) and (4.9) in Theorem 4.3 follow.
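For the special case of the $s$-uniform matroid (the $s$-sparse subsets of Section 4.3), whose base polytope is simply $\{y : 0 \le y_e \le 1,\ \sum_e y_e = s\}$, the regularized step (4.7) reduces to a clipped shifted multiplicative update. The sketch below uses hypothetical names, and assumes the clipped closed form solves (4.7) for this polytope (it follows from the KKT conditions of a separable strictly convex objective over a box plus one sum constraint); the normalizer is found by binary search, and the iterate provably stays in the base polytope:

```python
import math

def uniform_matroid_step(y_prev, c, s, eta, alpha):
    """One regularized step over the s-uniform matroid base polytope:
    y_e = clip((y_prev_e + delta) * exp(-eta * (c_e - a)) - delta, 0, 1),
    with a chosen so that sum_e y_e = s."""
    delta = 1.0 / (math.exp(eta * alpha) - 1.0)
    w = [(y + delta) * math.exp(-eta * ci) for y, ci in zip(y_prev, c)]

    def total(a):
        f = math.exp(eta * a)
        return sum(min(1.0, max(0.0, wi * f - delta)) for wi in w)

    lo, hi = 0.0, 1.0
    while total(hi) < s:        # total(a) is nondecreasing, and -> n >= s
        hi *= 2.0
    for _ in range(200):        # binary search for the normalizer
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < s else (lo, mid)
    f = math.exp(eta * 0.5 * (lo + hi))
    return [min(1.0, max(0.0, wi * f - delta)) for wi in w]
```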


Chapter 5

Restricted Caching and Matroid Caching

We study the online restricted caching problem, where each memory item can be placed in only a restricted subset of cache locations. We solve this problem through a more general online caching problem in which the cache is subject to matroid constraints. Our main result is an $O(\min\{d, \log r\} \cdot \log c)$-competitive algorithm for the matroid caching problem, where $r$ and $c$ are the rank and circumference of the matroid, and $d$ is the diameter of an auxiliary graph defined over it. In general, this result guarantees $O(\log^2 k)$-competitiveness for any restricted cache of size $k$, independently of its structure. In addition, we study the special case of the $(n, \ell)$-companion caching problem [BETW01]. For companion caching we prove that our algorithm achieves an optimal competitive factor of $O(\log n + \log \ell)$, improving on previous results of [FMS02].

5.1 Introduction

Caches are key components in modern computer and networking architectures. Designing efficient caching (or paging) policies is a fundamental online optimization problem with multiple applications. Take, for example, the classical two-level memory system, consisting of a slow memory of infinite size and a small fast memory (the cache). The input is a sequence of page requests which are satisfied one by one. If a page $p$ being requested is already in the cache, then no action is required and no cost is incurred. Otherwise, page $p$ must be brought from the slow memory to the cache (a page fault), incurring some fetching cost, and possibly requiring the eviction of another page from the cache. The objective is to minimize the total fetching cost by wisely choosing which pages to evict.

In recent years significant progress has taken place in the areas of parallel and distributed computing, as well as in local and web storage, giving rise to new, more complex, cache architectures. One example is a multi-core processor, which has become the dominant processor

architecture today. In a multi-core processor, every core has a private fast cache, and in addition all the cores share a fully associative cache. Despite extensive research on paging problems, algorithms and performance guarantees for common real-life models such as multi-core processors are not yet fully understood. Web storage via content distribution networks provides another example of a non-traditional cache architecture. A content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers over the web. The goal is to provide end-users fast and easy access to content from various devices and locations. Quite a few public companies offer such services nowadays, e.g., Akamai, Microsoft's Azure, and Amazon's CloudFront. The cache in this context can be a set of servers (e.g., in a data center) that maintains content and provides fast service to a set of end-users. Typically, placing content on this set of servers needs to adhere to various restrictions and constraints; e.g., not every piece of content can be placed on every server. Suppose an online company wishes to hold private, possibly encrypted, content via a CDN. It may be the case that, due to digital rights management or encryption schemes, content belonging to a particular company can only be located on certain dedicated servers which, naturally, have limited capacity. Content which is not placed on these servers will be fetched upon request from a distance, incurring a cache miss. Given that massive amounts of information are involved, management of content on the cache is of utmost importance for the end-user experience. We model this kind of constraint on cache architectures via the restricted caching model, defined by Brehob et al. [BETW01]. The idea is that pages (i.e., content) can only be placed in a restricted set of cache locations (i.e., servers).
Thus, the sets of legal cache locations for two distinct pages may not be identical, though they may have a non-empty intersection. This is in contrast to traditional fully associative caches, where all cache locations are identical and pages can be located anywhere in the cache. Restricted caches are sometimes referred to as having arbitrary associativity. As shown by [Pes03], various algorithms for fully associative caching can result in very poor performance in some settings with arbitrary associativity. One can hope to design general algorithms that can cope with this extended family of cache architectures. A hybrid cache architecture model that interests us in particular is the companion cache, a simple restricted caching model which includes victim caches and assist caches as special cases. A companion cache architecture has two components: a fully associative shared cache of size n and m private caches of size l. The private caches can store items corresponding to different types (e.g., users, locations, file types, etc.), while the fully associative cache can store items of any type. Companion caching was first considered by [BETW01] and further studied by [FMS02, EvS07]. A schematic description of a companion cache is presented in Figure 5.1.

5.1.1 Our Results

The starting point of our work is the key observation that the restricted caching problem is captured by the more general problem of maintaining over time an independent set of a matroid,

[Figure 5.1: An (n, l)-companion cache: a shared n-size fully associative cache and multiple l-size private caches, one per type (type 1, ..., type m).]

with respect to an online sequence of element requests. That is, at any point of time the online algorithm has to maintain an independent set of elements which includes the currently requested element. We call this problem the matroid caching problem (a formal definition of the problem is given in Section 5.2). Surprisingly, although the generalization itself is fairly simple, exploiting matroid properties turns out to be more convenient and powerful, and enables us to both improve and generalize previous results. We introduce a general randomized algorithm for the online matroid caching problem on a matroid M, consisting of two components. The first component is a fractional online O(log c(M))-competitive algorithm, where c(M) is the circumference (largest circuit) of M. The online algorithm and its analysis exploit the matroid properties, and obtain this improved upper bound based on primal-dual LP techniques developed in competitive analysis. The second component is a randomized rounding scheme which integrates two online rounding algorithms. The first algorithm maintains for every fractional solution a distribution on integral matroid bases (see, e.g., [BBN12a]). The main difficulty is to wisely update this distribution after every change in the fractional solution. These updates are done by reducing the problem to finding shortest paths in an auxiliary graph defined on the matroid. This auxiliary graph was first introduced by Cunningham [Cun84] for the purpose of determining in strongly polynomial time whether a point is inside a matroid polyhedron. By using the fact that we maintain a feasible fractional solution, and by proving additional properties of the auxiliary graph, we obtain a rounding algorithm which loses a factor of d_G(M), the diameter of the auxiliary graph.
The second algorithm is an O(log r(M))-approximate rounding scheme, recently obtained by [GTW14]. The idea behind this algorithm is to maintain spanning sets of M in every iteration, and then transform them into bases without incurring any additional loss. Combining the two components we get our main theorem. To our knowledge, this is the first randomized competitive algorithm for general restricted caching.

Theorem 5.1. There is an O(min{d_G(M), log r(M)}·log c(M))-competitive algorithm for matroid caching, where r(M) and c(M) are the rank and circumference of matroid M, and d_G(M) is the diameter of an auxiliary graph of M.

We remark that c(M) ≤ r(M) + 1, and thus Theorem 5.1 guarantees an O(log² k)-approximation for any restricted cache of size k, independently of its structure. Nevertheless, in many cases c(M) can be much smaller. For example, in graphic matroids the longest cycle in a graph can be much smaller than the total number of vertices. As for the diameter of the auxiliary graph, we show an example for which d_G(M) = Ω(r(M)). This is essentially a worst-case example, since we prove that d_G(M) ≤ r(M) + 1. However, in some interesting cases of restricted caching the diameter can be much smaller, even a constant. Specifically, for companion caching we show:

Theorem 5.2. For the companion caching matroid, c(M) = min{m, n + 1}·l + n + 1 and d_G(M) = 3. Thus, our algorithm is O(log n + log l)-competitive.

This result improves on the randomized upper bound of O(log n · log l) of [FMS02], and is optimal as it matches their lower bound of Ω(log n + log l). In Chapter 4, under a unified framework of online learning and competitive analysis, for a matroid M = (E, I) we considered the online problem of maintaining matroid bases over time, incurring both movement and service costs, and obtained an O(log(|E|/r(M)))-approximation via regularization and primal-dual analysis. In this chapter we exploit the ideas presented in Chapter 4; however, our approach differs in several respects. First, Algorithm 4.3 does not handle constraints such as forcing elements to be included in the matroid base, as in caching;¹ second, there we only obtained a fractional solution to the problem; third, instead of applying regularization, we present an alternative primal-dual algorithm which performs multiplicative updates. These modifications allow us to handle element requests and also improve on the analysis of the previous chapter.

Related Work: Caching over a fully associative cache, known as the paging problem, was introduced by [Bel66].
Sleator and Tarjan [ST85] showed that any deterministic algorithm is at least k-competitive, and also proved that LRU is exactly k-competitive. When randomization is allowed, [FKL+91] gave a tight O(log k)-competitive algorithm for this well-studied problem (later improved by [MS91, ACN00]). Restricted caching, defined by [BETW01], is a generalization of the paging problem to cache architectures which cannot be modeled as set-associative caches. There are very few results on this generalized setting (see [Pes03] for example); however, some specific restricted cache architectures have been studied. In particular, [BETW01] introduced the companion caching problem, further studied by [FMS02, Pes03, BEWT04, EvS07]. Primal-dual analysis is a fundamental approach for tackling online optimization problems [AAA+09, BN09a, BBN10], and has specifically been used for the design of caching algorithms [BBN12a, BBN12b, ACER12]. Recently, Gupta et al. [GTW14] also studied the problem of maintaining matroid bases online, giving an O(log |E|)-competitive fractional algorithm using a primal-dual approach. Similarly to Chapter 4, their algorithm cannot handle constraints such as forcing elements to be included in the matroid base. Interestingly, [GTW14] gives an O(log r(M))-approximation for rounding a fractional solution online. They further generalize the rounding to the weighted version and show that their result is tight. We note that their hardness result does not hold for the unweighted version, and specifically not for some interesting special cases, e.g., uniform matroids, partition matroids, and some restricted cache models.

¹Note that forcing inclusion via, e.g., a series of steps that incur small costs on other pages until the element is completely in the base, changes the objective function and therefore cannot be performed in that manner.

5.2 Definitions and Problem Formulation

In the restricted caching problem we are given a set E of n pages and a set S of available memory slots. Every page p can be located in some subset of S. In every time step t we are given a request for a page p_t. If the page is already in the cache, no cost is incurred. Otherwise, the page must be fetched from a slower memory to one of its feasible slots, incurring a cost of one unit. If the slot is not empty, the algorithm can reorganize the cache for free by either moving pages around between feasible slots or by evicting them. We assume that reorganization is free, since the time required for fetching a page from slower memory dominates by several orders of magnitude the time for moving pages inside the cache [BETW01, FMS02]. Thus, the goal is to minimize the total cost, i.e., the number of pages that are fetched. Companion caching [FMS02] is a special case of restricted caching which is a hybrid of two classic cache architectures: an l-way set-associative cache and a fully associative cache. There are m caches of size l and a single cache of size n. There are m types of pages, where a page of type i can be stored either in the i-th associative cache of size l, or in the fully associative cache of size n. We solve the restricted caching problem through a more general problem on matroids.
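As a concrete illustration of the companion-cache structure just described, the following small Python sketch (the helper name and slot labels are ours, purely illustrative) builds, for each page type, its set of legal cache slots:

```python
def companion_cache_slots(m, l, n):
    """Slot sets of an (n, l)-companion cache with m page types.

    Slots 'p{i}_{j}' are the l slots of the i-th private cache;
    slots 's{j}' are the n slots of the shared fully associative cache.
    Returns a dict mapping page type -> set of legal slot names.
    """
    shared = {f"s{j}" for j in range(n)}
    return {i: {f"p{i}_{j}" for j in range(l)} | shared for i in range(m)}

slots = companion_cache_slots(m=3, l=2, n=4)
# A page of type 0 may go to its own private cache or to the shared cache:
assert len(slots[0]) == 2 + 4
# Private slots of distinct types are disjoint; only the shared slots are common:
assert slots[0] & slots[1] == {f"s{j}" for j in range(4)}
```

Each type thus sees its l private slots plus the n shared slots, and any two distinct types overlap exactly on the shared cache, which is what makes companion caching a restricted (rather than fully associative) cache.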
As mentioned in Chapter 4, matroids are extremely useful combinatorial objects that play an important role in combinatorial optimization. We refer the reader to Section 2.2 for a brief description of the important matroid properties. Recall that a matroid M = (E, I) is defined over a finite set E of elements, together with a non-empty collection I of subsets of E, called independent sets. Three polytopes associated with a matroid M are the matroid polytope P(M), the matroid base polytope B(M), and the spanning set polytope P_ss(M) (see [Edm70, Sch03]). P(M) is the convex hull of the incidence vectors of the independent sets of M. Similarly, B(M) is the convex hull of the incidence vectors of the bases of M, and P_ss(M) is the convex hull of the incidence vectors of the spanning sets of M. Transversal matroids are one interesting example of matroids. Let A = (A_1, A_2, ..., A_n) be a family of subsets of a finite set E. A set T is called a transversal of A if there exist distinct elements a_1 ∈ A_1, a_2 ∈ A_2, ..., a_n ∈ A_n such that T = {a_1, a_2, ..., a_n}. A partial transversal is a transversal of some subfamily A_{i_1}, A_{i_2}, ..., A_{i_k} of A. Let I be the collection of all partial transversals of A. Then M = (E, I) is a matroid. The bases of this matroid are the inclusion-wise maximal partial transversals, and it follows from König's matching theorem that the rank function

(P)  min  Σ_{t=1}^{T} y_{p_t,t} + Σ_{t=1}^{T} Σ_{p∈E} z_{p,t}
     s.t. Σ_{p∈S} y_{p,t} ≥ |S| − r(S)          ∀ t ≥ 1, S ⊆ E
          z_{p,t} ≥ y_{p,t−1} − y_{p,t}          ∀ t ≥ 1, p ∈ E
          z_{p,t}, y_{p,t} ≥ 0                   ∀ t, p ∈ E

(D)  max  Σ_{t=0}^{T} Σ_{S⊆E} (|S| − r(S))·a_{S,t}
     s.t. b_{p,t+1} ≥ b_{p,t} + Σ_{S: p∈S} a_{S,t}   ∀ t ≥ 1, p ≠ p_t
          b_{p,1} ≥ Σ_{S: p∈S} a_{S,0}               ∀ p ∈ E
          b_{p,t} ≤ 1                                 ∀ t, p ∈ E
          b_{p,t}, a_{S,t} ≥ 0                        ∀ t, p ∈ E, S ⊆ E

Figure 5.2: The primal and dual LP formulations for the matroid caching problem.

r of the transversal matroid induced by A is given by r(S) = min_{T⊆S} ( |S ∖ T| + |{i : A_i ∩ T ≠ ∅}| ) for S ⊆ E. It is not hard to see that we can model restricted caching using a transversal matroid. For every page p ∈ E there is a subset of cache slots to which it can be assigned. Equivalently, for every cache slot s_i, let P(s_i) denote the subset of pages in E that can be assigned to it. Let P = (P(s_1), P(s_2), ..., P(s_k)) be the page subsets for all cache slots. Then, every subset of pages T inducing a valid assignment of pages to the cache is a partial transversal of P. Let I be the collection of all valid cache assignments. Then M = (E, I) is a matroid. We thus define a general caching problem on any matroid M = (E, I). In the matroid caching problem the online algorithm must maintain at any time t an independent set S_t ∈ I (the cache) of the matroid. At any time t, upon receiving a request for an element of the matroid (page) p_t: if p_t ∈ S_{t−1}, no cost is incurred; otherwise, p_t must be added to S_{t−1}, paying one unit. If S_{t−1} ∪ {p_t} is dependent, elements must be removed (evicted) so that S_t becomes independent. We can formulate the matroid caching problem as follows. Let the variable y_{p,t} denote the fraction of page p ∈ E missing from the cache at time t. This means that at any time t, 1 − y_{p_t,t} = 1 and 1 − y_t ∈ P(M). The fetching cost of page p is max{0, y_{p,t−1} − y_{p,t}}. Figure 5.2 contains the linear relaxation of this formulation (P), as well as the corresponding dual program (D). Both programs play a central role in our analysis. We define D as the value of the dual program, which is a lower bound on the value of any primal solution.
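Since the cache matroid is transversal, its rank function can be evaluated by bipartite matching. The following self-contained sketch (illustrative only; it is not part of the thesis algorithms) computes r(S) by augmenting paths and brute-force checks it against the König-type expression min_{T⊆S} ( |S ∖ T| + |{i : A_i ∩ T ≠ ∅}| ) stated above:

```python
from itertools import chain, combinations

def rank(pages, allowed):
    """Rank of a page set in the transversal matroid: the size of a maximum
    page-to-slot matching, computed by augmenting paths."""
    match = {}  # slot -> page currently matched to it
    def augment(p, seen):
        for s in allowed[p]:
            if s not in seen:
                seen.add(s)
                if s not in match or augment(match[s], seen):
                    match[s] = p
                    return True
        return False
    return sum(augment(p, set()) for p in pages)

def formula(pages, allowed):
    """r(S) = min over T <= S of |S \\ T| + #{slots whose page set meets T}."""
    subsets = chain.from_iterable(combinations(pages, k)
                                  for k in range(len(pages) + 1))
    return min(len(pages) - len(T) + len(set().union(*(allowed[p] for p in T)))
               for T in subsets)

# Tiny restricted cache: pages a, b, c; slots 1, 2; a and b fit only in slot 1.
allowed = {"a": {1}, "b": {1}, "c": {1, 2}}
S = ["a", "b", "c"]
assert rank(S, allowed) == 2       # {a, b, c} is dependent: a and b collide
assert formula(S, allowed) == 2    # the min-formula agrees with the matching
```

Here the minimizing T is {a, b}: |S ∖ T| = 1 and only slot 1 meets T, matching the rank of 2.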
5.3 Main Algorithm

In this section we solve the fractional version of the matroid caching problem, proving the following theorem.

Theorem 5.3. There is an online algorithm with competitive ratio 2 log(1 + c(M)) for the fractional matroid caching problem on a matroid M, where c(M) is the circumference of M.

At a high level, the algorithm works as follows. Without loss of generality we assume that the cache is initially full (this can be achieved by requesting a sequence of r(M) independent dummy pages before the actual input sequence). The algorithm maintains a

Algorithm 5.1 Matroid Caching Algorithm
Initialize η = log 2. During execution, maintain the following relation between primal and dual variables: y_{p,t} = f(b_{p,t+1}) = (e^{η·b_{p,t+1}} − 1)/(e^η − 1).
Start with an empty cache (y_{p,0} = 1), and set b_{p,1} = 1 accordingly.
for t = 1, 2, ... do
  Let p_t be the currently requested page.
  Update step: Set y_{p_t,t} = b_{p_t,t+1} = 0.
  Normalization step: As long as Σ_{p∈E} y_{p,t} < |E| − r(E):
    1. Let S be the set of evictable pages.
    2. Update η ← max{ η, log(1 + |S|/(|S| − r(S))) }, and update b_{p,t+1} to keep y_{p,t} unchanged.
    3. For each p ∈ S, aside from p_t, update b_{p,t+1} ← b_{p,t+1} + a_{S,t} and y_{p,t} accordingly, where a_{S,t} is the smallest value such that some page p ∈ S becomes unevictable.
end for

solution y_t such that 1 − y_t is in the base polytope of M. Whenever a page p_t is requested, the algorithm updates y_{p_t,t} = 0. This generates a solution y_t whose complement is in the spanning set polytope of M, i.e., 1 − y_t ∈ P_ss(M). Next, the algorithm performs a sequence of steps which gradually evict other pages from the cache, making it feasible again. We refer to each such step as a normalization step. In each normalization step we consider the set S of all evictable pages. A page is evictable if we can increase y_{p,t} by an infinitesimally small value ε and remain in P_ss(M). It is well known that x ∈ P(M) iff 1 − x ∈ P_ss(M*), where M* is the dual matroid of M. Thus, we can use the latter condition to compute S efficiently. Let us consider the maximal dual tight set with respect to our current solution. A set T ⊆ E is tight if Σ_{p∈T} y_{p,t} = r*(T), and using the submodularity of r*, if T_1 and T_2 are tight, then so are T_1 ∪ T_2 and T_1 ∩ T_2. In particular, there is a maximal tight set T_max containing all pages whose value y_{p,t} cannot be increased without violating the dual matroid constraints. Therefore, in each normalization step we define the evictable set S as all pages which are not in a dual tight set, S = E ∖ T_max, and increase their value until an additional page joins a tight set.
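The primal-dual relation f used by Algorithm 5.1 can be sanity-checked numerically. The sketch below (not part of the algorithm itself, just a check of the properties the analysis relies on) verifies that f(b) = (e^{ηb} − 1)/(e^η − 1) maps b = 0 to y = 0 and b = 1 to y = 1, is increasing in b, and is decreasing in η for fixed b ∈ (0, 1) — the last fact is what makes the η-increase in line 2 raise b_{p,t+1}:

```python
import math

def f(b, eta):
    """Primal-dual relation of Algorithm 5.1: y = (e^(eta*b) - 1) / (e^eta - 1)."""
    return (math.exp(eta * b) - 1.0) / (math.exp(eta) - 1.0)

eta = math.log(2)  # the initial value used by the algorithm
assert f(0.0, eta) == 0.0 and abs(f(1.0, eta) - 1.0) < 1e-12

# f is monotonically increasing in b ...
bs = [i / 100 for i in range(101)]
assert all(f(b1, eta) < f(b2, eta) for b1, b2 in zip(bs, bs[1:]))

# ... and monotonically decreasing in eta for fixed b in (0, 1), so raising
# eta while keeping y fixed forces b upward, preserving dual feasibility.
assert f(0.5, eta) > f(0.5, eta + 1.0) > f(0.5, eta + 2.0)
```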
In general, the above condition can be checked in polynomial time by a reduction to submodular function minimization [Sch03, Ch. 40]. For transversal matroids, S can be computed using flow techniques. The sequence of normalization steps ends when all pages become tight. Algorithm 5.1 formally describes the procedure. We remark that the algorithm can easily be generalized to the weighted version, where fetching page p incurs a cost of w_p, by maintaining the following primal-dual relationship between the variables: y_{p,t} = f(b_{p,t+1}) = (e^{η·b_{p,t+1}/w_p} − 1)/(e^η − 1). We prove Theorem 5.3 using the following technical lemma on matroids.

Lemma 5.3.1. c(M) = max_{nonseparable A ⊆ E} |A| / (|A| − r(A)).

To prove the lemma, we rely on the following arithmetic property of mediants [Gut11].

Lemma 5.3.2 (generalized mediant inequality, [Ben13]). For any nonnegative numbers a_1, a_2, ..., a_n and positive numbers b_1, b_2, ..., b_n,

 min_{1≤i≤n} a_i/b_i ≤ (a_1 + a_2 + ... + a_n)/(b_1 + b_2 + ... + b_n) ≤ max_{1≤i≤n} a_i/b_i,

with equality only if a_i/b_i is constant.

Proof of Lemma 5.3.1. Let A be a minimum-cardinality set among the nonseparable sets achieving max_{nonseparable A ⊆ E} |A|/(|A| − r(A)). In addition to the fact that A is nonseparable, we note that A is a circuit. To see this, we represent A as a union of circuits: let C_0 be a largest circuit in A and let p_0 be a page in it. If A ∖ C_0 is not empty, choose a page p_1 ∈ A ∖ C_0. Since A is nonseparable, there is a circuit containing any two pages in A; let C_1 denote a circuit containing p_0, p_1. If A ∖ (C_0 ∪ C_1) is not empty, choose a page p_2 ∈ A ∖ (C_0 ∪ C_1), and let C_2 denote a circuit containing p_0, p_2. We continue with this process until we obtain A = C_0 ∪ C_1 ∪ C_2 ∪ ... ∪ C_j. We observe the following two properties for every 1 ≤ i ≤ j:

1. ∪_{l=0}^{i} C_l is nonseparable, since any two pages in it share a circuit (each of them shares a circuit with p_0).
2. |C_i ∖ ∪_{l=0}^{i−1} C_l| < |C_0|, since otherwise |C_i| > |C_0|, contradicting the maximality of C_0 in A.

Then, using the above properties, for every 1 ≤ i ≤ j we have

 |C_i ∖ ∪_{l=0}^{i−1} C_l| / ( r(∪_{l=0}^{i} C_l) − r(∪_{l=0}^{i−1} C_l) )
  ≥ |C_i ∖ ∪_{l=0}^{i−1} C_l| / ( |C_i ∖ ∪_{l=0}^{i−1} C_l| − 1 )      (5.1)
  ≥ |C_0| / (|C_0| − 1) = |C_0| / r(C_0).                              (5.2)

Inequality (5.1) follows from Property 1, which implies r(∪_{l=0}^{i−1} C_l) + r(C_i ∖ ∪_{l=0}^{i−1} C_l) > r(∪_{l=0}^{i} C_l). Inequality (5.2) follows from Property 2. This gives us

 ( r(∪_{l=0}^{i} C_l) − r(∪_{l=0}^{i−1} C_l) ) / |C_i ∖ ∪_{l=0}^{i−1} C_l| ≤ r(C_0) / |C_0|,

since x/(x − 1) is monotonically decreasing in x, and thus,

 |A| / (|A| − r(A))
  = Σ_{i=0}^{j} |C_i ∖ ∪_{l=0}^{i−1} C_l| / Σ_{i=0}^{j} ( |C_i ∖ ∪_{l=0}^{i−1} C_l| − ( r(∪_{l=0}^{i} C_l) − r(∪_{l=0}^{i−1} C_l) ) )   (5.3)
  ≤ max_{0≤i≤j} { |C_i ∖ ∪_{l=0}^{i−1} C_l| / ( |C_i ∖ ∪_{l=0}^{i−1} C_l| − ( r(∪_{l=0}^{i} C_l) − r(∪_{l=0}^{i−1} C_l) ) ) }   (5.4)
  ≤ |C_0| / (|C_0| − r(C_0)),

where (5.4) follows from the generalized mediant inequality. By the minimality of A we have that A = C_0. Since A is a circuit, we conclude that max_{nonseparable A ⊆ E} |A|/(|A| − r(A)) = max_{C ∈ circuits(M)} |C|/(|C| − r(C)) = max_{C ∈ circuits(M)} |C| = c(M).

Next, we are ready to prove Theorem 5.3.

Proof of Theorem 5.3. The analysis of the algorithm's performance is done using the primal-dual method.

Primal (P) is feasible: Clearly y_0 is feasible. By induction on the steps, we prove that the algorithm produces a feasible solution, i.e., 1 − y_t ∈ B(M). The update step sets y_{p_t,t} = 0. Then, in the normalization step the value of each y_{p,t} only grows. Note that as long as the primal solution is not feasible, 1 − y_t ∈ P_ss(M) but 1 − y_t ∉ B(M); hence not all pages are in a dual tight set and S is non-empty. After at most n iterations all pages become tight, and y_t is feasible.

Dual (D) is feasible: Since initially the cache is empty, and for each p, y_{p,0} = b_{p,1} = 1, then by setting a_{S,0} = 0 for all S ⊆ E, the first set of dual constraints is feasible. The primal solution is feasible, thus 0 ≤ y_{p,t} ≤ 1, so as we preserve the primal-dual relation we get 0 ≤ (e^{η·b_{p,t+1}} − 1)/(e^η − 1) ≤ 1. Simplifying, we get 0 ≤ b_{p,t+1} ≤ 1. Finally, by the algorithm's construction we keep the dual constraints with equality: b_{p,t+1} = b_{p,t} + Σ_{S: p∈S} a_{S,t}. The only exception is due to line 2 in the normalization step: since f is monotonically decreasing in η and monotonically increasing in b_{p,t+1}, every increase of η in line 2 also induces an increase in b_{p,t+1}, implying b_{p,t+1} ≥ b_{p,t} + Σ_{S: p∈S} a_{S,t}.

Primal-dual relation: We bound the primal cost in each iteration by the change in the dual cost. Let dD/da_{S,t} be the increase rate of the dual solution at time t, when a_{S,t} gradually increases. Let η_t denote the current value of η, and let η_final denote its value at the end of the execution. The dual increase rate is |S| − r(S). For the primal cost, let us consider eviction costs instead of fetching costs (clearly, this adds at most r(E) to the overall cost). That is, we bound the change in the primal variables during the normalization step instead of the update step.

 Σ_{p∈S∖{p_t}} dy_{p,t}/da_{S,t}
  = η_t Σ_{p∈S∖{p_t}} ( y_{p,t} + 1/(e^{η_t} − 1) )    (5.5)
  < η_t ( |S| − r(S) + |S|/(e^{η_t} − 1) )             (5.6)
  ≤ 2η_t (|S| − r(S)) = 2η_t · dD/da_{S,t}             (5.7)
  ≤ 2η_final · dD/da_{S,t} ≤ 2 log(1 + c(M)) · dD/da_{S,t}.   (5.8)

Equality (5.5) follows from the fact that dy_{p,t}/da_{S,t} = dy_{p,t}/db_{p,t+1} = η_t ( y_{p,t} + 1/(e^{η_t} − 1) ). Inequality (5.6) follows as the pages of E ∖ S lie in a dual tight set, Σ_{p∈E∖S} y_{p,t} = r*(E ∖ S), while Σ_{p∈E} y_{p,t} < |E| − r(E) throughout the normalization step, so by the dual rank function definition Σ_{p∈S} y_{p,t} < |S| − r(S). Inequality (5.7) follows as η_t ≥ log(1 + |S|/(|S| − r(S))) in the algorithm, hence |S|/(e^{η_t} − 1) ≤ |S| − r(S). Finally, Inequality (5.8) follows by Lemma 5.3.1, since η_final ≤ max_S { log(1 + |S|/(|S| − r(S))) } and every evictable set S is nonseparable. To see the latter, assume by contradiction that S can be partitioned into S_1, S_2 such that r(S_1) + r(S_2) = r(S), and assume without loss of generality that p_t ∈ S_1. Then, clearly, since every page p ∈ S_2 is unevictable before p_t is fetched, it remains unevictable when y_{p_t,t} is set to 0.

5.4 Rounding the Fractional Solution Online

In this section we describe our rounding procedure for matroid caching. Our goal is to map the fractional solution produced by the algorithm of Section 5.3 into a distribution on the bases of the matroid, and show how to maintain the distribution while paying a small cost. Let y_{t−1} ∈ B(M) be the fractional solution at time t − 1. Moving from y_{t−1} ∈ B(M) to y_t ∈ B(M) can be divided, without loss of generality, into a sequence of changes, where in each one y_{u,t−1} is increased by ε and y_{v,t−1} is decreased by ε, for some choice of u and v. Thus, the change in the cost of the fractional solution is ε. Our algorithm holds at any time a decomposition D of the current fractional solution y into a subset of matroid bases, such that y = Σ_{B∈D} λ_B·1_B and Σ_{B∈D} λ_B = 1. We want to update the current distribution D so that it is consistent with the new fractional solution y_t, while making as few changes as possible. This immediately gives us an online mapping of our fractional algorithm to a randomized integral one (see, e.g., [BBN12a] for the explicit mapping).
To update the distribution D we use an auxiliary graph which was initially introduced by [Cun84] to determine whether a given point is inside P(M), and if so, to find a decomposition of it into independent sets. Given a decomposition D of a fractional solution y ∈ B(M), let G(D) = (V, E_G) be a directed graph with V = E. The edges of the graph are defined as follows:

 E_G = { (u, v) : u, v ∈ V, and there exists B ∈ D with λ_B > 0 such that B + u − v is a base }.
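For intuition, the auxiliary graph is easy to materialize for a small graphic matroid. The brute-force sketch below (illustrative only; helper names are ours) builds G(D) for a decomposition into spanning trees of the 4-cycle and finds a shortest swap path with BFS; an arc (u, v) is present when some tree of the decomposition contains v but not u and swapping v for u keeps it spanning:

```python
from collections import deque

def is_spanning_tree(tree, n, edges):
    """Check that a set of edge indices forms a spanning tree on n vertices
    (union-find connectivity test: n - 1 merges, no cycle)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    joined = 0
    for i in tree:
        a, b = find(edges[i][0]), find(edges[i][1])
        if a == b:
            return False
        parent[a] = b
        joined += 1
    return joined == n - 1

def aux_graph(decomp, n, edges):
    """Cunningham's auxiliary graph G(D): arc (u, v) iff some base B of the
    decomposition has u not in B, v in B, and B + u - v is again a base."""
    arcs = {u: set() for u in range(len(edges))}
    for B in decomp:
        for v in B:
            for u in set(range(len(edges))) - B:
                if is_spanning_tree((B - {v}) | {u}, n, edges):
                    arcs[u].add(v)
    return arcs

def shortest_path_len(arcs, s, t):
    dist, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        if x == t:
            return dist[x]
        for y in arcs[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return None

# Graphic matroid of the 4-cycle; bases are the four 3-edge spanning trees.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
decomp = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]  # each with weight 1/2
arcs = aux_graph(decomp, 4, edges)
# Moving mass from edge 3 onto edge 0 takes a single swap here:
assert shortest_path_len(arcs, 0, 3) == 1
```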

Proposition 5.4.1. Let y ∈ B(M) be a fractional solution and let D be a decomposition of y. Let y' ∈ B(M) be a fractional solution such that y'_u = y_u + ε, y'_v = y_v − ε, and y'_w = y_w for any w ≠ u, v. Then, there exists a directed path in G(D) from u to v. Furthermore, if P_G(u, v) is the shortest path from u to v, then D can be converted into a decomposition D' of y' while paying ε·|P_G(u, v)|.

Proof. Let us look at the problem considered by [Cun84]. In that work, we are given x_0 ∈ P(M), a decomposition of x_0 into independent sets, and x_1 ∈ R^E, where x_0 ≤ x_1. The goal is to iteratively bring x_0 closer to x_1, i.e., find x ∈ P(M) such that x_0 ≤ x ≤ x_1, until we either reach x_1, or no such solution exists. To do so, an auxiliary graph similar to ours is constructed and an augmenting path in it is computed. In particular, it is shown that if x_1 ∈ P(M), then such a path always exists [Cun84, Theorem 2.2]; we follow this path, and for every edge (e, f) on it we add e and remove f in some base B ∈ D in which B + e − f is also a base², obtaining a feasible decomposition of x [Cun84, Lemma 4.3]. In fact, our setting is a special case of the latter setting. If we set x_0 = y − ε·1_v and x_1 = x_0 + ε·1_u = y', and define the decomposition of x_0 as D after removing v from an ε-measure of its bases, then our problem satisfies the conditions of [Cun84]. As a result, there exists a path from u to v, and to obtain D' all we need to do is follow the path P_G(u, v) and perform |P_G(u, v)| swaps.

Proposition 5.4.1 suggests a way of maintaining a decomposition of the fractional solution online. The payment of the rounding scheme depends on the length of the path in G(D). The property that determines the worst-case rounding quality is therefore the diameter of G(D), that is, the maximum shortest path among all pairs u, v ∈ V for which there exists a path from u to v.
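To see the swap mechanism of Proposition 5.4.1 in action, the following toy sketch (our own simplification: a uniform rank-2 matroid, i.e., classical paging, where any single swap B + u − v is again a base, so the path has length 1) moves an ε-measure from page v to page u and verifies that the new decomposition has the intended marginals:

```python
def marginals(decomp, ground):
    """y_p = total weight of bases containing p."""
    return {p: sum(w for B, w in decomp.items() if p in B) for p in ground}

def swap_measure(decomp, u, v, eps):
    """Move eps of measure from some base B (v in B, u not in B) onto the
    base B + u - v; for a uniform matroid this is always a base."""
    B = next(B for B, w in decomp.items() if v in B and u not in B and w >= eps)
    decomp[B] -= eps
    target = (B - {v}) | {u}
    decomp[target] = decomp.get(target, 0.0) + eps
    return decomp

ground = {"a", "b", "c", "d"}
# Rank-2 uniform matroid: a cache of size 2 over four pages.
decomp = {frozenset({"a", "b"}): 0.5, frozenset({"b", "c"}): 0.5}
assert marginals(decomp, ground) == {"a": 0.5, "b": 1.0, "c": 0.5, "d": 0.0}

swap_measure(decomp, u="d", v="c", eps=0.25)
y2 = marginals(decomp, ground)
assert abs(y2["d"] - 0.25) < 1e-12 and abs(y2["c"] - 0.25) < 1e-12
assert sum(decomp.values()) == 1.0 and all(len(B) == 2 for B in decomp)
```

Only an ε-measure of a single base along each arc of the path is touched, which is exactly why the rounding cost is ε times the path length.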
²Only an ε-measure of B is updated. That is, λ_B ← λ_B − ε, and the weight of B + e − f increases by ε.

For a matroid M, let d_G(M) be the maximum diameter of G(D) over any two fractional solutions y, y' ∈ B(M) such that y' differs from y in two coordinates, and over any valid decomposition D of y into bases. We obtain the following bound on d_G(M).

Lemma 5.4.2. For any matroid M, d_G(M) ≤ r(M) + 1.

Proof. Let y, y' ∈ B(M) denote two fractional solutions such that y'_u = y_u + ε, y'_v = y_v − ε, and y'_w = y_w for all other coordinates. Let D be a valid decomposition of y, so that y = Σ_{B∈D} λ_B·1_B. We prove that there exists a path from u to v in G(D) of length at most r(M) + 1. Let A ⊆ E denote a subset of pages that have a directed path from u. Starting from A = {u}, we would like to add in each step a new node w such that there is an edge from A to w and r(A ∪ {w}) > r(A). After at most r − 1 steps we reach a point where no such node w exists. By [Sch03, Theorem 39.12], for every base B ∈ D there is an independent set I_B ⊆ B of size and rank r(A), such that for every e ∈ I_B there exists e' ∈ A where B + e' − e is a base. This implies that, for every B, all pages in I_B are connected to A; but as A cannot be further extended, r(A ∪ ∪_{B∈D} I_B) = r(A). Therefore, either v ∈ A ∪ ∪_{B∈D} I_B and we are done, or

 Σ_{w ∈ ∪_{B∈D} I_B} y_w = Σ_{B∈D} λ_B |B ∩ ∪_{B'∈D} I_{B'}|   (5.9)
  ≥ Σ_{B∈D} λ_B |I_B|   (5.10)
  = Σ_{B∈D} λ_B · r(A) = r(A) = r(∪_{B∈D} I_B).   (5.11)

However, this contradicts the feasibility of y', as

 Σ_{w ∈ ∪_{B∈D} I_B ∪ A} y'_w ≥ ε + Σ_{w ∈ ∪_{B∈D} I_B ∪ A} y_w   (5.12)
  ≥ ε + r(∪_{B∈D} I_B ∪ A),   (5.13)

where (5.12) follows as u ∈ A and v ∉ A ∪ ∪_{B∈D} I_B.

The above lemma immediately provides an upper bound on the performance of our rounding algorithm. In Section 5.4.1 we show that this result is almost tight. In general, this bound may be quite large. However, in Section 5.5 we explore several interesting special cases of the restricted caching problem and show that the diameter in these cases is much smaller. Moreover, for the general matroid case, we are able to guarantee a logarithmic competitive ratio using an online rounding algorithm recently proposed by [GTW14]. The idea behind the algorithm is to maintain spanning sets in every iteration, and then transform them into bases without incurring any additional loss. Rounding spanning sets is based on recent results on contention resolution schemes [VCZ11] (see [GTW14], Section 4 for more details). Therefore, for rounding, one can always apply the auxiliary graph approach, and if d_G(M) is greater than log r(M), switch to the rounding approach of [GTW14]. Combining Theorem 5.3 with Proposition 5.4.1, as well as with the above insight, yields our main theorem, Theorem 5.1.

5.4.1 A Lower Bound on the Auxiliary Graph Diameter

In this section we show that the upper bound proven in Lemma 5.4.2 is almost tight. Consider the graphic matroid of the graph G = (V, E), with V = {v_1, v_2, ..., v_n}, and assume we are given a uniform distribution D over n − 2 spanning trees T_1, T_2, ..., T_{n−2}, such that T_i = P_n − (v_i, v_{i+1}) + (v_1, v_{i+2}), where P_n = {(v_j, v_{j+1})}_{j=1}^{n−1} is the path from v_1 to v_n. The distribution is presented in Figure 5.3.

[Figure 5.3: Uniform decomposition into spanning trees T_1, ..., T_{n−2} of the initial fractional solution.]

The current solution can be represented fractionally as

 y_{(v_{n−1},v_n)} = 1,
 y_{(v_i,v_{i+1})} = (n − 3)/(n − 2)   for 1 ≤ i ≤ n − 2,
 y_{(v_1,v_{i+2})} = 1/(n − 2)        for 1 ≤ i ≤ n − 2.

[Figure 5.4: Decomposition into spanning trees of the updated fractional solution.]

Now assume a new fractional solution y' is obtained by increasing y_{(v_1,v_2)} from (n − 3)/(n − 2) to 1, and decreasing y_{(v_{n−1},v_n)} from 1 to (n − 3)/(n − 2). We would like to apply a sequence of edge swaps to obtain a decomposition of y'. Let us observe the corresponding auxiliary graph G(D). For every 1 ≤ i ≤ n − 2, we note that there are only two edges in G(D) going forward from (v_i, v_{i+1}): to (v_{i+1}, v_{i+2}) and to (v_1, v_{i+2}), and both of them belong to T_i. Therefore, the shortest path from (v_1, v_2) to (v_{n−1}, v_n) is of length n − 2, and we must perform edge swaps in all n − 2 graph bases to obtain the new decomposition (see Figure 5.4). That is, the auxiliary graph diameter for this matroid is at least r − 1.

5.5 Special Cases of Restricted Caching

As already mentioned, although d_G(M) has a tight upper bound of r(M) + 1 in general, there are several special cases in which the diameter becomes significantly smaller. We demonstrate this on two previously studied cache architectures, obtaining tight performance guarantees.

Classical Paging: In the classical paging problem there is a cache of size k and n pages. Each page may be located anywhere in the cache. It is not hard to see that in this case c(M) = k + 1 and d_G(M) = 2, as for every (u, v) ε-update the out-degree of u is at least n − k and the in-degree of v is at least k. Thus, our algorithm is O(log k)-competitive, which is optimal up to constants.

Companion Caching: In the companion caching problem there are m types of pages, where a page of type i can be stored either in the i-th associative cache of size l, or in the fully associative cache of size n. As a companion cache is a restricted cache, we can represent it via a transversal matroid, and prove the following result.

Lemma 5.5.1. For the companion caching matroid, c(M) = min{m, n + 1}·l + n + 1 and d_G(M) = 3. Thus, our algorithm is O(log n + log l)-competitive, which is optimal up to constant factors.

Proof. Let M be the transversal matroid induced by an (n, l)-companion cache.

Bounding c(M): First observe that for every circuit C in M, the number of pages of each type t is either 0 or at least l + 1; otherwise, evicting a page of type t also reduces the rank of C by 1 (all pages of type t can be assigned to slots in the private cache, and thus an eviction of one of these pages decreases the assignment size by one), contradicting the rank property of a circuit. Moreover, the size of any circuit C with τ types of pages is at most τ·l + n + 1, because there are only τ·l + n possible cache slots for these pages. Let C be a circuit, and let τ be the number of page types in C. Then, from the above two observations we have

 τ·l + τ ≤ |C| ≤ τ·l + n + 1.

If m > n, then |C| is maximized when τ = n + 1, implying |C| = (n + 1)(l + 1). For example, a subset of pages containing l + 1 pages from each of any n + 1 types is a circuit. First note that the subset is dependent, as its size is nl + n + l + 1 > (n + 1)l + n. Next, note that evicting any of the pages leads to a feasible cache assignment, as the remaining size would be (n + 1)l + n, with at least l pages of each type. If m ≤ n, then |C| is maximized for τ = m, implying |C| = ml + n + 1. Any subset of ml + n + 1 pages containing at least l + 1 pages of every type is a circuit.
As before, the subset is dependent, as its size is $ml + n + 1 > ml + n$, and in addition the eviction of any page leads to a feasible cache, as the cache size would be $ml + n$, with at least $l$ pages of each type.

Bounding $d_G(\mathcal{M})$: Let there be some decomposition $D$ of the fractional solution into integral solutions. We would like to add an $\varepsilon$-measure of page $u$ and remove an $\varepsilon$-measure of page $v$ instead. Next we prove there is a path from $u$ to $v$ of length at most $3$. Let $A$ denote a base in $D$ which does not contain $u$, and let $B$ denote a base in $D$ which contains $v$ (such bases exist, otherwise we cannot make the fractional update). Let us start with a simple case in which all bases of $D$ have exactly $l$ pages of $v$'s type, which means $u$ and $v$ are of the same page type. If $v \in A$ then there is an edge in $G(D)$ from $u$ to $v$ and we are done. If $v \notin A$, then there exists a page of the same type $w \in A \setminus B$, implying $(u, w), (w, v) \in G(D)$. A complementary case is where there exists a base $B'$ in $D$ with at least $l+1$ pages of $v$'s type. In particular, either $v \in B'$, or there is a page $v'$ of the same type, $v' \in B'$, such that

$v' \notin B$, and thus there is an edge in $G(D)$ from $v'$ to $v$. Hence, all we need to show is that the distance from $u$ to $v'$ is at most $2$ (in case $v \in B'$ we denote $v' = v$). Let $T_A$ denote the subset of page types which have at least $l+1$ pages in $A$, and let $S$ denote the set of pages in $A$ of these types. Clearly $|S| = |T_A|\, l + n$, and $u$ is connected to all pages in $S$ in the graph $G(D)$. Similarly, let $T_B$ denote the subset of page types which have at least $l+1$ pages in $B'$. If $v' \in S$ then $(u, v') \in G(D)$ and we are done. Otherwise, there are two possibilities. The first is that $T_A \subseteq T_B$; as $v'$'s page type is in $T_B$, taking up at least one slot from the shared cache in $B'$, we have that at most $|T_A|\, l + n - 1$ of the pages in $S$ belong to $B'$. This means there is a page $q \in S \setminus B'$, implying $(u, q), (q, v') \in G(D)$. The second possibility is that there is a page type $j$ with at least $l+1$ pages in $A$ ($j \in T_A$), but only $l$ pages in $B'$ ($j \notin T_B$). Therefore, there exists a page $r \in A \setminus B'$ of type $j$, implying $(u, r), (r, v') \in G(D)$.

This result matches the lower bound shown by [FMS02]. As argued in [FMS02], any algorithm with free reorganization can be implemented online in the no-reorganization model while losing at most a factor of $3$. Thus, we also get tight competitiveness for the companion caching problem without free reorganization.

5.6 Concluding Remarks

In this chapter we studied the restricted caching problem, in which each page in memory can only be placed in a restricted subset of cache locations, presenting a framework which guarantees $O(\log^2 k)$-competitiveness for any restricted cache of size $k$, independently of its structure. This study suggests several future research directions. First, we showed that $d_G(\mathcal{M})$ can in some cases be as large as $r(\mathcal{M})$. However, we could not come up with an example where the rounding algorithm can be forced to use long paths repeatedly for many steps. Thus, a reasonable conjecture is that the amortized cost of our rounding algorithm via the auxiliary graph might only be a constant.
Proving this conjecture would give an optimal $O(\log k)$-competitive algorithm for any restricted caching problem. Another direction is finding and characterizing the circumference of a transversal matroid. This parameter is interesting since it serves as the performance bound of our fractional algorithm. The problem of finding the circumference of a general matroid has very poor approximation factors, as in graphic matroids it reduces to finding the longest simple cycle. However, we do not know anything about the hardness of the problem for transversal matroids.
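The circuit characterization for the companion cache proved in this chapter can also be verified computationally for small parameters. The following sketch (all helper names are mine) computes transversal-matroid ranks via bipartite matching and confirms that $l+1$ pages from each of $n+1$ types form a circuit of size $(n+1)(l+1)$:

```python
# Sketch (names mine): verify the companion-caching circuit bound for small
# parameters.  An (n, l)-companion cache with m page types has l private
# slots per type plus n shared slots; a page of type t may occupy a private
# slot of type t or any shared slot.  Rank = maximum bipartite matching.

def rank(pages, slots_of):
    """Transversal-matroid rank via augmenting paths (Kuhn's algorithm)."""
    match = {}                       # slot -> page
    def augment(p, seen):
        for s in slots_of(p):
            if s not in seen:
                seen.add(s)
                if s not in match or augment(match[s], seen):
                    match[s] = p
                    return True
        return False
    return sum(augment(p, set()) for p in pages)

n, l, m = 2, 2, 4                    # m > n, so c(M) should be (n+1)(l+1)
def slots_of(page):
    t, _ = page                      # page = (type, index)
    return [("prv", t, i) for i in range(l)] + [("shr", i) for i in range(n)]

# l+1 pages of each of n+1 types: dependent, and minimally so.
C = [(t, i) for t in range(n + 1) for i in range(l + 1)]
assert rank(C, slots_of) == len(C) - 1
assert all(rank([p for p in C if p != q], slots_of) == len(C) - 1 for q in C)
print("circuit size:", len(C), "= (n+1)(l+1) =", (n + 1) * (l + 1))
```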


Chapter 6

Online Packing and Covering Framework with Convex Objectives

We consider online fractional covering problems with a convex objective $f$, where the covering constraints arrive over time. We also consider the corresponding online dual packing problem with a concave objective. In this chapter we provide an online primal-dual framework for both classes of problems, with competitive ratio depending on certain monotonicity and smoothness parameters of $f$. Using the notions of convex conjugacy and Fenchel duality, well-studied techniques in online convex optimization, our analysis extends the online primal-dual linear programming techniques developed in competitive analysis. Our results match or improve on guarantees for some special classes of functions $f$ considered previously. Using the new fractional solver with problem-dependent randomized rounding procedures, we obtain competitive algorithms for various problems, such as online covering LPs minimizing $\ell_p$-norms of arbitrary packing constraints, set cover with multiple cost functions, capacity constrained facility location, and profit maximization with nonseparable production costs. Some of these results are new and others provide a unified view of previous results, with matching or slightly worse competitive ratios.

6.1 Introduction

We consider the following class of fractional covering problems:
$$\min\ \{f(x) : Ax \ge \mathbf{1},\ x \ge 0\}. \tag{6.1}$$
Above, $f : \mathbb{R}^n \to \mathbb{R}$ is a nondecreasing convex function and $A \in \mathbb{R}_+^{m \times n}$ is nonnegative. Observe that we can transform the more general constraints $Ax \ge b$ (with all nonnegative entries) into this form by scaling the constraints. The covering constraints $a_i \cdot x \ge 1$ arrive online over time, and must be satisfied upon arrival. We want to design an online algorithm that maintains a feasible

fractional solution $x$, where $x$ is required to be nondecreasing over time. We also consider the Fenchel dual of (6.1), which is the following packing problem:
$$\max\ \{\mathbf{1} \cdot y - f^*(\mu) : A^\top y \le \mu,\ y \ge 0\}. \tag{6.2}$$
Here, the variables $y_i$, along with columns of $A^\top$ (or, alternatively, rows of $A$), arrive over time, and $f^*$ is the convex conjugate of $f$, formally defined in (6.7); see, e.g., [Roc70] for background and properties. Let $d$ denote the row sparsity of the matrix $A$, i.e., the maximum number of non-zeroes in any row, and let $\nabla_l f(z)$ be the $l$th coordinate of the gradient of $f$ at the point $z \in \mathbb{R}^n$.

This chapter presents an online primal-dual algorithm for the pair of convex programs (6.1) and (6.2). This extends the widely-used online primal-dual framework for linear objective functions to the convex case. The competitive ratio is given as the ratio between the primal and dual objective functions.¹ It depends on certain smoothness parameters of the function $f$. We provide two general algorithms. In the first algorithm, the primal variables $x$ and dual variables $\mu$ are monotonically nondecreasing, while the dual variables $y$ are allowed to both increase and decrease over time. The competitive ratio of this algorithm for problem (6.1) is:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \max_{c>0} \min_z \left[ \frac{1}{4\ln(1+d)} \min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\} - \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right], \tag{6.3}$$
and the competitive ratio of this algorithm for problem (6.2) is:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \max_{c>0} \left[ \min_z \frac{1}{4\ln(1+d)} \min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\} - \max_z \left\{ \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right\} \right]. \tag{6.4}$$
In the second algorithm, all variables (the primal variables $x$ as well as the dual variables $y, \mu$) are required to be monotonically nondecreasing. The competitive ratio for problem (6.2) is slightly worse in this case, given by:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \max_{c>0} \left[ \min_z \frac{1}{2\ln(1+d\rho)} \min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\} - \max_z \left\{ \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right\} \right]. \tag{6.5}$$
Observe that the difference from (6.4) is the additional parameter $\rho$, which is defined to be an upper bound on the maximum-to-minimum ratio of positive entries in any column of $A$.
The above expressions are difficult to parse because of their generality, so the first special case of interest is that of linear objectives. In this case $z \cdot \nabla f(z) = f(z)$, and also $\nabla f(z) = \nabla f(cz)$, hence the competitive ratios are $O(\ln d)$ for monotone primals, and $O(\ln(d\rho))$ for monotone primals and duals. Both of these competitive ratios are known to be best possible [BN09b, GN14].

¹However, for clarity of exposition we provide the ratio as Dual/Primal and not vice versa.
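The collapse of the ratio expression in the linear case can be checked numerically. The sketch below (function names and test values are mine) evaluates the bracket of (6.3) for a linear $f(x) = c \cdot x$: the gradient is the constant vector $c$, so the first term is $1/(4\ln(1+d))$ and the second term vanishes, independently of the scaling factor:

```python
import math

# Illustrative sketch (names mine): for a linear objective f(x) = c . x the
# gradient is constant, so min_l grad_l f(z)/grad_l f(cz) = 1 and
# z . grad f(z) - f(z) = 0.  The bound (6.3) becomes 1/(4 ln(1+d)).

def f(c_vec, x):
    return sum(ci * xi for ci, xi in zip(c_vec, x))

def bound(c_vec, z, scale, d):
    """Evaluate the bracket of (6.3) at point z with c = `scale`."""
    g = c_vec                                     # gradient of f, constant
    cz = [scale * zi for zi in z]
    term1 = min(gi / gi for gi in g) / (4 * math.log(1 + d))
    term2 = (sum(zi * gi for zi, gi in zip(z, g)) - f(c_vec, z)) / f(c_vec, cz)
    return term1 - term2

c_vec, z, d = [2.0, 5.0, 1.0], [0.3, 1.2, 0.7], 3
for scale in (0.5, 1.0, 4.0):                     # independent of the scaling
    assert abs(bound(c_vec, z, scale, d) - 1 / (4 * math.log(1 + d))) < 1e-12
print("linear f: bound = 1/(4 ln(1+d)) =", round(1 / (4 * math.log(1 + d)), 4))
```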

The applicability of our framework extends to a number of settings, most of which have been studied before in different works. We now outline some of these connections.

Mixed Covering and Packing LPs. In this problem, covering constraints $Ax \ge \mathbf{1}$ arrive online. There are also $K$ packing constraints $\sum_{j=1}^n b_{kj} x_j \le \lambda_k$, for $k \in [K]$, that are given up-front. The right-hand sides $\lambda_k$ of these packing constraints are themselves variables, and the objective is to minimize the $\ell_p$-norm $\left(\sum_{k=1}^K \lambda_k^p\right)^{1/p}$ of the load vector $\lambda = (\lambda_1, \dots, \lambda_K)$. All entries $a_{ij}$ and $b_{kj}$ are nonnegative. Clearly, the objective function is a monotonically nondecreasing convex function. We obtain an $O(p \ln d)$-competitive algorithm for this problem, where $d \le n$ is the row sparsity of the matrix $A$. Prior to our work, [ABFP13] gave an $O(\ln K \cdot \ln(d\kappa\gamma))$-competitive algorithm for the special case of $p = \ln K$, corresponding to $\|\lambda\|_\infty$, the makespan of the loads; here $\gamma$ and $\kappa$ are the maximum-to-minimum ratios of the entries in the covering and packing constraints, respectively.

Set Cover with Multiple Costs. Here the offline input is a collection of $n$ sets $\{S_j\}_{j=1}^n$ over a universe $U$, and $K$ different linear cost functions $B_k : [n] \to \mathbb{R}_+$ for $k \in [K]$. Elements from $U$ arrive online and must be covered by some set upon arrival, where the decision to select a set into the solution is irrevocable. The goal is to maintain a set cover that minimizes the $\ell_p$-norm of the $K$ cost functions. Combining our framework with a simple randomized rounding scheme gives an $O(p^3 \ln p \cdot \ln d \ln |U|)$-competitive randomized online algorithm; here $d$ is the maximum number of sets containing any element. The special case of $K = 1$ (when $p = 1$ without loss of generality) is the online set-cover problem [AAA+09], for which the resulting $O(\ln d \ln |U|)$-competitive bound is tight, at least for randomized polynomial-time online algorithms [Kor05].

Profit Maximization with Production Costs (PMPC).
This is an application of the dual packing problem (6.2), in contrast to the above applications, which are all applications of the primal covering problem. Consider a seller with $m$ items that can be produced and sold. The seller has a production cost function $g : \mathbb{R}_+^m \to \mathbb{R}_+$ which is monotone, convex, and satisfies some other technical conditions; the total cost incurred by the seller to produce $\mu_j$ units of every item $j \in [m]$ is given by $g(\mu)$.² There are $n$ buyers who arrive online. Each buyer $i \in [n]$ is interested in subsets of items (bundles) that belong to a set family $\mathcal{S}_i \subseteq 2^{[m]}$. The value of buyer $i$ for a subset $S \in \mathcal{S}_i$ is given by $v_i(S)$, where $v_i : \mathcal{S}_i \to \mathbb{R}_+$ is her valuation function. If buyer $i$ is allocated a bundle $T \in \mathcal{S}_i$, she pays the seller her valuation $v_i(T)$. (Observe: this is not an auction setting.) The goal in the PMPC problem is to produce items and allocate subsets to buyers so as to maximize the profit $\sum_{i=1}^n v_i(T_i) - g(\mu)$, where $T_i \in \mathcal{S}_i$ denotes the subset allocated to buyer $i$ and $\mu \in \mathbb{R}^m$ is the total quantity of all items produced. As

²See the Related Work section for important differences from prior work on such problems [BGMS11, HK15].

mentioned above, we consider a non-strategic setting, where the valuation of each buyer is known to the seller.

Our main result here is for the fractional version of the problem, where the allocation to each buyer $i$ is allowed to be any point in the convex hull of the set family $\mathcal{S}_i$. We show that for a large class of valuation functions (e.g., supermodular functions, or weighted rank functions of matroids) and production cost functions, our framework provides a polynomial-time online algorithm; the precise competitive ratio is given by expression (6.5) with $f = g$. As a concrete example, suppose the production cost function is $g(\mu) = \left(\sum_{j=1}^m \mu_j\right)^p$ for some $p > 1$. In this case, we get an $O(q \ln(\beta q))$-competitive algorithm, where $q > 1$ satisfies $\frac{1}{q} + \frac{1}{p} = 1$, and $\beta$ is the maximum-to-minimum ratio of the valuation functions $\{v_i\}$.

As the above list indicates, the framework for solving fractional convex programs is fairly versatile and gives good fractional results for a variety of problems. In some cases, solving the particular relaxation we consider and then rounding ends up being weaker than the best known results for the specific problem by a logarithmic factor; we hope that further investigation into this problem will help close this gap.

Bibliographic Note: In independent and concurrent work, Azar et al. [ACP14] consider online covering problems with convex objectives (i.e., problem (6.1)). They also obtain a competitive ratio that depends on properties of the function $f$, but their parameterization is somewhat different from ours. As an example, for online covering LPs minimizing the $\ell_p$-norm of packing constraints, they obtain an $O(p \ln(d\kappa\gamma))$-competitive algorithm, whereas we obtain a tighter $O(p \ln d)$ ratio.

6.1.1 Techniques and Chapter Outline

In Section 6.2.1, we give the first general algorithm for the convex covering problem (6.1), maintaining monotone primal variables but allowing dual variables to decrease.
The main observation is simple, yet powerful: convex optimization problems with a function $f$ can be reduced to linear optimization using the gradient of the convex function $f$. In the process we also give a cleaner algorithm and proof for linear optimization problems, significantly simplifying the previous algorithm from [GN14]. The resulting algorithm performs multiplicative increases on the primal variables; for the dual, it does an initial increase followed by a linear decrease after some point. We then give the second general algorithm, which is simpler. The primal updates are the same as above, but we skip the dual decreases. This results in a worse competitive ratio, but the loss is necessary for any monotone primal-dual algorithm [BN09b].

In Section 6.3 we deal with the various applications of our framework. The high-level idea in all of these is to suitably cast each application in the form of either (6.1) or (6.2). The applications in Section 6.3.1 and Section 6.3.2 are for the convex covering problem (6.1). We

comment that for applications to combinatorial problems we have to define the convex relaxations with some care in order to avoid bad integrality gaps. Moreover, some of our convex relaxations are motivated by the particular constraints we want to enforce when subsequently rounding. In Section 6.3.3 we consider the problem of profit maximization with production costs, which after some simplifications can be cast as a convex packing program as in (6.2). We want allocations to be nondecreasing over time, so we use our second general primal-dual algorithm, which maintains monotone solutions. We also show how this problem can be solved efficiently for some special classes of valuation functions: supermodular functions and matroid rank functions. This convex program can also be randomly rounded online to get integral allocations with the same multiplicative competitive ratio, but with an extra additive term. The additive term depends only on the number $m$ of items and the cost function $g$; in particular, it does not depend on $n$, the number of buyers. We also show that such an additive loss is necessary (1) for our approach, due to an integrality gap of the convex relaxation, and (2) for any randomized online algorithm, even in a special case of the problem.

Related Work: This study adds to the body of work on online primal-dual algorithms, which have been applied successfully to a large class of online problems, such as set cover [AAA+09], graph connectivity and cuts [AAA+06], caching [BBN12a], auctions [HK15], scheduling [DH14], etc. Below we discuss in more detail only work that is directly relevant to us. Online packing and covering linear programs were first considered by Buchbinder and Naor [BN09b], who obtained an $O(\ln n)$-competitive algorithm for covering and an $O\left(\ln\left(n \frac{a_{\max}}{a_{\min}}\right)\right)$-competitive algorithm for packing. The competitive ratio for covering linear programs was improved to $O(\ln d)$ by Gupta and Nagarajan [GN14], where $d \le n$ is the maximum number of non-zero entries in any row.
Azar, Bhaskar, Fleischer, and Panigrahi [ABFP13] gave the first algorithm for online mixed packing and covering LPs, where the packing constraints are given upfront and covering constraints arrive online; the objective is to minimize the maximum violation of the packing constraints. Their algorithm had a competitive ratio of $O(\ln K \cdot \ln(d\kappa\gamma))$, where $K$ is the number of packing constraints and $\gamma$ (resp. $\kappa$) denotes the maximum-to-minimum ratio of entries in the covering (resp. packing) constraints. Using our framework, this bound can be improved to $O(\ln K \ln d)$. This is also best possible, as shown in [ABFP13]. [ABFP13] also introduced the capacity constrained facility location problem (CCFL) and gave an $O(\ln m \cdot \ln(mn))$-competitive algorithm. Following directly from our general framework, one can also obtain a result for this specific setting (possibly worse by a log factor). Moreover, our approach can be naturally extended to other problems, such as the capacitated multicast problem, which is a generalization of CCFL to multi-level facility costs. The online multicast problem without capacities was considered by Alon et al. [AAA+06], who obtained an $O(\ln m \ln n)$-competitive randomized algorithm. The class of online maximization problems with production costs was introduced by Blum, Gupta, Mansour, and Sharma [BGMS11] and extended by Huang and Kim [HK15]. The key

differences from our setting are: (i) these papers deal with an auction setting where the seller is not aware of the valuations of the buyers, whereas our setting is not strategic, and (ii) in these papers each item $j$ has a separate production cost function $g_j(\mu_j)$, and $g(\mu) := \sum_j g_j(\mu_j)$. We call this the separable case. Our techniques allow the production cost to be nonseparable over items (e.g., we can handle $g(\mu) = \left(\sum_{j=1}^m \mu_j\right)^2$). Overall, methods based on Fenchel duality are widely used in convex optimization; for more details see [BL10]. In particular, Fenchel duality is a powerful algorithmic and analytical tool in online learning [KSST12, SS11, GLS01], with strong connections to the regularization approach [CBL06, Rak09, ACV13]. On the other hand, nonlinear optimization and Fenchel duality were not as widely studied in the competitive analysis domain. Recently, the use of Fenchel duality has emerged in the context of online scheduling [BCP09, BPS09, GKP10, AGK12, DH14] and welfare maximization [BGMS11, HK15], but as already mentioned these works are restricted to special convex objectives, mainly separable functions.

6.2 The General Framework

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a nonnegative nondecreasing convex function. We assume that the function $f$ is continuous and differentiable, and satisfies the following monotonicity condition:
$$\forall\, x \ge x' \in \mathbb{R}^n: \quad f(x) \ge f(x'). \tag{6.6}$$
Here, $x \ge x'$ means $x_i \ge x'_i$ for all $i \in [n]$. We consider the online fractional covering problem (6.1), where the constraints of $A$ arrive online. Our algorithm is a primal-dual algorithm, which works with the following pair of convex programs:
$$(P): \ \min\ f(x) \ \ \text{s.t.}\ Ax \ge \mathbf{1},\ x \ge 0; \qquad (D): \ \max\ \sum_{i=1}^m y_i - f^*(\mu) \ \ \text{s.t.}\ A^\top y \le \mu,\ y \ge 0.$$
Here $f^*$ is the convex conjugate of $f$, which is defined as
$$f^*(\mu) = \sup_z \{\mu \cdot z - f(z)\}. \tag{6.7}$$
Observe that by scaling the rows of $A$ appropriately, we can transform any covering LP of the form $Ax \ge b$ into the form above. The following duality is standard.

Lemma (Weak duality). Let $x$ and $(y, \mu)$ be feasible primal and dual solutions to $(P)$ and $(D)$, respectively.
Then,
$$\text{Primal objective} = f(x) \ \ge\ \sum_{i=1}^m y_i - f^*(\mu) = \text{Dual objective}. \tag{6.8}$$
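Before the formal proof, a quick numeric sanity check of (6.8), assuming the concrete choice $f(x) = \frac{1}{2}\|x\|^2$ (which is nondecreasing on the nonnegative orthant and whose conjugate is $f^*(\mu) = \frac{1}{2}\|\mu\|^2$); the instance data below is mine:

```python
import numpy as np

# Numeric sanity check (instance mine) of weak duality (6.8) for
# f(x) = ||x||^2 / 2, whose convex conjugate is f*(mu) = ||mu||^2 / 2.

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(4, 3))          # nonnegative covering matrix

checked = 0
for _ in range(1000):
    x = rng.uniform(0, 5, size=3)
    if not np.all(A @ x >= 1):                  # need primal feasibility Ax >= 1
        continue
    y = rng.uniform(0, 2, size=4)
    mu = A.T @ y + rng.uniform(0, 1, size=3)    # dual feasibility: A^T y <= mu
    primal = 0.5 * x @ x                        # f(x)
    dual = y.sum() - 0.5 * mu @ mu              # sum_i y_i - f*(mu)
    assert primal >= dual - 1e-9
    checked += 1
print("weak duality held on", checked, "sampled feasible pairs")
```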

Proof.
$$\sum_{i=1}^m y_i = y \cdot \mathbf{1} \ \le\ y \cdot Ax \ \le\ \mu \cdot x \ =\ \mu \cdot x - f(x) + f(x) \ \le\ f^*(\mu) + f(x).$$
Rearranging, we get the desired (6.8).

6.2.1 The Algorithm

The algorithm maintains a feasible primal solution $x$ and a feasible dual solution $(y, \mu)$ at each time.

Fractional Algorithm: At round $t$, let $\tau$ be a continuous variable denoting the current time. While the new constraint is unsatisfied, i.e., $\sum_{j=1}^n a_{tj} x_j < 1$, increase $\tau$ at rate $1$ and:

Change of primal variables: For each $j$ with $a_{tj} > 0$, increase $x_j$ at rate
$$\frac{\partial x_j}{\partial \tau} = \frac{a_{tj} x_j + \frac{1}{d_t}}{\nabla_j f(x)}. \tag{6.9}$$
Here $d_t$ is the row sparsity of the constraint matrix so far, and $\nabla_j f(x)$ is the $j$th coordinate of the gradient $\nabla f(x)$.

Change of dual variables: Set $\mu = \nabla f(\delta x)$, where $\delta > 0$ is determined later. Increase $y_t$ at rate
$$r = \frac{1}{\ln(1+2d_t)} \min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}.$$
If the dual constraint of variable $x_j$ is tight, that is, $\sum_{i=1}^t a_{ij} y_i = \mu_j$, then let $m(j) = \arg\max_{i \le t} \{a_{ij} \mid y_i > 0\}$, and decrease $y_{m(j)}$ at rate $\frac{a_{tj}}{a_{m(j)j}}\, r$. (Note that this change occurs only if $a_{tj}$ is strictly positive.)

We emphasize that the primal algorithm does not depend on the value $\delta$. The last step in the algorithm decreases certain dual variables; all other steps only increase primal and dual variables. For the analysis, we denote by $x^\tau, y^\tau, \mu^\tau, r^\tau$ the values of $x, y, \mu, r$ at time $\tau$, respectively. In addition, we denote by $d^\tau$ the value of $d_t$ if at time $\tau$ constraint $t$ is being handled.

Observation 6.1. For any $\delta > 0$, the following are maintained.

The algorithm maintains a feasible, monotonically nondecreasing primal solution.
The algorithm maintains a feasible dual solution with nondecreasing $\mu_j$.

Proof. The first property follows by construction, since we only increase $x$ until reaching a feasible solution. For the second property, we observe that the dual variables $\mu$ are nondecreasing since $\nabla f(x)$ is nondecreasing. We prove that $(y, \mu)$ is feasible by induction over the execution of the algorithm. While processing constraint $t$, if $\sum_{i=1}^t a_{ij} y_i^\tau < \mu_j^\tau$ for column $j$, the constraint is trivially satisfied. Suppose that during the processing of constraint $t$ we have $\sum_{i=1}^t a_{ij} y_i^\tau = \mu_j^\tau$ for some dual constraint $j$ and time $\tau$. Now the dual-decrease part of the algorithm kicks in, and the rate of change of the left-hand side of the dual constraint is:
$$\frac{\partial}{\partial \tau} \sum_{i=1}^t a_{ij} y_i^\tau = a_{tj} r^\tau - a_{m(j)j} \cdot \frac{a_{tj}}{a_{m(j)j}} r^\tau = 0.$$

Before analyzing the competitive factor, let us first prove the following claim.

Claim. For a variable $x_j$, let $T_j = \{i \mid a_{ij} > 0\}$ and let $S_j$ be any subset of $T_j$. Then,
$$x_j^\tau \ \ge\ \frac{1}{\max_{i \in S_j}\{a_{ij}\}\, d^\tau} \left[ \exp\left( \frac{\ln(1+2d^\tau)}{\mu_j^\tau} \sum_{i \in S_j} a_{ij} y_i^\tau \right) - 1 \right]. \tag{6.10}$$

Proof. The proof is by induction on the size of $S_j$. Let $\tau(i)$ denote the value of $\tau$ at the arrival of the $i$th primal constraint. We first note that the increase of the primal variables at any time $\tau(i) \le \tau \le \tau(i)+1$ can be alternatively formulated by the following differential equation:
$$\frac{\partial x_j}{\partial y_i} = \frac{\ln(1+2d_i)}{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}} \cdot \frac{a_{ij} x_j + \frac{1}{d_i}}{\nabla_j f(x)}. \tag{6.11}$$
By solving the latter equation we get, for any $\tau(i) \le \tau \le \tau(i)+1$,
$$x_j^\tau + \frac{1}{a_{ij} d_i} \ \ge\ \left( x_j^{\tau(i)} + \frac{1}{a_{ij} d_i} \right) \exp\left( \frac{\ln(1+2d_i)}{\nabla_j f(\delta x^\tau)}\, a_{ij} y_i^\tau \right), \tag{6.12}$$
where we use the fact that $\nabla_j f(\delta x)$ is monotonically nondecreasing. Note that Inequality (6.12) is satisfied even when no decrease is performed on the dual variables, as such a decrease only affects the right-hand side of the inequality.
Now, if $S_j = \{i\}$, then we have
$$x_j^\tau \ \ge\ x_j^{\tau(i)+1} \ \ge\ \left( x_j^{\tau(i)} + \frac{1}{a_{ij} d_i} \right) \exp\left( \frac{\ln(1+2d_i)}{\nabla_j f(\delta x^{\tau(i)+1})}\, a_{ij} y_i^\tau \right) - \frac{1}{a_{ij} d_i} \tag{6.13}$$
$$\ge\ \frac{1}{\max_{i \in S_j}\{a_{ij}\}\, d_i} \left[ \exp\left( \frac{\ln(1+2d_i)}{\mu_j^\tau} \sum_{i \in S_j} a_{ij} y_i^\tau \right) - 1 \right], \tag{6.14}$$

where Inequality (6.13) follows immediately from (6.12), and Inequality (6.14) follows as $x_j^{\tau(i)} \ge 0$, $\mu_j^\tau = \nabla_j f(\delta x^\tau)$, and the value of $\nabla_j f(\delta x)$ is monotonically nondecreasing in time. Next, we use the crucial observation [BG13] that the expression
$$H(d, L) = \frac{1}{d} \left[ \exp\left( L \ln(1+2d) \right) - 1 \right]$$
is monotonically decreasing in $d$, for any $0 \le L \le 1$, to deduce
$$x_j^\tau \ \ge\ \frac{1}{\max_{i \in S_j}\{a_{ij}\}\, d^\tau} \left[ \exp\left( \frac{\ln(1+2d^\tau)}{\mu_j^\tau} \sum_{i \in S_j} a_{ij} y_i^\tau \right) - 1 \right].$$
To prove the observation, we show that the derivative of $H(d, L)$ with respect to $d$ is nonpositive, i.e.,
$$\frac{1}{d} \cdot \frac{2L}{2d+1} \exp\left( L \ln(1+2d) \right) - \frac{1}{d^2} \left[ \exp\left( L \ln(1+2d) \right) - 1 \right] \ \le\ 0.$$
Simplifying, we get
$$(2d+1-2dL) \exp\left( L \ln(1+2d) \right) \ \le\ 2d+1,$$
which is equivalent to
$$\ln(2d+1-2dL) + L \ln(1+2d) \ \le\ \ln(2d+1).$$
It is easy to check that the latter expression holds with equality for $L = 0$ and $L = 1$. Furthermore, the left-hand side is a concave function of $L$ (its second derivative equals $-\left(\frac{2d}{2d+1-2dL}\right)^2$), and therefore the inequality holds for any $0 \le L \le 1$.

Next, assume that the claim is true for any subset with $|S_j| < s$. Then, given $S_j = \{i_1, \dots, i_s\}$, we have
$$x_j^\tau \ \ge\ x_j^{\tau(i_s)+1} \ \ge\ \left( x_j^{\tau(i_s)} + \frac{1}{a_{i_s j} d_{i_s}} \right) \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau}\, a_{i_s j} y_{i_s}^\tau \right) - \frac{1}{a_{i_s j} d_{i_s}} \tag{6.17}$$
$$\ge\ \left( x_j^{\tau(i_{s-1})+1} + \frac{1}{\max_{i \in S_j}\{a_{ij}\} d_{i_s}} \right) \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau}\, a_{i_s j} y_{i_s}^\tau \right) - \frac{1}{\max_{i \in S_j}\{a_{ij}\} d_{i_s}} \tag{6.18}$$
$$\ge\ \frac{1}{\max_{i \in S_j}\{a_{ij}\} d_{i_s}} \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau} \sum_{i \in S_j \setminus \{i_s\}} a_{ij} y_i^\tau \right) \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau}\, a_{i_s j} y_{i_s}^\tau \right) - \frac{1}{\max_{i \in S_j}\{a_{ij}\} d_{i_s}} \tag{6.19}$$
$$=\ \frac{1}{\max_{i \in S_j}\{a_{ij}\} d_{i_s}} \left[ \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau} \sum_{i \in S_j} a_{ij} y_i^\tau \right) - 1 \right]$$
$$\ge\ \frac{1}{\max_{i \in S_j}\{a_{ij}\} d^\tau} \left[ \exp\left( \frac{\ln(1+2d^\tau)}{\mu_j^\tau} \sum_{i \in S_j} a_{ij} y_i^\tau \right) - 1 \right], \tag{6.20}$$
where Inequality (6.17) follows from (6.12), and Inequality (6.18) follows as $e^x - 1 \ge 0$ and the fact that

$\nabla_j f(\delta x)$ is monotonically nondecreasing. Inequality (6.19) follows since
$$x_j^{\tau(i_{s-1})+1} \ \ge\ \frac{1}{\max_{i \in S_j \setminus \{i_s\}}\{a_{ij}\}\, d_{i_s}} \left[ \exp\left( \frac{\ln(1+2d_{i_s})}{\mu_j^\tau} \sum_{i \in S_j \setminus \{i_s\}} a_{ij} y_i^\tau \right) - 1 \right]$$
by the induction hypothesis and the monotonicity of $H(d, L)$ in $d$. Inequality (6.20) follows from the monotonicity of $H(d, L)$ as well.

Theorem 6.2. The competitive ratio of the algorithm is:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \min_z \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta z)}{\nabla_l f(z)} \right\}}{4 \ln(1+2d)} \ -\ \max_z \left\{ \frac{\delta z \cdot \nabla f(\delta z) - f(\delta z)}{f(z)} \right\}, \tag{6.21}$$
where $\delta > 0$ is the parameter chosen in the algorithm.

Proof. Consider the update when primal constraint $t$ arrives and $\tau$ is the current time. Let $U(\tau)$ denote the set of tight dual constraints at time $\tau$; that is, for every $j \in U(\tau)$ we have $a_{tj} > 0$ and $\sum_{i=1}^t a_{ij} y_i^\tau = \mu_j^\tau$. Moreover, let us define, for every $j \in U(\tau)$, $S_j = \{i \mid a_{ij} > 0,\ y_i^\tau > 0\}$. Clearly $\sum_{i \in S_j} a_{ij} y_i^\tau = \sum_{i=1}^t a_{ij} y_i^\tau = \mu_j^\tau$, hence by the above claim and the fact that $\sum_j a_{tj} x_j^\tau < 1$ we get
$$1 \ >\ \sum_{j \in U(\tau)} a_{tj} x_j^\tau \ \ge\ \sum_{j \in U(\tau)} \frac{a_{tj}}{\max_{i \in S_j}\{a_{ij}\}\, d_t} \left[ \exp\left( \ln(1+2d_t) \right) - 1 \right],$$
and after rearranging we get $\sum_{j \in U(\tau)} \frac{a_{tj}}{\max_{i \in S_j}\{a_{ij}\}} \le \frac{1}{2}$. As a result, we can bound the rate of change of the dual expression $\sum_{i=1}^t y_i$ at any time $\tau$:
$$\frac{\partial}{\partial \tau} \sum_{i=1}^t y_i^\tau \ \ge\ r^\tau - \sum_{j \in U(\tau)} \frac{a_{tj}}{a_{m(j)j}}\, r^\tau \ \ge\ \frac{1}{2} r^\tau. \tag{6.22}$$
On the other hand, when processing constraint $t$ during the execution of the algorithm, the rate of increase of the primal objective $f$ is:
$$\frac{\partial f(x^\tau)}{\partial \tau} = \sum_j \nabla_j f(x^\tau) \frac{\partial x_j^\tau}{\partial \tau} = \sum_{j:\, a_{tj}>0} \nabla_j f(x^\tau) \cdot \frac{a_{tj} x_j^\tau + \frac{1}{d_t}}{\nabla_j f(x^\tau)} = \sum_{j:\, a_{tj}>0} \left( a_{tj} x_j^\tau + \frac{1}{d_t} \right) \ \le\ 2. \tag{6.23}$$
The final inequality uses the fact that the covering constraint is unsatisfied, and that $d_t$ is at least the number of non-zeroes in the vector $a_t$. From (6.22) and (6.23) we can now bound the following primal-dual ratio:
$$\frac{\partial \left( \sum_{i=1}^t y_i^\tau \right)/\partial \tau}{\partial f(x^\tau)/\partial \tau} \ \ge\ \frac{r^\tau}{4} \ =\ \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x^\tau)}{\nabla_l f(x^\tau)} \right\}}{4 \ln(1+2d_t)} \ \ge\ \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x^\tau)}{\nabla_l f(x^\tau)} \right\}}{4 \ln(1+2d)}. \tag{6.24}$$

Thus, if $x$ and $y$ are the final primal and dual solutions, we get
$$\sum_{i=1}^m y_i \ \ge\ \min_x \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}}{4 \ln(1+2d)} \cdot f(x). \tag{6.25}$$
To complete the proof of Theorem 6.2, we use the following standard claim.

Claim. For any $a \in \mathbb{R}^n$, we have $f^*(\nabla f(a)) = a \cdot \nabla f(a) - f(a)$.

Proof. By definition, $f^*(\nabla f(a)) = \sup_x \{x \cdot \nabla f(a) - f(x)\}$. Note that $x \cdot \nabla f(a) - f(x)$ is concave as a function of $x$, so a necessary and sufficient condition for optimality is $\nabla_i f(x) = \nabla_i f(a)$ for all $i \in [n]$. Thus, setting $x = a$, we have $f^*(\nabla f(a)) = a \cdot \nabla f(a) - f(a)$.

Finally, we can obtain the competitive ratio by a simple application of the claim and Inequality (6.25) in the definition of the dual. Indeed,
$$\text{Dual} = \sum_{i=1}^m y_i - f^*(\mu) \ \ge\ \min_x \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}}{4 \ln(1+2d)} \cdot f(x) - f^*(\nabla f(\delta x))$$
by Inequality (6.25), and using the claim with $a = \delta x$ we get
$$= \min_x \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}}{4 \ln(1+2d)} \cdot f(x) - \left( \delta x \cdot \nabla f(\delta x) - f(\delta x) \right) \ \ge\ \left[ \min_z \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(\delta z)}{\nabla_l f(z)} \right\}}{4 \ln(1+2d)} - \max_z \left\{ \frac{\delta z \cdot \nabla f(\delta z) - f(\delta z)}{f(z)} \right\} \right] \cdot \text{Primal}.$$
Hence the proof.

How to choose the value of $\delta$? If we set $c = 1/\delta$ and optimize over $c$, the competitive ratio is:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \max_{c>0} \left[ \min_z \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\}}{4 \ln(1+2d)} - \max_z \left\{ \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right\} \right]. \tag{6.26}$$
This expression looks quite formidable; however, it simply captures how sharply the function $f$ changes locally. For special cases it gives us very simple expressions; e.g., for linear cost functions $f(x) = c \cdot x$ it gives $\text{Dual} \ge \text{Primal}/O(\ln d)$. See Section 6.3 for several examples of applications using this framework.
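To make the continuous-time primal update concrete, here is a small simulation sketch. It is my own Euler discretization of the rate (6.9), not the thesis's pseudocode, run on a toy linear objective; it only checks that every arrived covering constraint is (and stays) satisfied:

```python
import numpy as np

# Sketch (mine): discretized primal update of (6.9).  When constraint
# a_t . x >= 1 arrives, each x_j with a_tj > 0 grows at rate
# (a_tj x_j + 1/d_t) / grad_j f(x) until the constraint holds.
# Here f(x) = c . x, so grad f(x) = c.

def online_cover(rows, c, eps=1e-4):
    n = len(c)
    x = np.zeros(n)
    d = 0                                   # row sparsity seen so far
    for a in rows:
        d = max(d, int(np.count_nonzero(a)))
        while a @ x < 1:                    # raise x until feasible
            rate = np.where(a > 0, (a * x + 1.0 / d) / c, 0.0)
            x += eps * rate                 # Euler step of (6.9)
        yield x.copy()

c = np.array([1.0, 2.0, 4.0])
rows = [np.array([1.0, 1.0, 0.0]),
        np.array([0.0, 1.0, 1.0]),
        np.array([1.0, 0.0, 1.0])]
for a, x in zip(rows, online_cover(rows, c)):
    assert a @ x >= 1                       # x is monotone, so all past
print("final cost:", float(c @ x))          # constraints stay satisfied
```

Since $x$ only increases, constraints satisfied at their arrival remain satisfied for the rest of the run, matching Observation 6.1.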

Online Minimization

In the general framework above, we maintained both the primal and dual solutions simultaneously. If our goal is to solve (6.1) online, i.e., to minimize the convex function $f(x)$ subject to covering constraints arriving online, then the dual values can be determined in hindsight, once the final value of the primal variables $x$ has been computed. In particular, we set $\mu = \nabla f(\delta x)$ once and for all, and increase $y$ at a constant rate
$$r = \frac{1}{\ln(1+2d)} \min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\}.$$
These modifications can be easily plugged into the analysis above, allowing us to obtain a constant lower bound in (6.24), and thus to omit the minimization over $x$ in the competitive ratio. Observe that the update of the primal variables remains the same.

Corollary 6.3. For online minimization, the competitive ratio of the algorithm is:
$$\max_{c>0} \min_z \left[ \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\}}{4 \ln(1+2d)} - \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right]. \tag{6.27}$$

Monotone Online Maximization

If our goal is to solve (6.2) and maximize a dual objective function subject to packing constraints, then the above framework indeed increases the dual variables $\mu$; however, the dual variables $y$ can both increase and decrease. Moreover, this potential decrease is essential for the competitive ratio to be independent of the magnitude of entries in the matrix $A$ [GN14]. In settings where decreasing dual variables is not allowed, we need to slightly modify and simplify the online dual update in the algorithm by setting
$$r = \frac{1}{\ln(1+d_t \rho)} \min_{l=1}^n \left\{ \frac{\nabla_l f(\delta x)}{\nabla_l f(x)} \right\},$$
where $\rho$ is an upper bound on $\frac{\max_t \{a_{tj}\}}{\min_{t:\, a_{tj}>0} \{a_{tj}\}}$ for all $1 \le j \le n$, and we skip the last step, which decreases duals. Here, an application of the claim above at any round $t$ and time $\tau(t) \le \tau \le \tau(t)+1$ yields
$$1 \ \ge\ a_{tj} x_j^\tau \ \ge\ \frac{a_{tj}}{\max_{i=1}^t \{a_{ij}\}\, d} \left[ \exp\left( \frac{\ln(1+d\rho)}{\mu_j^\tau} \sum_{i=1}^t a_{ij} y_i \right) - 1 \right], \tag{6.28}$$
which implies $\sum_{i=1}^t a_{ij} y_i \le \frac{\ln\left(1 + d \cdot \frac{\max_{i=1}^t \{a_{ij}\}}{a_{tj}}\right)}{\ln(1+d\rho)} \cdot \mu_j^\tau$, and thus guarantees $\sum_{i=1}^t a_{ij} y_i \le \mu_j^\tau$.

Corollary 6.4. For online maximization, when decreasing dual variables is not allowed, the adjusted algorithm obtains the following competitive ratio:
$$\max_{c>0} \left[ \min_z \frac{\min_{l=1}^n \left\{ \frac{\nabla_l f(z)}{\nabla_l f(cz)} \right\}}{2 \ln(1+\rho d)} - \max_z \left\{ \frac{z \cdot \nabla f(z) - f(z)}{f(cz)} \right\} \right]. \tag{6.29}$$
This results in a worse competitive ratio, but having monotone duals is useful for two reasons: (a) in some settings we need monotone duals, as in the profit maximization application in Section 6.3.3, and (b) we get a simpler algorithm, since we skip the third step of the online dual update involving the dual decrease.

6.3 Applications

We demonstrate how the general framework above can be used to give algorithms for several previously studied as well as new problems. In contrast to previous papers, where a primal-dual algorithm had to be tailored to each of these problems, we use the framework above to solve the underlying convex program, and then apply a suitable rounding algorithm to the fractional solution.

6.3.1 $\ell_p$-norm of Packing Constraints

We consider the problem of solving a mixed packing-covering linear program online, as defined by Azar et al. [ABFP13]. The covering constraints $Ax \ge \mathbf{1}$ arrive online, as in the above setting. There are also $K$ packing constraints $\sum_{j=1}^n b_{kj} x_j \le \lambda_k$ for $k \in [K]$ that are given up-front. The right-hand sides $\lambda_k$ of these packing constraints are themselves variables, and the objective is to minimize $\sum_{k=1}^K \lambda_k^p$, or alternatively $\|\lambda\|_p = \left(\sum_{k=1}^K \lambda_k^p\right)^{1/p}$. All the entries in the constraint matrices $A = (a_{ij})$ and $B = (b_{kj})$ are nonnegative.

Theorem 6.5. There is an $O(p \ln d)$-competitive online algorithm for fractional covering with the objective of minimizing the $\ell_p$-norm of multiple packing constraints.

Proof. In order to apply our framework to this problem, we seek to minimize the convex function
$$f(x) = \frac{1}{p} \|Bx\|_p^p = \frac{1}{p} \sum_{k=1}^K (B_k \cdot x)^p = \frac{1}{p} \sum_{k=1}^K \left( \sum_{j=1}^n b_{kj} x_j \right)^p.$$
This is, up to the factor $1/p$, the $p$th power of the original objective; above, $B_k = (b_{k1}, \dots, b_{kn})$ is the $k$th packing constraint. To obtain the competitive ratio, observe that $\nabla_j f(x) = \sum_{k=1}^K b_{kj} (B_k \cdot x)^{p-1}$. Thus, we have
Thus, we have 79

for all $c > 0$, $z \in \mathbb{R}_+^n$ and $1 \le j \le n$:
$$\frac{f(z)}{f(cz)} = (1/c)^p, \qquad \frac{\nabla_j f(z)}{\nabla_j f(cz)} = (1/c)^{p-1}, \qquad z \cdot \nabla f(z) = \sum_{j=1}^n z_j \sum_{k=1}^K b_{kj} (B_k \cdot z)^{p-1} = p f(z),$$
and therefore
$$\frac{z \cdot \nabla f(z) - f(z)}{f(cz)} = (p-1) \frac{f(z)}{f(cz)} = (p-1)(1/c)^p.$$
Substituting $\delta = 1/c$ and plugging into (6.26), we get:
$$\frac{\text{Dual}}{\text{Primal}} \ \ge\ \frac{\delta^{p-1}}{4 \ln(1+2d)} - (p-1)\delta^p. \tag{6.30}$$
So the primal-dual ratio as a function of $\delta$ is at least $\delta^{p-1}/L - (p-1)\delta^p$, where $L = 4 \ln(1+2d)$. This quantity is maximized when $\delta = \frac{1}{pL}$, leading to a primal-dual ratio of $\left(\frac{1}{pL}\right)^p$. Taking the $p$th root of this quantity gives us that the $\ell_p$-norm of the primal is at most $pL = O(p \ln d)$ times the optimum.

When $p = \Theta(\ln m)$, so that the $\ell_p$ and $\ell_\infty$ norms are within constant factors of each other, we obtain the online mixed packing-covering LP (OMPC) problem studied by Azar et al. [ABFP13]. For this setting, this gives an improved $O(\ln d \ln m)$-competitive ratio, where $d$ is the row sparsity of the matrix $A$, and $m$ is the number of packing constraints. This competitive ratio is known to be tight [ABFP13, Theorem 1.2].

Remark 6.6. The above result also holds if the function $f$ is a sum of distinct powers of linear functions, i.e., $f(x) = \sum_{k=1}^K (B_k \cdot x)^{p_k}$, where $p_1, \dots, p_K \ge 1$ may be non-uniform. In this case, we obtain an $O(p \ln d)$-competitive algorithm, where $p = \max_{k=1}^K p_k$.

6.3.2 Online Set Cover with Multiple Costs

Consider the online set-cover problem [AAA+09] with $n$ sets $\{S_j\}_{j=1}^n$ over some ground set $U$. Apart from the set system, we are also given $K$ linear cost functions $B_k : [n] \to \mathbb{R}_+$ for $k \in [K]$. Elements from $U$ arrive online and must be covered by some set upon arrival; the decision to select a set into the solution is irrevocable. The goal is to maintain a set cover that minimizes the $\ell_p$-norm of the $K$ cost functions. We use Theorem 6.5, along with a rounding scheme similar to [GKP12], to obtain:

Theorem 6.7. There is an $O(p^3 \ln p \cdot \ln d \ln r)$-competitive randomized online algorithm for set cover minimizing the $\ell_p$-norm of multiple cost functions.
Here d is the maximum number of sets containing any element, and r = U is the number of elements. 80
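The homogeneity identities from the proof of Theorem 6.5 above, and the optimization of $\delta \mapsto \delta^{p-1}/L - (p-1)\delta^p$, are easy to sanity-check numerically. The sketch below uses made-up data ($B$, $z$, $c$, and the sparsity $d$ are illustrative, not from the text):

```python
# Numeric sanity check of the identities used in the proof of Theorem 6.5:
#   f(x) = (1/p) * sum_k (B_k . x)^p  satisfies
#   f(cz) = c^p f(z),  d_j f(cz) = c^{p-1} d_j f(z),  sum_j z_j d_j f(z) = p f(z),
# and delta -> delta^{p-1}/L - (p-1) delta^p peaks at delta = 1/(pL)
# with value (1/(pL))^p.  All instance data below is illustrative.
import math

p = 3
B = [[1.0, 2.0, 0.5],
     [0.3, 0.0, 4.0]]          # K = 2 packing rows, n = 3 variables
z = [0.7, 1.1, 0.4]
c = 2.5

def f(x):
    return sum(sum(bk[j] * x[j] for j in range(len(x))) ** p for bk in B) / p

def grad_f(x):
    rows = [sum(bk[j] * x[j] for j in range(len(x))) for bk in B]
    return [sum(B[k][j] * rows[k] ** (p - 1) for k in range(len(B)))
            for j in range(len(x))]

cz = [c * zj for zj in z]
assert abs(f(cz) - c ** p * f(z)) < 1e-9                  # f(cz) = c^p f(z)
g, gc = grad_f(z), grad_f(cz)
assert all(abs(gc[j] - c ** (p - 1) * g[j]) < 1e-9 for j in range(3))
euler = sum(z[j] * g[j] for j in range(3))
assert abs(euler - p * f(z)) < 1e-9                       # Euler's identity

L = 4 * math.log(1 + 2 * 5)    # L = 4 ln(1 + 2d) with d = 5 (illustrative)
ratio = lambda d_: d_ ** (p - 1) / L - (p - 1) * d_ ** p
d_star = 1 / (p * L)
assert abs(ratio(d_star) - d_star ** p) < 1e-12           # optimum value (1/pL)^p
for d_ in (0.5 * d_star, 2 * d_star):
    assert ratio(d_) < ratio(d_star)                      # d_star is the maximizer
```

The final loop only spot-checks two nearby points; the closed-form optimum follows from the first-order condition $(p-1)\delta^{p-2}(1/L - p\delta) = 0$.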

Proof. We use the following convex relaxation. There is a variable $x_j$ for each set $j \in [n]$, which denotes whether this set is chosen.
$$\min\; g(x) = \sum_{k=1}^{K}\Big(\sum_{j=1}^{n} b_{kj}x_j\Big)^p + \sum_{j=1}^{n}\sum_{k=1}^{K} b_{kj}^p\, x_j$$
$$\text{s.t.}\quad \sum_{j:\,e\in S_j} x_j \ge 1 \quad \forall e \in U, \qquad x \ge 0.$$
We can use our framework to solve this fractional convex covering problem online. Although the objective has a linear term in addition to the $p$-th powers, we obtain an $O(p\ln d)^p$-competitive algorithm, as noted in Remark 6.6. Let $C^*$ denote the $p$-th power of the optimal objective of the given set cover instance. Then it is clear that the optimal objective of the above fractional relaxation is at most $2C^*$. Thus the objective of our fractional online solution satisfies $g(x) = O(p\ln d)^p\,C^*$.

To get an integer solution, we use a simple online randomized rounding algorithm. For each set $j \in [n]$, define $X_j$ to be a $\{0,1\}$ random variable with $\Pr[X_j = 1] = \min\{4p\ln r\cdot x_j,\,1\}$. This can easily be implemented online. It is easy to see by a Chernoff bound that each element $e$ is left uncovered with probability at most $\frac{1}{r^{2p}}$. If an element $e$ is not covered by this rounding, we choose the set minimizing $\min_{j=1}^{n}\{\sum_{k=1}^{K} b_{kj}^p : e \in S_j\}$; let $\bar{e} \in [n]$ index this set, and let $C_{\bar{e}} = \sum_{k=1}^{K} b_{k\bar{e}}^p$. Observe that $C_{\bar{e}} \le C^*$ for all $e \in U$.

To bound the $\ell_p$-norm of the cost, let $C_k = \sum_{j=1}^{n} b_{kj}X_j$ be the cost of the randomly rounded solution under the $k$-th cost function, and let $\bar{C} := \sum_{k=1}^{K} C_k^p$. Also, for each element $e \in U$, define:
- $D_{ek} = b_{k\bar{e}}$ for all $k \in [K]$ and $D_e = C_{\bar{e}}$, if $e$ is not covered by the rounding;
- $D_{ek} = 0$ for all $k \in [K]$ and $D_e = 0$, otherwise.

Note that $D_e = \sum_{k=1}^{K} D_{ek}^p$. The $p$-th power of the cost paid by the algorithm is:
$$\hat{C} = \sum_{k=1}^{K}\Big(C_k + \sum_{e\in U} D_{ek}\Big)^p \le 2^p\sum_{k=1}^{K} C_k^p + 2^p\sum_{k=1}^{K}\Big(\sum_{e\in U} D_{ek}\Big)^p \le 2^p\bar{C} + 2^p r^{p-1}\sum_{k=1}^{K}\sum_{e\in U} D_{ek}^p \le 2^p\bar{C} + (2r)^p\sum_{e\in U} D_e. \tag{6.31}$$

We now bound $E[\bar{C}]$. Observe that $E[C_k] \le 4p\ln r\sum_{j=1}^{n} b_{kj}x_j$.
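Before continuing the analysis, here is how the rounding just described might look in code. The instance data is a toy example; coins are flipped lazily, one per set, and uncovered elements fall back to the cheapest backup set:

```python
import math, random

# Online rounding from the proof of Theorem 6.7 (sketch; toy data).
# sets_of[e] lists the sets containing element e; b[k][j] are the K cost
# functions; x[j] is the fractional solution maintained by the online
# framework (frozen here, for illustration only).
rng = random.Random(0)
K = 2
b = [[1.0, 2.0, 1.0, 3.0],
     [2.0, 1.0, 4.0, 1.0]]
sets_of = {'e1': [0, 1], 'e2': [1, 2], 'e3': [2, 3]}
x = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}          # fractional cover (toy)
p = 2
r = len(sets_of)

chosen, coin = set(), {}
def rounded(j):
    # flip set j's coin once: include with prob min{4p ln r * x_j, 1}
    if j not in coin:
        coin[j] = rng.random() < min(4 * p * math.log(r) * x[j], 1.0)
    return coin[j]

for e, js in sets_of.items():                  # elements arrive online
    winners = [j for j in js if rounded(j)]
    if winners:
        chosen.update(winners)
    else:                                      # backup: cheapest set for e
        ebar = min(js, key=lambda j: sum(b[k][j] ** p for k in range(K)))
        chosen.add(ebar)                       # cost C_ebar = sum_k b_k,ebar^p

assert all(any(j in chosen for j in js) for js in sets_of.values())
cost_lp = sum(sum(b[k][j] for j in chosen) ** p for k in range(K)) ** (1 / p)
```

With these toy values the inclusion probability saturates at 1, so every set is picked; the backup branch is exercised only when the coin flips fail for all sets of an element.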
Since each $C_k$ is the sum of independent nonnegative random variables, we can bound $E[C_k^p]$ using a concentration inequality involving $p$-th moments [Lat97]:
$$E[C_k^p] \le K_p\Big(E[C_k]^p + \sum_{j=1}^{n} E[b_{kj}^p X_j^p]\Big) \le K_p\Big((4p\ln r)^p\Big(\sum_{j=1}^{n} b_{kj}x_j\Big)^p + 4p\ln r\sum_{j=1}^{n} b_{kj}^p x_j\Big).$$

Above, $K_p = O(p/\ln p)^p$. By linearity of expectation,
$$E[\bar{C}] = \sum_{k=1}^{K} E[C_k^p] \le K_p(4p\ln r)^p\Big(\sum_{k=1}^{K}\Big(\sum_{j=1}^{n} b_{kj}x_j\Big)^p + \sum_{j=1}^{n}\sum_{k=1}^{K} b_{kj}^p x_j\Big) = K_p(4p\ln r)^p\, g(x).$$
Thus we have $E[\bar{C}] = O\big(\frac{p^3}{\ln p}\ln d\ln r\big)^p C^*$. Observe that
$$E\Big[\sum_{e\in U} D_e\Big] = \sum_{e\in U}\Pr[e\text{ uncovered}]\cdot C_{\bar{e}} \le r\cdot r^{-2p}\, C^* = r^{1-2p}\, C^*.$$
Hence, using these bounds in (6.31), we have
$$E[\hat{C}] \le 2^p E[\bar{C}] + (2r)^p\sum_{e\in U} E[D_e] = O\Big(\frac{p^3}{\ln p}\ln d\ln r\Big)^p C^*.$$
Taking the $p$-th root gives the claimed competitive ratio.

6.3.3 Profit Maximization with Nonseparable Production Costs

We consider a profit maximization problem, called PMPC, for a single seller with production costs for items. There are $m$ items that the seller can produce and sell. The production levels are given by a vector $\mu \in \mathbb{R}^m_+$; the total cost incurred by the seller to produce $\mu_j$ units of every item $j \in [m]$ is $g(\mu)$, for some production cost function $g : \mathbb{R}^m_+ \to \mathbb{R}_+$. In this work we allow functions $g$ which are convex and monotone in a certain sense.³ There are $n$ buyers who arrive online. Each buyer $i \in [n]$ is interested in certain subsets of items (a.k.a. bundles) which belong to some set family $S_i \subseteq 2^{[m]}$. The extent of interest of buyer $i$ in subset $S \in S_i$ is given by $v_i(S)$, where $v_i : S_i \to \mathbb{R}_+$ is her valuation function. If buyer $i$ is allocated a subset $T \in S_i$ of items, she pays the seller her valuation $v_i(T)$. Consider the optimization problem for the seller: he must produce some items and allocate bundles to buyers so as to maximize the profit $\sum_{i=1}^{n} v_i(T_i) - g(\mu)$, where $T_i \in S_i$ denotes the bundle allocated to buyer $i$ and $\mu = \sum_{i=1}^{n}\chi_{T_i} \in \mathbb{R}^m$ is the total quantity of all items produced. Here $\chi_S \in \{0,1\}^m$ is the characteristic vector of the set $S$. Observe that in this paper we consider a non-strategic setting, where the valuation of each buyer is known to the seller; this differs from an auction setting, where the seller has to allocate items to buyers without knowledge of the true valuations, and the buyers may have an incentive to misreport their true valuations.
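A minimal data model makes the PMPC objective concrete. Everything below (the quadratic cost, the bundles, the valuations) is illustrative, not from the text:

```python
# Profit of an integral allocation in PMPC:  sum_i v_i(T_i) - g(mu),
# where mu = sum_i chi_{T_i}.  Toy instance: m = 3 items, 2 buyers,
# and a convex production cost with a nonseparable cross term.
m = 3

def g(mu):                       # illustrative convex production cost
    return sum(u * u for u in mu) + mu[0] * mu[1]

buyers = [                       # (bundle family S_i, valuation v_i)
    ({frozenset({0}), frozenset({0, 1})}, {frozenset({0}): 2.0,
                                           frozenset({0, 1}): 3.5}),
    ({frozenset({2})}, {frozenset({2}): 4.0}),
]

def profit(allocation):          # allocation[i] in S_i, or None = skip buyer
    mu = [0] * m
    value = 0.0
    for (S_i, v_i), T in zip(buyers, allocation):
        if T is None:
            continue
        assert T in S_i          # only feasible bundles may be allocated
        value += v_i[T]
        for j in T:
            mu[j] += 1           # mu = sum_i chi_{T_i}
    return value - g(mu)

best = max(
    profit([t0, t1])
    for t0 in [None, frozenset({0}), frozenset({0, 1})]
    for t1 in [None, frozenset({2})]
)                                # brute force over all integral allocations
```

Brute force is only viable for such tiny instances; the point of the section is to obtain the allocation online via the primal-dual framework instead.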
This class of maximization problems with production costs was introduced by Blum et al. [BGMS11] and more recently studied by Huang and Kim [HK15]. Both of these works dealt with the online auction setting, but both considered a special case where the production costs are separable over items, i.e., where $g(\mu) = \sum_j g_j(\mu_j)$ for some convex functions $g_j$. In contrast, we can handle general production costs $g$, but we do not consider the auction setting.

³The formal conditions on $g$ appear in Assumption 6.3.1.

Our main result is for the fractional version of the problem, where the allocation to each buyer $i$ is allowed to be any point in the convex hull of $S_i$. In particular, we want to solve the following convex program in an online fashion:
$$\text{(D)}\qquad \text{maximize}\quad \sum_{i=1}^{n}\sum_{T\in S_i} v_i(T)\,y_{iT} - g(\mu)$$
$$\text{s.t.}\quad \sum_{T\in S_i} y_{iT} \le 1 \quad \forall i\in[n], \qquad \sum_{i=1}^{n}\sum_{T\in S_i:\, j\in T} y_{iT} - \mu_j \le 0 \quad \forall j\in[m], \qquad y,\mu \ge 0. \tag{6.33}$$
Note that this problem looks like the dual of the covering problems we have been studying in previous sections, and hence is suggestively called (D). Consider the following dual program, which gives an upper bound on the value of (D):
$$\text{(P)}\qquad \text{minimize}\quad \sum_{i=1}^{n} u_i + g^*(x)$$
$$\text{s.t.}\quad u_i + \sum_{j\in T} x_j \ge v_i(T) \quad \forall i\in[n],\, T\in S_i, \qquad u, x \ge 0. \tag{6.34}$$
Again, to be consistent with our general framework, we refer to this minimization covering problem as the primal (P). Observe that this primal-dual pair falls into the general framework of Section 6.2 if we set
$$f(u,x) := \sum_{i=1}^{n} u_i + g^*(x).$$
Indeed, if we were to construct the Fenchel dual of (P) as in Section 6.2, we would again arrive at (D) after some simplification, using the fact that $g^{**} = g$ for any convex function $g$ with subgradients⁴ [Roc70].

⁴A subgradient of $g : \mathbb{R}^m \to \mathbb{R}$ at $u$ is a vector $V_u \in \mathbb{R}^m$ such that $g(w) \ge g(u) + V_u^{\top}(w-u)$ for all $w \in \mathbb{R}^m$.

In order to apply our framework, we now assume that $f$ is continuous, differentiable, and satisfies $\nabla f(z) \ge \nabla f(z')$ for all $z \ge z'$. This translates into the following assumptions on the production function $g$:

Assumption 6.3.1. The function $g^* : \mathbb{R}^m_+ \to \mathbb{R}_+$ (recall $g^*(x) = \sup_{\mu}\{x^{\top}\mu - g(\mu)\}$) is monotone, convex, continuous, differentiable, and has $\nabla g^*(x) \ge \nabla g^*(x')$ for all $x \ge x'$.

Since we require irrevocable allocations, we cannot use the primal-dual algorithm from

Section 6.2.1, since that algorithm could decrease the dual variables $y_{iT}$. Instead, we use the monotone-dual version of the algorithm, which ensures that both primal and dual variables are monotonically raised. We can now use the competitive ratio from (6.29): when $g^*(0) = 0$, this ratio is at least
$$\max_{c>0}\left\{\min_{z}\frac{\min_{l=1}^{n}\,\partial_l g^*(z)/\partial_l g^*(cz)}{2\ln(1+\rho d)}\;-\;\max_{z}\frac{z^{\top}\nabla g^*(z)-g^*(z)}{g^*(cz)}\right\}. \tag{6.35}$$
In this expression, recall that $d$ is the row-sparsity of the covering constraints in (P), i.e., $d = 1 + \max_{i,\,T\in S_i}|T|$. The term $\rho$ is the ratio between the maximum and minimum non-zero valuations any player $i$ has for any set in $S_i$. In other words,
$$\rho \le R := \frac{\max\{v_i(T) : T\in S_i,\, i\in[n]\}}{\min\{v_i(T) : T\in S_i,\, v_i(T) > 0,\, i\in[n]\}}. \tag{6.36}$$

An Efficient Algorithm for (D)

To solve the primal-dual convex programs using our general framework in polynomial time, we need access to the following oracle:

Oracle: Given vectors $u, x$ and an index $i$, find a set $T \in S_i$ such that
$$u_i + \sum_{j\in T} x_j - v_i(T) < 0, \tag{6.37}$$
or else report that no such set exists.

Given such an oracle, we maintain $(u, x)$ such that $(2u, 2x)$ is feasible for (P), as follows. When a new buyer $i$ arrives, we use the oracle on $(2u, 2x)$. While it returns a set $T \in S_i$, we update $(u, x)$ to satisfy this constraint. Otherwise, we know that $(2u, 2x)$ is a feasible solution for (P). This scaling by a factor of $2$ allows us to bound the number of iterations as follows: when buyer $i$ arrives, define $Q_i = \min\{u_i, V^i_{\max}\} + \sum_{j=1}^{m}\min\{x_j, V^i_{\max}\}$, where $V^i_{\max} = \max\{v_i(T) \mid T \in S_i\}$. Note that $Q_i \le (m+1)V^i_{\max}$, and $Q_i$ increases by at least $V^i_{\min}/2$ in each iteration, where $V^i_{\min} = \min\{v_i(T) \mid T\in S_i,\, v_i(T) > 0\}$. So the number of iterations is at most $O(mR)$, where $R$ is defined in (6.36). This gives us a polynomial-time online algorithm if $R$ is polynomially bounded.

What properties do we need from the collections $S_i$ and valuation functions $v_i$ so that we can implement the oracle efficiently? Here are some cases where this is possible.

Small $S_i$. If each $S_i$ is polynomially bounded, then we can solve (6.37) just by enumeration. An example is when each buyer is single-minded, i.e., she wants exactly one bundle.

Supermodular valuations. Here, buyer $i$ has $S_i = 2^{[m]}$ and $v_i : 2^{[m]} \to \mathbb{R}_+$ is supermodular, i.e., $v_i(T_1) + v_i(T_2) \le v_i(T_1\cup T_2) + v_i(T_1\cap T_2)$ for all $T_1, T_2 \subseteq [m]$. In this case, we can solve (6.37) using polynomial-time algorithms for submodular minimization [Sch03], since the expression inside the minimum is a linear function minus a supermodular function.
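For the small-$S_i$ case, oracle (6.37) and the factor-2 feasibility loop can be sketched directly. Note that the additive step rule inside the loop is a placeholder, not the framework's actual primal-dual update, and all instance data is hypothetical:

```python
# Oracle (6.37) by enumeration, plus the scaling-by-2 feasibility loop.
# The while-loop's update step is a stand-in for the framework's real
# primal-dual update; bundles and valuations are toy data.
m = 3
S_i = [frozenset({0}), frozenset({1, 2})]
v_i = {frozenset({0}): 1.0, frozenset({1, 2}): 3.0}

def oracle(u_i, x, i_sets=S_i, i_val=v_i):
    """Return a T in S_i violating u_i + sum_{j in T} x_j >= v_i(T), else None."""
    for T in i_sets:                        # |S_i| is small: enumerate
        if u_i + sum(x[j] for j in T) - i_val[T] < 0:
            return T
    return None

u_i, x = 0.0, [0.0] * m
iters = 0
while True:                                 # maintain (2u, 2x) feasible for (P)
    T = oracle(2 * u_i, [2 * xj for xj in x])
    if T is None:
        break                               # (2u, 2x) is now feasible
    step = v_i[T] / (2 * (len(T) + 1))      # placeholder additive update:
    u_i += step                             # raises 2u + 2*sum_{j in T} x_j
    for j in T:                             # by exactly v_i(T), fixing T
        x[j] += step
    iters += 1

# (2u, 2x) satisfies every constraint of buyer i
assert all(2 * u_i + 2 * sum(x[j] for j in T) >= v_i[T] for T in S_i)
```

Each placeholder update raises the violated constraint's left-hand side (on the scaled point) by exactly $v_i(T)$, so a returned set never repeats; the real algorithm's multiplicative update yields the $O(mR)$ iteration bound via the potential $Q_i$.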

Matroid constrained valuations. In this setting, each buyer $i$ has some value $v_{ij}$ for each item $j \in [m]$, and the feasible bundles $S_i$ are the independent sets of some matroid.⁵ Here we can solve (6.37) by maximizing a linear function over a matroid, because the minimization
$$\min_{T\in S_i}\Big\{\sum_{j\in T} x_j - v_i(T)\Big\} = \min_{T\in S_i}\sum_{j\in T}(x_j - v_{ij}) = -\max_{T\in S_i}\sum_{j\in T}(v_{ij} - x_j)$$
can be done in polynomial time [Sch03].

⁵An alternative description of such valuation functions is to have $S_i = 2^{[m]}$ and $v_i(T)$ equal to the maximum weight of an independent subset of $T$, where each item $j$ has weight $v_{ij}$. Viewed this way, the buyer's valuation is a weighted matroid rank function, which is a special submodular function.

Online Rounding

We now have a deterministic online algorithm for (D) with competitive ratio as given in (6.35). Moreover, this algorithm runs in polynomial time for many special cases. Here we show how the fractional online solution can be rounded to give integral allocations. We make the following additional assumption on the production costs.

Assumption 6.3.2. There is a constant $\beta > 1$ such that $g(a\mu) \le a^{\beta} g(\mu)$ for all $0 < a < 1$, $\mu \in \mathbb{R}^m_+$.

Theorem 6.8. For any $\epsilon \in [\frac12, 1)$, there is a randomized online algorithm for PMPC under Assumptions 6.3.1 and 6.3.2 that achieves expected profit at least
$$\left(1-\frac{1}{m}\right)\frac{(1+\epsilon)^{-2\beta/(\beta-1)}}{\alpha}\cdot\mathrm{opt} - g(L\cdot\mathbf{1}),$$
where opt is the offline optimal profit, $\alpha$ is the fractional competitive ratio, and $L = O\big(\frac{\ln m}{\epsilon^2}\big)$.

Note that the additive error term $g(L\cdot\mathbf{1})$ is independent of the number $n$ of buyers: it depends only on the number $m$ of items and the production function $g$. We also give an example below which shows that any rounding algorithm for (D) must incur some such additive error.

We now describe the rounding algorithm. Let $\epsilon \in [\frac12, 1)$ be any value; set $a = (1+\epsilon)^{-2\beta/(\beta-1)}$. The rounding algorithm scales the fractional allocation $y$ by the factor $a < 1$ and performs randomized rounding. Let $M \in \mathbb{Z}^m_+$ denote the integral quantities of the different items produced at any point in the online rounding. Upon arrival of buyer $i$, the algorithm does the following.
1. Update the fractional solution $(y,\mu)$ according to the fractional online algorithm.
2. If $M_j > (1+\epsilon)a\mu_j + \frac{6}{\epsilon}\ln m$ for any $j \in [m]$, then skip buyer $i$.
3. Else, allocate set $T \in S_i$ to buyer $i$ with probability $a\cdot y_{iT}$.

Claim 6.3.3. $\Pr[M_j > (1+\epsilon)a\mu_j + \frac{6}{\epsilon}\ln m] \le \frac{1}{m^2}$ for all items $j \in [m]$ and $\epsilon \in [\frac12, 1)$.

Proof. Fix $j \in [m]$ and $\epsilon \in [\frac12, 1)$. Note that $M_j$ is the sum of independent 0-1 random variables with $E[M_j] \le a\mu_j$. The claim now follows by the Chernoff bound $\Pr[M_j > (1+\delta)E[M_j]] \le \exp\big(-\frac{\delta^2}{2+\delta}E[M_j]\big)$, when setting $\delta = \epsilon + \frac{6\ln m}{\epsilon\,E[M_j]}$.
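Claim 6.3.3 can also be checked empirically: draw $M_j$ as a sum of independent Bernoullis with mean at most $a\mu_j$ and count threshold violations. All parameters below are illustrative, and with a fixed seed the estimate is deterministic:

```python
import math, random

# Empirical check of Claim 6.3.3:
#   Pr[M_j > (1+eps)*a*mu_j + (6/eps)*ln m] <= 1/m^2.
# Toy parameters; M_j is a sum of independent Bernoulli(a*y) variables,
# so its mean is a*mu_j.
rng = random.Random(1)
m, eps, beta = 50, 0.5, 2.0
a = (1 + eps) ** (-2 * beta / (beta - 1))    # a = (1+eps)^{-2*beta/(beta-1)}
y = [0.25] * 40                              # fractional allocations hitting item j
mu_j = sum(y)                                # mu_j = 10, so E[M_j] = a*mu_j
threshold = (1 + eps) * a * mu_j + (6 / eps) * math.log(m)

trials, bad = 5000, 0
for _ in range(trials):
    M_j = sum(rng.random() < a * yt for yt in y)
    bad += M_j > threshold
assert bad / trials <= 1 / m ** 2 + 0.01     # small slack for sampling noise
```

For these parameters the additive $\frac{6}{\epsilon}\ln m$ term alone already exceeds the maximum possible $M_j$, so no violations occur at all; the bound only becomes tight for much larger $\mu_j$.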

Below, $l := \frac{6}{\epsilon}\ln m + 1$ and $L := \frac{1+\epsilon}{\epsilon}\,l = O\big(\frac{\ln m}{\epsilon^2}\big)$.

Lemma 6.3.4. The expected objective of the integral allocation is at least
$$a\left(1-\frac{1}{m}\right)\left(\sum_{i=1}^{n}\sum_{T\in S_i} v_i(T)\,y_{iT} - g(\mu)\right) - g(L\cdot\mathbf{1}).$$

Proof. For convenience, assume that $m \ge 3$.⁶ Note that the algorithm ensures, in step 2 above, that $M \le (1+\epsilon)a\mu + l\cdot\mathbf{1}$. So with probability one, the production cost is at most:
$$g\big((1+\epsilon)a\mu + l\cdot\mathbf{1}\big) = g\Big(\tfrac{1}{1+\epsilon}\cdot(1+\epsilon)^2 a\mu + \tfrac{\epsilon}{1+\epsilon}\cdot\tfrac{1+\epsilon}{\epsilon}\,l\cdot\mathbf{1}\Big) \le \tfrac{1}{1+\epsilon}\,g\big((1+\epsilon)^2 a\mu\big) + \tfrac{\epsilon}{1+\epsilon}\,g\big(\tfrac{1+\epsilon}{\epsilon}\,l\cdot\mathbf{1}\big)$$
$$\le \tfrac{1}{1+\epsilon}\big((1+\epsilon)^2 a\big)^{\beta}g(\mu) + g(L\cdot\mathbf{1}) = \tfrac{1}{1+\epsilon}\,a\,g(\mu) + g(L\cdot\mathbf{1}) \le a\left(1-\tfrac{1}{m}\right)g(\mu) + g(L\cdot\mathbf{1}).$$
The first inequality holds by convexity of $g$. The second inequality uses Assumption 6.3.2 and $(1+\epsilon)^2 a < 1$. The next equality is by the definition of $a$. The last inequality follows as $\epsilon \ge \frac12$ and $m \ge 3$.

By Claim 6.3.3 (and a union bound over the $m$ items), the probability that we skip some buyer $i$ is at most $\frac{1}{m}$. Thus the expected total value is at least $\big(1-\frac{1}{m}\big)a\sum_{i=1}^{n}\sum_{T\in S_i} v_i(T)\,y_{iT}$. Subtracting the upper bound on the cost from the expected value, we obtain the lemma.

This completes the proof of Theorem 6.8.

Integrality Gap. We note that the additive error term is necessary for any algorithm based on the convex relaxation (D). Consider a single buyer with $S_1 = 2^{[m]}$ and $v_1(T) = |T|$. Let $g(\mu) = \sum_{j=1}^{m}\mu_j^2$. The optimal integral allocation clearly has profit zero. However, the fractional optimum is $\Omega(m)$, due to the feasible solution with $y_{1T} = 2^{-m}$ for all $T \subseteq [m]$ and $\mu_j = \frac12$ for all $j \in [m]$. Thus any algorithm using this relaxation incurs an additive error depending on $m$.

Examples of Production Costs

Here we give two examples of production costs satisfying Assumptions 6.3.1 and 6.3.2 to which our results apply. In each case, we first show the competitive ratio obtained for the fractional convex program, and then use the rounding algorithm to obtain an integral solution.

⁶If there are only one or two items we can always add a dummy item, without affecting the competitive ratio by more than a constant factor.
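The integrality-gap instance can be verified exactly for a small $m$ by enumerating all $2^m$ bundles. The code mirrors the construction ($v_1(T) = |T|$, $g(\mu) = \sum_j \mu_j^2$, $y_{1T} = 2^{-m}$, $\mu_j = \frac12$):

```python
from itertools import combinations

# Integrality gap for relaxation (D): single buyer, v_1(T) = |T|,
# g(mu) = sum_j mu_j^2.  Every integral allocation has profit 0, while
# the fractional solution y_{1T} = 2^{-m}, mu_j = 1/2 has profit m/4.
m = 8
items = range(m)

def g(mu):
    return sum(u * u for u in mu)

# integral: allocating T yields |T| - g(chi_T) = |T| - |T| = 0
best_int = max(
    len(T) - g([1.0 if j in T else 0.0 for j in items])
    for size in range(m + 1)
    for T in combinations(items, size)
)
assert best_int == 0

# fractional: uniform weight 2^{-m} on every bundle
frac_value = sum(size * len(list(combinations(items, size)))
                 for size in range(m + 1)) / 2 ** m       # = m/2
frac_cost = g([0.5] * m)                                  # = m/4
frac_profit = frac_value - frac_cost                      # = m/4
assert abs(frac_profit - m / 4) < 1e-9
```

The average bundle size under the uniform distribution is $m/2$, which is exactly where the $\Omega(m)$ fractional profit comes from.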

Example 1. Consider a seller who can produce items in $K$ different factories, where the $k$-th factory produces, in one hour of work, $p_{kj}$ units of item $j$. The production cost is the sum of the $q$-th powers of the work hours of the $K$ factories (specifically, we get a linear production cost for $q = 1$, and the $q$-th power of the makespan when $q \ge \ln K$). This corresponds to the following function:
$$g(\mu) = \min_{z}\Big\{\frac{1}{q}\sum_{k=1}^{K} z_k^q \;:\; \sum_{k=1}^{K} p_{kj}z_k \ge \mu_j \;\;\forall j\in[m],\;\; z \ge 0\Big\}. \tag{6.38}$$
We scale the objective by $1/q$ to get a more convenient form. The dual function is:
$$g^*(x) = \frac{1}{p}\sum_{k=1}^{K}\Big(\sum_{j=1}^{m} p_{kj}x_j\Big)^p, \qquad \text{where } \frac1p + \frac1q = 1.$$
Applying our framework (Assumption 6.3.1 is satisfied) as in Section 6.3.1, we obtain an $\alpha = O(p\ln(\rho d))$-competitive fractional online algorithm, where $\rho \le R$ (the maximum-to-minimum ratio of valuations) and the row-sparsity is $d \le m+1$. We note that our framework obtains a monotone feasible solution $(y,\mu)$, but does not explicitly assign a value to $z$. To obtain a monotone feasible $z$, we use the fact that, by the algorithm's construction,
$$\mu_j = \partial_j g^*(\delta x) = \delta^{p-1}\sum_{k=1}^{K} p_{kj}\Big(\sum_{j'=1}^{m} p_{kj'}x_{j'}\Big)^{p-1}.$$
Hence we can define the assignment
$$z_k = \delta^{p-1}\Big(\sum_{j=1}^{m} p_{kj}x_j\Big)^{p-1},$$
which is monotonically increasing (as $x$ is monotonically increasing), and consistent with the dual feasible solution $(y,\mu)$. Furthermore, the ratio between the values of the primal and dual programs matches exactly the competitive ratio stated in Theorem 6.2. Therefore, combined with Theorem 6.8 (note that Assumption 6.3.2 is satisfied with $\beta = q$), setting $\epsilon = \frac12$, we obtain:

Corollary 6.9. There is a randomized online algorithm for PMPC with cost function (6.38), for $q > 1$, that achieves expected profit at least
$$\left(1-\frac{1}{m}\right)\frac{\mathrm{opt}}{O(p\ln(Rd))} - g(O(\ln m)\cdot\mathbf{1}).$$
Note that $g(O(\ln m)\cdot\mathbf{1}) \le K\big(\frac{O(\ln m)}{p_{\min}}\big)^q$, where $p_{\min} > 0$ is the minimum positive entry among the $p_{kj}$'s.

Example 2. This deals with the dual of the above production cost. Suppose there are $K$ different linear cost functions: for $k \in [K]$, the $k$-th cost function is given by $(c_{k1},\ldots,c_{km})$, where $c_{kj}$ is the cost per unit of item $j \in [m]$. The production cost $g$ is defined to be the scaled sum of the $p$-th powers of these $K$ costs:
$$g(\mu) = \frac{1}{p}\sum_{k=1}^{K}\Big(\sum_{j=1}^{m} c_{kj}\mu_j\Big)^p. \tag{6.39}$$
This has dual:
$$g^*(x) = \min_{z}\Big\{\frac{1}{q}\sum_{k=1}^{K} z_k^q \;:\; \sum_{k=1}^{K} c_{kj}z_k \ge x_j \;\;\forall j\in[m],\;\; z \ge 0\Big\}, \qquad \text{where } \frac1p+\frac1q=1.$$
The primal program (P$'$), obtained from (P) after eliminating the variables $\{x_j\}_{j=1}^{m}$, is given below with its dual (D$'$):
$$\text{(P}'\text{)}\quad \text{minimize } \sum_{i=1}^{n} u_i + \frac{1}{q}\sum_{k=1}^{K} z_k^q \quad \text{s.t.}\quad u_i + \sum_{j\in T}\sum_{k=1}^{K} c_{kj}z_k \ge v_i(T) \;\;\forall i\in[n],\,T\in S_i; \qquad u, z \ge 0.$$
$$\text{(D}'\text{)}\quad \text{maximize } \sum_{i=1}^{n}\sum_{T\in S_i} v_i(T)\,y_{iT} - \frac{1}{p}\sum_{k=1}^{K}\lambda_k^p \quad \text{s.t.}\quad \sum_{T\in S_i} y_{iT} \le 1 \;\;\forall i\in[n]; \quad \sum_{i=1}^{n}\sum_{T\in S_i}\sum_{j\in T} c_{kj}\,y_{iT} - \lambda_k \le 0 \;\;\forall k\in[K]; \qquad y,\lambda \ge 0.$$
Note that the row-sparsity in (P$'$) is $d' = K+1$, which is incomparable to $m$. We obtain a solution to (P) by setting $x = Cz$ from any solution $(u,z)$ to (P$'$), where $C \in \mathbb{R}^{m\times K}$ has $k$-th column $(c_{k1},\ldots,c_{km})$. We can apply our algorithm to the convex covering problem (P$'$), as $\bar{g}(\lambda) = \frac{1}{p}\sum_{k=1}^{K}\lambda_k^p$ satisfies Assumption 6.3.1. This algorithm maintains monotone feasible solutions $(u,z)$ to (P$'$) and $(y,\lambda)$ to (D$'$). However, to solve (D) online we need to maintain the variables $(y,\mu)$, which differ from the variables $(y,\lambda)$ in (D$'$). We keep $y$ in (D) the same as in (D$'$), and set the production quantities $\mu_j = \sum_{i=1}^{n}\sum_{T\in S_i}\mathbf{1}[j\in T]\,y_{iT}$, so that all constraints in (D) are satisfied. Note that the dual variables $y$ (allocations) and $\mu$ (production quantities) are monotonically increasing, so this is a valid online algorithm. In order to bound the objective in (D), we use the feasible solution $(y,\lambda)$ of (D$'$). Note that for all $k \in [K]$:
$$c_k^{\top}\mu = \sum_{j=1}^{m} c_{kj}\mu_j = \sum_{j=1}^{m} c_{kj}\sum_{i=1}^{n}\sum_{T\in S_i}\mathbf{1}[j\in T]\,y_{iT} = \sum_{i=1}^{n}\sum_{T\in S_i} y_{iT}\sum_{j\in T} c_{kj} \le \lambda_k.$$
So the objective of $(y,\mu)$ in (D) is at least that of $(y,\lambda)$ in (D$'$). Our general framework then implies a competitive ratio for the fractional problem of $\alpha = O(q\ln(\rho' d')) = O(q\ln\rho')$, where
$$\rho' \le R\cdot K\cdot\frac{\max\{c_{kj} : k\in[K],\,j\in[m]\}}{\min\{c_{kj} : k\in[K],\,j\in[m]\}}.$$
Above, $R$ is the maximum-to-minimum ratio of valuations, and recall that $d' \le K+1$. Combined with Theorem 6.8 (with $\epsilon = \frac12$), we obtain:

Corollary 6.10. There is a randomized online algorithm for PMPC with cost function (6.39), for $p > 1$, that achieves expected profit at least
$$\left(1-\frac{1}{m}\right)\frac{\mathrm{opt}}{O(q\ln\rho')} - g(O(\ln m)\cdot\mathbf{1}).$$
Here $g(O(\ln m)\cdot\mathbf{1}) \le K\big(O(m\ln m)\,c_{\max}\big)^p$, where $c_{\max}$ is the maximum entry among the $c_{kj}$'s.
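The reduction in Example 2 hinges on the mapping $\mu_j = \sum_i\sum_{T\ni j} y_{iT}$ keeping $(y,\mu)$ feasible for (D); the computation $c_k^{\top}\mu \le \lambda_k$ can be checked mechanically on toy data (all numbers below are illustrative):

```python
# Example 2 reduction check: given (y, lambda) feasible for (D'),
# setting mu_j = sum_i sum_{T in S_i: j in T} y_iT gives
# c_k . mu <= lambda_k for every k, so (y, mu) is feasible for (D).
m, K = 3, 2
c = [[1.0, 2.0, 1.0],            # c[k][j]: unit cost of item j under cost k
     [3.0, 0.5, 2.0]]

# y[i] maps bundle -> fractional allocation (toy; sum_T y_iT <= 1)
y = [{frozenset({0, 1}): 0.4, frozenset({2}): 0.3},
     {frozenset({1, 2}): 0.6}]

# lambda_k from (D')'s packing constraint, taken tight here
lam = [sum(yT * sum(c[k][j] for j in T) for yi in y for T, yT in yi.items())
       for k in range(K)]

mu = [sum(yT for yi in y for T, yT in yi.items() if j in T) for j in range(m)]

for k in range(K):
    ck_mu = sum(c[k][j] * mu[j] for j in range(m))
    assert ck_mu <= lam[k] + 1e-9          # c_k . mu <= lambda_k

# hence the D-objective of (y, mu) is at least that of (y, lambda) in (D')
p = 2
g_mu = sum(sum(c[k][j] * mu[j] for j in range(m)) ** p for k in range(K)) / p
g_lam = sum(l ** p for l in lam) / p
assert g_mu <= g_lam + 1e-9
```

Since $\lambda$ was taken tight, the inequality holds with equality here; in general the online algorithm may leave slack in the packing constraints of (D$'$), which only helps (D)'s objective.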

Bibliography

[AAA+03] N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, and J. Naor. The online set cover problem. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC, pages , [AAA+06] N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, and J. Naor. A general approach to online network optimization problems. ACM Transactions on Algorithms, 2(4): , [AAA+09] N. Alon, B. Awerbuch, Y. Azar, N. Buchbinder, and J. Naor. The online set cover problem. SIAM Journal on Computing, 39(2): , [ABBS10] J. Abernethy, P. L. Bartlett, N. Buchbinder, and I. Stanton. A regularization approach to metrical task systems. In Proceedings of the 21st International Conference on Algorithmic Learning Theory, ALT, pages , [ABFP13] Y. Azar, U. Bhaskar, L. K. Fleischer, and D. Panigrahi. Online mixed packing and covering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [ABL+13] L. Andrew, S. Barman, K. Ligett, M. Lin, A. Meyerson, A. Roytman, and A. Wierman. A tale of two metrics: simultaneous bounds on competitiveness and regret. In SIGMETRICS, pages , [ACER12] A. Adamaszek, A. Czumaj, M. Englert, and H. Räcke. An O(log k)-competitive algorithm for generalized caching. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [ACN00] D. Achlioptas, M. Chrobak, and J. Noga. Competitive analysis of randomized paging algorithms. Theoretical Computer Science, 234: , [ACP14] Y. Azar, I. R. Cohen, and D. Panigrahi. Online covering with convex objectives and applications. CoRR, abs/ ,

[ACV13] J. Abernethy, Y. Chen, and J. Wortman Vaughan. Efficient market making via convex optimization, and a connection to online learning. ACM Transactions on Economics and Computation, 1(2):12:1-12:39, [AGK12] S. Anand, N. Garg, and A. Kumar. Resource augmentation for weighted flow-time explained by dual fitting. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages . SIAM, [BB97] A. Blum and C. Burch. On-line learning and the metrical task system problem. In Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT, pages 45-53, [BB00] A. Blum and C. Burch. On-line learning and the metrical task system problem. Machine Learning, 39(1):35-58, [BBK99] A. Blum, C. Burch, and A. T. Kalai. Finely-competitive paging. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS, pages , [BBMN11] N. Bansal, N. Buchbinder, A. Madry, and J. Naor. A polylogarithmic-competitive algorithm for the k-server problem. In Proceedings of the 52nd Annual Symposium on Foundations of Computer Science, FOCS, pages , [BBN10] N. Bansal, N. Buchbinder, and J. Naor. Towards the randomized k-server conjecture: a primal-dual approach. In Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 40-55, [BBN12a] N. Bansal, N. Buchbinder, and J. Naor. A primal-dual randomized algorithm for weighted paging. Journal of the ACM, 59(4):19:1-19:24, [BBN12b] N. Bansal, N. Buchbinder, and J. Naor. Randomized competitive algorithms for generalized caching. SIAM Journal on Computing, 41(2): , [BCK02] A. Blum, S. Chawla, and A. T. Kalai. Static optimality and dynamic search-optimality in lists and trees. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1-8, [BCP09] N. Bansal, Ho-Leung Chan, and K. Pruhs. Speed scaling with an arbitrary power function.
In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages . Society for Industrial and Applied Mathematics,

[Bel66] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, [Ben13] M. Bensimhoun. A note on the mediant inequality. [BETW01] M. Brehob, R. Enbody, E. Torng, and S. Wagner. On-line restricted caching. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [BEWT04] M. Brehob, R. Enbody, S. Wagner, and E. Torng. Optimal replacement is NP-hard for nonstandard caches. IEEE Transactions on Computers, 53(1):73-76, [BEY98] A. Borodin and R. El-Yaniv. Online computation and competitive analysis. Cambridge University Press, [BG13] N. Buchbinder and R. Gonen. Incentive compatible multi-unit combinatorial auctions: a primal dual approach. Algorithmica, pages 1-24, [BGMS11] A. Blum, A. Gupta, Y. Mansour, and A. Sharma. Welfare and profit maximization with production costs. In Proceedings of the 52nd Annual Symposium on Foundations of Computer Science, FOCS, pages 77-86, [BKRS00] A. Blum, H. J. Karloff, Y. Rabani, and M. E. Saks. A decomposition theorem for task systems and bounds for randomized server problems. SIAM Journal on Computing, 30(5): , [BL10] J. M. Borwein and A. S. Lewis. Convex analysis and nonlinear optimization: theory and examples, volume 3. Springer Science & Business Media, [BLS92] A. Borodin, N. Linial, and M. E. Saks. An optimal on-line algorithm for metrical task system. Journal of the ACM, 39(4): , [BN05] N. Buchbinder and J. Naor. Online primal-dual algorithms for covering and packing problems. In Proceedings of the 13th Annual European Conference on Algorithms, ESA, pages , [BN09a] N. Buchbinder and J. Naor. The design of competitive online algorithms via a primal-dual approach. Foundations and Trends in Theoretical Computer Science, 3(2-3):93-263, [BN09b] N. Buchbinder and J. Naor. Online primal-dual algorithms for covering and packing. Mathematics of Operations Research, 34(2): ,

[BNS13] N. Buchbinder, J. Naor, and R. Schwartz. Simplex partitioning via exponential clocks and the multiway cut problem. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC, pages , [BPS09] N. Bansal, K. Pruhs, and C. Stein. Speed scaling for weighted flow time. SIAM Journal on Computing, 39(4): , [BV04] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, [CBL06] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, [CCPV11] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6): , [CFLP00] R. D. Carr, L. K. Fleischer, V. J. Leung, and C. A. Phillips. Strengthening integrality gaps for capacitated network design and covering problems. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [CMEDV10] K. Crammer, Y. Mansour, E. Even-Dar, and J. Wortman Vaughan. Regret minimization with concept drift. In Conference on Computational Learning Theory (COLT), pages , [CT91] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, [Cun84] W. H. Cunningham. Testing membership in matroid polyhedra. Journal of Combinatorial Theory, Series B, 36(2): , [DH14] N. R. Devanur and Z. Huang. Primal dual gives almost optimal energy efficient online algorithms. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [EDKMM09] E. Even-Dar, R. Kleinberg, S. Mannor, and Y. Mansour. Online learning for global cost functions. In Conference on Computational Learning Theory (COLT), [Edm70] J. Edmonds. Submodular functions, matroids, and certain polyhedra. Combinatorial Structures and Their Applications, pages 69-87,

[Edm71] J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1(1): , [EvS07] L. Epstein and R. van Stee. Calculating lower bounds for caching problems. Computing, 80(3): , [FKL+91] A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young. Competitive paging algorithms. Journal of Algorithms, 12(4): , [FKM05] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [FL93] J. Friedman and N. Linial. On convex body chasing. Discrete & Computational Geometry, 9: , [FMS02] A. Fiat, M. Mendel, and S. S. Seiden. Online companion caching. In Proc. of the 10th Annual European Symposium on Algorithms, ESA, pages , [FS97] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1): , [GKP10] A. Gupta, R. Krishnaswamy, and K. Pruhs. Scalably scheduling power-heterogeneous processors. In Automata, Languages and Programming, pages . Springer, [GKP12] A. Gupta, R. Krishnaswamy, and K. Pruhs. Online primal-dual for nonlinear optimization with applications to speed scaling. In Workshop on Approximation and Online Algorithms, WAOA, pages , [GLS01] A. J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3): , [GN14] A. Gupta and V. Nagarajan. Approximating sparse covering integer programs online. Mathematics of Operations Research, 39(4): , [Gor99] G. J. Gordon. Regret bounds for prediction problems. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT, pages 29-40,

[GTW14] A. Gupta, K. Talwar, and U. Wieder. Changing bases: Multistage optimization for matroids and matchings. In Automata, Languages, and Programming - 41st International Colloquium, ICALP, pages , [Gut11] S. B. Guthery. A motif of mathematics. Docent Press, [Haz09] E. Hazan. A survey: The convex optimization approach to regret minimization. [HK15] Z. Huang and A. Kim. Welfare maximization with production costs: A primal dual approach. pages 59-72, [HKKA06] E. Hazan, A. T. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Learning Theory, pages . Springer Berlin / Heidelberg, [HS09] E. Hazan and C. Seshadhri. Efficient learning algorithms for changing environments. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML, pages , [HW69] F. Harary and D. Welsh. Matroids versus graphs. In The Many Facets of Graph Theory, volume 110 of Lecture Notes in Mathematics, pages . [HW98] M. Herbster and M. K. Warmuth. Tracking the best expert. Machine Learning, 32(2): , [Kor05] S. Korman. On the use of randomness in the online set cover problem. M.Sc. thesis, Weizmann Institute of Science, [KSST12] S. Kakade, S. Shalev-Shwartz, and A. Tewari. Regularization techniques for learning with matrices. The Journal of Machine Learning Research, 13(1): , [KV05] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3): , [KW97] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, [KWK10] W. M. Koolen, M. K. Warmuth, and J. Kivinen. Hedging structured concepts. In Conference on Computational Learning Theory (COLT), pages ,

[Lat97] R. Latała. Estimation of moments of sums of independent real random variables. The Annals of Probability, 25(3): , [Law76] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Dover Books on Mathematics Series. Dover Publications, [LW94] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2): , [MS91] L. A. McGeoch and D. D. Sleator. A strongly competitive randomized paging algorithm. Algorithmica, 6(1-6): , [NN94] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Methods in Convex Programming. Studies in Applied and Numerical Mathematics. Society for Industrial and Applied Mathematics, [Pes03] E. Peserico. Online paging with arbitrary associativity. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages , [Rak09] A. Rakhlin. Lecture notes on online learning (draft). [Roc70] R. T. Rockafellar. Convex analysis. Number 28. Princeton University Press, [RST11] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: beyond regret. In Conference on Computational Learning Theory (COLT), pages , [Sch03] A. Schrijver. Combinatorial optimization: polyhedra and efficiency, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, [SS11] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, [ST85] D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2): , [Vaz01] V. V. Vazirani. Approximation Algorithms. Springer-Verlag New York, Inc., [VCZ11] J. Vondrák, C. Chekuri, and R. Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC, pages ,

[Whi35] H. Whitney. On the abstract properties of linear dependence. American Journal of Mathematics, 57(3): , [Zin03] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, ICML, pages ,


108 פרק 6 עוסק בבעיות כיסוי מקוונות הנדרשות למזער פונקציית מטרה קמורה, שבהן אילוצי הכיסוי מגיעים לאורך הזמן, וכן בבעיה הדואלית המתאימה המורכבת מאילוצי אריזה ומפונקציית מטרה קעורה. אנו מציגים טכניקה פרימאלית דואלית לפתרון שני סוגי הבעיות, עם יחס תחרותיות התלוי בתכונות מונוטוניות ו חלקות מסוימות של פונקציית המטרה, אשר משווה ואף משפר את החסמים המוכרים למשפחות ספציפיות של פונקציות מטרה. הטכניקה החדשה הינה הרחבה של הטכניקה הפרימאלית דואלית לתוכניות ליניאריות שפותחה בתחום ניתוח התחרותיות, תוך שימוש בדואליות ובפונקציות צמודות כלים אנליטיים שנחקרו רבות בתחום הלמידה והאופטימיזציה המקוונת. iii

109 מצב, עולם ניתוח התחרותיות מניח קיום של עלות תנועה למעבר בין מצבים. בעולם הלמידה המקוונת, הדבר יהיה שקול להגדרת עלות נוספת על כל החלפה של פעולה בין סיבובים עלות שעל פי רוב אינה מוגדרת. הבדל נוסף הוא שבניתוח תחרותיות אנו מניחים כי מקבל ההחלטה ראשית נחשף לעלויות של הסיבוב הנוכחי, ורק לאחר מכן בוחר כיצד לפעול. לעומת זאת, בלמידה מקוונת מקבל ההחלטה נחשף לעלויות רק לאחר שבחר פעולה. בעבודה זו, אנו מנסים לחבר בין שני העולמות: למידה מקוונת וניתוח תחרותיות. ניתן לסווג ניסיון זה לשני קווי פעולה. קו הפעולה הראשון שואף לנצל את הדמיון בין שני התחומים על מנת להציג גישה אלגוריתמית מאוחדת, אשר משיגה גם חרטה אופטימלית וגם יחס תחרותיות אופטימלי, עבור מחלקה גדולה של בעיות בשני התחומים. קו הפעולה השני מטרתו לגשר על הפער האנליטי השורר בין שני התחומים. כלומר, בעוד ששתי קהילות המחקר פעלו בנפרד, שיטות אלגוריתמיות וכלים אנליטיים שונים התפתחו בתחום אחד ללא התייחסות מרובה מצד התחום השני. במחקר זה אנו מנסים לשאול טכניקות שונות, במיוחד מעולם הלמידה, על מנת להשיג תוצאות חדשות בתחום ניתוח התחרותיות. מרבית התיזה עוסקת בעניין זה. תרומה וחלוקה לפרקים בפרק 3 אנחנו מתארים גישה חדשה לפיתוח אלגוריתמים תחרותיים המתבססת על רגולריזציה טכניקה נפוצה בתחום הלמידה, ובפרט בתחום האופטימיזציה המקוונת. באמצעות גישה זו אנו מציגים אלגוריתם דטרמיניסטי כללי למציאת פתרון תחרותי לא בהכרח בשלמים אשר מספק אוסף אילוצי כיסוי אשר משתנה עם הזמן. שיטה זו מאפשרת לנו להגדיר ולפתור מגוון בעיות הכוללות הן עלויות לביצוע פעולות והן עלויות תנועה, עבור מכלול של יישומים. בנוסף לכך אנו מציגים אלגוריתם תחרותי אקראי לבעיית הכיסוי בקבוצות עם מחירי שירות, גירסא מקוונת של בעיית הכיסוי בקבוצות [03 + [AAA Online Set Cover, שמאפשרת גם הוספה וגם הסרה של קבוצות מתוך הפתרון לאורך שלבי הביצוע. בפרק 4, אנו מאמצים את גישת הרגולריזציה לפיתוח אלגוריתם מאוחד, אשר באמצעות כיול משתנים מאפשר להשיג חרטה אופטימלית עבור בעיות למידה מקוונת מצד אחד ויחס תחרותיות אופטימלי עבור בעיות ניתוח תחרותיות מאידך. האלגוריתם מאפשר לנו גם להשיג חסמי חרטה חדשים אל מול מומחים משתנים drifting,experts תוצאה שעשויה לעורר עניין באופן בלתי תלוי. 
Moreover, the new approach allows us to extend these results to problems in which the set of actions in each round is more complex and has certain properties that can be represented as a matroid. In Chapter 5 we use the tools developed so far to solve the restricted caching problem. Caching is a classical problem in the field of online algorithms: we are given a cache that can hold a relatively small number of memory pages, requests for different pages arrive one after another, and whenever a request arrives for a page that is not in the cache the algorithm pays for bringing that page in. The goal is to minimize the total cost the algorithm pays for bringing pages into the cache. Restricted caching extends this problem by associating with each page a subset of cache locations in which it may be stored. To the best of our knowledge, we present the first randomized competitive algorithm achieving a polylogarithmic competitive ratio for this problem. We also match and even improve existing results for special cache architectures studied in the past.
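To make the cost model concrete, here is a toy sketch of restricted caching (illustrative only: the function name and the naive LRU-style eviction rule are ours, not the thesis's randomized algorithm):

```python
# Toy cost model for restricted caching: each page may only occupy a
# given subset of cache slots, and every miss costs one fetch.

def restricted_caching_cost(requests, allowed, num_slots):
    """Serve `requests` with a naive policy: on a miss, place the page in
    a free allowed slot if one exists, otherwise evict the least recently
    used page among its allowed slots. Returns the total fetch cost."""
    slots = [None] * num_slots          # slots[i] = page currently stored
    last_used = {}                      # page -> time of last request
    cost = 0
    for t, page in enumerate(requests):
        last_used[page] = t
        if page in slots:
            continue                    # hit: no cost
        cost += 1                       # miss: pay to fetch the page
        candidates = allowed[page]      # slots this page may occupy
        free = [i for i in candidates if slots[i] is None]
        if free:
            slots[free[0]] = page
        else:
            # evict the least recently used occupant of an allowed slot
            victim = min(candidates, key=lambda i: last_used[slots[i]])
            slots[victim] = page
    return cost
```

For example, with two slots where pages a and b may only use slot 0 and page c only slot 1, the sequence a, b, a, c incurs four fetches, whereas with unrestricted placement the same sequence needs only three; the slot restrictions are exactly what makes the problem harder than classical caching.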

Abstract

The field of online learning, within the theory of decision making, captures the difficulty faced by a decision maker who must repeatedly make decisions without knowing what the future holds. In each iteration, the decision maker chooses an action from a set of actions and incurs some cost resulting from the chosen action. The costs are not known in advance, and may even be chosen by an adversary aware of the decision-making strategy. The quality of the decision making is usually measured in terms of regret, namely the difference between the total cost incurred and the cost of following the best fixed strategy from a known set of strategies. Nontrivial algorithms are known today that, in most cases, achieve regret that is sublinear in the number of rounds.

While online learning is a powerful and significant field, with deep connections to statistical learning, it also has drawbacks. In particular, it is known that in many cases regret with respect to a fixed strategy is too weak a measure, especially when the environment changes over time and therefore no single strategy is always good. This observation has led to a line of works, e.g., [HW98, HS09, CMEDV10, RST11], that study stronger notions of regret, such as adaptive regret or tracking the best expert. In this context, another drawback of online learning is that it does not model well problems with states, in which the costs depend also on the configuration the decision maker is in and on the history of its decisions. Consider, for example, the problem of assigning jobs to servers online. Clearly, the time required to complete a job depends heavily on the state of the system, such as the load on each server, as determined by the decisions made in previous rounds. Regret is not a good measure for such a problem, since it is defined with respect to a fixed strategy, under the assumption that at every step the cost of each action is fixed and independent of past decisions. Consequently, we may wish to design algorithms for a broader world of problems, in which one competes against changing optimal strategies, including the optimal strategy that sees the future costs unknown to the decision maker, and in which states can be defined. Problems of this type have been studied in depth in the field called competitive analysis (for an extensive survey, see [BEY98]). In this world of problems, achieving sublinear regret in the general case is impossible.
Instead, the main measure used is the competitive ratio, which bounds, over every possible input, the ratio between the total cost of the decision maker and the total cost of the best changing strategy that knows the future. The competitive ratio usually guarantees weaker performance than regret, but it is measured against a stricter notion of optimality. Although the problems studied in the two worlds described above are often similar, there has been little research on general connections between them. The main reason for this, beyond sociological aspects stemming from the research being carried out in two separate communities, lies in crucial differences in the underlying assumptions made during modeling and formulation. For example, in order to model
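The regret measure just described can be made concrete with a small experiment. The sketch below (the function name and step size are illustrative, not taken from the thesis) runs the classical Hedge / multiplicative-weights rule, whose expected cost trails the best fixed action by an amount that does not grow linearly with the number of rounds:

```python
import math

def hedge_regret(costs, eta=0.5):
    """costs[t][i] is the cost of action i in round t, in [0, 1].
    Plays the Hedge distribution each round and returns the pair
    (algorithm's total expected cost, total cost of the best fixed action)."""
    n = len(costs[0])
    weights = [1.0] * n
    alg_cost = 0.0
    for round_costs in costs:
        total = sum(weights)
        probs = [w / total for w in weights]
        # expected cost of sampling an action from the current distribution
        alg_cost += sum(p * c for p, c in zip(probs, round_costs))
        # multiplicative update: actions with high cost lose weight
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, round_costs)]
    best_fixed = min(sum(c[i] for c in costs) for i in range(n))
    return alg_cost, best_fixed
```

On a sequence where one action always costs 0 and the other always costs 1, the best fixed action pays nothing, while Hedge's expected cost stays bounded by a small constant (roughly 1.63 over ten rounds with eta = 0.5) rather than growing with the number of rounds; the difference between the two totals is exactly the regret.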


The research was carried out under the supervision of Prof. Seffi Naor and Dr. Niv Buchbinder, in the Department of Computer Science.

Some of the results in this thesis have been published as articles by the author and research collaborators in conferences and journals during the course of the author's doctoral research period, the most up-to-date versions of which are:

Niv Buchbinder, Shahar Chen, Anupam Gupta, Viswanath Nagarajan, and Joseph Naor. Online packing and covering framework with convex objectives. CoRR, abs/.

Niv Buchbinder, Shahar Chen, and Joseph Naor. Competitive algorithms for restricted caching and matroid caching. In Algorithms - ESA, Annual European Symposium, Wroclaw, Poland, September 8-10, Proceedings.

Niv Buchbinder, Shahar Chen, and Joseph Naor. Competitive analysis via regularization. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014.

Niv Buchbinder, Shahar Chen, Joseph Naor, and Ohad Shamir. Unified algorithms for online learning and competitive analysis. In COLT, The Twenty-fifth Annual Conference on Learning Theory.

Acknowledgements

I would like to thank my advisors, Seffi Naor and Niv Buchbinder, who led and guided me wisely and patiently throughout the years of my work. It has been a fascinating and enjoyable period, and I am grateful for the time and support they devoted to working with me. I learned a great deal from them, and I appreciate the good fortune I had to work with them. I would of course also like to thank my parents, Ofra and Dani, who supported me with love and pride and stood by me at all times. Last but not least, I would like to thank my wife, Inbal, for her support, understanding, and love throughout the years. I am very lucky to have you in my life.

I thank Joan and Irwin Jacobs, the Zeff Fellowship, and the Technion for their generous financial support of my studies.


Online Learning and Competitive Analysis: a Unified Approach

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Shahar Chen

Submitted to the Senate of the Technion, Israel Institute of Technology
Iyar 5775, Haifa, April 2015


Online Learning and Competitive Analysis: a Unified Approach

Shahar Chen


More information

The Trip Scheduling Problem

The Trip Scheduling Problem The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems

More information

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued. Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.

More information

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH 31 Kragujevac J. Math. 25 (2003) 31 49. SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH Kinkar Ch. Das Department of Mathematics, Indian Institute of Technology, Kharagpur 721302, W.B.,

More information

A Network Flow Approach in Cloud Computing

A Network Flow Approach in Cloud Computing 1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

More information

ON GALOIS REALIZATIONS OF THE 2-COVERABLE SYMMETRIC AND ALTERNATING GROUPS

ON GALOIS REALIZATIONS OF THE 2-COVERABLE SYMMETRIC AND ALTERNATING GROUPS ON GALOIS REALIZATIONS OF THE 2-COVERABLE SYMMETRIC AND ALTERNATING GROUPS DANIEL RABAYEV AND JACK SONN Abstract. Let f(x) be a monic polynomial in Z[x] with no rational roots but with roots in Q p for

More information

On the representability of the bi-uniform matroid

On the representability of the bi-uniform matroid On the representability of the bi-uniform matroid Simeon Ball, Carles Padró, Zsuzsa Weiner and Chaoping Xing August 3, 2012 Abstract Every bi-uniform matroid is representable over all sufficiently large

More information