The Geometry of Online Packing Linear Programs

Marco Molinaro    R. Ravi

Abstract

We consider packing linear programs with $m$ rows where all constraint coefficients are in the unit interval. In the online model, we know the total number $n$ of columns that arrive in random order. The goal is to set the decision variables corresponding to the arriving columns irrevocably so as to maximize the expected reward. Previous $(1-\epsilon)$-competitive algorithms require that the right-hand sides of the constraints are of magnitude at least $\Omega(\frac{m}{\epsilon^2}\log\frac{n}{\epsilon})$, a bound that worsens with the number of columns and rows. However, the dependence on the number of columns is not required in the single-row case of online secretary problems. Moreover, known lower bounds for the general case of $m$ rows only demonstrate that the right-hand sides must be as large as $\Omega(\frac{\log m}{\epsilon^2})$, with no dependence on $n$, to obtain $(1-\epsilon)$-competitive algorithms.

Our goal is to understand whether the dependence on $n$ is required in the multi-row case, making it fundamentally harder than the single-row version. We show that this is not the case by exhibiting an algorithm which is $(1-\epsilon)$-competitive as long as the right-hand sides are $\Omega(\frac{m^2}{\epsilon^2}\log\frac{m}{\epsilon})$.

Our techniques refine previous PAC-learning based approaches, which interpret the online decisions as linear classifications of the columns based on dual prices obtained from sampled columns. Our improved bounds are proved by constructing a small set of witnesses for misclassifications, which are then used to obtain improved generalization bounds for the learning algorithm. The key component of our improvement is recognizing why the single-row problem is seemingly easier: if the columns of the LP belong to few one-dimensional subspaces, there is high overlap among the misclassifications and hence the associated learning problem is intrinsically more robust. For general linear programs, the idea is to modify the input to make the columns lie in a few one-dimensional subspaces while not changing the feasible set by much.
1 Introduction

Traditional optimization models usually assume that the input is known a priori. However, in most applications, the data is either revealed over time or only coarse information about the input is known, often modeled in terms of a probability distribution. Consequently, much effort has been directed towards understanding the quality of solutions that can be obtained without full knowledge of the input, which led to the development of online and stochastic optimization [7, 6]. Emerging problems such as allocating advertisement slots to advertisers and yield management in the internet are of inherent online nature and have further accelerated this development [1].

Linear programming is arguably the most important and thus well-studied optimization problem. Therefore, understanding the limitations of solving linear programs when complete data is not available is a fundamental theoretical problem with a slew of applications, including the ad allocation and yield management problems above. Indeed, a simple linear program with one uniform knapsack, the Secretary Problem, was one of the first online problems to be considered, and an optimal solution was already obtained by the early 60's [13, 15]. Although the single knapsack case is currently well-understood under different models of how information is revealed [4], much less is known about problems with multiple knapsacks. Only recently, algorithms with guaranteed solution quality have been developed for these more general packing problems [14, 1, 10].

The Model. We consider the following online packing LP problem. Consider a fixed but unknown LP with $n$ columns $a^t \in [0,1]^m$ (whose associated variables are constrained to be in $[0,1]$) and $m$ packing constraints:

$$\mathrm{OPT} = \max \sum_{t=1}^{n} \pi_t x_t \quad \text{s.t.} \quad \sum_{t=1}^{n} a^t x_t \le B, \quad x_t \in [0,1] \qquad \text{(LP)}$$

Columns are presented in a random (uniform) order, and whenever a column is presented we are required to irrevocably choose the value of its corresponding variable. We assume that the number of columns $n$ is known (actually, knowing $n$ up to a $(1 \pm \epsilon)$ factor is enough; this assumption is required to allow algorithms with non-trivial competitive ratio). The goal is to obtain a feasible solution to the LP while maximizing its value. Note that we use OPT to denote the optimum value of the (offline) LP.

By scaling down rows as necessary, we assume without loss of generality that all entries of $B$ are the same, which we also denote, with some overload of notation, by $B$. Due to the packing nature of the problem, we also assume without loss of generality that all the $\pi_t$'s are non-negative and all the $a^t$'s are non-zero: we can simply ignore columns which do not satisfy the first property and always set to 1 the variables associated to the remaining columns which do not satisfy the second property. Finally, we assume that the columns $a^t$ are in general position: for all $p \in \mathbb{R}^m$, there are at most $m$ different $t \in [n]$ such that $\pi_t = p a^t$. Notice that perturbing the input randomly by a tiny amount achieves this property with probability one, while the effect of the perturbation is absorbed in our approximation guarantees [11, 1].

The random permutation model, where the input is presented in a random order, has grown in popularity [16, 11, 4], since it avoids strong lower bounds of the pessimistic adversarial-order model [8], while still capturing the lack of total information a priori. Moreover, the random permutation model is weaker than the i.i.d. model, which assumes that the parts constituting the input are sampled independently from a fixed distribution, either known or unknown.
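To make the model concrete, the sketch below builds a tiny random instance, computes the offline OPT with an LP solver, and runs a naive greedy rule on a random arrival order. The instance parameters and the greedy baseline are our own illustrative choices (it assumes NumPy and SciPy are available); it is not part of the paper's algorithms.

```python
# Illustrative only: a tiny random instance of (LP) in the random
# permutation model, with the offline optimum from an LP solver.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, B = 200, 3, 20.0
A = rng.uniform(0.0, 1.0, size=(m, n))   # columns a^t in [0,1]^m
pi = rng.uniform(0.0, 1.0, size=n)       # profits pi_t >= 0

# Offline optimum: max pi.x  s.t.  A x <= B,  0 <= x <= 1.
res = linprog(-pi, A_ub=A, b_ub=np.full(m, B),
              bounds=[(0.0, 1.0)] * n, method="highs")
opt = -res.fun

# Online arrival: a uniformly random permutation; any online algorithm
# must fix x_{sigma(t)} irrevocably when column sigma(t) arrives.
sigma = rng.permutation(n)
value, load = 0.0, np.zeros(m)
for t in sigma:                           # naive greedy baseline
    if np.all(load + A[:, t] <= B):
        value += pi[t]
        load += A[:, t]
print(f"offline OPT = {opt:.2f}, greedy online value = {value:.2f}")
```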
Related work. Many different types of online problems have already been studied in the random permutation model. These include bin-packing [20], matchings [19, 16], the AdWords Problem [11] and different generalizations of the Secretary Problem [4, 2, 5, 25, 18]. Closest to our work are packing problems with a single knapsack constraint [11].
In [21], Kleinberg considered the $B$-Choice Secretary Problem, where the goal is to select at most $B$ items coming online in random order to maximize profit. The author presented an algorithm with competitive ratio $1 - O(1/\sqrt{B})$ and showed that $1 - \Omega(1/\sqrt{B})$ is best possible. Generalizing the $B$-Choice Secretary Problem, Babaioff et al. [3] considered the online knapsack problem and presented a $(1/10e)$-competitive algorithm. Notice that in both cases the competitive ratio does not depend on $n$.

Despite all these works, the first result for the more general online packing LP under study here was only recently obtained by Feldman et al. [14]. They gave an algorithm that obtains with high probability a solution of value at least $(1-\epsilon)\mathrm{OPT}$ whenever $B \ge \Omega(\frac{m \log n}{\epsilon^3})$ and $\mathrm{OPT} \ge \Omega(\frac{\pi_{\max} m \log n}{\epsilon})$, where $\pi_{\max}$ is the largest profit. The authors actually considered a more general allocation problem, where a set of columns representing various options arrives at each step, and the solution may choose at most one of the options. Their algorithm is training-based and generalizes the work of Devanur and Hayes [11] on the AdWords problem. In an as yet unpublished manuscript, Agrawal et al. [1] presented an algorithm (DPA) which managed to further reduce the required dependence on the size of $B$ and OPT. Their algorithm returns a solution with expected value at least $(1-\epsilon)\mathrm{OPT}$ whenever $B \ge \Omega(\frac{m}{\epsilon^2}\log\frac{n}{\epsilon})$ or $\mathrm{OPT} \ge \Omega(\frac{\pi_{\max} m^2}{\epsilon^2}\log\frac{n}{\epsilon})$. Another way of stating this result is that the algorithm obtains a solution with competitive ratio $1 - O(\sqrt{\frac{m \log n \log B}{B}})$; notice that the guarantee degrades as $n$ increases. Their algorithm also uses training-based ideas, but now re-trains as the sample size doubles to obtain the improved guarantees. They also show that there are instances with $B \le \frac{\log m}{\epsilon^2}$ for which no online algorithm can be $(1-\epsilon)$-competitive in the random permutation model.

The above works on the problem under study draw a connection between solving the online LP and PAC-learning [9] a linear classification of its columns. Here we further explore this connection, and our improved bounds can be seen as a consequence of making the learning algorithm more robust by suitably changing the input LP. Robustness is a topic well-studied in learning theory [12, 22], although existing results do not seem to apply directly to our problem. We remark that a component of robustness more closely related to the standard PAC-learning literature is used in [11]. In recent work, Devanur et al. [10] consider the weaker i.i.d. model for the general allocation problem and substantially improve the required bound on $B$ to $\Omega(\frac{\log(m/\epsilon)}{\epsilon^2})$, while showing that the lower bound of $\frac{\log m}{\epsilon^2}$ on $B$ is still required to get $(1-\epsilon)$-competitive algorithms.

Our results. Our focus is to understand how large $B$ needs to be in order to allow $(1-\epsilon)$-competitive algorithms. In particular, the best known bounds for $B$ mentioned above degrade as the number of columns in the LP increases, while the minimum requirement on its magnitude does not. With the trend of handling LP's with larger numbers of columns (e.g., these columns correspond to the keywords in the ad allocation problem, which in turn correspond to visits of a search engine's webpage), this gap is very unsatisfactory from a practical point of view. Furthermore, given that guarantees for the single knapsack case do not depend on the number of columns, it is important to understand if the multi-knapsack case is fundamentally more difficult. In this work, we give a precise indication of why the latter problem was resistant to arguments used in the single knapsack case, and overcome this difficulty to exhibit an algorithm with a dimension-independent guarantee.
We show that a modification of DPA [1], which we call Robust DPA, obtains a $(1-\epsilon)$-competitive solution for online packing LP's with $m$ constraints in the random permutation model whenever $B \ge \Omega(\frac{m^2}{\epsilon^2}\log\frac{m}{\epsilon})$. Another way of stating this result is that the algorithm has competitive ratio $1 - O(m\sqrt{\frac{\log B}{B}})$. Contrasting with previous results, our guarantee does not depend on $n$, and in the case $m = 1$ it matches the bounds for the $B$-Choice Secretary Problem up to lower-order terms. We finally remark that we can replace the requirement $B \ge \Omega(\frac{m^2}{\epsilon^2}\log\frac{m}{\epsilon})$ by $\mathrm{OPT} \ge \Omega(\frac{\pi_{\max} m^3}{\epsilon^2}\log\frac{m}{\epsilon})$, exactly as done in Section 5.1 of [1].

High-level outline. As mentioned before, we use the connection between solving an online LP and PAC-learning a good linear classification of its columns; in order to obtain the improved guarantee, we focus on tightening the bounds for the generalization error of the learning problem.
More precisely, solving the LP can be seen as classifying the columns into 0/1, which corresponds to setting their associated variables to 0/1. Consider a family $X \subseteq \{0,1\}^n$ of linear classifications of the columns. Our algorithms essentially sample a set $S$ of columns and learn a classification $x^S \in X$ which is good for the columns in $S$ (i.e., obtains large proportional revenue while not filling up the proportionally scaled budget too much). The goal is to upper bound the probability that $x^S$ is not good for the whole LP. This is typically done by union bounding over the classifications in $X$ [11, 1]. To obtain improved guarantees, we refine the union bound using an argument akin to covering: we consider witness classifications which can be used to bound the probability that any bad classification is learned. The problem is that, when the columns $(\pi_t, a^t)$ do not lie in a two-dimensional subspace of $\mathbb{R}^{m+1}$, the set $X$ may contain a large number of disjoint bad classifications; this is a roadblock for obtaining a small set of witnesses. In stark contrast, when these columns do lie in a two-dimensional subspace, the (supports of the) classifications form a union of two chains with respect to inclusion; in the special case where the $a^t$'s belong to a one-dimensional subspace (e.g., the case $m = 1$), they form a single chain. The fact that the latter learning problem is intrinsically more robust than the former seems to precisely capture the increased difficulty in obtaining good bounds for the multi-knapsack case.

Motivated by this discussion, we first consider LP's whose columns $a^t$ lie in few one-dimensional subspaces (Section 2). For each of these subspaces, we are able to approximate the classifications induced on the columns lying in the subspace by considering a small subset of the induced classifications. Taking the product of these subsets gives us a witness set for $X$. However, this strategy as stated does not make use of the fact that the subspaces are embedded in an $m$-dimensional space, and hence obtains large witness sets. By establishing a connection between the useful terms in the product and faces of a hyperplane arrangement in $\mathbb{R}^m$, we are able to make use of the dimension of the host space and exhibit witness sets of much smaller sizes, which leads to improved bounds. For the general problem, the idea is to perturb the columns $a^t$ to make them lie in few one-dimensional subspaces, while not altering the feasibility and optimality of the LP by more than a $(1 \pm \epsilon)$ factor (Section 3). Finally, we tighten the bound by using the idea of recomputing the classification as the number of columns doubles, following [1] (Section 4).

2 OTP for almost 1-dim columns

In this section we analyze the behavior of the algorithm OTP (One-Time Pricing) for LP's whose columns are contained in few 1-dimensional subspaces of $\mathbb{R}^m$. The overall goal is to find an appropriate dual (perhaps infeasible) solution $p$ for (LP) and use it to classify the columns of the LP. More precisely, given $p \in \mathbb{R}^m$, we define $x(p)_t = 1$ if $\pi_t > p a^t$ and $x(p)_t = 0$ otherwise. Thus, $x(p)$ is the result of classifying the columns $(\pi_t, a^t)$ with the homogeneous hyperplane in $\mathbb{R}^{m+1}$ with normal $(-1, p)$. The motivation behind this classification is that it selects the columns which have positive reduced cost with respect to the dual solution $p$, or alternatively, it solves to optimality the Lagrangian relaxation using $p$ as multipliers.

Sampling LP's. In order to obtain a good dual solution $p$, we use the (random) LP consisting of the first $s$ columns of (LP) with appropriately scaled right-hand side:

$$\max \sum_{t=1}^{s} \pi_{\sigma(t)} x_{\sigma(t)} \quad \text{s.t.} \quad \sum_{t=1}^{s} a^{\sigma(t)} x_{\sigma(t)} \le \frac{s}{n}\,\delta B, \quad x_{\sigma(t)} \in [0,1] \;\; t = 1, \dots, s \qquad \text{((s,\,\delta)\text{-LP})}$$

$$\min \; \frac{s}{n}\,\delta B \sum_{i=1}^{m} p_i + \sum_{t=1}^{s} \alpha_{\sigma(t)} \quad \text{s.t.} \quad p a^{\sigma(t)} + \alpha_{\sigma(t)} \ge \pi_{\sigma(t)} \;\; t = 1, \dots, s, \quad p \ge 0, \;\alpha \ge 0 \qquad \text{((s,\,\delta)\text{-Dual})}$$
Here $\sigma$ denotes the random permutation of the columns of the LP. We use $\mathrm{OPT}(s,\delta)$ to denote the optimal value of $(s,\delta)$-LP, and $\mathrm{OPT}(s)$ to denote the optimal value of $(s,1)$-LP.
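Concretely, the pricing machinery can be rendered with an off-the-shelf LP solver. The sketch below is ours, not the authors' code; it assumes a SciPy version (1.7 or later, with method="highs") whose result object exposes the duals of the inequality rows as `res.ineqlin.marginals` (nonpositive by convention, hence the sign flip).

```python
# A minimal sketch of the pricing step: solve the sampled (s, delta)-LP,
# read off an optimal dual vector p for the budget rows, and classify a
# column (pi_t, a_t) by the sign of its reduced cost pi_t - p . a_t.
import numpy as np
from scipy.optimize import linprog

def sample_dual_price(pi_S, A_S, n, B, delta):
    """Duals of the budget rows of (s, delta)-LP, where A_S holds the s
    sampled columns and the right-hand side is (s/n) * delta * B."""
    s = len(pi_S)
    rhs = np.full(A_S.shape[0], (s / n) * delta * B)
    res = linprog(-pi_S, A_ub=A_S, b_ub=rhs,
                  bounds=[(0.0, 1.0)] * s, method="highs")
    return -res.ineqlin.marginals   # p >= 0 (HiGHS duals are <= 0 here)

def classify(pi_t, a_t, p):
    """x(p)_t = 1 iff the column has strictly positive reduced cost."""
    return 1.0 if pi_t > p @ a_t else 0.0
```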
of (s, δ)-lp, and OPT(s) to denote the optmal value of (s, 1)-LP. The statc prcng algorthm OTP of [1] can then be descrbed succnctly as follows. 2 1. Wat for the frst ɛn columns of (LP) and solve (ɛn, 1 ɛ)-dual, lettng (p, α) be the obtaned dual optmal soluton. 2. Use the classfcaton gven by p as above by settng x σ(t) = x(p) σ(t) for t = ɛn + 1, ɛn + 2,... for as long as the soluton obtaned remans vald. From ths pont on set all further varables to zero. Note that by defnton ths algorthm outputs a feasble soluton wth probablty one. Our goal s then to analyze the qualty of the soluton produced, ultmately leadng to the followng theorem. Theorem 2.1 Fx ɛ (0, 1]. Suppose that there are K m 1-dm subspaces of R m contanng the columns a t s and that B Ω ( m log K ) ɛ 3 ɛ. Then algorthm OTP returns a feasble soluton wth expected value at least (1 5ɛ)OPT. Let S = {σ(1),..., σ(ɛn)} be the (random) ndex set of the columns sampled by OTP. We use p S to denote the optmal dual soluton obtaned by OTP; notce that p S s completely determned by S. To smplfy the notaton, we also use x S to denote x(p S ). Notce that, for all the scenaros where x S s feasble, the soluton returned by OTP s dentcal to x S wth ts components x S σ(1),..., xs σ(ɛn) set to zero. Gven ths observaton, we can actually focus on provng that xs s a good soluton. Lemma 2.2 Fx ɛ (0, 1]. Suppose that there are K m 1-dm subspaces of R m contanng the columns a t s and that B Ω ( m log K ) ɛ 3 ɛ. Then wth probablty at least (1 ɛ), x S s a feasble soluton for (LP) wth value at least (1 3ɛ)OPT. To see how Theorem 2.1 follows from ths, frst note that E[ n π σ(t) x σ(t) ] = E[ t>ɛn π σ(t) x S σ(t) ]. Now let E denote the event that x S s feasble for (LP) wth value at least (1 3ɛ)OPT, whch occurs wth probablty at least (1 ɛ). By the non-negatvty of the profts, we obtan E[ n n π σ(t) x S σ(t) ] E[ π σ(t) x S σ(t) E] Pr(E) (1 4ɛ)OPT. Fnally notcng that E[ t ɛn π σ(t)x S σ(t) ] ɛopt (see, e.g., Lemma 2.4 of [1]), we then get and the result follows. 2.1 Connecton to PAC learnng E[ n π σ(t) x S σ(t) ] E[ π σ(t) x S σ(t) ] ɛopt (1 5ɛ)OPT, t>ɛn We assume from now on that B Ω( m ɛ 3 log K ɛ ). Let X = {x(p) : p Rm + } {0, 1} n denote the set of all possble lnear classfcatons of the LP columns whch can be generated by OTP. Wth slght overload n the notaton, we dentfy a vector x {0, 1} n wth the subset of [n] correspondng to ts support. 2 To smplfy the exposton, we assume that ɛn s an nteger. 4
Our goal is then to analyze the quality of the solution produced, ultimately leading to the following theorem.

Theorem 2.1 Fix $\epsilon \in (0,1]$. Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns $a^t$ and that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Then algorithm OTP returns a feasible solution with expected value at least $(1-5\epsilon)\mathrm{OPT}$.

Let $S = \{\sigma(1), \dots, \sigma(\epsilon n)\}$ be the (random) index set of the columns sampled by OTP. We use $p^S$ to denote the optimal dual solution obtained by OTP; notice that $p^S$ is completely determined by $S$. To simplify the notation, we also use $x^S$ to denote $x(p^S)$. Notice that, for all the scenarios where $x^S$ is feasible, the solution returned by OTP is identical to $x^S$ with its components $x^S_{\sigma(1)}, \dots, x^S_{\sigma(\epsilon n)}$ set to zero. Given this observation, we can actually focus on proving that $x^S$ is a good solution.

Lemma 2.2 Fix $\epsilon \in (0,1]$. Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns $a^t$ and that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Then with probability at least $1-\epsilon$, $x^S$ is a feasible solution for (LP) with value at least $(1-3\epsilon)\mathrm{OPT}$.

To see how Theorem 2.1 follows from this, let $\mathcal{E}$ denote the event that $x^S$ is feasible for (LP) with value at least $(1-3\epsilon)\mathrm{OPT}$, which occurs with probability at least $1-\epsilon$; on $\mathcal{E}$, the value of the solution returned by OTP is $\sum_{t > \epsilon n} \pi_{\sigma(t)} x^S_{\sigma(t)}$. By the non-negativity of the profits, we obtain

$$\mathbb{E}\Big[\sum_{t=1}^{n} \pi_{\sigma(t)} x^S_{\sigma(t)}\,\mathbf{1}_{\mathcal{E}}\Big] \ge \mathbb{E}\Big[\sum_{t=1}^{n} \pi_{\sigma(t)} x^S_{\sigma(t)} \,\Big|\, \mathcal{E}\Big] \Pr(\mathcal{E}) \ge (1-4\epsilon)\mathrm{OPT}.$$

Finally, noticing that $\mathbb{E}[\sum_{t \le \epsilon n} \pi_{\sigma(t)} x^S_{\sigma(t)}] \le \epsilon\,\mathrm{OPT}$ (see, e.g., Lemma 2.4 of [1]), we then get

$$\mathbb{E}\Big[\sum_{t > \epsilon n} \pi_{\sigma(t)} x^S_{\sigma(t)}\,\mathbf{1}_{\mathcal{E}}\Big] \ge \mathbb{E}\Big[\sum_{t=1}^{n} \pi_{\sigma(t)} x^S_{\sigma(t)}\,\mathbf{1}_{\mathcal{E}}\Big] - \epsilon\,\mathrm{OPT} \ge (1-5\epsilon)\mathrm{OPT},$$

and the result follows.

2.1 Connection to PAC learning

We assume from now on that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Let $X = \{x(p) : p \in \mathbb{R}^m_+\} \subseteq \{0,1\}^n$ denote the set of all possible linear classifications of the LP columns which can be generated by OTP. With slight overload in the notation, we identify a vector $x \in \{0,1\}^n$ with the subset of $[n]$ corresponding to its support.

Definition 2.3 (Bad solution) Given a scenario, we say that $x^S$ is bad if it does not satisfy the properties of Lemma 2.2, namely $x^S$ is either infeasible or has value less than $(1-3\epsilon)\mathrm{OPT}$. We say that $x^S$ is good otherwise.

As noted in previous work, the main observation used to control the guarantee of the solution output by the algorithm is that it suffices to analyze its budget occupation. To make this precise, given $x \in \{0,1\}^n$, let $a_i(x) = \sum_{t \in x} a^t_i$ be its occupation of the $i$th budget and let $a^S_i(x) = \frac{1}{\epsilon}\sum_{t \in x \cap S} a^t_i$ be its appropriately scaled occupation of the $i$th budget in the sampled LP (recall that $|S| = \epsilon n$).

Recall that the solution $x^S$ is obtained by selecting the columns with positive reduced cost with respect to the optimal dual solution $p^S$. Therefore, it is intuitively clear that $x^S$ resembles an optimal solution for $(\epsilon n, 1-\epsilon)$-LP and thus should (approximately) be feasible and satisfy complementary slackness conditions. Using the assumption that the input is in general position, this is made formal in the following lemma.

Lemma 2.4 In every scenario, $x^S$ satisfies the following: (i) for all $i \in [m]$, $a^S_i(x^S) \le (1-\epsilon)B$ and (ii) for every $i \in [m]$ with $p^S_i > 0$, $a^S_i(x^S) \ge (1-2\epsilon)B$.

Conversely, the next lemma states that this approximate complementary slackness (now with respect to (LP)) is enough to guarantee near-optimality, making formal the previous observation that budget occupation determines the quality of the solution. (This lemma can be seen as an approximate version of an observation on Lagrangian relaxation made by Everett in the early 60's [17], and is also related to the approximate complementary slackness conditions in [26].)

Lemma 2.5 Consider a scenario where $x^S$ satisfies the following: (i) for all $i \in [m]$, $a_i(x^S) \le B$ and (ii) for all $i \in [m]$ with $p^S_i > 0$, $a_i(x^S) \ge (1-3\epsilon)B$. Then $x^S$ is good.

Given the properties of $x^S$ guaranteed by Lemma 2.4, together with the observation that $a_i(x) = \mathbb{E}[a^S_i(x)]$ for all $x$, the idea is to use concentration inequalities to argue that the conditions in Lemma 2.5 hold with good probability. One difficulty is the presence of correlation between the sample set $S$ and the solution $x^S$. We deal with this difficulty in the standard PAC-learning way.

Definition 2.6 (Badly learnable) For a given scenario, we say that $x \in X$ can be badly learned for budget $i$ if either (i) $a^S_i(x) \le (1-\epsilon)B$ and $a_i(x) > B$, or (ii) $a^S_i(x) \ge (1-2\epsilon)B$ and $a_i(x) < (1-3\epsilon)B$.

Essentially these are the classifications which look good for the sampled $(\epsilon n, 1-\epsilon)$-LP but are actually bad for (LP). More precisely, Lemmas 2.4 and 2.5 give the following.

Observation 2.7 Consider a scenario for which $x^S$ is bad. Then $x^S = x$ for some $x \in X$ that can be badly learned in this scenario for some budget $i \in [m]$.

This observation directly implies that

$$\Pr\big(x^S \text{ is bad}\big) \;\le\; \Pr\Big(\bigcup_{i \in [m],\, x \in X} \{x \text{ can be badly learned for budget } i\}\Big). \qquad (2.1)$$

Notice that indeed the right-hand side of this inequality does not depend on $x^S$; it is only a function of how skewed $a^S_i(x)$ is as compared to its expectation $a_i(x)$.

From this point on, usually the right-hand side in the previous equation is upper bounded by taking a union bound over all its terms [1]. However, this strategy can be too wasteful, because if $x$ and $x'$ are similar there is a large overlap between the scenarios where $a^S_i(x)$ is skewed and those where $a^S_i(x')$ is skewed. In order to obtain improved guarantees we use something akin to a covering argument, although we need to use a suitable (and non-standard) measure to capture the similarity between classifications.
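Before moving on, the quantities just defined are direct to express in code (a sketch with hypothetical helper names; $x$ and $S$ are index sets):

```python
# Sketch: budget occupations a_i(x), a_i^S(x) and the "badly learned"
# test of Definition 2.6, for x and S given as sets of column indices.
import numpy as np

def occupation(A, x):
    """a_i(x) for all i: total budget usage of the columns in x."""
    return A[:, sorted(x)].sum(axis=1)

def badly_learned(A, x, S, i, B, eps):
    """x looks fine on the scaled sample S (|S| = eps * n) but either
    violates, or badly underfills, budget i on the full input."""
    a_full = occupation(A, x)[i]
    a_smpl = occupation(A, x & S)[i] / eps        # a_i^S(x)
    return ((a_smpl <= (1 - eps) * B and a_full > B) or
            (a_smpl >= (1 - 2 * eps) * B and a_full < (1 - 3 * eps) * B))
```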
2.2 Similarity via witnesses

First, we partition the classifications which can be badly learned for budget $i$ into two sets, depending on why they are bad: for $i \in [m]$, let $X^+_i = \{x \in X : a_i(x) > B\}$ and $X^-_i = \{x \in X : a_i(x) < (1-3\epsilon)B\}$. In order to simplify the notation, given a set $x$ we define $\mathrm{skewm}_i(\epsilon, x)$ to be the event that $a^S_i(x) \le (1-\epsilon)B$ and $\mathrm{skewp}_i(\epsilon, x)$ to be the event that $a^S_i(x) \ge (1-2\epsilon)B$. Notice that if $x \in X^+_i$, then $\mathrm{skewm}_i(\epsilon, x)$ is the event that $a^S_i(x)$ is significantly smaller than its expectation (skewed in the minus direction), while for $x \in X^-_i$, $\mathrm{skewp}_i(\epsilon, x)$ is the event that $a^S_i(x)$ is significantly larger than its expectation (skewed in the plus direction). These definitions directly give the equivalence

$$\Pr\Big(\bigcup_{i,\, x \in X} \{x \text{ badly learned for budget } i\}\Big) = \Pr\Big(\bigcup_{i}\Big[\bigcup_{x \in X^+_i} \mathrm{skewm}_i(\epsilon, x) \cup \bigcup_{x \in X^-_i} \mathrm{skewp}_i(\epsilon, x)\Big]\Big). \qquad (2.2)$$

In order to introduce the concept of witnesses, consider two sets $x, x'$, say, in $X^+_i$. Take a subset $w \subseteq x \cap x'$; the main observation is that, since $a^t \ge 0$ for all $t$, in every scenario we have $a^S_i(w) \le a^S_i(x)$ and $a^S_i(w) \le a^S_i(x')$. In particular, the event $\mathrm{skewm}_i(\epsilon, x) \cup \mathrm{skewm}_i(\epsilon, x')$ is contained in $\mathrm{skewm}_i(\epsilon, w)$. The set $w$ serves as a witness for scenarios which are skewed for either $x$ or $x'$; if additionally $a_i(w)$ is reasonably larger than $(1-\epsilon)B$, we can then use concentration inequalities on $\mathrm{skewm}_i(\epsilon, w)$ in order to bound the probability of $\mathrm{skewm}_i(\epsilon, x) \cup \mathrm{skewm}_i(\epsilon, x')$. This ability of bounding multiple terms of the right-hand side of (2.2) simultaneously is what gives an improvement over the naive union bound.

Definition 2.8 (Witness) We say that $W^+_i$ is a witness set for $X^+_i$ if: (i) for all $w \in W^+_i$, $a_i(w) \ge (1-\epsilon/2)B$ and (ii) for all $x \in X^+_i$ there is $w \in W^+_i$ contained in $x$. Similarly, we say that $W^-_i$ is a witness set for $X^-_i$ if: (i) for all $w \in W^-_i$, $a_i(w) \le (1-5\epsilon/2)B$ and (ii) for all $x \in X^-_i$ there is $w \in W^-_i$ containing $x$.

As indicated by the previous discussion, given witness sets $W^+_i$ and $W^-_i$ for $X^+_i$ and $X^-_i$, we directly get the bound

$$\Pr\Big(\bigcup_{i}\Big[\bigcup_{x \in X^+_i} \mathrm{skewm}_i(\epsilon, x) \cup \bigcup_{x \in X^-_i} \mathrm{skewp}_i(\epsilon, x)\Big]\Big) \le \Pr\Big(\bigcup_{i}\Big[\bigcup_{w \in W^+_i} \mathrm{skewm}_i(\epsilon, w) \cup \bigcup_{w \in W^-_i} \mathrm{skewp}_i(\epsilon, w)\Big]\Big). \qquad (2.3)$$

Using this inequality, together with (2.1) and (2.2), we can bound the probability that $x^S$ is bad in terms of the size of witness sets.

Lemma 2.9 Suppose that, for all $i \in [m]$, there are witness sets for $X^+_i$ and $X^-_i$ of size at most $M$. Then $\Pr(x^S \text{ is bad}) \le 8mM \exp\big(-\frac{\epsilon^3 B}{33}\big)$.

The usefulness of defining witnesses as such is of course contingent upon the ability of finding witness sets which are much smaller than $X^+_i$ and $X^-_i$. One reasonable choice of a witness set for, say, $X^+_i$ is the collection of all of its minimal sets; unfortunately, this may not give a witness set of small enough size. However, notice that a witness set need not be a subset of $X^+_i$ (or even of $X$). Allowing elements outside $X^+_i$ gives the flexibility of obtaining witnesses which are associated to multiple similar minimal elements of $X^+_i$, which is effective in reducing the size of witness sets.
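To see why requirement (i) in the definition matters, here is the shape of the concentration step hidden in Lemma 2.9, with unoptimized constants (a sketch; the actual proof applies such a bound to each witness event and finishes with a union bound):

```latex
% Sketch of the concentration step behind Lemma 2.9 (constants not optimized).
% Fix w in W_i^+, so E[a_i^S(w)] = a_i(w) >= (1 - eps/2) B, while skewm_i(eps, w)
% asks for a_i^S(w) <= (1 - eps) B: a deviation of at least eps B / 2.
% Since eps * a_i^S(w) is a sum of [0,1]-valued terms over a uniform eps*n-subset,
% a Bernstein-type bound for sampling without replacement gives
\Pr\left[ a_i^S(w) \le (1-\epsilon)B \right]
  \le \exp\!\left( - \frac{(\epsilon \cdot \epsilon B / 2)^2}
                          {2\,\epsilon\, a_i(w) + \tfrac{2}{3}\,\epsilon\cdot\epsilon B/2} \right)
  = e^{-\Omega(\epsilon^3 B)},
% and symmetrically for w in W_i^- and the event skewp_i(eps, w), whose
% deviation is again at least eps B / 2 since a_i(w) <= (1 - 5 eps / 2) B.
```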
2.3 Good witnesses for almost 1-dim columns

Given the previous lemma, our task is to find small witness sets. Unfortunately, when the $(\pi_t, a^t)$'s lie in a space of dimension at least 3, $X^+_i$ and $X^-_i$ may contain many ($\Omega(n)$) disjoint sets (see Figure 2.1), which shows that in general we cannot find small witness sets directly.

Figure 2.1: Case $m = 2$, columns $(\pi_t, a^t)$ equal to $(1, \sin(\frac{\pi}{4} + \delta t), \cos(\frac{\pi}{4} + \delta t))$ for sufficiently small $\delta > 0$, represented by black dots. Each segment $\{t, t+1, \dots, t+j\}$ can be linearly classified and hence belongs to $X$. Furthermore, all segments $\{j \cdot 2B, \dots, (j+1) \cdot 2B\}$ belong to $X^+_i$, which then contains $\Omega(\frac{n}{B})$ disjoint sets. A similar analysis holds for $X^-_i$.

This sharply contrasts with the case where the $(\pi_t, a^t)$'s lie in a 2-dimensional subspace of $\mathbb{R}^{m+1}$. In this case, it is not difficult to show that $X$ is a union of 2 chains with respect to inclusion. In the special case where the $a^t$'s lie in a 1-dimensional subspace of $\mathbb{R}^m$ (e.g., the case $m = 1$), we show that $X$ is actually a single chain (Lemma 2.11), and therefore we can take $W^+_i$ to consist of the minimal element of $X^+_i$ and $W^-_i$ of the maximal element of $X^-_i$.

Due to the above observations, we focus on LP's whose $a^t$'s lie in few 1-dimensional subspaces. In this case, $X^+_i$ and $X^-_i$ are sufficiently well-behaved so that we can find small (independent of $n$) witness sets.

Lemma 2.10 Suppose that there are $K \ge m$ 1-dimensional subspaces of $\mathbb{R}^m$ which contain the $a^t$'s. Then there are witness sets for $X^+_i$ and $X^-_i$ of size at most $(O(\frac{K}{\epsilon} \log \frac{K}{\epsilon}))^m$.

Assuming the hypothesis of the lemma, partition the index set $[n]$ into $C_1, C_2, \dots, C_K$ such that for all $j \in [K]$ the columns $\{a^t\}_{t \in C_j}$ belong to the same 1-dimensional subspace. Equivalently, for each $j \in [K]$ there is a vector $c^j$ of $\ell_\infty$-norm 1 such that for all $t \in C_j$ we have $a^t = \|a^t\|_\infty c^j$. An important observation is that now we can order the columns (locally) by their ratio of profit over budget occupation: without loss of generality assume that for all $j \in [K]$ and $t, t' \in C_j$ with $t < t'$, we have $\frac{\pi_t}{\|a^t\|_\infty} \ge \frac{\pi_{t'}}{\|a^{t'}\|_\infty}$ (this ratio is well-defined since by assumption $a^t \neq 0$ for all $t \in [n]$).

Given a classification $x$, we use $x_{C_j}$ to denote its projection onto the coordinates in $C_j$; so $x_{C_j}$ is the induced classification on columns with indices in $C_j$. Identifying singleton sets with their only element, we use the product notation $x = \prod_{j \in [K]} x_{C_j}$. Similarly, we define $X_{C_j} = \{x_{C_j} : x \in X\}$ as the set of all classifications induced on the columns in $C_j$. Strengthening a previous observation, the main property that we get from working with 1-dim subspaces is the following.

Lemma 2.11 For each $j \in [K]$, the sets in $X_{C_j}$ are prefixes of $C_j$.

Proof. Fix $j \in [K]$. Consider a set $x \in X$ and let $p$ be a dual vector such that $x(p) = x$. Let $t'$ be the last index of $C_j$ which belongs to $x_{C_j}$; this implies that $\pi_{t'} > p a^{t'} = p c^j \|a^{t'}\|_\infty$, or alternatively $\frac{\pi_{t'}}{\|a^{t'}\|_\infty} > p c^j$. By the ordering of the columns, for all $t \in C_j$ smaller than $t'$ we have $\frac{\pi_t}{\|a^t\|_\infty} \ge \frac{\pi_{t'}}{\|a^{t'}\|_\infty} > p c^j$ and hence $t \in x_{C_j}$. By the definition of $t'$ it follows that $x_{C_j} = \{t \in C_j : t \le t'\}$, a prefix of $C_j$; this concludes the proof.

To simplify the notation, fix $i \in [m]$ for the rest of this section, so we aim at providing witness sets for $X^+_i$ and $X^-_i$. It is instructive to map a classification $x = \prod_{j=1}^{K} x_{C_j}$ to a box with sides of length $a_i(x_{C_j})$. The idea for producing a witness set for $X^+_i$ is simple: for $x \in X^+_i$, we include in the witness set a classification $w$ whose sides are prefixes of the $C_j$'s obtained by shortening the sides of $x$ (more specifically, rounding their lengths down to a power of $(1+\epsilon)$). The point is that all boxes in $X^+_i$ which have sides in the same powers of $(1+\epsilon)$ will give rise to the same witness. Using the fact that reasonable boxes in $X^+_i$ have side lengths upper bounded by $O(B)$, this gives a witness set of size only dependent on $\epsilon$, $B$ and $m$.
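As a preview of the formal construction in the next paragraphs, the per-class data it manipulates, namely ratio-sorted prefixes and their rounded occupations, can be computed as follows (a sketch; function names are ours):

```python
# Sketch: for a single class C_j (columns on one ray), the classifications
# induced by dual prices are prefixes (Lemma 2.11), so each is summarized
# by its budget occupation; rounding occupations into geometric intervals
# leaves only a small number of distinguishable "side lengths" per class.
import math
import numpy as np

def prefix_occupations(pi, A, C_j, i):
    """Occupations a_i(.) of all prefixes of C_j, with C_j sorted by
    decreasing profit-to-size ratio pi_t / ||a^t||_inf."""
    order = sorted(C_j, key=lambda t: -pi[t] / np.abs(A[:, t]).max())
    return np.cumsum([A[i, t] for t in order])

def bucket(occ, B, eps, K):
    """Index l of the interval I_l containing the occupation occ."""
    lo = eps * B / (4 * K)                 # length of I_0
    if occ < lo:
        return 0
    return 1 + math.floor(math.log(occ / lo, 1 + eps / 4))
```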
To make this formal, we first classify boxes according to their side lengths. Start by covering the interval $[0, B+m]$ with intervals $\{I_l\}_{l \in L}$, where $I_0 = [0, \frac{\epsilon B}{4K})$, $I_l = [\frac{\epsilon B}{4K}(1+\frac{\epsilon}{4})^{l-1}, \frac{\epsilon B}{4K}(1+\frac{\epsilon}{4})^{l})$ for $l > 0$, and $L = \{0, \dots, \lceil \log_{1+\epsilon/4} \frac{8K}{\epsilon} \rceil\}$ (note that since $B \ge m$, we have $B + m \le 2B$). Define $B_{l,j}$ as the set of classifications $x \in X_{C_j}$ whose budget occupation $a_i(x)$ lies in the interval $I_l$. For $v \in L^K$, define the family of classifications $B^v = \prod_j B_{v_j, j}$. Notice that every box in $B^v$ has similar (within a factor $(1+\epsilon/4)$) side lengths. Also note that the $B^v$'s may include classifications not in $X$, and may fail to include classifications in $X$ whose occupation $a_i(\cdot)$ is greater than $B + m$.

Now consider a non-empty $B^v$. Let $w^v$ be the inclusion-wise smallest element in $B^v$. Notice that such a unique smallest element exists: since $X_{C_j}$ is a chain, so is $B_{v_j, j}$, and hence $w^v$ is the product (over $j$) of the smallest element in $B_{v_j, j}$. Similarly, let $\bar{w}^v$ denote the largest element in $B^v$. Intuitively, $w^v$ and $\bar{w}^v$ will serve as witnesses for all the sets in $B^v$. Finally, define the witness sets by adding the $w^v$'s and $\bar{w}^v$'s of appropriate size corresponding to meaningful $B^v$'s: set $W^+_i = \{w^v : B^v \cap X \neq \emptyset,\; a_i(w^v) \ge (1-\epsilon/2)B\}$ and $W^-_i = \{\bar{w}^v : B^v \cap X \neq \emptyset,\; a_i(\bar{w}^v) \le (1-5\epsilon/2)B\}$.

It is not too difficult to see that, say, $W^+_i$ is indeed a witness set for $X^+_i$: if $x \in X^+_i$ belongs to some $B^v$, then $w^v$ belongs to $W^+_i$ and is easily shown to be a witness for $x$. However, if $x$ does not belong to any $B^v$, by having too large sides, the idea is to find a smaller set $x' \subseteq x$ which belongs to some $B^v$ and to $X$, and then use $w^v$ as a witness for $x$. We note that considering $B^v$'s for side lengths at most $B + m$ and only adding witnesses for $B^v$'s which intersect $X$ are crucially used when bounding the sizes of $W^+_i$ and $W^-_i$.

Lemma 2.12 The sets $W^+_i$ and $W^-_i$ are witness sets for $X^+_i$ and $X^-_i$.

Clearly these witness sets have size at most $(\lceil \log_{1+\epsilon/4} \frac{8K}{\epsilon} \rceil + 1)^K$. Although this size is independent of $n$, it is still unnecessarily large, since it only uses, locally for each $C_j$, the fact that $X$ consists of linear classifications; in particular, it does not use the dimension of the ambient space $\mathbb{R}^m$. Suppose that $J \subseteq [K]$, of cardinality $m$, is such that the directions $\{c^j\}_{j \in J}$ form a basis of $\mathbb{R}^m$. Knowing the partial classification $x(p)_{C_j}$, or more precisely the value of $p c^j$, for all $j \in J$ completely determines the whole classification $x(p)$. Similarly, knowing that $x(p)_{C_j} \in B_{v_j, j}$ for all $j \in J$ should give some information about which $B_{l,j}$'s $x(p)_{C_j}$ can belong to for $j \notin J$; this indicates that there are not enough degrees of freedom to allow a linear classification in $B^v$ for each $v \in L^K$. The difficulty in making this argument formal is that the latter information does not completely determine which $B^v$ the classification $x(p)$ belongs to. The idea is then not to use a fixed set $J$ of indices, but to look at all $K$ subspaces simultaneously.

Lemma 2.13 At most $(O(\frac{K}{\epsilon} \log \frac{K}{\epsilon}))^m$ of the $B^v$'s contain an element from $X$.

Proof. In order to capture the fact that our classifications are obtained via dual vectors in $\mathbb{R}^m$, we move from analyzing classifications to analyzing dual vectors. For $v \in L^K$ define $P^v$ as the set of non-negative dual vectors $p$ such that $x(p)$ belongs to $B^v$. It suffices to prove that at most $(O(\frac{K}{\epsilon} \log \frac{K}{\epsilon}))^m$ of the families $P^v$ are non-empty.

The main idea is to use the fact that the $P^v$'s come from a hyperplane arrangement [23] in $\mathbb{R}^m$. To start, for $j \in [K]$ and $l \in L$ define $P^l_j = \{p \in \mathbb{R}^m_+ : x(p)_{C_j} \in B_{l,j}\}$. Since $x(p) \in B^v$ if and only if for all $j \in [K]$ we have $x(p)_{C_j} \in B_{v_j, j}$, it follows that $P^v = \bigcap_j P^{v_j}_j$. Let $\tau^l_j$ denote the first index in $C_j$ such that the prefix $\{t \in C_j : t \le \tau^l_j\}$ occupies budget $i$ to an extent in $I_l$.
Using Lemma 2.11 and the fact that the $a^t$'s are non-negative, we get that $B_{l,j}$ is the set of all prefixes of $C_j$ which contain $\tau^l_j$ but do not contain $\tau^{l+1}_j$. Moreover, notice that the set $x(p)_{C_j}$ contains $\tau^l_j$ if and only if $\pi_{\tau^l_j} > p a^{\tau^l_j}$. It then follows from these observations that we can express the set $P^l_j$ using linear inequalities: $P^l_j = \{p \in \mathbb{R}^m_+ : \pi_{\tau^l_j} > p a^{\tau^l_j},\ \pi_{\tau^{l+1}_j} \le p a^{\tau^{l+1}_j}\}$. Since $P^v = \bigcap_j P^{v_j}_j$, we have that $P^v$ is given by the intersection of halfspaces defined by hyperplanes of the form $\pi_{\tau^l_j} = p a^{\tau^l_j}$ and $p_i = 0$.
So consider the arrangement given by all hyperplanes $\{\pi_{\tau^l_j} = p a^{\tau^l_j}\}_{j \in [K],\, l \in L}$ and $\{p_i = 0\}_{i=1}^{m}$. Given a face $F$ of this arrangement and a set $P^v$, either $F$ is contained in $P^v$ or these sets are disjoint. Since the faces of the arrangement cover $\mathbb{R}^m$, it follows that each non-empty $P^v$ contains at least one of these faces. Notice that the arrangement is defined by $K|L| + m \le O(\frac{Km}{\epsilon} \log \frac{K}{\epsilon})$ hyperplanes, where the last inequality uses the fact that $\log(1+\frac{\epsilon}{4}) \ge \epsilon \log(1+\frac{1}{4})$ holds (by concavity) for $\epsilon \in [0,1]$. It is known that an arrangement with $h \ge m$ hyperplanes in $\mathbb{R}^m$ has at most $(\frac{eh}{m})^m$ faces (see Section 6.1 of [23] and p. 82 of [24]). Using the conclusion of the previous paragraph, we get that there are at most $(O(\frac{K}{\epsilon} \log \frac{K}{\epsilon}))^m$ non-empty $P^v$'s, and the result follows.

This lemma implies that $W^+_i$ and $W^-_i$ each have size at most $(O(\frac{K}{\epsilon} \log \frac{K}{\epsilon}))^m$, which then proves Lemma 2.10. Finally, applying Lemma 2.9 we conclude the proof of Lemma 2.2.

3 Robust OTP

In this section we consider (LP) with columns that may not belong to few 1-dimensional subspaces. Given the results of the previous section, the idea is clear: we would like to perturb the columns of this LP so that they belong to few 1-dim subspaces, and such that an approximate solution of the perturbed LP is also an approximate solution for the original one. More precisely, we will obtain a set of vectors $Q \subseteq \mathbb{R}^m$ and transform each vector $a^t$ into a vector $\tilde{a}^t$ which is a scaling of a vector in $Q$, and we let the rewards $\pi_t$ remain unchanged.

A basic but crucial observation is that solutions to an LP are robust to slight changes in the constraint matrix. The following lemma makes this precise and will guide us in obtaining the desired set $Q$.

Lemma 3.1 Consider real numbers $\pi_1, \dots, \pi_n$ and vectors $a^1, \dots, a^n$ and $\tilde{a}^1, \dots, \tilde{a}^n$ in $\mathbb{R}^m_+$ such that $\|\tilde{a}^t - a^t\|_\infty \le \frac{\epsilon}{m+1}\|a^t\|_\infty$. If $x$ is an $\epsilon$-approximate solution for (LP) with columns $(\pi_t, \tilde{a}^t)$ and right-hand side $(1-\epsilon)B$, then $x$ is a $2\epsilon$-approximate solution for (LP).

Perturbing the columns. To simplify the notation, set $\delta = \frac{\epsilon}{m+1}$; also, for simplicity of exposition, we assume that $1/\delta$ is integral.

When constructing $Q$, we want the rays spanned by each of its vectors to be uniform over $\mathbb{R}^m_+$. Naturally, we focus on the intersection of these rays with the unit $\ell_\infty$ sphere: we set $Q$ to be a $\delta$-net of the latter. More explicitly, we take $Q$ to be the vectors in $\{0, \delta, 2\delta, 3\delta, \dots, 1\}^m$ which have $\ell_\infty$ norm 1. Note that $|Q| \le (O(\frac{m}{\epsilon}))^m$. Given a vector $a^t$ with $\ell_\infty$-norm 1, we set the transformed vector $\tilde{a}^t$ to be the vector in $Q$ closest (in $\ell_\infty$) to $a^t$. More generally, we let $\tilde{a}^t = \|a^t\|_\infty q^t$, where $q^t$ is the vector in $Q$ closest to $\frac{a^t}{\|a^t\|_\infty}$. By the definition of $Q$, for every vector $v \in \mathbb{R}^m_+$ of unit $\ell_\infty$-norm, there is a vector $q \in Q$ with $\|v - q\|_\infty \le \delta$. Using this observation, it follows that the vectors $\tilde{a}^t$ satisfy the property required in Lemma 3.1: $\|a^t - \tilde{a}^t\|_\infty = \|a^t\|_\infty \big\|\frac{a^t}{\|a^t\|_\infty} - q^t\big\|_\infty \le \delta \|a^t\|_\infty$.

Algorithm Robust OTP. One way to think of the algorithm Robust OTP is that it works in two phases. First, it transforms the vectors $a^t$ into $\tilde{a}^t$ as described above. Then it returns the solution obtained by running the algorithm OTP over the LP with columns $(\pi_t, \tilde{a}^t)$ and right-hand side $(1-\epsilon)B$. Notice that this algorithm can indeed be implemented to run in an online fashion. Putting together the discussion in the previous paragraphs and the guarantee of OTP for almost 1-dim columns given by Theorem 2.1 with $K = |Q| = (O(\frac{m}{\epsilon}))^m$, we obtain the following theorem.

Theorem 3.2 Fix $\epsilon \in (0,1]$ and suppose that $B \ge \Omega(\frac{m^2}{\epsilon^3} \log \frac{m}{\epsilon})$. Then the algorithm Robust OTP returns a solution to the online (LP) with expected value at least $(1-10\epsilon)\mathrm{OPT}$.
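The perturbation phase admits a direct, if brute-force, rendering (a sketch; enumerating $Q$ costs $(O(m/\epsilon))^m$ and is meant only to mirror the construction, not to be efficient):

```python
# Sketch of the perturbation in Robust OTP: build the delta-net Q of the
# unit l_inf-sphere and snap each column onto the nearest ray through Q.
import itertools
import numpy as np

def build_Q(m, eps):
    """Grid vectors in {0, delta, ..., 1}^m with l_inf norm exactly 1."""
    delta = eps / (m + 1)
    k = round(1 / delta)                  # assume 1/delta is integral
    grid = np.arange(k + 1) * delta
    return [np.array(q) for q in itertools.product(grid, repeat=m)
            if np.isclose(max(q), 1.0)]

def snap(a, Q):
    """a~ = ||a||_inf * q, with q in Q l_inf-closest to a / ||a||_inf;
    this guarantees ||a - a~||_inf <= delta * ||a||_inf."""
    scale = np.abs(a).max()
    unit = a / scale
    q = min(Q, key=lambda u: np.abs(unit - u).max())
    return scale * q
```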
4 Robust DPA

In this section we describe our final algorithm, which has an improved dependence on $1/\epsilon$. Following [1], the idea is to update the dual vector used in the classification as new columns arrive. More precisely, we use the first $2^i \epsilon n$ columns to classify columns $2^i \epsilon n + 1, \dots, 2^{i+1} \epsilon n$. This leads to improved generalization bounds, which in turn give the reduced dependence on $1/\epsilon$. The algorithm Robust DPA (like the algorithm DPA) can be seen as a combination of solutions to multiple sampled LP's, obtained via a modification of OTP denoted by $(s,\delta)$-OTP.

Algorithm $(s,\delta)$-OTP. This algorithm aims at solving the LP $(2s,1)$-LP and can be described as follows: it finds an optimal dual solution $(p, \alpha)$ for $(s, 1-\delta)$-LP and sets $x_{\sigma(t)} = x(p)_{\sigma(t)}$ (for $s < t \le 2s$, and 0 otherwise), but stops picking columns when needed so as to guarantee that $\sum_{t=s+1}^{2s} a^{\sigma(t)} x_{\sigma(t)} \le \frac{s}{n} B$.

The analysis of $(s,\delta)$-OTP is similar to the one employed for OTP. The main difference is that this algorithm tries to approximate the value of the random LP $(2s,1)$-LP. This requires a partition of the bad classifications which is more refined than simply splitting into $X^+_i$ and $X^-_i$, and witness sets need to be redefined appropriately. Nonetheless, using these ideas we can prove the following guarantee for $(s,\delta)$-OTP.

Lemma 4.1 Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns $a^t$. Fix an integer $s$ and a real number $\delta \in (0, 1/10)$ such that $\frac{\delta^2 s B}{n} \ge \Omega(m \ln \frac{K}{\delta})$. Then algorithm $(s,\delta)$-OTP returns a feasible solution for $(2s,1)$-LP with expected value at least $(1-3\delta)\,\mathbb{E}[\mathrm{OPT}(2s)] - \mathbb{E}[\mathrm{OPT}(s)] - \delta^2\,\mathrm{OPT}$.

Algorithm Robust DPA. In order to simplify the description of the algorithm, we assume in this section that $\log(1/\epsilon)$ is an integer.

Again the algorithm Robust DPA can be thought of as working in two phases. In the first phase it converts the vectors $a^t$ into $\tilde{a}^t$, just as in the first phase of Robust OTP. In the second phase, for $i = 0, \dots, \log(1/\epsilon) - 1$, it runs $(\epsilon 2^i n, \sqrt{\epsilon/2^i})$-OTP over (LP) with columns $(\pi_t, \tilde{a}^t)$ and right-hand side $(1-\epsilon)B$ to obtain the solution $x^i$. The algorithm finally returns the solution $x$ consisting of the union of the $x^i$'s: $x = \bigcup_i x^i$. Note that the second phase corresponds exactly to using the first $\epsilon 2^i n$ columns to classify the columns $\epsilon 2^i n + 1, \dots, \epsilon 2^{i+1} n$. This relative increase in the size of the training data for each learning problem allows us to reduce the dependence of $B$ on $\epsilon$ in each of the iterations, while the errors from all the iterations telescope and are still bounded as before. Furthermore, notice that Robust DPA can be implemented to run online.

The analysis of Robust DPA reduces to that of $(s,\delta)$-OTP. That is, using the definition of the parameters of $(s,\delta)$-OTP used in Robust DPA and Lemma 4.1, it is routine to check that the algorithm produces a feasible solution which has expected value $(1-O(\epsilon))\mathrm{OPT}$. This is formally stated in the following theorem.

Theorem 4.2 Fix $\epsilon \in (0, 1/100)$ and suppose that $B \ge \Omega(\frac{m^2}{\epsilon^2} \ln \frac{m}{\epsilon})$. Then the algorithm Robust DPA returns a solution to the online LP (LP) with expected value at least $(1-50\epsilon)\mathrm{OPT}$.
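Schematically, the doubling schedule can be written as follows. This is a sketch under our own interface assumptions, not the authors' implementation: `train` stands for the dual-pricing step of Section 2 applied to the snapped columns, and `classify` is as in the earlier sketches.

```python
# Schematic Robust DPA: geometrically growing training prefixes, one
# (s, delta)-OTP run per block, with per-block budget (s/n) * (1-eps) * B.
import math
import numpy as np

def robust_dpa(columns, n, m, B, eps, train, classify):
    """columns: list of snapped columns (pi_t, a~_t) in arrival order.
    train(prefix, rhs) -> dual price vector p; classify(pi, a, p) -> {0,1}."""
    value = 0.0
    for i in range(int(math.log2(1 / eps))):   # assume log2(1/eps) integral
        s = int(eps * (2 ** i) * n)
        delta = math.sqrt(eps / 2 ** i)
        # dual of (s, 1-delta)-LP for the instance with right-hand side (1-eps)B
        p = train(columns[:s], (s / n) * (1 - delta) * (1 - eps) * B)
        load, cap = np.zeros(m), (s / n) * (1 - eps) * B
        for pi_t, a_t in columns[s:2 * s]:     # classify the next s columns
            if classify(pi_t, a_t, p) == 1.0:
                if np.any(load + a_t > cap):
                    break                      # stop picking in this block
                value += pi_t
                load += a_t
    return value
```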
5 Open problems

A very interesting open question is whether the techniques introduced in this work can be used to obtain improved algorithms for generalized allocation problems [14]. The difficulty in this problem is that the classifications of the columns are not linear anymore; they essentially come from a conjunction of linear classifiers. Given this additional flexibility, having the columns in few 1-dimensional subspaces does not seem to impose strong enough properties on the classifications. It would be interesting to find the appropriate geometric structure of the columns in this case.

Of course, a direct open question is to improve the lower or upper bound on the dependence on the right-hand side $B$ needed to obtain $(1-\epsilon)$-competitive algorithms.
References

[1] S. Agrawal, Z. Wang, and Y. Ye. A dynamic near-optimal algorithm for online linear programming. http://arxiv.org/abs/0911.2974.

[2] Moshe Babaioff, Michael Dinitz, Anupam Gupta, Nicole Immorlica, and Kunal Talwar. Secretary problems: weights and discounts. In SODA, pages 1245–1254, 2009.

[3] Moshe Babaioff, Nicole Immorlica, David Kempe, and Robert Kleinberg. A knapsack secretary problem with applications. In APPROX-RANDOM, pages 16–28, 2007.

[4] Moshe Babaioff, Nicole Immorlica, David Kempe, and Robert Kleinberg. Online auctions and generalized secretary problems. SIGecom Exchanges, 7(2), 2008.

[5] MohammadHossein Bateni, MohammadTaghi Hajiaghayi, and Morteza Zadimoghaddam. Submodular secretary problem and extensions. In Proceedings of the 13th International Conference on Approximation, and the 14th International Conference on Randomization and Combinatorial Optimization: Algorithms and Techniques, APPROX/RANDOM '10, pages 39–52, Berlin, Heidelberg, 2010. Springer-Verlag.

[6] John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research and Financial Engineering. Springer, 1997.

[7] Allan Borodin and Ran El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.

[8] Niv Buchbinder and Joseph (Seffi) Naor. Online primal-dual algorithms for covering and packing. Math. Oper. Res., 34:270–286, May 2009.

[9] Felipe Cucker and Ding Xuan Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007.

[10] Nikhil R. Devanur, Kamal Jain, Balasubramanian Sivan, and Christopher A. Wilkens. Near optimal online algorithms and fast approximation algorithms for resource allocation problems. In Yoav Shoham, Yan Chen, and Tim Roughgarden, editors, ACM Conference on Electronic Commerce, pages 29–38. ACM, 2011.

[11] Nikhil R. Devanur and Thomas P. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In John Chuang, Lance Fortnow, and Pearl Pu, editors, ACM Conference on Electronic Commerce, pages 71–78. ACM, 2009.

[12] L. Devroye and T. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25:601–604, 1979.

[13] E. B. Dynkin. The optimum choice of the instant for stopping a Markov process. Soviet Math. Dokl., 4, 1963.

[14] Jon Feldman, Monika Henzinger, Nitish Korula, Vahab S. Mirrokni, and Clifford Stein. Online stochastic packing applied to display ad allocation. In Mark de Berg and Ulrich Meyer, editors, ESA (1), volume 6346 of Lecture Notes in Computer Science, pages 182–194. Springer, 2010.

[15] John P. Gilbert and Frederick Mosteller. Recognizing the maximum of a sequence. Journal of the American Statistical Association, 61(313):35–73, 1966.

[16] Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models with applications to adwords. In Shang-Hua Teng, editor, SODA, pages 982–991. SIAM, 2008.
[17] Hugh Everett III. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11:399–417, 1963.

[18] Sungjin Im and Yajun Wang. Secretary problems: laminar matroid and interval scheduling, 2011.

[19] Richard M. Karp, Umesh V. Vazirani, and Vijay V. Vazirani. An optimal algorithm for on-line bipartite matching. In STOC, pages 352–358. ACM, 1990.

[20] Claire Kenyon. Best-fit bin-packing with random order. In Symposium on Discrete Algorithms, pages 359–364, 1996.

[21] Robert Kleinberg. A multiple-choice secretary algorithm with applications to online auctions. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '05, pages 630–631, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.

[22] Samuel Kutin and Partha Niyogi. Almost-everywhere algorithmic stability and generalization error. In Uncertainty in Artificial Intelligence, pages 275–282, 2002.

[23] Jiří Matoušek. Lectures on Discrete Geometry. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2002.

[24] Jiří Matoušek and Jaroslav Nešetřil. Invitation to Discrete Mathematics. Oxford University Press, 1998.

[25] José A. Soto. Matroid secretary problem in the random assignment model. In Dana Randall, editor, SODA, pages 1275–1284. SIAM, 2011.

[26] V. Vazirani. Approximation Algorithms. Springer, 2001.