Optimal Delivery of Sponsored Search Advertisements Subject to Budget Constraints


Zoë Abrams, Yahoo!, Inc., 701 First Avenue, Sunnyvale, CA, USA
Ofer Mendelevitch, Yahoo!, Inc., 701 First Avenue, Sunnyvale, CA, USA
John A. Tomlin, Yahoo! Research, 701 First Avenue, Sunnyvale, CA, USA

EC'07, June 13–16, 2007, San Diego, California, USA. Copyright 2007 ACM.

ABSTRACT
We discuss an auction framework in which sponsored search advertisements are delivered in response to queries. In practice, the presence of bidder budgets can have a significant impact on the ad delivery process. We propose an approach based on linear programming which takes bidder budgets into account, and uses them in conjunction with forecasting of query frequencies, and pricing and ranking schemes, to optimize ad delivery. Simulations show significant improvements in revenue and efficiency.

Categories and Subject Descriptors
G.4 [Mathematics of Computing]: Mathematical Software: Algorithm design and analysis

General Terms
Algorithms, Performance

Keywords
column generation, sponsored search, budgets, advertising

1. INTRODUCTION
Search engine companies such as Yahoo!, Google, and MSN earn millions of dollars each day by auctioning off advertisement slots. In addition to the bids, there are two essential sets of parameters of the system that contribute to this revenue: the distribution of query frequencies and the advertiser budgets.

The query frequencies limit the number of times the search engine can display its advertisers. Query frequencies are not under the control of the advertisers or the search engine. It is well known [17] that the search engine query frequency distribution typically has a few queries with large volume (and large revenue and efficiency), and a very large number of queries with extremely low volume. Therefore, we overcome most of the uncertainty in query volumes by selecting a relatively small subset of queries whose near-term volumes are easy to forecast, yet still constitute a large amount of the overall revenue.

Advertisers or their agents, on the other hand, do have the ability to control their budgets. An advertiser's budget may constrain the number of times their ads appear, even when they have made a high bid on a query term. One might ask why they would wish to do so. There are several possible reasons, among them: protection against click-fraud, an overall company advertising budget, and the desire to control the allocation of that budget between various media and campaigns. Whatever the reason, the search engine must determine which advertisers to display for which queries, given these constraints.

Pricing and ranking are additional parameters of the system that influence revenue. The VCG mechanism [19, 5, 10] can be applied, but in practice search engines predominantly use the generalized second price (GSP) auction (see Edelman et al. [9] for more details). Advertisers are ranked according to the product of the price they bid for receiving a click and a quality score. Each is charged a price per click equal to the minimum they would have had to pay to maintain their rank.

The problem we consider is how to allocate advertisers to queries such that budget constraints are satisfied and efficiency (or revenue) is maximized. This problem is posed as a linear program that takes a global view, and coordinates advertiser spend across the chosen time period, such as the next hour, or an entire day.
With the combined knowledge of forecast query volumes, advertiser budgets, advertiser bids, and the pricing and ranking algorithm, we formulate a comprehensive mathematical framework.

1.1 Related Work
Incorporating advertiser budgets into the marketplace design is recognized as crucial, and a growing amount of research addresses this subject. Several recent papers have considered the effect of budgeted advertisers specifically within the context of internet keyword auctions (e.g. [3] and [1]). The fields of online and approximation algorithms have also approached the topic. The widely noted paper by Mehta et al. [18] presents an online algorithm with a competitive ratio of $1 - 1/e$ when the volume and sequence of queries is unknown. Mahdian et al. [13] extend this work by considering a tradeoff depending on the level of accuracy in volume predictions. We will return to this algorithm when we discuss possible future work in Section 6.

There has been a considerable amount of work done in the related field of (one-off) multi-item combinatorial auctions, leading to algorithms which are practical and efficient in a wide variety of settings (see Schrage [16] for a useful survey). These algorithms typically employ linear or integer programming (LP/IP), exploiting a very mature and efficient group of technologies. Although the characteristics of the sponsored search problem we consider (for instance, the frequently repeated nature of the auctions) make it difficult to apply the known techniques directly, the approach is appealing. In particular, the work of Dietrich and Forrest [8], which uses column generation to determine the set of winning bids, is suggestive.

1.2 Paper Outline
In the remainder of this paper we will first present a simple example which motivates the need for algorithms which take advertiser budgets into account. We then present the notation, assumptions and algorithmic framework necessary for our approach. This is followed by a detailed description of the algorithm, a description of the implementation, and our preliminary computational results. Finally, we outline some future areas of follow-up research.

2. MOTIVATING EXAMPLE
To illustrate how critical the proper consideration of advertiser budgets might be, and how poorly a greedy algorithm might perform in the presence of these budgets, we examine the following highly simplified example. Suppose we have two queries, $q_1$ and $q_2$. A single advertiser is displayed for each query, all advertisers have the same expected clickthrough rate, and the relevant bids and budgets are shown in Table 1.

Table 1: Bids and Budgets
Bidder | Bid for $q_1$ | Bid for $q_2$ | Budget
$b_1$ | $C_1 + \epsilon$ | $C_1$ | $C_1$
$b_2$ | $C_1$ | $0$ | $C_1$
$b_3$ | $C_1 - \epsilon$ | $C_1 - \epsilon$ | $2C_1$

As Table 2 shows, a straightforward application of GSP that displays the highest bidder is not optimal from a revenue perspective. To see this, let us assume that within the budgets' time interval, $q_1$ appears, followed by $q_2$. Then bidder $b_1$ would pay the second bid price $C_1$ (bid by bidder $b_2$) and exhaust his budget on query $q_1$. When query $q_2$ arrives, only bidder $b_3$ is now eligible, and will only pay the reserve price, which we take to be $\epsilon$.
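The greedy run just described, and the alternative allocation considered next, can be replayed with a small script. This is a minimal sketch, not the authors' code: it assumes the illustrative values $C_1 = 1$ and $\epsilon = 0.01$, a reserve price of $\epsilon$, and exactly one click per showing (all advertisers share the same click-through rate). The helper names `second_price`, `run`, `greedy` and `fixed` are hypothetical.

```python
# Minimal sketch of the Section 2 example; numbers and helper names are hypothetical.
C1, EPS = 1.0, 0.01
RESERVE = EPS                  # reserve price, taken to be epsilon as in the text
QUERIES = ["q1", "q2"]         # q1 arrives first, then q2

BIDS = {                       # Table 1: bid per click on each query
    "b1": {"q1": C1 + EPS, "q2": C1},
    "b2": {"q1": C1, "q2": 0.0},
    "b3": {"q1": C1 - EPS, "q2": C1 - EPS},
}
BUDGETS = {"b1": C1, "b2": C1, "b3": 2 * C1}


def second_price(winner, query, remaining):
    """Highest positive bid not above the winner's, among other bidders with budget left."""
    own = BIDS[winner][query]
    lower = [bids[query] for name, bids in BIDS.items()
             if name != winner and remaining[name] > 0 and 0 < bids[query] <= own]
    return max(lower, default=RESERVE)


def run(choose):
    """Show one ad per query (one click each) and charge the winner its second price."""
    remaining = dict(BUDGETS)
    shown, revenue = [], 0.0
    for q in QUERIES:
        winner = choose(q, remaining)
        charge = min(second_price(winner, q, remaining), remaining[winner])
        remaining[winner] -= charge
        revenue += charge
        shown.append(winner)
    return shown, revenue


def greedy(query, remaining):
    """Pick the highest bidder that still has budget (the greedy GSP baseline)."""
    eligible = [n for n in BIDS if remaining[n] > 0 and BIDS[n][query] > 0]
    return max(eligible, key=lambda n: BIDS[n][query])


def fixed(plan):
    """Always choose the bidder prescribed by the plan for each query."""
    return lambda query, remaining: plan[query]


print(run(greedy))                            # shows b1 then b3; revenue C1 + EPS
print(run(fixed({"q1": "b2", "q2": "b1"})))   # shows b2 then b1; revenue 2*C1 - 2*EPS
```

Pricing each winner against the bidders that still have budget is what makes the greedy run collapse to the reserve price on $q_2$.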
Now consider the alternative allocation that shows $b_2$ for $q_1$, producing revenue $C_1 - \epsilon$, then shows $b_1$ for query $q_2$, also producing revenue $C_1 - \epsilon$, for a total of $2C_1 - 2\epsilon$, or nearly double the revenue of the greedy allocation. Thus, a more global viewpoint, one which takes into account the keywords throughout the time period and the budget situation for each advertiser, can lead to increased revenues. When this simple example is complicated many fold by thousands of queries, bidders and budgets, the potential for inefficiency is obvious.

Table 2: Allocation Options
Allocation | Shown for $q_1$ | Shown for $q_2$ | Total Revenue
Greedy | $b_1$ | $b_3$ | $C_1 + \epsilon$
Optimal | $b_2$ | $b_1$ | $2C_1 - 2\epsilon$

3. FINDING THE OPTIMAL ALLOCATION - PROBLEM DEFINITION
Let the auction marketplace consist of a set of queries $Q = \{q_1, q_2, \dots, q_N\}$ and bidders $B = \{b_1, b_2, \dots, b_M\}$. We usually use simply the index $i$ to denote query $q_i$ and the index $j$ to refer to bidder $b_j$. The bidding state of the marketplace at time $t$ is defined by a (sparse) matrix $A(t)$, where $A_{ij}(t)$ is the bid amount that the $j$-th bidder is bidding on the $i$-th query $q_i$. We assume a static bidding state ($A(t) = A$) over some time slot. While we realize that in practice this is not true due to bid management (either manual or by software), simulations suggest the effects of these types of changes are negligible. We will accommodate this dynamic aspect by frequently re-solving our model as the data evolves, as is done in many other applications. It is also assumed that bids do not change in response to the allocation rule (i.e. this is not an equilibrium analysis).

For each bidder $b_j$, we denote by $d_j$ the daily budget limit specified by the bidder. $d_j$ is an account-level limit, i.e., it represents a spend limit across all queries for that account¹. If a budget is not specified, we refer to this bidder as an unbudgeted bidder and set $d_j = \infty$. Given a time slot of interest, let $v_i$ be a deterministic estimate of the number of times each query $q_i$ will appear within that time slot.
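One way to hold this state in memory, assuming nothing about the authors' data structures, is a dictionary of dictionaries for the sparse matrix $A$, with `math.inf` standing in for $d_j = \infty$. The query and bidder names, the numbers, and the helpers `bidders_on`, `queries_of` and `is_budgeted` are all hypothetical.

```python
import math

# Sparse bid matrix A: A[i][j] is bidder j's bid on query i; absent entries mean "no bid".
A = {
    "q1": {"b1": 1.01, "b2": 1.00, "b3": 0.99},
    "q2": {"b1": 1.00, "b3": 0.99, "b4": 0.50},
}
# Daily budgets d_j; the unbudgeted bidder b4 gets d_j = infinity.
d = {"b1": 1.0, "b2": 1.0, "b3": 2.0, "b4": math.inf}
# Deterministic volume estimates v_i for the chosen time slot.
v = {"q1": 120.0, "q2": 80.0}


def bidders_on(query):
    """Row i of A: the bidders with a bid on this query."""
    return sorted(A.get(query, {}))


def queries_of(bidder):
    """Column j of A: every query that this bidder's account-level budget must cover."""
    return sorted(q for q, row in A.items() if bidder in row)


def is_budgeted(bidder):
    """A finite d_j means the bidder's budget constraint can bind."""
    return math.isfinite(d[bidder])


print(bidders_on("q2"))                        # ['b1', 'b3', 'b4']
print(queries_of("b1"))                        # ['q1', 'q2']
print([j for j in d if not is_budgeted(j)])    # ['b4']
```

Because $d_j = \infty$ for an unbudgeted bidder, its budget constraint can never bind; the sketch makes that distinction explicit with `math.isfinite`.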
For each query $i$, we define the bidding landscape as an ordered set of bidder indices $L_i = \{j_p : j_p \in B,\ p = 1, \dots, P_i\}$, where the indices $j_p$ are sorted by some ranking function, and $P_i$ is the number of bidders in the landscape for query $q_i$.

In principle, we could now formulate an optimization model in which the variables corresponded to the number of times each available ad was shown with each query. However, such a model would require complex constraints and auxiliary discrete features to ensure the ordering required by the bidding landscapes for each query. We quickly abandoned such an approach in favor of a column-oriented approach which enforces the precedence explicitly. This modeling approach has a considerable history, and was quite recently used in a related model for item allocation in combinatorial auctions (see [8]). The algorithm here is quite different, since the payoffs are not deterministic and the auction is frequently repeated, among other features, but the spirit is similar.

We now define the crucial concept of a slate of ads corresponding to (and in fact a subset of) the bidding landscape. These slates will correspond to the columns and variables of the linear program (LP) we formulate below. Each bidding landscape $L_i$ is mapped into a set of slates $L_{ik}$, each being a unique subset of $L_i$ which can be obtained by deleting members of $L_i$ (while maintaining the ordering) and then truncating (if necessary) to at most $\bar{P}$ members. More formally, the $k$-th slate for query $i$ includes a unique subset (of length $P_{ik} \le \bar{P}$) of the indices of $L_i$, and is defined as

$$L_{ik} = \{ j_l : j_l \in L_i,\ l \le P_{ik} \le \bar{P} \},$$

where $\bar{P}$ is the maximum number of slots available for advertising on the page. The indices in $L_{ik}$ are ordered as in $L_i$ (i.e., in order of ranking). By convention, if no member is ranked below the last ad of the slate, an additional dummy member, bidding the reserve price, may be added for the purpose of computing second-bid prices.

Two ranking (ordering) methods have been commonly used. The first and older method, sometimes known as the Overture method, is bid-ranking, so that (by a slight abuse of notation):

$$A_{i1} \ge \dots \ge A_{ij} \ge \dots \ge A_{iP_i}$$

¹ We could also associate budgets with other entities such as a campaign.

This scheme has now been generally superseded by expected revenue-ranking, where the bidders on the term are ranked by the product of the bid value $A_{ij}$ and a quality score or clickability value $Q_{ij}$, which is assumed to incorporate the likelihood of the ad being clicked on, based on relevance of the ad to the query, among other factors. The bidders are thus ordered so that:

$$A_{i1} Q_{i1} \ge \dots \ge A_{ij} Q_{ij} \ge \dots \ge A_{iP_i} Q_{iP_i}$$

Hence in any slate $k$ for query $i$ we expect the subset of ads chosen to also satisfy:

$$A_{ij_1} Q_{ij_1} \ge \dots \ge A_{ij_p} Q_{ij_p} \ge A_{ij_{p+1}} Q_{ij_{p+1}} \ge \dots$$

Since the bids reflect the maximum amount that a bidder is willing to pay for a click for this query, this implies that for bidder $j_p$ to hold his position in the slate, his price per click (PPC), denoted $PPC_{ij_p}$, should satisfy:

$$PPC_{ij_p} \ge A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}}$$

(in practice we may add some small quantity to the right-hand side and treat this as an equation). Note that this implies a modified second-bid auction, where the price per click actually paid depends on the next bid and the ratio of clickabilities. Since setting all the $Q_{ij}$ to 1 would result in the same slate and PPC as if we had used bid-ranking, we shall treat bid-ranking as a special case and consider only revenue-ranking.

Let us also introduce a click-through rate (CTR), denoted $T_{i,j_p,p}$, for the rate at which the ad from bidder $j_p$ in position $p$ in slate $k$ for query $i$ is clicked on per showing of the slate. These CTRs are estimated based on historical click data as well as other factors, and will be used both in the algorithm and in the simulation described in Section 5.4. Assuming independence of the CTRs, we can now express the expected revenue per search (rps) in our model for this slate and this query as the sum, over the positions of the slate, of the price per click times the click-through rate:

$$r_{ik} = \sum_{p=1}^{P_{ik}} A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}} T_{i,j_p,p} \tag{1}$$

We now introduce the variables $x_{ik}$, which represent the number of times slate $L_{ik}$ appears in the given time slot.
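The slate definition and equation (1) can be made concrete with a short sketch. It is not the paper's algorithm: it enumerates every order-preserving subset of a revenue-ranked landscape (only practical for small landscapes), and, as a simplifying assumption, it prices the last ad in each slate at the reserve, as if the dummy member described above always followed it; a slate truncated from a longer subsequence would instead be priced by the next retained bidder. The function names, bidders, bids, quality scores, CTRs and reserve are all hypothetical.

```python
from itertools import combinations


def landscape(bids, quality):
    """Revenue ranking: order bidders by A_ij * Q_ij, highest first."""
    return sorted(bids, key=lambda j: bids[j] * quality[j], reverse=True)


def enumerate_slates(order, max_slots):
    """All order-preserving subsets of the landscape with at most max_slots members."""
    slates = []
    for size in range(1, min(max_slots, len(order)) + 1):
        # combinations() keeps the input order, so each tuple stays rank-ordered.
        slates.extend(combinations(order, size))
    return slates


def revenue_and_costs(slate, bids, quality, ctr, reserve):
    """Return r_ik (eq. 1) and the per-bidder expected costs c_ijk (eq. 4)."""
    costs = {}
    for p, j in enumerate(slate, start=1):
        if p < len(slate):
            nxt = slate[p]                              # bidder ranked just below j in this slate
            ppc = bids[nxt] * quality[nxt] / quality[j]
        else:
            ppc = reserve                               # assumption: dummy member bids the reserve
        costs[j] = ppc * ctr[(j, p)]                    # PPC times click-through rate T_{i,j_p,p}
    return sum(costs.values()), costs


bids = {"a": 2.0, "b": 1.5, "c": 1.0}
quality = {"a": 0.8, "b": 0.6, "c": 0.5}
order = landscape(bids, quality)                        # ['a', 'b', 'c']
slates = enumerate_slates(order, max_slots=2)           # 3 single-ad + 3 two-ad slates
r, c = revenue_and_costs(("a", "b"), bids, quality,
                         ctr={("a", 1): 0.05, ("b", 2): 0.03}, reserve=0.1)
print(order, len(slates), round(r, 5))                  # ['a', 'b', 'c'] 6 0.05925
```

The count of slates grows as a sum of binomial coefficients in the landscape size, which is why Section 4 turns to column generation instead of full enumeration.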
The expected total revenue for the time slot, over all queries, is therefore:

$$\sum_{i=1}^{N} \sum_{k} r_{ik} x_{ik} \tag{2}$$

We represent the total spend for each bidder $j$ as

$$\sum_{i=1}^{N} \sum_{k} c_{ijk} x_{ik} \tag{3}$$

where

$$c_{ijk} = \begin{cases} A_{i,j_{p+1}} \dfrac{Q_{i,j_{p+1}}}{Q_{i,j_p}} T_{i,j_p,p} & \text{if } j = j_p,\ 0 < p \le P_{ik} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

is the expected cost to bidder $j$ of appearing (in position $p$) in slate $k$ for query $i$.

3.1 Linear Programming Formulation
We may now formally define the following linear programming (LP) problem:

Indices
$i = 1, \dots, N$: the queries
$j = 1, \dots, M$: the bidders
$k = 1, \dots, K_i$: the slates (for query $i$)

Data
$d_j$: the total budget of bidder $j$
$v_i$: expected number of occurrences of keyword $i$
$c_{ijk}$: expected cost to bidder $j$ if slate $k$ is shown for keyword $i$
$r_{ik}$: expected revenue from slate $k$ for keyword $i$

Variables
$x_{ik}$: number of times to show slate $k$ for keyword $i$

Constraints
(Budget)
$$\sum_{i,k} c_{ijk} x_{ik} \le d_j \quad \forall j \tag{5}$$
(Inventory)
$$\sum_{k} x_{ik} \le v_i \quad \forall i \tag{6}$$

Revenue Objective
$$\text{Maximize} \sum_{i,k} r_{ik} x_{ik}$$

3.2 Alternate Objective Functions
While maximizing expected overall revenue is obviously attractive from the auctioneer's point of view, it may not appear so attractive to the individual bidders. However, the LP approach can accommodate a variety of alternate objective functions. One possibility is to maximize expected efficiency under the assumption that the bidders have expressed their true value for a click by their bids. While this may not always be a valid assumption, it is reasonable to view the bid as somewhat commensurate with a bidder's true value. This would correspond to an LP objective of:

Value Objective
$$\text{Maximize} \sum_{i,k} \sum_{p} A_{i,j_p} T_{i,j_p,p}\, x_{ik}$$

Note that this objective has coefficients computed from the first prices, not second prices. Even more simply, we may decide to optimize the number of clicks obtained. This is accomplished by using an LP objective:

Clicks Objective
$$\text{Maximize} \sum_{i,k} \sum_{p} T_{i,j_p,p}\, x_{ik}$$

For these two particular objectives, we could also reformulate the LP to have a polynomial number of variables and constraints, instead of using the more involved column generation approach. Regardless of how the solutions are computed, it is clear that other objectives could be formulated. If we use the column generation approach, we might take some composite weighted sum of the value, clicks, and revenue objectives. In general, the Value Objective has advantages when considering the problem from the perspective of economics.

4. COLUMN GENERATION
It is clear that the number of potential columns in the above LP could be very large indeed. For each query, the number of possible slates is exponential in the number of budgeted bidders on the query, and there are many queries. It is therefore impractical to enumerate all the possible columns corresponding to the variables $x_{ik}$. If there were no binding budgets, then the optimal solution would be the trivial one consisting only of the top $\bar{P}$ bidders from the landscape (called a base slate) being shown for each query. In fact we would expect, and experiment confirms, that for many queries the base slate will be the only one shown even when there are active (binding) budget constraints, while some others use multiple slates, usually not more than a handful. The trick is to know which handful.

The same situation occurs in many other LP applications where each column represents one of many possible complex activities. Early examples included models where the column represented a path through a network or a cutting pattern for cutting up stock sizes of material. These particular models require that small auxiliary models be solved to generate relevant columns: shortest path problems in the former case, and a knapsack problem in the latter.

Using a conventional column-generation approach (see [8], [12]), we do not attempt to generate every slate $L_{ik}$ a priori, but generate an initial subset (say, the $L_{ik}$ for $k = 1, \dots, K_i$) and then generate columns as needed using the dual values of the linear program. Considering, to begin with, the revenue-maximizing objective, let $\pi_j$ be the marginal value for bidder $j$'s budget, i.e. the simplex multiplier [6] for the $j$-th constraint (5), and let $\gamma_i$ be the marginal value for the $i$-th keyword, i.e.
the simplex multiplier for the $i$-th constraint (6). Then a column corresponding to slate $L_{ik}$ (and hence to variable $x_{ik}$) can be profitably introduced into the model if:

$$r_{ik} - \sum_{j \in L_{ik}} c_{ijk} \pi_j - \gamma_i > 0 \tag{7}$$

For each keyword $i$ we seek to maximize $r_{ik} - \sum_{j \in L_{ik}} c_{ijk} \pi_j$ (or equivalently, minimize $\sum_{j \in L_{ik}} c_{ijk} \pi_j - r_{ik}$) over all legal slates $L_{ik}$. If a slate is found such that (7) is satisfied, the corresponding slate and its variable are introduced into the problem. If no such slate exists (for any $i$), then an optimal solution has been obtained.

Looking at the structure of the coefficients in (7), we see that:

$$r_{ik} = \sum_{p=1}^{P_{ik}} T_{i,j_p,p} A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}} \tag{8}$$

and

$$c_{ij_pk} = T_{i,j_p,p} A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}} \tag{9}$$

so the subproblem is to maximize (for the given $\pi_j$):

$$F_{ik}(\pi) = \sum_{p=1}^{P_{ik}} T_{i,j_p,p} A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}} \left(1 - \pi_{j_p}\right) \tag{10}$$

over all legal slates $L_{ik}$. When the number of budgeted bidders for a query is not too large, this may be done by enumerating all legal subsets of $L_i$, evaluating (10) on the fly, until an $L_{ik}$ is found for which $F_{ik} > \gamma_i$. The corresponding column is then added to the problem. If no new column satisfying this condition is found for any query, the present solution is optimal. If there are more than a few such legal subsets, we may need to algorithmically generate columns (slates) which maximize (10) and test whether they satisfy $F_{ik} > \gamma_i$. If the maximizing slate does not satisfy this inequality, we may pass on to the next query, since no improving slate can be found for the present set of $\pi_j$ values. If this is true for all queries, then the current solution is optimal.

The overall algorithm proceeds by generating improving sets of slates in this way, then re-solving the LP, until no further improvement is possible, or some other heuristic termination criterion is met (such as percentage improvement in the objective). The actual algorithm used to generate improving slates can be somewhat intricate, and is given in detail in a companion paper [15]. It suffices to say here that the slate generation algorithm is a form of cardinality-constrained knapsack problem [7], complicated by the ordering requirement and the fact that only certain items can be omitted.
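For a single query, the enumeration variant of the pricing step (evaluate (10) for every legal slate and keep the best one whose value exceeds $\gamma_i$) can be sketched as below. This is a brute-force illustration, not the dynamic program of [15], and it reuses the simplifying assumptions of the earlier slate sketch (last ad priced at the reserve; every order-preserving subset of up to $\bar{P}$ ads is legal). In practice the duals $\pi_j$ and $\gamma_i$ would come from the LP solver; the values, names and function signatures here are made up for illustration.

```python
from itertools import combinations


def pricing_value(slate, bids, quality, ctr, pi, reserve):
    """F_ik(pi) from eq. (10): sum over positions of T * PPC * (1 - pi_{j_p})."""
    total = 0.0
    for p, j in enumerate(slate, start=1):
        if p < len(slate):
            nxt = slate[p]
            ppc = bids[nxt] * quality[nxt] / quality[j]
        else:
            ppc = reserve                  # assumption: last slot priced at the reserve
        # Unbudgeted bidders have a non-binding budget row, hence pi_j = 0 by default.
        total += ctr[(j, p)] * ppc * (1.0 - pi.get(j, 0.0))
    return total


def best_new_column(order, max_slots, bids, quality, ctr, pi, gamma, reserve):
    """Return (slate, F) for the best slate with F > gamma_i, or (None, gamma_i) if none prices out."""
    best, best_f = None, gamma
    for size in range(1, min(max_slots, len(order)) + 1):
        for slate in combinations(order, size):
            f = pricing_value(slate, bids, quality, ctr, pi, reserve)
            if f > best_f:
                best, best_f = slate, f
    return best, best_f


bids = {"a": 2.0, "b": 1.0}
quality = {"a": 1.0, "b": 1.0}
ctr = {(j, p): {1: 0.10, 2: 0.05}[p] for j in bids for p in (1, 2)}
pi = {"a": 1.0}                            # hypothetical dual: a's budget is fully binding
print(best_new_column(["a", "b"], 2, bids, quality, ctr, pi, gamma=0.008, reserve=0.1))
print(best_new_column(["a", "b"], 2, bids, quality, ctr, pi, gamma=0.02, reserve=0.1))
```

With $\pi_a = 1$, revenue collected from $a$ is exactly offset by the value of the budget it consumes, so the slate that omits $a$ is the one that prices out; raising $\gamma_i$ to 0.02 leaves no improving column, which is the per-query optimality test described above.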
The algorithm given in [15] is an efficient form of dynamic program which, given the dimensions of the problems we are considering here (perhaps a dozen ads chosen from a few dozen candidates), is fast enough to be solved many thousands of times in the column generation process.

Column generation extends to the alternate objective functions we have suggested. In particular, if the maximum value objective is chosen, we see that the objective function coefficients are:

$$\sum_{p} A_{i,j_p} T_{i,j_p,p} \tag{11}$$

which must replace $r_{ik}$ in (7). The function we wish to maximize in this case is then:

$$F_{ik}(\pi) = \sum_{p=1}^{P_{ik}} T_{i,j_p,p} \left( A_{i,j_p} - A_{i,j_{p+1}} \frac{Q_{i,j_{p+1}}}{Q_{i,j_p}} \pi_{j_p} \right) \tag{12}$$

Once again, algorithmic details are given in [15]. In much the same way, we may generate columns for the maximum clicks objective by replacing $A_{i,j_p}$ by 1 in (12).

Even though we have said that complete enumeration of the slates is impractical, it turns out to be advantageous in practice to partially enumerate them. We can generate a subset of slates for each query that includes the base slate, and a subset obtained by omitting a relatively small subset of the highest-ranked budgeted bidders. We do this because:
1. The most highly ranked bidders are the most likely to spend heavily, and therefore to consume their budgets most quickly.
2. A substantial initial set of slates will lead to a more realistic set of initial $\pi_j$ values after solving the first LP, giving the column generation algorithm values closer to those of the full LP.

It turns out that we can usually obtain over 98% of the true optimal value by enumerating several quite large subsets, but this is wasteful, and much slower than using the column generation algorithm, which achieves a true optimum.

5. COMPUTATIONAL DETAILS
There are a number of practical details which must be considered, and overcome, to use the algorithm we have described in practice². These include problem scope, solution speed and frequency, and integration with the ad serving architecture.

5.1 Query selection
Since there are tens of millions, perhaps hundreds of millions, of queries, we are necessarily limited to working with a subset of them. Clearly, we wish to deal with a manageable subset which captures a large part of the benefits we hope to gain from our optimization algorithm. As we have already remarked, this is aided by the typical distribution of query volumes, where the head queries capture a disproportionate share of the revenue from sponsored search. In confirmation of this, we found that in one sample the top 5000 queries captured a significant fraction of the revenue. This indicates that even a modest gain for the head queries can lead to a significant overall gain.

5.2 Adjusted Bidder Budgets
This cherry-picking can only be achieved at some cost. While it is easy to segment the queries into the head and the rest, it is not so obvious how to segment the bidders. We must somehow isolate the bidders associated with these chosen queries. Unbudgeted bidders present no problem, but we must take into account the fact that some budgeted bidders may have bid on queries in both the chosen head set and the remainder. We must therefore partition the budgets of these bidders into two parts: that spent on our chosen set of queries, and the rest. As a practical matter, this is not too difficult; we may base the division on historical data. However, this obviously introduces a measure of uncertainty that we would wish to minimize. Fortunately, this sort of problem has been considered before.
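As an illustration of basing the division on historical data, the sketch below splits a bidder's daily budget in proportion to the share of its past spend that fell on the chosen head queries. The proportional rule, the function name `split_budget` and its arguments are assumptions for illustration; the paper does not specify the exact rule.

```python
import math


def split_budget(daily_budget, past_head_spend, past_total_spend):
    """Split d_j into (head-set budget, remainder) by the historical head-spend share.

    Hypothetical rule for illustration only.
    """
    if math.isinf(daily_budget):
        return math.inf, math.inf          # unbudgeted bidders need no split
    if past_total_spend <= 0:
        return 0.0, daily_budget           # assumption: no history, nothing reserved for the head
    share = min(1.0, past_head_spend / past_total_spend)
    head = daily_budget * share
    return head, daily_budget - head


print(split_budget(100.0, 30.0, 120.0))    # (25.0, 75.0)
print(split_budget(math.inf, 5.0, 10.0))   # (inf, inf)
```

A purely historical split inherits whatever drift there is between past and future spend patterns, which is the uncertainty the graph-partitioning approach below tries to reduce.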
Carrasco et al. [4] consider the problem of subdividing a query-bidder bipartite graph corresponding to a sponsored search market into submarkets as sparsely connected as possible, which we may consider a generalization of our present problem of isolating a lucrative head market. Using a variant of the algorithms in [4], we may hope to choose a set of head queries minimally connected to the remainder. Having done so, we may then compute adjusted budgets for the budgeted bidders which straddle the chosen/non-chosen query sets. While this is a desirable property, it is not essential to our approach.

5.3 Column Generation Implementation
Our prototype system is implemented in C++ to run under Linux (or Cygwin) on an Intel-based workstation. We use the open source COIN-OR library [11] and its LP code Clp [14], which allows efficient implementation of the column-generation framework and fast updating of the model without the need for any external interface. The column generation code itself (both the initial enumeration and the subsequent dynamic programming subproblem solution) is also implemented in C++. This is economical and avoids compatibility problems, while at the same time allowing us, through the COIN-OR interface, to easily use a commercial LP code if we should ever find Clp inadequate for our models. So far this has been far from the case.

The models on which we have carried out most of our experiments, using real data on approximately 5,000 queries and about 50,000 bidders of whom about 60% are budgeted, are solved to optimality in less than half a minute on a 32-bit Linux box with a 2.8 GHz Xeon processor, and in less than 1 minute even when doubled in size. We therefore expect the algorithm to scale well, even if we were to re-solve at short intervals of, say, 15 minutes.

² The results we report here are based on simulation only, and the algorithm presented is not in operation on Yahoo!'s production ad system.

Note that the algorithm naturally produces results which are suitable directly for serving with organic query results, a task normally performed by the ad server.
The ordered slates of ads, and their serving frequencies in response to queries, can be used to relieve the ad server of the need to execute the auction process for our chosen set of queries. Since we choose from the head queries, this can lead to a significant reduction in workload at the ad server itself.

5.4 Simulation Methodology
In order to test our approach, we measured its performance against a greedy baseline algorithm, which allocates to a query all bidders who have any remaining budget. For this evaluation we used a fixed set of 5000 queries (see Section 5.1), and captured hourly (over an eleven-day period) the bidders, bids and budgets for each of the queries in this set. We used predictors of the click-through rates that use historical click data for their prediction (the exact details of these predictors are beyond the scope of this paper). We used an impression predictor to predict $v_i^t$, the number of times query $i$ will appear at hour $t$ of the day. To compensate for the dynamics in impression volumes, we adapted our algorithm as follows: we convert the variables $x_{ik}$ to frequencies $f_{ik} = x_{ik} / v_i$. Then, for each query instance, we use a coin toss, weighted by the frequencies $f_{ik}$, to decide which slate to show for this query.

For each algorithm we evaluate, we then perform the following steps every hour:
1. Simulate the keyword auction mechanism for all queries each hour, in arbitrary order (the order will not matter), according to the algorithm chosen. For the greedy algorithm the auction is performed with all candidates that have any budget remaining, while for the LP-based algorithms a coin is tossed to decide which slate to show, based on the frequencies $f_{ik}$.
2. Compute metrics such as revenue, efficiency, clicks and PPC.
3. Check if any bidder has exceeded their adjusted daily budget. If so, we reimburse those bidders the amount they are owed and remove them from participating in future auctions.
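The hourly loop for the LP-based policy can be sketched as follows. This is a simplified, hypothetical harness rather than the authors' simulator: it draws each query instance's slate with `random.Random.choices` (which renormalizes the weights $f_{ik}$ if they sum to less than one), charges the expected cost $c_{ijk}$ of each showing, tracks only revenue among the metrics of step 2, skips bidders that have already been removed, and applies step 3 at the end of the hour. All function names and data layouts are assumptions.

```python
import random


def slate_frequencies(x, v):
    """f_ik = x_ik / v_i, where x maps query -> {slate: x_ik}."""
    return {i: {k: x_ik / v[i] for k, x_ik in slates.items()} for i, slates in x.items()}


def draw_slate(freqs_i, rng):
    """Coin toss weighted by f_ik over the slates of one query."""
    slates = list(freqs_i)
    return rng.choices(slates, weights=[freqs_i[k] for k in slates], k=1)[0]


def simulate_hour(instances, freqs, cost, budget, spent, removed, rng):
    """Steps 1-3 for one hour; returns the hour's revenue. cost[(i, k)] maps bidder -> c_ijk."""
    revenue = 0.0
    for i in instances:                             # order of instances does not matter
        k = draw_slate(freqs[i], rng)
        for j, c in cost[(i, k)].items():
            if j in removed:                        # assumption: removed bidders are skipped
                continue
            spent[j] = spent.get(j, 0.0) + c
            revenue += c
    for j in list(spent):                           # step 3: end-of-hour budget check
        if j not in removed and spent[j] > budget[j]:
            refund = spent[j] - budget[j]           # reimburse the overspend
            spent[j] -= refund
            revenue -= refund
            removed.add(j)
    return revenue


rng = random.Random(0)
freqs = slate_frequencies({"q1": {"s1": 10.0}}, {"q1": 10.0})   # a single slate, f = 1
cost = {("q1", "s1"): {"a": 0.5}}
spent, removed = {}, set()
print(simulate_hour(["q1"] * 3, freqs, cost, {"a": 1.2}, spent, removed, rng))  # about 1.2
print(removed, simulate_hour(["q1"], freqs, cost, {"a": 1.2}, spent, removed, rng))  # {'a'} 0.0
```

Because the budget check runs only at the end of each hour, a bidder can overspend within the hour before being refunded and removed, which mirrors the granularity issue discussed next.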

There are several inaccuracies in the simulation. First, due to the hourly granularity, bidders who exceed their budget in the middle of a given hour may cause illegitimate price hikes for other bidders. Second, we assume at each showing of the slate that we receive exactly the expected number of clicks on each advertisement. In reality, there will be some amount of variance and inevitable inaccuracies in the click-through estimates. However, we believe that the above inaccuracies are not significant to our results; furthermore, the inaccuracies will have similar effects on both of the algorithms, so in terms of measuring relative performance these inaccuracies have little impact.

5.5 Results
Our simulation results are quite promising. Whether we optimize for revenue or efficiency, in both cases our results show a significant increase in both values. As we would anticipate, we get better performance in efficiency when that is the objective function, and similarly for revenue. However, revenue appears to be more sensitive to the choice of objective function: the percent increase in revenue roughly doubles between the two objectives, whereas the percent increase in efficiency changes by only about 30%.

It is also interesting to note that the gains in revenue and efficiency for each day of the simulation, as seen in Figures 1 and 2, follow similar patterns, peaking and dipping on the same days. This suggests that the ability to maximize in either case is based on similar properties of the problem. We see that revenue and efficiency are closely tied together, and that gains along one objective imply similar gains along the other.

Figure 1: Gains When Efficiency is Maximized
Figure 3: Impact of the optimization on bidders

Figure 3 shows that the impact on advertisers is favorable, but differs between budgeted and unbudgeted advertisers. We see here an interesting distinction: budgeted bidders get a steep increase in clicks at low prices, whereas the unbudgeted advertisers receive little increase in overall clicks and a slightly higher PPC (and value, although not graphed here).
Overall, the impact in both cases is positive: an increase in clicks, with a PPC that either drops or slightly increases but remains proportional to value per click, is likely to have a sustainable impact. We emphasize that our simulations do not take into account that bidders might react by changing their bids and reported budgets.

6. FUTURE WORK
We focus on queries at the head of the query distribution. However, there is also a heavy tail in this distribution, and many of these queries are not as easily forecast as our subset. As described in Section 1.1, when query frequencies cannot be forecast with a high degree of certainty, previous research has proposed online algorithms with provable worst-case performance guarantees. This previous research does not incorporate pricing and ranking into its solution. However, there may be a way to extend the work in [13] to use our LP as a subroutine. We are also exploring other approaches when query frequencies are unknown, including machine learning and stochastic programming.

We may also consider using parallel processing to scale up the approach even further. In particular, instead of dividing queries into our chosen set and the rest, we may use an algorithm such as that proposed in [4] to partition the query-bidder graph into multiple submarkets, and apply the budget adjustment method discussed in Section 5.2 to partition the affected budgets and allow parallel solution.

It is not clear at this point how advertisers might react to our approach, and specifically how advertiser bids and reported budgets might change. Although we optimize for efficiency, this is the efficiency of the overall system, and any individual user may receive a disproportionate amount of benefit or detriment as a result of the optimization. Even if we use a truthful pricing scheme, such as VCG [19, 5, 10] or the laddered auction [2], optimizing over the overall system results in a system that may not align player incentives. An interesting area for future work is to extend our formulation to account for individual advertiser incentives.

Figure 2: Gains When Revenue is Maximized

7. ACKNOWLEDGEMENTS
The authors wish to thank their colleagues Andrei Z. Broder, Jan Pedersen, Michael Schwarz, Kevin Lang and S. Sathiya Keerthi for many helpful discussions, and their encouragement in the course of this research.

8. REFERENCES
[1] Z. Abrams. Revenue maximization when bidders have budgets. In Proc. Symposium on Discrete Algorithms.
[2] G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. In Proc. 7th ACM Conference on Electronic Commerce, pages 1–7.
[3] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and A. Saberi. Multi-unit auctions with budget-constrained bidders. In Proc. 6th ACM Conference on Electronic Commerce, pages 44–51.
[4] J. J. Carrasco, D. Fain, K. Lang, and L. Zhukov. Clustering of bipartite advertiser-keyword graphs. Workshop on Large Scale Clustering at IEEE International Conference on Data Mining.
[5] E. Clarke. Multipart pricing of public goods. Public Choice, 11:17–33.
[6] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, NJ.
[7] I. de Farias and G. Nemhauser. A polyhedral study of the cardinality constrained knapsack problem. Mathematical Programming (Ser. A), 96.
[8] B. Dietrich and J. J. Forrest. A column generation approach for combinatorial auctions. Workshop on Mathematics of the Internet: E-Auction and Markets, Institute for Mathematics and its Applications.
[9] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Second Workshop on Sponsored Search Auctions, Ann Arbor, MI, June.
[10] T. Groves. Incentives in teams. Econometrica, 41.
[11] R. Lougee-Heimer, F. Barahona, B. L. Dietrich, J. Fasano, J. J. Forrest, R. Harder, L. Ladanyi, T. Pfender, T. Ralphs, M. Saltzman, and K. Scheinberg. The COIN-OR initiative: accelerating operations research progress through open-source software. ORMS Today, 28(5).
[12] M. E. Lübbecke and J. Desrosiers. Selected topics in column generation. Operations Research, 53(6).
[13] M. Mahdian, H. Nazerzadeh, and A. Saberi. Allocating online advertisement space with unreliable estimates. ACM Conference on Electronic Commerce.
[14] COIN-OR Foundation.
[15] S. Sathiya Keerthi and J. A. Tomlin. Constructing an optimal slate of advertisements. Yahoo! Research Report.
[16] L. Schrage. Solving multi-object auctions with LP/IP. University of Chicago, Unpublished Manuscript.
[17] C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 33(1):6–12.
[18] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. Adwords and the generalized bipartite matching problem. In Proceedings of the Symposium on the Foundations of Computer Science.
[19] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, 16:8–37, 1961.