Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising

Hamid Nazerzadeh, Stanford University, Stanford, CA 94304, hamidnz@stanford.edu
Amin Saberi, Stanford University, Stanford, CA 94304, saberi@stanford.edu
Rakesh Vohra, Northwestern University, Evanston, IL 60208, r-vohra@kellogg.nwu.edu

ABSTRACT

We study the Cost-Per-Action or Cost-Per-Acquisition (CPA) charging scheme in online advertising. In this scheme, instead of paying per click, the advertisers pay only when a user takes a specific action (e.g. fills out a form) or completes a transaction on their websites. We focus on designing efficient and incentive compatible mechanisms that use this charging scheme. We describe a mechanism based on a sampling-based learning algorithm that under suitable assumptions is asymptotically individually rational, asymptotically Bayesian incentive compatible and asymptotically ex-ante efficient. In particular, we demonstrate our mechanism for the case where the utility functions of the advertisers are independent and identically-distributed random variables as well as the case where they evolve like independent reflected Brownian motions.

Categories and Subject Descriptors
J.4 [Social and Behavioral Sciences]: Economics; F.2.0 [Analysis of Algorithms and Problem Complexity]: General; I.2.6 [Artificial Intelligence]: Learning

General Terms
Economics, Algorithms, Theory

Keywords
Mechanism Design, Cost-Per-Action, Internet Advertising

1. INTRODUCTION

Currently, the two main charging models in the online advertising industry are cost-per-impression (CPM) and cost-per-click (CPC). In the CPM model, the advertisers pay the publisher for the impression of their ads. CPM is commonly used in traditional media (e.g. magazines and television) or banner advertising and is more suitable when the goal of the advertiser is to increase brand awareness. A more attractive and more popular charging model in online advertising is the CPC model, in which the advertisers pay the publisher only when a user clicks on their ads. In the last few years, there has been a tremendous shift towards the CPC charging model.
CPC is adopted by search engines such as Google or Yahoo! for the placement of ads next to search results (also known as sponsored search) and on the websites of third-party publishers.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

In this paper we will focus on another natural and widely advocated charging scheme known as the Cost-Per-Action or Cost-Per-Acquisition (CPA) model. In this model, instead of paying per click, the advertiser pays only when a user takes a specific action (e.g. fills out a form) or completes a transaction. Recently, several companies like Google, eBay, Amazon, Advertising.com, and Snap.com have started to sell advertising in this way.

CPA models can be the ideal charging scheme, especially for small and risk averse advertisers. We will briefly describe a few advantages of this charging scheme over CPC and refer the reader to [18] for a more detailed discussion.

One of the drawbacks of the CPC scheme is that it requires the advertisers to submit their bids before observing the profits generated by the users clicking on their ads. Learning the expected value of each click, and therefore the right bid for the ad, is a prohibitively difficult task, especially in the context of sponsored search, in which the advertisers typically bid for thousands of keywords. CPA eliminates this problem because it allows the advertisers to report their payoff after observing the user's action.

Another drawback of the CPC scheme is its vulnerability to click fraud. Click fraud refers to clicks generated by someone or something with no genuine interest in the advertisement. Such clicks can be generated by the publisher of the content, who has an interest in receiving a share of the revenue of the ad, or by a rival who wishes to increase the cost of advertising for the advertiser. Click fraud is considered by many experts to be the biggest challenge facing the online advertising industry [13, 10, 23, 20].
CPA schemes are less vulnerable because generating a fraudulent action is typically more costly than generating a fraudulent click. For example, an advertiser can define the action as a sale and pay the publisher only when the ad yields profit. (Footnote 1: CPA makes generating a fraudulent action a more costly enterprise, but not impossible, e.g., using a stolen credit card.)

On the other hand, there is a fundamental difference between the CPA and CPC charging models. A click on the ad can be observed by both advertiser and publisher. However, the action of the user is hidden from the publisher and is observable only by the advertiser. Although the publisher can require the advertisers to install software that will monitor actions that take place on their web site, even moderately sophisticated advertisers can find a way to manipulate the software if they find it sufficiently profitable.

Are the publishers exposed to the manipulation or misreporting of the advertisers in the CPA scheme? Does CPA create an incentive for the advertisers to misreport the number of actions or their payoffs for the actions?

The main result of this paper is to give a negative answer to these questions. We design a mechanism that, asymptotically and under reasonable assumptions, removes the incentives of the advertisers to misreport their payoffs. At the same time, our mechanism has the same asymptotic efficiency, and hence revenue, as the currently used CPC mechanisms. We will use techniques in learning and mechanism design to obtain this result.

In the next section, we will formally describe our model in mechanism design terminology (see [21]). We will refer to advertisers as agents and to the impression of an ad as an item. For simplicity of exposition only, we assume only one advertisement slot per page. In section 6 we outline how to extend our results to the case where more than one advertisement can be displayed on each page.

Although our work is essentially motivated by online advertising, we believe that the application of our mechanism is not limited to this domain.

1.1 Model

We study the following problem: there are a number of self-interested agents competing for identical items sold repeatedly at times $t = 1, 2, \ldots$. At each time $t$, a mechanism allocates the item to one of the agents. Agents discover their utility for the good only if it is allocated to them. If agent $i$ receives the good at time $t$, she discovers utility $u_{it}$ (denominated in money) for it and reports (not necessarily truthfully) the realized utility to the mechanism. Then, the mechanism determines how much the agent has to pay for receiving the item. We allow the utility of an agent to change over time.

For this environment we are interested in auction mechanisms which have the following four properties.
1. The mechanism is individually rational in each period.
2. Agents have an incentive to truthfully report their realized utilities.
3. The efficiency (and revenue) is, in an appropriate sense, not too small compared to a second price auction.
4.
The correctness of the mechanism does not depend on a-priori knowledge of the distribution of the $u_{it}$'s. This feature is motivated by the Wilson doctrine [24]. (Footnote 2: Wilson criticizes relying too much on common-knowledge assumptions.)

The precise manner in which these properties are formalized is described in section 2.

We will build our mechanisms on a sampling-based learning algorithm. The learning algorithm is used to estimate the expected utility of the agents, and consists of two alternating phases: exploration and exploitation. During an exploration phase, the item is allocated for free to a randomly chosen agent. During an exploitation phase, the mechanism allocates the item to the agent with the highest estimated expected utility. After each allocation, the agent who has received the item discovers her utility and reports it to the mechanism. Subsequently, the mechanism updates the estimate of utilities and determines the payment.

We characterize a class of learning algorithms that ensure that the corresponding mechanism has the four desired properties. The main difficulty in obtaining this result is the following: since there is uncertainty about the utilities, it is possible that in some periods the item is allocated to an agent who does not have the highest utility in that period. Hence, the natural second-highest price payment rule would violate individual rationality. On the other hand, if the mechanism does not charge an agent because her reported utility after the allocation is low, it gives her an incentive to shade her reported utility down. Our mechanism solves these problems by using an adaptive, cumulative pricing scheme.

We illustrate our results by identifying simple mechanisms that have the desired properties. We demonstrate these mechanisms for the case in which the $u_{it}$'s are independent and identically-distributed random variables as well as the case where their expected values evolve like independent reflected Brownian motions. In these cases the mechanism is actually ex-post individually rational.

In our proposed mechanism, the agents do not have to bid for the items.
This is advantageous when the bidders themselves are unaware of their utility values. However, in some cases, an agent might have a better estimate of her utility for the item than our mechanism. For this reason, we describe how we can slightly modify our mechanism to allow those agents to bid directly.

1.2 Related Work

There is a large number of interesting results on using machine learning techniques in mechanism design. We only briefly survey the main techniques and ideas and compare them with the approach of this paper. Most of these works, like [5, 8, ?, 17], consider one-shot games or repeated auctions in which the agents leave the environment after they have received an item. In our setting we may allocate items to an agent several times and hence, we need to consider the strategic behavior of the agents over time.

There is also a large literature on regret minimization or expert algorithms. In our context, these algorithms are applicable even if the utilities of the agents are changing arbitrarily. However, the efficiency (and therefore the revenue) of these algorithms is comparable to that of the mechanism that allocates the item to the single best agent (expert) (e.g. see [16]). Our goal is more ambitious: our efficiency is close to that of the most efficient allocation, which might allocate the item to different agents at different times. On the other hand, we focus on utility values that change smoothly (e.g. like a Brownian motion).

In a finitely repeated version of the environment considered here, Athey and Segal [2] construct an efficient, budget balanced mechanism where truthful revelation in each period is Bayesian incentive compatible. Bapna and Weber [4] consider the infinite horizon version of [2] and propose a class of incentive compatible mechanisms based on the Gittins index (see [11]). Taking a different approach, Bergemann and Välimäki [6] and Cavallo et al. [9] propose an incentive compatible generalization of the Vickrey-Clarke-Groves mechanism based on the marginal contribution of each agent for this environment.
All these mechanisms need the exact solution of the underlying optimization problems, and therefore require complete information about the prior of the utilities

of the agents; also, they do not apply when the evolution of the utilities of the agents is not stationary over time. This violates the last of our desiderata. For a comprehensive survey of the dynamic mechanism design literature see [22].

In the context of sponsored search, attention has focused on ways of estimating click-through rates. Gonen and Pavlov [12] give a mechanism which learns the click-through rates via sampling and show that truthful bidding is, with high probability, a (weakly) dominant strategy in this mechanism. Along this line, Wortman et al. [25] introduced an exploration scheme for learning advertisers' click-through rates in sponsored search which maintains the equilibrium of the system. In these works, unlike ours, the distribution of the utilities of agents is assumed to be fixed over time.

Immorlica et al. [14], and later Mahdian and Tomak [18], examine the vulnerability of various procedures for estimating click-through rates, and identify a class of click-through learning algorithms in which fraudulent clicks cannot increase the expected payment per impression by more than $o(1)$. This is under the assumption that the slot of an agent is fixed and the bids of other agents remain constant over time. In contrast, we study conditions which guarantee incentive compatibility and efficiency, while the utility of (all) agents may evolve over time.

2. DEFINITIONS AND NOTATION

Suppose $n$ agents compete in each period for a single item. The item is sold repeatedly at times $t = 1, 2, \ldots$. Denote by $u_{it}$ the nonnegative utility of agent $i$ for the item at time $t$. Utilities are denominated in a common monetary scale.

The utilities of agents may evolve over time according to a stochastic process. We assume that for $i \ne j$, the evolutions of $u_{it}$ and $u_{jt}$ are independent stochastic processes. We also define $\mu_{it} = E[u_{it} \mid u_{i1}, \ldots, u_{i,t-1}]$. Throughout this paper, expectations are taken conditioned on the complete history. For simplicity of notation, we now omit those terms that denote such a conditioning. With this notational convention, it follows, for example, that $E[u_{it}] = E[\mu_{it}]$. Here the second expectation is taken over all possible histories.
Let $M$ be a mechanism used to sell the items. At each time, $M$ allocates the item to one of the agents. Let $i$ be the agent who has received the item at time $t$. Define $x_{it}$ to be the variable indicating the allocation of the item to $i$ at time $t$. After the allocation, agent $i$ observes her utility, $u_{it}$, and then reports $r_{it}$, as her utility for the item, to the mechanism. Note that we do not require an agent to know her utility for possessing the item in advance of acquiring it. The mechanism then determines the payment, denoted by $p_{it}$.

Definition 1. An agent $i$ is truthful if $r_{it} = u_{it}$ for all times $t > 0$ with $x_{it} = 1$.

Our goal is to design a mechanism which has the following properties. We assume $n$, the number of agents, is constant.

Individual Rationality: A mechanism is ex-post individually rational if for any time $T > 0$ and any agent $1 \le i \le n$, the total payment of agent $i$ does not exceed the sum of her reports:

$$\sum_{t=1}^{T} \left( x_{it} r_{it} - p_{it} \right) \ge 0.$$

$M$ is asymptotically ex-ante individually rational if:

$$\liminf_{T \to \infty} E\left[ \sum_{t=1}^{T} \left( x_{it} \mu_{it} - p_{it} \right) \right] \ge 0.$$

Incentive Compatibility: This property implies that truthfulness defines an asymptotic Bayesian Nash equilibrium. Consider agent $i$ and suppose all agents except $i$ are truthful. Let $U_i(T)$ be the expected total profit of agent $i$ if agent $i$ is truthful between time 1 and $T$. Also, let $\tilde{U}_i(T)$ be the maximum expected profit of agent $i$ under any other strategy. Asymptotic incentive compatibility requires that

$$\tilde{U}_i(T) - U_i(T) = o(U_i(T)).$$

Efficiency: An ex-ante efficient mechanism allocates the item to an agent in $\arg\max_i \{\mu_{it}\}$ at each time $t$ (and for each history). The total social welfare obtained by an ex-ante efficient mechanism up to time $T$ is $E[\sum_{t=1}^{T} \max_i \{\mu_{it}\}]$. Let $W(T)$ be the expected welfare of mechanism $M$ between time 1 and $T$ when all agents are truthful, i.e.,

$$W(T) = E\left[ \sum_{t=1}^{T} \sum_{i=1}^{n} x_{it} \mu_{it} \right].$$

Then, $M$ is asymptotically ex-ante efficient if:

$$E\left[ \sum_{t=1}^{T} \max_i \{\mu_{it}\} \right] - W(T) = o(W(T)).$$

3. PROPOSED MECHANISM

We build our mechanism on top of a learning algorithm that estimates the expected utility of the agents. We refrain from an explicit description of the learning algorithm.
Rather, we describe sufficient conditions for a learning algorithm that can be extended to a mechanism with all the properties we seek (see section 3.1). In sections 4 and 5 we give two examples of environments where learning algorithms satisfying these sufficient conditions exist.

The mechanism consists of two phases: explore and exploit. At each time $t$, with probability $\eta(t)$, where $\eta : \mathbb{N} \to [0,1]$, the mechanism explores: the item is allocated for free to a randomly chosen agent. Otherwise, the mechanism exploits: it allocates the item to the agent with the highest estimated expected utility. Afterwards, the agent reports her utility to the mechanism and the mechanism determines the payment. We first formalize our assumptions about the learning algorithm and then we discuss the payment scheme. The mechanism is given in Figure 1.

The learning algorithm samples the $u_{it}$'s at rate $\eta(t)$ and, based on the history of the reports of agent $i$, returns an estimate of $\mu_{it}$. Let $\hat\mu_{it}(T)$ be the estimate of the algorithm for $\mu_{it}$ conditional on the history of the reports up to time $T$. The history of the reports of agent $i$ up to time $T$ is the sequence of the reported values and times of observation of $u_{it}$ up to but not including time $T$. Note that we allow $T > t$. Thus, information at time $T > t$ can be used to revise an estimate of $\mu_{it}$ made at some earlier time. We assume that increasing the number of samples only increases the

accuracy of the estimations, i.e. for any truthful agent $i$ and times $T_1 \ge T_2$:

$$E[\,|\hat\mu_{it}(T_1) - \mu_{it}|\,] \le E[\,|\hat\mu_{it}(T_2) - \mu_{it}|\,]. \qquad (1)$$

In the inequality above, and in the rest of the paper, the expectations of $\hat\mu_{it}$ are taken over the evolution of the $u_{it}$'s and the random choices of the mechanism. For simplicity of notation, we omit those terms that denote such a conditioning.

To describe the payments, recall that $\gamma_t$ is the second highest $\mu_{it}$ and let $\hat\gamma_t(T) = \max_{j \ne i} \{\hat\mu_{jt}(T)\}$, where $i$ is the agent who received the item at time $t$. We define $y_{it}$ to be the indicator variable of the allocation of the item to agent $i$ during an exploit phase. The payment of agent $i$ at time $t$, denoted $p_{it}$, is determined so that:

$$\sum_{k=1}^{t} p_{ik} = \sum_{k=1}^{t-1} y_{ik} \min\{\hat\gamma_k(t), \hat\mu_{ik}(k)\}.$$

An agent only pays for items that are allocated to her during the exploit phase, up to but not including time $t$. At time $t$, the payment of agent $i$ for the item she received at time $k < t$ is $\min\{\hat\gamma_k(t), \hat\mu_{ik}(k)\}$. The first term is reminiscent of the second-highest pricing scheme. The second term, under some reasonable conditions, leads to individual rationality. Since the estimates of the learning algorithm for the utilities of agents become more precise over time, our adaptive cumulative payment scheme allows it to correct for errors in the past.

    For $t = 1, 2, \ldots$
      With probability $\eta(t)$, explore:
        Uniformly at random, allocate the item to an agent $i$, $1 \le i \le n$.
        $p_{it} \leftarrow 0$
      With probability $1 - \eta(t)$, exploit:
        Randomly allocate the item to an agent $i \in \arg\max_i \{\hat\mu_{it}(t)\}$.
        $p_{it} \leftarrow \sum_{k=1}^{t-1} y_{ik} \min\{\hat\gamma_k(t), \hat\mu_{ik}(k)\} - \sum_{k=1}^{t-1} p_{ik}$
      $r_{it} \leftarrow$ the report of agent $i$.
      $p_{jt} \leftarrow 0$, $j \ne i$

    Figure 1: Mechanism $M$

3.1 Sufficient Conditions

We start with a condition that guarantees asymptotic ex-ante individual rationality and asymptotic incentive compatibility. Let $l_{it}$ be the last time up to time $t$ that the item is allocated to agent $i$ within an exploit phase. If $i$ has not been allocated any item yet, $l_{it}$ is defined to be zero. Also, define $\Delta_t = \max_i \{|\hat\mu_{it}(t) - \mu_{it}|\}$, assuming all agents were truthful up to time $t$.

Theorem 1. If for the learning algorithm, for all $1 \le i \le n$ and $T > 0$:

$$\text{(C1)} \qquad E\left[ \max_{1 \le t \le T} \{\mu_{it}\} + \sum_{t=1}^{T} \Delta_t \right] = o\left( E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] \right)$$

then mechanism $M$ is asymptotically ex-ante individually rational and incentive compatible.
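The loop of Figure 1 and its cumulative payment rule can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the authors' implementation: agents are truthful, the learning algorithm is a plain sample mean of past reports (so $\hat\gamma_k(t)$ does not depend on $k$), there are at least two agents, and the names `mean_estimate`, `run_mechanism`, `eta` and `draw_utility` are hypothetical.

```python
import random


def mean_estimate(reports, T):
    """Sample mean of the reports made strictly before time T (0.0 if none)."""
    seen = [r for (s, r) in reports if s < T]
    return sum(seen) / len(seen) if seen else 0.0


def run_mechanism(n, horizon, eta, draw_utility, rng):
    """Simulate Figure 1 with truthful agents (n >= 2); returns (paid, reports)."""
    reports = [[] for _ in range(n)]       # (time, report) pairs per agent
    exploit_wins = [[] for _ in range(n)]  # (k, mu_hat_ik(k)) for each k with y_ik = 1
    paid = [0.0] * n                       # running total sum_k p_ik per agent
    for t in range(1, horizon + 1):
        est = [mean_estimate(reports[j], t) for j in range(n)]
        if rng.random() < eta(t):
            i = rng.randrange(n)           # explore: the item is free, p_it = 0
        else:
            best = max(est)
            i = rng.choice([j for j in range(n) if est[j] == best])
            # gamma_hat_k(t): best competing estimate, re-evaluated with data up to t
            gamma_hat = max(est[j] for j in range(n) if j != i)
            owed = sum(min(gamma_hat, mu_kk) for (_, mu_kk) in exploit_wins[i])
            paid[i] = owed                 # equivalent to p_it = owed - previous total
            exploit_wins[i].append((t, est[i]))
        reports[i].append((t, draw_utility(i, t)))  # truthful report r_it = u_it
    return paid, reports
```

Because the price of every past exploit-phase item is recomputed from the current estimates, an over- or under-charge caused by an early, noisy estimate is corrected the next time the agent wins an item.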
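We outline the proof first. As we prove in Lemma 2, by condition (C1), the expected profit of a truthful agent up to time $T$ is at least $(\frac{1}{n} - o(1)) E[\sum_{t=1}^{T} \eta(t) \mu_{it}]$. Also, the expected total error in the estimates of the payments up to time $T$ is bounded by $O(E[\sum_{t=1}^{T} \Delta_t])$. We prove that the total utility an agent could obtain by deviating from the truthful strategy, between time 1 and $T$, is bounded by $O(E[\max_{1 \le t \le T} \{\mu_{it}\}] + E[\sum_{t=1}^{T} \Delta_t])$. Hence, the claim follows by condition (C1).

Similar to other applications of learning algorithms, we can observe a natural trade-off between exploitation and exploration rates in our context: higher exploration rates lead to more accurate estimates of the utilities of the agents, at the cost of efficiency. Condition (C1) provides us with a lower bound on the exploration rate.

Lemma 2. If condition (C1) holds, then the expected profit of a truthful agent $i$ up to time $T$, $U_i(T)$, is at least:

$$\left( \frac{1}{n} - o(1) \right) E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right].$$

Proof. The items that agent $i$ receives during the explore phase are free. The expected total utility of agent $i$ from these items up to time $T$ is $\frac{1}{n} E[\sum_{t=1}^{T} \eta(t) \mu_{it}]$. Let $C_T = \{t < l_{iT} \mid y_{it} = 1, \text{ if } i \text{ is truthful}\}$ be the subset of periods in which agent $i$ is charged for the item she received within the period.

$$U_i(T) = E\left[ \sum_{t=1}^{T} \left( x_{it} u_{it} - p_{it} \right) \right] = E\left[ \sum_{t \notin C_T} x_{it} u_{it} \right] + E\left[ \sum_{t \in C_T} \left( u_{it} - p_{it} \right) \right]$$

$$\ge \frac{1}{n} E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] + E\left[ \sum_{t \in C_T} \left( \mu_{it} - \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\} \right) \right] \qquad (2)$$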

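For $t \in C_T$:

$$E[(\mu_{it} - \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\}) I(t \in C_T)] \ge E[(\mu_{it} - \hat\mu_{it}(t)) I(t \in C_T)] \ge -E[\,|\mu_{it} - \hat\mu_{it}(t)|\,] \ge -E[\Delta_t]$$

Substituting into inequality (2), by condition (C1):

$$U_i(T) \ge \frac{1}{n} E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] - E\left[ \sum_{t=1}^{T} \Delta_t \right] = \frac{1}{n} E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] - o\left( E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] \right) \qquad (3)$$

Proof of Theorem 1: Lemma 2 yields asymptotic ex-ante individual rationality. We show that truthfulness is asymptotically a best response when all other agents are truthful. Fix an agent $i$ intending to deviate and let $S$ be the strategy she deviates to. Fixing the evolution of all $u_{jt}$'s, $1 \le j \le n$, and all random choices of the mechanism, i.e. the steps in the explore phase and the randomly chosen agents, let $D_T$ be the times that $i$ receives the item under strategy $S$ during the exploit phase before time $l_{iT}$, i.e. $D_T = \{t < l_{iT} \mid y_{it} = 1, \text{ if the strategy of } i \text{ is } S\}$. Similarly, let $C_T = \{t < l_{iT} \mid y_{it} = 1, \text{ if } i \text{ is truthful}\}$. Also, let $\hat\mu'_{it}$ and $\hat\gamma'_t$ correspond to the estimates of the mechanism when the strategy of $i$ is $S$.

We first bound the expected profit of $i$, under strategy $S$, during the exploit phase:

$$E\left[ \sum_{t=1}^{T} \left( y_{it} u_{it} - p_{it} \right) \right] \le E\left[ \sum_{t \in D_T} \left( \mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} \right) \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] \qquad (4)$$

$$= E\left[ \sum_{t \in D_T \setminus C_T} \left( \mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} \right) \right] + E\left[ \sum_{t \in D_T \cap C_T} \left( \mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} \right) \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] \qquad (5)$$

The term $E[\max_{t \le T} \{\mu_{it}\}]$ bounds the outstanding payment of agent $i$; recall that the agent has not paid for the last allocated item. For time $t \ge 1$, we examine two cases:

1. If $t \in D_T \cap C_T$, then agent $i$, in expectation, cannot decrease the current price, $\min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\}$, by more than $O(\Delta_t)$:

$$\min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} \ge \min\{\hat\gamma'_t(T), \hat\gamma'_t(t)\} \ge \gamma_t - \max\{\gamma_t - \hat\gamma'_t(T), \gamma_t - \hat\gamma'_t(t)\} \ge \gamma_t - (\gamma_t - \hat\gamma'_t(T))^+ - (\gamma_t - \hat\gamma'_t(t))^+$$

where $(z)^+ = \max\{z, 0\}$. Recall that $\hat\gamma_t(T) = \max_{j \ne i} \{\hat\mu_{jt}(T)\}$ and all other agents are truthful. Hence, taking expectation of both sides, by (1):

$$E[\min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} I(t \in D_T \cap C_T)] \ge E[(\gamma_t - (\gamma_t - \hat\gamma'_t(T))^+ - (\gamma_t - \hat\gamma'_t(t))^+) I(t \in D_T \cap C_T)] \ge E[\gamma_t I(t \in D_T \cap C_T)] - E[2\Delta_t] \qquad (6)$$

2.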
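If $t \in D_T \setminus C_T$, agent $i$ cannot increase her expected profit, $\mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\}$, by more than $O(\Delta_t)$ (primed quantities denote the estimates under strategy $S$):

$$\mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\} \le \mu_{it} - \min\{\hat\gamma'_t(T), \hat\gamma'_t(t)\}$$

$$\le (\mu_{it} - \hat\mu_{it}(t)) + (\hat\mu_{it}(t) - \gamma_t) + \max\{\gamma_t - \hat\gamma'_t(T), \gamma_t - \hat\gamma'_t(t)\}$$

$$\le 2\Delta_t + (\gamma_t - \hat\gamma'_t(T))^+ + (\gamma_t - \hat\gamma'_t(t))^+$$

Taking expectation of both sides, by (1):

$$E[(\mu_{it} - \min\{\hat\gamma'_t(T), \hat\mu'_{it}(t)\}) I(t \in D_T \setminus C_T)] \le E[2\Delta_t I(t \in D_T \setminus C_T)] + E[((\gamma_t - \hat\gamma'_t(T))^+ + (\gamma_t - \hat\gamma'_t(t))^+) I(t \in D_T \setminus C_T)] \le E[4\Delta_t] \qquad (7)$$

Substituting inequalities (6) and (7) into (5):

$$E\left[ \sum_{t=1}^{T} \left( y_{it} u_{it} - p_{it} \right) \right] \le E\left[ \sum_{t=1}^{T} 6\Delta_t \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] + E\left[ \sum_{t \in D_T \cap C_T} \left( \mu_{it} - \gamma_t \right) \right]$$

$$= E\left[ \sum_{t=1}^{T} 6\Delta_t \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] + E\left[ \sum_{t \in C_T} \left( \mu_{it} - \gamma_t \right) \right] - E\left[ \sum_{t \in C_T \setminus D_T} \left( \mu_{it} - \gamma_t \right) \right] \qquad (8)$$

For $t \in C_T$, since $\hat\mu_{it}(t) \ge \hat\gamma_t(t)$, we have:

$$E[\gamma_t - \mu_{it}] \le E[2\Delta_t]$$

Substituting into (8):

$$E\left[ \sum_{t=1}^{T} \left( y_{it} u_{it} - p_{it} \right) \right] \le 8 E\left[ \sum_{t=1}^{T} \Delta_t \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] + E\left[ \sum_{t \in C_T} \left( \mu_{it} - \gamma_t \right) \right]$$

With algebraic manipulation, using (1), we get:

$$E\left[ \sum_{t=1}^{T} \left( y_{it} u_{it} - p_{it} \right) \right] \le O\left( E\left[ \sum_{t=1}^{T} \Delta_t \right] + E\left[ \max_{t \le T} \{\mu_{it}\} \right] \right) + E\left[ \sum_{t \in C_T} \left( \mu_{it} - \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\} \right) \right]$$

By condition (C1), we get the inequality below, which completes the proof:

$$E\left[ \sum_{t=1}^{T} \left( y_{it} u_{it} - p_{it} \right) \right] \le o\left( E\left[ \sum_{t=1}^{T} \eta(t) \mu_{it} \right] \right) + E\left[ \sum_{t \in C_T} \left( \mu_{it} - \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\} \right) \right]$$

The expected utility of the truthful strategy and $S$ during the explore phase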

is equal. Therefore, by Lemma 2, the mechanism is asymptotically incentive compatible.

In the next theorem we show that if the loss in efficiency during exploration asymptotically goes to zero, then by condition (C1) the mechanism is asymptotically ex-ante efficient.

Theorem 3. If for the learning algorithm, in addition to (C1), the following condition holds

$$\text{(C2)} \qquad E\left[ \sum_{t=1}^{T} \eta(t) \max_i \{\mu_{it}\} \right] = o\left( E\left[ \sum_{t=1}^{T} \max_i \{\mu_{it}\} \right] \right)$$

then $M$ is asymptotically ex-ante efficient.

Proof. $M$ may fail to be ex-ante efficient for two reasons. The first is the loss in welfare during exploration, when the item is allocated randomly to one of the advertisers. The expected loss in this case is at most $E[\sum_{t=1}^{T} \eta(t) \max_i \{\mu_{it}\}]$.

The other reason is mistakes during exploitation. The error in estimation can lead to allocation to an agent who does not value the item the most. At time $t$, in the worst case, the item might be allocated to an agent whose expected utility is at most $2\Delta_t$ less than the highest expected utility. Therefore, the expected efficiency loss during exploitation is bounded by $O(E[\sum_{t=1}^{T} \Delta_t])$. Hence, for the expected welfare of $M$ between time 1 and $T$, denoted by $W(T)$, we have:

$$E\left[ \sum_{t=1}^{T} \max_i \{\mu_{it}\} \right] - W(T) = O\left( E\left[ \sum_{t=1}^{T} \left( \Delta_t + \eta(t) \max_i \{\mu_{it}\} \right) \right] \right) \qquad (9)$$

But condition (C1) implies:

$$E\left[ \sum_{t=1}^{T} \Delta_t \right] = o\left( E\left[ \sum_{t=1}^{T} \sum_{i=1}^{n} \eta(t) \mu_{it} \right] \right), \qquad E\left[ \sum_{t=1}^{T} \sum_{i=1}^{n} \eta(t) \mu_{it} \right] = \Theta\left( E\left[ \sum_{t=1}^{T} \eta(t) \max_i \{\mu_{it}\} \right] \right)$$

Plugging into (9):

$$E\left[ \sum_{t=1}^{T} \max_i \{\mu_{it}\} \right] - W(T) = O\left( E\left[ \sum_{t=1}^{T} \eta(t) \max_i \{\mu_{it}\} \right] \right) = o(W(T))$$

The last equality follows from (C2) and implies asymptotic ex-ante efficiency.

While condition (C1) gives a lower bound on the exploration rate, condition (C2) gives an upper bound. In the next sections, we will show with two examples how conditions (C1) and (C2) can be used to adjust the exploration rate of a learning algorithm in order to obtain efficiency and incentive compatibility.

Remark 1. In Theorem 3 we showed that under some assumptions, the welfare obtained by the mechanism is asymptotically equivalent to that of the efficient mechanism that every time allocates the item to the agent with the highest expected utility.
We can give conditions similar to (C2) to guarantee that the revenue of the mechanism is also asymptotically equal to the revenue of the efficient mechanism that every time charges the winning agent the second highest expected utility. To avoid repetition, we refrain from explaining this condition in detail.

3.2 Allowing agents to bid

In mechanism $M$ no agent explicitly bids for an item. Whether an agent receives an item or not depends on the history of their reported utilities and the estimates that $M$ forms from them. This may be advantageous when the bidders themselves are unaware of what their utilities will be. However, when agents may possess a better estimate of their utilities we would like to make use of that. For this reason we describe how to modify $M$ so as to allow agents to bid for an item.

If time $t$ occurs during an exploit phase, let $B_t$ be the set of the agents who bid at this time. The mechanism bids on behalf of all agents $i \notin B_t$. Denote by $b_{it}$ the bid of agent $i \in B_t$ for the item at time $t$. The modification of $M$ sets $b_{it} = \hat\mu_{it}(t)$ for $i \notin B_t$. Then, the item is allocated at random to one of the agents in $\arg\max_i b_{it}$.

If $i$ is the agent who received the item at time $t$, let $A = \{b_{jt} \mid j \in B_t\} \cup \{\mu_{jt}, j \notin B_t\}$. Define $\gamma_t$ as the second highest value in $A$. Let $\hat\gamma_k(T)$ be equal to $\max_{j \ne i} b_{jk}$. The payment of agent $i$ will be

$$p_{it} \leftarrow \sum_{k=1}^{t-1} y_{ik} \min\{\hat\gamma_k(t), b_{ik}\} - \sum_{k=1}^{t-1} p_{ik}.$$

To incorporate the fact that bidders can bid for an item, we must modify the definition of truthfulness.

Definition 2. Agent $i$ is truthful if:
1. $r_{it} = u_{it}$ for all times $t > 0$ with $x_{it} = 1$.
2. If $i$ bids at time $t$, then $E[\,|b_{it} - \mu_{it}|\,] \le E[\,|\hat\mu_{it}(t) - \mu_{it}|\,]$.

Note that item 2 does not require that agent $i$ bid her actual utility, only that her bid be closer to the mark than the estimate. With this modification in definition, Theorems 1 and 3 continue to hold.

4. INDEPENDENT AND IDENTICALLY DISTRIBUTED UTILITIES

In this section, we assume that for each $i$, the $u_{it}$'s are independent and identically-distributed random variables. For simplicity, we define $\mu_i = E[u_{it}]$, $t > 0$. Without loss of generality, we also assume $0 < \mu_i \le 1$.
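In this environment, the learning algorithm we use is an $\epsilon$-greedy algorithm for the multi-armed bandit problem. (Footnote 3: See [3] for a similar algorithm.) Let $n_{it} = \sum_{k=1}^{t-1} x_{ik}$. For $\epsilon \in (0,1)$, we define:

$$\eta_\epsilon(t) = \min\{1, \; n\, t^{-\epsilon} \ln^{1+\epsilon} t\}$$

$$\hat\mu_{it}(T) = \begin{cases} \left( \sum_{k=1}^{T-1} x_{ik} r_{ik} \right) / n_{iT}, & n_{iT} > 0 \\ 0, & n_{iT} = 0 \end{cases}$$

Call the mechanism based on this learning algorithm $M_\epsilon(iid)$.

Lemma 4. If all agents are truthful, then, under $M_\epsilon(iid)$,

$$E[\Delta_t] = O\left( \frac{1}{\sqrt{t^{1-\epsilon}}} \right).$$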
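The two definitions above translate directly into code. The following is a minimal sketch, assuming allocations and reports are stored as Python lists whose index 0 corresponds to time $t = 1$; the function names `eta_iid` and `mu_hat_iid` are hypothetical.

```python
import math


def eta_iid(t, n, eps):
    """Exploration rate eta_eps(t) = min{1, n * t^(-eps) * ln^(1+eps) t}, for t >= 1."""
    return min(1.0, n * t ** (-eps) * math.log(t) ** (1.0 + eps))


def mu_hat_iid(alloc, reports, T):
    """mu_hat_it(T): mean of the reports r_ik over the times k < T with x_ik = 1.

    alloc[k-1] is x_ik (0 or 1) and reports[k-1] is r_ik; returns 0.0 when
    agent i has received no item before time T (n_iT = 0).
    """
    n_iT = sum(alloc[:T - 1])
    total = sum(x * r for x, r in zip(alloc[:T - 1], reports[:T - 1]))
    return total / n_iT if n_iT > 0 else 0.0
```

Note that the rate is capped at 1 for a long initial stretch (with $n = 2$ and $\epsilon = 0.3$ it is still 1 at $t = 100$), so early periods are almost pure exploration and the polynomial decay only shows up for large $t$.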

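The proof of this lemma is given in appendix A. We show that $M_\epsilon(iid)$, for $\epsilon \le \frac{1}{3}$, satisfies all the desired properties we discussed in the previous section. Moreover, it satisfies a stronger notion of individual rationality. $M_\epsilon(iid)$ satisfies ex-post individual rationality if for any agent $i$ and for all $T \ge 1$:

$$\sum_{t=1}^{T} p_{it} \le \sum_{t=1}^{T} x_{it} r_{it}$$

Theorem 5. $M_\epsilon(iid)$ is ex-post individually rational. Also, for $0 \le \epsilon \le \frac{1}{3}$, $M_\epsilon(iid)$ is asymptotically incentive compatible and ex-ante efficient.

Proof. We first prove ex-post individual rationality. It is sufficient to prove it only for the periods in which agent $i$ has received the item within an exploit phase. For $T$ such that $y_{iT} = 1$, we have:

$$\sum_{t=1}^{T} p_{it} = \sum_{t=1}^{T-1} y_{it} \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\} \le \sum_{t=1}^{T-1} y_{it} \hat\gamma_t(T) \le \sum_{t=1}^{T-1} y_{it} \hat\mu_{iT}(T) \le n_{iT}\, \hat\mu_{iT}(T) = \sum_{t=1}^{T-1} x_{it} r_{it}$$

The step from $\hat\gamma_t(T)$ to $\hat\mu_{iT}(T)$ follows because the item is allocated to $i$ at time $T$, which implies $\hat\mu_{iT}(T) \ge \hat\gamma_T(T)$.

We complete the proof by showing that conditions (C1) and (C2) hold. Note that $\mu_i \le 1$. By Lemma 4, for $\epsilon \le \frac{1}{3}$:

$$E\left[ 1 + \sum_{t=1}^{T-1} \Delta_t \right] = O\left( T^{\frac{1+\epsilon}{2}} \right) = o\left( T^{1-\epsilon} \ln^{1+\epsilon} T \right) = O\left( \sum_{t=1}^{T} \eta_\epsilon(t) \mu_i \right).$$

Therefore, (C1) holds.

The welfare of any mechanism between time 1 and $T$ is bounded by $T$. For any $\epsilon > 0$, $E[1 + \sum_{t=1}^{T} (\Delta_t + \eta_\epsilon(t))] = o(T)$, which implies (C2).

5. BROWNIAN MOTION

In this section, we assume for each $i$, $1 \le i \le n$, the evolution of $\mu_{it}$ is a reflected Brownian motion with mean zero and variance $\sigma_i^2$; the reflection barrier is 0. In addition, we assume $\mu_{i0} = 0$, and $\sigma_i^2 \le \sigma^2$, for some constant $\sigma$. The mechanism observes the values of $\mu_{it}$ at discrete times $t = 1, 2, \ldots$.

In this environment our learning algorithm estimates the reflected Brownian motion using a mean zero martingale. We define $l_{it}$ as the last time up to time $t$ that the item is allocated to agent $i$. This includes both explore and exploit phases. If $i$ has not been allocated any item yet, $l_{it}$ is zero.

$$\eta_\epsilon(t) = \min\{1, \; n\, t^{-\epsilon} \ln^{2+2\epsilon} t\} \qquad (10)$$

$$\hat\mu_{it}(T) = \begin{cases} r_{i, l_{it}}, & t < T \\ r_{i, l_{i,t-1}}, & t = T \\ r_{i, l_{iT}}, & t > T \end{cases} \qquad (11)$$

Call this mechanism $M_\epsilon(B)$. For simplicity, we assume that the advertiser reports the exact value of $\mu_{it}$. It is not difficult to verify that the results in this section hold as long as the expected value of the error of these estimates at time $t$ is $o(t^{\frac{1}{6}})$.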
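Equations (10) and (11) amount to "explore at a slowly decaying rate and use the most recent available report as the estimate". A minimal sketch follows, assuming the allocation times of agent $i$ are kept in a sorted list and her reports in a dictionary keyed by time; the names `eta_brownian`, `last_alloc` and `mu_hat_brownian` are hypothetical, and returning 0.0 before the first report is an assumption motivated by $\mu_{i0} = 0$.

```python
import bisect
import math


def eta_brownian(t, n, eps):
    """Exploration rate (10): min{1, n * t^(-eps) * ln^(2+2*eps) t}, for t >= 1."""
    return min(1.0, n * t ** (-eps) * math.log(t) ** (2.0 + 2.0 * eps))


def last_alloc(alloc_times, t):
    """l_it: last time <= t at which agent i received the item, or 0 if none."""
    idx = bisect.bisect_right(alloc_times, t)
    return alloc_times[idx - 1] if idx > 0 else 0


def mu_hat_brownian(alloc_times, reports, t, T):
    """Estimate (11) of mu_it using the information available at time T."""
    if t < T:
        l = last_alloc(alloc_times, t)       # a report made at time t itself is usable
    elif t == T:
        l = last_alloc(alloc_times, t - 1)   # the report at time T is not yet known
    else:
        l = last_alloc(alloc_times, T)       # predicting ahead from the latest report
    return reports[l] if l > 0 else 0.0      # assumption: fall back to mu_i0 = 0
```

For example, with allocations at times 2, 5 and 9, the estimate of $\mu_{i5}$ made at time 5 uses the report from time 2, while the revised estimate made at time 10 uses the report from time 5.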
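We begin analyzing the mechanism by stating some well-known properties of reflected Brownian motions (see [7]).

Proposition 6. Let $[W_t, t \ge 0]$ be a reflected Brownian motion with mean zero and variance $\sigma^2$; the reflection barrier is 0. Assume the value of $W_t$ at time $t$ is equal to $y$:

$$E[y] = \theta\left( \sqrt{t \sigma^2} \right) \qquad (12)$$

For $T > 0$, let $z = W_{t+T}$. For the probability density function of $z - y$ we have:

$$\Pr[(z - y) \in dx] \le \sqrt{\frac{2}{\pi T \sigma^2}}\; e^{-\frac{x^2}{2 T \sigma^2}} \qquad (13)$$

$$\Pr[\,|z - y| \ge x\,] \le \sqrt{\frac{8 T \sigma^2}{\pi}}\; \frac{1}{x}\; e^{-\frac{x^2}{2 T \sigma^2}} \qquad (14)$$

$$E[\,|z - y|\, I(|z - y| \ge x)\,] \le \sqrt{\frac{8 T \sigma^2}{\pi}}\; e^{-\frac{x^2}{2 T \sigma^2}} \qquad (15)$$

Corollary 7. The expected value of the maximum of $\mu_{iT}$, $1 \le i \le n$, is $\theta(\sqrt{T})$.

Note that in the corollary above $n$ and $\sigma$ are constant. Now, similar to Lemma 4, we bound $E[\Delta_T]$. The proof is given in appendix B.

Lemma 8. Suppose under $M_\epsilon(B)$ all agents are truthful until time $T$. Then $E[\Delta_T] = O(T^{\frac{\epsilon}{2}})$.

Now we are ready to prove the main theorem of this section:

Theorem 9. $M_\epsilon(B)$ is ex-post individually rational. Also, for $0 \le \epsilon \le \frac{1}{3}$, $M_\epsilon(B)$ is asymptotically incentive compatible and ex-ante efficient.

Proof. We first prove ex-post individual rationality. It is sufficient to prove it only for the periods in which agent $i$ has received the item within an exploit phase. For $T$ such that $y_{iT} = 1$, we have:

$$\sum_{t=1}^{T} p_{it} = \sum_{t=1}^{T-1} y_{it} \min\{\hat\gamma_t(T), \hat\mu_{it}(t)\} \le \sum_{t=1}^{T-1} y_{it} \hat\mu_{it}(t) = \sum_{t=1}^{T-1} y_{it}\, r_{i, l_{i,t-1}} \le \sum_{t=1}^{T-1} x_{it} r_{it}.$$

We complete the proof by showing that conditions (C1) and (C2) hold. By (12), the expected utility of each agent at time $t$ from random exploration is

$$\theta\left( \sqrt{t \sigma^2}\; t^{-\epsilon} \ln^{2+2\epsilon} t \right) = \theta\left( t^{\frac{1}{2} - \epsilon} \ln^{2+2\epsilon} t \right).$$

Therefore, the expected utility up to time $T$ from exploration is $\theta(T^{\frac{3}{2} - \epsilon} \ln^{2+2\epsilon} T)$. By Lemma 8 and Corollary 7:

$$E\left[ \max_{t \le T} \{\mu_{it}\} + \sum_{t=1}^{T-1} \Delta_t \right] = O\left( T^{1 + \frac{\epsilon}{2}} \right).$$

For $\epsilon \le \frac{1}{3}$ we have $\frac{3}{2} - \epsilon \ge 1 + \frac{\epsilon}{2}$; this yields condition (C1).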
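The threshold $\epsilon \le \frac{1}{3}$ comes from comparing the two growth rates in the step above. As a short worked check, assuming the exploration rate (10) with its $\ln^{2+2\epsilon}$ factor:

```latex
\frac{T^{1+\frac{\epsilon}{2}}}{T^{\frac{3}{2}-\epsilon}\,\ln^{2+2\epsilon} T}
  \;=\; \frac{T^{\frac{3\epsilon-1}{2}}}{\ln^{2+2\epsilon} T}
  \;\longrightarrow\; 0
  \quad\text{as } T \to \infty
  \quad\Longleftrightarrow\quad 3\epsilon - 1 \le 0
  \quad\Longleftrightarrow\quad \epsilon \le \tfrac{1}{3}.
```

At the boundary $\epsilon = \frac{1}{3}$ the polynomial factors cancel and the logarithmic factor alone drives the ratio to zero, which is why the bound holds with equality allowed.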

By Corollary 7, the expected values of $\max_i \{\mu_{iT}\}$ and $\gamma_T$ are $\theta(\sqrt{T})$. Therefore, the expected welfare of an efficient mechanism between time 1 and $T$ is $\theta(T^{\frac{3}{2}})$. For any $0 < \epsilon < 1$, we have:

$$\theta\left( T^{\frac{3}{2}} \right) = \omega\left( T^{\frac{3}{2} - \epsilon} \ln^{2+2\epsilon} T + T^{1 + \frac{\epsilon}{2}} \right)$$

Therefore condition (C2) holds, and $M_\epsilon(B)$ is asymptotically ex-ante efficient.

To apply this model to sponsored search we treat each item as a bundle of search queries. Each time step is defined by the arrival of $m$ queries. The mechanism allocates all $m$ queries to an advertiser and after that, the advertiser reports the average utility for these queries. The payment $p_{it}$ is now the price per item, i.e. the advertiser pays $m\,p_{it}$ for the bundle of queries. The value of $m$ is chosen such that $\mu_{it}$ can be estimated with high accuracy.

6. DISCUSSION AND OPEN PROBLEMS

In this section we discuss some extensions of the mechanisms.

Multiple Slots. To modify $M$ so that it can accommodate multiple slots we borrow from Gonen and Pavlov [12], who assume there exists a set of conditional distributions which determine the conditional probability that the ad in slot $j_1$ is clicked conditional on the ad in slot $j_2$ being clicked. During the exploit phase, $M$ allocates the slots to the advertisers with the highest expected utility, and the prices are determined according to Holmstrom's lemma ([19], see also [1]). The estimates of the utilities are updated based on the reports, using the conditional distribution.

Delayed Reports. In some applications, the value of receiving the item is realized at some later date. For example, a user clicks on an ad and visits the website of the advertiser. A couple of days later, she returns to the website and completes a transaction. It is not difficult to adjust the mechanism to accommodate this setting by allowing the advertiser to report with a delay or change her report later.

Creating Multiple Identities. When a new advertiser joins the system, in order to learn her utility value our mechanism gives her a few items for free in the explore phase. Therefore our mechanism is vulnerable to advertisers who can create several identities and join the system.
It is not clear whether creating a new identity is cheap in our context because the traffic generated by advertising should eventually be routed to a legitimate business. Still, one way to avoid this problem is to charge users without a reliable history using CPC.

Acknowledgment. We would like to thank Arash Asadpour, Peter Glynn, Ashish Goel, Ramesh Johari, and Thomas Weber for fruitful discussions. The second author acknowledges the support from NSF and a gift from Google.

7. REFERENCES

[1] G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. Proceedings of ACM conference on Electronic Commerce, 2006.
[2] S. Athey and I. Segal. An Efficient Dynamic Mechanism. Manuscript, 2007.
[3] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, Volume 47, Issue 2-3, 235-256, 2002.
[4] A. Bapna and T. Weber. Efficient Dynamic Allocation with Uncertain Valuations. Working Paper, 2006.
[5] M. Balcan, A. Blum, J. Hartline, and Y. Mansour. Mechanism Design via Machine Learning. Proceedings of 46th Annual IEEE Symposium on Foundations of Computer Science, 2005.
[6] D. Bergemann and J. Välimäki. Efficient Dynamic Auctions. Proceedings of Third Workshop on Sponsored Search Auctions, 2007.
[7] A. Borodin and P. Salminen. Handbook of Brownian Motion: Facts and Formulae. Springer, 2002.
[8] A. Blum, V. Kumar, A. Rudra, and F. Wu. Online Learning in Online Auctions. Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete Algorithms, 2003.
[9] R. Cavallo, D. Parkes, and S. Singh. Efficient Online Mechanism for Persistent, Periodically Inaccessible Self-Interested Agents. Working Paper, 2007.
[10] K. Crawford. Google CFO: Fraud A Big Threat. CNN/Money, December 2, 2004.
[11] J. Gittins. Multi-Armed Bandit Allocation Indices. Wiley, New York, NY, 1989.
[12] R. Gonen and E. Pavlov. An Incentive-Compatible Multi-Armed Bandit Mechanism. Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, 2007.
[13] B. Grow, B. Elgin, and M.
Herbs. Click Fraud: The dark side of online adverising. BusinessWeek. Cover Sory, Ocober 2, 2006. [4] N. Immorlica, K. Jain, M. Mahdian, and K. Talwar. Click Fraud Resisan Mehods for Learning Click-Through Raes. Proceedings of he s Workshop on Inerne and Nework Economics, 2005. [5] B. Kis, P. Laxminarayan, B. LeBlanc, and R. Meech. A Formal Analysis of Search Aucions Including Predicions on Click Fraud and Bidding Tacics. Workshop on Sponsored Search Aucions, 2005. [6] R. Kleinberg. Online Decision Problems Wih Large Sraegy Ses. Ph.D. Thesis, MIT, 2005. [7] S. Lahaie, and D. Parkes. Applying Learning Algorihms o Preference Eliciaion. Proceedings of he 5h ACM conference on Elecronic Commerce, 2004. [8] M. Mahdian, and K. Tomak. Pay-per-acion model for online adverising. Proceedings of he 3rd Inernaional Workshop on Inerne and Nework Economics, 549-557, 2007. [9] P. Milgrom, Puing Aucion Theory o Work. Cambridge Universiy Press, 2004. [20] D. Michell. Click Fraud and Halli-bloggers. New York Times, July 6, 2005. [2] N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, ediors. Algorihmic Game Theory, Cambridge Universiy Press, 2007. [22] D. Parkes. Online Mechanisms Algorihmic Game Theory (Nisan e al. eds.), 2007. 86

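[23] B. Stone. When Mice Attack: Internet Scammers Steal Money with Click Fraud. Newsweek, January 24, 2005.
[24] R. Wilson. Game-Theoretic Approaches to Trading Processes. Economic Theory: Fifth World Congress, ed. by T. Bewley, chap. 2, pp. 33-77, Cambridge University Press, Cambridge, 1987.
[25] J. Wortman, Y. Vorobeychik, L. Li, and J. Langford. Maintaining Equilibria During Exploration in Sponsored Search Auctions. Proceedings of the 3rd International Workshop on Internet and Network Economics, 2007.

APPENDIX

A. PROOF OF LEMMA 4

Proof. We prove the lemma by showing that for any agent $i$,

$$\Pr\left[|\mu_i - \hat{\mu}_i(t)| \ge \frac{\mu_i}{t^{(1-\epsilon)/2}}\right] = o\left(\frac{1}{t^c}\right), \quad \forall c > 0.$$

First, we estimate $E[n_i]$. There exists a constant $d$ such that:

$$E[n_i] \ge \sum_{k=1}^{t-1} \frac{\eta_\epsilon(k)}{n} = \sum_{k=1}^{t-1} \min\left\{\frac{1}{n},\; k^{-\epsilon} \ln^{1+\epsilon} k\right\} > \frac{1}{d}\, t^{1-\epsilon} \ln^{1+\epsilon} t$$

By the Chernoff-Hoeffding bound:

$$\Pr\left[n_i \le \frac{E[n_i]}{2}\right] \le e^{-\frac{t^{1-\epsilon} \ln^{1+\epsilon} t}{8d}}.$$

Inequality () and the Chernoff-Hoeffding bound imply:

$$\begin{aligned}
\Pr\left[|\mu_i - \hat{\mu}_i(t)| \ge \frac{\mu_i}{t^{(1-\epsilon)/2}}\right]
&= \Pr\left[|\mu_i - \hat{\mu}_i(t)| \ge \frac{\mu_i}{t^{(1-\epsilon)/2}} \,\wedge\, n_i \ge \frac{E[n_i]}{2}\right] \\
&\quad + \Pr\left[|\mu_i - \hat{\mu}_i(t)| \ge \frac{\mu_i}{t^{(1-\epsilon)/2}} \,\wedge\, n_i < \frac{E[n_i]}{2}\right] \\
&\le 2 e^{-\frac{1}{t^{1-\epsilon}} \cdot \frac{t^{1-\epsilon} \ln^{1+\epsilon} t \, \mu_i}{2d}} + e^{-\frac{t^{1-\epsilon} \ln^{1+\epsilon} t}{8d}} \\
&= o\left(\frac{1}{t^c}\right), \quad \forall c > 0
\end{aligned}$$

Therefore, with probability $1 - o(1/t)$, for all agents, $\Delta_t \le 1/t^{(1-\epsilon)/2}$. Since the maximum value of $u_i$ is 1, $E[\Delta_t] = O\left(1/t^{(1-\epsilon)/2}\right)$.

B. PROOF OF LEMMA 8

Proof. Define $\Delta_i = |\mu_{i,T} - \hat{\mu}_{i,T}|$. We first prove $\Pr[\Delta_i > T^{\epsilon/2}] = o(1/T^c)$, $\forall c > 0$. There exists a constant $T_d$ such that for any time $T \ge T_d$, the probability that $i$ has not been randomly allocated the item in the last $t < T - T_d$ steps is at most:

$$\Pr[T - l_{i,T} > t] < \left(1 - T^{-\epsilon} \ln^{2+2\epsilon} T\right)^{t} \le e^{-\frac{t \ln^{2+2\epsilon} T}{T^{\epsilon}}}. \quad (6)$$

Let $t = \frac{T^{\epsilon}}{\ln^{1+\epsilon} T}$. By equation (4) and (6),

$$\Pr[\Delta_i > T^{\epsilon/2}] = \Pr[\Delta_i > T^{\epsilon/2} \,\wedge\, T - l_{i,T} \le t] + \Pr[\Delta_i > T^{\epsilon/2} \,\wedge\, T - l_{i,T} > t] = o\left(\frac{1}{T^c}\right), \quad \forall c > 0.$$

Hence, with high probability, for all the $n$ agents, $\Delta_i \le T^{\epsilon/2}$. If for some of the agents $\Delta_i \ge T^{\epsilon/2}$, then, by Corollary 7, the expected value of the maximum of $\mu_{iT}$ over these agents is $\theta(\sqrt{T})$. Therefore, $E[\max_i\{\Delta_i\}] = O(T^{\epsilon/2})$. The lemma follows because $E[\Delta_T] \le E[\max_i\{\Delta_i\}]$.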
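
The bound on the probability that an agent goes unexplored in Appendix B can be checked numerically. The sketch below is only an illustration under assumptions that are a reading of the proof rather than statements taken from it: the per-step probability that a given agent receives an exploration item is taken to be $T^{-\epsilon} \ln^{2+2\epsilon} T$, and the window length is taken to be $t = T^{\epsilon} / \ln^{1+\epsilon} T$. The function name `miss_probability` and the values of $T$ and $\epsilon$ are hypothetical.

```python
import math


def miss_probability(T, eps):
    """Bound the chance that one agent is not explored in the last t steps.

    Assumed forms (a reading of the proof, not quoted from the paper):
      per-step exploration probability for one agent: T^(-eps) * ln(T)^(2 + 2*eps)
      window length: t = T^eps / ln(T)^(1 + eps)
    Returns (t, exact, bound), where exact = (1 - rate)^t and
    bound = exp(-rate * t).
    """
    log_T = math.log(T)
    rate = T ** (-eps) * log_T ** (2 + 2 * eps)
    if not 0.0 < rate < 1.0:
        # Plays the role of the threshold T_d: below it, rate is not a probability.
        raise ValueError("T is too small for this eps: rate is not in (0, 1)")
    t = T ** eps / log_T ** (1 + eps)
    exact = (1.0 - rate) ** t
    bound = math.exp(-rate * t)  # rate * t simplifies to ln(T)^(1 + eps)
    return t, exact, bound


if __name__ == "__main__":
    eps = 0.5
    for T in (1e8, 1e10, 1e12):
        t, exact, bound = miss_probability(T, eps)
        # Scaling by T^2 shows the bound shrinking faster than 1/T^2.
        print(f"T={T:.0e} t={t:.1f} exact={exact:.3e} "
              f"bound={bound:.3e} T^2*bound={T ** 2 * bound:.3e}")
```

Under these assumptions the exponent of the bound reduces to $\ln^{1+\epsilon} T$, so the bound equals $e^{-\ln^{1+\epsilon} T}$, which decays faster than any fixed power of $1/T$. The `ValueError` branch mirrors the requirement $T \ge T_d$: for small $T$ the assumed rate exceeds 1 and the expression $(1 - \text{rate})^t$ is not meaningful.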