One Practical Algorithm for Both Stochastic and Adversarial Bandits


Yevgeny Seldin, Queensland University of Technology, Brisbane, Australia
Aleksandrs Slivkins, Microsoft Research, New York NY, USA

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the old control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent logarithmic regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains a logarithmic regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.

1. Introduction

Stochastic multiarmed bandits (Thompson, 1933; Robbins, 1952; Lai & Robbins, 1985; Auer et al., 2002a) and adversarial multiarmed bandits (Auer et al., 1995; 2002b) have coexisted in parallel for almost two decades by now, in the sense that no algorithm for stochastic multiarmed bandits is applicable to adversarial multiarmed bandits and algorithms for adversarial bandits are unable to exploit the simpler regime of stochastic bandits. The recent attempt of Bubeck & Slivkins (2012) to bring them together did not make it in the full sense of unification, since the algorithm of Bubeck and Slivkins relies on the knowledge of the time horizon and makes a one-time irreversible switch between stochastic and adversarial operation modes if the beginning of the game is estimated to exhibit adversarial behavior.

We present an algorithm that treats both stochastic and adversarial multiarmed bandit problems without distinguishing between them. Our algorithm just runs, as most other bandit algorithms, without knowledge of the time horizon and without making any hard statements about the nature of the environment. We show that if the environment happens to be adversarial the performance of the algorithm is just a factor of 2 worse than the performance of the EXP3 algorithm (with the best constants, as described in Bubeck & Cesa-Bianchi (2012)) and if the environment happens to be stochastic the performance of our algorithm is comparable to the performance of UCB1 of Auer et al. (2002a). Thus, we cover the full range and achieve almost optimal performance at the extreme points. Furthermore, we show that the new algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. We also show that the algorithm retains a logarithmic regret guarantee in the stochastic regime even when some observations are adversarially contaminated, as long as on average the contamination does not reduce the gap by more than a half. To the best of our knowledge, no other algorithm has yet been shown to be able to exploit gaps in the adversarial or adversarially contaminated stochastic regimes. The contaminated stochastic regime is a very practical model, since in many real-life situations we are dealing with stochastic environments with occasional disturbances.

Since the introduction of Thompson's sampling (Thompson, 1933), which was analyzed only after 80 years (Kaufmann et al., 2012; Agrawal & Goyal, 2013), a variety of algorithms were invented for the stochastic multiarmed bandit problem. The most powerful for today are KL-UCB (Cappé et al., 2013), EwS (Maillard, 2011), and the aforementioned Thompson's sampling. It is easy to show that any deterministic algorithm can potentially suffer linear regret in the adversarial regime (see the supplementary material for a proof). Although nothing is known about the performance of randomized algorithms for stochastic bandits in the adversarial regimes, empirically they are extremely sensitive to deviations from the stochastic assumption.

In the adversarial world the most powerful algorithm for today is INF (Audibert & Bubeck, 2009; Bubeck & Cesa-Bianchi, 2012). Nevertheless, the EXP3 algorithm of Auer et al. (2002b) still retains an important place, mainly due to its simplicity and wide applicability, which covers combinatorial bandits, partial monitoring games, and many other adversarial problems. Since any stochastic problem can be seen as an instance of an adversarial problem, both INF and EXP3 have the worst-case root-$t$ regret guarantee in the stochastic regime, but it is not known whether they can do better. Empirically in the stochastic regime EXP3 is inferior to all other known algorithms for this setting, including the simplest UCB1 algorithm.

It is interesting to take a brief look into the development of EXP3. The algorithm was first suggested in Auer et al. (1995) and its parametrization and analysis were improved in Auer et al. (2002b). The EXP3 of Auer et al. was designed for the multiarmed bandit game with rewards and its playing strategy is based on mixing a Gibbs distribution (also known as exponential weights) with a uniform exploration distribution in proportion to the learning rate. The uniform exploration leaves no hope for achieving logarithmic regret in the stochastic regime simultaneously with the root-$t$ regret in the adversarial regime, since each arm is played at least $\Omega(\sqrt{t})$ times in $t$ rounds of the game. By changing the learning rate Cesa-Bianchi & Fischer (1998) managed to derive a different parametrization of the algorithm that was shown to achieve logarithmic regret in the stochastic regime, but it had no regret guarantees in the adversarial regime.
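
To make the role of the uniform exploration concrete, the following Python sketch computes playing probabilities in the style of the reward-based EXP3 described above: a Gibbs distribution over cumulative reward estimates mixed with the uniform distribution. The function name `exp3_mixed_probabilities`, the argument names, and the use of a separate mixing weight `gamma` are illustrative assumptions, not the exact parametrization of Auer et al.

```python
import math

def exp3_mixed_probabilities(cum_reward_estimates, eta, gamma):
    """Mix a Gibbs (exponential weights) distribution with uniform exploration.

    cum_reward_estimates: cumulative estimated rewards, one per arm (hypothetical input).
    eta: learning rate used inside the exponent.
    gamma: weight of the uniform exploration distribution, in [0, 1].
    """
    k = len(cum_reward_estimates)
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the normalized Gibbs distribution.
    top = max(cum_reward_estimates)
    weights = [math.exp(eta * (g - top)) for g in cum_reward_estimates]
    total = sum(weights)
    gibbs = [w / total for w in weights]
    # Every arm receives at least gamma / k probability, however bad it looks.
    return [(1.0 - gamma) * p + gamma / k for p in gibbs]

probs = exp3_mixed_probabilities([50.0, 10.0, 0.0], eta=0.5, gamma=0.1)
```

Because each arm keeps probability at least `gamma / k` in every round, an arm with a large gap is still sampled at a rate dictated by the exploration weight alone, which is the obstacle to logarithmic regret that the paragraph above points out.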

Stoltz (2005) has observed that in the game with losses the root-$t$ regret guarantee in the adversarial regime can be achieved without mixing in the uniform distribution (which even leads to better constants). However, mixing in any distribution that element-wise does not exceed the learning rate does not break the worst-case performance of the algorithm in the game with losses. We exploit this emerged freedom in order to derive a modification of the EXP3 algorithm that achieves almost optimal regret in both adversarial and stochastic regimes without prior knowledge about the nature of the environment. (Rewards can be transformed into losses by taking $\ell_t^a = 1 - r_t^a$.)

2. Problem Setting

We study the multiarmed bandit (MAB) game with losses. In each round $t$ of the game the algorithm chooses one action $A_t$ among $K$ possible actions, a.k.a. arms, and observes the corresponding loss $\ell_t^{A_t}$. The losses of other arms are not observed. There is a large number of loss generation models, four of which are considered below. In this work we restrict ourselves to loss sequences $\{\ell_t^a\}_{t,a}$ that are generated independently of the algorithm's actions. Under this assumption we can assume that the loss sequences are written down before the game starts (but not revealed to the algorithm). We also make the standard assumption that the losses are bounded in the $[0,1]$ interval.

The performance of the algorithm is quantified by regret, defined as the difference between the expected loss of the algorithm up to round $t$ and the expected loss of the best arm up to round $t$:

$$R(t) = \mathbb{E}\left[\sum_{s=1}^{t} \ell_s^{A_s}\right] - \min_a \left\{ \mathbb{E}\left[\sum_{s=1}^{t} \ell_s^{a}\right] \right\}.$$

The expectation is taken over the possible randomness of the algorithm and the loss generation model. The goal of the algorithm is to minimize the regret.

We consider two standard loss generation models, the adversarial regime and the stochastic regime, and two intermediate regimes, the contaminated stochastic regime and the adversarial regime with a gap.

Adversarial regime. In this regime the loss sequences are generated by an unrestricted adversary (who is oblivious to the algorithm's actions). This is the most general setting and the other three regimes can be seen as special cases of the adversarial regime. An arm $a \in \arg\min_{a'} \left(\sum_{s=1}^{t} \ell_s^{a'}\right)$ is known as a best arm in hindsight for the first $t$ rounds.

Stochastic regime.
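In this regime the losses $\ell_t^a$ are sampled independently from an unknown distribution that depends on $a$, but not on $t$. We use $\mu(a) = \mathbb{E}[\ell_t^a]$ to denote the expected loss of arm $a$. Arm $a$ is called a best arm if $\mu(a) = \min_{a'} \{\mu(a')\}$ and suboptimal otherwise; let $a^*$ denote some best arm. For each arm $a$, define the gap $\Delta(a) = \mu(a) - \mu(a^*)$. Let $\Delta = \min_{a : \Delta(a) > 0} \{\Delta(a)\}$ denote the minimal gap. Letting $N_t(a)$ be the number of times arm $a$ was played up to (and including) round $t$, the regret can be rewritten as

$$R(t) = \sum_a \mathbb{E}[N_t(a)]\,\Delta(a). \tag{1}$$

Contaminated stochastic regime. In this regime the adversary picks some round-arm pairs $(t, a)$ (locations) before the game starts and assigns the loss values there in an arbitrary way. The remaining losses are generated according to the stochastic regime. We call a contaminated stochastic regime moderately contaminated after $\tau$ rounds if for all $t \ge \tau$ the total number of contaminated locations of each suboptimal arm up to time $t$ is at most $t\Delta(a)/4$ and the number of contaminated locations of each best arm is at most $t\Delta/4$. By this definition, for all $t \ge \tau$ on average over the stochasticity of the loss sequences the adversary can reduce the gap of every arm by at most a half.

Adversarial regime with a gap. An adversarial regime is named by us an adversarial regime with a gap if there exists a round $\tau$ and an arm $a^*_\tau$ that persists to be the best arm in hindsight for all rounds $t \ge \tau$. We name such an arm a consistently best arm after round $\tau$. If no such arm exists then $a^*_\tau$ is undefined. Note that if $a^*_\tau$ is defined for some $\tau$ then $a^*_{\tau'}$ is defined for all $\tau' > \tau$. We use $\lambda_t(a) = \sum_{s=1}^{t} \ell_s^a$ to denote the cumulative loss of arm $a$. Whenever $a^*_\tau$ is defined we define a deterministic gap of arm $a$ on round $\tau$ as:

$$\Delta(\tau, a) = \min_{t \ge \tau} \left\{ \frac{1}{t}\left(\lambda_t(a) - \lambda_t(a^*_\tau)\right) \right\}.$$

If $a^*_\tau$ is undefined, $\Delta(\tau, a)$ is defined as zero.

Notation. We use $\mathbb{1}\{E\}$ to denote the indicator function of event $E$ and $\mathbb{1}_t^a = \mathbb{1}\{A_t = a\}$ to denote the indicator function of the event that arm $a$ was played on round $t$.

3. Main Results

Our main results include a new algorithm, which we name EXP3++, and its analysis in the four regimes defined in the previous section. The EXP3++ algorithm, provided in the Algorithm 1 box, is a generalization of the EXP3 algorithm with losses.

Algorithm 1: Algorithm EXP3++.
Remark: See text for definition of $\eta_t$ and $\xi_t(a)$.
  $\forall a$: $\tilde L_0(a) = 0$.
  for $t = 1, 2, \dots$ do
    $\beta_t = \frac{1}{2}\sqrt{\frac{\ln K}{tK}}$.
    $\forall a$: $\varepsilon_t(a) = \min\left\{\frac{1}{2K}, \beta_t, \xi_t(a)\right\}$.
    $\forall a$: $\rho_t(a) = e^{-\eta_t \tilde L_{t-1}(a)} \big/ \sum_{a'} e^{-\eta_t \tilde L_{t-1}(a')}$.
    $\forall a$: $\tilde\rho_t(a) = \left(1 - \sum_{a'} \varepsilon_t(a')\right)\rho_t(a) + \varepsilon_t(a)$.
    Draw action $A_t$ according to $\tilde\rho_t$ and play it.
    Observe and suffer the loss $\ell_t^{A_t}$.
    $\forall a$: $\tilde\ell_t^a = \frac{\ell_t^{A_t}}{\tilde\rho_t(a)} \mathbb{1}_t^a$.
    $\forall a$: $\tilde L_t(a) = \tilde L_{t-1}(a) + \tilde\ell_t^a$.
  end for

The EXP3++ algorithm has two control levers: the learning rate $\eta_t$ and the exploration parameters $\xi_t(a)$. The EXP3 with losses (as described in Bubeck & Cesa-Bianchi (2012)) is a special case of EXP3++ with $\eta_t = 2\beta_t$ and $\xi_t(a) = 0$. The crucial innovation in EXP3++ is the introduction of the exploration parameters $\xi_t(a)$, which are tuned individually for each arm depending on the past observations.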
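
The steps of the EXP3++ algorithm box above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Exp3PlusPlus`, the callback `xi(t, a, cum_loss_est)` that supplies the exploration parameters, and the fixed choice of the learning rate equal to `beta` are ours.

```python
import math
import random

class Exp3PlusPlus:
    """Sketch of the EXP3++ update with a pluggable exploration rule xi."""

    def __init__(self, n_arms, xi, rng=None):
        self.k = n_arms
        self.xi = xi                        # hypothetical callback, must return a value >= 0
        self.rng = rng or random.Random()
        self.cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
        self.t = 0

    def probabilities(self):
        t, k = self.t + 1, self.k
        beta = 0.5 * math.sqrt(math.log(k) / (t * k))
        eta = beta  # assumption: learning rate set equal to beta
        eps = [min(1.0 / (2 * k), beta, self.xi(t, a, self.cum_loss_est))
               for a in range(k)]
        # Gibbs distribution; shifting by the minimum avoids underflow.
        low = min(self.cum_loss_est)
        w = [math.exp(-eta * (l - low)) for l in self.cum_loss_est]
        total = sum(w)
        rho = [x / total for x in w]
        # Mix in the per-arm exploration.
        return [(1.0 - sum(eps)) * rho[a] + eps[a] for a in range(k)]

    def choose(self):
        self._probs = self.probabilities()
        self._arm = self.rng.choices(range(self.k), weights=self._probs)[0]
        return self._arm

    def update(self, loss):
        # Only the played arm receives a non-zero loss estimate.
        self.cum_loss_est[self._arm] += loss / self._probs[self._arm]
        self.t += 1

# Usage with xi = 0, i.e. no extra exploration on top of the Gibbs distribution.
alg = Exp3PlusPlus(3, xi=lambda t, a, est: 0.0, rng=random.Random(0))
arm = alg.choose()
alg.update(loss=1.0)
```

Passing a different `xi` callback is the only change needed to try other exploration rules; the cap by `1 / (2 * k)` and `beta` inside `probabilities` is applied regardless of what the callback returns.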

In the sequel we show that tuning only the learning rate $\eta_t$ suffices to control the regret of EXP3++ in the adversarial regime, irrespective of the choice of the exploration parameters $\xi_t(a)$. Then we show that tuning only the exploration parameters $\xi_t(a)$ suffices to control the regret of EXP3++ in the stochastic regime irrespective of the choice of $\eta_t$, as long as $\eta_t \ge \beta_t$. Applying the two control levers simultaneously we obtain an algorithm that achieves the optimal root-$t$ regret in the adversarial regime (up to logarithmic factors) and almost optimal logarithmic regret in the stochastic regime (though with a suboptimal power in the logarithm). Then we show that the new control lever is even more powerful and makes it possible to detect and exploit the gap in even more challenging situations, including the moderately contaminated stochastic regime and the adversarial regime with a gap.

Adversarial Regime

First, we show that tuning $\eta_t$ is sufficient to control the regret of EXP3++ in the adversarial regime.

Theorem 1. For $\eta_t = \beta_t$ and any $\xi_t(a) \ge 0$ the regret of EXP3++ for any $t$ satisfies:

$$R(t) \le 4\sqrt{Kt \ln K}.$$

Note that the regret bound in Theorem 1 is just a factor of 2 worse than the regret of EXP3 with losses (Bubeck & Cesa-Bianchi, 2012).

Stochastic Regime

Now we show that for any $\eta_t \ge \beta_t$ tuning the exploration parameters $\xi_t(a)$ suffices to control the regret of the algorithm in the stochastic regime. By choosing $\eta_t = \beta_t$ we obtain algorithms that have both the optimal root-$t$ regret scaling in the adversarial regime and logarithmic regret scaling in the stochastic regime. We consider a number of different ways of tuning the exploration parameters $\xi_t(a)$, which lead to different parametrizations of EXP3++. We start with the idealistic assumption that the gap is known, just to give an idea of the best result we can hope for.

Theorem 2. Assume that the gaps $\Delta(a)$ are known. For any choice of $\eta_t \ge \beta_t$ and any $c \ge 18$, the regret of EXP3++ with $\xi_t(a) = \frac{c \ln(t\Delta(a)^2)}{t\Delta(a)^2}$ in the stochastic regime satisfies:

$$R(t) \le \sum_a O\left(\frac{\ln(t)^2}{\Delta(a)}\right) + \sum_a \tilde O\left(\frac{K}{\Delta(a)^3}\right).$$

The constants in this theorem are small and are provided explicitly in the analysis. We also show that $c$ can be made almost as small as 2.

Next we show that using the empirical gap

$$\hat\Delta_t(a) = \min\left\{1, \frac{1}{t}\left(\tilde L_t(a) - \min_{a'} \tilde L_t(a')\right)\right\} \tag{3}$$

as an estimate of the true gap we can also achieve a polylogarithmic regret guarantee. We call this algorithm EXP3++$^{\mathrm{AVG}}$.

Theorem 3. Let $c \ge 18$ and $\eta_t \ge \beta_t$. Let $t^*$ be the minimal integer that satisfies $t^* \ge \frac{4c^2 K \ln(t^*)^4}{\ln(K)}$ and let $t^*(a) = \max\left\{t^*, e^{1/\Delta(a)^2}\right\}$. The regret of EXP3++ with $\xi_t(a) = \frac{c (\ln t)^2}{t \hat\Delta_{t-1}(a)^2}$ (termed EXP3++$^{\mathrm{AVG}}$) in the stochastic regime satisfies:

$$R(t) \le \sum_a O\left(\frac{\ln(t)^3}{\Delta(a)}\right) + \sum_a \Delta(a)\, t^*(a).$$

Although the additive constants in this theorem are very large, in the experimental section we show that a minor modification of this algorithm performs comparably to UCB1 in the stochastic regime (and has the adversarial regret guarantee in addition).

In the following theorem we show that if we assume a known time horizon $T$, then we can eliminate the additive term $e^{1/\Delta(a)^2}$ in the regret bound. The algorithm in Theorem 4 replaces the empirical gap estimate in the definition of $\xi_t(a)$ with a lower confidence bound on the gap and slightly adjusts other terms. We name this algorithm EXP3++$^{\mathrm{LCBT}}$.

Theorem 4. Consider the stochastic regime with a known time horizon $T$. The EXP3++$^{\mathrm{LCBT}}$ algorithm with any $\eta_t \ge \beta_t$ and appropriately defined $\xi_t(a)$ achieves a regret bound $R(T)$ that is polylogarithmic in $T$.

The precise definition of EXP3++$^{\mathrm{LCBT}}$ and the proof of Theorem 4 are provided in the supplementary material. It seems that simultaneous elimination of the assumption on the known time horizon and of the exponentially large additive term is a very challenging problem and we defer it to future work.

Contaminated Stochastic Regime

Next we show that EXP3++$^{\mathrm{AVG}}$ can sustain moderate contamination in the stochastic regime without a significant deterioration in performance.

Theorem 5. Under the parametrization given in Theorem 3, for $t^*(a) = \max\left\{t^*, e^{4/\Delta(a)^2}\right\}$, where $t^*$ is defined as before, the regret of EXP3++$^{\mathrm{AVG}}$ in the stochastic regime that is moderately contaminated after $\tau$ rounds satisfies:

$$R(t) \le \sum_a O\left(\frac{\ln(t)^3}{\Delta(a)}\right) + \sum_a \max\{t^*(a), \tau\}.$$

The price that is paid for moderate contamination after $\tau$ rounds is the scaling of $\Delta(a)$ by a factor of $1/2$ and the additive factor of $\tau$. (The scaling of $\Delta(a)$ affects the definition of $t^*(a)$ and the constant in $O\left(\ln(t)^3/\Delta(a)\right)$.)
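As before, the regret guarantee of Theorem 5 comes in addition to the guarantee of Theorem 1.

Adversarial Regime with a Gap

Finally, we show that EXP3++$^{\mathrm{AVG}}$ can also take advantage of a deterministic gap in the adversarial regime.

Theorem 6. Under the parametrization given in Theorem 3, the regret of EXP3++$^{\mathrm{AVG}}$ in the adversarial regime satisfies:

$$R(t) \le \sum_a \min_\tau \left\{ \max\left\{t^*, \tau, e^{1/\Delta(\tau,a)^2}\right\} + O\left(\frac{\ln(t)^3}{\Delta(\tau,a)^2}\right) \right\}.$$

We remind the reader that in the absence of a consistently best arm $\Delta(\tau, a)$ is defined as zero and the regret bound is vacuous (but the regret bound of Theorem 1 still holds). We also note that $\Delta(\tau, a)$ is a nondecreasing function of $\tau$. Therefore, there is a tradeoff: increasing $\tau$ increases $\Delta(\tau, a)$, but loses the regret guarantee on the rounds before $\tau$ (for simplicity, we assume that we have no guarantees before $\tau$). Theorem 6 allows us to pick the $\tau$ that minimizes this tradeoff. An important implication of the theorem is that if the deterministic gap is growing with time the regret guarantee improves too.

4. Proofs

We prove the theorems from the previous section in the order they were presented.

The Adversarial Regime

The proof of Theorem 1 relies on the following lemma, which is an intermediate step in the analysis of EXP3 by Bubeck (2010) (see also Bubeck & Cesa-Bianchi (2012)).

Lemma 7. For any $K$ sequences of non-negative numbers $X_1^a, X_2^a, \dots$ indexed by $a \in \{1, \dots, K\}$ and any non-increasing positive sequence $\eta_1, \eta_2, \dots$, for $\rho_t(a) = \frac{\exp\left(-\eta_t \sum_{s=1}^{t-1} X_s^a\right)}{\sum_{a'} \exp\left(-\eta_t \sum_{s=1}^{t-1} X_s^{a'}\right)}$ (assuming for $t = 1$ the sum in the exponent is zero) we have:

$$\sum_{t=1}^{T} \sum_a \rho_t(a) X_t^a - \min_a \left(\sum_{t=1}^{T} X_t^a\right) \le \frac{1}{2} \sum_{t=1}^{T} \eta_t \sum_a \rho_t(a) \left(X_t^a\right)^2 + \frac{\ln K}{\eta_T}. \tag{2}$$

More precisely, we are using the following corollary, which follows by allowing the $X_t^a$-s to be random variables, taking expectations of the two sides of (2), and using the fact that $\mathbb{E}[\min[\cdot]] \le \min[\mathbb{E}[\cdot]]$. We decompose expectations of incremental sums into incremental sums of conditional expectations and use $\mathbb{E}_t[\cdot]$ to denote expectations conditioned on the realization of all random variables up to round $t$.

Corollary 8. Let $X_1^a, X_2^a, \dots$ for $a \in \{1, \dots, K\}$ be non-negative random variables and let $\eta_t$ and $\rho_t$ be as defined in Lemma 7. Then:

$$\mathbb{E}\left[\sum_{t=1}^{T} \sum_a \rho_t(a)\, \mathbb{E}_t\left[X_t^a\right]\right] - \min_a \left(\mathbb{E}\left[\sum_{t=1}^{T} \mathbb{E}_t\left[X_t^a\right]\right]\right) \le \mathbb{E}\left[\sum_{t=1}^{T} \frac{\eta_t}{2}\, \mathbb{E}_t\left[\sum_a \rho_t(a) \left(X_t^a\right)^2\right]\right] + \frac{\ln K}{\eta_T}.$$

Proof of Theorem 1. We associate $X_t^a$ in (2) with $\tilde\ell_t^a$ in the EXP3++ algorithm. We have $\mathbb{E}_t[\tilde\ell_t^a] = \ell_t^a$ and since $\rho_t(a) = \frac{\tilde\rho_t(a) - \varepsilon_t(a)}{1 - \sum_{a'} \varepsilon_t(a')} \ge \tilde\rho_t(a) - \varepsilon_t(a)$ and $\ell_t^a \in [0,1]$ we also have:

$$\mathbb{E}\left[\sum_a \rho_t(a)\, \ell_t^a\right] \ge \mathbb{E}\left[\sum_a \left(\tilde\rho_t(a) - \varepsilon_t(a)\right) \ell_t^a\right] \ge \mathbb{E}\left[\ell_t^{A_t}\right] - \sum_a \varepsilon_t(a).$$

As well, we have:

$$\mathbb{E}_t\left[\sum_a \rho_t(a) \left(\tilde\ell_t^a\right)^2\right] = \sum_a \rho_t(a) \frac{\left(\ell_t^a\right)^2}{\tilde\rho_t(a)} \le \sum_a \frac{\rho_t(a)}{\tilde\rho_t(a)} = \sum_a \frac{\rho_t(a)}{\left(1 - \sum_{a'} \varepsilon_t(a')\right)\rho_t(a) + \varepsilon_t(a)} \le 2K,$$

where the last inequality follows by the fact that $\sum_{a'} \varepsilon_t(a') \le \frac{1}{2}$ by the definition of $\varepsilon_t(a)$. Substitution of the above calculations into Corollary 8 yields:

$$R(T) = \mathbb{E}\left[\sum_{t=1}^{T} \ell_t^{A_t}\right] - \min_a \mathbb{E}\left[\sum_{t=1}^{T} \ell_t^a\right] \le K \sum_{t=1}^{T} \eta_t + \frac{\ln K}{\eta_T} + \sum_{t=1}^{T} \sum_a \varepsilon_t(a) \le 2K \sum_{t=1}^{T} \eta_t + \frac{\ln K}{\eta_T}.$$

The result of the theorem follows by the choice of $\eta_t$.

The Stochastic Regime

Our proofs are based on the following form of Bernstein's inequality, which is a minor improvement over Cesa-Bianchi & Lugosi (2006, Lemma A.8) based on the ideas from Boucheron et al. (2013, Theorem 2.10).

Theorem 9 (Bernstein's inequality for martingales). Let $X_1, \dots, X_n$ be a martingale difference sequence with respect to the filtration $\mathcal{F} = (\mathcal{F}_i)_{1 \le i \le n}$ and let $S_i = \sum_{j=1}^{i} X_j$ be the associated martingale. Assume that there exist positive numbers $\nu$ and $c$, such that $X_j \le c$ for all $j$ with probability 1 and $\sum_{i=1}^{n} \mathbb{E}\left[(X_i)^2 \mid \mathcal{F}_{i-1}\right] \le \nu$ with probability 1. Then for all $b > 0$:

$$\mathbb{P}\left[S_n > \sqrt{2\nu b} + \frac{cb}{3}\right] \le e^{-b}.$$

We are also using the following technical lemma, which is proved in the supplementary material.

Lemma 10. For any $c > 0$:

$$\sum_{t=0}^{\infty} e^{-c\sqrt{t}} = O\left(\frac{1}{c^2}\right).$$

The proof of Theorems 2 and 3 is based on the following lemma.

Lemma 11. Let $\{\underline{\varepsilon}_t(a)\}_{t=1}^{\infty}$ be non-increasing deterministic sequences, such that $\underline{\varepsilon}_t(a) \le \varepsilon_t(a)$ with probability 1 and $\underline{\varepsilon}_t(a) \le \underline{\varepsilon}_t(a^*)$ for all $t$ and $a$. Define $\nu_t(a) = \sum_{s=1}^{t} \frac{1}{\underline{\varepsilon}_s(a)}$ and define the event $E_t^a$:

$$t\Delta(a) - \left(\tilde L_t(a) - \tilde L_t(a^*)\right) \le \sqrt{2\left(\nu_t(a) + \nu_t(a^*)\right) b_t} + \frac{1.25\, b_t}{3\, \underline{\varepsilon}_t(a^*)}. \tag{$E_t^a$}$$

Then for any positive sequence $b_1, b_2, \dots$ and any $t^* \ge 2$ the number of times arm $a$ is played by EXP3++ up to round $t$ is bounded as:

$$\mathbb{E}[N_t(a)] \le (t^* - 1) + \sum_{s=t^*}^{t} e^{-b_{s-1}} + \sum_{s=t^*}^{t} \varepsilon_s(a)\, \mathbb{1}\{E_{s-1}^a\} + \sum_{s=t^*}^{t} e^{-\eta_s g_{s-1}(a)},$$

where

$$g_t(a) = t\Delta(a) - \sqrt{2 t b_t \left(\frac{1}{\underline{\varepsilon}_t(a)} + \frac{1}{\underline{\varepsilon}_t(a^*)}\right)} - \frac{1.25\, b_t}{3\, \underline{\varepsilon}_t(a^*)}.$$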
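Proof. Note that the elements of the martingale difference sequence $\left\{\Delta(a) - \left(\tilde\ell_t^a - \tilde\ell_t^{a^*}\right)\right\}_{t=1}^{\infty}$ are upper bounded by $\frac{1}{\underline{\varepsilon}_t(a^*)} + 1$. Since $\underline{\varepsilon}_t(a^*) \le \varepsilon_t(a^*) \le 1/(2K) \le 1/4$ we can simplify the upper bound by using $\frac{1}{\underline{\varepsilon}_t(a^*)} + 1 \le \frac{1.25}{\underline{\varepsilon}_t(a^*)}$. Further note that

$$\sum_{s=1}^{t} \mathbb{E}_s\left[\left(\Delta(a) - \left(\tilde\ell_s^a - \tilde\ell_s^{a^*}\right)\right)^2\right] \le \sum_{s=1}^{t} \mathbb{E}_s\left[\left(\tilde\ell_s^a - \tilde\ell_s^{a^*}\right)^2\right] = \sum_{s=1}^{t} \left(\mathbb{E}_s\left[\left(\tilde\ell_s^a\right)^2\right] + \mathbb{E}_s\left[\left(\tilde\ell_s^{a^*}\right)^2\right]\right) \le \sum_{s=1}^{t} \left(\frac{1}{\tilde\rho_s(a)} + \frac{1}{\tilde\rho_s(a^*)}\right) \le \sum_{s=1}^{t} \left(\frac{1}{\varepsilon_s(a)} + \frac{1}{\varepsilon_s(a^*)}\right) \le \sum_{s=1}^{t} \left(\frac{1}{\underline{\varepsilon}_s(a)} + \frac{1}{\underline{\varepsilon}_s(a^*)}\right) = \nu_t(a) + \nu_t(a^*)$$

with probability 1. (The cross term vanishes because at most one of $\tilde\ell_s^a$ and $\tilde\ell_s^{a^*}$ is non-zero in any round.) Let $\bar E_t^a$ denote the complement of event $E_t^a$. Then by Bernstein's inequality $\mathbb{P}\left[\bar E_t^a\right] \le e^{-b_t}$. The number of times arm $a$ is played up to round $t$ is bounded as:

$$\mathbb{E}[N_t(a)] = \sum_{s=1}^{t} \mathbb{P}[A_s = a] = \sum_{s=1}^{t} \left(\mathbb{P}\left[A_s = a \mid E_{s-1}^a\right] \mathbb{P}\left[E_{s-1}^a\right] + \mathbb{P}\left[A_s = a \mid \bar E_{s-1}^a\right] \mathbb{P}\left[\bar E_{s-1}^a\right]\right) \le \sum_{s=1}^{t} \left(\mathbb{P}\left[A_s = a \mid E_{s-1}^a\right] \mathbb{1}\{E_{s-1}^a\} + \mathbb{P}\left[\bar E_{s-1}^a\right]\right) \le \sum_{s=1}^{t} \left(\mathbb{P}\left[A_s = a \mid E_{s-1}^a\right] \mathbb{1}\{E_{s-1}^a\} + e^{-b_{s-1}}\right).$$

For the terms of the sum above we have:

$$\mathbb{P}\left[A_t = a \mid E_{t-1}^a\right] \mathbb{1}\{E_{t-1}^a\} = \tilde\rho_t(a)\, \mathbb{1}\{E_{t-1}^a\} \le \left(\rho_t(a) + \varepsilon_t(a)\right) \mathbb{1}\{E_{t-1}^a\} = \left(\varepsilon_t(a) + \frac{e^{-\eta_t \tilde L_{t-1}(a)}}{\sum_{a'} e^{-\eta_t \tilde L_{t-1}(a')}}\right) \mathbb{1}\{E_{t-1}^a\} \le \left(\varepsilon_t(a) + e^{-\eta_t \left(\tilde L_{t-1}(a) - \tilde L_{t-1}(a^*)\right)}\right) \mathbb{1}\{E_{t-1}^a\} \le \varepsilon_t(a)\, \mathbb{1}\{E_{t-1}^a\} + e^{-\eta_t g_{t-1}(a)},$$

where in the last inequality we used the facts that event $E_{t-1}^a$ holds and that since $\underline{\varepsilon}_t(a)$ is a non-increasing sequence, $\nu_t(a) \le \frac{t}{\underline{\varepsilon}_t(a)}$. Substituting this result back into the computation of $\mathbb{E}[N_t(a)]$, and bounding each of the first $t^* - 1$ terms of the sum by 1, completes the proof.

Proof of Theorem 2. The proof is based on Lemma 11. Let $b_t = \ln(t\Delta(a)^2)$ and $\underline{\varepsilon}_t(a) = \varepsilon_t(a)$. For any $c \ge 18$ and any $t \ge t^*$, where $t^*$ is the minimal integer for which $t^* \ge \frac{4c^2 K \ln(t^*\Delta(a)^2)^2}{\Delta(a)^4 \ln(K)}$, we have:

$$g_t(a) = t\Delta(a) - \sqrt{2 t b_t \left(\frac{1}{\varepsilon_t(a)} + \frac{1}{\varepsilon_t(a^*)}\right)} - \frac{1.25\, b_t}{3\, \varepsilon_t(a^*)} \ge t\Delta(a) - 2\sqrt{\frac{t b_t}{\varepsilon_t(a)}} - \frac{1.25\, b_t}{3\, \varepsilon_t(a)} \ge t\Delta(a)\left(1 - \frac{2}{\sqrt{c}} - \frac{1.25}{3c}\right) \ge \frac{1}{2} t\Delta(a).$$

The choice of $t^*$ ensures that for $t \ge t^*$ and all suboptimal actions $a$ we have $\varepsilon_t(a) = \xi_t(a)$, which slightly simplifies the calculations. Also note that since $\varepsilon_t(a^*) = \min\left\{\frac{1}{2K}, \beta_t\right\}$, asymptotically the $1/\varepsilon_t(a)$ term in $g_t(a)$ dominates the $1/\varepsilon_t(a^*)$ term and with a bit more careful bounding $c$ can be made almost as small as 2. By substitution of the lower bound on $g_t(a)$ into Lemma 11 we have:

$$\mathbb{E}[N_t(a)] \le t^* + \frac{\ln(t)}{\Delta(a)^2} + \frac{c \ln(t)^2}{\Delta(a)^2} + \sum_{s=1}^{t} e^{-\frac{\Delta(a)}{4}\sqrt{\frac{(s-1)\ln(K)}{K}}} \le \frac{c \ln(t)^2}{\Delta(a)^2} + \frac{\ln(t)}{\Delta(a)^2} + O\left(\frac{K}{\Delta(a)^2}\right) + t^*,$$

where we used Lemma 10 to bound the sum of the exponents. Note that $t^*$ is of order $\tilde O\left(K/\Delta(a)^4\right)$.

Proof of Theorem 3. Note that since by our definition $\hat\Delta_t(a) \le 1$ the sequence $\underline{\varepsilon}_t(a) = \underline{\varepsilon}_t = \min\left\{\frac{1}{2K}, \beta_t, \frac{c \ln(t)^2}{t}\right\}$ satisfies the condition of Lemma 11. Also note that for $t$ large enough, so that $t \ge \frac{4c^2 K \ln(t)^4}{\ln(K)}$, we have $\underline{\varepsilon}_t = \frac{c \ln(t)^2}{t}$. Let $b_t = \ln(t)$ and let $t^*(a)$ be large enough, so that for all $t \ge t^*(a)$ we have $t \ge \frac{4c^2 K \ln(t)^4}{\ln(K)}$ and $t \ge e^{1/\Delta(a)^2}$. We are going to bound the three terms in the bound on $\mathbb{E}[N_t(a)]$ in Lemma 11. Bounding $\sum_{s=t^*(a)}^{t} e^{-b_{s-1}}$ is easy. For bounding $\sum_{s=t^*(a)}^{t} \varepsilon_s(a)\, \mathbb{1}\{E_{s-1}^a\}$ we note that when $E_t^a$ holds and $c \ge 18$ we have:

$$t\hat\Delta_t(a) \ge \min\left\{t,\ \tilde L_t(a) - \tilde L_t(a^*)\right\},$$
$$\tilde L_t(a) - \tilde L_t(a^*) \ge t\Delta(a) - 2\sqrt{\frac{t b_t}{\underline{\varepsilon}_t}} - \frac{1.25\, b_t}{3\, \underline{\varepsilon}_t} \tag{4}$$
$$= t\Delta(a)\left(1 - \frac{2}{\Delta(a)\sqrt{c \ln t}} - \frac{1.25}{3 c\, \Delta(a) \ln t}\right) \ge t\Delta(a)\left(1 - \frac{2}{\sqrt{c}} - \frac{1.25}{3c}\right) \ge \frac{1}{2} t\Delta(a),$$

where in (4) we used the fact that $E_t^a$ holds and in the last line we used the fact that for $t \ge t^*(a)$ we have $\sqrt{\ln t} \ge 1/\Delta(a)$. Thus $\varepsilon_t(a)\, \mathbb{1}\{E_{t-1}^a\} \le \frac{c (\ln t)^2}{t \hat\Delta_{t-1}(a)^2} \le \frac{4c (\ln t)^2}{t \Delta(a)^2}$ and $\sum_{s=t^*(a)}^{t} \varepsilon_s(a)\, \mathbb{1}\{E_{s-1}^a\} = O\left(\frac{\ln(t)^3}{\Delta(a)^2}\right)$. Finally, for the last term in Lemma 11 we have already shown as an intermediate step in the calculation of the bound on $\hat\Delta_t(a)$ that for $t \ge t^*(a)$ we have $g_t(a) \ge \frac{1}{2} t\Delta(a)$. Therefore, the last term is of order $O\left(\frac{K}{\Delta(a)^2}\right)$. By taking all these calculations together we obtain the result of the theorem. Note that the result holds for any $\eta_t \ge \beta_t$.

The Contaminated Stochastic Regime

Proof of Theorem 5. The key element of the previous proof was a high-probability lower bound on $\tilde L_t(a) - \tilde L_t(a^*)$. We show that we can obtain a similar lower bound in the contaminated setting too. Let $\mathbb{1}^{\star}_{t,a}$ denote the indicator function of contamination in location $(t, a)$ ($\mathbb{1}^{\star}_{t,a}$ takes value 1 if contamination occurred and 0 otherwise). Let $m_t^a = \mathbb{1}^{\star}_{t,a}\, \ell_t^a + \left(1 - \mathbb{1}^{\star}_{t,a}\right) \mu(a)$; in other words, if arm $a$ was contaminated on round $t$ then $m_t^a$ is the adversarially assigned value of the loss of arm $a$ on round $t$ and otherwise it is the expected loss. Let $M_t(a) = \sum_{s=1}^{t} m_s^a$; then $\left(M_t(a) - M_t(a^*)\right) - \left(\tilde L_t(a) - \tilde L_t(a^*)\right)$ is a martingale. By the definition of a moderately contaminated after $\tau$ rounds process, for $t \ge \tau$ and any suboptimal action $a$ the total number of rounds up to $t$ where either $a$ itself or $a^*$ were contaminated is at most $t\Delta(a)/2$. Therefore, $M_t(a) - M_t(a^*) \ge t\Delta(a) - t\Delta(a)/4 - t\Delta/4 \ge t\Delta(a)/2$. Define the event $B_t^a$:

$$\frac{t\Delta(a)}{2} - \left(\tilde L_t(a) - \tilde L_t(a^*)\right) \le 2\sqrt{\nu_t b_t} + \frac{1.25\, b_t}{3\, \underline{\varepsilon}_t}, \tag{$B_t^a$}$$

where $\underline{\varepsilon}_t$ is defined in the proof of Theorem 3 and $\nu_t = \sum_{s=1}^{t} \frac{1}{\underline{\varepsilon}_s}$. Then by Bernstein's inequality $\mathbb{P}\left[\bar B_t^a\right] \le e^{-b_t}$. The remainder of the proof is identical to the proof of Theorem 3 with $\Delta(a)$ replaced by $\Delta(a)/2$.

The Adversarial Regime with a Gap

The proof of Theorem 6 is based on the following lemma, which is an analogue of Theorems 3 and 5.

Lemma 12. Under the parametrization given in Theorem 3, the number of times a suboptimal arm $a$ is played by EXP3++$^{\mathrm{AVG}}$ in an adversarial regime with a gap satisfies:

$$\mathbb{E}[N_t(a)] \le \max\left\{t^*, \tau, e^{1/\Delta(\tau,a)^2}\right\} + O\left(\frac{\ln(t)^3}{\Delta(\tau,a)^2}\right).$$

Proof. Again, the only modification we need is a high-probability lower bound on $\tilde L_t(a) - \tilde L_t(a^*_\tau)$. We note that $\left(\lambda_t(a) - \lambda_t(a^*_\tau)\right) - \left(\tilde L_t(a) - \tilde L_t(a^*_\tau)\right)$ is a martingale and that by definition for $t \ge \tau$ we have $\lambda_t(a) - \lambda_t(a^*_\tau) \ge t\Delta(\tau, a)$. Define the events $W_t^a$:

$$t\Delta(\tau, a) - \left(\tilde L_t(a) - \tilde L_t(a^*_\tau)\right) \le 2\sqrt{\nu_t b_t} + \frac{1.25\, b_t}{3\, \underline{\varepsilon}_t}, \tag{$W_t^a$}$$

where $\underline{\varepsilon}_t$ and $\nu_t$ are as in the proof of Theorem 5. By Bernstein's inequality $\mathbb{P}\left[\bar W_t^a\right] \le e^{-b_t}$. The remainder of the proof is identical to the proof of Theorem 3.

Proof of Theorem 6. Note that by definition $\Delta(\tau, a)$ is a nondecreasing sequence of $\tau$.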
Since Lemma 12 is a deterministic result it holds for all $\tau$ simultaneously and we are free to choose the one that minimizes the bound.

5. Empirical Evaluation: Stochastic Regime

We consider the stochastic multiarmed bandit problem with Bernoulli rewards. For all the suboptimal arms the rewards are Bernoulli with bias 0.5 and for the single best arm the reward is Bernoulli with bias $0.5 + \Delta$. We run the experiments with $K = 2$, $K = 10$, and $K = 100$, and $\Delta = 0.1$ and $\Delta = 0.01$ (in total, six combinations of $K$ and $\Delta$). We run each game for $10^7$ rounds and make ten repetitions of each experiment. The solid lines in the graphs in Figure 1 represent the mean performance over the experiments and the dashed lines represent the mean plus one standard deviation (std) over the ten repetitions of the corresponding experiment.

In the experiments EXP3++ is parametrized by $\xi_t(a) = \frac{\ln\left(t \hat\Delta_{t-1}(a)^2\right)}{32\, t\, \hat\Delta_{t-1}(a)^2}$, where $\hat\Delta_t(a)$ is the empirical estimate of $\Delta(a)$ defined in (3). In order to demonstrate that in the stochastic regime the exploration parameters are in full control of the performance we run the EXP3++ algorithm with two different learning rates. EXP3++$^{\mathrm{EMP}}$ corresponds to $\eta_t = \beta_t$ and EXP3++$^{\mathrm{ACC}}$ corresponds to $\eta_t = 1$. Note that only EXP3++$^{\mathrm{EMP}}$ has a performance guarantee in the adversarial regime.

We compare the EXP3++ algorithm with the EXP3 algorithm (as described in Bubeck & Cesa-Bianchi (2012)), the UCB1 algorithm of Auer et al. (2002a), and Thompson's sampling. Since it was demonstrated empirically in Seldin et al. (2012) that in the above experiments the performance of Thompson sampling is comparable or superior to the performance of EwS and KL-UCB, the latter two algorithms are excluded from the comparison. For the EXP3++ and the EXP3 algorithms we transform the rewards into losses via the $\ell_t^a = 1 - r_t^a$ transformation; the other algorithms operate directly on the rewards.
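
The experimental protocol above can be sketched as a small simulation harness. The code below is an illustration under assumptions, not the code used for the paper's experiments: the function `run_bernoulli_game`, its arguments, and the uniformly random placeholder policy are hypothetical, and the horizon is far shorter than the one used in the paper. It draws Bernoulli rewards, converts them to losses via the one-minus-reward transformation, and accumulates the realized pseudo-regret, that is, the number of pulls of each suboptimal arm times its gap.

```python
import random

def run_bernoulli_game(policy, n_arms, gap, horizon, seed=0):
    """Play `horizon` rounds; arm 0 is the single best arm with bias 0.5 + gap."""
    rng = random.Random(seed)
    biases = [0.5 + gap] + [0.5] * (n_arms - 1)
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        arm = policy(t, n_arms, rng)
        reward = 1.0 if rng.random() < biases[arm] else 0.0
        loss = 1.0 - reward   # transformation used for loss-based algorithms
        pulls[arm] += 1       # a learning policy would also consume `loss` here
    # Pseudo-regret: each pull of a suboptimal arm costs its gap.
    regret = sum(pulls[a] * (biases[0] - biases[a]) for a in range(n_arms))
    return regret, pulls

def uniform_policy(t, n_arms, rng):
    # Placeholder policy (an assumption): ignores all feedback.
    return rng.randrange(n_arms)

regret, pulls = run_bernoulli_game(uniform_policy, n_arms=10, gap=0.1, horizon=10_000)
```

The uniform placeholder suffers regret that grows linearly in the horizon; a bandit algorithm would be plugged in through the `policy` argument and would additionally need the observed loss passed back to it after each round.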

[Figure 1: six panels, (a) through (f), one for each combination of $K$ and $\Delta$, each plotting cumulative regret against the round $t$ for UCB1, Thom, EXP3, EXP3++$^{\mathrm{EMP}}$, and EXP3++$^{\mathrm{ACC}}$.]

Figure 1. Comparison of UCB1, Thompson sampling (Thom), EXP3, and EXP3++ algorithms in the stochastic regime. The legend in figure (f) corresponds to all the figures. EXP3++$^{\mathrm{EMP}}$ is the Empirical EXP3++ algorithm and EXP3++$^{\mathrm{ACC}}$ is an Accelerated Empirical EXP3++, where we take $\eta_t = 1$. Solid lines correspond to means over 10 repetitions of the corresponding experiments and dashed lines correspond to the means plus one standard deviation.

The results are presented in Figure 1. We see that in all the experiments the performance of EXP3++$^{\mathrm{EMP}}$ is almost identical to the performance of UCB1. However, unlike UCB1 and Thompson's sampling, EXP3++$^{\mathrm{EMP}}$ is secured against the possibility that the game is controlled by an adversary. In the supplementary material we show that any deterministic algorithm is vulnerable against an adversary.

The EXP3++$^{\mathrm{ACC}}$ algorithm can be seen as a teaser for future work. It performs better than EXP3++$^{\mathrm{EMP}}$, but it does not have the adversarial regime performance guarantee. However, we do not exclude the possibility that by some more sophisticated simultaneous control of $\eta_t$ and the $\varepsilon_t(a)$-s it may be possible to design an algorithm that will have both better performance in the stochastic regime and a regret guarantee in the adversarial regime. An example of such sophisticated control of the learning rate in full information games can be found in de Rooij et al. (2014).

6. Discussion

We presented a generalization of the EXP3 algorithm, the EXP3++ algorithm, which augments the EXP3 algorithm with a new control lever in the form of exploration parameters $\varepsilon_t(a)$ that are tuned individually for each arm. We have shown that the new control lever is extremely useful in detecting and exploiting the gap in a wide range of regimes, while the old control lever always keeps the worst-case performance of the algorithm under control.
Due to the central role of the EXP3 algorithm in the adversarial analysis, which stretches far beyond the adversarial bandits, and due to the simplicity of our generalization we believe that our result will lead to a multitude of new algorithms for other problems that exploit the gaps without compromising on the worst-case performance guarantees. There is also room for further improvement of the presented technique that we plan to pursue in future work.

Acknowledgments

The authors would like to thank Sébastien Bubeck and Wouter Koolen for useful discussions and Csaba Szepesvári for bringing up the reference to Cesa-Bianchi & Fischer (1998). This research was supported by an Australian Research Council Australian Laureate Fellowship FL8.

References

Agrawal, Shipra and Goyal, Navin. Further optimal regret bounds for Thompson sampling. In AISTATS, 2013.

Audibert, Jean-Yves and Bubeck, Sébastien. Minimax policies for adversarial and stochastic bandits. In Proceedings of the International Conference on Computational Learning Theory (COLT), 2009.

Auer, Peter, Cesa-Bianchi, Nicolò, Freund, Yoav, and Schapire, Robert E. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the Annual Symposium on Foundations of Computer Science, 1995.

Auer, Peter, Cesa-Bianchi, Nicolò, and Fischer, Paul. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 2002a.

Auer, Peter, Cesa-Bianchi, Nicolò, Freund, Yoav, and Schapire, Robert E. The nonstochastic multiarmed bandit problem. SIAM Journal of Computing, 32(1), 2002b.

Boucheron, Stéphane, Lugosi, Gábor, and Massart, Pascal. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

Bubeck, Sébastien. Bandits Games and Clustering Foundations. PhD thesis, Université Lille 1, 2010.

Bubeck, Sébastien and Cesa-Bianchi, Nicolò. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5, 2012.

Bubeck, Sébastien and Slivkins, Aleksandrs. The best of both worlds: stochastic and adversarial bandits. In Proceedings of the International Conference on Computational Learning Theory (COLT), 2012.

Cappé, Olivier, Garivier, Aurélien, Maillard, Odalric-Ambrym, Munos, Rémi, and Stoltz, Gilles. Kullback-Leibler upper confidence bounds for optimal sequential allocation. Annals of Statistics, 41(3), 2013.

Cesa-Bianchi, Nicolò and Fischer, Paul. Finite-time regret bounds for the multiarmed bandit problem. In Proceedings of the International Conference on Machine Learning (ICML), 1998.

Cesa-Bianchi, Nicolò and Lugosi, Gábor. Prediction, Learning, and Games. Cambridge University Press, 2006.

de Rooij, Steven, van Erven, Tim, Grünwald, Peter D., and Koolen, Wouter M. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 2014.

Kaufmann, Emilie, Korda, Nathaniel, and Munos, Rémi. Thompson sampling: An optimal finite time analysis. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT), 2012.

Lai, Tze Leung and Robbins, Herbert. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 1985.

Maillard, Odalric-Ambrym. Apprentissage Séquentiel: Bandits, Statistique et Renforcement. PhD thesis, INRIA Lille, 2011.

Robbins, Herbert. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 1952.

Seldin, Yevgeny, Szepesvári, Csaba, Auer, Peter, and Abbasi-Yadkori, Yasin. Evaluation and analysis of the performance of the EXP3 algorithm in stochastic environments. In JMLR Workshop and Conference Proceedings, volume 24 (EWRL), 2012.

Stoltz, Gilles. Incomplete Information and Internal Regret in Prediction of Individual Sequences. PhD thesis, Université Paris-Sud, 2005.

Thompson, William R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 1933.
More informationFIRST PASSAGE TIMES OF A JUMP DIFFUSION PROCESS
Adv. Appl. Prob. 35, 54 531 23 Prined in Norhern Ireland Applied Probabiliy Trus 23 FIRST PASSAGE TIMES OF A JUMP DIFFUSION PROCESS S. G. KOU, Columbia Universiy HUI WANG, Brown Universiy Absrac This paper
More informationBoard of Governors of the Federal Reserve System. International Finance Discussion Papers. Number 1003. July 2010
Board of Governors of he Federal Reserve Sysem Inernaional Finance Discussion Papers Number 3 July 2 Is There a Fiscal Free Lunch in a Liquidiy Trap? Chrisopher J. Erceg and Jesper Lindé NOTE: Inernaional
More informationWhen Should Public Debt Be Reduced?
I M F S T A F F D I S C U S S I ON N O T E When Should Public Deb Be Reduced? Jonahan D. Osry, Aish R. Ghosh, and Raphael Espinoza June 2015 SDN/15/10 When Should Public Deb Be Reduced? Prepared by Jonahan
More informationExchange Rate PassThrough into Import Prices: A Macro or Micro Phenomenon? Abstract
Exchange Rae PassThrough ino Impor Prices: A Macro or Micro Phenomenon? Absrac Exchange rae regime opimaliy, as well as moneary policy effeciveness, depends on he ighness of he link beween exchange rae
More informationThe Simple Analytics of Helicopter Money: Why It Works Always
Vol. 8, 201428 Augus 21, 2014 hp://dx.doi.org/10.5018/economicsejournal.ja.201428 The Simple Analyics of Helicoper Money: Why I Works Always Willem H. Buier Absrac The auhor proides a rigorous analysis
More informationFirst variation. (onevariable problem) January 21, 2015
First vrition (onevrible problem) Jnury 21, 2015 Contents 1 Sttionrity of n integrl functionl 2 1.1 Euler eqution (Optimlity conditions)............... 2 1.2 First integrls: Three specil cses.................
More informationOn the Robustness of Most Probable Explanations
On the Robustness of Most Probble Explntions Hei Chn School of Electricl Engineering nd Computer Science Oregon Stte University Corvllis, OR 97330 chnhe@eecs.oregonstte.edu Adnn Drwiche Computer Science
More information