The Sample Complexity of Exploration in the Multi-Armed Bandit Problem


Journal of Machine Learning Research 5 (2004) Submitted 1/04; Published 6/04

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

Shie Mannor, John N. Tsitsiklis
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA 02139, USA

Editors: Kristin Bennett and Nicolò Cesa-Bianchi

Abstract

We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. (2002) that given $n$ arms, a total of $O((n/\varepsilon^2)\log(1/\delta))$ trials suffices in order to find an $\varepsilon$-optimal arm with probability at least $1-\delta$. We establish a matching lower bound on the expected number of trials under any sampling policy. We furthermore generalize the lower bound, and show an explicit dependence on the (unknown) statistics of the arms. We also provide a similar bound within a Bayesian setting. The case where the statistics of the arms are known but the identities of the arms are not, is also discussed. For this case, we provide a lower bound of $\Theta((1/\varepsilon^2)(n+\log(1/\delta)))$ on the expected number of trials, as well as a sampling policy with a matching upper bound. If instead of the expected number of trials, we consider the maximum (over all sample paths) number of trials, we establish a matching upper and lower bound of the form $\Theta((n/\varepsilon^2)\log(1/\delta))$. Finally, we derive lower bounds on the expected regret, in the spirit of Lai and Robbins.

1. Introduction

The multi-armed bandit problem is a classical problem in decision theory. There are a number of alternative arms, each with a stochastic reward whose probability distribution is initially unknown. We try these arms in some order, which may depend on the sequence of rewards that have been observed so far. A common objective in this context is to find a policy for choosing the next arm to be tried, under which the sum of the expected rewards comes as close as possible to the ideal reward, i.e., the expected reward that would be obtained if we were to try the "best" arm at all times.
One of the attractive features of the multi-armed bandit problem is that despite its simplicity, it encompasses many important decision theoretic issues, such as the tradeoff between exploration and exploitation.

The multi-armed bandit problem has been widely studied in a variety of setups. The problem was first considered in the 50's, in the seminal work of Robbins (1952), which derives policies that asymptotically attain an average reward that converges in the limit to the reward of the best arm. The multi-armed bandit problem was later studied in discounted, Bayesian, Markovian, expected reward, and adversarial setups. See Berry and Fristedt (1985) for a review of the classical results on the multi-armed bandit problem.

© 2004 Shie Mannor and John Tsitsiklis.
Lower bounds for different variants of the multi-armed bandit have been studied by several authors. For the expected regret model, where the regret is defined as the difference between the ideal reward (if the best arm were known) and the reward under an online policy, the seminal work of Lai and Robbins (1985) provides asymptotically tight bounds in terms of the Kullback-Leibler divergence between the distributions of the rewards of the different arms. These bounds grow logarithmically with the number of steps. The adversarial multi-armed bandit problem (i.e., without any probabilistic assumptions) was considered in Auer et al. (1995, 2002b), where it was shown that the expected regret grows proportionally to the square root of the number of steps. Of related interest is the work of Kulkarni and Lugosi (2000), which shows that for any specific time $t$, one can choose the reward distributions so that the expected regret is linear in $t$.

The focus of this paper is the classical multi-armed bandit problem, but rather than looking at the expected regret, we are concerned with PAC-type bounds on the number of steps needed to identify a near-optimal arm. In particular, we are interested in the expected number of steps that are required in order to identify with high probability (at least $1-\delta$) an arm whose expected reward is within $\varepsilon$ from the expected reward of the best arm. This naturally abstracts the case where one must eventually commit to one specific arm, and quantifies the amount of exploration necessary. This is in contrast to most of the results for the multi-armed bandit problem, where the main aim is to maximize the expected cumulative reward while both exploring and exploiting. In Even-Dar et al. (2002), a policy, called the median elimination algorithm, was provided which requires $O((n/\varepsilon^2)\log(1/\delta))$ trials, and which finds an $\varepsilon$-optimal arm with probability at least $1-\delta$. A matching lower bound was also derived in Even-Dar et al. (2002), but it only applied to the case where $\delta > 1/2$, and therefore did not capture the case where high confidence (small $\delta$) is desired.
In this paper, we derive a matching lower bound which also applies when $\delta > 0$ is arbitrarily small.

Our main result can be viewed as a generalization of a $O((1/\varepsilon^2)\log(1/\delta))$ lower bound provided in Anthony and Bartlett (1999), and Chernoff (1972), for the case of two bandits. The proof in Anthony and Bartlett (1999) is based on a hypothesis interchange argument, and relies critically on the fact that there are only two underlying hypotheses. Furthermore, it is limited to nonadaptive policies, for which the number of trials is fixed a priori. The technique we use is based on a likelihood ratio argument and a tight martingale bound, and applies to general policies.

A different type of lower bound was derived in Auer et al. (2002b) for the expected regret in an adversarial setup. The bounds derived there can also be used to derive a lower bound for our problem, but do not appear to be tight enough to capture the $\log(1/\delta)$ dependence on $\delta$.

Our work also provides fundamental lower bounds in the context of sequential analysis (see, e.g., Chernoff, 1972; Jennison et al., 1982; Siegmund, 1985). In the language of Siegmund (1985), we provide a lower bound on the expected length of a sequential sampling policy under any adaptive allocation scheme. For the case of two arms, it was shown in Siegmund (1985) (p. 148) that if one restricts to sampling policies that only take into account the empirical average rewards from the different arms, then the problems of inference and arm selection can be treated separately. As a consequence, and under this restriction, Siegmund (1985) shows that an optimal allocation cannot be much better than a uniform one. Our results are different in a number of ways. First, we consider multiple hypotheses (multiple arms). Second, we allow the allocation rule to be completely general and to depend on the whole history. Third, unlike most of the sequential analysis literature (see, e.g., Jennison et al., 1982), we do not restrict ourselves to the limiting case where the probability of error converges to zero. Finally, we consider finite time bounds, rather than asymptotic ones. We further comment that
our results extend those of Jennison et al. (1982), in that we consider the case where the reward is not Gaussian.

Paper Outline

The paper is organized as follows. In Section 2, we set up our framework, and since we are mainly interested in lower bounds, we restrict to the special case where each arm is a "coin," i.e., the rewards are Bernoulli random variables, but with unknown parameters ("biases"). In Section 3, we provide a $O((n/\varepsilon^2)\log(1/\delta))$ lower bound on the expected number of trials under any policy that finds an $\varepsilon$-optimal coin with probability at least $1-\delta$. In Section 4, we provide a refined lower bound that depends explicitly on the specific (though unknown) biases of the coins. This lower bound has the same $\log(1/\delta)$ dependence on $\delta$; furthermore, every coin roughly contributes a factor inversely proportional to the square difference between its bias and the bias of a best coin, but no more than $1/\varepsilon^2$. In Section 5, we derive a lower bound similar to the one in Section 3, but within a Bayesian setting, under a prior distribution on the set of biases of the different coins.

In Section 6 we provide a bound on the expected regret which is similar in spirit to the bound in Lai and Robbins (1985). The constants in our bounds are slightly worse than the ones in Lai and Robbins (1985), but the different derivation, which links the PAC model to regret bounds, may be of independent interest. Our bound holds for any finite time, as opposed to the asymptotic result provided in Lai and Robbins (1985).

The case where the coin biases are known in advance, but the identities of the coins are not, is discussed in Section 7. We provide a policy that finds an $\varepsilon$-optimal coin with probability at least $1-\delta$, under which the expected number of trials is $O((1/\varepsilon^2)(n+\log(1/\delta)))$. We show that this bound is tight up to a multiplicative constant. If instead of the expected number of trials, we consider the maximum (over all sample paths) number of trials, we establish matching upper and lower bounds of the form $\Theta((n/\varepsilon^2)\log(1/\delta))$. Finally, Section 8 contains some brief concluding remarks.
2. Problem Definition

The exploration problem for multi-armed bandits is defined as follows. We are given $n$ arms. Each arm $\ell$ is associated with a sequence of identically distributed Bernoulli (i.e., taking values in $\{0,1\}$) random variables $X_k^\ell$, $k = 1,2,\ldots$, with unknown mean $p_\ell$. Here, $X_k^\ell$ corresponds to the reward obtained the $k$th time that arm $\ell$ is tried. We assume that the random variables $X_k^\ell$, for $\ell = 1,\ldots,n$, $k = 1,2,\ldots$, are independent, and we define $p = (p_1,\ldots,p_n)$. Given that we restrict to the Bernoulli case, we will use in the sequel the term "coin" instead of "arm."

A policy is a mapping that, given a history, chooses a particular coin to be tried next, or selects a particular coin and stops. We allow a policy to use randomization when choosing the next coin to be tried or when making a final selection. However, we only consider policies that are guaranteed to stop with probability 1, for every possible vector $p$. (Otherwise, the expected number of steps would be infinite.) Given a particular policy, we let $P_p$ be the corresponding probability measure (on the natural probability space for this model). This probability space captures both the randomness in the coins (according to the vector $p$), as well as any additional randomization carried out by the policy. We introduce the following random variables, which are well defined, except possibly on the set of measure zero where the policy does not stop. We let $T_\ell$ be the total number of times that
coin $\ell$ is tried, and let $T = T_1 + \cdots + T_n$ be the total number of trials. We also let $I$ be the coin which is selected when the policy decides to stop.

We say that a policy is $(\varepsilon,\delta)$-correct if

$$P_p\Big(p_I > \max_\ell p_\ell - \varepsilon\Big) \ge 1-\delta,$$

for every $p \in [0,1]^n$. It was shown in Even-Dar et al. (2002) that there exist constants $c_1$ and $c_2$ such that for every $n$, $\varepsilon > 0$, and $\delta > 0$, there exists an $(\varepsilon,\delta)$-correct policy under which

$$E_p[T] \le c_1 \frac{n}{\varepsilon^2}\log\frac{c_2}{\delta}, \qquad \forall\, p \in [0,1]^n.$$

A matching lower bound was also established in Even-Dar et al. (2002), but only for "large" values of $\delta$, namely, for $\delta > 1/2$. In contrast, we aim at deriving bounds that capture the dependence of the sample complexity on $\delta$, as $\delta$ becomes small.

3. A Lower Bound on the Sample Complexity

We start with our central result, which can be viewed as an extension of Lemma 5.1 from Anthony and Bartlett (1999), as well as a special case of Theorem 5. We present it here because it admits a simpler proof, but also because parts of the proof will be used later. Throughout the rest of the paper, $\log$ will stand for the natural logarithm.

Theorem 1 There exist positive constants $c_1$, $c_2$, $\varepsilon_0$, and $\delta_0$, such that for every $n \ge 2$, $\varepsilon \in (0,\varepsilon_0)$, and $\delta \in (0,\delta_0)$, and for every $(\varepsilon,\delta)$-correct policy, there exists some $p \in [0,1]^n$ such that

$$E_p[T] \ge c_1 \frac{n}{\varepsilon^2}\log\frac{c_2}{\delta}.$$

In particular, $\varepsilon_0$ and $\delta_0$ can be taken equal to $1/8$ and $e^{-4}/4$, respectively.

Proof Let us consider a multi-armed bandit problem with $n+1$ coins, which we number from 0 to $n$. We consider a finite set of $n+1$ possible parameter vectors $p$, which we will refer to as "hypotheses." Under any one of the hypotheses, coin 0 has a known bias $p_0 = (1+\varepsilon)/2$. Under one hypothesis, denoted by $H_0$, all the coins other than zero have a bias of $1/2$,

$$H_0:\quad p_0 = \frac{1}{2} + \frac{\varepsilon}{2}, \qquad p_i = \frac{1}{2}, \ \text{for } i \ne 0,$$

which makes coin 0 the best coin. Furthermore, for $\ell = 1,\ldots,n$, there is a hypothesis

$$H_\ell:\quad p_0 = \frac{1}{2} + \frac{\varepsilon}{2}, \qquad p_\ell = \frac{1}{2} + \varepsilon, \qquad p_i = \frac{1}{2}, \ \text{for } i \ne 0, \ell,$$

which makes coin $\ell$ the best coin.

We define $\varepsilon_0 = 1/8$ and $\delta_0 = e^{-4}/4$. From now on, we fix some $\varepsilon \in (0,\varepsilon_0)$ and $\delta \in (0,\delta_0)$, and a policy, which we assume to be $(\varepsilon/2,\delta)$-correct. If $H_0$ is true, the policy must have probability at least $1-\delta$ of eventually stopping and selecting coin 0.
If $H_\ell$ is true, for some $\ell \ne 0$, the policy must have probability at least $1-\delta$ of eventually stopping and selecting coin $\ell$. We denote by $E_\ell$ and $P_\ell$ the expectation and probability, respectively, under hypothesis $H_\ell$.
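The hypothesis family used in this proof can be written out concretely. The following is a minimal sketch, not part of the paper: the function names `hypothesis_biases` and `valid_selections` are hypothetical, and the check uses a value of ε that is exactly representable in binary so that the boundary comparison is exact.

```python
def hypothesis_biases(n, eps, ell):
    """Bias vector for coins 0..n under hypothesis H_ell (ell in 0..n).

    Illustrative helper, not from the paper.
    """
    if not 0 <= ell <= n:
        raise ValueError("ell must be between 0 and n")
    p = [0.5] * (n + 1)
    p[0] = 0.5 + eps / 2       # coin 0 has the same bias under every hypothesis
    if ell != 0:
        p[ell] = 0.5 + eps     # under H_ell, coin ell is the unique best coin
    return p


def valid_selections(p, eps):
    """Coins a policy may select to count as (eps/2)-optimal: p_i > max_j p_j - eps/2."""
    top = max(p)
    return [i for i, bias in enumerate(p) if bias > top - eps / 2]
```

Under `H_0` only coin 0 is an acceptable answer, and under `H_ell` only coin `ell` is, because coin 0 sits exactly ε/2 below the best coin and the correctness condition is a strict inequality. This is why an (ε/2, δ)-correct policy must distinguish `H_0` from every `H_ell`.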
We define $t^*$ by

$$t^* = \frac{1}{c\varepsilon^2}\log\frac{1}{4\delta} = \frac{1}{c\varepsilon^2}\log\frac{1}{\theta}, \tag{1}$$

where $\theta = 4\delta$, and where $c$ is an absolute constant whose value will be specified later (see footnote 1). Note that $\theta < e^{-4}$ and $\varepsilon < 1/4$.

Recall that $T_\ell$ stands for the number of times that coin $\ell$ is tried. We assume that for some coin $\ell \ne 0$, we have $E_0[T_\ell] \le t^*$. We will eventually show that under this assumption, the probability of selecting coin 0 under $H_\ell$ exceeds $\delta$, and violates $(\varepsilon/2,\delta)$-correctness. It will then follow that we must have $E_0[T_\ell] > t^*$ for all $\ell \ne 0$. Without loss of generality, we can and will assume that the above condition holds for $\ell = 1$, so that $E_0[T_1] \le t^*$.

We will now introduce some special events $A$ and $C$ under which various random variables of interest do not deviate significantly from their expected values. We define

$$A = \{T_1 \le 4t^*\},$$

and obtain

$$t^* \ge E_0[T_1] \ge 4t^* P_0(T_1 > 4t^*) = 4t^*\big(1 - P_0(T_1 \le 4t^*)\big),$$

from which it follows that $P_0(A) \ge 3/4$.

We define $K_t = X_1^1 + \cdots + X_t^1$, which is the number of unit rewards ("heads") if the first coin is tried a total of $t$ (not necessarily consecutive) times. We let $C$ be the event defined by

$$C = \Big\{ \max_{1 \le t \le 4t^*} \Big| K_t - \frac{1}{2}t \Big| < \sqrt{t^*\log(1/\theta)} \Big\}.$$

We now establish two lemmas that will be used in the sequel.

Lemma 2 We have $P_0(C) > 3/4$.

Proof We will prove a more general result (see footnote 2): we assume that coin $i$ has bias $p_i$ under hypothesis $H_\ell$, define $K_t^i$ as the number of unit rewards ("heads") if coin $i$ is tested for $t$ (not necessarily consecutive) times, and let

$$C_i = \Big\{ \max_{1 \le t \le 4t^*} \big| K_t^i - p_i t \big| < \sqrt{t^*\log(1/\theta)} \Big\}.$$

First, note that $K_t^i - p_i t$ is a $P_\ell$-martingale (in the context of Theorem 1, $p_i = 1/2$ is the bias of coin $i = 1$ under hypothesis $H_0$). Using Kolmogorov's inequality (Corollary 7.66, in p. 244 of Ross, 1983), the probability of the complement of $C_i$ can be bounded as follows:

$$P_\ell\Big( \max_{1 \le t \le 4t^*} \big| K_t^i - p_i t \big| \ge \sqrt{t^*\log(1/\theta)} \Big) \le \frac{E_\ell\big[(K_{4t^*}^i - 4p_i t^*)^2\big]}{t^*\log(1/\theta)}.$$

Since $E_\ell[(K_{4t^*}^i - 4p_i t^*)^2] = 4p_i(1-p_i)t^*$, we obtain

$$P_\ell(C_i) \ge 1 - \frac{4p_i(1-p_i)}{\log(1/\theta)} > \frac{3}{4}, \tag{2}$$

where the last inequality follows because $\theta < e^{-4}$ and $4p_i(1-p_i) \le 1$.

Footnote 1: In this and subsequent proofs, and in order to avoid repeated use of truncation symbols, we treat $t^*$ as if it were integer.
Footnote 2: The proof for a general $p_i$ will be useful later.
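The bound in Lemma 2 can be checked empirically for a fair coin. The following is an illustrative Monte Carlo sketch, not part of the paper: the function names, the number of runs, and the seed are arbitrary assumptions, and $t^*$ is treated as an integer as in footnote 1.

```python
import math
import random


def kolmogorov_lower_bound(p_i, theta):
    """Right-hand side of the bound on P(C_i): 1 - 4 p_i (1 - p_i) / log(1/theta)."""
    return 1.0 - 4.0 * p_i * (1.0 - p_i) / math.log(1.0 / theta)


def event_c_frequency(t_star, theta, runs=2000, seed=0):
    """Estimate P(C) for a fair coin: max over t <= 4 t* of |K_t - t/2| stays below the threshold."""
    rng = random.Random(seed)
    threshold = math.sqrt(t_star * math.log(1.0 / theta))
    horizon = 4 * t_star
    hits = 0
    for _ in range(runs):
        heads = 0
        inside = True
        for t in range(1, horizon + 1):
            heads += rng.random() < 0.5          # one fair coin toss
            if abs(heads - t / 2) >= threshold:  # deviation leaves the band: C fails
                inside = False
                break
        hits += inside
    return hits / runs
```

In simulations of this kind the empirical frequency typically sits well above the Kolmogorov bound, which is expected: the inequality is a worst-case guarantee, and the proof only needs the weaker statement that the probability exceeds 3/4.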
Lemma 3 If $0 \le x \le 1/\sqrt{2}$ and $y \ge 0$, then

$$(1-x)^y \ge e^{-dxy},$$

where $d = 1.78$.

Proof A straightforward calculation shows that $\log(1-x) + dx \ge 0$ for $0 \le x \le 1/\sqrt{2}$. Therefore, $y(\log(1-x) + dx) \ge 0$ for every $y \ge 0$. Rearranging and exponentiating leads to $(1-x)^y \ge e^{-dxy}$.

We now let $B$ be the event that $I = 0$, i.e., that the policy eventually selects coin 0. Since the policy is $(\varepsilon/2,\delta)$-correct for $\delta < e^{-4}/4 < 1/4$, we have $P_0(B) > 3/4$. We have already shown that $P_0(A) \ge 3/4$ and $P_0(C) > 3/4$. Let $S$ be the event that $A$, $B$, and $C$ occur, that is, $S = A \cap B \cap C$. We then have $P_0(S) > 1/4$.

Lemma 4 If $E_0[T_1] \le t^*$ and $c \ge 100$, then $P_1(B) > \delta$.

Proof We let $W$ be the history of the process (the sequence of coins chosen at each time, and the sequence of observed coin rewards) until the policy terminates. We define the likelihood function $L_\ell$ by letting

$$L_\ell(w) = P_\ell(W = w),$$

for every possible history $w$. Note that this function can be used to define a random variable $L_\ell(W)$. We also let $K$ be a shorthand notation for $K_{T_1}$, the total number of unit rewards ("heads") obtained from coin 1. Given the history up to time $t-1$, the coin choice at time $t$ has the same probability distribution under either hypothesis $H_0$ or $H_1$; similarly, the coin reward at time $t$ has the same probability distribution, under either hypothesis, unless the chosen coin was coin 1. For this reason, the likelihood ratio $L_1(W)/L_0(W)$ is given by

$$\frac{L_1(W)}{L_0(W)} = \frac{(\frac{1}{2}+\varepsilon)^K (\frac{1}{2}-\varepsilon)^{T_1-K}}{(\frac{1}{2})^{T_1}} = (1+2\varepsilon)^K (1-2\varepsilon)^K (1-2\varepsilon)^{T_1-2K} = (1-4\varepsilon^2)^K (1-2\varepsilon)^{T_1-2K}. \tag{3}$$

We will now proceed to lower bound the terms in the right-hand side of Eq. (3) when event $S$ occurs. If event $S$ has occurred, then $A$ has occurred, and we have $K \le T_1 \le 4t^*$, so that

$$(1-4\varepsilon^2)^K \ge (1-4\varepsilon^2)^{4t^*} = (1-4\varepsilon^2)^{(4/(c\varepsilon^2))\log(1/\theta)} \ge e^{-(16d/c)\log(1/\theta)} = \theta^{16d/c}.$$

We have used here Lemma 3, which applies because $4\varepsilon^2 < 4/4^2 < 1/\sqrt{2}$. Similarly, if event $S$ has occurred, then $A \cap C$ has occurred, which implies

$$T_1 - 2K \le 2\sqrt{t^*\log(1/\theta)} = \frac{2}{\varepsilon\sqrt{c}}\log(1/\theta),$$
where the equality above made use of the definition of $t^*$. Therefore,

$$(1-2\varepsilon)^{T_1-2K} \ge (1-2\varepsilon)^{(2/(\varepsilon\sqrt{c}))\log(1/\theta)} \ge e^{-(4d/\sqrt{c})\log(1/\theta)} = \theta^{4d/\sqrt{c}}.$$

Substituting the above in Eq. (3), we obtain

$$\frac{L_1(W)}{L_0(W)} \ge \theta^{(16d/c)+(4d/\sqrt{c})}.$$

By picking $c$ large enough ($c = 100$ suffices), we obtain that $L_1(W)/L_0(W)$ is larger than $\theta = 4\delta$ whenever the event $S$ occurs. More precisely, we have

$$\frac{L_1(W)}{L_0(W)} 1_S \ge 4\delta 1_S,$$

where $1_S$ is the indicator function of the event $S$. Then,

$$P_1(B) \ge P_1(S) = E_1[1_S] = E_0\Big[\frac{L_1(W)}{L_0(W)} 1_S\Big] \ge E_0[4\delta 1_S] = 4\delta P_0(S) > \delta,$$

where we used the fact that $P_0(S) > 1/4$.

To summarize, we have shown that when $c \ge 100$, if $E_0[T_1] \le (1/(c\varepsilon^2))\log(1/(4\delta))$, then $P_1(B) > \delta$. Therefore, if we have an $(\varepsilon/2,\delta)$-correct policy, we must have $E_0[T_\ell] > (1/(c\varepsilon^2))\log(1/(4\delta))$, for every $\ell > 0$. Equivalently, if we have an $(\varepsilon,\delta)$-correct policy, we must have $E_0[T] > (n/(4c\varepsilon^2))\log(1/(4\delta))$, which is of the desired form.

4. A Lower Bound on the Sample Complexity - General Probabilities

In Theorem 1, we worked with a particular unfavorable vector $p$ (the one corresponding to hypothesis $H_0$), under which a lot of exploration is necessary. This leaves open the possibility that for other, more favorable choices of $p$, less exploration might suffice. In this section, we refine Theorem 1 by developing a lower bound that explicitly depends on the actual (though unknown) vector $p$. Of course, for any given vector $p$, there is an "optimal" policy, which selects the best coin without any exploration: e.g., if $p_1 \ge p_\ell$ for all $\ell$, the policy that immediately selects coin 1 is "optimal." However, such a policy will not be $(\varepsilon,\delta)$-correct for all possible vectors $p$.

We start with a lower bound that applies when all coin biases $p_i$ lie in the range $[0,1/2]$. We will later use a reduction technique to extend the result to a generic range of biases. In the rest of the paper, we use the notational convention $(x)^+ = \max\{0,x\}$.

Theorem 5 Fix some $\underline{p} \in (0,1/2)$.
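There exists a positive constant $\delta_0$, and a positive constant $c_1$ that depends only on $\underline{p}$, such that for every $\varepsilon \in (0,1/2)$, every $\delta \in (0,\delta_0)$, every $p \in [0,1/2]^n$, and every $(\varepsilon,\delta)$-correct policy, we have

$$E_p[T] \ge c_1 \left\{ \frac{(|M(p,\varepsilon)|-1)^+}{\varepsilon^2} + \sum_{\ell \in N(p,\varepsilon)} \frac{1}{(p^* - p_\ell)^2} \right\} \log\frac{1}{8\delta},$$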
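where $p^* = \max_i p_i$,

$$M(p,\varepsilon) = \Big\{ \ell :\ p_\ell > p^* - \varepsilon,\ \text{and}\ p_\ell > \underline{p},\ \text{and}\ p_\ell \ge \frac{\varepsilon + p^*}{1 + \sqrt{1/2}} \Big\}, \tag{4}$$

and

$$N(p,\varepsilon) = \Big\{ \ell :\ p_\ell \le p^* - \varepsilon,\ \text{and}\ p_\ell > \underline{p},\ \text{and}\ p_\ell \ge \frac{\varepsilon + p^*}{1 + \sqrt{1/2}} \Big\}. \tag{5}$$

In particular, $\delta_0$ can be taken equal to $e^{-8}/8$.

Remarks:

(a) The lower bound involves two sets of coins whose biases are not too far from the best bias $p^*$. The first set $M(p,\varepsilon)$ contains coins that are within $\varepsilon$ from the best and would therefore be legitimate selections. In the presence of multiple such coins, a certain amount of exploration is needed to obtain the required confidence that none of these coins is significantly better than the others. The second set $N(p,\varepsilon)$ contains coins whose bias is more than $\varepsilon$ away from $p^*$; they come into the lower bound because again some exploration is needed in order to obtain the required confidence that none of these coins is significantly better than the best coin in $M(p,\varepsilon)$.

(b) The expression $(\varepsilon + p^*)/(1 + \sqrt{1/2})$ in Eqs. (4) and (5) can be replaced by $(\varepsilon + p^*)/(2 - \alpha)$ for any positive constant $\alpha$, by changing some of the constants in the proof.

(c) This result actually provides a family of lower bounds, one for every possible choice of $\underline{p}$. A tighter bound can be obtained by optimizing the choice of $\underline{p}$, while also taking into account the dependence of the constant $c_1$ on $\underline{p}$. This is not hard (the dependence of $c_1$ on $\underline{p}$ is described in Remark 7), but does not provide any new insights.

Proof Let us fix $\delta_0 = e^{-8}/8$, some $\underline{p} \in (0,1/2)$, $\varepsilon \in (0,1/2)$, $\delta \in (0,\delta_0)$, an $(\varepsilon,\delta)$-correct policy, and some $p \in [0,1/2]^n$. Without loss of generality, we assume that $p^* = p_1$. Let us denote the true (unknown) bias of each coin by $q_i$. We consider the following hypotheses:

$$H_0:\quad q_i = p_i, \ \text{for } i = 1,\ldots,n,$$

and for $\ell = 1,\ldots,n$,

$$H_\ell:\quad q_\ell = p_1 + \varepsilon, \qquad q_i = p_i, \ \text{for } i \ne \ell.$$

If hypothesis $H_\ell$ is true, the policy must select coin $\ell$. We will bound from below the expected number of times the coins in the sets $N(p,\varepsilon)$ and $M(p,\varepsilon)$ must be tried, when hypothesis $H_0$ is true. As in Section 3, we use $E_\ell$ and $P_\ell$ to denote the expectation and probability, respectively, under the policy being considered and under hypothesis $H_\ell$.

We define $\theta = 8\delta$, and note that $\theta < e^{-8}$. Let

$$t_\ell^* = \begin{cases} \dfrac{1}{c\varepsilon^2}\log\dfrac{1}{\theta}, & \text{if } \ell \in M(p,\varepsilon), \\[2ex] \dfrac{1}{c(p_1 - p_\ell)^2}\log\dfrac{1}{\theta}, & \text{if } \ell \in N(p,\varepsilon), \end{cases}$$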
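where $c$ is a constant that only depends on $\underline{p}$, and whose value will be chosen later. Recall that $T_\ell$ stands for the total number of times that coin $\ell$ is tried. We define the event

$$A_\ell = \{T_\ell \le 4t_\ell^*\}.$$

As in the proof of Theorem 1, if $E_0[T_\ell] \le t_\ell^*$, then $P_0(A_\ell) \ge 3/4$.

We define $K_t^\ell = X_1^\ell + \cdots + X_t^\ell$, which is the number of unit rewards ("heads") if the $\ell$th coin is tried a total of $t$ (not necessarily consecutive) times. We let $C_\ell$ be the event defined by

$$C_\ell = \Big\{ \max_{1 \le t \le 4t_\ell^*} \big| K_t^\ell - p_\ell t \big| < \sqrt{t_\ell^*\log(1/\theta)} \Big\}.$$

Similar to Lemma 2, and since $\theta = 8\delta < e^{-8}$, we have (see footnote 3)

$$P_0(C_\ell) > 7/8.$$

Let $B_\ell$ be the event $\{I = \ell\}$, i.e., that the policy eventually selects coin $\ell$, and let $B_\ell^c$ be its complement. Since the policy is $(\varepsilon,\delta)$-correct with $\delta < \delta_0 < 1/2$, we must have

$$P_0(B_\ell^c) > 1/2, \qquad \forall\, \ell \in N(p,\varepsilon).$$

We also have $\sum_{\ell \in M(p,\varepsilon)} P_0(B_\ell) \le 1$, so that the inequality $P_0(B_\ell) > 1/2$ can hold for at most one element of $M(p,\varepsilon)$. Equivalently, the inequality $P_0(B_\ell^c) \le 1/2$ can hold for at most one element of $M(p,\varepsilon)$. Let

$$M_0(p,\varepsilon) = \Big\{ \ell \in M(p,\varepsilon)\ \text{and}\ P_0(B_\ell^c) > \frac{1}{2} \Big\}.$$

It follows that $|M_0(p,\varepsilon)| \ge (|M(p,\varepsilon)| - 1)^+$.

The following lemma is an analog of Lemma 4.

Lemma 6 Suppose that $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$ and that $E_0[T_\ell] \le t_\ell^*$. If the constant $c$ in the definition of $t_\ell^*$ is chosen large enough (possibly depending on $\underline{p}$), then $P_\ell(B_\ell^c) > \delta$.

Proof Fix some $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$. For future reference, we note that the definitions of $M(p,\varepsilon)$ and $N(p,\varepsilon)$ include the condition $p_\ell \ge (\varepsilon + p^*)/(1 + \sqrt{1/2})$. Recalling that $p^* = p_1$, $p_\ell \le 1/2$, and using the definition $\Delta_\ell = p_1 - p_\ell \ge 0$, some easy algebra leads to the conditions

$$\frac{\varepsilon + \Delta_\ell}{1 - p_\ell} \le \frac{\varepsilon + \Delta_\ell}{p_\ell} \le \frac{1}{\sqrt{2}}. \tag{6}$$

We define the event $S_\ell$ by

$$S_\ell = A_\ell \cap B_\ell^c \cap C_\ell.$$

Since $P_0(A_\ell) \ge 3/4$, $P_0(B_\ell^c) > 1/2$, and $P_0(C_\ell) > 7/8$, we have

$$P_0(S_\ell) > \frac{1}{8}, \qquad \forall\, \ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon).$$

Footnote 3: The derivation is identical to Lemma 2 except for Eq. (2), where one should replace the assumption that $\theta < e^{-4}$ with the stricter assumption that $\theta < e^{-8}$ used here.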
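As in the proof of Lemma 4, we define the likelihood function $L_\ell$ by letting

$$L_\ell(w) = P_\ell(W = w),$$

for every possible history $w$, and use again $L_\ell(W)$ to define the corresponding random variable. Let $K$ be a shorthand notation for $K_{T_\ell}^\ell$, the total number of unit rewards ("heads") obtained from coin $\ell$. We have

$$\frac{L_\ell(W)}{L_0(W)} = \frac{(p_1+\varepsilon)^K (1-p_1-\varepsilon)^{T_\ell-K}}{p_\ell^K (1-p_\ell)^{T_\ell-K}} = \Big(\frac{p_1}{p_\ell} + \frac{\varepsilon}{p_\ell}\Big)^K \Big(\frac{1-p_1}{1-p_\ell} - \frac{\varepsilon}{1-p_\ell}\Big)^{T_\ell-K} = \Big(1 + \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^K \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{T_\ell-K},$$

where we have used the definition $\Delta_\ell = p_1 - p_\ell$. It follows that

$$\begin{aligned}
\frac{L_\ell(W)}{L_0(W)} &= \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^K \Big(1 - \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^{-K} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{T_\ell-K} \\
&= \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^K \Big(1 - \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^{-K} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{K(1-p_\ell)/p_\ell} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{(p_\ell T_\ell - K)/p_\ell}.
\end{aligned} \tag{7}$$

We will now proceed to lower bound the right-hand side of Eq. (7) for histories under which event $S_\ell$ occurs. If event $S_\ell$ has occurred, then $A_\ell$ has occurred, and we have $K \le T_\ell \le 4t_\ell^*$, so that for every $\ell \in N(p,\varepsilon)$, we have

$$\begin{aligned}
\Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^K &\ge \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^{4t_\ell^*} \\
&= \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^{(4/(c\Delta_\ell^2))\log(1/\theta)} \\
&\overset{(a)}{\ge} \exp\Big\{ -d\,\frac{4}{c}\,\frac{((\varepsilon/\Delta_\ell)+1)^2}{p_\ell^2}\log(1/\theta) \Big\} \\
&\overset{(b)}{\ge} \exp\Big\{ -d\,\frac{16}{c p_\ell^2}\log(1/\theta) \Big\} \\
&= \theta^{16d/(p_\ell^2 c)}.
\end{aligned}$$

In step (a), we have used Lemma 3, which applies because of Eq. (6); in step (b), we used the fact $\varepsilon/\Delta_\ell \le 1$, which holds because $\ell \in N(p,\varepsilon)$.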
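Similarly, for $\ell \in M(p,\varepsilon)$, we have

$$\begin{aligned}
\Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^K &\ge \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^{4t_\ell^*} \\
&= \Big(1 - \Big(\frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^2\Big)^{(4/(c\varepsilon^2))\log(1/\theta)} \\
&\overset{(a)}{\ge} \exp\Big\{ -d\,\frac{4}{c}\,\frac{(1+(\Delta_\ell/\varepsilon))^2}{p_\ell^2}\log(1/\theta) \Big\} \\
&\overset{(b)}{\ge} \exp\Big\{ -d\,\frac{16}{c p_\ell^2}\log(1/\theta) \Big\} \\
&= \theta^{16d/(p_\ell^2 c)}.
\end{aligned}$$

In step (a), we have again used Lemma 3; in step (b), we used the fact $\Delta_\ell/\varepsilon \le 1$, which holds because $\ell \in M(p,\varepsilon)$.

We now bound the product of the second and third terms in Eq. (7). If $b \ge 1$, then the mapping $y \mapsto (1-y)^b$ is convex for $y \in [0,1]$. Thus, $(1-y)^b \ge 1 - by$, which implies that

$$\Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{(1-p_\ell)/p_\ell} \ge 1 - \frac{\varepsilon+\Delta_\ell}{p_\ell},$$

so that the product of the second and third terms can be lower bounded by

$$\Big(1 - \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^{-K} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{K(1-p_\ell)/p_\ell} \ge \Big(1 - \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^{-K} \Big(1 - \frac{\varepsilon+\Delta_\ell}{p_\ell}\Big)^{K} = 1.$$

We still need to bound the fourth term of Eq. (7). We start with the case where $\ell \in N(p,\varepsilon)$. We have

$$\Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{(p_\ell T_\ell - K)/p_\ell} \overset{(a)}{\ge} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{(1/p_\ell)\sqrt{t_\ell^*\log(1/\theta)}} \tag{8}$$

$$\overset{(b)}{=} \Big(1 - \frac{\varepsilon+\Delta_\ell}{1-p_\ell}\Big)^{(1/(p_\ell\sqrt{c}\,\Delta_\ell))\log(1/\theta)}$$

$$\overset{(c)}{\ge} \exp\Big\{ -\frac{d}{\sqrt{c}}\cdot\frac{\varepsilon+\Delta_\ell}{p_\ell\,\Delta_\ell\,(1-p_\ell)}\log(1/\theta) \Big\} \tag{9}$$

$$\overset{(d)}{\ge} \exp\Big\{ -\frac{2d}{\sqrt{c}\,p_\ell(1-p_\ell)}\log(1/\theta) \Big\} \tag{10}$$

$$\overset{(e)}{\ge} \exp\Big\{ -\frac{4d}{\sqrt{c}\,p_\ell}\log(1/\theta) \Big\} = \theta^{4d/(p_\ell\sqrt{c})}.$$

Here, (a) holds because we are assuming that the events $A_\ell$ and $C_\ell$ occurred; (b) uses the definition of $t_\ell^*$ for $\ell \in N(p,\varepsilon)$; (c) follows from Eq. (6) and Lemma 3; (d) follows because $\Delta_\ell \ge \varepsilon$; and (e) holds because $0 \le p_\ell \le 1/2$, which implies that $1/(1-p_\ell) \le 2$.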
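Consider now the case where $\ell \in M_0(p,\varepsilon)$. Equation (8) holds for the same reasons as when $\ell \in N(p,\varepsilon)$. The only difference from the above calculation is in step (b), where $t_\ell^*$ should be replaced with $(1/(c\varepsilon^2))\log(1/\theta)$. Then, the right-hand side in Eq. (9) becomes

$$\exp\Big\{ -\frac{d}{\sqrt{c}}\cdot\frac{\varepsilon+\Delta_\ell}{p_\ell\,\varepsilon\,(1-p_\ell)}\log(1/\theta) \Big\}.$$

For $\ell \in M_0(p,\varepsilon)$, we have $\Delta_\ell \le \varepsilon$, which implies that $(\varepsilon+\Delta_\ell)/\varepsilon \le 2$, which then leads to the same expression as in Eq. (10). The rest of the derivation is identical.

Summarizing the above, we have shown that if $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$, and event $S_\ell$ has occurred, then

$$\frac{L_\ell(W)}{L_0(W)} \ge \theta^{(4d/(p_\ell\sqrt{c})) + (16d/(p_\ell^2 c))}.$$

For $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$, we have $\underline{p} < p_\ell$. We can choose $c$ large enough so that $L_\ell(W)/L_0(W) \ge \theta = 8\delta$; the value of $c$ depends only on the constant $\underline{p}$. Similar to the proof of Theorem 1, we have

$$\frac{L_\ell(W)}{L_0(W)} 1_{S_\ell} \ge 8\delta 1_{S_\ell},$$

where $1_{S_\ell}$ is the indicator function of the event $S_\ell$. It follows that

$$P_\ell(B_\ell^c) \ge P_\ell(S_\ell) = E_\ell[1_{S_\ell}] = E_0\Big[\frac{L_\ell(W)}{L_0(W)} 1_{S_\ell}\Big] \ge E_0[8\delta 1_{S_\ell}] = 8\delta P_0(S_\ell) > \delta,$$

where the last inequality relies on the already established fact $P_0(S_\ell) > 1/8$.

Since the policy is $(\varepsilon,\delta)$-correct, we must have $P_\ell(B_\ell^c) \le \delta$, for every $\ell$. Lemma 6 then implies that $E_0[T_\ell] > t_\ell^*$ for every $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$. We sum over all $\ell \in M_0(p,\varepsilon) \cup N(p,\varepsilon)$, use the definition of $t_\ell^*$, together with the fact $|M_0(p,\varepsilon)| \ge (|M(p,\varepsilon)| - 1)^+$, to conclude the proof of the theorem.

Remark 7 A close examination of the proof reveals that the dependence of $c_1$ on $\underline{p}$ is captured by a requirement of the form $c_1 \le c_2\,\underline{p}^2$, for some absolute constant $c_2$. This suggests that there is a tradeoff in the choice of $\underline{p}$. By choosing a large $\underline{p}$, the constant $c_1$ is made larger, but the sets $M$ and $N$ become smaller, and vice versa.

The preceding result may give the impression that the sample complexity is high only when the $p_i$ are bounded by $1/2$. The next result shows that similar lower bounds hold (with a different constant) whenever the $p_i$ can be assumed to be bounded away from 1. However, the lower bound becomes weaker (i.e., the constant $c_1$ is smaller) when the upper bound on the $p_i$ approaches 1. In fact, the dependence of a lower bound on $\varepsilon$ cannot be $\Theta(1/\varepsilon^2)$ when $\max_i p_i = 1$. To see this, consider the following policy $\pi$.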
Try each coin $O((1/\varepsilon)\log(n/\delta))$ times. If one of the coins always resulted in heads, select it. Otherwise, use some $(\varepsilon,\delta)$-correct policy $\tilde{\pi}$. It can be shown that the policy $\pi$ is $(\varepsilon,2\delta)$-correct (for every $p \in [0,1]^n$), and that if $\max_i p_i = 1$, then $E_p[T] = O((n/\varepsilon)\log(n/\delta))$.
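The two-phase policy just described can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction in full: the constant hidden in the $O(\cdot)$ is taken to be 1, the helper names are hypothetical, and the fallback `naive_uniform_policy` is a simple stand-in (uniform sampling with a Hoeffding-based sample size) for "some correct policy", not the median elimination algorithm.

```python
import math
import random


def bernoulli_pull(biases, rng):
    """Return pull(i), which flips coin i once and returns 0 or 1 (illustrative helper)."""
    return lambda i: 1 if rng.random() < biases[i] else 0


def naive_uniform_policy(pull, n, eps, delta):
    """Stand-in fallback: sample every coin equally often and pick the best empirical mean."""
    k = math.ceil((2.0 / eps ** 2) * math.log(2.0 * n / delta))  # Hoeffding-based count
    means = [sum(pull(i) for _ in range(k)) / k for i in range(n)]
    return max(range(n), key=lambda i: means[i])


def two_phase_policy(pull, n, eps, delta, fallback):
    """Phase 1: try each coin m times; if some coin never shows tails, select it.
    Phase 2: otherwise defer to the fallback policy. Returns (selected coin, total trials)."""
    trials = 0

    def counted_pull(i):
        nonlocal trials
        trials += 1
        return pull(i)

    m = math.ceil(math.log(n / delta) / eps)  # assumed constant 1 in the O(.) term
    always_heads = [i for i in range(n)
                    if all([counted_pull(i) for _ in range(m)])]
    if always_heads:
        return always_heads[0], trials
    choice = fallback(counted_pull, n, eps, delta)
    return choice, trials
```

When some coin has bias exactly 1, phase 1 ends after `n * m` trials, which scales with 1/ε rather than 1/ε². This is the reason the text gives for why a $\Theta(1/\varepsilon^2)$ lower bound cannot hold in that regime.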
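Theorem 8 Fix an integer $s \ge 2$, and some $\underline{p} \in (0,1/2)$. There exists a positive constant $c_1$ that depends only on $\underline{p}$ such that for every $\varepsilon \in (0, 2^{-(s+2)})$, every $\delta \in (0, e^{-8}/8)$, every $p \in [0, 1-2^{-s}]^n$, and every $(\varepsilon,\delta)$-correct policy, we have

$$E_p[T] \ge \frac{c_1}{s\eta^2} \left\{ \frac{(|M(\tilde{p},\varepsilon\eta)|-1)^+}{\varepsilon^2} + \sum_{\ell \in N(\tilde{p},\eta\varepsilon)} \frac{1}{(\tilde{p}^* - \tilde{p}_\ell)^2} \right\} \log\frac{1}{8\delta},$$

where $\tilde{p}^* = \max_i \tilde{p}_i$, $\eta = 2^{s+1}/s$, $\tilde{p}$ is the vector with components $\tilde{p}_i = 1 - (1-p_i)^{1/s}$ (for $i = 1,2,\ldots,n$), and $M$ and $N$ are as defined in Theorem 5.

Proof Let us fix $s \ge 2$, $\underline{p} \in (0,1/2)$, $\varepsilon \in (0, 2^{-(s+2)})$, and $\delta \in (0, e^{-8}/8)$. Suppose that we have an $(\varepsilon,\delta)$-correct policy $\pi$ whose expected time to termination is $E_p[T]$, whenever the vector of coin biases happens to be $p$. We will use the policy $\pi$ to construct a new policy $\tilde{\pi}$ such that

$$P_{\tilde{p}}\Big(\tilde{p}_I > \max_i \tilde{p}_i - \eta\varepsilon\Big) \ge 1-\delta, \qquad \forall\, \tilde{p} \in [0,(1/2)+\eta\varepsilon]^n;$$

(we will then say that $\tilde{\pi}$ is $(\eta\varepsilon,\delta)$-correct on $[0,(1/2)+\eta\varepsilon]^n$). Finally, we will use the lower bounds from Theorem 5, applied to $\tilde{\pi}$, to obtain a lower bound on the sample complexity of $\pi$.

The new policy $\tilde{\pi}$ is specified as follows. Run the original policy $\pi$. Whenever $\pi$ chooses to try a certain coin $i$ once, policy $\tilde{\pi}$ tries coin $i$ for $s$ consecutive times. Policy $\tilde{\pi}$ then "feeds" $\pi$ with 0 if all $s$ trials resulted in 0, and "feeds" $\pi$ with 1 otherwise. If $\tilde{p}$ is the true vector of coin biases faced by policy $\tilde{\pi}$, and if policy $\pi$ chooses to sample coin $i$, then policy $\pi$ "sees" an outcome which equals 1 with probability $p_i = 1 - (1-\tilde{p}_i)^s$. Let us define two mappings $f, g : [0,1] \to [0,1]$, which are inverses of each other, by

$$f(p_i) = 1 - (1-p_i)^{1/s}, \qquad g(\tilde{p}_i) = 1 - (1-\tilde{p}_i)^s,$$

and with a slight abuse of notation, let $f(p) = (f(p_1),\ldots,f(p_n))$, and similarly for $g(\tilde{p})$. With our construction, when policy $\tilde{\pi}$ is faced with a bias vector $\tilde{p}$, it evolves in an identical manner as the policy $\pi$ faced with a bias vector $p = g(\tilde{p})$. But under policy $\tilde{\pi}$, there are $s$ trials associated with every trial under policy $\pi$, which implies that $\tilde{T} = sT$ ($\tilde{T}$ is the number of trials under policy $\tilde{\pi}$) and therefore

$$E^{\tilde{\pi}}_{\tilde{p}}[\tilde{T}] = s E^{\pi}_{g(\tilde{p})}[T], \qquad E^{\tilde{\pi}}_{f(p)}[\tilde{T}] = s E^{\pi}_{p}[T], \tag{11}$$

where the superscript in the expectation operator indicates the policy being used.

We will now determine the correctness guarantees of policy $\tilde{\pi}$.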
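The construction of the new policy can be sketched in a few lines. This is a hedged illustration: the names `make_reduced_pull` and `bernoulli_pull` are hypothetical, and the wrapper only models the sampling interface between the two policies, not a full policy.

```python
import random


def f(p, s):
    """Map a bias seen by the original policy to the underlying coin bias."""
    return 1.0 - (1.0 - p) ** (1.0 / s)


def g(p_tilde, s):
    """Map an underlying coin bias to the bias seen by the original policy."""
    return 1.0 - (1.0 - p_tilde) ** s


def bernoulli_pull(biases, rng):
    """Return pull(i), which flips coin i once and returns 0 or 1 (illustrative helper)."""
    return lambda i: 1 if rng.random() < biases[i] else 0


def make_reduced_pull(raw_pull, s):
    """Wrap raw_pull so that one call makes s raw trials of coin i and returns their OR.

    The returned counter records the raw trials, i.e. the s-to-1 trial ratio in Eq. (11).
    """
    counter = {"raw_trials": 0}

    def reduced_pull(i):
        outcomes = [raw_pull(i) for _ in range(s)]  # s consecutive trials of coin i
        counter["raw_trials"] += s
        return 1 if any(outcomes) else 0            # 0 only if all s trials were 0

    return reduced_pull, counter
```

Feeding `reduced_pull` to any policy written against a `pull(i)` interface makes that policy behave as if the coin biases were `g(p_tilde, s)`, while the raw trial count grows `s` times faster than the count the policy itself observes.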
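We first need some algebraic preliminaries. Let us fix some $\tilde{p} \in [0,(1/2)+\eta\varepsilon]^n$ and a corresponding vector $p$, related by $\tilde{p} = f(p)$ and $p = g(\tilde{p})$. Let also $p^* = \max_i p_i$ and $\tilde{p}^* = \max_i \tilde{p}_i$. Using the definition $\eta = 2^{s+1}/s$ and the assumption $\varepsilon < 2^{-(s+2)}$, we have $\tilde{p}^* \le (1/2) + 1/(2s)$, from which it follows that

$$p^* \le 1 - \Big(\frac{1}{2} - \frac{1}{2s}\Big)^s = 1 - \frac{1}{2^s}\Big(1 - \frac{1}{s}\Big)^s \le 1 - \frac{1}{2^s}\cdot\frac{1}{4} = 1 - 2^{-(s+2)}.$$

The derivative $f'$ of $f$ is monotonically increasing on $[0,1)$. Therefore,

$$f'(p^*) \le f'\big(1 - 2^{-(s+2)}\big) = \frac{1}{s}\big(2^{-(s+2)}\big)^{(1/s)-1} = \frac{1}{s}\,2^{-(s+2)(1-s)/s} = \frac{1}{s}\,2^{s+1-(2/s)} \le \frac{1}{s}\,2^{s+1} = \eta.$$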
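Thus, the derivative of the inverse mapping $g$ satisfies

$$g'(\tilde{p}^*) \ge \frac{1}{\eta},$$

which implies, using the concavity of $g$, that

$$g(\tilde{p}^* - \eta\varepsilon) \le g(\tilde{p}^*) - g'(\tilde{p}^*)\,\varepsilon\eta \le g(\tilde{p}^*) - \varepsilon.$$

Let $I$ be the coin index finally selected by policy $\tilde{\pi}$ when faced with $\tilde{p}$, which is the same as the index chosen by $\pi$ when faced with $p$. We have (the superscript in the probability indicates the policy being used)

$$\begin{aligned}
P^{\tilde{\pi}}_{\tilde{p}}\big(\tilde{p}_I > \tilde{p}^* - \eta\varepsilon\big) &= P^{\tilde{\pi}}_{\tilde{p}}\big(g(\tilde{p}_I) > g(\tilde{p}^* - \eta\varepsilon)\big) \\
&\ge P^{\tilde{\pi}}_{\tilde{p}}\big(g(\tilde{p}_I) > g(\tilde{p}^*) - \varepsilon\big) \\
&= P^{\pi}_{p}\big(p_I > p^* - \varepsilon\big) \\
&\ge 1 - \delta,
\end{aligned}$$

where the last inequality follows because policy $\pi$ was assumed to be $(\varepsilon,\delta)$-correct. We have therefore established that $\tilde{\pi}$ is $(\eta\varepsilon,\delta)$-correct on $[0,(1/2)+\eta\varepsilon]^n$. We now apply Theorem 5, with $\eta\varepsilon$ instead of $\varepsilon$. Even though that theorem is stated for a policy which is $(\varepsilon,\delta)$-correct for all possible $p$, the proof only requires the policy to be $(\varepsilon,\delta)$-correct for $p \in [0,(1/2)+\varepsilon]^n$. This gives a lower bound on $E^{\tilde{\pi}}_{\tilde{p}}[\tilde{T}]$ which, using Eq. (11), translates to the claimed lower bound on $E^{\pi}_{p}[T]$. This lower bound applies whenever $p = g(\tilde{p})$, for some $\tilde{p} \in [0,1/2]^n$, and therefore whenever $p \in [0, 1-2^{-s}]^n$.

5. The Bayesian Setting

There is another variant of the problem which is of interest. In this variant, the parameters $p_i$ associated with each arm are not unknown constants, but random variables described by a given prior. In this case, there is a single underlying probability measure which we denote by $P$, and which is the average of the measures $P_p$ over the prior distribution of $p$. We also use $E$ to denote the expectation with respect to $P$. We then define a policy to be $(\varepsilon,\delta)$-correct, for a particular prior and associated measure $P$, if

$$P\Big(p_I > \max_i p_i - \varepsilon\Big) \ge 1-\delta.$$

We then have the following result.

Theorem 9 There exist positive constants $c_1$, $c_2$, $\varepsilon_0$, and $\delta_0$, such that for every $n \ge 2$ and $\varepsilon \in (0,\varepsilon_0)$, there exists a prior for the $n$-bandit problem such that for every $\delta \in (0,\delta_0)$, and $(\varepsilon,\delta)$-correct policy for this prior, we have

$$E[T] \ge c_1 \frac{n}{\varepsilon^2}\log\frac{c_2}{\delta}.$$

In particular, $\varepsilon_0$ and $\delta_0$ can be taken equal to $1/8$ and $e^{-4}/12$, respectively.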
Proof Let $\varepsilon_0 = 1/8$ and $\delta_0 = e^{-4}/12$, and let us fix $\varepsilon \in (0,\varepsilon_0)$ and $\delta \in (0,\delta_0)$. Consider the hypotheses $H_0,\ldots,H_n$, introduced in the proof of Theorem 1. Let the prior probability of $H_0$ be $1/2$, and the prior probability of $H_\ell$ be $1/(2n)$, for $\ell = 1,\ldots,n$. Fix an $(\varepsilon/2,\delta)$-correct policy with respect to this prior, and note that it satisfies

$$E[T] \ge \frac{1}{2}E_0[T] \ge \frac{1}{2}\sum_{\ell=1}^n E_0[T_\ell]. \tag{12}$$

Since the policy is $(\varepsilon/2,\delta)$-correct, we have $P(p_I > \max_i p_i - (\varepsilon/2)) \ge 1-\delta$. As in the proof of Theorem 5, let $B_\ell$ be the event that the policy eventually selects coin $\ell$. We have

$$\frac{1}{2}P_0(B_0) + \frac{1}{2n}\sum_{\ell=1}^n P_\ell(B_\ell) \ge 1-\delta,$$

which implies that

$$\frac{1}{2n}\sum_{\ell=1}^n P_\ell(B_0) \le \delta. \tag{13}$$

Let $G$ be the set of hypotheses $\ell \ne 0$ under which the probability of selecting coin 0 is at most $3\delta$, i.e.,

$$G = \{\ell :\ 1 \le \ell \le n,\ P_\ell(B_0) \le 3\delta\}.$$

From Eq. (13), we obtain

$$\frac{1}{2n}(n - |G|)\,3\delta < \delta,$$

which implies that $|G| > n/3$. Following the same argument as in the proof of Lemma 4, we obtain that there exists a constant $c$ such that if $\delta' \in (0, e^{-4}/4)$ and $E_0[T_\ell] \le (1/(c\varepsilon^2))\log(1/(4\delta'))$, then $P_\ell(B_0) > \delta'$. By taking $\delta' = 3\delta$ and requiring that $\delta \in (0, e^{-4}/12)$, we see that the inequality $E_0[T_\ell] \le (1/(c\varepsilon^2))\log(1/(12\delta))$ implies that $P_\ell(B_0) > 3\delta$ (here, $c$ is the same constant as in Lemma 4). But for every $\ell \in G$ we have $P_\ell(B_0) \le 3\delta$, and therefore $E_0[T_\ell] \ge (1/(c\varepsilon^2))\log(1/(12\delta))$. Then, Eq. (12) implies that

$$E[T] \ge \frac{1}{2}\sum_{\ell \in G} E_0[T_\ell] \ge |G|\,\frac{1}{2c\varepsilon^2}\log\frac{1}{12\delta} \ge c_1\frac{n}{\varepsilon^2}\log\frac{c_2}{\delta},$$

where we have used the fact $|G| > n/3$ in the last inequality.

To conclude, we have shown that there exist constants $c_1$ and $c_2$ and a prior for a problem with $n+1$ coins, such that any $(\varepsilon/2,\delta)$-correct policy satisfies $E[T] \ge c_1(n/\varepsilon^2)\log(c_2/\delta)$. The result follows by taking a larger constant $c_1$ (to account for having $n+1$ and not $n$ coins, and $\varepsilon$ instead of $\varepsilon/2$).

6. Regret Bounds

In this section we consider lower bounds on the regret of any policy, and show that one can derive the $\Theta(\log t)$ regret bound of Lai and Robbins (1985) using the techniques in this paper. The results of Lai and Robbins (1985) are asymptotic as $t \to \infty$, whereas ours deal with finite times $t$. Our lower bound has similar dependence in $t$ as the upper bounds given by Auer et al. (2002a) for some
natural sampling algorithms. As in Lai and Robbins (1985) and Auer et al. (2002a), we also show that when $t$ is large, the regret depends linearly on the number of coins.

Given a policy, let $S_t$ be the total number of unit rewards ("heads") obtained in the first $t$ time steps. The regret by time $t$ is denoted by $R_t$, and is defined by

$$R_t = t\max_i p_i - S_t.$$

Note that the regret is a random variable that depends on the results of the coin tosses as well as of the randomization carried out by the policy.

Theorem 10 There exist positive constants $c_1, c_2, c_3, c_4$, and a constant $c_5$, such that for every $n \ge 2$, and for every policy, there exists some $p \in [0,1]^n$ such that for all $t \ge 1$,

$$E_p[R_t] \ge \min\{c_1 t,\ c_2 n + c_3 t,\ c_4 n(\log t - \log n + c_5)\}. \tag{14}$$

The inequality (14) suggests that there are essentially two regimes for the expected regret. When $n$ is large compared to $t$, the expected regret is linear in $t$. When $t$ is large compared to $n$, the regret behaves like $\log t$, but depends linearly on $n$.

Proof We will prove a stronger result, by considering the regret in a Bayesian setting. By proving that the expectation with respect to the prior is lower bounded by the right-hand side in Eq. (14), it will follow that the bound also holds for at least one of the hypotheses. Consider the same scenario as in Theorem 1, where we have $n+1$ coins and $n+1$ hypotheses $H_0, H_1, \ldots, H_n$. The prior assigns a probability of $1/2$ to $H_0$, and a probability of $1/(2n)$ to each of the hypotheses $H_1, H_2, \ldots, H_n$. Similar to Theorem 1, we will use the notation $E_\ell$ and $P_\ell$ to denote expectation and probability when the $\ell$th hypothesis is true, and $E$ to denote expectation with respect to the prior.

Let us fix $t$ for the rest of the proof. We define $T_\ell$ as the number of times coin $\ell$ is tried in the first $t$ time steps. The expected regret when $H_0$ is true is

$$E_0[R_t] = \frac{\varepsilon}{2}\sum_{\ell=1}^n E_0[T_\ell],$$

and the expected regret when $H_\ell$ ($\ell = 1,\ldots,n$) is true is

$$E_\ell[R_t] = \frac{\varepsilon}{2}E_\ell[T_0] + \varepsilon\sum_{i \ne 0,\ell} E_\ell[T_i],$$

so that the expected (Bayesian) regret is

$$E[R_t] = \frac{1}{2}\cdot\frac{\varepsilon}{2}\sum_{\ell=1}^n E_0[T_\ell] + \frac{\varepsilon}{2}\cdot\frac{1}{2n}\sum_{\ell=1}^n E_\ell[T_0] + \frac{\varepsilon}{2n}\sum_{\ell=1}^n \sum_{i \ne 0,\ell} E_\ell[T_i]. \tag{15}$$

Let $D$ be the event that coin 0 is tried at least $t/2$ times, i.e.,

$$D = \{T_0 \ge t/2\}.$$

We consider separately the two cases $P_0(D) < 3/4$ and $P_0(D) \ge 3/4$. Suppose first that $P_0(D) < 3/4$. In that case, $E_0[T_0] < 7t/8$, so that $\sum_{\ell=1}^n E_0[T_\ell] \ge t/8$. Substituting in Eq. (15), we obtain $E[R_t] \ge \varepsilon t/32$. This gives the first term in the right-hand side of Eq. (14), with $c_1 = \varepsilon/32$.
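We assume from now on that $P_0(D) \ge 3/4$. Rearranging Eq. (15), and omitting the third term, we have

$$E[R_t] \ge \frac{\varepsilon}{4}\sum_{\ell=1}^{n}\left(E_0[T_\ell] + \frac{1}{n} E_\ell[T_0]\right).$$

Since $E_\ell[T_0] \ge (t/2) P_\ell(D)$, we have

$$E[R_t] \ge \frac{\varepsilon}{4}\sum_{\ell=1}^{n}\left(E_0[T_\ell] + \frac{t}{2n} P_\ell(D)\right). \tag{16}$$

For every $\ell \ne 0$, let us define $\delta_\ell$ by

$$E_0[T_\ell] = \frac{1}{c\varepsilon^2}\log\frac{1}{4\delta_\ell}.$$

(Such a $\delta_\ell$ exists because of the monotonicity of the mapping $x \mapsto \log(1/x)$.) Let $\delta_0 = e^{-4}/4$. If $\delta_\ell < \delta_0$, we argue exactly as in Lemma 4, except that the event $B$ in that lemma is replaced by event $D$. Since $P_0(D) \ge 3/4$, the same proof applies and shows that $P_\ell(D) \ge \delta_\ell$, so that

$$E_0[T_\ell] + \frac{t}{2n} P_\ell(D) \ge \frac{1}{c\varepsilon^2}\log\frac{1}{4\delta_\ell} + \frac{t}{2n}\delta_\ell.$$

If on the other hand, $\delta_\ell \ge \delta_0$, then $E_0[T_\ell] \le \frac{1}{c\varepsilon^2}\log\frac{1}{4\delta_0}$, which implies (by the earlier analogy with Lemma 4) that $P_\ell(D) \ge \delta_0$, and

$$E_0[T_\ell] + \frac{t}{2n} P_\ell(D) \ge \frac{1}{c\varepsilon^2}\log\frac{1}{4\delta_\ell} + \frac{t}{2n}\delta_0.$$

Using the above bounds in Eq. (16), we obtain

$$E[R_t] \ge \frac{\varepsilon}{4}\sum_{\ell=1}^{n}\left(\frac{1}{c\varepsilon^2}\log\frac{1}{4\delta_\ell} + h(\delta_\ell)\frac{t}{2n}\right), \tag{17}$$

where $h(\delta) = \delta$ if $\delta < \delta_0$, and $h(\delta) = \delta_0$ otherwise. We can now view the $\delta_\ell$ as free parameters, and conclude that $E[R_t]$ is lower bounded by the minimum of the right-hand side of Eq. (17), over all $\delta_\ell$. When optimizing, all the $\delta_\ell$ will be set to the same value. The minimizing value can be $\delta_0$, in which case we have

$$E[R_t] \ge \frac{n}{4c\varepsilon}\log\frac{1}{4\delta_0} + \frac{\delta_0\,\varepsilon}{8}\,t.$$

Otherwise, the minimizing value is $\delta_\ell = n/(2ct\varepsilon^2)$, in which case we have

$$E[R_t] \ge \left(\frac{1}{16c\varepsilon} + \frac{1}{4c\varepsilon}\log\frac{c\varepsilon^2}{2}\right) n + \frac{1}{4c\varepsilon}\,n\log\frac{1}{n} + \frac{n}{4c\varepsilon}\log t.$$

Thus, the theorem holds with $c_2 = \frac{1}{4c\varepsilon}\log\frac{1}{4\delta_0}$, $c_3 = \delta_0\varepsilon/8$, $c_4 = 1/(4c\varepsilon)$, and $c_5 = (1/4) + \log(c\varepsilon^2/2)$.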
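The last step of the proof is pure algebra, and it can be checked numerically. The sketch below evaluates the right-hand side of Eq. (17) with all $\delta_\ell$ set to a common value and compares it with the two closed-form expressions quoted in the proof. The function names are hypothetical, and the values of $n$, $t$, $\varepsilon$ are illustrative assumptions. In particular, the constant $c$ of Lemma 4 is not specified in this section, so $c = 1$ is an assumption made only to obtain numbers.

```python
import math

DELTA_0 = math.exp(-4) / 4  # the threshold delta_0 used in the proof


def rhs_eq17(delta, n, t, eps, c):
    """Right-hand side of Eq. (17) when every delta_l equals `delta`."""
    h = delta if delta < DELTA_0 else DELTA_0
    per_coin = math.log(1 / (4 * delta)) / (c * eps**2) + h * t / (2 * n)
    return (eps / 4) * n * per_coin


def bound_at_delta0(n, t, eps, c):
    """Closed form quoted for delta_l = delta_0, i.e. c_2 * n + c_3 * t."""
    c2 = math.log(1 / (4 * DELTA_0)) / (4 * c * eps)
    c3 = DELTA_0 * eps / 8
    return c2 * n + c3 * t


def bound_log_regime(n, t, eps, c):
    """Closed form quoted for delta_l = n / (2 c t eps^2)."""
    c4 = 1 / (4 * c * eps)
    c5 = 0.25 + math.log(c * eps**2 / 2)
    return c4 * n * (math.log(t) - math.log(n) + c5)


# Illustrative values only; c = 1 is an assumption, not a value from the paper.
n, t, eps, c = 10, 10**7, 0.1, 1.0
delta_small = n / (2 * c * t * eps**2)  # must lie below delta_0 so that h(delta) = delta

print(rhs_eq17(DELTA_0, n, t, eps, c), bound_at_delta0(n, t, eps, c))
print(rhs_eq17(delta_small, n, t, eps, c), bound_log_regime(n, t, eps, c))
```

Each printed pair should agree up to floating-point rounding, which confirms that the constants $c_2, c_3, c_4, c_5$ listed at the end of the proof match the two displayed expressions.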
7. Permutations

We now consider the case where the coin biases $p_i$ are known up to a permutation. More specifically, we are given a vector $q \in [0,1]^n$, and we are told that the true vector $p$ of coin biases is of the form $p = q \circ \sigma$, where $\sigma$ is an unknown permutation of the set $\{1,\ldots,n\}$, and where $q \circ \sigma$ stands for permuting the components of the vector $q$ according to $\sigma$, i.e., $(q \circ \sigma)_\ell = q_{\sigma(\ell)}$. We say that a policy is $(q,\varepsilon,\delta)$-correct if the coin $I$ eventually selected satisfies

$$P_{q \circ \sigma}\left(p_I > \max_\ell q_\ell - \varepsilon\right) \ge 1 - \delta,$$

for every permutation $\sigma$ of the set $\{1,\ldots,n\}$.

We start with an $O((n + \log(1/\delta))/\varepsilon^2)$ upper bound on the expected number of trials, which is significantly smaller than the bound obtained when the coin biases are completely unknown (cf. Sections 3 and 4). We also provide a lower bound which is within a constant factor of our upper bound.

We then consider a different measure of sample complexity: instead of the expected number of trials, we consider the maximum (over all sample paths) number of trials. We show that for every $(q,\varepsilon,\delta)$-correct policy, there is a $\Theta((n/\varepsilon^2)\log(1/\delta))$ lower bound on the maximum number of trials. We note that in the median elimination algorithm of Even-Dar et al. (2002), the length of all sample paths is the same and within a constant factor from our lower bound. Hence our bound is again tight.

We therefore see that for the permutation case, the sample complexity depends critically on whether our criterion involves the expected or maximum number of trials. This is in contrast to the general case considered in Section 3: the lower bound in that section applies under both criteria, as does the matching upper bound from Even-Dar et al. (2002).

7.1 An Upper Bound on the Expected Number of Trials

Suppose we are given a vector $q \in [0,1]^n$, and we are told that the true vector $p$ of coin biases is a permutation of $q$. The policy in Table 1 takes as input the accuracy $\varepsilon$, the confidence parameter $\delta$, and the vector $q$. In fact the policy only needs to know the bias of the best coin, which we denote by $q^* = \max_\ell q_\ell$. The policy also uses an additional parameter $\delta' \in (0,1/2]$.
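The following theorem establishes the correctness of the policy, and provides an upper bound on the expected number of trials.

Theorem 11 For every $\delta' \in (0,1/2]$, $\varepsilon \in (0,1)$, and $\delta \in (0,1)$, the policy in Table 1 is guaranteed to terminate after a finite number of steps, with probability 1, and is $(q,\varepsilon,\delta)$-correct. For every permutation $\sigma$, the expected number of trials satisfies

$$E_{q \circ \sigma}[T] \le \frac{1}{\varepsilon^2}\left(c_1 n + c_2 \log\frac{1}{\delta}\right),$$

for some positive constants $c_1$ and $c_2$ that depend only on $\delta'$.

Proof. We start with a useful calculation. Suppose that at iteration $k$, the median elimination algorithm selects a coin $I_k$ whose true bias is $p_{I_k}$. Then, using the Hoeffding inequality, we have

$$P\left(\hat p_k - p_{I_k} \ge \varepsilon/3\right) \le \exp\{-2(\varepsilon/3)^2 m_k\} \le \frac{\delta}{2^k}, \tag{18}$$

and, by the same argument, the same bound holds for $P(\hat p_k - p_{I_k} \le -\varepsilon/3)$.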
Input: Accuracy and confidence parameters $\varepsilon \in (0,1)$ and $\delta \in (0,1)$; the bias of the best coin $q^*$.
Parameter: $\delta' \le 1/2$.
0. $k = 1$;
1. Run the median elimination algorithm to find a coin $I_k$ whose bias is within $\varepsilon/3$ of $q^*$, with probability at least $1 - \delta'$.
2. Try coin $I_k$ for $m_k = (9/(2\varepsilon^2))\log(2^k/\delta)$ times. Let $\hat p_k$ be the fraction of these trials that result in "heads".
3. If $\hat p_k \ge q^* - 2\varepsilon/3$, declare that coin $I_k$ is an $\varepsilon$-optimal coin and terminate.
4. Set $k := k + 1$ and go back to Step 1.

Table 1: A policy for finding an $\varepsilon$-optimal coin when the bias of the best coin is known.

Let $K$ be the number of iterations until the policy terminates. Given that $K > k - 1$ (i.e., the policy did not terminate in the first $k - 1$ iterations), there is probability at least $1 - \delta' \ge 1/2$ that $p_{I_k} \ge q^* - (\varepsilon/3)$, in which case, from Eq. (18), there is probability at least $1 - (\delta/2^k) \ge 1/2$ that $\hat p_k \ge q^* - (2\varepsilon/3)$. Thus, $P(K > k \mid K > k - 1) \le 1 - \eta$, with $\eta = 1/4$. Consequently, the probability that the policy does not terminate by the $k$th iteration, $P(K > k)$, is bounded by $(1 - \eta)^k$. Thus, the probability that the policy never terminates is bounded above by $(3/4)^k$ for all $k$, and is therefore 0.

We now bound the expected number of trials. Let $c$ be such that the number of trials in one execution of the median elimination algorithm is bounded by $(cn/\varepsilon^2)\log(1/\delta')$. Then, the number of trials, $t(k)$, during the $k$th iteration is bounded by $(cn/\varepsilon^2)\log(1/\delta') + m_k$. It follows that the expected total number of trials under our policy is bounded by

$$\sum_{k=1}^{\infty} P(K \ge k)\,t(k) \le \frac{1}{\varepsilon^2}\sum_{k=1}^{\infty}(1 - \eta)^{k-1}\left(cn\log(1/\delta') + (9/2)\log(2^k/\delta) + 1\right)$$

$$= \frac{1}{\varepsilon^2}\sum_{k=1}^{\infty}(1 - \eta)^{k-1}\left(cn\log(1/\delta') + (9/2)\log(1/\delta) + (9k/2)\log 2 + 1\right)$$

$$\le \frac{1}{\varepsilon^2}\left(c_1 n + c_2\log(1/\delta)\right),$$

for some positive constants $c_1$ and $c_2$.

We finally argue that the policy is $(q,\varepsilon,\delta)$-correct. For the policy to select a coin $I$ with bias $p_I \le q^* - \varepsilon$, it must be that at some iteration $k$, a coin $I_k$ with $p_{I_k} \le q^* - \varepsilon$ was obtained, but $\hat p_k$ came out larger than $q^* - 2\varepsilon/3$. From Eq. (18), for any fixed $k$, the probability of this occurring is bounded by $\delta/2^k$. By the union bound, the probability that $p_I \le q^* - \varepsilon$ is bounded by $\sum_{k=1}^{\infty} \delta/2^k = \delta$.
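The policy of Table 1 can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation. The median elimination subroutine is a simplified stand-in in the style of Even-Dar et al. (2002), and its per-round sample size and its schedule for shrinking the accuracy and confidence parameters are assumptions, since that algorithm is not specified in this section. The names `heads`, `median_elimination`, and `table1_policy` are hypothetical. The bias vector `p` is used by the simulator only to generate coin tosses, and the policy itself relies only on $q^* = \max_\ell p_\ell$.

```python
import math
import random


def heads(bias, m, rng):
    """Number of heads in m independent tosses of a coin with the given bias."""
    return sum(rng.random() < bias for _ in range(m))


def median_elimination(p, eps, delta, rng):
    """Stand-in for median elimination; returns (selected coin, trials used).

    The sample sizes and the eps/delta schedule below are assumptions.
    """
    alive = list(range(len(p)))
    eps_r, delta_r, trials = eps / 4, delta / 2, 0
    while len(alive) > 1:
        m = math.ceil(4 / eps_r**2 * math.log(3 / delta_r))
        est = {i: heads(p[i], m, rng) / m for i in alive}
        trials += m * len(alive)
        alive.sort(key=est.get, reverse=True)
        alive = alive[: (len(alive) + 1) // 2]  # keep the empirically better half
        eps_r, delta_r = 0.75 * eps_r, delta_r / 2
    return alive[0], trials


def table1_policy(p, eps, delta, delta_prime=0.5, seed=0):
    """Simulate the policy of Table 1; returns (coin, total trials, iterations)."""
    rng = random.Random(seed)
    q_star = max(p)  # the only information about q that the policy uses
    total, k = 0, 1
    while True:
        # Step 1: candidate coin within eps/3 of q*, with probability >= 1 - delta'.
        coin, used = median_elimination(p, eps / 3, delta_prime, rng)
        # Step 2: m_k additional trials of the candidate (rounded up to an integer).
        m_k = math.ceil(9 / (2 * eps**2) * math.log(2**k / delta))
        p_hat = heads(p[coin], m_k, rng) / m_k
        total += used + m_k
        # Step 3: accept if the estimate clears the threshold q* - 2*eps/3.
        if p_hat >= q_star - 2 * eps / 3:
            return coin, total, k
        k += 1  # Step 4


coin, total, k = table1_policy([0.3, 0.9, 0.3, 0.3], eps=0.3, delta=0.01)
print(coin, total, k)
```

In this example the gap between the best coin and the others is far larger than $\varepsilon$, so the policy should select coin index 1 in the first iteration with overwhelming probability. The verification stage in Step 2 is cheap compared with the elimination stage, which mirrors the role of the $c_1 n$ and $c_2 \log(1/\delta)$ terms in Theorem 11.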
Remark 12 The knowledge of $q^*$ turns out to be significant: it enables the policy to terminate as soon as there is high confidence that a coin has been found whose bias is larger than $q^* - \varepsilon$, without having to check the other coins. A policy of this type would not work for the hypotheses considered in the proofs of Theorems 1 and 5: under those hypotheses, the value of $q^*$ is not a priori known.

We note that Theorem 11 disagrees with a lower bound in a preliminary version (Mannor and Tsitsiklis, 2003) of this paper. It turns out that the latter lower bound is only valid under an additional restriction on the set of policies, which will be the subject of a later subsection.

7.2 A Lower Bound

We now prove that the upper bound in Theorem 11 is tight, within a constant.

Theorem 13 There exist positive constants $c_1$, $c_2$, $\varepsilon_0$, and $\delta_1$, such that for every $n \ge 2$ and $\varepsilon \in (0,\varepsilon_0)$, there exists some $q \in [0,1]^n$, such that for every $\delta \in (0,\delta_1)$ and every $(q,\varepsilon,\delta)$-correct policy, there exists some permutation $\sigma$ such that

$$E_{q \circ \sigma}[T] \ge \frac{1}{\varepsilon^2}\left(c_1 n + c_2\log\frac{1}{\delta}\right).$$

Proof. Let $\varepsilon_0 = 1/4$ and let $\delta_1 = \delta_0/5$, where $\delta_0$ is the same constant as in Theorem 5. Let us fix some $n \ge 2$ and $\varepsilon \in (0,\varepsilon_0)$. We will establish the claimed lower bound for

$$q = (0.5 + \varepsilon,\ 0.5 - \varepsilon,\ \ldots,\ 0.5 - \varepsilon), \tag{19}$$

and for every $\delta \in (0,\delta_1)$. In fact, it is sufficient to establish a lower bound of the form $(c_2/\varepsilon^2)\log(1/\delta)$ and a lower bound of the form $c_1 n/\varepsilon^2$. We start with the former.

Part I. Let us consider the following three hypothesis testing problems. For each problem, we are interested in a $\delta$-correct policy, i.e., a policy whose probability of error is less than $\delta$ under any hypothesis. We will show that a $\delta$-correct policy for the first problem can be used to construct a $\delta$-correct policy for the third problem, with the same sample complexity, and then apply Theorem 5 to obtain a lower bound.

$\Pi_1$: We have two coins and the bias vector is either $(0.5 - \varepsilon,\ 0.5 + \varepsilon)$ or $(0.5 + \varepsilon,\ 0.5 - \varepsilon)$. We wish to determine the best coin. This is a special case of our permutation problem, with $n = 2$.

$\Pi_2$: We have a single coin whose bias is either $0.5 - \varepsilon$ or $0.5 + \varepsilon$, and we wish to determine the bias of the coin.⁴

$\Pi_3$: We have two coins and the bias vector can be $(0.5,\ 0.5 - \varepsilon)$, $(0.5 + \varepsilon,\ 0.5 - \varepsilon)$, or $(0.5,\ 0.5 + \varepsilon)$. We wish to determine the best coin.

Consider a $\delta$-correct policy for problem $\Pi_1$ except that the coin outcomes are encoded as follows.
Whenever coin 1 is tried, record the outcome unchanged; whenever coin 2 is tried, record the opposite of the outcome (i.e., record a 0 outcome as a 1, and vice versa). Under the first hypothesis in problem $\Pi_1$, every trial (no matter which coin was tried) has probability $0.5 - \varepsilon$ of being equal to 1, and under the second hypothesis has probability $0.5 + \varepsilon$ of being equal to 1. With this encoding, it is apparent that the information provided by a trial of either coin in problem $\Pi_1$ is the same as the information provided by a trial of the single coin in problem $\Pi_2$.

4. A lower bound for this problem was provided in Lemma 5.1 from Anthony and Bartlett (1999). However, that bound is only established for policies with an a priori fixed number of trials, whereas our policies allow the number of trials to be determined adaptively, based on observed outcomes.
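The encoding used in the reduction from $\Pi_1$ can be checked with a short exact computation. The sketch below computes, for each hypothesis of $\Pi_1$ and each coin, the probability that the recorded outcome equals 1. The function name `recorded_one_probability` and the value $\varepsilon = 0.1$ are illustrative assumptions, and coins are indexed from 0 in the code (so index 0 is coin 1 of the text).

```python
import math


def recorded_one_probability(bias, coin):
    """Probability that the recorded bit is 1 when `coin` (index 0 or 1) is tried.

    Index 0 (coin 1 in the text) is recorded unchanged;
    index 1 (coin 2 in the text) is recorded with the outcome flipped.
    """
    p = bias[coin]
    return p if coin == 0 else 1.0 - p


eps = 0.1  # illustrative value
hypotheses = {
    "first": (0.5 - eps, 0.5 + eps),
    "second": (0.5 + eps, 0.5 - eps),
}
for name, bias in hypotheses.items():
    print(name, [recorded_one_probability(bias, coin) for coin in (0, 1)])
```

Under each hypothesis the two entries coincide (up to floating-point rounding), so the recorded bit carries the same distribution whichever coin is tried.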