Follow the Leader If You Can, Hedge If You Must
Journal of Machine Learning Research 15 (2014). Submitted 1/13; Revised 1/14; Published 4/14.

Follow the Leader If You Can, Hedge If You Must

Steven de Rooij, VU University and University of Amsterdam, Science Park 904, P.O. Box 94323, 1090 GH Amsterdam, the Netherlands

Tim van Erven, Département de Mathématiques, Université Paris-Sud, Orsay Cedex, France

Peter D. Grünwald and Wouter M. Koolen, Leiden University (Grünwald) and Centrum Wiskunde & Informatica (Grünwald and Koolen), Science Park 123, P.O. Box 94079, 1090 GB Amsterdam, the Netherlands

Editor: Nicolò Cesa-Bianchi

Abstract

Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has poor performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As a stepping stone for our analysis, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour, and Stoltz (2007), yielding improved worst-case guarantees. By interleaving AdaHedge and FTL, FlipFlop achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.

Keywords: Hedge, learning rate, mixability, online learning, prediction with expert advice

1. Introduction

We consider sequential prediction in the general framework of Decision Theoretic Online Learning (DTOL) or the Hedge setting (Freund and Schapire, 1997), which is a variant of prediction with expert advice (Littlestone and Warmuth, 1994; Vovk, 1998; Cesa-Bianchi and Lugosi, 2006). Our goal is to develop a sequential prediction algorithm that performs well not only on adversarial data, which is the scenario most studies worry about, but also when the data are easy, as is often the case in practice. Specifically, with adversarial data, the worst-case regret (defined below) for any algorithm is $\Omega(\sqrt{T})$, where $T$ is the number of predictions to be made. Algorithms such as Hedge, which have been designed to achieve this lower bound, typically continue to suffer regret of order $\sqrt{T}$, even for easy data, where

© 2014 Steven de Rooij, Tim van Erven, Peter D. Grünwald and Wouter M. Koolen.
the regret of the more intuitive but less robust Follow-the-Leader (FTL) algorithm (also defined below) is bounded. Here, we present the first algorithm which, up to constant factors, provably achieves both the regret lower bound in the worst case, and a regret not exceeding that of FTL. Below, we first describe the Hedge setting. Then we introduce FTL, discuss sophisticated versions of Hedge from the literature, and give an overview of the results and contents of this paper.

1.1 Overview

In the Hedge setting, prediction proceeds in rounds. At the start of each round $t = 1, 2, \ldots$, a learner has to decide on a weight vector $\mathbf{w}_t = (w_{t,1}, \ldots, w_{t,K}) \in \mathbb{R}^K$ over $K$ experts. Each weight $w_{t,k}$ is required to be nonnegative, and the sum of the weights should be 1. Nature then reveals a $K$-dimensional vector containing the losses of the experts, $\boldsymbol{\ell}_t = (\ell_{t,1}, \ldots, \ell_{t,K}) \in \mathbb{R}^K$. Learner's loss is the dot product $h_t = \mathbf{w}_t \cdot \boldsymbol{\ell}_t$, which can be interpreted as the expected loss if Learner uses a mixed strategy and chooses expert $k$ with probability $w_{t,k}$. We denote aggregates of per-trial quantities by their capital letter, and vectors are in bold face. Thus, $L_{t,k} = \ell_{1,k} + \ldots + \ell_{t,k}$ denotes the cumulative loss of expert $k$ after $t$ rounds, and $H_t = h_1 + \ldots + h_t$ is Learner's cumulative loss (the Hedge loss). Learner's performance is evaluated in terms of her regret, which is the difference between her cumulative loss and the cumulative loss of the best expert: $R_t = H_t - L_t^*$, where $L_t^* = \min_k L_{t,k}$. We will always analyse the regret after an arbitrary number of rounds $T$. We will omit the subscript $T$ for aggregate quantities such as $L_T^*$ or $R_T$ wherever this does not cause confusion.

A simple and intuitive strategy for the Hedge setting is Follow-the-Leader (FTL), which puts all weight on the expert(s) with the smallest loss so far. More precisely, we will define the weights $\mathbf{w}_t$ for FTL to be uniform on the set of leaders $\{k \mid L_{t-1,k} = L_{t-1}^*\}$, which is often just a singleton. FTL works very well in many circumstances, for example in stochastic scenarios where the losses are independent and identically distributed (i.i.d.). In particular, the regret for Follow-the-Leader is bounded by the number of times the leader is overtaken by another expert (Lemma 10), which in the i.i.d. case almost surely happens only a finite number of times (by the uniform law of large numbers), provided the mean loss of the best expert is strictly smaller than the mean loss of the other experts. As demonstrated by the experiments in Section 5, many more sophisticated algorithms can perform significantly worse than FTL.
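To make the FTL baseline concrete, the following minimal matlab sketch (our own illustration, not code from the paper; the uniform tie-breaking matches the definition above) plays FTL and reports the resulting regret.

```matlab
% Minimal Follow-the-Leader sketch: l(t,k) is the loss of expert k at time t.
function R = ftl_regret(l)
    [T, K] = size(l);
    L = zeros(1, K);        % cumulative losses of the experts
    H = 0;                  % cumulative loss of the learner
    for t = 1:T
        w = (L == min(L));  % uniform weights on the current leaders
        w = w / sum(w);
        H = H + w * l(t,:)';
        L = L + l(t,:);
    end
    R = H - min(L);         % regret with respect to the best expert
end
```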
The problem with FTL is that it breaks down badly when the data are antagonistic. For example, if one out of two experts incurs losses $\frac{1}{2}, 0, 1, 0, \ldots$ while the other incurs opposite losses $0, 1, 0, 1, \ldots$, the regret for FTL at time $T$ is about $T/2$ (this scenario is further discussed in Section 5.1). This has prompted the development of a multitude of alternative algorithms that provide better worst-case regret guarantees.

The seminal strategy for the learner is called Hedge (Freund and Schapire, 1997, 1999). Its performance crucially depends on a parameter $\eta$ called the learning rate. Hedge can be interpreted as a generalisation of FTL, which is recovered in the limit for $\eta \to \infty$. In many analyses, the learning rate is changed from infinity to a lower value that optimizes some upper bound on the regret. Doing so requires precognition of the number of rounds of the game, or of some property of the data such as the eventual loss of the best expert $L_T^*$. Provided that the relevant statistic is monotonically nondecreasing in $t$ (such as $L_t^*$), a simple way to address this issue is the so-called doubling trick: setting a budget on the statistic, and restarting the algorithm with a double budget when the budget is depleted (Cesa-Bianchi and Lugosi, 2006; Cesa-Bianchi et al., 1997; Hazan and Kale, 2008); $\eta$ can then be optimised for each individual block in terms of the budget. Better bounds, but harder analyses, are typically obtained if the learning rate is adjusted each round based on previous observations, see e.g. (Cesa-Bianchi and Lugosi, 2006; Auer et al., 2002).

The Hedge strategy presented by Cesa-Bianchi, Mansour, and Stoltz (2007) is a sophisticated example of such adaptive tuning. The relevant algorithm, which we refer to as CBMS, is defined in (16) in Section 4.2 of their paper. To discuss its guarantees, we need the following notation. Let $\ell_t^- = \min_k \ell_{t,k}$ and $\ell_t^+ = \max_k \ell_{t,k}$ denote the smallest and largest loss in round $t$, and let $L_t^- = \ell_1^- + \ldots + \ell_t^-$ and $L_t^+ = \ell_1^+ + \ldots + \ell_t^+$ denote the cumulative minimum and maximum loss respectively. Further let $s_t = \ell_t^+ - \ell_t^-$ denote the loss range in trial $t$ and let $S_t = \max\{s_1, \ldots, s_t\}$ denote the largest loss range after $t$ trials. Then, without prior knowledge of any property of the data, including $T$, $S$ and $L^*$, the CBMS strategy achieves regret bounded by(1)

$$R^{\mathrm{cbms}} \le 4\sqrt{\frac{(L^* - L^-)(L^- + ST - L^*)}{T}\ln K} + \text{lower order terms} \qquad (1)$$

(Cesa-Bianchi et al., 2007, Corollary 3). Hence, in the worst case $L^* = L^- + ST/2$ and the bound is of order $S\sqrt{T\ln K}$, but when the loss of the best expert $L^* \in [L^-, L^- + ST]$ is close to either boundary the guarantees are much stronger.

The contributions of this work are twofold: first, in Section 2, we develop AdaHedge, which is a refinement of the CBMS strategy. A (very) preliminary version of this strategy was presented at NIPS (Van Erven et al., 2011). Like CBMS, AdaHedge is completely parameterless and tunes the learning rate in terms of a direct measure of past performance. We derive an improved worst-case bound of the following form. Again without any assumptions, we have

$$R^{\mathrm{ah}} \le 2\sqrt{S\,\frac{(L^* - L^-)(L^+ - L^*)}{L^+ - L^-}\ln K} + \text{lower order terms} \qquad (2)$$

(see Theorem 8). The parabola under the square root is always smaller than or equal to its CBMS counterpart (since it is nondecreasing in $L^+$ and $L^+ \le L^- + ST$); it expresses that the regret is small if $L^* \in [L^-, L^+]$ is close to either boundary. It is maximized in $L^*$ at the midpoint between $L^-$ and $L^+$, and in this case we recover the worst-case bound of order $S\sqrt{T\ln K}$.

Like (1), the regret bound (2) is fundamental, which means that it is invariant under translation of the losses and proportional to their scale. Moreover, not only AdaHedge's regret bound is fundamental: the weights issued by the algorithm are themselves invariant

Footnote 1: As pointed out by a referee, it is widely known that the leading constant of 4 can be improved to $2\sqrt{2} \approx 2.83$ using techniques by Györfi and Ottucsák (2007) that are essentially equivalent to our Lemma 2 below; Gerchinovitz (2011, Remark 2.2) reduced it to approximately 2.63. AdaHedge allows a slight further reduction to 2.
under translation and scaling (see Section 4). The CBMS algorithm and AdaHedge are insensitive to trials in which all experts suffer the same loss, a natural property we call "timelessness". An attractive feature of the new bound (2) is that it expresses this property. A more detailed discussion appears below Theorem 8.

Our second contribution is to develop a second algorithm, called FlipFlop, that retains the worst-case bound (2) (up to a constant factor), but has even better guarantees for easy data: its performance is never substantially worse than that of Follow-the-Leader. At first glance, this may seem trivial to accomplish: simply take both FTL and AdaHedge, and combine the two by using FTL or Hedge recursively. To see why such approaches do not work, suppose that FTL achieves regret $R^{\mathrm{fl}}$, while AdaHedge achieves regret $R^{\mathrm{ah}}$. We would only be able to prove that the regret of the combined strategy compared to the best original expert satisfies $R^c \le \min\{R^{\mathrm{fl}}, R^{\mathrm{ah}}\} + G^c$, where $G^c$ is the worst-case regret guarantee for the combination method, e.g. (1). In general, either $R^{\mathrm{fl}}$ or $R^{\mathrm{ah}}$ may be close to zero, while at the same time the regret of the combination method, or at least its bound $G^c$, is proportional to $\sqrt{T}$. That is, the overhead of the combination method will dominate the regret! The FlipFlop approach we describe in Section 3 circumvents this by alternating between Following the Leader and using AdaHedge in a carefully specified way. For this strategy we can guarantee $R^{\mathrm{ff}} = O(\min\{R^{\mathrm{fl}}, G^{\mathrm{ah}}\})$, where $G^{\mathrm{ah}}$ is the regret guarantee for AdaHedge; Theorem 15 provides a precise statement. Thus, FlipFlop is the first algorithm that provably combines the benefits of Follow-the-Leader with robust behaviour for antagonistic data.

A key concept in the design and analysis of our algorithms is what we call the mixability gap, introduced in Section 2.1. This quantity also appears in earlier works, and seems to be of fundamental importance in both the current Hedge setting as well as in stochastic settings. We elaborate on this in Section 6.2, where we provide the big picture underlying this research and we briefly indicate how it relates to practical work such as (Devaine et al., 2013).

1.2 Related Work

As mentioned, AdaHedge is a refinement of the strategy analysed by Cesa-Bianchi et al. (2007), which is itself more sophisticated than most earlier approaches, with two notable exceptions. First, Chaudhuri, Freund, and Hsu (2009) describe a strategy called NormalHedge that can efficiently compete with the best $\epsilon$-quantile of experts; their bound is incomparable with the bounds for CBMS and for AdaHedge. Second, Hazan and Kale (2008) develop a strategy called Variation MW that has especially low regret when the losses of the best expert vary little between rounds. They show that the regret of Variation MW is of order $\sqrt{\mathrm{VAR}_T^{\max}\ln K}$, where $\mathrm{VAR}_T^{\max} = \max_{t \le T} \sum_{s=1}^t \big(\ell_{s,k_t^*} - \tfrac{1}{t}L_{t,k_t^*}\big)^2$ with $k_t^*$ the best expert after $t$ rounds. This bound dominates our worst-case result (2) (up to a multiplicative constant). As demonstrated by the experiments in Section 5, their method does not achieve the benefits of FTL, however. In Section 5 we also discuss the performance of NormalHedge and Variation MW compared to AdaHedge and FlipFlop.
Other approaches to sequential prediction include Defensive Forecasting (Vovk et al., 2005), and Following the Perturbed Leader (Kalai and Vempala, 2003). These radically different approaches also allow competing with the best $\epsilon$-quantile, as shown by Chernov and Vovk (2010) and Hutter and Poland (2005); the latter also consider nonuniform weights on the experts.

The safe MDL and safe Bayesian algorithms by Grünwald (2011, 2012) share the present work's focus on the mixability gap as a crucial part of the analysis, but are concerned with the stochastic setting where losses are not adversarial but i.i.d. FlipFlop, safe MDL and safe Bayes can all be interpreted as methods that attempt to choose a learning rate $\eta$ that keeps the mixability gap small (or, equivalently, that keeps the Bayesian posterior or Hedge weights "concentrated").

1.3 Outline

In the next section we present and analyse AdaHedge and compare its worst-case regret bound to existing results, in particular the bound for CBMS. Then, in Section 3, we build on AdaHedge to develop the FlipFlop strategy. The analysis closely parallels that of AdaHedge, but with extra complications at each of the steps. In Section 4 we show that both algorithms have the property that their behaviour does not change under translation and scaling of the losses. We further illustrate the relationship between the learning rate and the regret, and compare AdaHedge and FlipFlop to existing methods, in experiments with artificial data in Section 5. Finally, Section 6 contains a discussion, with ambitious suggestions for future work.

2. AdaHedge

In this section, we present and analyse the AdaHedge strategy. To introduce our notation and proof strategy, we start with the simplest possible analysis of vanilla Hedge, and then move on to refine it for AdaHedge.

2.1 Basic Hedge Analysis for Constant Learning Rate

Following Freund and Schapire (1997), we define the Hedge or exponential weights strategy as the choice of weights

$$w_{t,k} = \frac{w_{1,k}\,e^{-\eta L_{t-1,k}}}{Z_t}, \qquad (3)$$

where $\mathbf{w}_1 = (1/K, \ldots, 1/K)$ is the uniform distribution, $Z_t = \mathbf{w}_1 \cdot e^{-\eta \mathbf{L}_{t-1}}$ is a normalizing constant, and $\eta \in (0, \infty)$ is a parameter of the algorithm called the learning rate. If $\eta = 1$ and one imagines $L_{t-1,k}$ to be the negative log-likelihood of a sequence of observations, then $w_{t,k}$ is the Bayesian posterior probability of expert $k$ and $Z_t$ is the marginal likelihood of the observations. Like in Bayesian inference, the weights are updated multiplicatively, i.e. $w_{t+1,k} \propto w_{t,k}e^{-\eta\ell_{t,k}}$.

The loss incurred by Hedge in round $t$ is $h_t = \mathbf{w}_t \cdot \boldsymbol{\ell}_t$, the cumulative Hedge loss is $H_t = h_1 + \ldots + h_t$, and our goal is to obtain a good bound on $H_T$.
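As a concrete illustration (our own sketch, not code from the paper), Hedge with a fixed learning rate takes only a few lines of matlab; subtracting min(L) before exponentiation avoids numerical underflow without changing the normalized weights.

```matlab
% Fixed learning rate Hedge (sketch): l(t,k) is the loss of expert k at time t.
function h = hedge(l, eta)
    [T, K] = size(l);
    h = nan(T, 1);
    L = zeros(1, K);                   % cumulative expert losses
    for t = 1:T
        w = exp(-eta * (L - min(L)));  % unnormalized exponential weights (3)
        w = w / sum(w);
        h(t) = w * l(t,:)';            % Hedge loss: dot product w_t . l_t
        L = L + l(t,:);
    end
end
```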
To this end, it turns out to be technically convenient to approximate $h_t$ by the mix loss

$$m_t = -\frac{1}{\eta}\ln\big(\mathbf{w}_t \cdot e^{-\eta\boldsymbol{\ell}_t}\big), \qquad (4)$$

which accumulates to $M_t = m_1 + \ldots + m_t$. This approximation is a standard tool in the literature. For example, the mix loss $m_t$ corresponds to the loss of Vovk's (1998; 2001) Aggregating Pseudo Algorithm, and tracking the evolution of $m_t$ is a crucial ingredient in the proof of Theorem 2.2 of Cesa-Bianchi and Lugosi (2006).

The definitions may be extended to $\eta = \infty$ by letting $\eta$ tend to $\infty$. We then find that $\mathbf{w}_t$ becomes a uniform distribution on the set of experts $\{k \mid L_{t-1,k} = L_{t-1}^*\}$ that have incurred smallest cumulative loss before time $t$. That is, Hedge with $\eta = \infty$ reduces to Follow-the-Leader, where in case of ties the weights are distributed uniformly. The limiting value for the mix loss is $m_t = L_t^* - L_{t-1}^*$.

In our approximation of the Hedge loss $h_t$ by the mix loss $m_t$, we call the approximation error $\delta_t = h_t - m_t$ the mixability gap. Bounding this quantity is a standard part of the analysis of Hedge-type algorithms (see, for example, Lemma 4 of Cesa-Bianchi et al. 2007) and it also appears to be a fundamental notion in sequential prediction even when only so-called mixable losses are considered (Grünwald, 2011, 2012); see also Section 6.2. We let $\Delta_t = \delta_1 + \ldots + \delta_t$ denote the cumulative mixability gap, so that the regret for Hedge may be decomposed as

$$R = H - L^* = M - L^* + \Delta. \qquad (5)$$

Here $M - L^*$ may be thought of as the regret under the mix loss and $\Delta$ is the cumulative approximation error when approximating the Hedge loss by the mix loss. Throughout the paper, our proof strategy will be to analyse these two contributions to the regret, $M - L^*$ and $\Delta$, separately.

The following lemma, which is proved in Appendix A, collects a few basic properties of the mix loss:

Lemma 1 (Mix Loss with Constant Learning Rate) For any learning rate $\eta \in (0, \infty]$:

1. $\ell_t^- \le m_t \le h_t \le \ell_t^+$, so that $0 \le \delta_t \le s_t$.
2. Cumulative mix loss telescopes: $M_T = -\frac{1}{\eta}\ln\big(\mathbf{w}_1 \cdot e^{-\eta\mathbf{L}_T}\big)$ for $\eta < \infty$, and $M_T = L_T^*$ for $\eta = \infty$.
3. Cumulative mix loss approximates the loss of the best expert: $L_T^* \le M_T \le L_T^* + \frac{\ln K}{\eta}$.
4. The cumulative mix loss $M_T$ is nonincreasing in $\eta$.
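To see where property #2 comes from (a one-step reconstruction of the argument, not the paper's Appendix A proof), substitute the Hedge weights (3) into the mix loss (4):

$$m_t = -\frac{1}{\eta}\ln\big(\mathbf{w}_t \cdot e^{-\eta\boldsymbol{\ell}_t}\big) = -\frac{1}{\eta}\ln\frac{\mathbf{w}_1 \cdot e^{-\eta\mathbf{L}_t}}{\mathbf{w}_1 \cdot e^{-\eta\mathbf{L}_{t-1}}},$$

so consecutive terms cancel and $M_T = \sum_{t=1}^T m_t = -\frac{1}{\eta}\ln\big(\mathbf{w}_1 \cdot e^{-\eta\mathbf{L}_T}\big)$. Property #3 then follows because $\mathbf{w}_1 \cdot e^{-\eta\mathbf{L}_T} = \frac{1}{K}\sum_k e^{-\eta L_{T,k}}$ lies between $\frac{1}{K}e^{-\eta L_T^*}$ and $e^{-\eta L_T^*}$.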
In order to obtain a bound for Hedge, one can use the following well-known bound on the mixability gap, which is obtained using Hoeffding's bound on the cumulant generating function (Cesa-Bianchi and Lugosi, 2006, Lemma A.1):

$$\delta_t \le \frac{\eta}{8}s_t^2, \qquad (6)$$

from which $\Delta_T \le \eta S^2T/8$, where (as in the introduction) $S_t = \max\{s_1, \ldots, s_t\}$ is the maximum loss range in the first $t$ rounds. Together with the bound $M - L^* \le \ln(K)/\eta$ from mix loss property #3 this leads to

$$R = (M - L^*) + \Delta \le \frac{\ln K}{\eta} + \frac{\eta S^2T}{8}. \qquad (7)$$

The bound is optimized for $\eta = \sqrt{8\ln(K)/(S^2T)}$, which equalizes the two terms. This leads to a bound on the regret of $S\sqrt{T\ln(K)/2}$, matching the lower bound on worst-case regret from the textbook by Cesa-Bianchi and Lugosi (2006, Section 3.7). We can use this tuned learning rate if the time horizon $T$ is known in advance. To deal with the situation where $T$ is unknown, either the doubling trick or a time-varying learning rate (see Lemma 2 below) can be used, at the cost of a worse constant factor in the leading term of the regret bound.

In the remainder of this section, we introduce a completely parameterless algorithm called AdaHedge. We then refine the steps of the analysis above to obtain a better regret bound.

2.2 AdaHedge Analysis

In the previous section, we split the regret for Hedge into two parts: $M - L^*$ and $\Delta$, and we obtained a bound for both. The learning rate $\eta$ was then tuned to equalise these two bounds. The main distinction between AdaHedge and other Hedge approaches is that AdaHedge does not consider an upper bound on $\Delta$ in order to obtain this balance: instead it aims to equalize $\Delta$ and $\ln(K)/\eta$. As the cumulative mixability gap $\Delta_t$ is nondecreasing in $t$ (by mix loss property #1) and can be observed on-line, it is possible to adapt the learning rate directly based on $\Delta_t$. Perhaps the easiest way to achieve this is by using the doubling trick: each subsequent block uses half the learning rate of the previous block, and a new block is started as soon as the observed cumulative mixability gap exceeds the bound on the mix loss $\ln(K)/\eta$, which ensures these two quantities are equal at the end of each block. This is the approach taken in an earlier version of AdaHedge (Van Erven et al., 2011). However, we can achieve the same goal much more elegantly, by decreasing the learning rate with time according to

$$\eta_t^{\mathrm{ah}} = \frac{\ln K}{\Delta_{t-1}^{\mathrm{ah}}} \qquad (8)$$

(where $\Delta_0^{\mathrm{ah}} = 0$, so that $\eta_1^{\mathrm{ah}} = \infty$). Note that the AdaHedge learning rate does not involve the end time $T$ or any other unobserved properties of the data; all subsequent analysis is therefore valid for all $T$ simultaneously. The definitions (3) and (4) of the weights and the mix loss are modified to use this new learning rate:

$$w_{t,k}^{\mathrm{ah}} = \frac{w_{1,k}^{\mathrm{ah}}\,e^{-\eta_t^{\mathrm{ah}}L_{t-1,k}}}{\mathbf{w}_1^{\mathrm{ah}} \cdot e^{-\eta_t^{\mathrm{ah}}\mathbf{L}_{t-1}}} \qquad \text{and} \qquad m_t^{\mathrm{ah}} = -\frac{1}{\eta_t^{\mathrm{ah}}}\ln\big(\mathbf{w}_t^{\mathrm{ah}} \cdot e^{-\eta_t^{\mathrm{ah}}\boldsymbol{\ell}_t}\big), \qquad (9)$$

with $\mathbf{w}_1^{\mathrm{ah}} = (1/K, \ldots, 1/K)$ uniform. Note that the multiplicative update rule for the weights no longer applies when the learning rate varies with $t$; the last three results of Lemma 1 are also no longer valid.
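As a small worked example of the recursion (8) (our own illustration): consider $K = 2$ experts with first loss vector $\boldsymbol{\ell}_1 = (1, 0)$. Since $\Delta_0^{\mathrm{ah}} = 0$, the first round uses $\eta_1^{\mathrm{ah}} = \infty$, so AdaHedge plays FTL with uniform weights $\mathbf{w}_1 = (\tfrac12, \tfrac12)$, and the mix loss takes its limiting value $m_1 = L_1^* - L_0^* = 0$. Then

$$h_1 = \tfrac12 \cdot 1 + \tfrac12 \cdot 0 = \tfrac12, \qquad \delta_1 = h_1 - m_1 = \tfrac12, \qquad \eta_2^{\mathrm{ah}} = \frac{\ln 2}{\Delta_1^{\mathrm{ah}}} = 2\ln 2,$$

so a single ambiguous round already brings the learning rate down from $\infty$ to a finite value; later rounds decrease it only as fast as the mixability gap actually accumulates.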
Later we will also consider other algorithms to determine variable learning rates; to avoid confusion the considered algorithm is always specified in the superscript in our notation. See Table 1 for reference.

Table 1: Notation

Single round quantities for trial $t$:
  $\boldsymbol{\ell}_t$ — Loss vector
  $\ell_t^- = \min_k \ell_{t,k}$, $\ell_t^+ = \max_k \ell_{t,k}$ — Min and max loss
  $s_t = \ell_t^+ - \ell_t^-$ — Loss range
  $w_{t,k}^{\mathrm{alg}} = e^{-\eta_t^{\mathrm{alg}}L_{t-1,k}} \big/ \sum_k e^{-\eta_t^{\mathrm{alg}}L_{t-1,k}}$ — Weights played
  $h_t^{\mathrm{alg}} = \mathbf{w}_t^{\mathrm{alg}} \cdot \boldsymbol{\ell}_t$ — Hedge loss
  $m_t^{\mathrm{alg}} = -\frac{1}{\eta_t^{\mathrm{alg}}}\ln\big(\mathbf{w}_t^{\mathrm{alg}} \cdot e^{-\eta_t^{\mathrm{alg}}\boldsymbol{\ell}_t}\big)$ — Mix loss
  $\delta_t^{\mathrm{alg}} = h_t^{\mathrm{alg}} - m_t^{\mathrm{alg}}$ — Mixability gap
  $v_t^{\mathrm{alg}} = \mathrm{Var}_{k \sim \mathbf{w}_t^{\mathrm{alg}}}[\ell_{t,k}]$ — Loss variance

Aggregate quantities after $t$ rounds (the final time $T$ is omitted from the subscript where possible, e.g. $L^* = L_T^*$):
  $L_{t,k}$, $L_t^-$, $L_t^+$, $H_t^{\mathrm{alg}}$, $M_t^{\mathrm{alg}}$, $\Delta_t^{\mathrm{alg}}$, $V_t^{\mathrm{alg}}$ — sums over $\tau = 1, \ldots, t$ of $\ell_{\tau,k}$, $\ell_\tau^-$, $\ell_\tau^+$, $h_\tau^{\mathrm{alg}}$, $m_\tau^{\mathrm{alg}}$, $\delta_\tau^{\mathrm{alg}}$, $v_\tau^{\mathrm{alg}}$
  $S_t = \max\{s_1, \ldots, s_t\}$ — Maximum loss range
  $L_t^* = \min_k L_{t,k}$ — Cumulative loss of the best expert
  $R_t^{\mathrm{alg}} = H_t^{\mathrm{alg}} - L_t^*$ — Regret

Algorithms (the "alg" in the superscript above):
  $(\eta)$ — Hedge with fixed learning rate $\eta$
  ah — AdaHedge, defined by (8)
  fl — Follow-the-Leader ($\eta^{\mathrm{fl}} = \infty$)
  ff — FlipFlop, defined by (16)

From now on, AdaHedge will be defined as the Hedge algorithm with learning rate defined by (8). For concreteness, a matlab implementation appears in Figure 1. Our learning rate is similar to that of Cesa-Bianchi et al. (2007), but it is less pessimistic as it is based on the mixability gap itself rather than its bound, and as such may exploit easy sequences of losses more aggressively. Moreover our tuning of the learning rate simplifies the analysis, leading to tighter results; the essential new technical ingredients appear as Lemmas 3, 5 and 7 below.

We analyse the regret for AdaHedge like we did for a fixed learning rate in the previous section: we again consider $M^{\mathrm{ah}} - L^*$ and $\Delta^{\mathrm{ah}}$ separately. This time, both legs of the analysis become slightly more involved. Luckily, a good bound can still be obtained with only a small amount of work. First we show that the mix loss is bounded by the mix loss we would have incurred if we would have used the final learning rate $\eta_T^{\mathrm{ah}}$ all along:

Lemma 2 Let "dec" be any strategy for choosing the learning rate such that $\eta_1 \ge \eta_2 \ge \ldots$ Then the cumulative mix loss for dec does not exceed the cumulative mix loss for the strategy that uses the last learning rate $\eta_T$ from the start: $M^{\mathrm{dec}} \le M^{(\eta_T)}$.
```matlab
% Returns the losses of AdaHedge.
% l(t,k) is the loss of expert k at time t
function h = adahedge(l)
    [T, K] = size(l);
    h = nan(T,1);
    L = zeros(1,K);
    Delta = 0;

    for t = 1:T
        eta = log(K)/Delta;
        [w, Mprev] = mix(eta, L);
        h(t) = w * l(t,:)';
        L = L + l(t,:);
        [~, M] = mix(eta, L);
        delta = max(0, h(t)-(M-Mprev)); % max clips numeric Jensen violation
        Delta = Delta + delta;
    end
end

% Returns the posterior weights and mix loss
% for learning rate eta and cumulative loss
% vector L, avoiding numerical instability.
function [w, M] = mix(eta, L)
    mn = min(L);
    if (eta == Inf) % Limit behaviour: FTL
        w = L==mn;
    else
        w = exp(-eta .* (L-mn));
    end
    s = sum(w);
    w = w / s;
    M = mn - log(s/length(L))/eta;
end
```

Figure 1: Numerically robust matlab implementation of AdaHedge

This lemma was first proved in its current form by Kalnishkan and Vyugin (2005, Lemma 3), and an essentially equivalent bound was introduced by Györfi and Ottucsák (2007) in the proof of their Lemma 1. Related techniques for dealing with time-varying learning rates go back to Auer et al. (2002).

Proof Using mix loss property #4, we have

$$M_T^{\mathrm{dec}} = \sum_{t=1}^T m_t^{\mathrm{dec}} = \sum_{t=1}^T\Big(M_t^{(\eta_t)} - M_{t-1}^{(\eta_t)}\Big) \le \sum_{t=1}^T\Big(M_t^{(\eta_t)} - M_{t-1}^{(\eta_{t-1})}\Big) = M_T^{(\eta_T)},$$

which was to be shown.

We can now show that the two contributions to the regret are still balanced.

Lemma 3 The AdaHedge regret is $R^{\mathrm{ah}} = M^{\mathrm{ah}} - L^* + \Delta^{\mathrm{ah}} \le 2\Delta^{\mathrm{ah}}$.

Proof As $\delta_t^{\mathrm{ah}} \ge 0$ for all $t$ (by mix loss property #1), the cumulative mixability gap $\Delta_t^{\mathrm{ah}}$ is nondecreasing. Consequently, the AdaHedge learning rate $\eta_t^{\mathrm{ah}}$ as defined in (8) is nonincreasing in $t$. Thus Lemma 2 applies to $M^{\mathrm{ah}}$; together with mix loss property #3 and (8) this yields

$$M^{\mathrm{ah}} \le M^{(\eta_T^{\mathrm{ah}})} \le L^* + \frac{\ln K}{\eta_T^{\mathrm{ah}}} = L^* + \Delta_{T-1}^{\mathrm{ah}} \le L^* + \Delta_T^{\mathrm{ah}}.$$

Substitution into the trivial decomposition $R^{\mathrm{ah}} = M^{\mathrm{ah}} - L^* + \Delta^{\mathrm{ah}}$ yields the result.
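As an aside, here is a hypothetical usage example of the adahedge function of Figure 1 (our own illustration: it assumes the figure's code is saved on the matlab path as adahedge.m, and the random data and all constants are invented; the implicit column expansion requires R2016b or later).

```matlab
% Run AdaHedge on T=1000 rounds of random binary losses for K=2 experts,
% where expert 2 is better on average (mean loss 0.4 versus 0.6).
rng(0);
l = double(rand(1000, 2) < [0.6 0.4]);     % column k is Bernoulli with the given mean
h = adahedge(l);
regret = cumsum(h) - min(cumsum(l), [], 2);
fprintf('AdaHedge regret after T=1000: %.2f\n', regret(end));
```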
The remaining task is to establish a bound on $\Delta^{\mathrm{ah}}$. As before, we start with a bound on the mixability gap in a single round, but rather than (6), we use Bernstein's bound on the mixability gap in a single round to obtain a result that is expressed in terms of the variance of the losses, $v_t^{\mathrm{ah}} = \mathrm{Var}_{k \sim \mathbf{w}_t^{\mathrm{ah}}}[\ell_{t,k}] = \sum_k w_{t,k}^{\mathrm{ah}}(\ell_{t,k} - h_t^{\mathrm{ah}})^2$.

Lemma 4 (Bernstein's Bound) Let $\eta_t = \eta_t^{\mathrm{alg}} \in (0, \infty)$ denote the finite learning rate chosen for round $t$ by any algorithm "alg". The mixability gap $\delta_t^{\mathrm{alg}}$ satisfies

$$\delta_t^{\mathrm{alg}} \le \frac{g(s_t\eta_t)}{s_t}v_t^{\mathrm{alg}}, \qquad \text{where } g(x) = \frac{e^x - x - 1}{x}. \qquad (10)$$

Further, $v_t^{\mathrm{alg}} \le (\ell_t^+ - h_t^{\mathrm{alg}})(h_t^{\mathrm{alg}} - \ell_t^-) \le s_t^2/4$.

Proof This is Bernstein's bound (Cesa-Bianchi and Lugosi, 2006, Lemma A.5) on the cumulant generating function, applied to the random variable $(\ell_{t,k} - \ell_t^-)/s_t \in [0, 1]$ with $k$ distributed according to $\mathbf{w}_t^{\mathrm{alg}}$.

Bernstein's bound is more sophisticated than Hoeffding's bound (6), because it expresses that the mixability gap $\delta_t$ is small not only when $\eta_t$ is small, but also when all experts have approximately the same loss, or when the weights $\mathbf{w}_t$ are concentrated on a single expert.

The next step is to use Bernstein's inequality to obtain a bound on the cumulative mixability gap $\Delta^{\mathrm{ah}}$. In the analysis of Cesa-Bianchi et al. (2007) this is achieved by first applying Bernstein's bound for each individual round, and then using a telescoping argument to obtain a bound on the sum. With our learning rate (8) it is convenient to reverse these steps: we first telescope, which can now be done with equality, and subsequently apply Bernstein's inequality in a stricter way.

Lemma 5 AdaHedge's cumulative mixability gap satisfies

$$\big(\Delta^{\mathrm{ah}}\big)^2 \le V^{\mathrm{ah}}\ln K + \big(\tfrac{2}{3}\ln K + 1\big)S\Delta^{\mathrm{ah}}.$$

Proof In this proof we will omit the superscript "ah". Using the definition of the learning rate (8) and $\delta_t \le s_t$ (from mix loss property #1), we get

$$\Delta_T^2 = \sum_{t=1}^T\big(\Delta_t^2 - \Delta_{t-1}^2\big) = \sum_t\big(2\delta_t\Delta_{t-1} + \delta_t^2\big) = \sum_t\Big(2\frac{\ln K}{\eta_t}\delta_t + \delta_t^2\Big) \le \sum_t\Big(2\frac{\ln K}{\eta_t}\delta_t + s_t\delta_t\Big) \le 2\ln K\sum_t\frac{\delta_t}{\eta_t} + S\Delta_T. \qquad (11)$$

The inequalities in this equation replace a $\delta_t$ term by $S$, which is of no concern: the resulting term $S\Delta_T$ adds at most $2S$ to the regret bound. We will now show

$$\frac{\delta_t}{\eta_t} \le \tfrac{1}{2}v_t + \tfrac{1}{3}s_t\delta_t. \qquad (12)$$

This supersedes the bound $\delta_t/\eta_t \le (e - 2)v_t$ for $\eta_ts_t \le 1$ used by Cesa-Bianchi et al. (2007). Even though at first sight circular, the form (12) has two major advantages. First, inclusion of the overhead $\frac{1}{3}s_t\delta_t$ will only affect smaller order terms of the regret, but admits a reduction of the leading constant to the optimal factor $\frac{1}{2}$. This gain directly percolates to our regret bounds below. Second, (12) holds for unbounded $\eta_t$, which simplifies tuning considerably.
First note that (12) is clearly valid if $\eta_t = \infty$. Assuming that $\eta_t$ is finite, we can obtain this result by rewriting Bernstein's bound (10) as follows:

$$\tfrac{1}{2}v_t \ge \frac{s_t\delta_t}{2g(s_t\eta_t)} = \frac{\delta_t}{\eta_t} - f(s_t\eta_t)s_t\delta_t, \qquad \text{where } f(x) = \frac{e^x - \tfrac{1}{2}x^2 - x - 1}{xe^x - x^2 - x}.$$

It remains to show that $f(x) \le 1/3$ for all $x \ge 0$. After rearranging, we find this to be the case if $(3 - x)e^x \le \tfrac{1}{2}x^2 + 2x + 3$. Taylor expansion of the left-hand side around zero reveals that $(3 - x)e^x = \tfrac{1}{2}x^2 + 2x + 3 - \tfrac{1}{6}x^3ue^u$ for some $0 \le u \le x$, from which the result follows. The proof is completed by plugging (12) into (11) and finally relaxing $s_t \le S$.

Combination of these results yields the following natural regret bound, analogous to Theorem 5 of Cesa-Bianchi et al. (2007).

Theorem 6 AdaHedge's regret is bounded by

$$R^{\mathrm{ah}} \le 2\sqrt{V^{\mathrm{ah}}\ln K} + S\big(\tfrac{4}{3}\ln K + 2\big).$$

Proof Lemma 5 is of the form

$$\big(\Delta^{\mathrm{ah}}\big)^2 \le a + b\Delta^{\mathrm{ah}}, \qquad (13)$$

with $a$ and $b$ nonnegative numbers. Solving for $\Delta^{\mathrm{ah}}$ then gives

$$\Delta^{\mathrm{ah}} \le \tfrac{1}{2}b + \tfrac{1}{2}\sqrt{b^2 + 4a} \le \tfrac{1}{2}b + \tfrac{1}{2}\big(b + 2\sqrt{a}\big) = \sqrt{a} + b,$$

which by Lemma 3 implies that

$$R^{\mathrm{ah}} \le 2\sqrt{a} + 2b.$$

Plugging in the values $a = V^{\mathrm{ah}}\ln K$ and $b = S(\tfrac{2}{3}\ln K + 1)$ from Lemma 5 completes the proof.

This first regret bound for AdaHedge is difficult to interpret, because the cumulative loss variance $V^{\mathrm{ah}}$ depends on the actions of the AdaHedge strategy itself (through the weights $\mathbf{w}_t^{\mathrm{ah}}$). Below, we will derive a regret bound for AdaHedge that depends only on the data. However, AdaHedge has one important property that is captured by this first result that is no longer expressed by the worst-case bound we will derive below. Namely, if the data are easy in the sense that there is a clear best expert, say $k^*$, then the weights played by AdaHedge will concentrate on that expert. If $w_{t,k^*}^{\mathrm{ah}} \to 1$ as $t$ increases, then the loss variance must decrease: $v_t^{\mathrm{ah}} \to 0$. Thus, Theorem 6 suggests that the AdaHedge regret may be bounded if the weights concentrate on the best expert sufficiently quickly. This indeed turns out to be the case: we can prove that the regret is bounded for the stochastic setting where the loss vectors $\boldsymbol{\ell}_t$ are independent, and $\mathbb{E}[L_{t,k} - L_{t,k^*}] = \Omega(t^\beta)$ for all $k \ne k^*$ and any $\beta > 1/2$. This is an important feature of AdaHedge when it is used as a stand-alone algorithm, and Van Erven et al. (2011) provide a proof for the previous version of the strategy.
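The concentration effect is easy to observe numerically. The following self-contained sketch (our own illustration; all constants are invented, and the mix loss computation mirrors Figure 1) tracks the AdaHedge weight of the better of two experts under i.i.d. losses with a gap in means, which corresponds to $\beta = 1$ above.

```matlab
% Concentration sketch: with i.i.d. losses and a gap in mean losses,
% the AdaHedge weight of the better expert tends to one.
rng(0); T = 10000; K = 2;
l = double(rand(T, K) < [0.55 0.45]);    % expert 2 has the smaller mean loss
Delta = 0; L = zeros(1, K); w2 = nan(T, 1);
for t = 1:T
    eta = log(K) / Delta;                % eta = Inf while Delta == 0
    if isinf(eta), w = (L == min(L)); else, w = exp(-eta * (L - min(L))); end
    w = w / sum(w);
    w2(t) = w(2);                        % weight on the better expert
    h = w * l(t,:)';
    Lnew = L + l(t,:);
    if isinf(eta)
        m = min(Lnew) - min(L);          % limiting mix loss (FTL)
    else
        m = -log(w * exp(-eta * l(t,:))') / eta;
    end
    Delta = Delta + max(0, h - m);
    L = Lnew;
end
plot(w2);                                % tends to 1 as t increases
```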
See Section 5.4 for an example of concentration of the AdaHedge weights. Here we will not pursue this further, because the Follow-the-Leader strategy also incurs bounded loss in that case; we rather focus attention on how to successfully compete with FTL in Section 3.

We now proceed to derive a bound that depends only on the data, using an approach similar to the one taken by Cesa-Bianchi et al. (2007). We first bound the cumulative loss variance as follows:

Lemma 7 Assume $L^* \le H^{\mathrm{ah}}$. The cumulative loss variance for AdaHedge satisfies

$$V^{\mathrm{ah}} \le S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-} + 2S\Delta^{\mathrm{ah}}.$$

In the degenerate case $L^- = L^+$ the fraction reads $0/0$, but since we then have $V^{\mathrm{ah}} = 0$, from here on we define the ratio to be zero in that case, which is also its limiting value.

Proof We omit all "ah" superscripts. By Lemma 4 we have $v_t \le (\ell_t^+ - h_t)(h_t - \ell_t^-)$. Now

$$V = \sum_{t=1}^T v_t \le \sum_{t=1}^T(\ell_t^+ - h_t)(h_t - \ell_t^-) \le S\sum_{t=1}^T\frac{(\ell_t^+ - h_t)(h_t - \ell_t^-)}{(\ell_t^+ - h_t) + (h_t - \ell_t^-)} \le S\,\frac{(L^+ - H)(H - L^-)}{L^+ - L^-}, \qquad (14)$$

where the last inequality is an instance of Jensen's inequality applied to the function $B$ defined on the domain $x, y \ge 0$ by $B(x, y) = \frac{xy}{x+y}$ for $xy > 0$ and $B(x, y) = 0$ for $xy = 0$ to ensure continuity. To verify that $B$ is jointly concave, we will show that the Hessian is negative semi-definite on the interior $xy > 0$. Concavity on the whole domain then follows from continuity. The Hessian, which turns out to be the rank one matrix

$$\nabla^2B(x, y) = -\frac{2}{(x+y)^3}\begin{pmatrix}y \\ -x\end{pmatrix}\begin{pmatrix}y \\ -x\end{pmatrix}^{\!\top},$$

is negative semi-definite since it is a negative scaling of a positive outer product.

Subsequently using $H \ge L^*$ (by assumption) and $H \le L^* + 2\Delta$ (by Lemma 3) yields

$$\frac{(L^+ - H)(H - L^-)}{L^+ - L^-} \le \frac{(L^+ - L^*)(L^* + 2\Delta - L^-)}{L^+ - L^-} \le \frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-} + 2\Delta,$$

as desired.

This can be combined with Lemmas 5 and 3 to obtain our first main result:

Theorem 8 (AdaHedge Worst-Case Regret Bound) AdaHedge's regret is bounded by

$$R^{\mathrm{ah}} \le 2\sqrt{S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K} + S\big(\tfrac{16}{3}\ln K + 2\big). \qquad (15)$$
Proof If $H^{\mathrm{ah}} < L^*$, then $R^{\mathrm{ah}} < 0$ and the result is clearly valid. But if $H^{\mathrm{ah}} \ge L^*$, we can bound $V^{\mathrm{ah}}$ using Lemma 7 and plug the result into Lemma 5 to get an inequality of the form (13) with $a = S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K$ and $b = S(\tfrac{8}{3}\ln K + 1)$. Following the steps of the proof of Theorem 6 with these modified values for $a$ and $b$ we arrive at the desired result.

This bound has several useful properties:

1. It is always smaller than the CBMS bound (1), with a leading constant that has been reduced from the previously best-known value of 2.63 to 2. To see this, note that (15) increases to (1) if we replace $L^+$ by the upper bound $L^- + ST$. It can be substantially stronger than (1) if the range of the losses $s_t$ is highly variable.

2. The bound is fundamental, a concept discussed in detail by Cesa-Bianchi et al. (2007): it is invariant to translations of the losses and proportional to their scale. It is therefore valid for arbitrary loss ranges, regardless of sign. In fact, not just the bound, but AdaHedge itself is fundamental in this sense: see Section 4 for a discussion and proof.

3. The regret is small when the best expert either has a very low loss, or a very high loss. The latter is important if the algorithm is to be used for a scenario in which we are provided with a sequence of gain vectors $\mathbf{g}_t$ rather than losses: we can transform these gains into losses using $\boldsymbol{\ell}_t = -\mathbf{g}_t$, and then run AdaHedge. The bound then implies that we incur small regret if the best expert has very small cumulative gain relative to the minimum gain.

4. The bound is not dependent on the number of trials but only on the losses; it is a "timeless" bound as discussed below.

2.3 What are Timeless Bounds?

All bounds presented for AdaHedge (and FlipFlop) are timeless. We call a regret bound timeless if it does not change under insertion of additional trials where all experts are assigned the same loss. Intuitively, the prediction task does not become more difficult if nature should insert same-loss trials. Since these trials do nothing to differentiate between the experts, they can safely be ignored by the learner without affecting her regret; in fact, many Hedge strategies, including Hedge with a fixed learning rate, FTL, AdaHedge and CBMS already have the property that their future behaviour does not change under such insertions: they are robust against such time dilation. If any strategy does not have this property by itself, it can easily be modified to ignore equal-loss trials.

It is easy to imagine practical scenarios where this robustness property would be important. For example, suppose you hire a number of experts who continually monitor the assets in your portfolio. Usually they do not recommend any changes, but occasionally, when they see a rare opportunity or receive subtle warning signs, they may urge you to trade, resulting in a potentially very large gain or loss. It seems only beneficial to poll the experts often, and there is no reason why the many resulting equal-loss trials should complicate the learning task.
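This robustness is easy to test empirically. The following sketch (our own illustration, assuming the adahedge function of Figure 1; the loss values are invented) inserts one equal-loss trial and checks that the losses incurred in all other rounds are unchanged.

```matlab
% Timelessness sanity check (sketch): inserting a trial where both experts
% suffer the same loss should not change AdaHedge's behaviour elsewhere.
l  = [0 1; 1 0; 0.3 0.7; 1 0];
l2 = [0 1; 1 0; 5 5; 0.3 0.7; 1 0];    % equal-loss trial inserted as round 3
h  = adahedge(l);
h2 = adahedge(l2);
disp(max(abs(h2([1 2 4 5]) - h)));     % ~0: the other rounds are unaffected
disp(h2(3));                           % equals 5: the common loss is unavoidable
```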
The oldest bounds for Hedge scale with $\sqrt{T}$ or $\sqrt{L^*}$, and are thus not timeless. From the results above we can obtain fundamental and timeless variants with, for parameterless algorithms, the best known leading constants (the first item below follows Corollary 1 of Cesa-Bianchi et al. 2007):

Corollary 9 The AdaHedge regret satisfies the following inequalities:

$$R^{\mathrm{ah}} \le \sqrt{\sum_{t=1}^T s_t^2\ln K} + S\big(\tfrac{4}{3}\ln K + 2\big) \qquad \text{(analogue of traditional $\sqrt{T}$-based bounds)},$$
$$R^{\mathrm{ah}} \le 2\sqrt{S(L^* - L^-)\ln K} + S\big(\tfrac{16}{3}\ln K + 2\big) \qquad \text{(analogue of traditional $\sqrt{L^*}$-based bounds)},$$
$$R^{\mathrm{ah}} \le 2\sqrt{S(L^+ - L^*)\ln K} + S\big(\tfrac{16}{3}\ln K + 2\big) \qquad \text{(symmetric bound, useful for gains)}.$$

Proof We could get a bound that depends only on the loss ranges $s_t$ by substituting the worst case $L^* = (L^+ + L^-)/2$ into Theorem 8, but a sharper result is obtained by plugging the inequality $v_t \le s_t^2/4$ from Lemma 4 directly into Theorem 6. This yields the first item above. The other two inequalities follow easily from Theorem 8.

In the next section, we show how we can compete with FTL while at the same time maintaining all these worst-case guarantees up to a constant factor.

3. FlipFlop

AdaHedge balances the cumulative mixability gap $\Delta^{\mathrm{ah}}$ and the mix loss regret $M^{\mathrm{ah}} - L^*$ by reducing $\eta^{\mathrm{ah}}$ as necessary. But, as we observed previously, if the data are not hopelessly adversarial we might not need to worry about the mixability gap: as Lemma 4 expresses, $\delta_t^{\mathrm{ah}}$ is also small if the variance $v_t^{\mathrm{ah}}$ of the loss under the weights $w_{t,k}^{\mathrm{ah}}$ is small, which is the case if the weight on the best expert $\max_k w_{t,k}^{\mathrm{ah}}$ becomes close to one.

AdaHedge is able to exploit such a lucky scenario to an extent: as explained in the discussion that follows Theorem 6, if the weight of the best expert goes to one quickly, AdaHedge will have a small cumulative mixability gap, and therefore, by Lemma 3, a small regret. This happens, for example, in the stochastic setting with independent, identically distributed losses, when a single expert has the smallest expected loss. Similarly, in the experiment of Section 5.4, the AdaHedge weights concentrate sufficiently quickly for the regret to be bounded.

There is the potential for a nasty feedback loop, however. Suppose there are a small number of difficult early trials, during which the cumulative mixability gap increases relatively quickly. AdaHedge responds by reducing the learning rate (8), with the effect that the weights on the experts become more uniform. As a consequence, the mixability gap in future trials may be larger than what it would have been if the learning rate had stayed high, leading to further unnecessary reductions of the learning rate, and so on. The end result may be that AdaHedge behaves as if the data are difficult and incurs substantial regret, even in cases where the regret of Hedge with a fixed high learning rate, or of Follow-the-Leader, is bounded! Precisely this phenomenon occurs in the experiment in Section 5.2 below: AdaHedge's regret is close to the worst-case bound, whereas FTL hardly incurs any regret at all.
It appears, then, that we must either hope that the data are easy enough that we can make the weights concentrate quickly on a single expert, by not reducing the learning rate at all; or we fear the worst and reduce the learning rate as much as we need to be able to provide good guarantees. We cannot really interpolate between these two extremes: an intermediate learning rate may not yield small regret in favourable cases and may at the same time destroy any performance guarantees in the worst case. It is unclear a priori whether we can get away with keeping the learning rate high, or whether it is wiser to play it safe using AdaHedge. The most extreme case of keeping the learning rate high is the limit as $\eta$ tends to $\infty$, for which Hedge reduces to Follow-the-Leader.

In this section we work out a strategy that combines the advantages of FTL and AdaHedge: it retains AdaHedge's worst-case guarantees up to a constant factor, but its regret is also bounded by a constant times the regret of FTL (Theorem 15). Perhaps surprisingly, this is not easy to achieve. To see why, imagine a scenario where the average loss of the best expert is substantial, whereas the regret of either Follow-the-Leader or AdaHedge is small. Since our combination has to guarantee a similarly small regret, it has only a very limited margin for error. We cannot, for example, simply combine the two algorithms by recursively plugging them into Hedge with a fixed learning rate, or into AdaHedge: the performance guarantees we have for those methods of combination are too weak. Even if both FTL and AdaHedge yield small regret on the original problem, choosing the actions of FTL for some rounds and those of AdaHedge for the other rounds may fail if we do it naively, because the regret is not necessarily increasing, and we may end up picking each algorithm precisely in those rounds where the other one is better.

Luckily, alternating between the optimistic FTL strategy and the worst-case-proof AdaHedge does turn out to be possible if we do it in a careful way. In this section we explain the appropriate strategy, called FlipFlop (superscript: "ff"), and show that it combines the desirable properties of both FTL and AdaHedge.

3.1 Exploiting Easy Data by Following the Leader

We first investigate the potential benefits of FTL over AdaHedge. Lemma 10 below identifies the circumstances under which FTL will perform well, which is when the number of leader changes is small. It also shows that the regret for FTL is equal to its cumulative mixability gap when FTL is interpreted as a Hedge strategy with infinite learning rate.

Lemma 10 Let $c_t$ be an indicator for a leader change at time $t$: define $c_t = 1$ if there exists an expert $k$ such that $L_{t-1,k} = L_{t-1}^*$ while $L_{t,k} \ne L_t^*$, and $c_t = 0$ otherwise. Let $C_t = c_1 + \ldots + c_t$ be the cumulative number of leader changes. Then the FTL regret satisfies $R^{\mathrm{fl}} = \Delta^{(\infty)} \le SC_T$.

Proof We have $M^{(\infty)} = L^*$ by mix loss property #3, and consequently $R^{\mathrm{fl}} = \Delta^{(\infty)} + M^{(\infty)} - L^* = \Delta^{(\infty)}$. To bound $\Delta^{(\infty)}$, notice that, for any $t$ such that $c_t = 0$, all leaders remained leaders and incurred identical loss. It follows that $m_t^{(\infty)} = L_t^* - L_{t-1}^* = h_t^{(\infty)}$ and hence $\delta_t^{(\infty)} = 0$. By bounding $\delta_t^{(\infty)} \le s_t \le S$ for all other $t$ we obtain

$$\Delta^{(\infty)} = \sum_{t=1}^T\delta_t^{(\infty)} = \sum_{t:\,c_t=1}\delta_t^{(\infty)} \le \sum_{t:\,c_t=1}S = SC_T,$$

as required.
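The leader-change count $C_T$ of Lemma 10 is directly computable from the data. A minimal sketch (our own illustration, following the definition of $c_t$ above):

```matlab
% Count leader changes C_T; by Lemma 10 the FTL regret is at most S*C_T.
function C = leader_changes(l)
    [T, K] = size(l);
    L = zeros(1, K);
    C = 0;
    for t = 1:T
        prev_leaders = (L == min(L));      % leaders before round t
        L = L + l(t,:);
        dropped = prev_leaders & (L ~= min(L));
        C = C + any(dropped);              % c_t = 1 if some leader was overtaken
    end
end
```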
We see that the regret for FTL is bounded by the number of leader changes. This quantity is both fundamental and timeless. It is a natural measure of the difficulty of the problem, because it remains small whenever a single expert makes the best predictions on average, even in the scenario described above, in which AdaHedge gets caught in a feedback loop.

One example where FTL outperforms AdaHedge is when the losses for two experts are $(1, 0)$ on the first round, and keep alternating according to $(1, 0), (0, 1), (1, 0), \ldots$ for the remainder of the rounds. Then the FTL regret is only $1/2$, whereas AdaHedge's performance is close to the worst-case bound (because its weights $\mathbf{w}_t^{\mathrm{ah}}$ converge to $(1/2, 1/2)$, for which the bound (6) on the mixability gap is tight). This scenario is illustrated further in the experiments, Section 5.2.

3.2 FlipFlop

FlipFlop is a Hedge strategy in the sense that it uses exponential weights defined by (9), but the learning rate $\eta_t^{\mathrm{ff}}$ now alternates between infinity, such that the algorithm behaves like FTL, and the AdaHedge value, which decreases as a function of the mixability gap accumulated over the rounds where AdaHedge is used. In Definition 11 below, we will specify the "flip" regime $\overline{\mathcal{R}}_t$, which is the subset of times $\{1, \ldots, t\}$ where we follow the leader by using an infinite learning rate, and the "flop" regime $\underline{\mathcal{R}}_t = \{1, \ldots, t\} \setminus \overline{\mathcal{R}}_t$, which is the set of times where the learning rate is determined by AdaHedge (mnemonic: the position of the bar refers to the value of the learning rate). We accumulate the mixability gap, the mix loss and the variance for these two regimes separately:

$$\overline{\Delta}_t = \sum_{\tau \in \overline{\mathcal{R}}_t}\delta_\tau^{\mathrm{ff}}; \qquad \overline{M}_t = \sum_{\tau \in \overline{\mathcal{R}}_t}m_\tau^{\mathrm{ff}}; \qquad \text{(flip)}$$
$$\underline{\Delta}_t = \sum_{\tau \in \underline{\mathcal{R}}_t}\delta_\tau^{\mathrm{ff}}; \qquad \underline{M}_t = \sum_{\tau \in \underline{\mathcal{R}}_t}m_\tau^{\mathrm{ff}}; \qquad \underline{V}_t = \sum_{\tau \in \underline{\mathcal{R}}_t}v_\tau^{\mathrm{ff}}. \qquad \text{(flop)}$$

We also change the learning rate from its definition for AdaHedge in (8) to the following, which differentiates between the two regimes of the strategy:

$$\eta_t^{\mathrm{ff}} = \begin{cases}\eta_t^{\mathrm{flip}} & \text{if } t \in \overline{\mathcal{R}}_t,\\ \eta_t^{\mathrm{flop}} & \text{if } t \in \underline{\mathcal{R}}_t,\end{cases} \qquad \text{where } \eta_t^{\mathrm{flip}} = \eta^{\mathrm{fl}} = \infty \text{ and } \eta_t^{\mathrm{flop}} = \frac{\ln K}{\underline{\Delta}_{t-1}}. \qquad (16)$$

Like for AdaHedge, $\eta_t^{\mathrm{flop}} = \infty$ as long as $\underline{\Delta}_{t-1} = 0$, which now happens for all $t$ such that $\underline{\mathcal{R}}_{t-1} = \emptyset$. Note that while the learning rates are defined separately for the two regimes, the exponential weights (9) of the experts are still always determined using the cumulative losses $L_{t,k}$ over all rounds. We also point out that, for rounds $t \in \underline{\mathcal{R}}_t$, the learning rate $\eta_t^{\mathrm{ff}} = \eta_t^{\mathrm{flop}}$ is not equal to $\eta_t^{\mathrm{ah}}$, because it uses $\underline{\Delta}_{t-1}$ instead of $\Delta_{t-1}^{\mathrm{ah}}$. For this reason, the FlipFlop regret may be either better or worse than the AdaHedge regret; our results below only preserve the regret bound up to a constant factor. In contrast, we do compete with the actual regret of FTL.
```matlab
% Returns the losses of FlipFlop.
% l(t,k) is the loss of expert k at time t; phi > 1 and alpha > 0 are parameters
function h = flipflop(l, alpha, phi)
    [T, K] = size(l);
    h = nan(T,1);
    L = zeros(1,K);
    Delta = [0 0];
    scale = [phi/alpha alpha];
    regime = 1; % 1=FTL, 2=AH

    for t = 1:T
        if regime==1, eta = Inf; else eta = log(K)/Delta(2); end
        [w, Mprev] = mix(eta, L);
        h(t) = w * l(t,:)';
        L = L + l(t,:);
        [~, M] = mix(eta, L);
        delta = max(0, h(t)-(M-Mprev));
        Delta(regime) = Delta(regime) + delta;
        if Delta(regime) > scale(regime) * Delta(3-regime)
            regime = 3-regime;
        end
    end
end
```

Figure 2: FlipFlop, with new ingredients in boldface (the mix function is as in Figure 1)

It remains to define the flip regime $\overline{\mathcal{R}}_t$ and the flop regime $\underline{\mathcal{R}}_t$, which we will do by specifying the times at which to switch from one to the other. FlipFlop starts optimistically, with an epoch of the flip regime, which means it follows the leader, until $\overline{\Delta}_t$ becomes too large compared to $\underline{\Delta}_t$. At that point it switches to an epoch of the flop regime, and keeps using $\eta^{\mathrm{flop}}$ until $\underline{\Delta}_t$ becomes too large compared to $\overline{\Delta}_t$. Then the process repeats with the next epochs of the flip and flop regimes. The regimes are determined as follows:

Definition 11 (FlipFlop's Regimes) Let $\varphi > 1$ and $\alpha > 0$ be parameters of the algorithm (tuned below in Corollary 16). Then FlipFlop starts in the flip regime. If $t$ is the earliest time since the start of a flip epoch where $\overline{\Delta}_t > (\varphi/\alpha)\underline{\Delta}_t$, then the transition to the subsequent flop epoch occurs between rounds $t$ and $t + 1$. (Recall that during flip epochs $\overline{\Delta}_t$ increases in $t$ whereas $\underline{\Delta}_t$ is constant.) Vice versa, if $t$ is the earliest time since the start of a flop epoch where $\underline{\Delta}_t > \alpha\overline{\Delta}_t$, then the transition to the subsequent flip epoch occurs between rounds $t$ and $t + 1$.

This completes the definition of the FlipFlop strategy. See Figure 2 for a matlab implementation.
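A hypothetical side-by-side run (our own illustration; it assumes flipflop above and adahedge plus mix from Figure 1 are on the path, and uses $\alpha \approx 1.243$, our own numerical evaluation of the tuning suggested by Corollary 16 below):

```matlab
% Compare FlipFlop and AdaHedge on the best-case-for-FTL data of Section 5.2.
T = 1000;
l = repmat([0 1; 1 0], T/2, 1);   % alternating losses for t >= 2
l(1,:) = [1 0];                   % initial loss vector of Experiment 2
h_ff = flipflop(l, 1.243, 2.37);
h_ah = adahedge(l);
Lstar = min(cumsum(l), [], 2);
fprintf('FlipFlop regret: %.2f, AdaHedge regret: %.2f\n', ...
        sum(h_ff) - Lstar(end), sum(h_ah) - Lstar(end));
```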
The analysis proceeds much like the analysis for AdaHedge. We first show that, analogously to Lemma 3, the FlipFlop regret can be bounded in terms of the cumulative mixability gap; in fact, we can use the smallest cumulative mixability gap that we encountered in either of the two regimes, at the cost of slightly increased constant factors. This is the fundamental building block in our FlipFlop analysis. We then proceed to develop analogues of Lemmas 5 and 7, whose proofs do not have to be changed much to apply to FlipFlop. Finally, all these results are combined to bound the regret of FlipFlop in Theorem 15, which, after Theorem 8, is the second main result of this paper.

Lemma 12 (FlipFlop version of Lemma 3) The following two bounds hold simultaneously for the regret of the FlipFlop strategy with parameters $\varphi > 1$ and $\alpha > 0$:

$$R^{\mathrm{ff}} \le \Big(\frac{\varphi\alpha}{\varphi - 1} + 2\alpha + 1\Big)\overline{\Delta}_T + \Big(\frac{\varphi}{\varphi - 1} + 2\Big)S; \qquad (17)$$

$$R^{\mathrm{ff}} \le \Big(\frac{\varphi}{\varphi - 1} + \frac{\varphi}{\alpha} + 2\Big)\underline{\Delta}_T + S. \qquad (18)$$

Proof The regret can be decomposed as

$$R^{\mathrm{ff}} = H^{\mathrm{ff}} - L^* = \overline{\Delta} + \underline{\Delta} + \overline{M} + \underline{M} - L^*. \qquad (19)$$

Our first step will be to bound the mix loss $\overline{M} + \underline{M}$ in terms of the mix loss $M^{\mathrm{flop}}$ of the auxiliary strategy that uses $\eta_t^{\mathrm{flop}}$ for all $t$. As $\eta_t^{\mathrm{flop}}$ is nonincreasing, we can then apply Lemma 2 and mix loss property #3 to further bound

$$M^{\mathrm{flop}} \le M^{(\eta_T^{\mathrm{flop}})} \le L^* + \frac{\ln K}{\eta_T^{\mathrm{flop}}} = L^* + \underline{\Delta}_{T-1} \le L^* + \underline{\Delta}_T. \qquad (20)$$

Let $0 = u_1 < u_2 < \ldots < u_b < T$ denote the times just before the epochs of the flip regime begin, i.e. round $u_i + 1$ is the first round in the $i$-th flip epoch. Similarly let $0 < v_1 < \ldots < v_b \le T$ denote the times just before the epochs of the flop regime begin, where we artificially define $v_b = T$ if the algorithm is in the flip regime after $T$ rounds. These definitions ensure that we always have $u_b < v_b \le T$. For the mix loss in the flop regime we have

$$\underline{M} = \big(M_{u_2}^{\mathrm{flop}} - M_{v_1}^{\mathrm{flop}}\big) + \big(M_{u_3}^{\mathrm{flop}} - M_{v_2}^{\mathrm{flop}}\big) + \ldots + \big(M_{u_b}^{\mathrm{flop}} - M_{v_{b-1}}^{\mathrm{flop}}\big) + \big(M_T^{\mathrm{flop}} - M_{v_b}^{\mathrm{flop}}\big). \qquad (21)$$

Let us temporarily write $\eta_t = \eta_t^{\mathrm{flop}}$ to avoid double superscripts. For the flip regime, the properties in Lemma 1, together with the observation that $\eta_t^{\mathrm{flop}}$ does not change during the flip regime (so that $\eta_{v_i} = \eta_{u_i+1} = \ln K/\underline{\Delta}_{u_i}$), give

$$\overline{M} = \sum_{i=1}^b\big(M_{v_i}^{(\infty)} - M_{u_i}^{(\infty)}\big) = \sum_{i=1}^b\big(L_{v_i}^* - L_{u_i}^*\big) \le \sum_{i=1}^b\Big(M_{v_i}^{(\eta_{v_i})} - M_{u_i}^{(\eta_{v_i})} + \frac{\ln K}{\eta_{v_i}}\Big) = \sum_{i=1}^b\big(M_{v_i}^{\mathrm{flop}} - M_{u_i}^{\mathrm{flop}}\big) + \sum_{i=1}^b\underline{\Delta}_{u_i}. \qquad (22)$$

From the definition of the regime changes (Definition 11), we know the value of $\underline{\Delta}_{u_i}$ very accurately at the time $u_i$ of a change from a flop to a flip regime:

$$\underline{\Delta}_{u_i} > \alpha\overline{\Delta}_{u_i} = \alpha\overline{\Delta}_{v_{i-1}} > \varphi\underline{\Delta}_{v_{i-1}} = \varphi\underline{\Delta}_{u_{i-1}}.$$
By unrolling from low to high $i$, we see that

$$\sum_{i=1}^b\underline{\Delta}_{u_i} \le \sum_{i=1}^b\varphi^{i-b}\underline{\Delta}_{u_b} \le \sum_{i=1}^\infty\varphi^{1-i}\underline{\Delta}_{u_b} = \frac{\varphi}{\varphi - 1}\underline{\Delta}_{u_b}.$$

Adding up (21) and (22), we therefore find that the total mix loss is bounded by

$$\overline{M} + \underline{M} \le M^{\mathrm{flop}} + \sum_{i=1}^b\underline{\Delta}_{u_i} \le M^{\mathrm{flop}} + \frac{\varphi}{\varphi - 1}\underline{\Delta}_{u_b} \le L^* + \Big(\frac{\varphi}{\varphi - 1} + 1\Big)\underline{\Delta},$$

where the last inequality uses (20). Combination with (19) yields

$$R^{\mathrm{ff}} \le \Big(\frac{\varphi}{\varphi - 1} + 2\Big)\underline{\Delta} + \overline{\Delta}. \qquad (23)$$

Our next goal is to relate $\overline{\Delta}$ and $\underline{\Delta}$: by construction of the regimes, they are always within a constant factor of each other. First, suppose that after $T$ trials we are in the $b$th epoch of the flip regime, that is, we will behave like FTL in round $T + 1$. In this state, we know from Definition 11 that $\underline{\Delta}$ is stuck at the value $\underline{\Delta}_{u_b}$ that prompted the start of the current epoch. As the regime change happened after $u_b$, we have $\underline{\Delta}_{u_b} - S \le \alpha\overline{\Delta}_{u_b}$, so that $\underline{\Delta} - S \le \alpha\overline{\Delta}$. At the same time, we know that $\overline{\Delta}$ is not large enough to trigger the next regime change. From this we can deduce the following bounds:

$$\frac{1}{\alpha}\big(\underline{\Delta} - S\big) \le \overline{\Delta} \le \frac{\varphi}{\alpha}\underline{\Delta} + S.$$

On the other hand, if after $T$ rounds we are in the $b$th epoch of the flop regime, then a similar reasoning yields

$$\alpha\big(\overline{\Delta} - S\big) \le \underline{\Delta} \le \alpha\overline{\Delta} + S.$$

In both cases, it follows that

$$\overline{\Delta} < \frac{\varphi}{\alpha}\underline{\Delta} + S; \qquad \underline{\Delta} < \alpha\overline{\Delta} + S.$$

The two bounds of the lemma are obtained by plugging first one, then the other of these bounds into (23).

The flop cumulative mixability gap is related, as before, to the variance of the losses.

Lemma 13 (FlipFlop version of Lemma 5) The cumulative mixability gap for the flop regime is bounded by the cumulative variance of the losses for the flop regime:

$$\underline{\Delta}^2 \le \underline{V}\ln K + \big(\tfrac{2}{3}\ln K + 1\big)S\underline{\Delta}. \qquad (24)$$
Proof The proof is analogous to the proof of Lemma 5, with $\underline{\Delta}$ instead of $\Delta^{\mathrm{ah}}$, $\underline{V}$ instead of $V^{\mathrm{ah}}$, and using $\eta_t = \eta_t^{\mathrm{flop}} = \ln(K)/\underline{\Delta}_{t-1}$ instead of $\eta_t = \eta_t^{\mathrm{ah}} = \ln(K)/\Delta_{t-1}^{\mathrm{ah}}$. Furthermore, we only need to sum over the rounds $t \in \underline{\mathcal{R}}_T$ in the flop regime, because $\underline{\Delta}_t$ does not change during the flip regime.

As it is straight-forward to prove an analogue of Theorem 6 for FlipFlop by solving the quadratic inequality in (24), we proceed directly towards establishing an analogue of Theorem 8. The following lemma provides the equivalent of Lemma 7 for FlipFlop. It can probably be strengthened to improve the lower order terms; we provide the version that is easiest to prove.

Lemma 14 (FlipFlop version of Lemma 7) Suppose $H^{\mathrm{ff}} \ge L^*$. The cumulative loss variance for FlipFlop with parameters $\varphi > 1$ and $\alpha > 0$ satisfies

$$\underline{V} \le S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-} + \Big(\frac{\varphi}{\varphi - 1} + \frac{\varphi}{\alpha} + 2\Big)S\underline{\Delta} + S^2.$$

Proof The sum of variances satisfies

$$\underline{V} = \sum_{t \in \underline{\mathcal{R}}_T}v_t^{\mathrm{ff}} \le \sum_{t=1}^Tv_t^{\mathrm{ff}} \le S\,\frac{(L^+ - H^{\mathrm{ff}})(H^{\mathrm{ff}} - L^-)}{L^+ - L^-},$$

where the first inequality simply includes the variances for FTL rounds (which are often all zero), and the second follows from the same reasoning as employed in (14). Subsequently using $L^* \le H^{\mathrm{ff}}$ (by assumption) and, from Lemma 12, $H^{\mathrm{ff}} \le L^* + \gamma$, where $\gamma$ denotes the right-hand side of the bound (18), we find

$$\underline{V} \le S\frac{(L^+ - L^*)(L^* + \gamma - L^-)}{L^+ - L^-} \le S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-} + S\gamma,$$

which was to be shown.

Combining Lemmas 12, 13 and 14, we obtain our second main result:

Theorem 15 (FlipFlop Regret Bound) The regret for FlipFlop with doubling parameters $\varphi > 1$ and $\alpha > 0$ simultaneously satisfies the two bounds

$$R^{\mathrm{ff}} \le \Big(\frac{\varphi\alpha}{\varphi - 1} + 2\alpha + 1\Big)R^{\mathrm{fl}} + \Big(\frac{\varphi}{\varphi - 1} + 2\Big)S,$$

$$R^{\mathrm{ff}} \le c_1\sqrt{S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K} + c_1S\Big(\big(c_1 + \tfrac{2}{3}\big)\ln K + \sqrt{\ln K} + 1\Big) + S, \qquad \text{where } c_1 = \frac{\varphi}{\varphi - 1} + \frac{\varphi}{\alpha} + 2.$$

This shows that, up to a multiplicative factor in the regret, FlipFlop is always as good as the best of Follow-the-Leader and AdaHedge's bound from Theorem 8. Of course, if AdaHedge significantly outperforms its bound, it is not guaranteed that FlipFlop will outperform the bound in the same way.
In the experiments in Section 5 we demonstrate that the multiplicative factor is not just an artifact of the analysis, but can actually be observed on simulated data.

Proof From Lemma 10, we know that $\overline{\Delta} \le \Delta^{(\infty)} = R^{\mathrm{fl}}$. Substitution in (17) of Lemma 12 yields the first inequality. For the second inequality, note that $L^* > H^{\mathrm{ff}}$ means the regret is negative, in which case the result is clearly valid. We may therefore assume w.l.o.g. that $L^* \le H^{\mathrm{ff}}$ and apply Lemma 14. Combination with Lemma 13 yields

$$\underline{\Delta}^2 \le \underline{V}\ln K + \big(\tfrac{2}{3}\ln K + 1\big)S\underline{\Delta} \le S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K + S^2\ln K + c_2S\underline{\Delta},$$

where $c_2 = \big(c_1 + \tfrac{2}{3}\big)\ln K + 1$. We now solve this quadratic inequality as in (13) and relax it using $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for nonnegative numbers $a, b$ to obtain

$$\underline{\Delta} \le \sqrt{S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K + S^2\ln K} + c_2S \le \sqrt{S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K} + S\big(\sqrt{\ln K} + c_2\big).$$

In combination with Lemma 12, this yields the second bound of the theorem.

Finally, we propose to select the parameter values that minimize the constant factor in front of the leading terms of these regret bounds.

Corollary 16 The parameter values $\varphi^* = 2.37$ and $\alpha^* = 1.243$ approximately minimize the worst of the two leading factors in the bounds of Theorem 15. The regret for FlipFlop with these parameters is simultaneously bounded by

$$R^{\mathrm{ff}} \le 5.64\,R^{\mathrm{fl}} + 3.73\,S,$$

$$R^{\mathrm{ff}} \le 5.64\sqrt{S\frac{(L^+ - L^*)(L^* - L^-)}{L^+ - L^-}\ln K} + S\big(35.6\ln K + 5.64\sqrt{\ln K} + 6.64\big).$$

Proof The leading factors $f(\varphi, \alpha) = \frac{\varphi\alpha}{\varphi - 1} + 2\alpha + 1$ and $g(\varphi, \alpha) = \frac{\varphi}{\varphi - 1} + \frac{\varphi}{\alpha} + 2$ are respectively increasing and decreasing in $\alpha$. They are equalized for

$$\alpha(\varphi) = \frac{2\varphi - 1 + \sqrt{12\varphi^3 - 16\varphi^2 + 4\varphi + 1}}{6\varphi - 4}.$$

The analytic solution for the minimum of $f(\varphi, \alpha(\varphi))$ in $\varphi$ is too long to reproduce here, but it is approximately equal to $\varphi^* = 2.37$, at which point $\alpha(\varphi^*) \approx 1.243$.
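The optimization in Corollary 16 is easy to reproduce numerically. The following sketch (our own check, using the $\alpha(\varphi)$ formula from the proof above; fminbnd and the bracketing interval are our choices) recovers the stated constants:

```matlab
% Numerical check of Corollary 16: equalize the two leading factors, then
% minimize the common value over phi.
alpha = @(phi) (2*phi - 1 + sqrt(12*phi.^3 - 16*phi.^2 + 4*phi + 1)) ./ (6*phi - 4);
f = @(phi) (phi .* alpha(phi)) ./ (phi - 1) + 2*alpha(phi) + 1;
phi_star = fminbnd(f, 1.01, 10);   % approximately 2.37
factor = f(phi_star);              % approximately 5.64
fprintf('phi* = %.3f, alpha* = %.3f, factor = %.3f\n', ...
        phi_star, alpha(phi_star), factor);
```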
4. Invariance to Rescaling and Translation

A common simplifying assumption made in the literature is that the losses $\ell_{t,k}$ are translated and normalised to take values in the interval $[0, 1]$. However, doing so requires a priori knowledge of the range of the losses. One would therefore prefer algorithms that do not require the losses to be normalised. As discussed by Cesa-Bianchi et al. (2007), the regret bounds for such algorithms should not change when losses are translated (because this does not change the regret) and should scale by $\sigma$ when the losses are scaled by a factor $\sigma > 0$ (because the regret scales by $\sigma$). They call such regret bounds fundamental and show that most of the methods they introduce satisfy such fundamental bounds. Here we go even further: it is not just our bounds that are fundamental, but also our algorithms, which do not change their output weights if the losses are scaled or translated.

Theorem 17 Both AdaHedge and FlipFlop are invariant to translation and rescaling of the losses. Starting with losses $\boldsymbol{\ell}_1, \ldots, \boldsymbol{\ell}_T$, obtain rescaled, translated losses $\boldsymbol{\ell}_1', \ldots, \boldsymbol{\ell}_T'$ by picking any $\sigma > 0$ and arbitrary reals $\tau_1, \ldots, \tau_T$, and setting $\ell_{t,k}' = \sigma\ell_{t,k} + \tau_t$ for $t = 1, \ldots, T$ and $k = 1, \ldots, K$. Both AdaHedge and FlipFlop issue the exact same sequence of weights $\mathbf{w}_t' = \mathbf{w}_t$ on $\boldsymbol{\ell}'$ as they do on $\boldsymbol{\ell}$.

Proof We annotate any quantity with a prime to denote that it is defined with respect to the losses $\boldsymbol{\ell}'$. We omit the algorithm name from the superscript. First consider AdaHedge. We will prove the following relations by induction on $t$:

$$\Delta_{t-1}' = \sigma\Delta_{t-1}; \qquad \eta_t' = \frac{\eta_t}{\sigma}; \qquad \mathbf{w}_t' = \mathbf{w}_t. \qquad (25)$$

For $t = 1$, these are valid since $\Delta_0' = \sigma\Delta_0 = 0$, $\eta_1' = \eta_1/\sigma = \infty$, and $\mathbf{w}_1' = \mathbf{w}_1$ are uniform. Now assume towards induction that (25) is valid for some $t \in \{1, \ldots, T\}$. We can then compute the following values from their definition:

$$h_t' = \mathbf{w}_t' \cdot \boldsymbol{\ell}_t' = \sigma h_t + \tau_t; \qquad m_t' = -\frac{1}{\eta_t'}\ln\big(\mathbf{w}_t' \cdot e^{-\eta_t'\boldsymbol{\ell}_t'}\big) = \sigma m_t + \tau_t; \qquad \delta_t' = h_t' - m_t' = \sigma(h_t - m_t) = \sigma\delta_t.$$

Thus, the mixability gaps are also related by the scale factor $\sigma$. From here we can re-establish the induction hypothesis for the next round: we have $\Delta_t' = \Delta_{t-1}' + \delta_t' = \sigma\Delta_{t-1} + \sigma\delta_t = \sigma\Delta_t$, and $\eta_{t+1}' = \ln(K)/\Delta_t' = \eta_{t+1}/\sigma$. For the weights we get

$$w_{t+1,k}' \propto e^{-\eta_{t+1}'L_{t,k}'} = e^{-(\eta_{t+1}/\sigma)(\sigma L_{t,k} + \tau_1 + \ldots + \tau_t)} \propto e^{-\eta_{t+1}L_{t,k}} \propto w_{t+1,k},$$

which means the two must be equal since both sum to one. Thus the relations of (25) are also valid for time $t + 1$, proving the result for AdaHedge.

For FlipFlop, if we assume regime changes occur at the same times for $\boldsymbol{\ell}$ and $\boldsymbol{\ell}'$, then similar reasoning reveals $\overline{\Delta}_t' = \sigma\overline{\Delta}_t$; $\underline{\Delta}_t' = \sigma\underline{\Delta}_t$; $\eta_t'^{\mathrm{flip}} = \eta_t^{\mathrm{flip}}/\sigma = \infty$; $\eta_t'^{\mathrm{flop}} = \eta_t^{\mathrm{flop}}/\sigma$; and $\mathbf{w}_t' = \mathbf{w}_t$. It remains to check that the regime changes do indeed occur at the same times. Note that in Definition 11, the flop regime is started when $\overline{\Delta}_t' > (\varphi/\alpha)\underline{\Delta}_t'$, which is equivalent to testing $\overline{\Delta}_t > (\varphi/\alpha)\underline{\Delta}_t$ since both sides of the inequality are scaled by $\sigma$. Similarly, the flip regime starts when $\underline{\Delta}_t' > \alpha\overline{\Delta}_t'$, which is equivalent to the test $\underline{\Delta}_t > \alpha\overline{\Delta}_t$.
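Theorem 17 can also be checked empirically. Since the weights are identical, the per-round Hedge losses must transform exactly as the losses do: $h_t' = \sigma h_t + \tau_t$. A sketch (our own illustration, assuming adahedge from Figure 1; the constants are invented, and implicit expansion requires R2016b or later):

```matlab
% Empirical check of Theorem 17: rescaling and translating the losses
% transforms the AdaHedge losses by the same map, h' = sigma*h + tau.
rng(1);
T = 500; K = 3;
l = randn(T, K);
sigma = 2.5; tau = randn(T, 1);
l2 = sigma * l + tau;                    % expands tau over the K columns
h  = adahedge(l);
h2 = adahedge(l2);
disp(max(abs(h2 - (sigma * h + tau))));  % ~0 up to floating point rounding
```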
5. Experiments

We performed four experiments on artificial data, designed to clarify how the learning rate determines performance in a variety of Hedge algorithms. These experiments are designed to illustrate as clearly as possible the intricacies involved in the central question of this paper: whether to use a high learning rate (by following the leader) or to play it safe by using a smaller learning rate instead. Rather than mimic real-world data, on which high learning rates often seem to work well (Devaine et al., 2013), we vary the main factor that appears to drive the best choice of learning rate: the difference in cumulative loss between the experts.

We have kept the experiments as simple as possible: the data are deterministic, and involve two experts. In each case, the data consist of one initial hand-crafted loss vector $\boldsymbol{\ell}_1$, followed by a sequence of loss vectors $\boldsymbol{\ell}_2, \ldots, \boldsymbol{\ell}_T$, which are either $(0, 1)$ or $(1, 0)$. For each experiment $\xi \in \{1, 2, 3, 4\}$, we want the cumulative loss difference $L_{t,1} - L_{t,2}$ between the experts to follow a target $f_\xi(t)$, which will be a continuous, nondecreasing function of $t$. As the losses are binary, we cannot make $L_{t,1} - L_{t,2}$ exactly equal to the target $f_\xi(t)$, but after the initial loss $\boldsymbol{\ell}_1$, we choose every subsequent loss vector such that it brings $L_{t,1} - L_{t,2}$ as close as possible to $f_\xi(t)$ (a sketch of this data-generating scheme appears after the list below). All functions $f_\xi$ change slowly enough that $|L_{t,1} - L_{t,2} - f_\xi(t)| \le 1$ for all $t$.

For each experiment, we let the number of trials be $T = 1000$, and we first plot the regret $R_T^{(\eta)}$ of the Hedge algorithm as a function of the fixed learning rate $\eta$. We subsequently plot the regret $R_t^{\mathrm{alg}}$ as a function of $t = 1, \ldots, T$, for each of the following algorithms "alg":

1. Follow-the-Leader (Hedge with learning rate $\infty$)
2. Hedge with fixed learning rate $\eta = 1$
3. Hedge with the learning rate that optimizes the worst-case bound (7), which equals $\eta = \sqrt{8\ln(K)/(S^2T)}$; we will call this algorithm "safe Hedge" for brevity.
4. AdaHedge
5. FlipFlop, with parameters $\varphi = 2.37$ and $\alpha = 1.243$ as in Corollary 16
6. Variation MW by Hazan and Kale (2008), using the fixed learning rate that optimises the bound provided in their Theorem 4
7. NormalHedge, described by Chaudhuri et al. (2009)

Note that the "safe Hedge" strategy (the third item above) can only be used in practice if the horizon $T$ is known in advance. Variation MW (the sixth item) additionally requires precognition of the empirical variance of the sequence of losses of the best expert up until $T$ (that is, $\mathrm{VAR}_T^{\max}$ as defined in Section 1.2), which is not available in practice, but which we are supplying anyway. We include algorithms 6 and 7 because, as explained in Section 1.2, they are the state of the art in Hedge-style algorithms. Like AdaHedge, Variation MW is a refinement of the CBMS strategy described by Cesa-Bianchi et al. (2007). They modify the definition of the weights in the Hedge algorithm to include second-order terms; the resulting bound is never more than a constant factor worse than the bounds (1) for CBMS and (15) for AdaHedge, but for some easy data it can be substantially better. For this reason it is a natural performance target for AdaHedge. The bounds for CBMS and AdaHedge are incomparable with the bound for NormalHedge, being better for some, worse for other data. The reason we include it in the experiments is because, compared to the other methods, its performance in practice turns out to be excellent. We do not know whether there are data sequences on which FlipFlop significantly outperforms NormalHedge, nor whether there is a theoretical reason for this good performance, as the NormalHedge bound (Chaudhuri et al., 2009) is not tight for our experiments.
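The data-generating scheme described above is straightforward to implement. A minimal sketch (our own illustration, up to how ties with the target are broken):

```matlab
% Generate the experiment data: after l1, each loss vector is (0,1) or (1,0),
% keeping the cumulative difference L(t,1)-L(t,2) as close as possible to f(t).
function l = make_losses(l1, f, T)
    l = zeros(T, 2);
    l(1,:) = l1;
    d = l1(1) - l1(2);              % current cumulative loss difference
    for t = 2:T
        if d < f(t)
            l(t,:) = [1 0];         % move the difference up, towards the target
        else
            l(t,:) = [0 1];         % move it down
        end
        d = d + l(t,1) - l(t,2);
    end
end
```

For example, l = make_losses([1 0], @(t) t.^0.4, 1000) reproduces the data of Experiment 3 (Section 5.3).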
To reduce clutter, we omit results for CBMS; its behaviour is very similar to that of AdaHedge. Below we provide an exact description of each experiment, and discuss the results.

5.1 Experiment 1. Worst Case for FTL

The experiment is defined by $\boldsymbol{\ell}_1 = (\tfrac{1}{2}, 0)$, and $f_1(t) = 0$. This yields the following losses:

$$\begin{pmatrix}1/2\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \ldots$$

These data are the worst case for FTL: each round, the leader incurs loss one, while each of the two individual experts only receives a loss once every two rounds. Thus, the FTL regret increases by one every two rounds and ends up around 500. For any learning rate $\eta$, the weights used by the Hedge algorithm are repeated every two rounds, so the regret $H_t - L_t^*$ increases by the same amount every two rounds: the regret increases linearly in $t$ for every fixed $\eta$ that does not vary with $t$. However, the constant of proportionality can be reduced greatly by reducing the value of $\eta$, as the top graph in Figure 3 shows: for $T = 1000$, the regret becomes negligible once $\eta$ is sufficiently small. Thus, in this experiment, a learning algorithm must reduce the learning rate to shield itself from incurring an excessive overhead.

The bottom graph in Figure 3 shows the expected breakdown of the FTL algorithm; Hedge with fixed learning rate $\eta = 1$ also performs quite badly. When $\eta$ is reduced to the value that optimises the worst-case bound, the regret becomes competitive with that of the other algorithms. Note that Variation MW has the best performance; this is because its learning rate is tuned in relation to the bound proved in the paper, which has a relatively large constant in front of the leading term. As a consequence the algorithm always uses a relatively small learning rate, which turns out to be helpful in this case but harmful in later experiments.

FlipFlop behaves as theory suggests it should: its regret increases alternately like the regret of AdaHedge and the regret of FTL. The latter performs horribly, so during those intervals the regret increases quickly; on the other hand the FTL intervals are relatively short-lived, so in the end they do not harm the regret by more than a constant factor. The NormalHedge algorithm still has acceptable performance, although its regret is relatively large in this experiment; we have no explanation for this, but in fairness we do observe good performance of NormalHedge in the other three experiments as well as in numerous further unreported simulations.

5.2 Experiment 2. Best Case for FTL

The second experiment is defined by $\boldsymbol{\ell}_1 = (1, 0)$ and $f_2(t) = 3/2$. This leads to the sequence of losses

$$\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \ldots$$

in which the loss vectors are alternating for $t \ge 2$. These data look very similar to the first experiment, but as the top graph in Figure 4 illustrates, because of the small changes at
5.2 Experiment 2. Best Case for FTL

The second experiment is defined by $l_1 = (1, 0)$ and $f_2(t) = 3/2$. This leads to the sequence of losses
$$(1, 0),\ (1, 0),\ (0, 1),\ (1, 0),\ (0, 1),\ \ldots,$$
in which the loss vectors alternate for $t \ge 2$. These data look very similar to the first experiment but, as the top graph in Figure 4 illustrates, because of the small changes at the start of the sequence it is now viable to reduce the regret by using a very high learning rate. In particular, since there are no leader changes after the first round, FTL incurs a regret of only 1/2. As in the first experiment, the regret increases linearly in $t$ for every fixed $\eta$ (provided it is less than $\infty$); but now the constant of linearity is large only for learning rates close to 1. Once FlipFlop enters the FTL regime for the second time, it stays there indefinitely, which results in bounded regret. After this small change in the setup compared to the previous experiment, NormalHedge also suddenly adapts very well to the data. The behaviour of the other algorithms is very similar to the first experiment: their regret grows without bound.

5.3 Experiment 3. Weights do not Concentrate in AdaHedge

The third experiment uses $l_1 = (1, 0)$ and $f_3(t) = t^{0.4}$. The first few loss vectors are the same as in the previous experiment, but every now and then there are two loss vectors $(1, 0)$ in a row, so that the first expert gradually falls behind the second in terms of performance. By $t = T = 1000$, the first expert has accumulated 508 loss, while the second expert has only 492.

For any fixed learning rate $\eta$, the weights used by Hedge now concentrate on the second expert. We know from Lemma 4 that the mixability gap in any round is bounded by a constant times the variance of the loss under the weights played by the algorithm; as these weights concentrate on the second expert, this variance must go to zero. One can show that this happens quickly enough for the cumulative mixability gap to be bounded for any fixed $\eta$ that does not vary with $t$ or depend on $T$. From (5) we then have
$$R^{(\eta)} = M^{(\eta)} - L^* + \Delta^{(\eta)} \le \frac{\ln K}{\eta} + \text{bounded} = \text{bounded}.$$
So in this scenario, as long as the learning rate is kept fixed, we will eventually learn the identity of the best expert. However, if the learning rate is very small, this will happen so slowly that the weights still have not converged by $t = 1000$. Even worse, the top graph in Figure 5 shows that for intermediate values of the learning rate, not only do the weights fail to converge on the second expert sufficiently quickly, but they are also sensitive enough to the alternation of the loss vectors to increase the overhead incurred each round. For this experiment, it really pays to use a large learning rate rather than a safe small one. Thus FTL, Hedge with $\eta = 1$, FlipFlop and NormalHedge perform excellently, while safe Hedge, AdaHedge and Variation MW incur a substantial overhead. Extrapolating the trend in the graph, it appears that the overhead of these algorithms is not bounded. This is possible because the three algorithms with poor performance use a learning rate that decreases as a function of $t$; as a consequence, the learning rate they use may remain too small for the weights to concentrate. For the case of AdaHedge, this is an example of the nasty feedback loop described above.
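Before turning to the last experiment, note that for two experts with uniform prior weights the Hedge weight on the second expert has the logistic form $w_{t,2} = 1/(1 + e^{-\eta d_{t-1}})$, where $d_t = L_{t,1} - L_{t,2}$. This makes the boundedness claim above easy to check numerically. A minimal sketch (Python) of our own, in which we take $d_t = t^{0.4}$ exactly and use the per-round loss variance $w_{t,1} w_{t,2}$ as a proxy for the mixability gap, omitting the constant from Lemma 4:

```python
import numpy as np

eta = 0.1
for T in [10**3, 10**4, 10**5]:
    t = np.arange(1, T + 1)
    d = t ** 0.4                           # cumulative loss difference, as in f_3
    w2 = 1.0 / (1.0 + np.exp(-eta * d))    # Hedge weight on the better expert
    var = w2 * (1.0 - w2)                  # loss variance under the weights
    print(T, var.sum())                    # partial sums converge to a finite limit
```

The printed partial sums stabilise as $T$ grows, illustrating that for any fixed $\eta$ the cumulative mixability gap remains bounded even though the weights concentrate only slowly.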
5.4 Experiment 4. Weights do Concentrate in AdaHedge

The fourth and last experiment uses $l_1 = (1, 0)$ and $f_4(t) = t^{0.6}$. The losses are comparable to those of the third experiment, but the performance gap between the two experts is somewhat larger. By $t = T = 1000$, the two experts have loss 532 and 468, respectively. It is now so easy to determine which of the experts is better that the top graph in Figure 6 is nonincreasing: the larger the learning rate, the better.

The algorithms that managed to keep their regret bounded in the previous experiment obviously still perform very well, but it is clearly visible that AdaHedge now achieves the same. As discussed below Theorem 6, this happens because the weight concentrates on the second expert quickly enough that AdaHedge's regret is bounded in this setting. The crucial difference with the previous experiment is that now we have $f_\xi(t) = t^\beta$ with $\beta > 1/2$. Thus, while the previous experiment shows that AdaHedge can be tricked into reducing the learning rate when it would be better not to do so, the present experiment shows that, on the other hand, AdaHedge sometimes adapts really nicely to easy data, in contrast to algorithms that are tuned in terms of a worst-case bound.

[Figure 3: Hedge regret for Experiment 1 (FTL worst-case). Top panel: regret as a function of the fixed learning rate; bottom panel: regret as a function of time for each algorithm.]
[Figure 4: Hedge regret for Experiment 2 (FTL best-case). Same panels.]
[Figure 5: Hedge regret for Experiment 3 (weights do not concentrate in AdaHedge). Same panels.]
[Figure 6: Hedge regret for Experiment 4 (weights do concentrate in AdaHedge). Same panels.]

6. Discussion and Conclusion

The main contributions of this work are twofold. First, we develop a new hedging algorithm called AdaHedge. The analysis simplifies existing results and we obtain improved bounds (Theorems 6 and 8). Moreover, AdaHedge is fundamental in the sense that its weights are invariant under translation and scaling of the losses (Section 4), and its bounds are timeless in the sense that they do not degenerate when rounds are inserted in which all experts incur the same loss. Second, we explain in detail why it is difficult to tune the learning rate such that good performance is obtained both for easy and for hard data, and we address the issue by developing the FlipFlop algorithm. FlipFlop never performs much worse than the Follow-the-Leader strategy, which works very well on easy data (Lemma 10), but it also retains a worst-case bound similar to the bound for AdaHedge (Theorem 15).

As such, this work may be seen as solving a special case of a more general question: can we compete with Hedge for any fixed learning rate? We will now briefly discuss this question and then place our work in a broader context, which provides an ambitious agenda for future work.

6.1 General Question: Competing with Hedge for any Fixed Learning Rate

Up to multiplicative constants, FlipFlop is at least as good as FTL and as (the bound for) AdaHedge. These two algorithms represent two extremes of choosing the learning rate $\eta$ in Hedge: FTL takes $\eta = \infty$ to exploit easy data, whereas AdaHedge decreases $\eta$ with $t$ to protect against the worst case. It is now natural to ask whether we can design a "Universal Hedge" algorithm that can compete with Hedge with any fixed learning rate $\eta \in (0, \infty]$. That is, for all $T$, the regret up to time $T$ of Universal Hedge should be within a constant factor $C$ of the regret incurred by Hedge run with the fixed $\hat\eta$ that minimizes the Hedge loss $H^{(\hat\eta)}$.

This appears to be a difficult question, and maybe such an algorithm does not even exist. Yet even partial results (such as an algorithm that competes with $\eta \in [\sqrt{\ln(K)/(S^2 T)}, \infty]$, or with a factor $C$ that increases slowly, say logarithmically, in $T$) would already be of significant interest.

In this regard, it is interesting to note that, in practice, the learning rates chosen by sophisticated versions of Hedge do not always perform very well; higher learning rates often do better. This is noted by Devaine et al. (2013), who resolve the issue by adapting the learning rate sequentially in an ad-hoc fashion, which works well in their application, but for which they can provide no guarantees. A Universal Hedge algorithm would adapt to the learning rate that is optimal with hindsight. FlipFlop is a first step in this direction. Indeed, it already has some of the properties of such an ideal algorithm: under some conditions we can show that if Hedge achieves bounded regret using any learning rate, then FTL, and therefore FlipFlop, also achieves bounded regret:

Theorem 18 Fix any $\eta > 0$. For $K = 2$ experts with losses in $\{0, 1\}$ we have
$$R^{(\eta)} \text{ is bounded} \implies R^{\mathrm{ftl}} \text{ is bounded} \implies R^{\mathrm{ff}} \text{ is bounded}.$$

The proof is in Appendix B. While the second implication remains valid for more experts and other losses, we currently do not know whether the first implication continues to hold as well.

6.2 The Big Picture

Broadly speaking, a learning rate is any single scalar parameter controlling the relative weight of the data and a prior regularization term in a learning task. Such learning rates pop up in batch settings as diverse as $L_1$/$L_2$-regularized regression such as Lasso and Ridge, standard Bayesian nonparametric and PAC-Bayesian inference (Zhang, 2006; Audibert, 2004; Catoni, 2007), and, as in this paper, in sequential prediction. All the applications just mentioned can formally be seen as variants of Bayesian inference: Bayesian MAP in Lasso and Ridge, randomized drawing from the posterior ("Gibbs sampling") in the PAC-Bayesian setting and in the Hedge setting. Moreover, in each of these applications, selecting the appropriate learning rate is nontrivial: simply adding the learning rate as another parameter and putting a Bayesian prior on it can lead to very bad results (Grünwald and Langford, 2007). An ideal method for adapting the learning rate would work in all such applications. In addition to the FlipFlop algorithm described here, we currently have methods that are guaranteed to work for several PAC-Bayesian style stochastic settings (Grünwald, 2011, 2012). It is encouraging that all these methods are based on the same, apparently fundamental, quantity: the mixability gap as defined before Lemma 1. They all employ different techniques to ensure a learning rate under which the posterior is concentrated and hence the mixability gap is small. This gives some hope that the approach can be taken even further.

To give but one example, the Safe Bayesian method of Grünwald (2012) uses essentially the same technique as Devaine et al. (2013), with an additional online-to-batch conversion step. Grünwald (2012) proves that this approach adapts to the optimal learning rate in an i.i.d. stochastic setting with arbitrary (countably or uncountably infinite) sets of experts (predictors); in contrast, AdaHedge and FlipFlop in the form presented in this paper are suitable for a worst-case setting with a finite set of experts. This raises, of course, the question of whether either the Safe Bayesian method can be extended to the worst-case setting (which would imply formal guarantees for the method of Devaine et al. 2013), or the FlipFlop algorithm can be extended to the setting with infinitely many experts.

Thus, we have two major, interrelated questions for future work: first, as explained in Section 6.1, we would like to be able to compete with all $\eta$ in some set that contains a whole range rather than just two values. Second, we would like to compete with the best $\eta$ in a setting with a countably infinite or even uncountable number of experts, equipped with an arbitrary prior distribution.
A third question for future work is whether our methods can be extended beyond the standard worst-case Hedge setting and the stochastic i.i.d. setting. A particularly intriguing (and, as initial research suggests, nontrivial) question is whether AdaHedge and FlipFlop can be adapted to settings with limited feedback, such as the adversarial bandit setting (Cesa-Bianchi and Lugosi, 2006). We would also like to extend our approach to the Hedge-based strategies for combinatorial decision domains, like Component Hedge by Koolen et al. (2010), and for matrix-valued predictions, like those by Tsuda et al. (2005).

Acknowledgments

We would like to thank Wojciech Kotłowski, Gilles Stoltz and two anonymous referees for critical feedback. This work was supported in part by the IST Programme of the European Community under the PASCAL Network of Excellence, and by NWO Rubicon grants.

Appendix A. Proof of Lemma 1

The result for $\eta = \infty$ follows from $\eta < \infty$ as a limiting case, so we may assume without loss of generality that $\eta < \infty$. Then $m_t \le h_t$ is obtained by using Jensen's inequality to move the logarithm inside the expectation, and $m_t \ge l_t^-$ and $h_t \le l_t^+$ follow by bounding all losses by their minimal and maximal values, respectively.

The next two items are analogues of similar basic results in Bayesian probability. Item 2 generalizes the chain rule of probability $\Pr(x_1, \ldots, x_T) = \prod_{t=1}^T \Pr(x_t \mid x_1, \ldots, x_{t-1})$:
$$M = -\frac{1}{\eta} \ln \prod_{t=1}^{T} \frac{w_1 \cdot e^{-\eta L_t}}{w_1 \cdot e^{-\eta L_{t-1}}} = -\frac{1}{\eta} \ln\big(w_1 \cdot e^{-\eta L_T}\big).$$

For the third item, use item 2 to write
$$M = -\frac{1}{\eta} \ln\Big(\sum_k w_{1,k}\, e^{-\eta L_{T,k}}\Big).$$
The lower bound is obtained by bounding all $L_{T,k}$ from below by $L^*$; for the upper bound we drop all terms in the sum except for the term corresponding to the best expert and use $w_{1,k} = 1/K$.

For the last item, let $0 < \eta < \gamma$ be any two learning rates. Then Jensen's inequality gives
$$-\frac{1}{\eta} \ln\big(w_1 \cdot e^{-\eta L}\big) = -\frac{1}{\eta} \ln\big(w_1 \cdot (e^{-\gamma L})^{\eta/\gamma}\big) \ge -\frac{1}{\eta} \ln\big((w_1 \cdot e^{-\gamma L})^{\eta/\gamma}\big) = -\frac{1}{\gamma} \ln\big(w_1 \cdot e^{-\gamma L}\big).$$
This completes the proof.
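Item 2, the telescoping of the mix losses, can also be verified numerically as a sanity check. A minimal sketch of ours (Python), using uniform prior weights and the standard Hedge weights $w_{t,k} \propto w_{1,k} e^{-\eta L_{t-1,k}}$; the random losses are just an arbitrary test case:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, K, T = 0.5, 3, 50
losses = rng.random((T, K))      # arbitrary losses in [0, 1]
w1 = np.full(K, 1.0 / K)         # uniform prior weights

M = 0.0                          # cumulative mix loss
L = np.zeros(K)                  # cumulative expert losses
for l in losses:
    w = w1 * np.exp(-eta * L)
    w /= w.sum()                 # Hedge weights w_t
    M += -np.log(w @ np.exp(-eta * l)) / eta   # mix loss m_t
    L += l

# Item 2: the mix losses telescope to -(1/eta) ln(w_1 . e^{-eta L_T}).
assert np.isclose(M, -np.log(w1 @ np.exp(-eta * L)) / eta)
```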
Appendix B. Proof of Theorem 18

The second implication follows from Theorem 15, so we only need to prove the first implication. To this end, consider any infinite sequence of losses on which FTL has unbounded regret. We will argue that Hedge with fixed $\eta$ must have unbounded regret as well.

Our argument is based on finding an infinite subsequence of the losses on which (a) the regret for Hedge with fixed $\eta$ is at most as large as on the original sequence of losses; and (b) the regret for Hedge is infinite. To construct this subsequence, first remove all trials $t$ such that $l_{t,1} = l_{t,2}$ (that is, both experts suffer the same loss), as these trials do not change the regret of either FTL or Hedge, nor their behaviour on any of the other rounds. Next, we will selectively remove certain local extrema. We call a pair of two consecutive trials $(t, t+1)$ a local extremum if the losses in these trials are opposite: either $l_t = (0, 1)$ and $l_{t+1} = (1, 0)$, or vice versa.

Removing any local extremum will only decrease the regret for Hedge, as may be seen as follows. We observe that removing a local extremum will not change the cumulative losses of the experts or the behaviour of Hedge on other rounds, so it suffices to consider only the regret incurred on rounds $t$ and $t+1$ themselves. By symmetry it is further sufficient to consider the case that $l_t = (0, 1)$ and $l_{t+1} = (1, 0)$. Then, over trials $t$ and $t+1$, the individual experts both suffer loss 1, and for Hedge the loss is $h_t + h_{t+1} = w_t \cdot l_t + w_{t+1} \cdot l_{t+1} = w_{t,2} + w_{t+1,1}$. Now, since the loss received by expert 1 in round $t$ was less than that of expert 2, some weight shifts to the first expert: we must have $w_{t+1,1} > w_{t,1}$. Substitution gives $h_t + h_{t+1} > w_{t,1} + w_{t,2} = 1$. Thus, Hedge suffers more loss in these two rounds than whichever expert turns out to be best in hindsight, and it follows that removing trials $t$ and $t+1$ will only decrease its regret (by an amount that depends only on $\eta$).
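The strict inequality $h_t + h_{t+1} > 1$ used in this argument is easy to verify numerically. A minimal sketch for two experts with uniform prior weights (Python; here $d$ denotes the cumulative loss difference $L_{t-1,2} - L_{t-1,1}$ with which the local extremum is entered — an illustration of ours, not code from the paper):

```python
import numpy as np

def extremum_loss(eta, d):
    """Hedge loss over a local extremum l_t = (0,1), l_{t+1} = (1,0)."""
    w_t1 = 1.0 / (1.0 + np.exp(-eta * d))        # weight on expert 1 at time t
    w_u1 = 1.0 / (1.0 + np.exp(-eta * (d + 1)))  # after l_t, expert 1 gains weight
    return (1.0 - w_t1) + w_u1                   # h_t + h_{t+1} = w_{t,2} + w_{t+1,1}

for d in [-3, 0, 2]:
    print(extremum_loss(0.5, d))   # always strictly greater than 1
```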
We proceed to select the local extrema to remove. To this end, let $d_t = L_{t,2} - L_{t,1}$ denote the difference in cumulative loss between the experts after $t$ trials, and observe that removal of a local extremum at $(t, t+1)$ will simply remove the elements $d_t$ and $d_{t+1}$ from the sequence $d_1, d_2, \ldots$, while leaving the other elements of the sequence unchanged. We will remove local extrema in a way that leads to an infinite subsequence of losses such that
$$d_1, d_2, d_3, d_4, d_5, \ldots = \pm 1, 0, \pm 1, 0, \pm 1, \ldots \tag{26}$$
In this subsequence, every two consecutive trials still constitute a local extremum, on which Hedge incurs a certain fixed positive regret. Consequently, the Hedge regret $R_t$ grows linearly in $t$ and is therefore unbounded.

If the losses already satisfy (26), we are done. If not, then observe that there can only be a leader change at time $t+1$, in the sense of Lemma 10, when $d_t = 0$. Since the FTL regret is bounded by the number of leader changes (Lemma 10), and since FTL was assumed to have infinite regret, there must therefore be an infinite number of trials $t$ such that $d_t = 0$. We will remove local extrema in a way that preserves this property. In addition, we must have $|d_{t+1} - d_t| = 1$ for all $t$, because $d_{t+1} = d_t$ would imply that $l_{t+1,1} = l_{t+1,2}$, and we have already removed such trials. This second property is automatically preserved regardless of which trials we remove.

If the losses do not yet satisfy (26), there must be a first trial $u$ with $|d_u| \ge 2$. Since there are infinitely many $t$ with $d_t = 0$, there must then also be a first trial $w > u$ with $d_w = 0$. Now choose any $v \in [u, w)$ such that $|d_v| = \max_{t \in [u,w]} |d_t|$ maximizes the discrepancy between the cumulative losses of the experts. Since $v$ attains the maximum and $|d_{t+1} - d_t| = 1$ for all $t$, as mentioned above, we have $|d_{v+1}| = |d_v| - 1$, so that $(v, v+1)$ must be a local extremum, and this is the local extremum we remove. Since $|d_v| \ge |d_u| \ge 2$, we also have $|d_{v+1}| \ge 1$, so that this does not remove any of the trials in which $d_t = 0$. Repetition of this process will eventually lead to $v = u$, so that trial $u$ is removed. Given any $T$, the process may therefore be repeated until $|d_t| \le 1$ for all $t \le T$. As $|d_{t+1} - d_t| = 1$ for all $t$, we then match (26) for the first $T$ trials. Hence by letting $T$ go to infinity we obtain the desired result.
References

Jean-Yves Audibert. PAC-Bayesian statistical learning theory. PhD thesis, Université Paris VI, 2004.

Peter Auer, Nicolò Cesa-Bianchi, and Claudio Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64:48–75, 2002.

Olivier Catoni. PAC-Bayesian Supervised Classification. Lecture Notes–Monograph Series. IMS, 2007.

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3), 1997.

Nicolò Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3), 2007.

Kamalika Chaudhuri, Yoav Freund, and Daniel Hsu. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems 22 (NIPS 2009), 2009.

Alexey V. Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Peter Grünwald and Peter Spirtes, editors, UAI. AUAI Press, 2010.

Marie Devaine, Pierre Gaillard, Yannig Goude, and Gilles Stoltz. Forecasting electricity consumption by aggregating specialized experts; a review of the sequential aggregation of specialized experts, with an application to Slovakian and French country-wide one-day-ahead (half-)hourly predictions. Machine Learning, 90(2), February 2013.

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 1997.

Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.

Sébastien Gerchinovitz. Prédiction de suites individuelles et cadre statistique classique : étude de quelques liens autour de la régression parcimonieuse et des techniques d'agrégation. PhD thesis, Université Paris-Sud, 2011.

Peter Grünwald. Safe learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity. In Proceedings of the 24th International Conference on Learning Theory (COLT 2011), 2011.

Peter Grünwald. The safe Bayesian: learning the learning rate via the mixability gap. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT 2012), 2012.

Peter Grünwald and John Langford. Suboptimal behavior of Bayes and MDL in classification under misspecification. Machine Learning, 66(2–3), 2007.

László Györfi and György Ottucsák. Sequential prediction of unbounded stationary time series. IEEE Transactions on Information Theory, 53(5), 2007.

Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 57–67, 2008.

Marcus Hutter and Jan Poland. Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6, 2005.

Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. In Proceedings of the 16th Annual Conference on Learning Theory (COLT), 2003.

Yuri Kalnishkan and Michael V. Vyugin. The weak aggregating algorithm and weak mixability. In Proceedings of the 18th Annual Conference on Learning Theory (COLT), 2005.

Wouter M. Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging structured concepts. In A. T. Kalai and M. Mohri, editors, Proceedings of the 23rd Annual Conference on Learning Theory (COLT 2010), 2010.

Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2), 1994.

Koji Tsuda, Gunnar Rätsch, and Manfred K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projection. Journal of Machine Learning Research, 6, 2005.

Tim van Erven, Peter Grünwald, Wouter M. Koolen, and Steven de Rooij. Adaptive hedge. In Advances in Neural Information Processing Systems 24 (NIPS 2011), 2011.

Vladimir Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2), 1998.

Vladimir Vovk. Competitive on-line statistics. International Statistical Review, 69(2), 2001.
Vladimir Vovk, Akimichi Takemura, and Glenn Shafer. Defensive forecasting. In Proceedings of AISTATS 2005, 2005. Archive version available at http://

Tong Zhang. Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory, 52(4), 2006.