Multiobjective Prediction with Expert Advice
Alexey Chernov
Computer Learning Research Centre and Department of Computer Science
Royal Holloway University of London
GTP Workshop, June 2010
Alexey Chernov (RHUL), Multiobjective Prediction with Expert Advice, GTP Workshop, June 2010

Example: Prediction of Sport Match Outcome
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML 08.
Bookmakers' data:
  4 bookmakers, odds for 10000 tennis matches (2 outcomes)
  8 bookmakers, odds for 9000 football matches (3 outcomes)
Odds $a_i$ can be transformed to probabilities Prob[i]:
$$\mathrm{Prob}[i] = \frac{1/a_i}{\sum_j 1/a_j}$$
The loss is measured by the square (Brier) loss function.
Learner's strategy is the Aggregating Algorithm.

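The odds transformation and the loss can be sketched in a few lines. This is only an illustration: the function names and the sample odds are hypothetical, the odds are assumed to be decimal odds, and the Brier loss is assumed to be the sum over outcomes of the squared difference between the predicted probability and the outcome indicator.

```python
def odds_to_probs(odds):
    """Normalise inverse odds: Prob[i] = (1/a_i) / sum_j (1/a_j)."""
    inverse = [1.0 / a for a in odds]
    total = sum(inverse)
    return [x / total for x in inverse]


def brier_loss(probs, outcome):
    """Assumed form of the square loss: sum_j (probs[j] - 1{outcome == j})^2."""
    return sum((p - (1.0 if j == outcome else 0.0)) ** 2
               for j, p in enumerate(probs))


# Hypothetical decimal odds for a two-outcome (tennis) match.
probs = odds_to_probs([1.5, 2.5])
loss_if_first_wins = brier_loss(probs, 0)
```

The normalisation removes the bookmaker's margin, so the resulting numbers sum to one and can be fed to any loss defined on probability vectors.
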
Tennis Prediction, Square Loss
[Figure: graph of the negative regret $\mathrm{Loss}_{E_k}(T) - \mathrm{Loss}(T)$, 4 Experts. Legend: theoretical bounds, Learner, bookmakers.]
Learner is the AA for the square loss.

Tennis Prediction, Log Loss
[Figure: graph of the negative regret $\mathrm{Loss}_{E_k}(T) - \mathrm{Loss}(T)$, 4 Experts. Legend: theoretical bounds, Learner, bookmakers.]
Learner is the AA for the log loss (Bayes mixture).

Tennis Predictions: Wrong Losses
[Figure: graphs of the negative regret $\mathrm{Loss}_{E_k}(T) - \mathrm{Loss}(T)$, two panels. First panel: log loss, the AA for the square loss. Second panel: square loss, the AA for the log loss. Legend: theoretical bounds, Learner, bookmakers.]
Learner optimizes for a wrong loss function.

Aggregating Algorithm with Wrong Losses
Fact
For the game with 2 outcomes, one can construct a sequence of predictions of 2 Experts and a sequence of outcomes with the following property. If Learner's predictions are generated by the Aggregating Algorithm for the log loss, then for almost all $T$
$$\mathrm{Loss}(T) \ge \mathrm{Loss}_{E_1}(T) + T/10,$$
where $\mathrm{Loss}(T)$ and $\mathrm{Loss}_{E_1}(T)$ are the square losses of Learner and Expert 1.
A similar statement holds for the Aggregating Algorithm for the square loss evaluated by the log loss.

New Settings
Experts: $\gamma_t^{(1)}, \ldots, \gamma_t^{(K)}$
Learner: $\gamma_t$
Reality: $\omega_t$
Many loss functions:
$$\mathrm{Loss}^{(m)}_{E_k}(T) = \sum_{t=1}^{T} \lambda^{(m)}(\gamma_t^{(k)}, \omega_t), \qquad \mathrm{Loss}^{(m)}(T) = \sum_{t=1}^{T} \lambda^{(m)}(\gamma_t, \omega_t)$$
Expert Evaluator's advice:
$$\mathrm{Loss}^{(k)}_{E_k}(T) = \sum_{t=1}^{T} \lambda^{(k)}(\gamma_t^{(k)}, \omega_t), \qquad \mathrm{Loss}^{(k)}(T) = \sum_{t=1}^{T} \lambda^{(k)}(\gamma_t, \omega_t)$$

Bound for New Settings
Theorem
If $\lambda^{(k)}$ are $\eta^{(k)}$-mixable proper loss functions, $k = 1, \ldots, K$, Learner has a strategy (e.g. the Defensive Forecasting algorithm) that guarantees, for all $T$ and for all $k$, that
$$\mathrm{Loss}^{(k)}(T) \le \mathrm{Loss}^{(k)}_{E_k}(T) + \frac{1}{\eta^{(k)}} \ln K.$$
Corollary
If $\lambda^{(m)}$ are $\eta^{(m)}$-mixable proper loss functions, $m = 1, \ldots, M$, Learner has a strategy that guarantees, for all $T$, for all $k$ and for all $m$, that
$$\mathrm{Loss}^{(m)}(T) \le \mathrm{Loss}^{(m)}_{E_k}(T) + \frac{1}{\eta^{(m)}} (\ln K + \ln M).$$

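To get a feeling for the size of the additive term, here is a small numeric sketch. The helper name `regret_bound` is hypothetical, and the value $\eta = 1$ used below is an assumption (it is the usual mixability constant for the log loss and for the Brier loss in the form used by Vovk and Zhdanov); other normalisations of the losses would change it.

```python
import math


def regret_bound(eta, num_experts, num_losses=1):
    """Additive term (ln K + ln M) / eta; with M = 1 it reduces to (ln K) / eta."""
    return (math.log(num_experts) + math.log(num_losses)) / eta


# 4 bookmakers, one loss function (assumed eta = 1).
single_loss_term = regret_bound(eta=1.0, num_experts=4)
# 4 bookmakers, two loss functions at once (assumed eta = 1 for both).
two_loss_term = regret_bound(eta=1.0, num_experts=4, num_losses=2)
```

Under these assumptions, competing under two loss functions simultaneously costs only an extra $\ln 2$ compared with a single loss, independently of $T$.
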
Tennis Predictions, Two Losses
[Figure: graphs of the negative regret $\mathrm{Loss}^{(m)}_{E_k}(T) - \mathrm{Loss}^{(m)}(T)$, two panels. First panel: square loss. Second panel: log loss. Legend: theoretical bounds, Learner, bookmakers.]
Learner optimizes for both loss functions, using the DF algorithm.

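Defensive Forecasting Algorithm
$$\exists \pi_t \ \forall \omega \quad \sum_{k=1}^{K} p^{(k)}_{t-1}\, e^{\eta(\lambda(\pi_t, \omega) - \lambda(\pi_t^{(k)}, \omega))} \le 1,$$
where
$$p^{(k)}_{t-1} = p^{(k)}_0\, e^{\eta(\mathrm{Loss}(t-1) - \mathrm{Loss}_{E_k}(t-1))}.$$
To get this (from Levin's Lemma) we need that $\lambda(\pi, \omega)$ is continuous and for all $\pi, \pi'$
$$E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} = \sum_{\omega \in \Omega} \pi(\omega)\, e^{\eta(\lambda(\pi, \omega) - \lambda(\pi', \omega))} \le 1.$$
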
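Defensive Forecasting Algorithm
$$\exists \pi_t \ \forall \omega \quad \sum_{k=1}^{K} p^{(k)}_{t-1}\, e^{\eta^{(k)}(\lambda^{(k)}(\pi_t, \omega) - \lambda^{(k)}(\pi_t^{(k)}, \omega))} \le 1,$$
where
$$p^{(k)}_{t-1} = p^{(k)}_0\, e^{\eta^{(k)}(\mathrm{Loss}^{(k)}(t-1) - \mathrm{Loss}^{(k)}_{E_k}(t-1))}.$$
To get this (from Levin's Lemma) we need that $\lambda^{(k)}(\pi, \omega)$ is continuous and for all $\pi, \pi'$
$$E_\pi e^{\eta^{(k)}(\lambda^{(k)}(\pi, \cdot) - \lambda^{(k)}(\pi', \cdot))} = \sum_{\omega \in \Omega} \pi(\omega)\, e^{\eta^{(k)}(\lambda^{(k)}(\pi, \omega) - \lambda^{(k)}(\pi', \omega))} \le 1.$$
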
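For two outcomes the defensive forecasting step can be sketched with a simple bisection. This is a minimal illustration under stated assumptions, not the implementation behind the plots: every (expert, loss) pair gets initial weight $1/(KM)$, the square loss is taken as $2(p-\omega)^2$ with $p$ the probability of outcome 1, both losses are assumed to have $\eta = 1$, and all names (`potential`, `df_predict`, `run`) and the random data are hypothetical.

```python
import math
import random

ETA = 1.0  # assumed mixability constant for both losses below


def square_loss(p, omega):
    # Brier loss over two outcomes, with p the probability of outcome 1.
    return 2.0 * (p - omega) ** 2


def log_loss(p, omega):
    return -math.log(p if omega == 1 else 1.0 - p)


LOSSES = [square_loss, log_loss]


def potential(p, omega, expert_preds, weights):
    """Sum over (k, m) of w[k][m] * exp(ETA * (loss_m(p) - loss_m(p_k)))."""
    return sum(
        weights[k][m] * math.exp(ETA * (loss(p, omega) - loss(pk, omega)))
        for k, pk in enumerate(expert_preds)
        for m, loss in enumerate(LOSSES)
    )


def df_predict(expert_preds, weights, tol=1e-12):
    """Bisect for p at which the potential is the same for both outcomes."""
    lo, hi = tol, 1.0 - tol
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        gap = (potential(mid, 1, expert_preds, weights)
               - potential(mid, 0, expert_preds, weights))
        if gap > 0:
            lo = mid  # outcome 1 is too costly, so raise p
        else:
            hi = mid
    return 0.5 * (lo + hi)


def run(expert_seq, outcomes):
    num_experts = len(expert_seq[0])
    w0 = 1.0 / (num_experts * len(LOSSES))
    weights = [[w0] * len(LOSSES) for _ in range(num_experts)]
    learner = [0.0] * len(LOSSES)
    experts = [[0.0] * len(LOSSES) for _ in range(num_experts)]
    for preds, omega in zip(expert_seq, outcomes):
        p = df_predict(preds, weights)
        for m, loss in enumerate(LOSSES):
            step = loss(p, omega)
            learner[m] += step
            for k, pk in enumerate(preds):
                experts[k][m] += loss(pk, omega)
                # Weight update: multiply by exp(ETA * (Learner's loss - Expert's loss)).
                weights[k][m] *= math.exp(ETA * (step - loss(pk, omega)))
    return learner, experts


# Hypothetical data: 3 experts with random forecasts, random outcomes.
random.seed(0)
T, K = 200, 3
expert_seq = [[random.uniform(0.05, 0.95) for _ in range(K)] for _ in range(T)]
outcomes = [random.randint(0, 1) for _ in range(T)]
learner_loss, expert_loss = run(expert_seq, outcomes)
bound = math.log(K * len(LOSSES)) / ETA
```

The bisection works here because, for both losses, the loss on outcome 1 decreases in $p$ and the loss on outcome 0 increases in $p$, so the gap between the two potentials is monotone. With more than two outcomes a fixed-point or minimax search would be needed instead.
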
The DFA and the AA
$\lambda$ is continuous and $\forall \pi, \pi' \ E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} \le 1$ $\Rightarrow$ $\lambda$ is $\eta$-mixable
$\lambda$ is $\eta$-mixable and ? $\Rightarrow$ $\forall \pi, \pi' \ E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} \le 1$

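The DFA and the AA
$\lambda$ is continuous and $\forall \pi, \pi' \ E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} \le 1$ $\Rightarrow$ $\lambda$ is $\eta$-mixable
$\forall \pi, \pi' \ E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} \le 1$ $\Rightarrow$ $\lambda$ is proper
$\lambda$ is $\eta$-mixable and proper $\Rightarrow$ $\forall \pi, \pi' \ E_\pi e^{\eta(\lambda(\pi, \cdot) - \lambda(\pi', \cdot))} \le 1$
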
Proper Loss Functions
$\lambda$ is proper if for any $\pi, \pi' \in \mathcal{P}(\Omega)$
$$E_\pi \lambda(\pi, \cdot) \le E_\pi \lambda(\pi', \cdot).$$
If $\omega \sim \pi$ then $E_\pi \lambda(\pi', \omega)$ is the expected loss for prediction $\pi'$.
The expected loss is minimal for the true distribution $\Rightarrow$ the forecaster is encouraged to give the true probabilities.
The square loss and the log loss are proper.

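Properness can be checked numerically on a small example. The sketch below is an illustration only: the distribution `truth`, the grid of candidate forecasts and the function names are hypothetical, and the square loss is assumed to be the sum over outcomes of squared differences from the outcome indicator.

```python
import math


def expected_loss(loss, truth, forecast):
    """E_{omega ~ truth} loss(forecast, omega) over a finite outcome set."""
    return sum(q * loss(forecast, omega) for omega, q in enumerate(truth))


def square_loss(forecast, omega):
    return sum((p - (1.0 if j == omega else 0.0)) ** 2
               for j, p in enumerate(forecast))


def log_loss(forecast, omega):
    return -math.log(forecast[omega])


truth = [0.7, 0.3]
# Candidate forecasts (p, 1 - p) with p on a grid of step 0.01.
grid = [[i / 100, 1 - i / 100] for i in range(1, 100)]
best_square = min(grid, key=lambda f: expected_loss(square_loss, truth, f))
best_log = min(grid, key=lambda f: expected_loss(log_loss, truth, f))
```

For both losses the grid minimiser coincides with the true distribution, which is what the definition requires.
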
Example: Hellinger Loss
$$\lambda_{\mathrm{Hellinger}}(\gamma, \omega) = \frac{1}{2} \sum_{j=1}^{r} \left( \sqrt{\gamma(j)} - \sqrt{I_{\{\omega = j\}}} \right)^2$$
The Hellinger loss is $\sqrt{2}$-mixable.
The Hellinger loss is not proper.

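A two-outcome example makes the failure of properness concrete. The sketch assumes the Hellinger loss in the form $\frac{1}{2}\sum_j (\sqrt{\gamma(j)} - I_{\{\omega=j\}})^2$; the distribution `truth` and the alternative forecast are hypothetical values chosen for illustration.

```python
import math


def hellinger_loss(gamma, omega):
    """(1/2) * sum_j (sqrt(gamma[j]) - 1{omega == j})^2."""
    return 0.5 * sum((math.sqrt(g) - (1.0 if j == omega else 0.0)) ** 2
                     for j, g in enumerate(gamma))


def expected_loss(truth, gamma):
    return sum(q * hellinger_loss(gamma, omega) for omega, q in enumerate(truth))


truth = [0.7, 0.3]
honest = expected_loss(truth, truth)        # forecast equals the true distribution
shifted = expected_loss(truth, [0.8, 0.2])  # a more extreme forecast
```

Here the more extreme forecast has a smaller expected loss than the honest one, so a forecaster evaluated by the Hellinger loss is not encouraged to report the true probabilities.
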
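Proper Mixable Loss Functions
Each mixable loss function $\lambda(\gamma, \omega)$ has a proper analogue $\lambda_{\mathrm{proper}}(\pi, \omega)$ such that
1. $\forall \pi \ \exists \gamma \ \forall \omega \quad \lambda_{\mathrm{proper}}(\pi, \omega) = \lambda(\gamma, \omega)$
2. $\forall \pi \ \forall \gamma \quad E_\pi \lambda_{\mathrm{proper}}(\pi, \cdot) \le E_\pi \lambda(\gamma, \cdot)$
For the Hellinger loss, the proper analogue is the spherical loss
$$\lambda_{\mathrm{spherical}}(\pi, \omega) = 1 - \frac{\pi(\omega)}{\sqrt{\sum_{j=1}^{r} (\pi(j))^2}},$$
$$\lambda_{\mathrm{spherical}}(\pi, \omega) = \lambda_{\mathrm{Hellinger}}(\gamma, \omega) \quad \text{for} \quad \gamma(\omega) = \frac{(\pi(\omega))^2}{\sum_{j=1}^{r} (\pi(j))^2}.$$
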
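The correspondence between the spherical and the Hellinger loss is easy to verify numerically. The following sketch uses a hypothetical three-outcome distribution `pi` and hypothetical helper names; it only checks property 1 for this one example.

```python
import math


def hellinger_loss(gamma, omega):
    return 0.5 * sum((math.sqrt(g) - (1.0 if j == omega else 0.0)) ** 2
                     for j, g in enumerate(gamma))


def spherical_loss(pi, omega):
    return 1.0 - pi[omega] / math.sqrt(sum(p * p for p in pi))


def to_gamma(pi):
    """gamma(j) = pi(j)^2 / sum_i pi(i)^2."""
    norm = sum(p * p for p in pi)
    return [p * p / norm for p in pi]


pi = [0.5, 0.3, 0.2]
gamma = to_gamma(pi)
# Difference between the two losses for every outcome.
gaps = [abs(spherical_loss(pi, w) - hellinger_loss(gamma, w))
        for w in range(len(pi))]
```

The mapping squares and renormalises the probabilities, which is exactly the distortion a forecaster under the Hellinger loss would apply to the true distribution.
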
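Example: Mixable and Non-Mixable Losses
Experts $1, \ldots, K$ predict $\pi_t^{(k)} \in \mathcal{P}(\{0, 1\})$.
Experts $1, \ldots, N$ predict $\gamma_t^{(n)} \in \{0, 1\}$.
Learner predicts $(\pi_t, \tilde\pi_t) \in \mathcal{P}(\{0, 1\}) \times \mathcal{P}(\{0, 1\})$ such that if $\pi_t(0) > 1/2$ then $\tilde\pi_t(0) = 1$ and if $\pi_t(1) > 1/2$ then $\tilde\pi_t(1) = 1$.
There exists a strategy for Learner that guarantees for any $k$
$$\sum_{t=1}^{T} \lambda_{\mathrm{square}}(\pi_t, \omega_t) \le \sum_{t=1}^{T} \lambda_{\mathrm{square}}(\pi_t^{(k)}, \omega_t) + \ln(K + N)$$
and for any $n$
$$\sum_{t=1}^{T} \lambda_{\mathrm{abs}}(\tilde\pi_t, \omega_t) \le \sum_{t=1}^{T} \lambda_{\mathrm{simple}}(\gamma_t^{(n)}, \omega_t) + O\left(\sqrt{T \ln(K + N)} + \sqrt{T \ln\ln T}\right).$$
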
Tennis Predictions, Square and Absolute Losses
[Figure: graphs of the negative regret $\mathrm{Loss}^{(m)}_{E_k}(T) - \mathrm{Loss}^{(m)}(T)$, two panels. First panel: square loss. Second panel: absolute loss. Legend: theoretical bounds, DF, experts.]
Learner optimizes for both loss functions, using the DF algorithm with mixability and Hoeffding supermartingales.

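The Mixed Supermartingale
$$\frac{1}{K+N} \sum_{k=1}^{K} e^{2 \sum_{t=1}^{T-1} \left( (p_t - \omega_t)^2 - (p_t^{(k)} - \omega_t)^2 \right)}\, e^{2 \left( (p - \omega)^2 - (p_T^{(k)} - \omega)^2 \right)}$$
$$+ \frac{1}{K+N} \sum_{n=1}^{N} \int_0^{1/e} \frac{d\eta}{\eta \left( \ln \frac{1}{\eta} \right)^2}\, e^{\sum_{t=1}^{T-1} \left( \eta \left( |\tilde p_t - \omega_t| - [\gamma_t^{(n)} \ne \omega_t] \right) - \eta^2/2 \right)}\, e^{\eta \left( |\tilde p - \omega| - [\gamma_T^{(n)} \ne \omega] \right) - \eta^2/2}$$
where $p_t = \pi_t(1)$, $p_t^{(k)} = \pi_t^{(k)}(1)$, $\tilde p_t = \tilde\pi_t(1)$.
$[x \ne y] = 1$ if $x \ne y$ and $[x \ne y] = 0$ if $x = y$.
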
References
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML 2008. http://arxiv.org/abs/0710.0485
A. Chernov, Y. Kalnishkan, F. Zhdanov, V. Vovk. Supermartingales in Prediction with Expert Advice. ALT 2008 and TCS. http://arxiv.org/abs/1003.2218
A. Chernov, V. Vovk. Prediction with expert evaluators' advice. ALT 2009. http://arxiv.org/abs/0902.4127
A. Chernov, V. Vovk. Prediction with Advice of Unknown Number of Experts. UAI 2010. http://arxiv.org/abs/1006.0475
http://onlineprediction.net/