Un point de vue bayésien pour des algorithmes de bandit plus performants

Size: px
Start display at page:

Download "Un point de vue bayésien pour des algorithmes de bandit plus performants"

Transcription

1 Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 1 / 36

2 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 2 / 36

3 Two bandit problems 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 3 / 36

4 Two bandit problems Bandit model A multi-armed bandit model is a set of K arms where Arm a is an unknown probability distribution ν a with mean µ a Drawing arm a is observing a realization of ν a Arms are assumed to be independent In a bandit game, at round t, a forecaster chooses arm A t to draw based on past observations, according to its sampling strategy (or bandit algorithm) observes reward X t ν At The forecaster is to learn which arm(s) is (are) the best a = argmax a µ a Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 4 / 36

5 Two bandit problems Bernoulli bandit model A multi-armed bandit model is a set of K arms where Arm a is a Bernoulli distribution B(p a ) with unknown mean µ a = p a Drawing arm a is observing a realization of B(p a ) (0 or 1) Arms are assumed to be independent In a bandit game, at round t, a forecaster chooses arm A t to draw based on past observations, according to its sampling strategy (or bandit algorithm) observes reward X t B(p At ) The forecaster is to learn which arm(s) is (are) the best a = argmax a p a Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 5 / 36

6 Two bandit problems Regret minimization The classical bandit problem: regret minimization The forecaster wants to maximize the reward accumulated during learning or equivalentely minimize its regret: [ n ] R n = nµ a E X t He has to find a sampling strategy (or bandit algorithm) that realizes a tradeoff between exploration and exploitation t=1 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 6 / 36

7 Two bandit problems An alternative: pure-exploration Regret minimization The forecaster has to find the best arm(s), and does not suffer a loss when drawing bad arms. He has to find a sampling strategy that optimaly explores the environnement, together with a stopping criterion and to recommand a set S of m arms such that P(S is the set of m best arms) 1 δ. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 7 / 36

8 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 8 / 36

9 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Yahoo!(c) can ajust its strategy (A t ) so as to Regret minimization Maximize the number of clicks during n interactions Pure-exploration Identify the best advertisement with probability at least 1 δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 8 / 36

10 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Yahoo!(c) can ajust its strategy (A t ) so as to Regret minimization Maximize the number of clicks during n interactions Pure-exploration Identify the best advertisement with probability at least 1 δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 9 / 36

11 Regret minimization: Bayesian bandits, frequentist bandits 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 10 / 36

12 Regret minimization: Bayesian bandits, frequentist bandits Two probabilistic modellings Two probabilistic models K independent arms. µ = µ a Frequentist : θ = (θ 1,..., θ K ) unknown parameter (Y a,t ) t is i.i.d. with distribution ν θa with mean µ a = µ(θ a ) highest expectation among the arms. Bayesian : i.i.d. θ a π a (Y a,t ) t is i.i.d. conditionally to θ a with distribution ν θa At time t, arm A t is chosen and reward X t = Y At,t is observed Two measures of performance Minimize regret R n (θ) = E θ [ n t=1 µ µ At ] Minimize Bayes risk Risk n = R n (θ)dπ(θ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 11 / 36

13 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 12 / 36

14 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) One can separate tools and objectives: Objective Frequentist Bayesian algorithms algorithms Regret?? Bayes risk?? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 12 / 36

15 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) One can separate tools and objectives: Objective Frequentist Bayesian algorithms algorithms Regret?? Bayes risk?? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 13 / 36

16 Regret minimization: Bayesian bandits, frequentist bandits Our goal Two probabilistic models We want to design Bayesian algorithm that are optimal with respect to the frequentist regret Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 14 / 36

17 Regret minimization: Bayesian bandits, frequentist bandits Frequentist optimality Asymptotically optimal algorithms towards the regret N a (t) the number of draws of arm a up to time t K R n (θ) = (µ µ a )E θ [N a (n)] a=1 Lai and Robbins,1985 : every consistent algorithm satisfies µ a < µ lim inf n E θ [N a (n)] ln n 1 KL(ν θa, ν θ ) A bandit algorithm is asymptotically optimal if µ a < µ lim sup n E θ [N a (n)] ln n 1 KL(ν θa, ν θ ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 15 / 36

18 Regret minimization: Bayesian bandits, frequentist bandits The optimism principle A family of frequentist algorithms The following heuristic defines a family of optimistic index policies: For each arm a, compute a confidence interval on the unknown mean: µ a UCB a (t) w.h.p Use the optimism-in-face-of-uncertainty principle: act as if the best possible model was the true model The algorithm chooses at time t A t = arg max a UCB a (t) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 16 / 36

19 Regret minimization: Bayesian bandits, frequentist bandits UCB Towards optimal algorithms for Bernoulli bandits UCB [Auer et al. 02] uses Hoeffding bounds: α log(t) UCB a (t) = ˆp a (t) + 2N a (t) where ˆp a (t) = Sa(t) N a(t) Finite-time bound: E[N a (n)] is the empirical mean of arm a. K 1 2(p p a ) 2 ln n + K 2, with K 1 > 1. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 17 / 36

20 Regret minimization: Bayesian bandits, frequentist bandits KL-UCB Towards optimal algorithms for Bernoulli bandits KL-UCB[Cappé et al. 2013] uses the index: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} β(t,δ)/n a (t) with K(p, q) := KL (B(p), B(q)) = p log Finite-time bound: E[N a (n)] S a (t)/n a (t) ( ) ( ) p 1 p + (1 p) log q 1 q 1 K(p a, p ) ln n + C Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 18 / 36

21 Two Bayesian bandit algorithms 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 19 / 36

22 Two Bayesian bandit algorithms UCBs versus Bayesian algorithms Bayesian algorithms Figure : Confidence intervals for the arms means after t rounds of a bandit game Figure : Posterior over the arms means after t rounds of a bandit game Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 20 / 36

23 Two Bayesian bandit algorithms UCBs versus Bayesian algorithms Bayesian algorithms Figure : Confidence intervals for the arms means after t rounds of a bandit game Figure : Posterior over the arms means after t rounds of a bandit game How do we exploit the posterior in a Bayesian bandit algorithm? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 20 / 36

24 Two Bayesian bandit algorithms The Bayes-UCB algorithm 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 21 / 36

25 Two Bayesian bandit algorithms The Bayes-UCB algorithm The algorithm Let : Π 0 = (π 0 1,..., π0 K ) be a prior distribution over (θ 1,..., θ K ) Λ t = (λ t 1,..., λt K ) be the posterior over the means (µ 1,..., µ K ) a the end of round t The Bayes-UCB algorithm chooses at time t ( Q 1 A t = argmax a 1 t(log t) c, λt 1 a where Q(α, π) is the quantile of order α of the distribution π. For Benoulli bandits with uniform prior on the means: θ a = µ a = p a Λ t = Π t i.i.d θ a U([0, 1]) = Beta(1, 1) λ t a = π t a = Beta(S a (t) + 1, N a (t) S a (t) + 1) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 22 / 36 )

26 Two Bayesian bandit algorithms Theoretical results Theoretical results for Bernoulli bandits Bayes-UCB is asymptotically optimal Theorem [K.,Cappé,Garivier 2012] Let ɛ > 0. The Bayes-UCB algorithm using a uniform prior over the arms and with parameter c 5 satisfies E θ [N a (n)] 1 + ɛ K(p a, p ) log(n) + o ɛ,c (log(n)). Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 23 / 36

27 Two Bayesian bandit algorithms Link to a frequentist algorithm Theoretical results Bayes-UCB index is close to KL-UCB index: ũ a (t) q a (t) u a (t) with: { u a (t) = max q S ( ) } a(t) N a (t) : N Sa (t) a(t)k N a (t), q log t + c log log t { ũ a (t) = max q S ( ) a(t) N a (t) + 1 : (N Sa (t) a(t) + 1)K N a (t) + 1, q ( ) } t log + c log log t N a (t) + 2 Bayes-UCB appears to build automatically confidence intervals based on Kullback-Leibler divergence, that are adapted to the geometry of the problem in this specific case. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 24 / 36

28 Two Bayesian bandit algorithms Thompson Sampling 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 25 / 36

29 Thompson Sampling Two Bayesian bandit algorithms The algorithm A randomized Bayesian algorithm: a {1..K}, θ a (t) πa t A t = argmax a µ(θ a (t)) (Recent) interest for this algorithm: a very old algorithm [Thompson 1933] partial analysis proposed [Granmo 2010][May, Korda, Lee, Leslie 2012] extensive numerical study beyond the Bernoulli case [Chapelle, Li 2011] first logarithmic upper bound on the regret [Agrawal,Goyal 2012] Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 26 / 36

30 Two Bayesian bandit algorithms The algorithm Thompson Sampling (Bernoulli bandits) A randomized Bayesian algorithm: a {1..K}, θ a (t) Beta(S a (t) + 1, N a (t) S a (t) + 1) A t = argmax a θ a (t) (Recent) interest for this algorithm: a very old algorithm [Thompson 1933] partial analysis proposed [Granmo 2010][May, Korda, Lee, Leslie 2012] extensive numerical study beyond the Bernoulli case [Chapelle, Li 2011] first logarithmic upper bound on the regret [Agrawal,Goyal 2012] Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 27 / 36

31 Two Bayesian bandit algorithms Theoretical results An optimal regret bound for Bernoulli bandits Assume the first arm is the unique optimal arm. Known result : [Agrawal,Goyal 2012] ( K ) 1 E[R n ] C p ln(n) + o µ (ln(n)) p a a=2 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 28 / 36

32 Two Bayesian bandit algorithms Theoretical results An optimal regret bound for Bernoulli bandits Assume the first arm is the unique optimal arm. Known result : [Agrawal,Goyal 2012] ( K ) 1 E[R n ] C p ln(n) + o µ (ln(n)) p a a=2 Our improvement : [K.,Korda,Munos 2012] Theorem ɛ > 0, E[R n ] (1 + ɛ) ( K a=2 p p a K(p a, p ) ) ln(n) + o µ,ɛ (ln(n)) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 28 / 36

33 In practise Two Bayesian bandit algorithms Practical conclusion θ = [ ] 400 UCB 400 KLUCB regret time time 400 BayesUCB 400 Thompson time time Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 29 / 36

34 Two Bayesian bandit algorithms Practical conclusion In practise In the Bernoulli case, for each arm, KL-UCB requires to solve an optimization problem: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} Bayes-UCB requires to compute one quantile of a Beta distribution Thompson requires to compute one sample of a Beta distribution Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 30 / 36

35 Two Bayesian bandit algorithms Practical conclusion In practise In the Bernoulli case, for each arm, KL-UCB requires to solve an optimization problem: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} Bayes-UCB requires to compute one quantile of a Beta distribution Thompson requires to compute one sample of a Beta distribution Other advantages of Bayesian algorithms: they easily generalize to more complex models......even when the posterior is not directly computable (using MCMC) the prior can incorporate correlation between arms Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 30 / 36

36 Conclusion and perspectives 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 31 / 36

37 Conclusion and perspectives Summary for regret minimization Future work: Objective Frequentist Bayesian algorithms algorithms Regret KL-UCB Bayes-UCB Thompson Sampling Bayes risk KL-UCB Gittins algorithm for finite horizon Is Gittins algorithm optimal with respect to the regret? Are our Bayesian algorithms efficient with respect to the Bayes risk? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 32 / 36

38 Conclusion and perspectives Bayesian algorithm for pure-exploration? At round t, the KL-LUCB algorithm ([K., Kalyanakrishnan, 13]) draws two well-chosen arms: u t and l t stops when CI for arms in J(t) and J(t) c are separated recommends the set of m empirical best arms m=3. Set J(t), arm l t in bold Set J(t) c, arm u t in bold Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 33 / 36

39 Conclusion and perspectives Bayesian algorithm for pure-exploration? KL-LUCB uses KL-confidence intervals: L a (t) = min {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}, U a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}. ( ) We use β(t, δ) = log k1 Kt α to make sure P(S = Sm) 1 δ. δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 34 / 36

40 Conclusion and perspectives Bayesian algorithm for pure-exploration? KL-LUCB uses KL-confidence intervals: L a (t) = min {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}, U a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}. ( ) We use β(t, δ) = log k1 Kt α to make sure P(S = Sm) 1 δ. δ How to propose a Bayesian algorithm that adapts to δ? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 34 / 36

41 Conclusion and perspectives Conclusion Regret minimization: Go Bayesian! Bayes-UCB show striking similarities with KL-UCB Thompson Sampling is an easy-to-implement alternative to the optimistic approach both algorithms are asymptotically optimal towards frequentist regret (and more efficient in practise) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 35 / 36

42 Conclusion and perspectives Conclusion Regret minimization: Go Bayesian! Bayes-UCB show striking similarities with KL-UCB Thompson Sampling is an easy-to-implement alternative to the optimistic approach both algorithms are asymptotically optimal towards frequentist regret (and more efficient in practise) TODO list: Go deeper into the link between Bayes risk and (frequentist) regret (Gittins frequentist optimality?) Obtain theoretical guarantees for Bayes-UCB and Thompson Sampling beyond Bernoulli bandit models (e.g. when rewards belong to the exponential family) Develop Bayesian algorithm for the pure-exploration objective? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 35 / 36

43 Conclusion and perspectives References E. Kaufmann, O. Cappé, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. AISTATS 2012 E.Kaufmann, N.Korda, and R.Munos. Thompson Sampling: an asymptotically optimal finite-time analysis. ALT 2012 E.Kaufmann, S.Kalyanakrishnan. Information Complexity in Bandit Subset Selection, COLT 2013 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 36 / 36

Introduction Main Results Experiments Conclusions. The UCB Algorithm

Introduction Main Results Experiments Conclusions. The UCB Algorithm Paper Finite-time Analysis of the Multiarmed Bandit Problem by Auer, Cesa-Bianchi, Fischer, Machine Learning 27, 2002 Presented by Markus Enzenberger. Go Seminar, University of Alberta. March 14, 2007

More information

UCB REVISITED: IMPROVED REGRET BOUNDS FOR THE STOCHASTIC MULTI-ARMED BANDIT PROBLEM

UCB REVISITED: IMPROVED REGRET BOUNDS FOR THE STOCHASTIC MULTI-ARMED BANDIT PROBLEM UCB REVISITED: IMPROVED REGRET BOUNDS FOR THE STOCHASTIC MULTI-ARMED BANDIT PROBLEM PETER AUER AND RONALD ORTNER ABSTRACT In the stochastic multi-armed bandit problem we consider a modification of the

More information

Multi-Armed Bandit Problems

Multi-Armed Bandit Problems Machine Learning Coms-4771 Multi-Armed Bandit Problems Lecture 20 Multi-armed Bandit Problems The Setting: K arms (or actions) Each time t, each arm i pays off a bounded real-valued reward x i (t), say

More information

On Bayesian Upper Confidence Bounds for Bandit Problems

On Bayesian Upper Confidence Bounds for Bandit Problems Emilie Kaufmann Olivier Cappé Aurélien Garivier LTCI, CNRS & Telecom ParisTech Abstract Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter

More information

Lecture 3: UCB Algorithm, Worst-Case Regret Bound

Lecture 3: UCB Algorithm, Worst-Case Regret Bound IEOR 8100-001: Learning and Optimization for Sequential Decision Making 01/7/16 Lecture 3: UCB Algorithm, Worst-Case Regret Bound Instructor: Shipra Agrawal Scribed by: Karl Stratos = Jang Sun Lee 1 UCB

More information

Finite-time Analysis of the Multiarmed Bandit Problem*

Finite-time Analysis of the Multiarmed Bandit Problem* Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010

More information

Online Network Revenue Management using Thompson Sampling

Online Network Revenue Management using Thompson Sampling Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira David Simchi-Levi He Wang Working Paper 16-031 Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira

More information

Yishay Mansour Google & Tel Aviv Univ. Many thanks for my co-authors: A. Blum, N. Cesa-Bianchi, and G. Stoltz

Yishay Mansour Google & Tel Aviv Univ. Many thanks for my co-authors: A. Blum, N. Cesa-Bianchi, and G. Stoltz Regret Minimization: Algorithms and Applications Yishay Mansour Google & Tel Aviv Univ. Many thanks for my co-authors: A. Blum, N. Cesa-Bianchi, and G. Stoltz 2 Weather Forecast Sunny: Rainy: No meteorological

More information

Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI

Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI Licentiate Thesis Stockholm, Sweden 2016 TRITA-EE 2016:001 ISSN 1653-5146 ISBN 978-91-7595-836-1 KTH Royal Institute

More information

An Introduction to Game-theoretic Online Learning Tim van Erven

An Introduction to Game-theoretic Online Learning Tim van Erven General Mathematics Colloquium Leiden, December 4, 2014 An Introduction to Game-theoretic Online Learning Tim van Erven Statistics Bayesian Statistics Frequentist Statistics Statistics Bayesian Statistics

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Handling Advertisements of Unknown Quality in Search Advertising

Handling Advertisements of Unknown Quality in Search Advertising Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Carnegie Mellon University spandey@cs.cmu.edu Christopher Olston Yahoo! Research olston@yahoo-inc.com Abstract We consider

More information

Revenue Optimization against Strategic Buyers

Revenue Optimization against Strategic Buyers Revenue Optimization against Strategic Buyers Mehryar Mohri Courant Institute of Mathematical Sciences 251 Mercer Street New York, NY, 10012 Andrés Muñoz Medina Google Research 111 8th Avenue New York,

More information

Chapter 7. BANDIT PROBLEMS.

Chapter 7. BANDIT PROBLEMS. Chapter 7. BANDIT PROBLEMS. Bandit problems are problems in the area of sequential selection of experiments, and they are related to stopping rule problems through the theorem of Gittins and Jones (974).

More information

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

More information

Beat the Mean Bandit

Beat the Mean Bandit Yisong Yue H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, USA Thorsten Joachims Department of Computer Science, Cornell University, Ithaca, NY, USA yisongyue@cmu.edu tj@cs.cornell.edu

More information

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Min Xu Machine Learning Department Carnegie Mellon University minx@cs.cmu.edu ao Qin Microsoft Research Asia taoqin@microsoft.com

More information

Fast and Robust Normal Estimation for Point Clouds with Sharp Features

Fast and Robust Normal Estimation for Point Clouds with Sharp Features 1/37 Fast and Robust Normal Estimation for Point Clouds with Sharp Features Alexandre Boulch & Renaud Marlet University Paris-Est, LIGM (UMR CNRS), Ecole des Ponts ParisTech Symposium on Geometry Processing

More information

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Submitted to Operations Research manuscript OPRE-2009-06-288 Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Microsoft Research, Cambridge, MA, hamidnz@microsoft.com

More information

Algorithms for the multi-armed bandit problem

Algorithms for the multi-armed bandit problem Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00 Algorithms for the multi-armed bandit problem Volodymyr Kuleshov Doina Precup School of Computer Science McGill University

More information

Contextual-Bandit Approach to Recommendation Konstantin Knauf

Contextual-Bandit Approach to Recommendation Konstantin Knauf Contextual-Bandit Approach to Recommendation Konstantin Knauf 22. Januar 2014 Prof. Ulf Brefeld Knowledge Mining & Assesment 1 Agenda Problem Scenario Scenario Multi-armed Bandit Model for Online Recommendation

More information

A crash course in probability and Naïve Bayes classification

A crash course in probability and Naïve Bayes classification Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems OPERATIONS RESEARCH Vol. 6, No. 1, January February 212, pp. 18 195 ISSN 3-364X (print) ISSN 1526-5463 (online) http://d.doi.org/1.1287/opre.111.999 212 INFORMS The Knowledge Gradient Algorithm for a General

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

Online Learning in Bandit Problems

Online Learning in Bandit Problems Online Learning in Bandit Problems by Cem Tekin A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering: Systems) in The University

More information

Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

More information

Week 1: Introduction to Online Learning

Week 1: Introduction to Online Learning Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208.

In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208. In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208. ÔØ Ö ½ ÇÈÌÁÅÁ ÁÆ ÍÆÁÅÇ Ä Ê ËÈÇÆË ÍÆ ÌÁÇÆ ÇÊ ÁÆ Ê Î ÊÁ Ä Ë Â Ò À Ö Û ÈÙÖ Ù ÍÒ Ú Ö ØÝ Ï Ø Ä Ý ØØ ÁÆ ¼

More information

Trading regret rate for computational efficiency in online learning with limited feedback

Trading regret rate for computational efficiency in online learning with limited feedback Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

Online Learning with Switching Costs and Other Adaptive Adversaries

Online Learning with Switching Costs and Other Adaptive Adversaries Online Learning with Switching Costs and Other Adaptive Adversaries Nicolò Cesa-Bianchi Università degli Studi di Milano Italy Ofer Dekel Microsoft Research USA Ohad Shamir Microsoft Research and the Weizmann

More information

Signal detection and goodness-of-fit: the Berk-Jones statistics revisited

Signal detection and goodness-of-fit: the Berk-Jones statistics revisited Signal detection and goodness-of-fit: the Berk-Jones statistics revisited Jon A. Wellner (Seattle) INET Big Data Conference INET Big Data Conference, Cambridge September 29-30, 2015 Based on joint work

More information

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012 Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

More information

The Intersection of Statistics and Topology:

The Intersection of Statistics and Topology: The Intersection of Statistics and Topology: Confidence Sets Brittany Terese Fasy joint work with S. Balakrishnan, F. Chazal, F. Lecci, A. Rinaldo, A. Singh, L. Wasserman 18 January 2014 B. Fasy (Tulane)

More information

Regret Minimization for Reserve Prices in Second-Price Auctions

Regret Minimization for Reserve Prices in Second-Price Auctions Regret Minimization for Reserve Prices in Second-Price Auctions Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Claudio Gentile (Varese) and Yishay Mansour (Tel-Aviv) N. Cesa-Bianchi

More information

Introduction to Detection Theory

Introduction to Detection Theory Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf. Ch. 10 in Wasserman. EE 527, Detection

More information

arxiv:1306.0155v1 [cs.lg] 1 Jun 2013

arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Dynamic ad allocation: bandits with budgets Aleksandrs Slivkins May 2013 arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Abstract We consider an application of multi-armed bandits to internet advertising (specifically,

More information

Machine Learning tools for Online Advertisement

Machine Learning tools for Online Advertisement Year 2008-2009. Research Internship Report Machine Learning tools for Online Advertisement Victor Gabillon victorgabillon@inria.fr Supervisors INRIA : Philippe Preux Jérémie Mary Supervisors Orange Labs

More information

A Survey of Preference-based Online Learning with Bandit Algorithms

A Survey of Preference-based Online Learning with Bandit Algorithms A Survey of Preference-based Online Learning with Bandit Algorithms Róbert Busa-Fekete and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {busarobi,eyke}@upb.de Abstract.

More information

The Price of Truthfulness for Pay-Per-Click Auctions

The Price of Truthfulness for Pay-Per-Click Auctions Technical Report TTI-TR-2008-3 The Price of Truthfulness for Pay-Per-Click Auctions Nikhil Devanur Microsoft Research Sham M. Kakade Toyota Technological Institute at Chicago ABSTRACT We analyze the problem

More information

Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng

Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution

More information

Monotone multi-armed bandit allocations

Monotone multi-armed bandit allocations JMLR: Workshop and Conference Proceedings 19 (2011) 829 833 24th Annual Conference on Learning Theory Monotone multi-armed bandit allocations Aleksandrs Slivkins Microsoft Research Silicon Valley, Mountain

More information

Introduction to Online Learning Theory

Introduction to Online Learning Theory Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

On the Complexity of A/B Testing

On the Complexity of A/B Testing JMLR: Workshop and Conference Proceedings vol 35:1 3, 014 On the Complexity of A/B Testing Emilie Kaufmann LTCI, Télécom ParisTech & CNRS KAUFMANN@TELECOM-PARISTECH.FR Olivier Cappé CAPPE@TELECOM-PARISTECH.FR

More information

Summary of Probability

Summary of Probability Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

More information

Adaptive Bidding for Display Advertising

Adaptive Bidding for Display Advertising Adaptive Bidding for Display Advertising ABSTRACT Arpita Ghosh Yahoo! Research 2821 Mission College Blvd. Santa Clara, CA 95054 arpita@yahoo-inc.com Sergei Vassilvitskii Yahoo! Research 111 West 40th St.,

More information

Online Algorithms: Learning & Optimization with No Regret.

Online Algorithms: Learning & Optimization with No Regret. Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the

More information

Big Data, Statistics, and the Internet

Big Data, Statistics, and the Internet Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes

More information

Lecture 4: Random Variables

Lecture 4: Random Variables Lecture 4: Random Variables 1. Definition of Random variables 1.1 Measurable functions and random variables 1.2 Reduction of the measurability condition 1.3 Transformation of random variables 1.4 σ-algebra

More information

Performance measures for Neyman-Pearson classification

Performance measures for Neyman-Pearson classification Performance measures for Neyman-Pearson classification Clayton Scott Abstract In the Neyman-Pearson (NP) classification paradigm, the goal is to learn a classifier from labeled training data such that

More information

Notes from Week 1: Algorithms for sequential prediction

Notes from Week 1: Algorithms for sequential prediction CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking

More information

Two-Sided Bandits and the Dating Market

Two-Sided Bandits and the Dating Market Two-Sided Bandits and the Dating Market Sanmay Das Center for Biological and Computational Learning and Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge,

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

More information

Set theory, and set operations

Set theory, and set operations 1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we

More information

ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS AND LEARNING

ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS AND LEARNING POLITECNICO DI MILANO DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINFORMATICA DOCTORAL PROGRAMME IN INFORMATION TECHNOLOGY ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS

More information

our algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative).

our algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative). 1 Context-Adaptive Big Data Stream Mining - Online Appendix Cem Tekin*, Member, IEEE, Luca Canzian, Member, IEEE, Mihaela van der Schaar, Fellow, IEEE Electrical Engineering Department, University of California,

More information

Abstract. Keywords. restless bandits, reinforcement learning, sequential decision-making, change detection, non-stationary environments

Abstract. Keywords. restless bandits, reinforcement learning, sequential decision-making, change detection, non-stationary environments Modeling Human Performance in Restless Bandits with Particle Filters Sheng Kung M. Yi 1, Mark Steyvers 1, and Michael Lee 1 Abstract Bandit problems provide an interesting and widely-used setting for the

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

Model for dynamic website optimization

Model for dynamic website optimization Chapter 10 Model for dynamic website optimization In this chapter we develop a mathematical model for dynamic website optimization. The model is designed to optimize websites through autonomous management

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( )

( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( ) { } ( ) = ( ) = {,,, } ( ) β ( ), < 1 ( ) + ( ) = ( ) + ( ) max, ( ) [ ( )] + ( ) [ ( )], [ ( )] [ ( )] = =, ( ) = ( ) = 0 ( ) = ( ) ( ) ( ) =, ( ), ( ) =, ( ), ( ). ln ( ) = ln ( ). + 1 ( ) = ( ) Ω[ (

More information

Strongly Adaptive Online Learning

Strongly Adaptive Online Learning Amit Daniely Alon Gonen Shai Shalev-Shwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance

More information

4 Learning, Regret minimization, and Equilibria

4 Learning, Regret minimization, and Equilibria 4 Learning, Regret minimization, and Equilibria A. Blum and Y. Mansour Abstract Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive

More information

SECOND PART, LECTURE 4: CONFIDENCE INTERVALS

SECOND PART, LECTURE 4: CONFIDENCE INTERVALS Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 4: CONFIDENCE INTERVALS Lecture 4: Confidence Intervals

More information

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.

More information

An Introduction to Extreme Value Theory

An Introduction to Extreme Value Theory An Introduction to Extreme Value Theory Petra Friederichs Meteorological Institute University of Bonn COPS Summer School, July/August, 2007 Applications of EVT Finance distribution of income has so called

More information

ABC methods for model choice in Gibbs random fields

ABC methods for model choice in Gibbs random fields ABC methods for model choice in Gibbs random fields Jean-Michel Marin Institut de Mathématiques et Modélisation Université Montpellier 2 Joint work with Aude Grelaud, Christian Robert, François Rodolphe,

More information

Consistent Binary Classification with Generalized Performance Metrics

Consistent Binary Classification with Generalized Performance Metrics Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014 Problem and Motivation

More information

Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra July 13, 2007 Abstract We examine the problem of allocating a resource repeatedly over

More information

FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang

FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang 87. Li, J.; Tang, Q. Interplay of insurance and financial risks in a discrete-time model with strongly regular variation. Bernoulli 21 (2015), no. 3,

More information

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra September 24, 2012 Abstract We examine the problem of allocating an item repeatedly over

More information

Average Y Average Predicted. Calibration. Dean P. Foster Amazon

Average Y Average Predicted. Calibration. Dean P. Foster Amazon 0.5 0.4 Average Y 0.3 0.2 0.1 0.1 0.15 0.2 0.25 Average Predicted Calibration Dean P. Foster Amazon Outline Calibration for humans Calibration for big data Theory of calibrated Game theory: Convergence

More information

L1 vs. L2 Regularization and feature selection.

L1 vs. L2 Regularization and feature selection. L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Big Data: The Computation/Statistics Interface

Big Data: The Computation/Statistics Interface Big Data: The Computation/Statistics Interface Michael I. Jordan University of California, Berkeley September 2, 2013 What Is the Big Data Phenomenon? Big Science is generating massive datasets to be used

More information

Information Compexity in Bandit Subset Selection Emilie Kaufmann and Shivaram Kalyanakrishnan

Information Compexity in Bandit Subset Selection Emilie Kaufmann and Shivaram Kalyanakrishnan Information Compexity in Bandit Subset Selection Emilie Kaufmann and Shivaram Kalyanakrishnan In JMLR Workshop and Conference Proceedings Conference on Learning Theory, 203), 30:228 25, 203. JMLR: Workshop

More information

The Kelly Betting System for Favorable Games.

The Kelly Betting System for Favorable Games. The Kelly Betting System for Favorable Games. Thomas Ferguson, Statistics Department, UCLA A Simple Example. Suppose that each day you are offered a gamble with probability 2/3 of winning and probability

More information

Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

More information

Computational Game Theory and Clustering

Computational Game Theory and Clustering Computational Game Theory and Clustering Martin Hoefer mhoefer@mpi-inf.mpg.de 1 Computational Game Theory? 2 Complexity and Computation of Equilibrium 3 Bounding Inefficiencies 4 Conclusion Computational

More information

17.3.1 Follow the Perturbed Leader

17.3.1 Follow the Perturbed Leader CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

More information

MIN-MAX CONFIDENCE INTERVALS

MIN-MAX CONFIDENCE INTERVALS MIN-MAX CONFIDENCE INTERVALS Johann Christoph Strelen Rheinische Friedrich Wilhelms Universität Bonn Römerstr. 164, 53117 Bonn, Germany E-mail: strelen@cs.uni-bonn.de July 2004 STOCHASTIC SIMULATION Random

More information

Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

More information

A direct approach to false discovery rates

A direct approach to false discovery rates J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiple-hypothesis

More information

Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach

Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach Rémi Bardenet Arnaud Doucet Chris Holmes Department of Statistics, University of Oxford, Oxford OX1 3TG, UK REMI.BARDENET@GMAIL.COM

More information

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR) 2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

More information

Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Stochastic Online Greedy Learning with Semi-bandit Feedbacks Stochastic Online Greedy Learning with Semi-bandit Feedbacks Tian Lin Tsinghua University Beijing, China lintian06@gmail.com Jian Li Tsinghua University Beijing, China lapordge@gmail.com Wei Chen Microsoft

More information

Examples of decision problems:

Examples of decision problems: Examples of decision problems: The Investment Problem: a. Problem Statement: i. Give n possible shares to invest in, provide a distribution of wealth on stocks. ii. The game is repetitive each time we

More information

Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL

Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL POMDP Tutorial Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL Observation Belief ACTOR Transition Dynamics WORLD b Policy π Action Markov Decision Process -X: set of states [x s,x r

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

LARGE CLASSES OF EXPERTS

LARGE CLASSES OF EXPERTS LARGE CLASSES OF EXPERTS Csaba Szepesvári University of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 31, 2006 OUTLINE 1 TRACKING THE BEST EXPERT 2 FIXED SHARE FORECASTER 3 VARIABLE-SHARE

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

More information

S = k=1 k. ( 1) k+1 N S N S N

S = k=1 k. ( 1) k+1 N S N S N Alternating Series Last time we talked about convergence of series. Our two most important tests, the integral test and the (limit) comparison test, both required that the terms of the series were (eventually)

More information

Improvements to the Linear Programming Based Scheduling of Web Advertisements

Improvements to the Linear Programming Based Scheduling of Web Advertisements Electronic Commerce Research, 5: 75 98 (2005) 2005 Springer Science + Business Media, Inc. Manufactured in the Netherlands. Improvements to the Linear Programming Based Scheduling of Web Advertisements

More information

HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

More information

A Contextual-Bandit Approach to Personalized News Article Recommendation

A Contextual-Bandit Approach to Personalized News Article Recommendation A Contextual-Bandit Approach to Personalized News Article Recommendation Lihong Li,WeiChu, Yahoo! Labs lihong,chuwei@yahooinc.com John Langford Yahoo! Labs jl@yahoo-inc.com Robert E. Schapire + + Dept

More information