Un point de vue bayésien pour des algorithmes de bandit plus performants

Size: px
Start display at page:

Download "Un point de vue bayésien pour des algorithmes de bandit plus performants"

Transcription

1 Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 1 / 36

2 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 2 / 36

3 Two bandit problems 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 3 / 36

4 Two bandit problems Bandit model A multi-armed bandit model is a set of K arms where Arm a is an unknown probability distribution ν a with mean µ a Drawing arm a is observing a realization of ν a Arms are assumed to be independent In a bandit game, at round t, a forecaster chooses arm A t to draw based on past observations, according to its sampling strategy (or bandit algorithm) observes reward X t ν At The forecaster is to learn which arm(s) is (are) the best a = argmax a µ a Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 4 / 36

5 Two bandit problems Bernoulli bandit model A multi-armed bandit model is a set of K arms where Arm a is a Bernoulli distribution B(p a ) with unknown mean µ a = p a Drawing arm a is observing a realization of B(p a ) (0 or 1) Arms are assumed to be independent In a bandit game, at round t, a forecaster chooses arm A t to draw based on past observations, according to its sampling strategy (or bandit algorithm) observes reward X t B(p At ) The forecaster is to learn which arm(s) is (are) the best a = argmax a p a Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 5 / 36

6 Two bandit problems Regret minimization The classical bandit problem: regret minimization The forecaster wants to maximize the reward accumulated during learning or equivalentely minimize its regret: [ n ] R n = nµ a E X t He has to find a sampling strategy (or bandit algorithm) that realizes a tradeoff between exploration and exploitation t=1 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 6 / 36

7 Two bandit problems An alternative: pure-exploration Regret minimization The forecaster has to find the best arm(s), and does not suffer a loss when drawing bad arms. He has to find a sampling strategy that optimaly explores the environnement, together with a stopping criterion and to recommand a set S of m arms such that P(S is the set of m best arms) 1 δ. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 7 / 36

8 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 8 / 36

9 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Yahoo!(c) can ajust its strategy (A t ) so as to Regret minimization Maximize the number of clicks during n interactions Pure-exploration Identify the best advertisement with probability at least 1 δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 8 / 36

10 Two bandit problems Regret minimization Zoom on an application: Online advertisement Yahoo!(c) has to choose between K different advertisement the one to display on its webpage for each user (indexed by t N). Ad number a unknown probability of click p a Unknown best advertisement a = argmax a p a If ad a is displayed for user t, he clicks on it with probability p a Yahoo!(c): chooses ad A t to display for user number t observes whether the user has clicked or not: X t B(p At ) Yahoo!(c) can ajust its strategy (A t ) so as to Regret minimization Maximize the number of clicks during n interactions Pure-exploration Identify the best advertisement with probability at least 1 δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 9 / 36

11 Regret minimization: Bayesian bandits, frequentist bandits 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 10 / 36

12 Regret minimization: Bayesian bandits, frequentist bandits Two probabilistic modellings Two probabilistic models K independent arms. µ = µ a Frequentist : θ = (θ 1,..., θ K ) unknown parameter (Y a,t ) t is i.i.d. with distribution ν θa with mean µ a = µ(θ a ) highest expectation among the arms. Bayesian : i.i.d. θ a π a (Y a,t ) t is i.i.d. conditionally to θ a with distribution ν θa At time t, arm A t is chosen and reward X t = Y At,t is observed Two measures of performance Minimize regret R n (θ) = E θ [ n t=1 µ µ At ] Minimize Bayes risk Risk n = R n (θ)dπ(θ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 11 / 36

13 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 12 / 36

14 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) One can separate tools and objectives: Objective Frequentist Bayesian algorithms algorithms Regret?? Bayes risk?? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 12 / 36

15 Regret minimization: Bayesian bandits, frequentist bandits Frequentist tools, Bayesian tools Two probabilistic models Bandit algorithms based on frequentist tools use: MLE for the parameter of each arm confidence intervals for the mean of each arm Bandit algorithms based on Bayesian tools use: Π t = (π t 1,..., πt K ) the current posterior over (θ 1,..., θ K ) π t a = p(θ a past observations from arm a) One can separate tools and objectives: Objective Frequentist Bayesian algorithms algorithms Regret?? Bayes risk?? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 13 / 36

16 Regret minimization: Bayesian bandits, frequentist bandits Our goal Two probabilistic models We want to design Bayesian algorithm that are optimal with respect to the frequentist regret Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 14 / 36

17 Regret minimization: Bayesian bandits, frequentist bandits Frequentist optimality Asymptotically optimal algorithms towards the regret N a (t) the number of draws of arm a up to time t K R n (θ) = (µ µ a )E θ [N a (n)] a=1 Lai and Robbins,1985 : every consistent algorithm satisfies µ a < µ lim inf n E θ [N a (n)] ln n 1 KL(ν θa, ν θ ) A bandit algorithm is asymptotically optimal if µ a < µ lim sup n E θ [N a (n)] ln n 1 KL(ν θa, ν θ ) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 15 / 36

18 Regret minimization: Bayesian bandits, frequentist bandits The optimism principle A family of frequentist algorithms The following heuristic defines a family of optimistic index policies: For each arm a, compute a confidence interval on the unknown mean: µ a UCB a (t) w.h.p Use the optimism-in-face-of-uncertainty principle: act as if the best possible model was the true model The algorithm chooses at time t A t = arg max a UCB a (t) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 16 / 36

19 Regret minimization: Bayesian bandits, frequentist bandits UCB Towards optimal algorithms for Bernoulli bandits UCB [Auer et al. 02] uses Hoeffding bounds: α log(t) UCB a (t) = ˆp a (t) + 2N a (t) where ˆp a (t) = Sa(t) N a(t) Finite-time bound: E[N a (n)] is the empirical mean of arm a. K 1 2(p p a ) 2 ln n + K 2, with K 1 > 1. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 17 / 36

20 Regret minimization: Bayesian bandits, frequentist bandits KL-UCB Towards optimal algorithms for Bernoulli bandits KL-UCB[Cappé et al. 2013] uses the index: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} β(t,δ)/n a (t) with K(p, q) := KL (B(p), B(q)) = p log Finite-time bound: E[N a (n)] S a (t)/n a (t) ( ) ( ) p 1 p + (1 p) log q 1 q 1 K(p a, p ) ln n + C Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 18 / 36

21 Two Bayesian bandit algorithms 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 19 / 36

22 Two Bayesian bandit algorithms UCBs versus Bayesian algorithms Bayesian algorithms Figure : Confidence intervals for the arms means after t rounds of a bandit game Figure : Posterior over the arms means after t rounds of a bandit game Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 20 / 36

23 Two Bayesian bandit algorithms UCBs versus Bayesian algorithms Bayesian algorithms Figure : Confidence intervals for the arms means after t rounds of a bandit game Figure : Posterior over the arms means after t rounds of a bandit game How do we exploit the posterior in a Bayesian bandit algorithm? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 20 / 36

24 Two Bayesian bandit algorithms The Bayes-UCB algorithm 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 21 / 36

25 Two Bayesian bandit algorithms The Bayes-UCB algorithm The algorithm Let : Π 0 = (π 0 1,..., π0 K ) be a prior distribution over (θ 1,..., θ K ) Λ t = (λ t 1,..., λt K ) be the posterior over the means (µ 1,..., µ K ) a the end of round t The Bayes-UCB algorithm chooses at time t ( Q 1 A t = argmax a 1 t(log t) c, λt 1 a where Q(α, π) is the quantile of order α of the distribution π. For Benoulli bandits with uniform prior on the means: θ a = µ a = p a Λ t = Π t i.i.d θ a U([0, 1]) = Beta(1, 1) λ t a = π t a = Beta(S a (t) + 1, N a (t) S a (t) + 1) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 22 / 36 )

26 Two Bayesian bandit algorithms Theoretical results Theoretical results for Bernoulli bandits Bayes-UCB is asymptotically optimal Theorem [K.,Cappé,Garivier 2012] Let ɛ > 0. The Bayes-UCB algorithm using a uniform prior over the arms and with parameter c 5 satisfies E θ [N a (n)] 1 + ɛ K(p a, p ) log(n) + o ɛ,c (log(n)). Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 23 / 36

27 Two Bayesian bandit algorithms Link to a frequentist algorithm Theoretical results Bayes-UCB index is close to KL-UCB index: ũ a (t) q a (t) u a (t) with: { u a (t) = max q S ( ) } a(t) N a (t) : N Sa (t) a(t)k N a (t), q log t + c log log t { ũ a (t) = max q S ( ) a(t) N a (t) + 1 : (N Sa (t) a(t) + 1)K N a (t) + 1, q ( ) } t log + c log log t N a (t) + 2 Bayes-UCB appears to build automatically confidence intervals based on Kullback-Leibler divergence, that are adapted to the geometry of the problem in this specific case. Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 24 / 36

28 Two Bayesian bandit algorithms Thompson Sampling 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 25 / 36

29 Thompson Sampling Two Bayesian bandit algorithms The algorithm A randomized Bayesian algorithm: a {1..K}, θ a (t) πa t A t = argmax a µ(θ a (t)) (Recent) interest for this algorithm: a very old algorithm [Thompson 1933] partial analysis proposed [Granmo 2010][May, Korda, Lee, Leslie 2012] extensive numerical study beyond the Bernoulli case [Chapelle, Li 2011] first logarithmic upper bound on the regret [Agrawal,Goyal 2012] Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 26 / 36

30 Two Bayesian bandit algorithms The algorithm Thompson Sampling (Bernoulli bandits) A randomized Bayesian algorithm: a {1..K}, θ a (t) Beta(S a (t) + 1, N a (t) S a (t) + 1) A t = argmax a θ a (t) (Recent) interest for this algorithm: a very old algorithm [Thompson 1933] partial analysis proposed [Granmo 2010][May, Korda, Lee, Leslie 2012] extensive numerical study beyond the Bernoulli case [Chapelle, Li 2011] first logarithmic upper bound on the regret [Agrawal,Goyal 2012] Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 27 / 36

31 Two Bayesian bandit algorithms Theoretical results An optimal regret bound for Bernoulli bandits Assume the first arm is the unique optimal arm. Known result : [Agrawal,Goyal 2012] ( K ) 1 E[R n ] C p ln(n) + o µ (ln(n)) p a a=2 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 28 / 36

32 Two Bayesian bandit algorithms Theoretical results An optimal regret bound for Bernoulli bandits Assume the first arm is the unique optimal arm. Known result : [Agrawal,Goyal 2012] ( K ) 1 E[R n ] C p ln(n) + o µ (ln(n)) p a a=2 Our improvement : [K.,Korda,Munos 2012] Theorem ɛ > 0, E[R n ] (1 + ɛ) ( K a=2 p p a K(p a, p ) ) ln(n) + o µ,ɛ (ln(n)) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 28 / 36

33 In practise Two Bayesian bandit algorithms Practical conclusion θ = [ ] 400 UCB 400 KLUCB regret time time 400 BayesUCB 400 Thompson time time Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 29 / 36

34 Two Bayesian bandit algorithms Practical conclusion In practise In the Bernoulli case, for each arm, KL-UCB requires to solve an optimization problem: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} Bayes-UCB requires to compute one quantile of a Beta distribution Thompson requires to compute one sample of a Beta distribution Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 30 / 36

35 Two Bayesian bandit algorithms Practical conclusion In practise In the Bernoulli case, for each arm, KL-UCB requires to solve an optimization problem: u a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) log t + c log log t} Bayes-UCB requires to compute one quantile of a Beta distribution Thompson requires to compute one sample of a Beta distribution Other advantages of Bayesian algorithms: they easily generalize to more complex models......even when the posterior is not directly computable (using MCMC) the prior can incorporate correlation between arms Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 30 / 36

36 Conclusion and perspectives 1 Two bandit problems 2 Regret minimization: Bayesian bandits, frequentist bandits 3 Two Bayesian bandit algorithms The Bayes-UCB algorithm Thompson Sampling 4 Conclusion and perspectives Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 31 / 36

37 Conclusion and perspectives Summary for regret minimization Future work: Objective Frequentist Bayesian algorithms algorithms Regret KL-UCB Bayes-UCB Thompson Sampling Bayes risk KL-UCB Gittins algorithm for finite horizon Is Gittins algorithm optimal with respect to the regret? Are our Bayesian algorithms efficient with respect to the Bayes risk? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 32 / 36

38 Conclusion and perspectives Bayesian algorithm for pure-exploration? At round t, the KL-LUCB algorithm ([K., Kalyanakrishnan, 13]) draws two well-chosen arms: u t and l t stops when CI for arms in J(t) and J(t) c are separated recommends the set of m empirical best arms m=3. Set J(t), arm l t in bold Set J(t) c, arm u t in bold Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 33 / 36

39 Conclusion and perspectives Bayesian algorithm for pure-exploration? KL-LUCB uses KL-confidence intervals: L a (t) = min {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}, U a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}. ( ) We use β(t, δ) = log k1 Kt α to make sure P(S = Sm) 1 δ. δ Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 34 / 36

40 Conclusion and perspectives Bayesian algorithm for pure-exploration? KL-LUCB uses KL-confidence intervals: L a (t) = min {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}, U a (t) = max {q ˆp a (t) : N a (t)k(ˆp a (t), q) β(t, δ)}. ( ) We use β(t, δ) = log k1 Kt α to make sure P(S = Sm) 1 δ. δ How to propose a Bayesian algorithm that adapts to δ? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 34 / 36

41 Conclusion and perspectives Conclusion Regret minimization: Go Bayesian! Bayes-UCB show striking similarities with KL-UCB Thompson Sampling is an easy-to-implement alternative to the optimistic approach both algorithms are asymptotically optimal towards frequentist regret (and more efficient in practise) Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 35 / 36

42 Conclusion and perspectives Conclusion Regret minimization: Go Bayesian! Bayes-UCB show striking similarities with KL-UCB Thompson Sampling is an easy-to-implement alternative to the optimistic approach both algorithms are asymptotically optimal towards frequentist regret (and more efficient in practise) TODO list: Go deeper into the link between Bayes risk and (frequentist) regret (Gittins frequentist optimality?) Obtain theoretical guarantees for Bayes-UCB and Thompson Sampling beyond Bernoulli bandit models (e.g. when rewards belong to the exponential family) Develop Bayesian algorithm for the pure-exploration objective? Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 35 / 36

43 Conclusion and perspectives References E. Kaufmann, O. Cappé, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. AISTATS 2012 E.Kaufmann, N.Korda, and R.Munos. Thompson Sampling: an asymptotically optimal finite-time analysis. ALT 2012 E.Kaufmann, S.Kalyanakrishnan. Information Complexity in Bandit Subset Selection, COLT 2013 Emilie Kaufmann (Telecom ParisTech) Bandits bayésiens Aussois, 28/08/13 36 / 36

Online Network Revenue Management using Thompson Sampling

Online Network Revenue Management using Thompson Sampling Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira David Simchi-Levi He Wang Working Paper 16-031 Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira

More information

Finite-time Analysis of the Multiarmed Bandit Problem*

Finite-time Analysis of the Multiarmed Bandit Problem* Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010

More information

Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI

Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI Licentiate Thesis Stockholm, Sweden 2016 TRITA-EE 2016:001 ISSN 1653-5146 ISBN 978-91-7595-836-1 KTH Royal Institute

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Handling Advertisements of Unknown Quality in Search Advertising

Handling Advertisements of Unknown Quality in Search Advertising Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Carnegie Mellon University spandey@cs.cmu.edu Christopher Olston Yahoo! Research olston@yahoo-inc.com Abstract We consider

More information

Revenue Optimization against Strategic Buyers

Revenue Optimization against Strategic Buyers Revenue Optimization against Strategic Buyers Mehryar Mohri Courant Institute of Mathematical Sciences 251 Mercer Street New York, NY, 10012 Andrés Muñoz Medina Google Research 111 8th Avenue New York,

More information

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Min Xu Machine Learning Department Carnegie Mellon University minx@cs.cmu.edu ao Qin Microsoft Research Asia taoqin@microsoft.com

More information

Chapter 7. BANDIT PROBLEMS.

Chapter 7. BANDIT PROBLEMS. Chapter 7. BANDIT PROBLEMS. Bandit problems are problems in the area of sequential selection of experiments, and they are related to stopping rule problems through the theorem of Gittins and Jones (974).

More information

Beat the Mean Bandit

Beat the Mean Bandit Yisong Yue H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, USA Thorsten Joachims Department of Computer Science, Cornell University, Ithaca, NY, USA yisongyue@cmu.edu tj@cs.cornell.edu

More information

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

More information

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Submitted to Operations Research manuscript OPRE-2009-06-288 Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Microsoft Research, Cambridge, MA, hamidnz@microsoft.com

More information

Algorithms for the multi-armed bandit problem

Algorithms for the multi-armed bandit problem Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00 Algorithms for the multi-armed bandit problem Volodymyr Kuleshov Doina Precup School of Computer Science McGill University

More information

Fast and Robust Normal Estimation for Point Clouds with Sharp Features

Fast and Robust Normal Estimation for Point Clouds with Sharp Features 1/37 Fast and Robust Normal Estimation for Point Clouds with Sharp Features Alexandre Boulch & Renaud Marlet University Paris-Est, LIGM (UMR CNRS), Ecole des Ponts ParisTech Symposium on Geometry Processing

More information

Contextual-Bandit Approach to Recommendation Konstantin Knauf

Contextual-Bandit Approach to Recommendation Konstantin Knauf Contextual-Bandit Approach to Recommendation Konstantin Knauf 22. Januar 2014 Prof. Ulf Brefeld Knowledge Mining & Assesment 1 Agenda Problem Scenario Scenario Multi-armed Bandit Model for Online Recommendation

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems OPERATIONS RESEARCH Vol. 6, No. 1, January February 212, pp. 18 195 ISSN 3-364X (print) ISSN 1526-5463 (online) http://d.doi.org/1.1287/opre.111.999 212 INFORMS The Knowledge Gradient Algorithm for a General

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

Online Learning in Bandit Problems

Online Learning in Bandit Problems Online Learning in Bandit Problems by Cem Tekin A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering: Systems) in The University

More information

Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

More information

In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208.

In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208. In Optimum Design 2000, A. Atkinson, B. Bogacka, and A. Zhigljavsky, eds., Kluwer, 2001, pp. 195 208. ÔØ Ö ½ ÇÈÌÁÅÁ ÁÆ ÍÆÁÅÇ Ä Ê ËÈÇÆË ÍÆ ÌÁÇÆ ÇÊ ÁÆ Ê Î ÊÁ Ä Ë Â Ò À Ö Û ÈÙÖ Ù ÍÒ Ú Ö ØÝ Ï Ø Ä Ý ØØ ÁÆ ¼

More information

Week 1: Introduction to Online Learning

Week 1: Introduction to Online Learning Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

More information

Trading regret rate for computational efficiency in online learning with limited feedback

Trading regret rate for computational efficiency in online learning with limited feedback Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai

More information

Online Learning with Switching Costs and Other Adaptive Adversaries

Online Learning with Switching Costs and Other Adaptive Adversaries Online Learning with Switching Costs and Other Adaptive Adversaries Nicolò Cesa-Bianchi Università degli Studi di Milano Italy Ofer Dekel Microsoft Research USA Ohad Shamir Microsoft Research and the Weizmann

More information

Signal detection and goodness-of-fit: the Berk-Jones statistics revisited

Signal detection and goodness-of-fit: the Berk-Jones statistics revisited Signal detection and goodness-of-fit: the Berk-Jones statistics revisited Jon A. Wellner (Seattle) INET Big Data Conference INET Big Data Conference, Cambridge September 29-30, 2015 Based on joint work

More information

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012 Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

More information

Regret Minimization for Reserve Prices in Second-Price Auctions

Regret Minimization for Reserve Prices in Second-Price Auctions Regret Minimization for Reserve Prices in Second-Price Auctions Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Claudio Gentile (Varese) and Yishay Mansour (Tel-Aviv) N. Cesa-Bianchi

More information

arxiv:1306.0155v1 [cs.lg] 1 Jun 2013

arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Dynamic ad allocation: bandits with budgets Aleksandrs Slivkins May 2013 arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Abstract We consider an application of multi-armed bandits to internet advertising (specifically,

More information

Introduction to Detection Theory

Introduction to Detection Theory Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see http://www.ece.rice.edu/~dhj/courses/elec531/notes5.pdf. Ch. 10 in Wasserman. EE 527, Detection

More information

Machine Learning tools for Online Advertisement

Machine Learning tools for Online Advertisement Year 2008-2009. Research Internship Report Machine Learning tools for Online Advertisement Victor Gabillon victorgabillon@inria.fr Supervisors INRIA : Philippe Preux Jérémie Mary Supervisors Orange Labs

More information

A Survey of Preference-based Online Learning with Bandit Algorithms

A Survey of Preference-based Online Learning with Bandit Algorithms A Survey of Preference-based Online Learning with Bandit Algorithms Róbert Busa-Fekete and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {busarobi,eyke}@upb.de Abstract.

More information

Big Data, Statistics, and the Internet

Big Data, Statistics, and the Internet Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes

More information

Monotone multi-armed bandit allocations

Monotone multi-armed bandit allocations JMLR: Workshop and Conference Proceedings 19 (2011) 829 833 24th Annual Conference on Learning Theory Monotone multi-armed bandit allocations Aleksandrs Slivkins Microsoft Research Silicon Valley, Mountain

More information

The Price of Truthfulness for Pay-Per-Click Auctions

The Price of Truthfulness for Pay-Per-Click Auctions Technical Report TTI-TR-2008-3 The Price of Truthfulness for Pay-Per-Click Auctions Nikhil Devanur Microsoft Research Sham M. Kakade Toyota Technological Institute at Chicago ABSTRACT We analyze the problem

More information

Introduction to Online Learning Theory

Introduction to Online Learning Theory Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

On the Complexity of A/B Testing

On the Complexity of A/B Testing JMLR: Workshop and Conference Proceedings vol 35:1 3, 014 On the Complexity of A/B Testing Emilie Kaufmann LTCI, Télécom ParisTech & CNRS KAUFMANN@TELECOM-PARISTECH.FR Olivier Cappé CAPPE@TELECOM-PARISTECH.FR

More information

Adaptive Bidding for Display Advertising

Adaptive Bidding for Display Advertising Adaptive Bidding for Display Advertising ABSTRACT Arpita Ghosh Yahoo! Research 2821 Mission College Blvd. Santa Clara, CA 95054 arpita@yahoo-inc.com Sergei Vassilvitskii Yahoo! Research 111 West 40th St.,

More information

Online Algorithms: Learning & Optimization with No Regret.

Online Algorithms: Learning & Optimization with No Regret. Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the

More information

Notes from Week 1: Algorithms for sequential prediction

Notes from Week 1: Algorithms for sequential prediction CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking

More information

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra September 24, 2012 Abstract We examine the problem of allocating an item repeatedly over

More information

Two-Sided Bandits and the Dating Market

Two-Sided Bandits and the Dating Market Two-Sided Bandits and the Dating Market Sanmay Das Center for Biological and Computational Learning and Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge,

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS AND LEARNING

ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS AND LEARNING POLITECNICO DI MILANO DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINFORMATICA DOCTORAL PROGRAMME IN INFORMATION TECHNOLOGY ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS

More information

Monte Carlo Methods in Finance

Monte Carlo Methods in Finance Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction

More information

Abstract. Keywords. restless bandits, reinforcement learning, sequential decision-making, change detection, non-stationary environments

Abstract. Keywords. restless bandits, reinforcement learning, sequential decision-making, change detection, non-stationary environments Modeling Human Performance in Restless Bandits with Particle Filters Sheng Kung M. Yi 1, Mark Steyvers 1, and Michael Lee 1 Abstract Bandit problems provide an interesting and widely-used setting for the

More information

Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

More information

our algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative).

our algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative). 1 Context-Adaptive Big Data Stream Mining - Online Appendix Cem Tekin*, Member, IEEE, Luca Canzian, Member, IEEE, Mihaela van der Schaar, Fellow, IEEE Electrical Engineering Department, University of California,

More information

( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( )

( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( ) { } ( ) = ( ) = {,,, } ( ) β ( ), < 1 ( ) + ( ) = ( ) + ( ) max, ( ) [ ( )] + ( ) [ ( )], [ ( )] [ ( )] = =, ( ) = ( ) = 0 ( ) = ( ) ( ) ( ) =, ( ), ( ) =, ( ), ( ). ln ( ) = ln ( ). + 1 ( ) = ( ) Ω[ (

More information

Strongly Adaptive Online Learning

Strongly Adaptive Online Learning Amit Daniely Alon Gonen Shai Shalev-Shwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance

More information

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

More information

Performance measures for Neyman-Pearson classification

Performance measures for Neyman-Pearson classification Performance measures for Neyman-Pearson classification Clayton Scott Abstract In the Neyman-Pearson (NP) classification paradigm, the goal is to learn a classifier from labeled training data such that

More information

Model for dynamic website optimization

Model for dynamic website optimization Chapter 10 Model for dynamic website optimization In this chapter we develop a mathematical model for dynamic website optimization. The model is designed to optimize websites through autonomous management

More information

4 Learning, Regret minimization, and Equilibria

4 Learning, Regret minimization, and Equilibria 4 Learning, Regret minimization, and Equilibria A. Blum and Y. Mansour Abstract Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive

More information

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration

Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.

More information

Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising

Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra July 13, 2007 Abstract We examine the problem of allocating a resource repeatedly over

More information

FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang

FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang 87. Li, J.; Tang, Q. Interplay of insurance and financial risks in a discrete-time model with strongly regular variation. Bernoulli 21 (2015), no. 3,

More information

An Introduction to Extreme Value Theory

An Introduction to Extreme Value Theory An Introduction to Extreme Value Theory Petra Friederichs Meteorological Institute University of Bonn COPS Summer School, July/August, 2007 Applications of EVT Finance distribution of income has so called

More information

Consistent Binary Classification with Generalized Performance Metrics

Consistent Binary Classification with Generalized Performance Metrics Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014 Problem and Motivation

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

More information

Big Data: The Computation/Statistics Interface

Big Data: The Computation/Statistics Interface Big Data: The Computation/Statistics Interface Michael I. Jordan University of California, Berkeley September 2, 2013 What Is the Big Data Phenomenon? Big Science is generating massive datasets to be used

More information

A Potential-based Framework for Online Multi-class Learning with Partial Feedback

A Potential-based Framework for Online Multi-class Learning with Partial Feedback A Potential-based Framework for Online Multi-class Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science

More information

Computational Game Theory and Clustering

Computational Game Theory and Clustering Computational Game Theory and Clustering Martin Hoefer mhoefer@mpi-inf.mpg.de 1 Computational Game Theory? 2 Complexity and Computation of Equilibrium 3 Bounding Inefficiencies 4 Conclusion Computational

More information

The Kelly Betting System for Favorable Games.

The Kelly Betting System for Favorable Games. The Kelly Betting System for Favorable Games. Thomas Ferguson, Statistics Department, UCLA A Simple Example. Suppose that each day you are offered a gamble with probability 2/3 of winning and probability

More information

17.3.1 Follow the Perturbed Leader

17.3.1 Follow the Perturbed Leader CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

More information

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR) 2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

More information

A direct approach to false discovery rates

A direct approach to false discovery rates J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiple-hypothesis

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Stochastic Online Greedy Learning with Semi-bandit Feedbacks Stochastic Online Greedy Learning with Semi-bandit Feedbacks Tian Lin Tsinghua University Beijing, China lintian06@gmail.com Jian Li Tsinghua University Beijing, China lapordge@gmail.com Wei Chen Microsoft

More information

LARGE CLASSES OF EXPERTS

LARGE CLASSES OF EXPERTS LARGE CLASSES OF EXPERTS Csaba Szepesvári University of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 31, 2006 OUTLINE 1 TRACKING THE BEST EXPERT 2 FIXED SHARE FORECASTER 3 VARIABLE-SHARE

More information

Detection of changes in variance using binary segmentation and optimal partitioning

Detection of changes in variance using binary segmentation and optimal partitioning Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the

More information

Volatility Density Forecasting & Model Averaging Imperial College Business School, London

Volatility Density Forecasting & Model Averaging Imperial College Business School, London Imperial College Business School, London June 25, 2010 Introduction Contents Types of Volatility Predictive Density Risk Neutral Density Forecasting Empirical Volatility Density Forecasting Density model

More information

Prediction of future observations using belief functions

Prediction of future observations using belief functions Reminder on belief functions Prediction of future observations using belief functions Université de Technologie de Compiègne HEUDIASYC (UMR CNRS 7253) http://www.hds.utc.fr/ tdenoeux Séminaire Math. Discrètes,

More information

Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL

Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL POMDP Tutorial Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL Observation Belief ACTOR Transition Dynamics WORLD b Policy π Action Markov Decision Process -X: set of states [x s,x r

More information

A Contextual-Bandit Approach to Personalized News Article Recommendation

A Contextual-Bandit Approach to Personalized News Article Recommendation A Contextual-Bandit Approach to Personalized News Article Recommendation Lihong Li,WeiChu, Yahoo! Labs lihong,chuwei@yahooinc.com John Langford Yahoo! Labs jl@yahoo-inc.com Robert E. Schapire + + Dept

More information

Improvements to the Linear Programming Based Scheduling of Web Advertisements

Improvements to the Linear Programming Based Scheduling of Web Advertisements Electronic Commerce Research, 5: 75 98 (2005) 2005 Springer Science + Business Media, Inc. Manufactured in the Netherlands. Improvements to the Linear Programming Based Scheduling of Web Advertisements

More information

The Value of Knowing a Demand Curve: Bounds on Regret for On-line Posted-Price Auctions

The Value of Knowing a Demand Curve: Bounds on Regret for On-line Posted-Price Auctions The Value of Knowing a Demand Curve: Bounds on Regret for On-line Posted-Price Auctions Robert Kleinberg F. Thomson Leighton Abstract We consider the revenue-maximization problem for a seller with an unlimited

More information

HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

More information

Context-Aware Online Traffic Prediction

Context-Aware Online Traffic Prediction Context-Aware Online Traffic Prediction Jie Xu, Dingxiong Deng, Ugur Demiryurek, Cyrus Shahabi, Mihaela van der Schaar University of California, Los Angeles University of Southern California J. Xu, D.

More information

Fuzzy Probability Distributions in Bayesian Analysis

Fuzzy Probability Distributions in Bayesian Analysis Fuzzy Probability Distributions in Bayesian Analysis Reinhard Viertl and Owat Sunanta Department of Statistics and Probability Theory Vienna University of Technology, Vienna, Austria Corresponding author:

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models Bernoilli/binomial models Return to iid Y 1,...,Y n Bin(1, θ). The sampling model/likelihood is p(y 1,...,y n θ) =θ P y i (1 θ) n P y i When combined with a prior p(θ), Bayes

More information

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails 12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint

More information

Information Theory and Stock Market

Information Theory and Stock Market Information Theory and Stock Market Pongsit Twichpongtorn University of Illinois at Chicago E-mail: ptwich2@uic.edu 1 Abstract This is a short survey paper that talks about the development of important

More information

Mixture Models for Genomic Data

Mixture Models for Genomic Data Mixture Models for Genomic Data S. Robin AgroParisTech / INRA École de Printemps en Apprentissage automatique, Baie de somme, May 2010 S. Robin (AgroParisTech / INRA) Mixture Models May 10 1 / 48 Outline

More information

HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE

HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE HYBRID GENETIC ALGORITHMS FOR SCHEDULING ADVERTISEMENTS ON A WEB PAGE Subodha Kumar University of Washington subodha@u.washington.edu Varghese S. Jacob University of Texas at Dallas vjacob@utdallas.edu

More information

Section 5. Stan for Big Data. Bob Carpenter. Columbia University

Section 5. Stan for Big Data. Bob Carpenter. Columbia University Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model

More information

Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Bayesian Analysis for the Social Sciences

Bayesian Analysis for the Social Sciences Bayesian Analysis for the Social Sciences Simon Jackman Stanford University http://jackman.stanford.edu/bass November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November

More information

Advertising Campaigns Management: Should We Be Greedy?

Advertising Campaigns Management: Should We Be Greedy? Advertising Campaigns Management: Should We Be Greedy? Sertan Girgin, Jérémie Mary, Philippe Preux, Olivier Nicol To cite this version: Sertan Girgin, Jérémie Mary, Philippe Preux, Olivier Nicol. Advertising

More information

MAS2317/3317. Introduction to Bayesian Statistics. More revision material

MAS2317/3317. Introduction to Bayesian Statistics. More revision material MAS2317/3317 Introduction to Bayesian Statistics More revision material Dr. Lee Fawcett, 2014 2015 1 Section A style questions 1. Describe briefly the frequency, classical and Bayesian interpretations

More information

If You re So Smart, Why Aren t You Rich? Belief Selection in Complete and Incomplete Markets

If You re So Smart, Why Aren t You Rich? Belief Selection in Complete and Incomplete Markets If You re So Smart, Why Aren t You Rich? Belief Selection in Complete and Incomplete Markets Lawrence Blume and David Easley Department of Economics Cornell University July 2002 Today: June 24, 2004 The

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

Betting rules and information theory

Betting rules and information theory Betting rules and information theory Giulio Bottazzi LEM and CAFED Scuola Superiore Sant Anna September, 2013 Outline Simple betting in favorable games The Central Limit Theorem Optimal rules The Game

More information

Feature Selection with Monte-Carlo Tree Search

Feature Selection with Monte-Carlo Tree Search Feature Selection with Monte-Carlo Tree Search Robert Pinsler 20.01.2015 20.01.2015 Fachbereich Informatik DKE: Seminar zu maschinellem Lernen Robert Pinsler 1 Agenda 1 Feature Selection 2 Feature Selection

More information

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms Lihong Li Wei Chu John Langford Xuanhui Wang Yahoo! Labs 701 First Ave, Sunnyvale, CA, USA 94089 {lihong,chuwei,jl,xhwang}@yahoo-inc.com

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Kolmogorov Complexity and the Incompressibility Method

Kolmogorov Complexity and the Incompressibility Method Kolmogorov Complexity and the Incompressibility Method Holger Arnold 1. Introduction. What makes one object more complex than another? Kolmogorov complexity, or program-size complexity, provides one of

More information