A Bayesian point of view for better-performing bandit algorithms


Slide 1: A Bayesian point of view for better-performing bandit algorithms
Emilie Kaufmann, Telecom ParisTech
Rencontre des Jeunes Statisticiens, Aussois, 28 August 2013
Slide 2: Outline
1. Two bandit problems
2. Regret minimization: Bayesian bandits, frequentist bandits
3. Two Bayesian bandit algorithms: the Bayes-UCB algorithm; Thompson Sampling
4. Conclusion and perspectives
Slide 3: Section 1: Two bandit problems
Slide 4: Bandit model
A multi-armed bandit model is a set of K arms where:
- arm a is an unknown probability distribution ν_a with mean µ_a
- drawing arm a means observing a realization of ν_a
- the arms are assumed to be independent

In a bandit game, at round t, a forecaster:
- chooses an arm A_t to draw, based on past observations, according to its sampling strategy (or bandit algorithm)
- observes a reward X_t ~ ν_{A_t}

The forecaster has to learn which arm(s) is (are) best: a* = argmax_a µ_a.
Slide 5: Bernoulli bandit model
A multi-armed bandit model is a set of K arms where:
- arm a is a Bernoulli distribution B(p_a) with unknown mean µ_a = p_a
- drawing arm a means observing a realization of B(p_a) (0 or 1)
- the arms are assumed to be independent

In a bandit game, at round t, a forecaster:
- chooses an arm A_t to draw, based on past observations, according to its sampling strategy (or bandit algorithm)
- observes a reward X_t ~ B(p_{A_t})

The forecaster has to learn which arm(s) is (are) best: a* = argmax_a p_a.
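The Bernoulli bandit environment above takes only a few lines of code. A minimal sketch (the class and method names are ours, not from the slides):

```python
import numpy as np

class BernoulliBandit:
    """K-armed Bernoulli bandit: arm a pays 1 with probability p[a]."""

    def __init__(self, p, seed=0):
        self.p = np.asarray(p, dtype=float)
        self.rng = np.random.default_rng(seed)

    @property
    def best_arm(self):
        # a* = argmax_a p_a (unknown to the forecaster in a real game)
        return int(np.argmax(self.p))

    def draw(self, a):
        # Drawing arm a = observing one realization of B(p_a): 0 or 1.
        return int(self.rng.random() < self.p[a])

bandit = BernoulliBandit([0.1, 0.5, 0.3])
rewards = [bandit.draw(bandit.best_arm) for _ in range(1000)]
```

The forecaster only ever sees the 0/1 rewards returned by `draw`, never the vector `p` itself.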
Slide 6: The classical bandit problem: regret minimization
The forecaster wants to maximize the reward accumulated during learning, or equivalently to minimize its regret:
R_n = n µ_{a*} − E[Σ_{t=1}^n X_t].
He has to find a sampling strategy (or bandit algorithm) that realizes a tradeoff between exploration and exploitation.
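In a simulation, the expected regret is conveniently computed from the draw counts, using the equivalent decomposition R_n(θ) = Σ_a (µ* − µ_a) E_θ[N_a(n)] that appears later in the talk. A minimal sketch (the function name is ours):

```python
import numpy as np

def expected_regret(mu, draws):
    """Pseudo-regret from draw counts: R_n = sum_a (mu* - mu_a) * N_a(n)."""
    mu = np.asarray(mu, dtype=float)
    draws = np.asarray(draws, dtype=float)
    return float(np.sum((mu.max() - mu) * draws))

# Example: drawing each arm 100 times on mu = (0.1, 0.5, 0.3)
# pays a regret of 0.4 per draw of arm 1 and 0.2 per draw of arm 3.
r = expected_regret([0.1, 0.5, 0.3], [100, 100, 100])
```

A strategy that only ever draws the best arm has zero regret; everything else is the price paid for exploration.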
Slide 7: An alternative: pure exploration
The forecaster has to find the best arm(s), and does not suffer a loss when drawing bad arms. He has to find a sampling strategy that optimally explores the environment, together with a stopping criterion, and to recommend a set S of m arms such that
P(S is the set of m best arms) ≥ 1 − δ.
Slide 8: Zoom on an application: online advertisement
Yahoo!(c) has to choose, among K different advertisements, the one to display on its webpage for each user (indexed by t ∈ N):
- ad number a has an unknown probability of click p_a
- the best advertisement a* = argmax_a p_a is unknown
- if ad a is displayed for user t, he clicks on it with probability p_a

Yahoo!(c):
- chooses the ad A_t to display for user number t
- observes whether the user has clicked or not: X_t ~ B(p_{A_t})

Yahoo!(c) can adjust its strategy (A_t) so as to:
- Regret minimization: maximize the number of clicks during n interactions
- Pure exploration: identify the best advertisement with probability at least 1 − δ
Slide 10: Section 2: Regret minimization: Bayesian bandits, frequentist bandits
Slide 11: Two probabilistic models
K independent arms; µ* = µ_{a*} denotes the highest expectation among the arms.
- Frequentist: θ = (θ_1, ..., θ_K) is an unknown parameter; (Y_{a,t})_t is i.i.d. with distribution ν_{θ_a} and mean µ_a = µ(θ_a).
- Bayesian: θ_a ~ π_a i.i.d.; (Y_{a,t})_t is i.i.d. conditionally on θ_a, with distribution ν_{θ_a}.
At time t, arm A_t is chosen and the reward X_t = Y_{A_t,t} is observed.

Two measures of performance:
- minimize the regret R_n(θ) = E_θ[Σ_{t=1}^n (µ* − µ_{A_t})]
- minimize the Bayes risk Risk_n = ∫ R_n(θ) dπ(θ)
Slide 12: Frequentist tools, Bayesian tools
Bandit algorithms based on frequentist tools use:
- the MLE of the parameter of each arm
- confidence intervals on the mean of each arm
Bandit algorithms based on Bayesian tools use:
- Π^t = (π^t_1, ..., π^t_K), the current posterior over (θ_1, ..., θ_K), with π^t_a = p(θ_a | past observations from arm a)

One can separate tools and objectives:

Objective  | Frequentist algorithms | Bayesian algorithms
Regret     | ?                      | ?
Bayes risk | ?                      | ?
Slide 14: Our goal
We want to design Bayesian algorithms that are optimal with respect to the frequentist regret.
Slide 15: Asymptotically optimal algorithms for the regret
Let N_a(t) be the number of draws of arm a up to time t:
R_n(θ) = Σ_{a=1}^K (µ* − µ_a) E_θ[N_a(n)]

Lai and Robbins, 1985: every consistent algorithm satisfies, for every arm a such that µ_a < µ*,
liminf_{n→∞} E_θ[N_a(n)] / ln n ≥ 1 / KL(ν_{θ_a}, ν_{θ*}).

A bandit algorithm is asymptotically optimal if, for every arm a such that µ_a < µ*,
limsup_{n→∞} E_θ[N_a(n)] / ln n ≤ 1 / KL(ν_{θ_a}, ν_{θ*}).
Slide 16: The optimism principle: a family of frequentist algorithms
The following heuristic defines a family of optimistic index policies:
- for each arm a, compute a confidence interval on the unknown mean: µ_a ≤ UCB_a(t) w.h.p.
- use the optimism-in-face-of-uncertainty principle: act as if the best possible model were the true model
The algorithm chooses at time t: A_t = argmax_a UCB_a(t).
Slide 17: UCB: towards optimal algorithms for Bernoulli bandits
UCB [Auer et al. 02] uses Hoeffding bounds:
UCB_a(t) = p̂_a(t) + sqrt(α log(t) / (2 N_a(t))),
where p̂_a(t) = S_a(t) / N_a(t) is the empirical mean of arm a.
Finite-time bound: E[N_a(n)] ≤ K_1 / (2 (p* − p_a)^2) ln n + K_2, with K_1 > 1.
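A full run of this index policy fits in a short loop. A sketch (not the authors' code; function name, the initialization that draws each arm once, and the default α = 2 are our choices):

```python
import numpy as np

def ucb_run(p, n, alpha=2.0, seed=0):
    """UCB index policy on a Bernoulli bandit with means p.

    Index: UCB_a(t) = S_a(t)/N_a(t) + sqrt(alpha * log(t) / (2 * N_a(t))).
    Returns the draw counts N_a(n).
    """
    rng = np.random.default_rng(seed)
    K = len(p)
    S = np.zeros(K)  # cumulated rewards S_a(t)
    N = np.zeros(K)  # draw counts N_a(t)
    for t in range(1, n + 1):
        if t <= K:
            a = t - 1  # initialization: draw each arm once
        else:
            ucb = S / N + np.sqrt(alpha * np.log(t) / (2 * N))
            a = int(np.argmax(ucb))  # optimism: play the largest index
        x = float(rng.random() < p[a])  # reward X_t ~ B(p_{A_t})
        S[a] += x
        N[a] += 1
    return N

N = ucb_run([0.2, 0.8], n=2000)
```

The finite-time bound above says the suboptimal arm's count grows only logarithmically, which is what the returned `N` shows on long horizons.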
Slide 18: KL-UCB: towards optimal algorithms for Bernoulli bandits
KL-UCB [Cappé et al. 2013] uses the index:
u_a(t) = max { q ≥ p̂_a(t) : N_a(t) K(p̂_a(t), q) ≤ log t + c log log t },
with K(p, q) := KL(B(p), B(q)) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)).
Finite-time bound: E[N_a(n)] ≤ (1 / K(p_a, p*)) ln n + C.
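Since q ↦ K(p̂, q) is increasing on [p̂, 1], the KL-UCB index can be computed by bisection. A sketch (function names and the tolerance are ours; the clipping of p and q away from {0, 1} is a numerical safeguard, not part of the definition):

```python
import math

def kl_bern(p, q):
    """K(p, q) = KL(B(p), B(q)) for Bernoulli distributions."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(p_hat, n_a, t, c=3.0, tol=1e-6):
    """u_a(t) = max{q >= p_hat : n_a * K(p_hat, q) <= log t + c log log t},
    computed by bisection since K(p_hat, .) increases on [p_hat, 1]."""
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_a * kl_bern(p_hat, mid) <= budget:
            lo = mid  # mid still satisfies the constraint: move up
        else:
            hi = mid
    return lo

u = klucb_index(p_hat=0.5, n_a=100, t=1000)
```

The index always lies in [p̂_a(t), 1] and shrinks towards p̂_a(t) as N_a(t) grows.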
Slide 19: Section 3: Two Bayesian bandit algorithms
Slide 20: UCBs versus Bayesian algorithms
Figure: confidence intervals on the arms' means after t rounds of a bandit game.
Figure: posterior distributions over the arms' means after t rounds of a bandit game.
How do we exploit the posterior in a Bayesian bandit algorithm?
Slide 21: Section 3.1: The Bayes-UCB algorithm
Slide 22: The Bayes-UCB algorithm
Let:
- Π^0 = (π^0_1, ..., π^0_K) be a prior distribution over (θ_1, ..., θ_K)
- Λ^t = (λ^t_1, ..., λ^t_K) be the posterior over the means (µ_1, ..., µ_K) at the end of round t

The Bayes-UCB algorithm chooses at time t
A_t = argmax_a Q(1 − 1/(t (log t)^c), λ^{t−1}_a),
where Q(α, π) is the quantile of order α of the distribution π.

For Bernoulli bandits with a uniform prior on the means:
- θ_a = µ_a = p_a and Λ^t = Π^t
- θ_a ~ U([0, 1]) = Beta(1, 1) i.i.d.
- λ^t_a = π^t_a = Beta(S_a(t) + 1, N_a(t) − S_a(t) + 1)
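In the Bernoulli case the posterior quantiles are Beta quantiles, so Bayes-UCB can be sketched with SciPy's `beta.ppf`. This is our illustration, not the authors' code; we take c = 0 for simplicity (the theorem on the next slide assumes c ≥ 5), and the `max(log t, 1)` guard is a numerical convenience for t = 1:

```python
import math
import numpy as np
from scipy.stats import beta

def bayes_ucb_run(p, n, c=0, seed=0):
    """Bayes-UCB for Bernoulli bandits with uniform Beta(1, 1) priors.

    At round t, each arm's index is the quantile of order 1 - 1/(t (log t)^c)
    of its posterior Beta(S_a + 1, N_a - S_a + 1). Returns draw counts N_a(n).
    """
    rng = np.random.default_rng(seed)
    K = len(p)
    S = np.zeros(K)  # successes S_a(t)
    N = np.zeros(K)  # draws N_a(t)
    for t in range(1, n + 1):
        level = 1.0 - 1.0 / (t * max(math.log(t), 1.0) ** c)
        q = beta.ppf(level, S + 1, N - S + 1)  # posterior quantiles, one per arm
        a = int(np.argmax(q))
        x = float(rng.random() < p[a])
        S[a] += x
        N[a] += 1
    return N

N = bayes_ucb_run([0.2, 0.8], n=1000)
```

Note that no optimization problem is solved per round: one Beta quantile per arm suffices, which is the computational advantage stressed later in the talk.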
Slide 23: Theoretical results for Bernoulli bandits: Bayes-UCB is asymptotically optimal
Theorem [K., Cappé, Garivier 2012]: let ε > 0. The Bayes-UCB algorithm using a uniform prior over the arms and a parameter c ≥ 5 satisfies
E_θ[N_a(n)] ≤ (1 + ε) / K(p_a, p*) · log(n) + o_{ε,c}(log(n)).
Slide 24: Link to a frequentist algorithm
The Bayes-UCB index q_a(t) is close to the KL-UCB index:
ũ_a(t) ≤ q_a(t) ≤ u_a(t),
with:
u_a(t) = max { q ≥ S_a(t)/N_a(t) : N_a(t) K(S_a(t)/N_a(t), q) ≤ log t + c log log t }
ũ_a(t) = max { q ≥ S_a(t)/(N_a(t) + 1) : (N_a(t) + 1) K(S_a(t)/(N_a(t) + 1), q) ≤ log(t/(N_a(t) + 2)) + c log log t }

Bayes-UCB thus appears to build automatically confidence intervals based on the Kullback-Leibler divergence, which are adapted to the geometry of the problem in this specific case.
Slide 25: Section 3.2: Thompson Sampling
Slide 26: Thompson Sampling
A randomized Bayesian algorithm:
∀a ∈ {1, ..., K}, θ_a(t) ~ π^t_a
A_t = argmax_a µ(θ_a(t))

(Recent) interest in this algorithm:
- a very old algorithm [Thompson 1933]
- partial analyses proposed [Granmo 2010] [May, Korda, Lee, Leslie 2012]
- extensive numerical study beyond the Bernoulli case [Chapelle, Li 2011]
- first logarithmic upper bound on the regret [Agrawal, Goyal 2012]

Slide 27: Thompson Sampling (Bernoulli bandits)
∀a ∈ {1, ..., K}, θ_a(t) ~ Beta(S_a(t) + 1, N_a(t) − S_a(t) + 1)
A_t = argmax_a θ_a(t)
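The Bernoulli version is remarkably short to implement: one posterior sample per arm, then play the argmax. A minimal sketch (function name and parameters are ours):

```python
import numpy as np

def thompson_run(p, n, seed=0):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.

    Each round: draw theta_a ~ Beta(S_a + 1, N_a - S_a + 1) for every arm,
    play A_t = argmax_a theta_a. Returns the draw counts N_a(n).
    """
    rng = np.random.default_rng(seed)
    K = len(p)
    S = np.zeros(K)  # successes S_a(t)
    N = np.zeros(K)  # draws N_a(t)
    for _ in range(n):
        theta = rng.beta(S + 1, N - S + 1)  # one posterior sample per arm
        a = int(np.argmax(theta))
        x = float(rng.random() < p[a])
        S[a] += x
        N[a] += 1
    return N

N = thompson_run([0.2, 0.8], n=1000)
```

Exploration here comes from the randomness of the posterior samples rather than from an explicit confidence bonus.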
Slide 28: An optimal regret bound for Bernoulli bandits
Assume the first arm is the unique optimal arm.
Known result [Agrawal, Goyal 2012]:
E[R_n] ≤ C (Σ_{a=2}^K 1/(p* − p_a)) ln(n) + o_µ(ln(n))

Our improvement [K., Korda, Munos 2012]:
Theorem: ∀ε > 0,
E[R_n] ≤ (1 + ε) (Σ_{a=2}^K (p* − p_a) / K(p_a, p*)) ln(n) + o_{µ,ε}(ln(n))
Slide 29: In practice
Figure: regret as a function of time for UCB, KL-UCB, Bayes-UCB and Thompson Sampling on a Bernoulli bandit instance θ = [ ].
Slide 30: In practice
In the Bernoulli case, for each arm:
- KL-UCB requires solving an optimization problem: u_a(t) = max { q ≥ p̂_a(t) : N_a(t) K(p̂_a(t), q) ≤ log t + c log log t }
- Bayes-UCB requires computing one quantile of a Beta distribution
- Thompson Sampling requires drawing one sample from a Beta distribution

Other advantages of Bayesian algorithms:
- they easily generalize to more complex models...
- ...even when the posterior is not directly computable (using MCMC)
- the prior can incorporate correlations between arms
Slide 31: Section 4: Conclusion and perspectives
Slide 32: Summary for regret minimization

Objective  | Frequentist algorithms | Bayesian algorithms
Regret     | KL-UCB                 | Bayes-UCB, Thompson Sampling
Bayes risk | KL-UCB                 | Gittins' algorithm for finite horizon

Future work:
- Is Gittins' algorithm optimal with respect to the regret?
- Are our Bayesian algorithms efficient with respect to the Bayes risk?
Slide 33: Bayesian algorithms for pure exploration?
At round t, the KL-LUCB algorithm ([K., Kalyanakrishnan, 13]):
- draws two well-chosen arms, u_t and l_t
- stops when the confidence intervals for the arms in J(t) and J(t)^c are separated
- recommends the set of the m empirically best arms

Figure: example with m = 3; set J(t) with arm l_t in bold, set J(t)^c with arm u_t in bold.
Slide 34: Bayesian algorithms for pure exploration?
KL-LUCB uses KL confidence intervals:
L_a(t) = min { q ≤ p̂_a(t) : N_a(t) K(p̂_a(t), q) ≤ β(t, δ) }
U_a(t) = max { q ≥ p̂_a(t) : N_a(t) K(p̂_a(t), q) ≤ β(t, δ) }
We use β(t, δ) = log(k_1 K t^α / δ) to make sure that P(Ŝ = S*_m) ≥ 1 − δ.

How can we propose a Bayesian algorithm that adapts to δ?
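Both interval endpoints can be computed by bisection, since q ↦ K(p̂, q) decreases on [0, p̂] and increases on [p̂, 1]. A sketch (ours, not the paper's code), assuming the exploration rate β(t, δ) has already been evaluated and is passed as `budget`:

```python
import math

def kl_bern(p, q):
    """K(p, q) = KL(B(p), B(q)) for Bernoulli distributions."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_interval(p_hat, n_a, budget, tol=1e-6):
    """[L_a, U_a] = {q : n_a * K(p_hat, q) <= budget}, via two bisections."""
    # Lower end: smallest q <= p_hat inside the set (K decreases on [0, p_hat]).
    lo, hi = 0.0, p_hat
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_a * kl_bern(p_hat, mid) <= budget:
            hi = mid  # mid is inside: the boundary is further left
        else:
            lo = mid
    lower = hi
    # Upper end: largest q >= p_hat inside the set (K increases on [p_hat, 1]).
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_a * kl_bern(p_hat, mid) <= budget:
            lo = mid  # mid is inside: the boundary is further right
        else:
            hi = mid
    return lower, lo

L, U = kl_interval(p_hat=0.5, n_a=100, budget=2.0)
```

For p̂ = 0.5 the Bernoulli KL is symmetric in q ↔ 1 − q, so the interval is symmetric around 0.5; as the budget shrinks or n_a grows, the interval tightens around p̂.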
Slide 35: Conclusion
Regret minimization: go Bayesian!
- Bayes-UCB shows striking similarities with KL-UCB
- Thompson Sampling is an easy-to-implement alternative to the optimistic approach
- both algorithms are asymptotically optimal with respect to the frequentist regret (and more efficient in practice)

TODO list:
- go deeper into the link between Bayes risk and (frequentist) regret (is Gittins' algorithm frequentist-optimal?)
- obtain theoretical guarantees for Bayes-UCB and Thompson Sampling beyond Bernoulli bandit models (e.g. when rewards belong to an exponential family)
- develop Bayesian algorithms for the pure-exploration objective
Slide 36: References
- E. Kaufmann, O. Cappé, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. AISTATS 2012.
- E. Kaufmann, N. Korda, and R. Munos. Thompson Sampling: an asymptotically optimal finite-time analysis. ALT 2012.
- E. Kaufmann, S. Kalyanakrishnan. Information Complexity in Bandit Subset Selection. COLT 2013.
More informationPerformance measures for NeymanPearson classification
Performance measures for NeymanPearson classification Clayton Scott Abstract In the NeymanPearson (NP) classification paradigm, the goal is to learn a classifier from labeled training data such that
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationTwoSided Bandits and the Dating Market
TwoSided Bandits and the Dating Market Sanmay Das Center for Biological and Computational Learning and Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge,
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationSet theory, and set operations
1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we
More informationECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS AND LEARNING
POLITECNICO DI MILANO DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINFORMATICA DOCTORAL PROGRAMME IN INFORMATION TECHNOLOGY ECONOMIC MECHANISMS FOR ONLINE PAY PER CLICK ADVERTISING: COMPLEXITY, ALGORITHMS
More informationour algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative).
1 ContextAdaptive Big Data Stream Mining  Online Appendix Cem Tekin*, Member, IEEE, Luca Canzian, Member, IEEE, Mihaela van der Schaar, Fellow, IEEE Electrical Engineering Department, University of California,
More informationAbstract. Keywords. restless bandits, reinforcement learning, sequential decisionmaking, change detection, nonstationary environments
Modeling Human Performance in Restless Bandits with Particle Filters Sheng Kung M. Yi 1, Mark Steyvers 1, and Michael Lee 1 Abstract Bandit problems provide an interesting and widelyused setting for the
More informationMonte Carlo Methods in Finance
Author: Yiyang Yang Advisor: Pr. Xiaolin Li, Pr. Zari Rachev Department of Applied Mathematics and Statistics State University of New York at Stony Brook October 2, 2012 Outline Introduction 1 Introduction
More informationModel for dynamic website optimization
Chapter 10 Model for dynamic website optimization In this chapter we develop a mathematical model for dynamic website optimization. The model is designed to optimize websites through autonomous management
More informationA HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT
New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:
More information( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( )
{ } ( ) = ( ) = {,,, } ( ) β ( ), < 1 ( ) + ( ) = ( ) + ( ) max, ( ) [ ( )] + ( ) [ ( )], [ ( )] [ ( )] = =, ( ) = ( ) = 0 ( ) = ( ) ( ) ( ) =, ( ), ( ) =, ( ), ( ). ln ( ) = ln ( ). + 1 ( ) = ( ) Ω[ (
More informationStrongly Adaptive Online Learning
Amit Daniely Alon Gonen Shai ShalevShwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance
More information4 Learning, Regret minimization, and Equilibria
4 Learning, Regret minimization, and Equilibria A. Blum and Y. Mansour Abstract Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive
More informationSECOND PART, LECTURE 4: CONFIDENCE INTERVALS
Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 4: CONFIDENCE INTERVALS Lecture 4: Confidence Intervals
More informationUniversal Algorithm for Trading in Stock Market Based on the Method of Calibration
Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.
More informationAn Introduction to Extreme Value Theory
An Introduction to Extreme Value Theory Petra Friederichs Meteorological Institute University of Bonn COPS Summer School, July/August, 2007 Applications of EVT Finance distribution of income has so called
More informationABC methods for model choice in Gibbs random fields
ABC methods for model choice in Gibbs random fields JeanMichel Marin Institut de Mathématiques et Modélisation Université Montpellier 2 Joint work with Aude Grelaud, Christian Robert, François Rodolphe,
More informationConsistent Binary Classification with Generalized Performance Metrics
Consistent Binary Classification with Generalized Performance Metrics Nagarajan Natarajan Joint work with Oluwasanmi Koyejo, Pradeep Ravikumar and Inderjit Dhillon UT Austin Nov 4, 2014 Problem and Motivation
More informationDynamic CostPerAction Mechanisms and Applications to Online Advertising
Dynamic CostPerAction Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra July 13, 2007 Abstract We examine the problem of allocating a resource repeatedly over
More informationFULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang
FULL LIST OF REFEREED JOURNAL PUBLICATIONS Qihe Tang 87. Li, J.; Tang, Q. Interplay of insurance and financial risks in a discretetime model with strongly regular variation. Bernoulli 21 (2015), no. 3,
More informationDynamic PayPerAction Mechanisms and Applications to Online Advertising
Dynamic PayPerAction Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra September 24, 2012 Abstract We examine the problem of allocating an item repeatedly over
More informationAverage Y Average Predicted. Calibration. Dean P. Foster Amazon
0.5 0.4 Average Y 0.3 0.2 0.1 0.1 0.15 0.2 0.25 Average Predicted Calibration Dean P. Foster Amazon Outline Calibration for humans Calibration for big data Theory of calibrated Game theory: Convergence
More informationL1 vs. L2 Regularization and feature selection.
L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1
More informationFoundations of Machine Learning OnLine Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning OnLine Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationBig Data: The Computation/Statistics Interface
Big Data: The Computation/Statistics Interface Michael I. Jordan University of California, Berkeley September 2, 2013 What Is the Big Data Phenomenon? Big Science is generating massive datasets to be used
More informationInformation Compexity in Bandit Subset Selection Emilie Kaufmann and Shivaram Kalyanakrishnan
Information Compexity in Bandit Subset Selection Emilie Kaufmann and Shivaram Kalyanakrishnan In JMLR Workshop and Conference Proceedings Conference on Learning Theory, 203), 30:228 25, 203. JMLR: Workshop
More informationThe Kelly Betting System for Favorable Games.
The Kelly Betting System for Favorable Games. Thomas Ferguson, Statistics Department, UCLA A Simple Example. Suppose that each day you are offered a gamble with probability 2/3 of winning and probability
More informationLikelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth GarrettMayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
More informationComputational Game Theory and Clustering
Computational Game Theory and Clustering Martin Hoefer mhoefer@mpiinf.mpg.de 1 Computational Game Theory? 2 Complexity and Computation of Equilibrium 3 Bounding Inefficiencies 4 Conclusion Computational
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationMINMAX CONFIDENCE INTERVALS
MINMAX CONFIDENCE INTERVALS Johann Christoph Strelen Rheinische Friedrich Wilhelms Universität Bonn Römerstr. 164, 53117 Bonn, Germany Email: strelen@cs.unibonn.de July 2004 STOCHASTIC SIMULATION Random
More informationSenior Secondary Australian Curriculum
Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero
More informationA direct approach to false discovery rates
J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiplehypothesis
More informationTowards scaling up Markov chain Monte Carlo: an adaptive subsampling approach
Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach Rémi Bardenet Arnaud Doucet Chris Holmes Department of Statistics, University of Oxford, Oxford OX1 3TG, UK REMI.BARDENET@GMAIL.COM
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationStochastic Online Greedy Learning with Semibandit Feedbacks
Stochastic Online Greedy Learning with Semibandit Feedbacks Tian Lin Tsinghua University Beijing, China lintian06@gmail.com Jian Li Tsinghua University Beijing, China lapordge@gmail.com Wei Chen Microsoft
More informationExamples of decision problems:
Examples of decision problems: The Investment Problem: a. Problem Statement: i. Give n possible shares to invest in, provide a distribution of wealth on stocks. ii. The game is repetitive each time we
More informationPreliminaries: Problem Definition Agent model, POMDP, Bayesian RL
POMDP Tutorial Preliminaries: Problem Definition Agent model, POMDP, Bayesian RL Observation Belief ACTOR Transition Dynamics WORLD b Policy π Action Markov Decision Process X: set of states [x s,x r
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationLARGE CLASSES OF EXPERTS
LARGE CLASSES OF EXPERTS Csaba Szepesvári University of Alberta CMPUT 654 Email: szepesva@ualberta.ca UofA, October 31, 2006 OUTLINE 1 TRACKING THE BEST EXPERT 2 FIXED SHARE FORECASTER 3 VARIABLESHARE
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,
More informationS = k=1 k. ( 1) k+1 N S N S N
Alternating Series Last time we talked about convergence of series. Our two most important tests, the integral test and the (limit) comparison test, both required that the terms of the series were (eventually)
More informationImprovements to the Linear Programming Based Scheduling of Web Advertisements
Electronic Commerce Research, 5: 75 98 (2005) 2005 Springer Science + Business Media, Inc. Manufactured in the Netherlands. Improvements to the Linear Programming Based Scheduling of Web Advertisements
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationA ContextualBandit Approach to Personalized News Article Recommendation
A ContextualBandit Approach to Personalized News Article Recommendation Lihong Li,WeiChu, Yahoo! Labs lihong,chuwei@yahooinc.com John Langford Yahoo! Labs jl@yahooinc.com Robert E. Schapire + + Dept
More information