Multi-Armed Bandit Problems
|
|
- Curtis McBride
- 7 years ago
- Views:
Transcription
1 Machine Learning Coms-4771 Multi-Armed Bandit Problems Lecture 20
2 Multi-armed Bandit Problems The Setting: K arms (or actions) Each time t, each arm i pays off a bounded real-valued reward x i (t), say in [0, 1]. Each time t, the learner chooses a single arm i t {1,..., K} and receives reward x it (t). The goal is to maximize the return. 1 K {}}{... The simplest instance of the exploration-exploitation problem
3 Bandits for targeting content Choose the best content to display to the next visitor of your website Content options = slot machines Reward = user s response (e.g., click on a ad) A simplifying assumption: no context (no visitor profiles). In practice, we want to solve contextual bandit problems.
4 Stochastic bandits: Each arm i is associated with some unknown probability distribution with expectation µ i. Rewards are drawn iid. The largest expected reward: µ = max i {1,...,K} E[x i ] Regret after T plays: µ T E[x it (t)] expectation is over the draws of rewards and the randomness in player s strategy Adversarial (nonstochastic) bandits: No assumption is made about the reward sequence (other than it s bounded). Regret after T plays: max i x i (t) E[x it (t)] expectation is only over the randomness in the player s strategy
5 Stochastic Bandits: Upper Confidence Bounds Strategy UCB Play each arm once At any time t > K (deterministically) play machine i t maximizing over j {1,..., K} where x j (t) + 2 ln t T j,t, x j is the average reward obtained from machine j Tj,t is the number of times j has been played so far
6 UCB Intuition: The second term 2 ln t/t j,t is the the size of the one-sided (1 1/t)-condifence interval for the average reward (using Chernoff-Hoeffding bounds). Theorem (Auer, Cesa-Bianchi, Fisher) At time T, the regret of the UCB policy is at most 8K ln T + 5K, where = µ max i:µi <µ µ i (the gap between the best expected reward and the expected reward of the runner up).
7 Stochastic Bandits: ɛ-greedy Randomized policy: ɛ t -greedy Parameter: schedule ɛ 1, ɛ 2,..., where 0 ɛ t 1. At each time t (exploit) with probability 1 ɛ t, play the arm i t with the highest current average return (explore) with probability ɛ, play a random arm Is there a schedule of ɛ t which guarantees logarithmic regret? Constant ɛ causes linear regret. Fix: let ɛ go to 0 as our estimates of the expected rewards become more accurate.
8 Theorem (Auer, Cesa-Bianchi, Fisher) If ɛ t = 12/(d 2 t) where 0 < d, then the instantaneous regret (i.e., probability of choosing a suboptimal arm) at any time t of ɛ-greedy is at most O( K dt ). The regret of ɛ-greedy at time T (summing over the steps) is thus at most O( d K log T ) (using T 1 t lnt + γ where γ is the Euler constant).
9 Practical performance (from Auer, Cesa-Bianchi and Fisher): Tuning the UCB: replace 2 ln t/t i,t with ln t T i,t min{1/4, V i,t }, where V i,t is an upper confidence bound for the variance of arm i. (The factor 1/4 is an upper bound on the variance of any [0, 1] bounded variable.) Performs significantly better in practice. ɛ-greedy is quite sensitive to bad parameter tuning and large differences in response rates. Otherwise an optimally tuned ɛ-greedy performs very well. UCB tuned performs comparably to a well-tuned ɛ-greedy and is not very sensitive to large differences in response rates.
10 Nonstochastic Bandits: Recap No assumptions are made about the generation of rewards. Modeled by an arbitrary sequence of reward vectors x 1 (t),..., x K (t), where x i (t) [0, 1] is the reward obtained if action i is chosen at time t. At step t, the player chooses arm i t and receives x it. Regret after T plays (with respect to the best single action): max x j (t) j } {{ } G max=reward of the best action in hindsight E[x it (t)] }{{} expected reward of the player
11 Exp3 Algorithm (Auer, Cesa-Bianchi, Freund, and Schapire) Initialization: w i (1) = 1 for i {1,..., K} Set γ = min{1, K ln K (e 1)g }, where g G max. For each t = 1, 2,... Set w i p i (t) = (1 γ) K j=1 w j(t) + γ K Draw i t randomly according to p 1 (t),..., p K (t). Receive reward xit (t) [0, 1] For j = 1,..., K set the estimated rewards and update the weights: { x j (t)/p j (t) if j = i t ˆx j (t) = 0 otherwise w j (t + 1) = w j (t) exp(γˆx j (t)/k)
12 Exp3 Algorithm (Auer, Cesa-Bianchi, Freund, and Schapire) Theorem: For any T > 0 and for any sequence of rewards, regret of the player is bounded by 2 e 1 gk ln K 2.63 TK ln K Observation: Setting ˆx it to x it (t)/p it (t) guarantees that the expectations are equal to the actual rewards for each action: E[ˆx j i 1,..., i t 1 ] = p j (t)x j (t)/p j (t) = x j (t), where the expectation is with respect to the random choice of i t at time t (given the choices in the previous rounds). So dividing by p it compensates for the reward of actions with small probability of being drawn.
13 Proof: Let W t = j w j(t). We have W t+1 W t = K i=1 K i=1 w i (t)exp(γˆx i (t)/k) {}}{ w i (t + 1) W t = K i=1 p i (t) (γ/k) exp(γˆx i (t)/k) 1 γ p i (t) (γ/k) (1 + γ 1 γ k ˆx i(t) + (e 2) γ2 k 2 ˆx i 2 (t)) (using e x 1 + x + (e 2)x 2 for x [0, 1]) =1 {}}{ K p i (t) (γ/k) + 1 γ i=1 K i=1 p i (t)γˆx i (t) (1 γ)k (e 2)γ2 + (1 γ)k 2 γ 1 + (1 γ)k x i t (t) +(e 2) (γ/k)2 }{{} 1 γ the only non-zero term use approximation 1 + z e z i K ˆx i (t) i=1 p i (t)ˆx 2 i (t)
14 Take logs: ln W T +1 W t Summing over t, ln W T +1 W 1 γ/k (1 γ) Now, for any fixed arm j γ (1 γ)k x i t (t) + (e 2) (γ/k)2 1 γ reward of Exp3 {}}{ (e 2)(γ/K)2 G exp γ ln W T +1 ln w j(t + 1) = ln w j(1) + (γ/k) W 1 W 1 W 1 Combine with the upper bound, γ ˆx j (t) ln K γ/k K 1 γ G (e 2)(γ/K)2 exp γ Solve for G exp 3 : K ˆx i (t) i=1 i=1 K ˆx i (t) ˆx j (t). i=1 K ˆx i (t) G exp 3 (1 γ) x j (t) K T γ ln K (1 γ) (e 2)(γ/K) K ˆx i (t) i=1
15 Take expectaion of both sides wrt distribution of i 1,..., i T : E[G exp 3 ] (1 γ) x j (t) K γ ln K (e 2) γ K KG max. Since j was chosen arbitrarily, it holds for j = max: Thus E[G exp 3 ] (1 γ)g max K γ ln K (e 2)γG max G max E[G exp 3 ] K ln K γ + (e 1)γG max The value of γ in the algorithm is chosen to minimize the regret.
16 Comments: Don t need to know T in advance (guess and double) Possible to get high probability bounds (with a modified version of Exp3 that uses upper confidence bounds) Stronger notions of regret. Compete with the best in a class of strategies. The difference between T bounds and log T bounds is a bit misleading. The difference is not due to the adversarial nature of rewards but in the asymptotic quantification! log T bounds hold for any fixed set of reward distributions (so is fixed before T, not after).
Finite-time Analysis of the Multiarmed Bandit Problem*
Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010
More informationMonotone multi-armed bandit allocations
JMLR: Workshop and Conference Proceedings 19 (2011) 829 833 24th Annual Conference on Learning Theory Monotone multi-armed bandit allocations Aleksandrs Slivkins Microsoft Research Silicon Valley, Mountain
More informationOnline Learning with Switching Costs and Other Adaptive Adversaries
Online Learning with Switching Costs and Other Adaptive Adversaries Nicolò Cesa-Bianchi Università degli Studi di Milano Italy Ofer Dekel Microsoft Research USA Ohad Shamir Microsoft Research and the Weizmann
More informationContextual-Bandit Approach to Recommendation Konstantin Knauf
Contextual-Bandit Approach to Recommendation Konstantin Knauf 22. Januar 2014 Prof. Ulf Brefeld Knowledge Mining & Assesment 1 Agenda Problem Scenario Scenario Multi-armed Bandit Model for Online Recommendation
More informationTrading regret rate for computational efficiency in online learning with limited feedback
Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai
More information1 Introduction. 2 Prediction with Expert Advice. Online Learning 9.520 Lecture 09
1 Introduction Most of the course is concerned with the batch learning problem. In this lecture, however, we look at a different model, called online. Let us first compare and contrast the two. In batch
More informationOnline Algorithms: Learning & Optimization with No Regret.
Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the
More informationUn point de vue bayésien pour des algorithmes de bandit plus performants
Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech)
More informationHandling Advertisements of Unknown Quality in Search Advertising
Handling Advertisements of Unknown Quality in Search Advertising Sandeep Pandey Carnegie Mellon University spandey@cs.cmu.edu Christopher Olston Yahoo! Research olston@yahoo-inc.com Abstract We consider
More informationRevenue Optimization against Strategic Buyers
Revenue Optimization against Strategic Buyers Mehryar Mohri Courant Institute of Mathematical Sciences 251 Mercer Street New York, NY, 10012 Andrés Muñoz Medina Google Research 111 8th Avenue New York,
More informationWeek 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
More informationThe Price of Truthfulness for Pay-Per-Click Auctions
Technical Report TTI-TR-2008-3 The Price of Truthfulness for Pay-Per-Click Auctions Nikhil Devanur Microsoft Research Sham M. Kakade Toyota Technological Institute at Chicago ABSTRACT We analyze the problem
More informationOnline Network Revenue Management using Thompson Sampling
Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira David Simchi-Levi He Wang Working Paper 16-031 Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira
More informationAlgorithms for the multi-armed bandit problem
Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00 Algorithms for the multi-armed bandit problem Volodymyr Kuleshov Doina Precup School of Computer Science McGill University
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
More informationRobbing the bandit: Less regret in online geometric optimization against an adaptive adversary.
Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. Varsha Dani Thomas P. Hayes November 12, 2005 Abstract We consider online bandit geometric optimization,
More informationOnline Optimization and Personalization of Teaching Sequences
Online Optimization and Personalization of Teaching Sequences Benjamin Clément 1, Didier Roy 1, Manuel Lopes 1, Pierre-Yves Oudeyer 1 1 Flowers Lab Inria Bordeaux Sud-Ouest, Bordeaux 33400, France, didier.roy@inria.fr
More informationarxiv:1306.0155v1 [cs.lg] 1 Jun 2013
Dynamic ad allocation: bandits with budgets Aleksandrs Slivkins May 2013 arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Abstract We consider an application of multi-armed bandits to internet advertising (specifically,
More informationStrongly Adaptive Online Learning
Amit Daniely Alon Gonen Shai Shalev-Shwartz The Hebrew University AMIT.DANIELY@MAIL.HUJI.AC.IL ALONGNN@CS.HUJI.AC.IL SHAIS@CS.HUJI.AC.IL Abstract Strongly adaptive algorithms are algorithms whose performance
More informationHow To Solve Two Sided Bandit Problems
Two-Sided Bandits and the Dating Market Sanmay Das Center for Biological and Computational Learning and Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge,
More informationOnline Learning with Composite Loss Functions
JMLR: Workshop and Conference Proceedings vol 35:1 18, 2014 Online Learning with Composite Loss Functions Ofer Dekel Microsoft Research Jian Ding University of Chicago Tomer Koren Technion Yuval Peres
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationThe Advantages and Disadvantages of Online Linear Optimization
LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA E-MAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA
More informationA Contextual-Bandit Approach to Personalized News Article Recommendation
A Contextual-Bandit Approach to Personalized News Article Recommendation Lihong Li,WeiChu, Yahoo! Labs lihong,chuwei@yahooinc.com John Langford Yahoo! Labs jl@yahoo-inc.com Robert E. Schapire + + Dept
More information4 Learning, Regret minimization, and Equilibria
4 Learning, Regret minimization, and Equilibria A. Blum and Y. Mansour Abstract Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive
More informationRegret Minimization for Reserve Prices in Second-Price Auctions
Regret Minimization for Reserve Prices in Second-Price Auctions Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Claudio Gentile (Varese) and Yishay Mansour (Tel-Aviv) N. Cesa-Bianchi
More information14.1 Rent-or-buy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
More informationAn Introduction to Competitive Analysis for Online Optimization. Maurice Queyranne University of British Columbia, and IMA Visitor (Fall 2002)
An Introduction to Competitive Analysis for Online Optimization Maurice Queyranne University of British Columbia, and IMA Visitor (Fall 2002) IMA, December 11, 2002 Overview 1 Online Optimization sequences
More informationInterior-Point Methods for Full-Information and Bandit Online Learning
4164 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 58, NO. 7, JULY 2012 Interior-Point Methods for Full-Information and Bandit Online Learning Jacob D. Abernethy, Elad Hazan, and Alexander Rakhlin Abstract
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationour algorithms will work and our results will hold even when the context space is discrete given that it is bounded. arm (or an alternative).
1 Context-Adaptive Big Data Stream Mining - Online Appendix Cem Tekin*, Member, IEEE, Luca Canzian, Member, IEEE, Mihaela van der Schaar, Fellow, IEEE Electrical Engineering Department, University of California,
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationStochastic Online Greedy Learning with Semi-bandit Feedbacks
Stochastic Online Greedy Learning with Semi-bandit Feedbacks Tian Lin Tsinghua University Beijing, China lintian06@gmail.com Jian Li Tsinghua University Beijing, China lapordge@gmail.com Wei Chen Microsoft
More informationOnline Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret Raman Arora Toyota Technological Institute at Chicago, Chicago, IL 60637, USA Ofer Dekel Microsoft Research, 1 Microsoft
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 22-26 Jan 2007 1 Introduction In this course we will be looking
More informationThe Multiplicative Weights Update method
Chapter 2 The Multiplicative Weights Update method The Multiplicative Weights method is a simple idea which has been repeatedly discovered in fields as diverse as Machine Learning, Optimization, and Game
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More information1 Portfolio Selection
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # Scribe: Nadia Heninger April 8, 008 Portfolio Selection Last time we discussed our model of the stock market N stocks start on day with
More informationA Potential-based Framework for Online Multi-class Learning with Partial Feedback
A Potential-based Framework for Online Multi-class Learning with Partial Feedback Shijun Wang Rong Jin Hamed Valizadegan Radiology and Imaging Sciences Computer Science and Engineering Computer Science
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationOnline Convex Optimization
E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting
More informationDynamic Pay-Per-Action Mechanisms and Applications to Online Advertising
Submitted to Operations Research manuscript OPRE-2009-06-288 Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Microsoft Research, Cambridge, MA, hamidnz@microsoft.com
More informationThe Value of Knowing a Demand Curve: Bounds on Regret for On-line Posted-Price Auctions
The Value of Knowing a Demand Curve: Bounds on Regret for On-line Posted-Price Auctions Robert Kleinberg F. Thomson Leighton Abstract We consider the revenue-maximization problem for a seller with an unlimited
More informationAdaptive play in Texas Hold em Poker
Adaptive play in Texas Hold em Poker Raphaël Maîtrepierre and Jérémie Mary and Rémi Munos 1 Abstract. We present a Texas Hold em poker player for limit headsup games. Our bot is designed to adapt automatically
More informationUniversal Algorithm for Trading in Stock Market Based on the Method of Calibration
Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationA Drifting-Games Analysis for Online Learning and Applications to Boosting
A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationEstimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising Min Xu Machine Learning Department Carnegie Mellon University minx@cs.cmu.edu ao Qin Microsoft Research Asia taoqin@microsoft.com
More informationOnline Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI
Online Combinatorial Optimization under Bandit Feedback MOHAMMAD SADEGH TALEBI MAZRAEH SHAHI Licentiate Thesis Stockholm, Sweden 2016 TRITA-EE 2016:001 ISSN 1653-5146 ISBN 978-91-7595-836-1 KTH Royal Institute
More informationOptimizing engagement in online news
Optimizing engagement in online news University of Amsterdam, Faculty of Science Optimizing engagement in online news For the degree of Master of Science in Artificial Intelligence at University of Amsterdam
More informationOnline Adwords Allocation
Online Adwords Allocation Shoshana Neuburger May 6, 2009 1 Overview Many search engines auction the advertising space alongside search results. When Google interviewed Amin Saberi in 2004, their advertisement
More informationAchieving All with No Parameters: AdaNormalHedge
JMLR: Workshop and Conference Proceedings vol 40:1 19, 2015 Achieving All with No Parameters: AdaNormalHedge Haipeng Luo Department of Computer Science, Princeton University Robert E. Schapire Microsoft
More informationBeat the Mean Bandit
Yisong Yue H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, USA Thorsten Joachims Department of Computer Science, Cornell University, Ithaca, NY, USA yisongyue@cmu.edu tj@cs.cornell.edu
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationThe Price of Truthfulness for Pay-Per-Click Auctions
The Price of Truthfulness for Pay-Per-Click Auctions ABSTRACT Nikhil R. Devanur Microsoft Research One Microsoft Way Redmond, WA nikdev@microsoft.com We analyze the problem of designing a truthful pay-per-click
More informationSubmodular Function Maximization
Submodular Function Maximization Andreas Krause (ETH Zurich) Daniel Golovin (Google) Submodularity 1 is a property of set functions with deep theoretical consequences and far reaching applications. At
More informationLoad balancing of temporary tasks in the l p norm
Load balancing of temporary tasks in the l p norm Yossi Azar a,1, Amir Epstein a,2, Leah Epstein b,3 a School of Computer Science, Tel Aviv University, Tel Aviv, Israel. b School of Computer Science, The
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More information4.1 Introduction - Online Learning Model
Computational Learning Foundations Fall semester, 2010 Lecture 4: November 7, 2010 Lecturer: Yishay Mansour Scribes: Elad Liebman, Yuval Rochman & Allon Wagner 1 4.1 Introduction - Online Learning Model
More informationDynamic Pay-Per-Action Mechanisms and Applications to Online Advertising
Dynamic Pay-Per-Action Mechanisms and Applications to Online Advertising Hamid Nazerzadeh Amin Saberi Rakesh Vohra September 24, 2012 Abstract We examine the problem of allocating an item repeatedly over
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
More informationOnline Learning. 1 Online Learning and Mistake Bounds for Finite Hypothesis Classes
Advanced Course in Machine Learning Spring 2011 Online Learning Lecturer: Shai Shalev-Shwartz Scribe: Shai Shalev-Shwartz In this lecture we describe a different model of learning which is called online
More informationA Survey of Preference-based Online Learning with Bandit Algorithms
A Survey of Preference-based Online Learning with Bandit Algorithms Róbert Busa-Fekete and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {busarobi,eyke}@upb.de Abstract.
More informationExact Nonparametric Tests for Comparing Means - A Personal Summary
Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola
More informationKeyword Optimization in Search-Based Advertising Markets
Tel Aviv University Raymond and Beverly Sackler Faculty of Exact Sciences The Blavatnik School of Computer Science Keyword Optimization in Search-Based Advertising Markets Thesis submitted in partial fulfillment
More informationMA4001 Engineering Mathematics 1 Lecture 10 Limits and Continuity
MA4001 Engineering Mathematics 1 Lecture 10 Limits and Dr. Sarah Mitchell Autumn 2014 Infinite limits If f(x) grows arbitrarily large as x a we say that f(x) has an infinite limit. Example: f(x) = 1 x
More informationGoal Problems in Gambling and Game Theory. Bill Sudderth. School of Statistics University of Minnesota
Goal Problems in Gambling and Game Theory Bill Sudderth School of Statistics University of Minnesota 1 Three problems Maximizing the probability of reaching a goal. Maximizing the probability of reaching
More informationLecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
More informationUnbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms Lihong Li Wei Chu John Langford Xuanhui Wang Yahoo! Labs 701 First Ave, Sunnyvale, CA, USA 94089 {lihong,chuwei,jl,xhwang}@yahoo-inc.com
More information( ) = ( ) = {,,, } β ( ), < 1 ( ) + ( ) = ( ) + ( )
{ } ( ) = ( ) = {,,, } ( ) β ( ), < 1 ( ) + ( ) = ( ) + ( ) max, ( ) [ ( )] + ( ) [ ( )], [ ( )] [ ( )] = =, ( ) = ( ) = 0 ( ) = ( ) ( ) ( ) =, ( ), ( ) =, ( ), ( ). ln ( ) = ln ( ). + 1 ( ) = ( ) Ω[ (
More informationLecture 8. Confidence intervals and the central limit theorem
Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of
More informationCONSIDER a merchant selling items through e-bay
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 6, NO., JANUARY 5 549 Regret Minimization for Reserve Prices in Second-Price Auctions Nicolò Cesa-Bianchi, Claudio Gentile, and Yishay Mansour Abstract We
More informationMathematical finance and linear programming (optimization)
Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationLARGE CLASSES OF EXPERTS
LARGE CLASSES OF EXPERTS Csaba Szepesvári University of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 31, 2006 OUTLINE 1 TRACKING THE BEST EXPERT 2 FIXED SHARE FORECASTER 3 VARIABLE-SHARE
More informationECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015
ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1
More informationAlgorithm and the Advantages of a Competitive Approach to Productivity
Online Collaborative Filtering on Graphs Siddhartha Banerjee Department of Management Science and Engineering, Stanford University, Stanford, CA 94025 sidb@stanford.edu Sujay Sanghavi, Sanjay Shakkottai
More informationOnline Semi-Supervised Learning
Online Semi-Supervised Learning Andrew B. Goldberg, Ming Li, Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences University of Wisconsin Madison Xiaojin Zhu (Univ. Wisconsin-Madison) Online Semi-Supervised
More information9th Max-Planck Advanced Course on the Foundations of Computer Science (ADFOCS) Primal-Dual Algorithms for Online Optimization: Lecture 1
9th Max-Planck Advanced Course on the Foundations of Computer Science (ADFOCS) Primal-Dual Algorithms for Online Optimization: Lecture 1 Seffi Naor Computer Science Dept. Technion Haifa, Israel Introduction
More informationConfidence Intervals for Spearman s Rank Correlation
Chapter 808 Confidence Intervals for Spearman s Rank Correlation Introduction This routine calculates the sample size needed to obtain a specified width of Spearman s rank correlation coefficient confidence
More informationAdaBoost. Jiri Matas and Jan Šochman. Centre for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz
AdaBoost Jiri Matas and Jan Šochman Centre for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Presentation Outline: AdaBoost algorithm Why is of interest? How it works? Why
More information1 Approximating Set Cover
CS 05: Algorithms (Grad) Feb 2-24, 2005 Approximating Set Cover. Definition An Instance (X, F ) of the set-covering problem consists of a finite set X and a family F of subset of X, such that every elemennt
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationOnline Convex Programming and Generalized Infinitesimal Gradient Ascent
Online Convex Programming and Generalized Infinitesimal Gradient Ascent Martin Zinkevich February 003 CMU-CS-03-110 School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 Abstract Convex
More informationExploitation and Exploration in a Performance based Contextual Advertising System
Exploitation and Exploration in a Performance based Contextual Advertising System ABSTRACT Wei Li, Xuerui Wang, Ruofei Zhang, Ying Cui, Jianchang Mao Yahoo! Labs, Sunnyvale, CA, 94089 {lwei,xuerui,rzhang,ycui,jmao}@yahooinc.com
More informationECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE
ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely for keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.
More informationDecentralized control of stochastic multi-agent service system. J. George Shanthikumar Purdue University
Decentralized control of stochastic multi-agent service system J. George Shanthikumar Purdue University Joint work with Huaning Cai, Peng Li and Andrew Lim 1 Many problems can be thought of as stochastic
More informationOnline Learning in Bandit Problems
Online Learning in Bandit Problems by Cem Tekin A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering: Systems) in The University
More information6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation
6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation Daron Acemoglu and Asu Ozdaglar MIT November 2, 2009 1 Introduction Outline The problem of cooperation Finitely-repeated prisoner s dilemma
More informationIntroduction to Online Optimization
Princeton University - Department of Operations Research and Financial Engineering Introduction to Online Optimization Sébastien Bubeck December 14, 2011 1 Contents Chapter 1. Introduction 5 1.1. Statistical
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationBoosting. riedmiller@informatik.uni-freiburg.de
. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de
More informationMATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationMathematical Finance
Mathematical Finance Option Pricing under the Risk-Neutral Measure Cory Barnes Department of Mathematics University of Washington June 11, 2013 Outline 1 Probability Background 2 Black Scholes for European
More information