An Introduction to Game-theoretic Online Learning Tim van Erven

1 General Mathematics Colloquium Leiden, December 4, 2014 An Introduction to Game-theoretic Online Learning Tim van Erven

2 Statistics Bayesian Statistics Frequentist Statistics

3 Statistics Bayesian Statistics Frequentist Statistics A. was here

4 Statistics Bayesian Statistics Frequentist Statistics Rogue Factions

5 Statistics Bayesian Statistics Frequentist Statistics Rogue Factions Adversarial Bandits

6 Statistics Bayesian Statistics Frequentist Statistics Rogue Factions Adversarial Bandits Game-theoretic Online Learning You are here

7 Outline Introduction to Online Learning Game-theoretic Model Regression example: Electricity Classification example: Spam Three algorithms Tuning the Learning Rate

8 Game-Theoretic Online Learning. Predict data that arrive one by one. Model: repeated game against an adversary. Applications: spam detection, data compression, online convex optimization, predicting electricity consumption, predicting air pollution levels, ...

9 Repeated Game (Informally). Sequentially predict outcomes. Measure quality of prediction by loss. Before predicting, get predictions (= advice) from experts. Goal: to predict as well as the best expert over T rounds. Data and advice can be adversarial.

10 Repeated Game. Every round t = 1, ..., T: 1. Get expert predictions 2. Predict 3. Outcome is revealed 4. Measure losses

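11 Repeated Game. Every round t = 1, ..., T: 1. Get expert predictions 2. Predict 3. Outcome is revealed 4. Measure losses. Best expert: the one with the smallest total loss over all T rounds. Goal: minimize regret, our total loss minus the total loss of the best expert.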
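The four steps of the repeated game, and the regret they lead to, can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the names `play_repeated_game`, `average_learner` and `absolute_loss` are hypothetical, the absolute loss is only one possible choice of loss, and the example learner simply averages the expert advice.

```python
def play_repeated_game(expert_advice, outcomes, learner, loss):
    """Run the protocol for T rounds; return (our_loss, expert_losses, regret).

    expert_advice[t][k] is the prediction of expert k in round t,
    outcomes[t] is the outcome revealed after we have predicted.
    """
    K = len(expert_advice[0])
    our_loss = 0.0
    expert_losses = [0.0] * K
    for advice, outcome in zip(expert_advice, outcomes):
        # Steps 1-2: receive the advice, then commit to a prediction.
        prediction = learner(advice, expert_losses)
        # Steps 3-4: the outcome is revealed and all losses are measured.
        our_loss += loss(prediction, outcome)
        for k in range(K):
            expert_losses[k] += loss(advice[k], outcome)
    # Regret: our total loss minus the total loss of the best expert.
    regret = our_loss - min(expert_losses)
    return our_loss, expert_losses, regret


def absolute_loss(prediction, outcome):
    return abs(prediction - outcome)


def average_learner(advice, expert_losses):
    # Hypothetical baseline learner: ignore the past, average the advice.
    return sum(advice) / len(advice)


advice = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
outcomes = [1.0, 1.0, 1.0]
our_loss, expert_losses, regret = play_repeated_game(
    advice, outcomes, average_learner, absolute_loss)
```

In this toy run the second expert is always right, so the regret equals the learner's own total loss.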

12 Outline Introduction to Online Learning Game-theoretic Model Regression example: Electricity Classification example: Spam Three algorithms Tuning the Learning Rate

13 Regression Example: Electricity Électricité de France: predict electricity demand one day ahead, every day Experts: complicated regression models Loss: [Devaine, Gaillard, Goude, Stoltz, 2013]

14 Regression Example: Electricity Électricité de France: predict electricity demand one day ahead, every day Experts: complicated regression models Loss: [Devaine, Gaillard, Goude, Stoltz, 2013] Best model after one year: How much worse are we?

15 Outline Introduction to Online Learning Game-theoretic Model Regression example: Electricity Classification example: Spam Three algorithms Tuning the Learning Rate

16 Example: Spam Detection spam ham spam ham ham spam spam ham spam spam

17 Classification Example: Spam. Experts: K spam detection algorithms. Messages arrive one by one; predictions: spam or ham. Loss: 1 for a mistake, 0 otherwise. Regret: extra mistakes we make over the best algorithm on T messages.

18 Outline Introduction to Online Learning Three algorithms: 1. Halving 2. Follow the Leader (FTL) 3. Follow the Regularized Leader (FTRL) Tuning the Learning Rate

19 A First Algorithm: Halving. Suppose one of the spam detectors is perfect. Keep track of the experts without mistakes so far. Halving algorithm: predict by majority vote among those experts. Theorem: regret ≤ log_2 K for K experts.

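20 A First Algorithm: Halving. Theorem: regret ≤ log_2 K for K experts. Does not grow with T. Proof: Suppose halving makes M mistakes; then regret = M - 0 = M, because the best expert makes no mistakes. Every mistake eliminates at least half of the remaining experts, so M is at most log_2 K.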
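A minimal Python sketch of the Halving algorithm, assuming binary predictions (1 = spam, 0 = ham) and at least one expert that is never wrong. The function name `halving` and the rule of predicting 0 on an exact tie are illustrative choices, not taken from the talk.

```python
import math


def halving(expert_predictions, outcomes):
    """Return the number of mistakes made by majority vote over surviving experts.

    expert_predictions[t][k] and outcomes[t] are 0 or 1.
    Assumes at least one expert never makes a mistake.
    """
    K = len(expert_predictions[0])
    alive = set(range(K))          # experts without mistakes so far
    mistakes = 0
    for advice, outcome in zip(expert_predictions, outcomes):
        votes_for_one = sum(advice[k] for k in alive)
        # Majority vote among surviving experts (exact tie: predict 0).
        prediction = 1 if 2 * votes_for_one > len(alive) else 0
        if prediction != outcome:
            mistakes += 1
        # Drop every expert that was wrong this round.
        alive = {k for k in alive if advice[k] == outcome}
    return mistakes


# Four experts; the last one (index 3) is perfect.
advice = [[0, 0, 1, 1],
          [1, 1, 1, 0],
          [0, 0, 0, 1]]
outcomes = [1, 0, 1]
mistakes = halving(advice, outcomes)
```

Whenever the vote is wrong, at least half of the surviving experts voted for the wrong label and are removed, which is where the log_2 K bound comes from.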

21 Outline Introduction to Online Learning Three algorithms: 1. Halving 2. Follow the Leader (FTL) 3. Follow the Regularized Leader (FTRL) Tuning the Learning Rate

22 Follow the Leader. Want to remove the unrealistic assumption that one expert is perfect. FTL: copy the leader's prediction. The leader at time t is the expert k with the smallest L_{t-1}(k) (break ties randomly), where L_{t-1}(k) is the cumulative loss for expert k over the first t-1 rounds.

23 FTL Works with Perfect Expert. Theorem: Suppose one of the spam detectors is perfect. Then expected regret ≤ ln K. Proof: Expected regret = E[nr. mistakes] - 0. Worst case: experts get one loss in turn, so E[nr. mistakes] ≤ 1/K + 1/(K-1) + ... + 1/2 ≤ ln K.

24 FTL: More Good News. No assumption of perfect expert. Theorem: regret ≤ number of leader changes.

25 FTL: More Good News. No assumption of perfect expert. Theorem: regret ≤ number of leader changes. Proof sketch: No leader change: our loss = loss of leader, so the regret stays the same. Leader change: our regret increases at most by 1 (range of losses).

26 FTL: More Good News. No assumption of perfect expert. Theorem: regret ≤ number of leader changes. Proof sketch: No leader change: our loss = loss of leader, so the regret stays the same. Leader change: our regret increases at most by 1 (range of losses). Works well for i.i.d. losses, because the leader changes only finitely many times w.h.p.

27 FTL on IID Losses 4 experts with Bernoulli 0.1, 0.2, 0.3, 0.4 losses; regret = O(log K)
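The experiment on this slide can be approximated with a short simulation. This is a sketch under assumptions, not the code behind the original plot: ties are broken towards the lowest expert index instead of randomly, the horizon of 1000 rounds and the seed are arbitrary, and `follow_the_leader` is a hypothetical name. Leader changes are counted up to and including the leader after the final round.

```python
import random


def follow_the_leader(losses):
    """Return (ftl_loss, best_expert_loss, leader_changes).

    losses[t][k] is the loss in [0, 1] of expert k in round t.
    Ties are broken towards the lowest index (an assumption of this sketch).
    """
    K = len(losses[0])
    cumulative = [0.0] * K
    leader = 0                      # all experts tie before the first round
    ftl_loss = 0.0
    leader_changes = 0
    for round_losses in losses:
        ftl_loss += round_losses[leader]          # copy the current leader
        for k in range(K):
            cumulative[k] += round_losses[k]
        new_leader = min(range(K), key=lambda k: cumulative[k])
        if new_leader != leader:
            leader_changes += 1
        leader = new_leader
    return ftl_loss, min(cumulative), leader_changes


# Four experts with Bernoulli(0.1), ..., Bernoulli(0.4) losses.
rng = random.Random(0)
probabilities = (0.1, 0.2, 0.3, 0.4)
iid_losses = [[1.0 if rng.random() < p else 0.0 for p in probabilities]
              for _ in range(1000)]
ftl_loss, best_loss, changes = follow_the_leader(iid_losses)
regret = ftl_loss - best_loss
```

On data like this the leader typically settles on the best expert early, so both `changes` and `regret` stay small compared to the number of rounds.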

28 FTL Worst-case Losses

29 FTL Worst-case Losses. Two experts with a tie or leader change every round. Expected loss of FTL per round: 1/2, 1, 1/2, 1, 1/2, 1, ... Both experts have cumulative loss about T/2. Regret is linear in T. Problem: FTL too sure of itself when no ties!
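One loss sequence that produces this behaviour can be checked directly. The sequence below is an assumption made for illustration (the two experts simply alternate between loss 1 and loss 0, out of phase), and `expected_ftl_loss` is a hypothetical helper that averages over uniformly random tie-breaking.

```python
def expected_ftl_loss(losses):
    """Expected FTL loss when ties between leaders are broken uniformly at random.

    Returns (expected_ftl_loss, best_expert_loss).
    """
    K = len(losses[0])
    cumulative = [0.0] * K
    total = 0.0
    for round_losses in losses:
        lowest = min(cumulative)
        leaders = [k for k in range(K) if cumulative[k] == lowest]
        # Each tied leader is followed with equal probability.
        total += sum(round_losses[k] for k in leaders) / len(leaders)
        for k in range(K):
            cumulative[k] += round_losses[k]
    return total, min(cumulative)


T = 100
# Assumed adversarial sequence: the current leader always gets loss 1.
alternating = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)]
ftl, best = expected_ftl_loss(alternating)
regret = ftl - best
```

With these losses FTL pays 1/2 in every tied round and 1 in every other round, so its expected loss is 3T/4 against T/2 for either expert, and the regret T/4 grows linearly in T.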

30 Outline Introduction to Online Learning Three algorithms: 1. Halving 2. Follow the Leader (FTL) 3. Follow the Regularized Leader (FTRL) Tuning the Learning Rate

31 Solution: Be Less Sure! Pull FTL choices towards the uniform distribution. Follow the Leader: pick the leader or, written as a distribution, put all weight on the expert with the smallest cumulative loss. Follow the Regularized Leader: add a penalty for being away from uniform; the Kullback-Leibler divergence in this talk.
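With a Kullback-Leibler penalty, the regularized leader has a well-known closed form: minimizing the expected cumulative loss plus KL(w || uniform) / eta over all distributions w gives weights proportional to exp(-eta * cumulative loss), i.e. exponential weights (Hedge). The sketch below assumes this scaling of the penalty by a learning rate eta; the exact formula on the slide is not recoverable from the text, and `ftrl_kl_weights` is a hypothetical name.

```python
import math


def ftrl_kl_weights(cumulative_losses, eta):
    """Weights w minimizing  sum_k w_k * L_k + KL(w || uniform) / eta.

    Closed form: w_k is proportional to exp(-eta * L_k).
    """
    lowest = min(cumulative_losses)
    # Subtracting the smallest loss leaves the weights unchanged
    # and avoids numerical underflow in every term at once.
    unnormalized = [math.exp(-eta * (L - lowest)) for L in cumulative_losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


uniform = ftrl_kl_weights([3.0, 1.0, 2.0], eta=0.0)       # ignores the losses
moderate = ftrl_kl_weights([3.0, 1.0, 2.0], eta=1.0)      # leans to the leader
greedy = ftrl_kl_weights([3.0, 1.0, 2.0], eta=1000.0)     # essentially FTL
```

Small eta keeps the weights close to uniform, while very large eta puts almost all weight on the current leader.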

32 The Learning Rate. Follow the Regularized Leader is very sensitive to the choice of learning rate η. η very large: Follow the Leader. η = 0: don't learn at all.

33 Outline Introduction to Online Learning Three algorithms Tuning the Learning Rate: Safe tuning The Price of Robustness Learning the Learning Rate

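34 The Worst-case Safe Learning Rate. Theorem: For FTRL, the regret is bounded for every learning rate η, and the η that minimizes this bound gives regret of order sqrt(T ln K). No (probabilistic) assumptions about data! This worst-case optimal η is standard in online learning.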
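The slide's formulas did not survive extraction; the sketch below uses the standard textbook bound for exponential weights with losses in [0, 1], which is regret ≤ ln(K)/eta + eta*T/8, minimized at eta = sqrt(8 ln(K) / T) to give regret ≤ sqrt(T ln(K) / 2). Whether the slide used exactly these constants is an assumption. The loss sequence is the alternating two-expert sequence on which Follow the Leader suffers linear regret, and `hedge_regret` is a hypothetical name.

```python
import math


def hedge_regret(losses, eta):
    """Regret of FTRL with KL penalty (exponential weights) in expected loss."""
    K = len(losses[0])
    cumulative = [0.0] * K
    our_loss = 0.0
    for round_losses in losses:
        lowest = min(cumulative)
        weights = [math.exp(-eta * (L - lowest)) for L in cumulative]
        total = sum(weights)
        # Expected loss when an expert is drawn according to the weights.
        our_loss += sum(w * l for w, l in zip(weights, round_losses)) / total
        for k in range(K):
            cumulative[k] += round_losses[k]
    return our_loss - min(cumulative)


T, K = 100, 2
losses = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)]
eta_safe = math.sqrt(8 * math.log(K) / T)      # worst-case safe learning rate
bound = math.sqrt(T * math.log(K) / 2)         # corresponding regret bound
regret = hedge_regret(losses, eta_safe)
```

On this sequence the safely tuned learner stays within the bound, whereas Follow the Leader with random tie-breaking has expected regret T/4 = 25.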

35 Outline Introduction to Online Learning Three algorithms Tuning the Learning Rate: Safe tuning The Price of Robustness Learning the Learning Rate

36 The Price of Robustness Safe tuning does much worse than FTL on i.i.d. losses

37 The Price of Robustness. Table columns: Method; Special Case of FTRL; Perfect Expert Data; IID Data; Worst-case Data. Halving: not a special case of FTRL; undefined on IID and worst-case data. Follow the Leader: FTRL with very large learning rate. FTRL with Worst-case Safe Tuning: FTRL with small learning rate. Can we adapt to the optimal eta automatically?

38 Outline Introduction to Online Learning Three algorithms Tuning the Learning Rate: Safe tuning The Price of Robustness Learning the Learning Rate

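39 A Failed Approach: Being Meta. We want: the best of FTL and safe tuning. Idea: meta-problem. Expert 1: FTL. Expert 2: FTRL with Safe Tuning.

40 A Failed Approach: Being Meta. We want: the best of FTL and safe tuning. Idea: meta-problem. Expert 1: FTL. Expert 2: FTRL with Safe Tuning. Regret = meta-regret + the smaller of the regrets of FTL and safely tuned FTRL. Best of both worlds if meta-regret small!

41 A Failed Approach: Being Meta. We want: the best of FTL and safe tuning. Idea: meta-problem. Expert 1: FTL. Expert 2: FTRL with Safe Tuning. Regret = meta-regret + the smaller of the regrets of FTL and safely tuned FTRL. Best of both worlds if meta-regret small! If the meta-regret grows like sqrt(T) while FTL's regret stays small, then the meta-regret is too big!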
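The role of the meta-regret can be made explicit with a one-line decomposition. The notation is an assumption introduced here: L_T^meta, L_T^FTL and L_T^safe denote the cumulative losses of the meta-learner, FTL and safely tuned FTRL after T rounds, and L_T(k) the cumulative loss of expert k.

```latex
\begin{align*}
\mathrm{Regret}_T
  &= L_T^{\mathrm{meta}} - \min_k L_T(k) \\
  &= \underbrace{L_T^{\mathrm{meta}}
       - \min\{L_T^{\mathrm{FTL}}, L_T^{\mathrm{safe}}\}}_{\text{meta-regret}}
   + \underbrace{\min\{L_T^{\mathrm{FTL}}, L_T^{\mathrm{safe}}\}
       - \min_k L_T(k)}_{\min\{\mathrm{Regret}_T^{\mathrm{FTL}},\,
                               \mathrm{Regret}_T^{\mathrm{safe}}\}}
\end{align*}
```

The second term is already the best of both worlds, so the overall guarantee is only as good as the meta-regret in the first term.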

42 Partial Progress Safe tuning: Regret Improvement for small losses: Regret

43 Partial Progress Safe tuning: Regret Improvement for small losses: Regret variance of Variance Bounds: Cesa-Bianchi, Mansour, Stoltz, 2007; Van Erven, Grünwald, Koolen, De Rooij, 2011; De Rooij, Van Erven, Grünwald, Koolen, 2014

44 Partial Progress Safe tuning: Regret Improvement for small losses: Regret variance of Variance Bounds:

45 Regret is a Difficult Function 1/2. All the previous solutions could potentially end up in a wrong local minimum.

46 Regret is a Difficult Function 2/2. Regret as a function of T for fixed η is nonmonotonic. This means some η may look very bad for a while, but end up being very good after all. How do we see which η's are good? Use the (best-possible) monotonic lower bound per η.

47 Learning the Learning Rate Koolen, Van Erven, Grünwald, 2014

48 Learning the Learning Rate. Track performance for a grid of learning rates. Switch between them. Pay for switching, but not too much. Running time as fast as for a single fixed η: does not depend on the size of the grid.

49 Learning the Learning Rate. Theorems: as good as all previous methods. For all interesting η: within a polylog(k,t) factor.

50 Learning the Learning Rate Koolen, Van Erven, Grünwald, 2014

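51 The Price of Robustness. Table columns: Method; Special Case of FTRL; Perfect Expert Data; IID Data; Worst-case Data; Compared to Optimal eta. Follow the Leader: FTRL with very large learning rate; no useful guarantees compared to the optimal eta. FTRL with Worst-case Safe Tuning: FTRL with small learning rate; no useful guarantees compared to the optimal eta. LLR: polylog(k,t) factor compared to the optimal eta. Can we adapt to optimal eta automatically? Yes!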

52 Summary. Game-theoretic Online Learning (you are here), e.g. electricity forecasting, spam detection. Three algorithms: Halving, Follow the (Regularized) Leader. Tuning the Learning Rate: safe tuning pays the Price of Robustness; learning the learning rate adapts to the optimal learning rate automatically.

53 Future Work. So far: compete with the best expert. Online Convex Optimization: compete with the best convex combination of experts. Future work: extend LLR to online convex optimization.

54 References
Cesa-Bianchi and Lugosi. Prediction, learning, and games.
Cesa-Bianchi, Mansour, Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3), 2007.
Devaine, Gaillard, Goude, Stoltz. Forecasting electricity consumption by aggregating specialized experts. Machine Learning, 90(2), 2013.
Van Erven, Grünwald, Koolen and De Rooij. Adaptive Hedge. NIPS, 2011.
De Rooij, Van Erven, Grünwald, Koolen. Follow the Leader If You Can, Hedge If You Must. Journal of Machine Learning Research, 2014.
Koolen, Van Erven, Grünwald. Learning the Learning Rate for Prediction with Expert Advice. NIPS, 2014.
