Contextual-Bandit Approach to Recommendation
Konstantin Knauf
January 22, 2014
Prof. Ulf Brefeld, Knowledge Mining & Assessment
Agenda
- Problem Scenario
- Multi-armed Bandit Model for Online Recommendation
- Algorithms to Balance Exploration & Exploitation
  - Exploration vs. Exploitation
  - UCB
  - LinUCB
- Evaluating Multi-armed Bandit Algorithms
- Empirical Results
Scenario: News Article Recommendation
Which articles should be featured?
Challenges:
- Many new users & articles
- Changing relevance of articles over time
- Incorporation of content information
Goal: quickly identify relevant news stories on a personal level.
The Original Multi-Armed Bandit Problem
Which machine should I play to maximize my overall reward?
Trade-off: exploration vs. exploitation.
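The standard formalization behind this slide, stated here for reference: at each trial $t = 1, \dots, T$ the player chooses an arm $a_t \in \{1, \dots, K\}$ and receives a random reward $r_{t,a_t}$. Maximizing the total expected reward is equivalent to minimizing the regret against the best fixed arm:
\[
R(T) \;=\; T\mu^{*} \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_{t,a_t}\Big],
\qquad \mu^{*} = \max_a \mathbb{E}[r_{t,a}].
\]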
Multi-armed Bandit Model for Online Recommendation
- Contextual bandit (the interaction protocol is sketched below)
- Example: news recommendation
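A sketch of the contextual-bandit protocol as given in Li et al. (2010), on which these slides are based. For trials $t = 1, 2, 3, \dots$:
1. Observe the current user $u_t$ and the set of arms $A_t$ (articles), together with a feature vector $x_{t,a}$ for each $a \in A_t$ (the context).
2. Choose an arm $a_t \in A_t$ based on the payoffs observed in previous trials.
3. Receive payoff $r_{t,a_t}$ (e.g. 1 for a click, 0 otherwise).
4. Update the arm-selection strategy with the observation $(x_{t,a_t}, a_t, r_{t,a_t})$.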
Agenda Problem Scenario Scenario Multi-armed Bandit Model for Online Recommendation Algorithms to Balance Exploration & Exploitation Exploration vs. Exploitation UCB LinUCB Evaluating Multi-armed Bandit Algorithms Empirical Results 22. Januar 2014 Prof. Ulf Brefeld Knowledge Mining & Assesment 8
Algorithms to Balance Exploration & Exploitation
The trade-off between exploration and exploitation is at the core of any n-armed bandit algorithm.
Context-free:
1. ε-greedy (a minimal sketch follows below)
2. UCB
Contextual:
1. epoch-greedy
2. LinUCB
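A minimal ε-greedy sketch in Python (function names are my own; the algorithm itself is standard):

```python
import random

def eps_greedy_choose(means, eps=0.1):
    """With probability eps explore a uniformly random arm,
    otherwise exploit the arm with the best empirical mean."""
    if random.random() < eps:
        return random.randrange(len(means))                # explore
    return max(range(len(means)), key=lambda a: means[a])  # exploit

def update_mean(counts, means, arm, reward):
    """Incrementally update the chosen arm's empirical mean reward."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```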
UCB (Upper Confidence Bound)
Algorithm: play each arm once, then always play the arm with the highest upper confidence index (rule below).
Result: the regret of UCB1 grows only logarithmically in the number of trials (Auer et al., 2002).
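The UCB1 selection rule of Auer, Cesa-Bianchi & Fischer (2002), presumably what the slide's figure showed:
\[
a_t \;=\; \arg\max_{a}\;\Big(\bar{x}_a + \sqrt{\tfrac{2\ln t}{n_a}}\Big),
\]
where $\bar{x}_a$ is the empirical mean reward of arm $a$ and $n_a$ is the number of times $a$ has been played so far.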
UCB Example
After t = 10 trials:

Action  Mean reward  Plays
A       1.2          2
B       2.4          2
C       3.1          1
D       3.9          5

[Bar chart: mean reward plus confidence bonus for Actions A-D]
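Plugging the table into the UCB1 rule with t = 10 (my arithmetic, assuming the natural logarithm; the slide shows the result only as a bar chart):

UCB(A) = 1.2 + sqrt(2 ln 10 / 2) ≈ 1.2 + 1.52 = 2.72
UCB(B) = 2.4 + sqrt(2 ln 10 / 2) ≈ 2.4 + 1.52 = 3.92
UCB(C) = 3.1 + sqrt(2 ln 10 / 1) ≈ 3.1 + 2.15 = 5.25
UCB(D) = 3.9 + sqrt(2 ln 10 / 5) ≈ 3.9 + 0.96 = 4.86

Action C is played next: despite D's higher empirical mean, C's single play leaves it with the widest confidence interval.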
LinUCB (Disjoint Linear Models) I
Algorithm: for each action, fit a ridge regression of observed rewards on contexts, and score actions by an upper confidence bound on the predicted reward (equations below).
[Plot: per-action linear reward estimate over the context, with confidence band]
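The disjoint LinUCB quantities, as defined in Li et al. (2010). For each arm $a$, let $D_a$ be the matrix whose rows are the contexts observed when $a$ was chosen, and $c_a$ the corresponding rewards:
\[
A_a = D_a^\top D_a + I_d, \qquad b_a = D_a^\top c_a, \qquad \hat{\theta}_a = A_a^{-1} b_a,
\]
\[
a_t = \arg\max_{a \in A_t}\Big( x_{t,a}^\top \hat{\theta}_a + \alpha \sqrt{x_{t,a}^\top A_a^{-1} x_{t,a}} \Big).
\]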
LinUCB (Disjoint Linear Models) II
Now for the fixed context:
[Bar chart: estimated reward and confidence bonus for Actions A-D at the fixed context]
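A compact NumPy sketch of the disjoint model (class and method names are my own; choose/update follow the equations above):

```python
import numpy as np

class DisjointLinUCB:
    """Disjoint LinUCB (Li et al., 2010): one ridge-regression model per arm."""

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]    # A_a = I_d + D_a^T D_a
        self.b = [np.zeros(d) for _ in range(n_arms)]  # b_a = D_a^T c_a

    def choose(self, contexts):
        """contexts: one feature vector x_{t,a} per arm; returns the arm
        with the highest upper confidence bound p_{t,a}."""
        scores = []
        for A, b, x in zip(self.A, self.b, contexts):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                          # ridge estimate theta_a
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's model."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```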
LinUCB (Hybrid Linear Models)
Algorithm: in addition to the arm-specific coefficients, hybrid models include shared features whose coefficients are common to all arms (model below).
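The hybrid payoff model from Li et al. (2010):
\[
\mathbb{E}[r_{t,a} \mid x_{t,a}, z_{t,a}] = z_{t,a}^\top \beta^{*} + x_{t,a}^\top \theta_a^{*},
\]
where $z_{t,a}$ encodes shared user-article interaction features with coefficients $\beta^{*}$ common to all arms, and $\theta_a^{*}$ is arm-specific. The upper confidence bound is computed analogously to the disjoint case, with extra bookkeeping for the shared part.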
Evaluation
Problem: the algorithm is interactive; the actions it chooses determine the data it gets to observe (algorithm <-> user feedback loop).
Candidate solutions:
- Testing on live data: too expensive
- Testing offline on logged data: the log was produced by a different logging policy
- Simulator-based approach: biased
Unbiased Evaluation based on Logged Data I
Assumption: the logged events were produced by a policy that picked arms uniformly at random, and events are drawn i.i.d.
Algorithm: replay the log against the bandit algorithm, keeping exactly those events on which both agree (sketch below).
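A sketch of the replay evaluator under these assumptions, reusing the choose/update interface of the DisjointLinUCB sketch above (the function name is my own):

```python
def replay_evaluate(policy, logged_events):
    """Offline replay evaluation: stream logged (contexts, arm, reward) events
    recorded under a uniformly random logging policy; keep an event only if
    the candidate policy picks the same arm, then update the policy with it."""
    matched, total_payoff = 0, 0.0
    for contexts, logged_arm, reward in logged_events:
        if policy.choose(contexts) == logged_arm:      # event matches: retain
            policy.update(logged_arm, contexts[logged_arm], reward)
            matched += 1
            total_payoff += reward
        # non-matching events are discarded; under the uniform-logging
        # assumption the retained subsample is unbiased
    return total_payoff / max(matched, 1)              # estimated per-trial CTR
```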
Assumptions revisited
Are these assumptions fulfilled for online news recommendation?
- Independence of events
- Identical distribution
- Infinite (in practice: sufficiently long) stream of logged events
Empirical Results
Scenario:
- 4.7M events (featured article, user/article information, click) in the tuning set
- 36M events in the test set
- Articles and users clustered into 5 categories
- Two five-dimensional feature vectors (user and article)
Results:
[Bar charts: relative CTR of ε-greedy, UCB, LinUCB (disjoint), LinUCB (hybrid)]
Questions?
Backup: LinUCB Example I
Assumptions: 2 users, 3 articles, 2 genres
Trial history:

Action  User  Genre  Click
A       1     1      0
B       1     1      1
A       1     1      0
C       2     2      1
C       2     2      1
A       2     1      0
C       1     2      1
Backup: LinUCB Example II
New trial: user 1 visits the page. Which article do we show?

Action  Context x  Estimate  UCB score
A       (1, 1)     0         1.18
B       (1, 1)     0.14      0.89
C       (1, 2)     0.83      1.55

Article C has the highest upper confidence bound and is therefore shown.