Contextual-Bandit Approach to Recommendation Konstantin Knauf




Transcription:

Contextual-Bandit Approach to Recommendation
Konstantin Knauf, 22 January 2014
Prof. Ulf Brefeld, Knowledge Mining & Assessment

Agenda
  Problem Scenario
  Multi-armed Bandit Model for Online Recommendation
  Algorithms to Balance Exploration & Exploitation
  Evaluating Multi-armed Bandit Algorithms
  Empirical Results


Scenario: News Article Recommendation
Which articles to feature?
Challenges:
  Many new users and articles
  Changing relevance of articles
  Incorporation of content information
Goal: Quickly identify relevant news stories on a personal level


The Original Multi-Armed Bandit Problem
Which machine should I play to maximize my overall reward?
Exploration vs. Exploitation

Multi-armed Bandit Model for Online Recommendation
Contextual Bandit
Example: News Recommendation

Agenda
  Problem Scenario
  Multi-armed Bandit Model for Online Recommendation
  Algorithms to Balance Exploration & Exploitation
    Exploration vs. Exploitation
    UCB
    LinUCB
  Evaluating Multi-armed Bandit Algorithms
  Empirical Results

Algorithms to Balance Exploration & Exploitation
Trade-off between exploration and exploitation
n-bandit algorithms:
  Context-free: 1. ε-greedy 2. UCB
  Contextual: 1. epoch-greedy 2. LinUCB
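
As an illustration of the simplest context-free strategy listed above, here is a minimal ε-greedy sketch. The function name, the arm names, and the mean-reward values are hypothetical and not taken from the slides; the rule itself (explore uniformly with probability ε, otherwise play the arm with the best estimate) is the standard formulation.

```python
import random


def epsilon_greedy(mean_rewards, epsilon, rng=random):
    """Pick an arm: explore uniformly with probability epsilon, else exploit."""
    arms = list(mean_rewards)
    if rng.random() < epsilon:
        return rng.choice(arms)                # exploration: any arm
    return max(arms, key=mean_rewards.get)     # exploitation: best estimate


# Hypothetical estimated click rates, for illustration only.
means = {"A": 0.1, "B": 0.4, "C": 0.3}
rng = random.Random(0)
picks = [epsilon_greedy(means, 0.1, rng) for _ in range(1000)]
print(picks.count("B") / len(picks))           # share of pulls on the best arm
```

With ε = 0 the rule never explores, and with ε = 1 it ignores the estimates entirely; the interesting regime lies in between.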


UCB (Upper Confidence Bound)
Algorithm
Result

UCB Example
Action
A       1.2     2
B       2.4     2
C       3.1     1
D       3.9     5
                10
[The headers of the two numeric columns were lost in extraction; the last column sums to 10.]
[Figure: bar chart of Reward for Action A, Action B, Action C, Action D]
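
The formula on the UCB slide did not survive extraction, so the following sketch uses the common UCB1 rule (mean plus sqrt(2 ln t / n)) as a stand-in. It also assumes that the two numeric columns of the example are the estimated mean reward and the number of plays per action, and that 10 is the total number of plays; both readings are assumptions.

```python
import math


def ucb1_scores(means, counts):
    """UCB1 score per arm: estimated mean plus an exploration bonus."""
    total = sum(counts.values())               # total number of plays so far
    return {
        arm: means[arm] + math.sqrt(2.0 * math.log(total) / counts[arm])
        for arm in means
    }


# Values from the example slide, under the assumed column meaning.
means = {"A": 1.2, "B": 2.4, "C": 3.1, "D": 3.9}
counts = {"A": 2, "B": 2, "C": 1, "D": 5}

scores = ucb1_scores(means, counts)
best = max(scores, key=scores.get)
print(best, {arm: round(s, 2) for arm, s in scores.items()})
```

Under these assumptions action C receives the highest score although D has the highest mean, because C has been played only once and therefore carries the largest exploration bonus.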


LinUCB (Disjoint Linear Models) I
Algorithm
For each action
[Figure: Reward plotted against Context(!)]

LinUCB (Disjoint Linear Models) II
Now for the fixed context
[Figure: bar chart of Reward for Action A, Action B, Action C, Action D]
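
The algorithm text of the two slides above was lost, so here is a minimal sketch of disjoint LinUCB in its usual form: each action keeps its own ridge-regression model and is scored by the predicted reward plus alpha times a confidence width. The class and method names are hypothetical. To stay dependency-free, the sketch keeps the inverse design matrix per action and updates it with the Sherman-Morrison formula, which is an implementation choice rather than something the slides specify.

```python
import math


class DisjointLinUCB:
    def __init__(self, arms, dim, alpha):
        self.alpha = alpha
        # Per arm: inverse of A = I + sum(x x^T), and b = sum(reward * x).
        self.A_inv = {a: [[float(i == j) for j in range(dim)]
                          for i in range(dim)] for a in arms}
        self.b = {a: [0.0] * dim for a in arms}

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def estimate_and_width(self, arm, x):
        """Return (predicted reward, alpha * confidence width) for context x."""
        theta = self._matvec(self.A_inv[arm], self.b[arm])   # A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        Ax = self._matvec(self.A_inv[arm], x)
        width = self.alpha * math.sqrt(sum(xi * ai for xi, ai in zip(x, Ax)))
        return mean, width

    def select(self, contexts):
        """contexts maps each arm to its feature vector; pick the highest UCB."""
        return max(contexts,
                   key=lambda a: sum(self.estimate_and_width(a, contexts[a])))

    def update(self, arm, x, reward):
        """Rank-one update of A^-1 (Sherman-Morrison) and of b."""
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, Ax))
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Illustrative usage with made-up two-dimensional contexts.
bandit = DisjointLinUCB(arms=["A", "B"], dim=2, alpha=1.0)
for x in ([1.0, 1.0], [1.0, 1.0], [2.0, 1.0]):
    bandit.update("A", x, reward=0.0)
mean_a, width_a = bandit.estimate_and_width("A", [1.0, 1.0])
choice = bandit.select({"A": [1.0, 1.0], "B": [1.0, 1.0]})
print(mean_a, width_a, choice)
```

In this usage the untried action B is selected: both actions predict a reward of zero, but B has the wider confidence interval because no feedback has been observed for it yet.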

LinUCB (Hybrid Linear Models)
Algorithm


Evaluation
Interactive algorithm: the algorithm interacts with the user
Solution?
  Testing on live data: too expensive
  Testing offline: different logging policy
  Simulator-based approach: biased

Unbiased Evaluation based on Logged Data I
Assumption
Algorithm
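
The assumption and algorithm text of this slide was lost in extraction. The sketch below shows one common replay-style estimator for this setting and is offered as an assumption about what the slide covered: it presumes the logged actions were chosen uniformly at random and that events are independent and identically distributed. The function, the policy interface (select, update), and the sample log are hypothetical.

```python
def replay_evaluate(policy, logged_events):
    """Replay logged (context, arm, reward) events against a policy.

    An event counts only if the policy picks the same arm that was logged;
    all other events are skipped, because their reward is unknown.
    Returns (average reward over matched events, number of matched events).
    """
    total_reward, matched = 0.0, 0
    for context, logged_arm, reward in logged_events:
        if policy.select(context) == logged_arm:
            policy.update(context, logged_arm, reward)   # learn from the match
            total_reward += reward
            matched += 1
    return (total_reward / matched if matched else 0.0), matched


class AlwaysArm:
    """Trivial policy used only to exercise the evaluator."""

    def __init__(self, arm):
        self.arm = arm

    def select(self, context):
        return self.arm

    def update(self, context, arm, reward):
        pass                                             # nothing to learn


# Made-up log entries: ((user, genre), shown article, click).
log = [((1, 1), "A", 0), ((1, 2), "C", 1), ((2, 2), "C", 1), ((1, 1), "B", 1)]
print(replay_evaluate(AlwaysArm("C"), log))
```

Because only matching events are kept, the number of usable events shrinks with the number of actions, which is why this kind of evaluation needs a long log.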

Assumptions revisited
Assumption
Are those assumptions fulfilled for online news recommendation?
  Independence
  Identical distribution
  Infinite stream


Empirical Results
Scenario:
  4.7m events (featured article, info, click) in tuning set
  36m events in test set
  Articles and users clustered into 5 categories
  Two five-dimensional feature vectors
Results:
[Figure: two bar charts of relative CTR for Eps-Greedy, UCB, LinUCB (disjoint), and LinUCB (hybrid); the bar values did not survive extraction]

Questions?

Backup: LinUCB Example I
Assumptions: 2 users, 3 articles, 2 genres
Trial History
Action  User  Genre  Click
A       1     1      0
B       1     1      1
A       1     1      0
C       2     2      1
C       2     2      1
A       2     1      0
C       1     2      1

Backup: LinUCB Example II
New trial: User 1 visits the page. Which article do we show?
Action
A       (1,1)   0       1.18
B       (1,1)   0.14    0.89
C       (1,2)   0.83    1.55
[The column headers were lost in extraction; the pairs match the (user, genre) values of Example I.]