The Need for Training in Big Data: Experiences and Case Studies

Similar documents
Collaborative Filtering. Radek Pelánek

Big Data and Analytics: Challenges and Opportunities

Factorization Machines

Big Data & Netflix. Paul Ellwood February 9th, 2015

COMP9321 Web Application Engineering

Content-Based Recommendation

AdTheorent s. The Intelligent Solution for Real-time Predictive Technology in Mobile Advertising. The Intelligent Impression TM

RECOMMENDATION SYSTEM

High Productivity Data Processing Analytics Methods with Applications

The Nanotechnology in ecommerce Tao BDTC2014

Statistics for BIG data

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Challenges and Opportunities in Data Mining: Personalization

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

IPTV Recommender Systems. Paolo Cremonesi

Machine Learning using MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce

Big Data Analytics Process & Building Blocks

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Liktap 2012 Search Engine

Big Data Analytics. Lucas Rego Drumond

Rating Prediction with Informative Ensemble of Multi-Resolution Dynamic Models

Intelligent Web Techniques Web Personalization

Hadoop Parallel Data Processing

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Mobile Real-Time Bidding and Predictive

Similarity Search in a Very Large Scale Using Hadoop and HBase

Analysis of Cloud Solutions for Asset Management

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Making Sense of the Mayhem: Machine Learning and March Madness

Response prediction using collaborative filtering with hierarchies and side-information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Mammoth Scale Machine Learning!

Data Mining in the Swamp

Azure Machine Learning, SQL Data Mining and R

A NOVEL RESEARCH PAPER RECOMMENDATION SYSTEM

itesla Project Innovative Tools for Electrical System Security within Large Areas

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

The Artificial Prediction Market

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

Performance Characterization of Game Recommendation Algorithms on Online Social Network Sites

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009

A Study on Security and Privacy in Big Data Processing

A Professional Big Data Master s Program to train Computational Specialists

INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10

Experiments in Web Page Classification for Semantic Web

Big Data Technology Recommendation Challenges in Web Media Sites

Applying Data Science to Sales Pipelines for Fun and Profit

Hybrid model rating prediction with Linked Open Data for Recommender Systems

Big Data Analytics Building Blocks; Simple Data Storage (SQLite)

Big Data Analytics Building Blocks; Simple Data Storage (SQLite)

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

On the Effectiveness of Obfuscation Techniques in Online Social Networks

arxiv: v1 [cs.ir] 12 Jun 2015

Concept and Project Objectives

BIG DATA: BIG BOOST TO BIG TECH

Attend Part 1 (2-3pm) to get 1 point extra credit. Polo will announce on Piazza options for DL students.

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Advanced Big Data Analytics with R and Hadoop

ISSN: CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

How To Create A Web Alert For A User To Find Interesting Results From A Past Search History On A Web Page

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify

Journée Thématique Big Data 13/03/2015

Profit from Big Data flow. Hospital Revenue Leakage: Minimizing missing charges in hospital systems

Introduction to Hadoop

Collaborative Filtering Scalable Data Analysis Algorithms Claudia Lehmann, Andrina Mascher

Hadoop and Map-Reduce. Swati Gore

MACHINE LEARNING & INTRUSION DETECTION: HYPE OR REALITY?

Machine Learning Capacity and Performance Analysis and R

How To Improve Your Profit With Optimized Prediction

Big Graph Processing: Some Background

Classification of Bad Accounts in Credit Card Industry

Big Data Explained. An introduction to Big Data Science.

Steven C.H. Hoi. School of Computer Engineering Nanyang Technological University Singapore

Big Data Analytics CSCI 4030

Big & Personal: data and models behind Netflix recommendations

Big Data Big Deal? Salford Systems

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Big Data Analytics Building Blocks. Simple Data Storage (SQLite)

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Hospital Billing Optimizer: Advanced Analytics Solution to Minimize Hospital Systems Revenue Leakage

A Brief Outline on Bigdata Hadoop

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

Scaling Up HBase, Hive, Pegasus

Invited Applications Paper

Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG

Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek

OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP

Transcription:

The Need for Training in Big Data: Experiences and Case Studies. Guy Lebanon, Amazon.

Background and Disclaimer. All opinions are mine; other perspectives are legitimate. Based on my experience as: a professor at Purdue, a professor at Georgia Tech, a part-time scientist at Yahoo, a visiting scientist at Google, and a manager of scientists at Amazon.

Extracting Meaning from Big Data - Skills. Extracting meaning from big data requires skills in: computing and software engineering; machine learning, statistics, and optimization; and product sense and careful experimentation. All three areas are important, and it is rare to find people with skills in all three. Companies are fighting tooth and nail over the few people who have them.

Case Study: Recommendation Systems. Recommendation systems technology is essential to many companies: Netflix (recommending movies), Amazon (recommending products), Pandora (recommending music), Facebook (recommending friends and posts in the stream), and Google (recommending ads). Huge progress has been made, but we are still in the early stages. To overcome the many remaining challenges we need people with skills in computing, machine learning, and product sense.

Recommendation Systems: Historical Perspective. Ancient history: the 1990s. Given some pairs of (user, item) ratings, predict the ratings of other pairs. Similarity-based recommendation systems: R_{ui} = c_u + \sum_{u'} \alpha_{u,u'} R_{u'i}, where \alpha_{u,u'} expresses the similarity between users u and u'; or R_{ui} = c_i + \sum_{i'} \alpha_{i,i'} R_{ui'}, where \alpha_{i,i'} expresses the similarity between items i and i'.
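The user-based formula above can be sketched in code. This is an illustrative toy implementation, not from the talk: it takes \alpha_{u,u'} to be cosine similarity over co-rated items, c_u to be the user's mean rating, and predicts from mean-centered neighbor ratings.

```python
import numpy as np

def predict_user_based(R, user, item, n_neighbors=2):
    """Predict R[user, item] from the ratings of similar users.

    R is a dense ratings matrix with np.nan marking missing entries.
    """
    baseline = np.nanmean(R[user])            # c_u: the user's mean rating
    scores = []
    for other in range(R.shape[0]):
        if other == user or np.isnan(R[other, item]):
            continue
        co_rated = ~np.isnan(R[user]) & ~np.isnan(R[other])
        if not co_rated.any():
            continue
        a, b = R[user, co_rated], R[other, co_rated]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        scores.append((sim, R[other, item] - np.nanmean(R[other])))
    scores.sort(reverse=True)                 # most similar users first
    top = scores[:n_neighbors]
    if not top:
        return baseline
    num = sum(s * dev for s, dev in top)
    den = sum(abs(s) for s, _ in top) + 1e-12
    return baseline + num / den

R = np.array([[5, 4, np.nan],
              [4, 5, 3],
              [5, 5, 4]], dtype=float)
print(round(predict_user_based(R, 0, 2), 2))
```

Here the prediction for user 0 on item 2 is pulled below the user's 4.5 baseline because both neighbors rated item 2 below their own means.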

Historical Perspective. Middle Ages: the early 2000s, and the matrix completion perspective. Given M_{a_1,b_1}, \ldots, M_{a_m,b_m}, predict the unobserved entries of M \in \mathbb{R}^{n_1 \times n_2}, where M_{a,b} is the rating given by user a to item b. Many possible completions of M are consistent with the training data. Standard practice: favor low-rank ("simple") completions M = UV^T \in \mathbb{R}^{n_1 \times n_2} with U \in \mathbb{R}^{n_1 \times r}, V \in \mathbb{R}^{n_2 \times r}, r \ll \min(n_1, n_2), chosen as (U, V) = \arg\min_{U,V} \sum_{(a,b) \in A} ([UV^T]_{a,b} - M_{a,b})^2.

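One common way to attack the completion objective above is alternating least squares: fix V and solve a least-squares problem for each row of U over that row's observed entries, then swap roles. A toy sketch on a small synthetic matrix (the sizes, density, and sweep count are arbitrary illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 6, 5, 2

# Synthetic rank-r "true" matrix; observe a random subset A of its entries.
M = rng.normal(size=(n1, r)) @ rng.normal(size=(r, n2))
A = rng.random((n1, n2)) < 0.7            # boolean mask of observed entries

U = rng.normal(size=(n1, r))
V = rng.normal(size=(n2, r))

# Alternating least squares: each subproblem is an ordinary least squares fit.
for _ in range(30):
    for a in range(n1):
        obs = A[a]
        if obs.any():
            U[a] = np.linalg.lstsq(V[obs], M[a, obs], rcond=None)[0]
    for b in range(n2):
        obs = A[:, b]
        if obs.any():
            V[b] = np.linalg.lstsq(U[obs], M[obs, b], rcond=None)[0]

rmse = np.sqrt((((U @ V.T) - M)[A] ** 2).mean())
print(rmse)   # small: the observed entries are fit almost exactly
```

Because the synthetic data is exactly rank r, a few sweeps drive the error on the observed set close to zero; the interesting question, as the slide notes, is how well the unobserved entries are recovered.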

Part 1: Skills needed to construct a modern recommendation system in industry. Part 2: Skills needed to innovate on recommendation systems in an industry setting. Based on my experience working on recommendation systems at Google and Amazon.

Constructing MF Models (ML). ML challenge: non-linear optimization over a high-dimensional vector space with big data. Constructing modern matrix factorization models (in industry) requires: understanding non-linear optimization algorithms and how to implement them, including online methods such as stochastic gradient descent; understanding practical tips and tricks such as momentum and step-size selection; and understanding major issues in machine learning such as overfitting.

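The ingredients listed above (stochastic gradient descent over observed ratings, a fixed step size, momentum, and L2 regularization against overfitting) can be combined in a minimal sketch; all sizes and hyperparameters here are illustrative assumptions:

```python
import random
import numpy as np

random.seed(1)
rng = np.random.default_rng(1)
n_users, n_items, r = 20, 15, 3

# Synthetic observed (user, item, rating) triples drawn from a rank-r model.
P = rng.normal(size=(n_users, r))
Q = rng.normal(size=(n_items, r))
triples = [(u, i, float(P[u] @ Q[i]))
           for u in range(n_users) for i in range(n_items)
           if rng.random() < 0.5]

U = 0.1 * rng.normal(size=(n_users, r))
V = 0.1 * rng.normal(size=(n_items, r))
mU, mV = np.zeros_like(U), np.zeros_like(V)
lr, beta, lam = 0.01, 0.9, 0.01      # step size, momentum, L2 regularization

for epoch in range(100):
    random.shuffle(triples)           # online: visit ratings in random order
    for u, i, y in triples:
        err = U[u] @ V[i] - y
        gU = err * V[i] + lam * U[u]      # gradient w.r.t. user factors
        gV = err * U[u] + lam * V[i]      # gradient w.r.t. item factors
        mU[u] = beta * mU[u] - lr * gU    # momentum accumulators
        mV[i] = beta * mV[i] - lr * gV
        U[u] = U[u] + mU[u]
        V[i] = V[i] + mV[i]

rmse = np.sqrt(np.mean([(U[u] @ V[i] - y) ** 2 for u, i, y in triples]))
print(rmse)
```

Each observed rating costs O(r) per update, which is what makes this family of methods attractive at industrial scale.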

Constructing MF Models (Computing). Computing challenge: getting data, processing big data at train time, maintaining the SLA at test time, and service-oriented architecture. Constructing modern matrix factorization models (in industry) requires: writing production code in C++ or Java; getting data by querying SQL/NoSQL databases; processing data with parallel and distributed computing for big data (e.g., MapReduce); software engineering practices such as version control, code documentation, build tools, unit tests, and integration tests; very efficient scoring at serve time (SLA); and constructing multiple software services that communicate with each other.

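The MapReduce pattern mentioned above can be illustrated with a toy single-process sketch that computes per-item mean ratings from raw log records; a real deployment would use a framework such as Hadoop, but the map/reduce decomposition is the same (the record format here is made up for illustration):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (item, (rating, 1)) for every (user, item, rating) record."""
    for user, item, rating in records:
        yield item, (rating, 1)

def reduce_phase(pairs):
    """Reduce: sum partial (rating_sum, count) per item, then average."""
    acc = defaultdict(lambda: [0.0, 0])
    for item, (s, c) in pairs:
        acc[item][0] += s
        acc[item][1] += c
    return {item: s / c for item, (s, c) in acc.items()}

logs = [("u1", "movieA", 5), ("u2", "movieA", 3), ("u1", "movieB", 4)]
print(reduce_phase(map_phase(logs)))  # → {'movieA': 4.0, 'movieB': 4.0}
```

The point of the decomposition is that the map phase is embarrassingly parallel over log shards and the reduce phase only needs records grouped by key.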

Constructing MF Models (Product). Product challenge: addressing (very important) details and putting together a meaningful and practical evaluation process (an A/B test). Design an online evaluation process that measures business goals (customer engagement, sales), can reach statistical significance within a reasonable time, and yet avoids breaking customer trust by testing poor models. Should we train on recent behavior or on all available history? Emphasize popular items or the long tail? How many items should be shown? Should some items be omitted (adult or otherwise controversial content)? How can the product be modified so that more training data is collected or solicited from users?

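Reaching statistical significance in an A/B test on a binary goal (e.g., purchase conversion) is commonly checked with a two-proportion z-test; a self-contained sketch, with conversion counts made up for illustration:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between arms."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, Phi(x) = (1 + erf(x/sqrt(2)))/2.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(round(z, 2), round(p, 4))
```

The sample sizes needed to detect a small lift are what make the tension on the slide real: long tests give significance but prolong customers' exposure to the weaker model.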

Innovation in Recommendation Systems. Difficulties in conducting academic research on practical recommendation systems: accuracy in offline score prediction is poorly correlated with industry metrics (user engagement, sales), and there is a lack of datasets that reflect the practical scenarios in industry (explicit vs. implicit data, context, content-based filtering vs. collaborative filtering). The biggest potential for impactful innovation lies in (a) understanding the data and (b) evaluation.

Historical Perspective. Industrial Revolution - the late 2000s: the Netflix Competition. Netflix released a dataset of anonymized (user, item) ratings with a one-million-dollar prize for the top performing team. The data was significantly larger than what was previously available, which introduced scalability considerations into the modeling process. The competition established a common evaluation methodology and a clear way of defining a winner, and gave a huge boost to the field in terms of the number of research papers and interest from the academic community. It ended in embarrassment when researchers from UT Austin de-anonymized the data by successfully joining it with IMDB records; Netflix took the data down and faced a lawsuit.


Evaluation. Standard evaluation methods based on rating prediction have very low correlation with what matters in industry: revenue and user engagement. Unfortunately, this is also true for evaluation methods based on NDCG or Precision@k. Traditional measures partition the data into train and test sets, train models on the train set, and measure precision or a related metric on the held-out test set. In reality, what really matters are the actions taken by users when presented with a specific recommendation.

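For reference, the Precision@k metric mentioned above is simple to state in code (a toy example; the recommended list and held-out set are made up):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the held-out set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

recs = ["A", "B", "C", "D", "E"]      # model's ranked recommendations
held_out = {"B", "E", "F"}            # items the user actually interacted with
print(precision_at_k(recs, held_out, k=5))  # → 0.4
```

Note how the metric silently encodes the assumption the slide criticizes: the held-out interactions are treated as if they were independent of whatever the user was actually shown.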

Evaluation. Traditional measures assume that the actions of users in the train and test sets are independent of their context, and that choices of what to view or buy are independent of the options displayed to the user. This is patently false when the number of recommended items is large; users make choices based on the small number of alternatives they consider at the moment. As a result, traditional measures based on a train/test partition do not correlate well with revenue or user engagement in an A/B test, and methods that perform well in A/B tests and in launch may not be the winning methods in studies that use traditional evaluation.


New Evaluation Methodology. Study correlations between existing evaluation methods and increased user engagement in A/B tests. New offline evaluation methods that take context into account (what was displayed to the user when they took their action) may have higher correlation with real-world success. Potential breakthrough: an efficient search in the space of model possibilities that maximizes A/B test performance (a kind of "human gradient descent"?). Challenges: what happens to statistical significance and p-values?

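The correlation study proposed above could start with a rank correlation such as Spearman's between offline metric improvements and observed A/B lift. A dependency-free sketch on made-up numbers (ties are not handled; a real study would use a proper statistics library):

```python
def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical numbers: offline metric gains vs. observed A/B engagement lift.
offline_gain = [0.01, 0.03, 0.02, 0.05]
ab_lift = [0.2, 0.1, 0.4, 0.3]
print(round(spearman(offline_gain, ab_lift), 3))  # → 0.0 on these numbers
```

A correlation near zero across many model variants would be direct evidence for the slide's claim that the offline metric is not measuring what matters.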

Datasets. Few public recommendation systems datasets are available, and the vast majority of academic research focuses on modeling these datasets. By far the most popular task is predicting five-star ratings on the Movielens/EachMovie/Netflix datasets. This has led to an obsession with (often imaginary) incremental gains in modeling these datasets.


Datasets. Predicting five-star ratings for the Netflix data has little relevance to real-world applications. In the real world we don't care about predicting ratings; we care about increasing user engagement and revenue. In the real world we know more about users and items than just a user ID and an item ID; recommendation systems can use additional information such as user profiles, addresses, etc. In the real world we often deal with implicit binary signals (impressions, purchases) rather than explicit five-star ratings: with explicit ratings the training data contains both positive and negative feedback, while with implicit signals all training data is positive. Public datasets are also typically much smaller than industry data, making it impossible to seriously explore scalability issues.


Implicit Ratings Treat positive signals (impressions, purchases) as a rating of 1 and all other signals as a rating of 0. Use ranked loss minimization over this signal (favor ranking all impressions/purchases above the absence of impressions/purchases). When the number of items is large, the negative signal formed by the absence of impressions/purchases is overwhelming in size. Solution: sample the items marking the absence of impressions/purchases and apply the loss function only to that sample. This assumes that items not viewed or bought were not deliberately avoided.
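The negative-sampling approach above can be sketched with a pairwise ranking loss in the style of BPR (Bayesian Personalized Ranking): for each observed positive, sample one unobserved item and push the positive's score above the sampled negative's. This is an illustrative sketch, not the slides' exact method; all dimensions, learning rates, and the synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 8
U = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
V = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

# Implicit positives: one synthetic (user, item) impression per user.
positives = [(u, rng.integers(n_items)) for u in range(n_users)]
pos_items = {u: {i} for u, i in positives}

def sample_triple():
    """Sample (user, observed positive item, sampled unobserved item)."""
    u, i = positives[rng.integers(len(positives))]
    j = rng.integers(n_items)
    while j in pos_items[u]:          # resample until the item is unobserved
        j = rng.integers(n_items)
    return u, i, j

lr, reg = 0.05, 0.01
for _ in range(10_000):
    u, i, j = sample_triple()
    diff = V[i] - V[j]
    x = U[u] @ diff                    # score margin: positive vs sampled negative
    g = 1.0 / (1.0 + np.exp(x))        # gradient weight of -log sigmoid(x)
    u_old = U[u].copy()                # cache before updating, for correct SGD
    U[u] += lr * (g * diff - reg * U[u])
    V[i] += lr * (g * u_old - reg * V[i])
    V[j] += lr * (-g * u_old - reg * V[j])
```

Because the loss only ever touches the sampled negatives, the overwhelming mass of unobserved (user, item) pairs never has to be materialized, which is the scalability point the slide makes.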
