A Recommendation Engine Exploiting Collective Intelligence on Big Data
Luigi Giuri, Executive Chairman, luigi.giuri@reco4.com
Alessandro Negro, CTO, alessandro.negro@reco4.com

Outline
- Introduction to recommendations
- Recommender concepts
- Recommenders in action
- Reco4 Recommendation Engine

The questions
- Which digital camera should I buy?
- Which movie should I rent?
- Which web sites will I find interesting?
- What is the best holiday for me and my family?
- Which book should I buy for my next vacation?

How to decide?
- Conversations with friends
- Obtaining information from a trusted third party
- Consulting the Internet
- Making a gut decision
- Simply following the crowd

A software solution
The software system that determines which items should be shown to a particular user is a recommender system.

Types of recommenders
- Collaborative filtering
- Content-based
- Knowledge-based
- Hybrid

Collaborative filtering
Core assumption: if users shared the same interests in the past, they are likely to have similar tastes in the future.

CF analysis
- How do we find users with similar tastes?
- How do we measure similarity?
- Are there other techniques besides looking for similarity?

CF approaches
- Memory-based: k-nearest neighbors
  - User-based
  - Item-based
- Model-based
- Latent factor models
  - Matrix factorization
- Association rule mining

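Memory-based approaches depend on a similarity measure between users (or items). The sketch below shows three commonly used measures (cosine, Euclidean and Tanimoto, which the algorithms roadmap later names) in plain Python. The rating dictionaries and function names are hypothetical and chosen only for illustration; this is not the Reco4 implementation.

```python
import math

# Hypothetical rating profiles (item -> rating), for illustration only.
louis = {"Iron Man": 5, "Star Wars": 4, "Notting Hill": 1}
mary = {"Iron Man": 2, "Notting Hill": 5, "Wall-E": 4}


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles, over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(a, b):
    """Euclidean distance over co-rated items, mapped into (0, 1]."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def tanimoto_similarity(a, b):
    """Overlap of the rated item sets; rating values are ignored."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


if __name__ == "__main__":
    for measure in (cosine_similarity, euclidean_similarity, tanimoto_similarity):
        print(measure.__name__, round(measure(louis, mary), 3))
```

Tanimoto only looks at which items were rated, so it suits binary (one-class) feedback, while cosine and Euclidean use the rating values themselves.
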
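User-Item Dataset
Items: Iron Man, Notting Hill, Star Wars, Life is Beautiful, Ben Hur, Wall-E, Cars, Tron
Known ratings per user (the matrix is sparse, so most cells are empty):
- Louis: 5, 1, 2
- Mary: 2, 3, 5, 4
- Matt: 4, 2, 3
- Andy: 1, 2, 3, 3
- Alex: 5, 4, 3
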
Ok, but
- We want to transform the sparse matrix into a complete matrix
- We do this by predicting the values of the empty cells
- That is, the algorithms perform rating prediction

And then...
After predicting the user's rating for every candidate item, we recommend the n items with the highest predicted ratings.

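The two steps above (predict the empty cells, then recommend the top n) can be sketched with a user-based neighborhood method. This is a minimal illustration under stated assumptions: the ratings are hypothetical rather than the values from the dataset slide, similarity is Pearson correlation over co-rated items, and predictions are mean-centered weighted averages over the k most similar users. Other choices are equally possible.

```python
import math

# Hypothetical ratings (user -> {item: rating}); NOT the values from the slide.
RATINGS = {
    "Louis": {"Iron Man": 5, "Notting Hill": 1, "Star Wars": 4},
    "Matt": {"Iron Man": 4, "Notting Hill": 2, "Star Wars": 5, "Tron": 5, "Wall-E": 3},
    "Mary": {"Iron Man": 2, "Notting Hill": 5, "Star Wars": 1, "Wall-E": 4},
    "Andy": {"Iron Man": 5, "Notting Hill": 2, "Star Wars": 4, "Tron": 4},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def predict(ratings, user, item, k=2):
    """Predict one empty cell from the k most similar users who rated the item."""
    user_mean = mean(ratings[user].values())
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:  # keep only positively correlated users
            neighbours.append((sim, other))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return user_mean  # fall back to the user's own average
    num = sum(sim * (ratings[o][item] - mean(ratings[o].values())) for sim, o in top)
    return user_mean + num / sum(sim for sim, _ in top)


def recommend(ratings, user, n=2, k=2):
    """Return the n unrated items with the highest predicted rating."""
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    candidates = all_items - set(ratings[user])
    scored = [(item, predict(ratings, user, item, k)) for item in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:n]


if __name__ == "__main__":
    for item, score in recommend(RATINGS, "Louis"):
        print(item, round(score, 2))
```

Mean-centering compensates for users who rate systematically high or low; dropping it gives a simpler but usually less accurate weighted average.
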
CF pros and cons
Pros
- No item knowledge needed
- Very simple data structure
- Scalable
- Many algorithms are available
Cons
- Cold-start problem
- Over-specialization
- Recommendations are difficult to explain
- Requires large user groups

Content-based Recommenders
Content-based recommenders leverage structured and unstructured data sources to extract item descriptions and to build a user profile that assigns importance to these characteristics.

CB analysis
- How can systems automatically acquire and improve user profiles?
- What techniques can be used to extract the item descriptions?
- How do we determine which items match a user's interests?

Data source

| Title | Genre | Author | Type | Keywords | Text |
|---|---|---|---|---|---|
| The Night of the Gun | Memoir | David Carr | Paperback | press and journalism, drug addiction, personal memoirs, New York | Nullam non rhoncus nisl, vitae tempor ligula. Nullam vitae faucibus ex. Suspendisse euismod, dui in auctor ornare. |
| The Lace Reader | Fiction, Mystery | Brunonia Barry | Hardcover | American contemporary fiction, detective, historical | Morbi vestibulum ligula vitae augue egestas, in volutpat lorem efficitur. Proin porta erat non ex sagittis, vel sagittis nisl |
| Into the Fire | Romance, Suspense | Suzanne Brockmann | Hardcover | American fiction, murder, neo-Nazism | Duis sed laoreet purus. Donec iaculis aliquam justo, a commodo turpis efficitur sit amet. Morbi in nunc euismod, iaculis |

CB approaches
- Similarity-based retrieval
  - k-NN
  - Relevance feedback
- Other text classification methods
  - Probabilistic
  - Linear classifiers and machine learning
  - Explicit decision models

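Item-Feature Dataset

| Title | Genre: Memoir | Genre: Romance | Type: Paperback | Type: Hardcover | Keywords: New York | Keywords: detective |
|---|---|---|---|---|---|---|
| The Night of the Gun | 1 | | 1 | | 1 | |
| The Lace Reader | | | | 1 | | 1 |
| Into the Fire | | 1 | | 1 | | |
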
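An item-feature matrix like the one above is enough for a simple similarity-based retrieval. The sketch below builds a user profile by summing the feature vectors of liked items and ranks the remaining items by cosine similarity to that profile. The feature assignments are an assumed reading of the data-source slide, and the profile-building rule and function names are illustrative choices, not the method used by Reco4.

```python
import math

FEATURES = ["Memoir", "Romance", "Paperback", "Hardcover", "New York", "detective"]

# Assumed reading of the item-feature slide: the set of features present per item.
ITEMS = {
    "The Night of the Gun": {"Memoir", "Paperback", "New York"},
    "The Lace Reader": {"Hardcover", "detective"},
    "Into the Fire": {"Romance", "Hardcover"},
}


def vectorize(feature_set):
    """Binary feature vector in the fixed FEATURES order."""
    return [1.0 if feature in feature_set else 0.0 for feature in FEATURES]


def build_profile(liked_titles):
    """User profile = sum of the feature vectors of the liked items."""
    profile = [0.0] * len(FEATURES)
    for title in liked_titles:
        for index, value in enumerate(vectorize(ITEMS[title])):
            profile[index] += value
    return profile


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_unseen(liked_titles):
    """Score every item the user has not liked yet against the profile."""
    profile = build_profile(liked_titles)
    scored = [
        (title, cosine(profile, vectorize(features)))
        for title, features in ITEMS.items()
        if title not in liked_titles
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for title, score in rank_unseen(["The Lace Reader"]):
        print(title, round(score, 2))
```

Because no other users are involved, a newly added item can be scored as soon as its features are known, which is the "new items can be immediately recommended" advantage listed for content-based recommenders.
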
CB pros and cons
Pros
- Does not require large user groups
- New items can be immediately recommended
- Low cost for knowledge acquisition and maintenance
Cons
- Acquisition of subjective, qualitative item features
- User preference elicitation
- New users
- Tendency to overfit the training data

Knowledge-based recommenders
Knowledge-based recommenders use structured quality features to derive means-end information about both the current user and the available items.

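Data source

| id | price ($) | mpix | opt-zoom | LCD-size | movies | sound | waterproof |
|---|---|---|---|---|---|---|---|
| P1 | 148 | 8.0 | 4x | 2.5 | No | No | Yes |
| P2 | 182 | 8.0 | 5x | 2.7 | Yes | Yes | No |
| P3 | 189 | 8.0 | 10x | 2.5 | Yes | Yes | No |
| P4 | 196 | 10.0 | 12x | 2.7 | Yes | No | Yes |
| P5 | 151 | 7.1 | 3x | 3.0 | Yes | Yes | No |
| P6 | 199 | 9.0 | 3x | 3.0 | Yes | Yes | No |
| P7 | 259 | 10.0 | 3x | 3.0 | Yes | Yes | No |
| P8 | 278 | 9.1 | 10x | 3.0 | Yes | Yes | No |
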
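With a structured catalog like this one, a knowledge-based recommender can be approximated by constraint filtering: the user states requirements and only the items that satisfy all of them are returned. The sketch below uses the camera data from the slide; the requirement format (attribute name mapped to a predicate) and the choice to order results by price are assumptions made for illustration.

```python
# Camera catalog transcribed from the slide (Yes/No mapped to True/False).
CAMERAS = [
    {"id": "P1", "price": 148, "mpix": 8.0, "opt_zoom": 4, "lcd_size": 2.5,
     "movies": False, "sound": False, "waterproof": True},
    {"id": "P2", "price": 182, "mpix": 8.0, "opt_zoom": 5, "lcd_size": 2.7,
     "movies": True, "sound": True, "waterproof": False},
    {"id": "P3", "price": 189, "mpix": 8.0, "opt_zoom": 10, "lcd_size": 2.5,
     "movies": True, "sound": True, "waterproof": False},
    {"id": "P4", "price": 196, "mpix": 10.0, "opt_zoom": 12, "lcd_size": 2.7,
     "movies": True, "sound": False, "waterproof": True},
    {"id": "P5", "price": 151, "mpix": 7.1, "opt_zoom": 3, "lcd_size": 3.0,
     "movies": True, "sound": True, "waterproof": False},
    {"id": "P6", "price": 199, "mpix": 9.0, "opt_zoom": 3, "lcd_size": 3.0,
     "movies": True, "sound": True, "waterproof": False},
    {"id": "P7", "price": 259, "mpix": 10.0, "opt_zoom": 3, "lcd_size": 3.0,
     "movies": True, "sound": True, "waterproof": False},
    {"id": "P8", "price": 278, "mpix": 9.1, "opt_zoom": 10, "lcd_size": 3.0,
     "movies": True, "sound": True, "waterproof": False},
]


def satisfies(camera, requirements):
    """True when the camera meets every stated requirement."""
    return all(check(camera[attr]) for attr, check in requirements.items())


def recommend(cameras, requirements):
    """Filter by the requirements, then order by price (an assumed tie-breaker)."""
    matches = [camera for camera in cameras if satisfies(camera, requirements)]
    return sorted(matches, key=lambda camera: camera["price"])


if __name__ == "__main__":
    # Hypothetical user requirements gathered through a dialogue.
    wanted = {
        "price": lambda value: value <= 200,
        "movies": lambda value: value is True,
        "lcd_size": lambda value: value >= 3.0,
    }
    print([camera["id"] for camera in recommend(CAMERAS, wanted)])
```

No rating history is needed, which is why this style suits rarely purchased goods; the cost is that the user has to state requirements explicitly.
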
KB pros and cons
Pros
- Applicable to low-purchase-rate scenarios (cars, computers, houses)
- No ramp-up problem
- Conversational recommender
Cons
- Domain specific
- Knowledge acquisition
- High user interaction required

Hybrid recommenders
Hybrid recommenders combine different techniques to generate better or more precise recommendations.

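One simple way to combine techniques is a weighted hybrid, in which each recommender scores the items and the scores are blended with fixed weights. The sketch below assumes that every recommender outputs scores already normalized to the range 0 to 1; the recommender names, scores and weights are hypothetical, and other hybridization strategies exist.

```python
def weighted_hybrid(scores_by_recommender, weights):
    """Blend per-recommender item scores with fixed weights.

    An item missing from a recommender's output contributes 0 for that recommender.
    """
    combined = {}
    for name, scores in scores_by_recommender.items():
        weight = weights[name]
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical normalized scores from two recommenders.
    scores = {
        "collaborative": {"A": 0.9, "B": 0.2},
        "content_based": {"B": 0.8, "C": 0.6},
    }
    weights = {"collaborative": 0.6, "content_based": 0.4}
    for item, score in weighted_hybrid(scores, weights):
        print(item, round(score, 2))
```

Item C is reachable only through the content-based scores, which shows how a hybrid can cover items that one technique alone would miss.
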
Recommender systems
- A system that can recommend or present items to the user based on the user's interests and interactions
- One of the best ways to provide a personalized customer experience
- Built by exploiting collective intelligence or a content-based approach to perform predictions
- Examples: Amazon, YouTube, Netflix, Yahoo, Tripadvisor, Last.fm, IMDb

Why recommender systems
- Standard uses:
  - Increase the number of items sold
  - Sell more diverse items
  - Increase user satisfaction
  - Increase user fidelity
  - Better understand what the user wants
- Advanced uses:
  - Create ad hoc campaigns (per geographic area, per type of user)
  - Optimize product distribution over a wide area for large retail chains
  - Optimize target filtering for marketing campaigns

Problem
- There are no available software products for state-of-the-art recommender systems
- There is no "best solution"
- There is no "one solution fits all"
- The Netflix Prize winner combined 104 different algorithms
- A high-end recommender engine can be built only through expensive custom projects
- Large-scale, multi-source datasets require a big data approach

Solution: Reco4 Recommender Engine
A graph-based recommender engine

Reco4 main goals
- Implement state-of-the-art recommendation techniques on top of a graph model
- Provide a complete framework
- Offer an easy-to-use dashboard/console
- Provide software/cloud services/consultancy

Reco4 features
- Core
  - Based on multiple approaches (collaborative filtering, content-based, ...)
  - Autonomous and self-learning recommender configuration
  - Persistent and updatable models (multiple models supported)
  - Real-time and batch modes of operation
- Algorithms
  - Commercial and research-oriented algorithms
  - Context-aware recommendations
  - Social recommendations
- Operations
  - Cluster- and cloud-ready for Big Data analysis
  - Tested on Oracle Big Data Appliance and Amazon Web Services
  - Integrated into Oracle Marketing Cloud (Eloqua)

Advantages of a graph database
- NoSQL database to handle Big Data
- Extensibility
- Not an aggregate-oriented database
- Minimal information needed
- Natural way of representing connections:
  - User-to-Item
  - Item-to-Item
  - User-to-User
  - Item-to-Features
- Graph-based/social algorithms
- Graph partitioning (sharding)
- Performance

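To illustrate why a graph is a natural fit for these connections, the sketch below stores user-to-item edges in a plain in-memory adjacency map and derives item-to-item relatedness by counting the two-step paths item, user, item. It is a stand-in written for illustration: the edge data is hypothetical, and neither the data structure nor the function names correspond to Reco4 or to any graph database API.

```python
from collections import defaultdict

# Hypothetical user-to-item edges ("user rated item").
EDGES = [
    ("Louis", "Iron Man"), ("Louis", "Star Wars"),
    ("Matt", "Iron Man"), ("Matt", "Star Wars"), ("Matt", "Tron"),
    ("Mary", "Iron Man"), ("Mary", "Notting Hill"),
]


def build_graph(edges):
    """Undirected bipartite graph; nodes are ("user", name) or ("item", name)."""
    graph = defaultdict(set)
    for user, item in edges:
        graph[("user", user)].add(("item", item))
        graph[("item", item)].add(("user", user))
    return dict(graph)


def related_items(graph, item, n=3):
    """Rank other items by the number of item-user-item paths from `item`."""
    start = ("item", item)
    path_counts = defaultdict(int)
    for user_node in graph.get(start, ()):
        for item_node in graph.get(user_node, ()):
            if item_node != start:
                path_counts[item_node[1]] += 1
    return sorted(path_counts.items(), key=lambda pair: (-pair[1], pair[0]))[:n]


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(related_items(graph, "Iron Man"))
```

Adding item-to-feature or user-to-user edges only means adding more node kinds to the same structure, which is the extensibility point made in the list above.
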
Reco4 architecture stack

Product evolution along 3 directions
- Algos: Model Based, Memory Based, Content Based, Context Awareness, Social Network, Association Rules, Composition
- Tools: GraphDBs, Apache Storm, Apache Hadoop, Distributed Cache, Grizzly
- Optimization/RealTime: SVD, PCA, Clustering, Map/Reduce, Sampling

Algorithms roadmap
Collaborative filtering:
- Memory based (Neighborhood)
  - User/Item based
    - Several distance algorithms (Cosine, Euclidean, Tanimoto, etc.)
  - Graph based
    - Path Based Similarity (Shortest Path, Number of Paths)
    - Random Walk Similarity (Item Rank, Average first-passage/commute time)
- Model based
- Latent factor
  - Stochastic gradient descent
  - Alternating least squares
  - SVD++ (by Koren)
- Association Rule Mining

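As an illustration of the latent factor entry, the sketch below trains a small matrix factorization model with stochastic gradient descent in plain Python. It is a minimal version under stated assumptions: predictions are the global mean plus the dot product of a user vector and an item vector, the loss is squared error with L2 regularization, and the ratings and hyperparameters are hypothetical. It does not include the bias terms or implicit feedback that SVD++ adds.

```python
import math
import random

# Hypothetical (user, item, rating) triples, for illustration only.
RATINGS = [
    ("Louis", "Iron Man", 5), ("Louis", "Notting Hill", 1), ("Louis", "Star Wars", 4),
    ("Mary", "Iron Man", 2), ("Mary", "Notting Hill", 5), ("Mary", "Wall-E", 4),
    ("Matt", "Iron Man", 4), ("Matt", "Star Wars", 5), ("Matt", "Tron", 5),
    ("Andy", "Notting Hill", 4), ("Andy", "Wall-E", 5), ("Andy", "Tron", 2),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factor vectors with SGD on the observed ratings."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    samples = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            err = r - (mu + dot(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, p, q


def predict(model, user, item):
    mu, p, q = model
    return mu + dot(p[user], q[item])


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    squared = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


if __name__ == "__main__":
    model = train_mf(RATINGS)
    print("training RMSE:", round(rmse(model, RATINGS), 3))
    print("Louis / Tron:", round(predict(model, "Louis", "Tron"), 2))
```

Alternating least squares optimizes the same kind of objective but fixes one set of factors while solving for the other, which makes it easier to parallelize than SGD.
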
Algorithms roadmap (cont'd)
Content based:
- Latent Semantic Indexing
- Ontology Based Analysis
Social recommendation:
- Trust based approach
- Probabilistic approach

Algorithms roadmap (cont'd)
Cross-cutting features (all algos)
- Binary (One class)
- Context awareness (pre- and post-filtering)
- Composability
- Real time
- Parallelization

Recommendation model

Reco4 Hadoop
- Based on Hadoop 2.x
- Leverages the YARN resource manager
- Tested on a local installation and on Amazon Elastic MapReduce
- Tested on Oracle Big Data Appliance (VM)
- Exports the graph to HDFS and imports results from HDFS into the graph
- Algorithms:
  - K-nearest neighbors
  - Association rule mining
  - K-means clustering

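Of the algorithms listed above, association rule mining is easy to illustrate on a single machine. The sketch below counts item pairs across transactions and keeps the rules "lhs implies rhs" that pass minimum support and confidence thresholds. It is deliberately limited to pairs and runs in memory; it is not a Hadoop or MapReduce job, and the baskets, thresholds and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, for illustration only.
BASKETS = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "case"],
    ["memory card", "case"],
    ["camera", "memory card", "case"],
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    total = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda rule: (-rule[3], -rule[2], rule[0], rule[1]))


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

The counting step is what a distributed version would spread across workers: each worker counts items and pairs in its share of the baskets, and the partial counts are summed before the thresholds are applied.
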
Reco4 in the cloud
- Recommendation as a service (RaaS)
- Reco4 cloud infrastructure will offer:
  - Pay as you need
  - Pay as you grow
  - Support for bursts
  - Periodic analysis at lower cost
  - Testing and evaluating several algorithms on a reduced dataset
  - Dynamic composition of algorithms
  - Hadoop support

Reco4 Console: Dashboard
Reco4 Console
Reco4 Console: Jobs
Reco4 Console: Graph Visualization

Current use cases
- Intelligent Performance Marketing
- Marketing campaign optimization
- Intelligent lead scoring
- Oracle Marketing Cloud (Eloqua) seamless integration
- Anti-Money Laundering
- User-generated video content
- Gaming/Gambling industry

Reco4 and the Oracle stack
Diagram labels: Transactional Data, Prediction Data, Analytical Data, Feedback; OLTP Environments, Reco4 Processing Environment, Marketing Campaign Management

Thank you
Luigi Giuri, luigi.giuri@reco4.com
Alessandro Negro, alessandro.negro@reco4.com
http://www.reco4.com
Twitter: @reco4j

References:
- D. Jannach et al., Recommender Systems: An Introduction, Cambridge University Press
- F. Ricci et al., Recommender Systems Handbook, Springer