A Recommendation Engine Exploiting Collective Intelligence on Big Data




Luigi Giuri, Executive Chairman (luigi.giuri@reco4.com)
Alessandro Negro, CTO (alessandro.negro@reco4.com)

Outline
- Introduction to recommendations
- Recommender concepts
- Recommenders in action
- Reco4 Recommendation Engine

The questions
- Which digital camera should I buy?
- Which movie should I rent?
- Which web sites will I find interesting?
- What is the best holiday for me and my family?
- Which book should I buy for my next vacation?

How to decide?
- Conversations with friends
- Obtaining information from a trusted third party
- Consulting the Internet
- Making a gut decision
- Simply following the crowd

A software solution
A recommender system is the software system that determines which items should be shown to a particular user.


Types of recommenders
- Collaborative filtering
- Content-based
- Knowledge-based
- Hybrid

Collaborative filtering
Underlying assumption: if users shared the same interests in the past, they will also have similar tastes in the future.

CF analysis
- How do we find users with similar tastes?
- How do we measure similarity?
- Are there other techniques besides looking for similarity?

CF approaches
- Memory-based: k-nearest neighbors
  - User-based
  - Item-based
- Model-based
- Latent factor models
  - Matrix factorization
- Association rule mining

User-Item Dataset
Items: Iron Man, Notting Hill, Star Wars, Life is Beautiful, Ben Hur, Wall-E, Cars, Tron
Ratings (sparse: each user has rated only some of the items)
Louis: 5, 1, 2
Mary: 2, 3, 5, 4
Matt: 4, 2, 3
Andy: 1, 2, 3, 3
Alex: 5, 4, 3

OK, but
- We want to transform the sparse matrix into a complete matrix
- We do this by predicting the values of the empty cells
- That is, the algorithms perform rating prediction

And then...
After predicting the user's rating for every candidate item, we recommend the n items with the highest predicted ratings.
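The predict-then-rank step can be sketched with a user-based nearest-neighbour recommender. This is a minimal illustration, not the Reco4 implementation: the ratings below are hypothetical (they are not the cells of the slide's matrix), and the function names `cosine`, `predict`, and `recommend` are made up for the example.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale (user -> {item: rating}).
# Illustrative values only; they are not taken from the slide.
ratings = {
    "Louis": {"Iron Man": 5, "Notting Hill": 1, "Star Wars": 4},
    "Mary": {"Iron Man": 2, "Notting Hill": 5, "Life is Beautiful": 4},
    "Matt": {"Iron Man": 4, "Star Wars": 5, "Tron": 3},
    "Andy": {"Notting Hill": 4, "Life is Beautiful": 5, "Cars": 3},
}


def cosine(a, b):
    """Cosine similarity between two users; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated item."""
    neighbours = sorted(
        (
            (cosine(ratings[user], r), r[item])
            for other, r in ratings.items()
            if other != user and item in r
        ),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour: leave the cell empty
    return sum(sim * rating for sim, rating in neighbours) / total


def recommend(user, n=2):
    """Fill the user's empty cells, then return the n best predictions."""
    rated = ratings[user]
    candidates = sorted({i for r in ratings.values() for i in r} - set(rated))
    scored = [(i, predict(user, i)) for i in candidates]
    scored = [(i, p) for i, p in scored if p is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


top = recommend("Louis")
```

Items that no similar user has rated get no prediction at all, which is one face of the cold start problem.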

CF pros and cons
Pros
- No item knowledge needed
- Very simple data structure
- Scalable
- Many algorithms are available
Cons
- Cold start problem
- Over-specialization
- Recommendations are difficult to explain
- Requires large user groups

Content-based recommenders
Leverage structured and unstructured data sources to extract item descriptions, together with a user profile that assigns importance to these characteristics.

CB analysis
- How can systems automatically acquire and improve user profiles?
- What techniques can be used to extract the item descriptions?
- How do we determine which items match a user's interests?

Data source
Title | Genre | Author | Type | Keywords
The Night of the Gun | Memoir | David Carr | Paperback | press and journalism, drug addiction, personal memoirs, New York
The Lace Reader | Fiction, Mystery | Brunonia Barry | Hardcover | American contemporary fiction, detective, historical
Into the Fire | Romance, Suspense | Suzanne Brockmann | Hardcover | American fiction, murder, neo-Nazism
Each record also has a free-text field (Text), filled on the slide with lorem ipsum placeholder text.

CB approaches
- Similarity-based retrieval
  - k-NN
  - Relevance feedback
- Other text classification methods
  - Probabilistic
  - Linear classifiers and machine learning
  - Explicit decision models

Item-Feature Dataset
Item | Genre: Memoir | Genre: Romance | Type: Paperback | Type: Hardcover | Keyword: New York | Keyword: detective
The Night of the Gun | 1 | - | 1 | - | 1 | -
The Lace Reader | - | - | - | 1 | - | 1
Into the Fire | - | 1 | - | 1 | - | -
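A binary item-feature dataset of this kind can drive a simple similarity-based recommender. The sketch below assumes each book carries the genre, type, and keyword features listed for it in the data source slide (an inference, since the slide layout did not survive extraction), and builds the user profile by counting the features of liked items; `build_profile`, `score`, and `recommend` are hypothetical names.

```python
from math import sqrt

# Binary item features as sets (feature present = 1).
# Assumed assignment, inferred from the data source slide.
items = {
    "The Night of the Gun": {"Memoir", "Paperback", "New York"},
    "The Lace Reader": {"Hardcover", "detective"},
    "Into the Fire": {"Romance", "Hardcover"},
}


def build_profile(liked):
    """User profile: how often each feature occurs among the liked items."""
    profile = {}
    for title in liked:
        for feature in items[title]:
            profile[feature] = profile.get(feature, 0) + 1
    return profile


def score(profile, features):
    """Cosine similarity between the profile and a binary item vector."""
    dot = sum(profile.get(f, 0) for f in features)
    norm_p = sqrt(sum(w * w for w in profile.values()))
    norm_i = sqrt(len(features))
    return dot / (norm_p * norm_i) if norm_p and norm_i else 0.0


def recommend(liked):
    """Rank every item the user has not liked yet by profile similarity."""
    profile = build_profile(liked)
    scored = [(t, score(profile, f)) for t, f in items.items() if t not in liked]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = recommend(["The Lace Reader"])
```

A reader who liked "The Lace Reader" is matched to "Into the Fire" through the shared Hardcover feature alone, which shows how strongly the result depends on which features are extracted.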

CB pros and cons
Pros
- Does not require large user groups
- New items can be recommended immediately
- Low cost for knowledge acquisition and maintenance
Cons
- Acquisition of subjective, qualitative item features
- User preference elicitation
- New users
- Tendency to overfit the training data

Knowledge-based recommenders
Use structured quality features to derive means-end information about both the current user and the available items.

Data source
id | price ($) | mpix | opt-zoom | LCD-size | movies | sound | waterproof
P1 | 148 | 8.0 | 4x | 2.5 | No | No | Yes
P2 | 182 | 8.0 | 5x | 2.7 | Yes | Yes | No
P3 | 189 | 8.0 | 10x | 2.5 | Yes | Yes | No
P4 | 196 | 10.0 | 12x | 2.7 | Yes | No | Yes
P5 | 151 | 7.1 | 3x | 3.0 | Yes | Yes | No
P6 | 199 | 9.0 | 3x | 3.0 | Yes | Yes | No
P7 | 259 | 10.0 | 3x | 3.0 | Yes | Yes | No
P8 | 278 | 9.1 | 10x | 3.0 | Yes | Yes | No
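One common way to use a catalogue like this is constraint-based filtering: the user states requirements and the system returns the items that satisfy all of them. The sketch below uses the camera data from the slide; the requirements themselves (budget, minimum zoom, and so on) and the name `matching_cameras` are assumptions made for illustration.

```python
# Camera catalogue from the slide: price in dollars, resolution in megapixels,
# optical zoom factor, LCD size, and three yes/no capabilities.
COLUMNS = ("id", "price", "mpix", "opt_zoom", "lcd_size", "movies", "sound", "waterproof")
ROWS = [
    ("P1", 148, 8.0, 4, 2.5, False, False, True),
    ("P2", 182, 8.0, 5, 2.7, True, True, False),
    ("P3", 189, 8.0, 10, 2.5, True, True, False),
    ("P4", 196, 10.0, 12, 2.7, True, False, True),
    ("P5", 151, 7.1, 3, 3.0, True, True, False),
    ("P6", 199, 9.0, 3, 3.0, True, True, False),
    ("P7", 259, 10.0, 3, 3.0, True, True, False),
    ("P8", 278, 9.1, 10, 3.0, True, True, False),
]
catalog = [dict(zip(COLUMNS, row)) for row in ROWS]

# Hypothetical user requirements, one predicate per attribute.
requirements = {
    "price": lambda v: v <= 200,
    "mpix": lambda v: v >= 8.0,
    "opt_zoom": lambda v: v >= 5,
    "movies": lambda v: v is True,
}


def matching_cameras(catalog, requirements):
    """Return the cameras satisfying every requirement, cheapest first."""
    hits = [
        camera
        for camera in catalog
        if all(check(camera[attr]) for attr, check in requirements.items())
    ]
    return sorted(hits, key=lambda camera: camera["price"])


result_ids = [camera["id"] for camera in matching_cameras(catalog, requirements)]
```

No rating history is involved, which is why this style of recommender has no ramp-up problem but does need explicit input from the user.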

KB pros and cons
Pros
- Applicable to low-purchase-rate scenarios (cars, computers, houses)
- No ramp-up problem
- Conversational recommender
Cons
- Domain specific
- Knowledge acquisition
- High user interaction required

Hybrid recommenders
Combine different techniques to generate better or more precise recommendations.
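One of several possible combination strategies is a weighted hybrid, where each recommender scores the items and the scores are blended linearly. The sketch below is only an illustration of that one strategy: the scores, the weights, and the name `weighted_hybrid` are all hypothetical.

```python
def weighted_hybrid(score_lists, weights):
    """Blend per-item scores from several recommenders into one ranking."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, value in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * value
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical normalized scores from two recommenders.
collaborative_scores = {"A": 0.9, "B": 0.4, "C": 0.1}
content_scores = {"A": 0.2, "B": 0.8, "D": 0.6}

ranking = weighted_hybrid([collaborative_scores, content_scores], [0.6, 0.4])
```

Item D appears in the final ranking even though the collaborative recommender has no score for it, so the content-based side can cover items the other technique cannot yet rate.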


Recommender systems
- A system that can recommend or present items to the user based on the user's interests and interactions
- One of the best ways to provide a personalized customer experience
- Built by exploiting collective intelligence or a content-based approach to perform predictions
- Examples: Amazon, YouTube, Netflix, Yahoo, Tripadvisor, Last.fm, IMDb

Why recommender systems
- Standard uses:
  - Increase the number of items sold
  - Sell more diverse items
  - Increase user satisfaction
  - Increase user loyalty
  - Better understand what the user wants
- Advanced uses:
  - Create ad hoc campaigns (per geographic area, per type of user)
  - Optimize product distribution over a wide area for large retail chains
  - Optimize target filtering for marketing campaigns


Problem
- There are no available software products for state-of-the-art recommender systems
- There is no "best solution"
- There is no "one solution fits all"
- The Netflix Prize winner combined 104 different algorithms
- A high-end recommender engine can be built only through expensive custom projects
- Large-scale, multi-source datasets require a big data approach

Solution: Reco4 Recommender Engine
A graph-based recommender engine

Reco4 main goals
- Implement state-of-the-art recommendation techniques on top of a graph model
- Provide a complete framework
- Offer an easy-to-use dashboard/console
- Provide software, cloud services, and consultancy

Reco4 features
- Core
  - Based on multiple approaches (collaborative filtering, content-based, ...)
  - Autonomous and self-learning recommender configuration
  - Persistent and updatable models (multiple models supported)
  - Real-time and batch modes of operation
- Algorithms
  - Commercial and research-oriented algorithms
  - Context-aware recommendations
  - Social recommendations
- Operations
  - Cluster- and cloud-ready for Big Data analysis
  - Tested on Oracle Big Data Appliance and Amazon WS
  - Integrated into Oracle Marketing Cloud (Eloqua)

Advantages of a graph database
- NoSQL database to handle Big Data
- Extensibility
- Not an aggregate-oriented database
- Minimal information needed
- Natural way of representing connections:
  - User-to-Item
  - Item-to-Item
  - User-to-User
  - Item-to-Feature
- Graph-based/social algorithms
- Graph partitioning (sharding)
- Performance
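To illustrate why connections are natural to work with in a graph model, the sketch below stores User-to-Item edges in plain adjacency maps and derives Item-to-Item relations by counting two-step paths (item, user, item). It is an in-memory stand-in, not a graph database client; the edges and the name `related_items` are hypothetical.

```python
from collections import defaultdict

# Hypothetical User-to-Item edges (for example "user rated item").
edges = [
    ("Louis", "Iron Man"),
    ("Louis", "Star Wars"),
    ("Matt", "Iron Man"),
    ("Matt", "Star Wars"),
    ("Matt", "Tron"),
    ("Mary", "Notting Hill"),
    ("Mary", "Iron Man"),
]

# Adjacency in both directions so the graph can be walked either way.
user_items = defaultdict(set)
item_users = defaultdict(set)
for user, item in edges:
    user_items[user].add(item)
    item_users[item].add(user)


def related_items(item):
    """Count item -> user -> item paths; more paths means a stronger relation."""
    counts = defaultdict(int)
    for user in item_users[item]:
        for other in user_items[user]:
            if other != item:
                counts[other] += 1
    # Highest path count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


neighbours = related_items("Iron Man")
```

The same traversal pattern extends to User-to-User or Item-to-Feature edges without changing the data layout, which is the extensibility argument in miniature.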

Reco4 architecture stack

Product evolution along 3 directions
- Algos: Model Based, Memory Based, Content Based, Context Awareness, Social Network, Association Rules, Composition
- Tools: GraphDBs, Apache Storm, Apache Hadoop, Distributed Cache, Grizzly
- Optimization/RealTime: SVD, PCA, Clustering, Map/Reduce, Sampling

Algorithms roadmap
Collaborative filtering:
- Memory based (Neighborhood)
  - User/Item based
    - Several distance algorithms (Cosine, Euclidean, Tanimoto, etc.)
  - Graph based
    - Path-based similarity (Shortest Path, Number of Paths)
    - Random-walk similarity (ItemRank, average first-passage/commute time)
- Model based
- Latent factor
  - Stochastic gradient descent
  - Alternating least squares
  - SVD++ (by Koren)
- Association rule mining
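As an illustration of the latent factor entry, the sketch below factorizes a small rating set with stochastic gradient descent in plain Python. It is a textbook-style sketch, not the Reco4 implementation: the ratings, the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`), and the function names are assumptions chosen for the example.

```python
import random

# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("Louis", "Iron Man", 5), ("Louis", "Star Wars", 4), ("Louis", "Notting Hill", 1),
    ("Mary", "Notting Hill", 5), ("Mary", "Life is Beautiful", 4), ("Mary", "Iron Man", 2),
    ("Matt", "Iron Man", 4), ("Matt", "Star Wars", 5),
    ("Andy", "Notting Hill", 4), ("Andy", "Life is Beautiful", 5),
]


def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(P, Q, ratings):
    """Root mean squared error over the given ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


P, Q = train_mf(ratings)
training_error = rmse(P, Q, ratings)
```

Once trained, `predict` can be called for any user and item pair, including pairs with no observed rating, which is how the empty cells of the matrix get filled. Alternating least squares optimizes the same objective but solves for P and Q in turn instead of taking per-rating gradient steps.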

Algorithms roadmap (cont'd)
Content based:
- Latent Semantic Indexing
- Ontology-based analysis
Social recommendation:
- Trust-based approach
- Probabilistic approach

Algorithms roadmap (cont'd)
Cross-cutting features (all algos):
- Binary (one-class)
- Context awareness (pre- and post-filtering)
- Composability
- Real time
- Parallelization

Recommendation model

Reco4 Hadoop
- Based on Hadoop 2.x
- Leverages the YARN resource manager
- Tested on a local installation and on Amazon WS Elastic MapReduce
- Tested on Oracle Big Data Appliance (VM)
- Exports the graph to HDFS and imports results from HDFS into the graph
- Algorithms:
  - K-nearest neighbour
  - Association rule mining
  - K-means clustering
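Association rule mining fits the map/reduce pattern well: a map step emits counts for items and item pairs, and a reduce step sums them. The sketch below mimics that pattern in plain Python as a stand-in; it does not use any Hadoop API, and the baskets, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (one set of items per basket).
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "bag"},
    {"memory card", "bag"},
]


def map_phase(basket):
    """Emit (key, 1) for every single item and every item pair in a basket."""
    for item in basket:
        yield (item,), 1
    for pair in combinations(sorted(basket), 2):
        yield pair, 1


def reduce_phase(pairs):
    """Sum the emitted counts per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def mine_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pairs above both thresholds."""
    counts = reduce_phase(kv for basket in baskets for kv in map_phase(basket))
    n = len(baskets)
    rules = []
    for key, count in counts.items():
        if len(key) != 2 or count / n < min_support:
            continue
        a, b = key
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / counts[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return sorted(rules)


rules = mine_rules(baskets)
```

On a cluster, the map step would run per input split and the reduce step per key, so the counting scales with the number of nodes while the rule generation at the end stays small.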

Reco4 in the cloud
- Recommendation as a service (RaaS)
- Reco4 cloud infrastructure will offer:
  - Pay as you need
  - Pay as you grow
  - Support for bursts
  - Periodic analysis at lower cost
  - Testing/evaluating several algorithms on a reduced dataset
  - Dynamic composition of algorithms
  - Hadoop support

Reco4 Console: Dashboard

Reco4 Console

Reco4 Console: Jobs

Reco4 Console: Graph Visualization

Current use cases
- Intelligent performance marketing
  - Marketing campaign optimization
  - Intelligent lead scoring
  - Seamless integration with Oracle Marketing Cloud (Eloqua)
- Anti-money laundering
- User-generated video content
- Gaming/gambling industry

Reco4 and the Oracle stack
- Components: OLTP Environments, Reco4 Processing Environment, Marketing Campaign Management
- Data flows: Transactional Data, Prediction Data, Analytical Data, Feedback

Thank you
Luigi Giuri, luigi.giuri@reco4.com
Alessandro Negro, alessandro.negro@reco4.com
http://www.reco4.com
Twitter: @reco4j
References:
- D. Jannach et al., Recommender Systems: An Introduction, Cambridge University Press
- F. Ricci et al., Recommender Systems Handbook, Springer
