Using Social Media to Drive Recommender Systems for Mobile Apps - GRP Presentation - Jovian Lin (A0026542M)


Structure of Presentation
- Introduction
  - Why Recommender Systems (RS)?
  - Problems in Recommending
  - Our Research
- Related Work (each with Introduction, Advantages/Disadvantages, Techniques)
  - Collaborative Filtering (CF): Memory-based, Model-based
  - Content-based Filtering
  - Hybrid Recommender Systems
- Preliminary Work
  - Summary of recent CIKM '12 Submission
- Future Work
- Conclusion

Invasion of the Mobile Apps
Mobile apps are soaring in popularity. By 2015, mobile app development projects will outnumber native PC projects by 4 to 1, and mobile devices will outnumber traditional computers by 2 to 1 in a network.
85 billion mobile app downloads; 185 billion by 2014.

Information Overload
- Abundance of information (on the Web).
- Its dynamic and heterogeneous nature.
- It is increasingly difficult to find what we want, in a manner that best meets our requirements.
Consequence: user modeling and personalized information access become crucial, i.e., users need personalized support in sifting through large amounts of available information, according to their interests and tastes.

App-splosion
Finding a relevant app is like looking for a needle in a haystack. Three ways to discover apps:
1. Keyword search
   - Short/untrusted text descriptions.
   - The user's intent is unclear (e.g., "new hottest games").
   - Users may not know how to express their query effectively.
2. Viewing a list of apps (e.g., top 20 most popular apps)
   - Scrolling through the various lists is like visiting an urban flea market.
   - Not personalized.
3. Recommender systems

Recommender Systems
Definition: Recommender systems attempt to alleviate users' information overload by filtering out items that are not relevant to the users' interests.
The recommendation problem is defined as: estimating the response of a user to new items based on historical information stored in the system, and suggesting to this user novel and original items for which the predicted response is high.

Recommender Systems: Collaborative Filtering (CF), Content-based Filtering, Hybrid

Collaborative filtering (CF) recommends to the active user the items that other users (with similar tastes) liked in the past. The similarity in taste of two users is calculated based on their rating history. Also called people-to-people correlation, CF is considered to be the most popular and widely implemented technique in RS.

Content-based filtering (CBF) systems make recommendations by analyzing the content of textual information and finding regularities in the content. CBF can be seen as an extension of the work done on information filtering.

The major difference between CF and CBF: collaborative filtering systems use only user-item ratings data to make predictions and recommendations, whereas content-based systems rely on the features of users and items for predictions.

Hybrid recommender systems are based on the combination of CF and CBF. They try to avoid the limitations of either approach and thereby improve recommendation performance. E.g., a simple hybrid RS may switch between CF and CBF algorithms depending on the availability of user ratings.
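
The switching hybrid mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `switching_hybrid`, the `min_ratings` threshold, and the two predictor callables are hypothetical and do not come from the presentation.

```python
from typing import Callable


def switching_hybrid(
    num_user_ratings: int,
    cf_predict: Callable[[], float],
    cbf_predict: Callable[[], float],
    min_ratings: int = 5,  # hypothetical threshold, chosen for illustration
) -> float:
    """Use CF when the user has enough ratings, otherwise fall back to CBF."""
    if num_user_ratings >= min_ratings:
        return cf_predict()
    return cbf_predict()


# A user with 12 ratings is served by CF; a brand-new user by CBF.
print(switching_hybrid(12, lambda: 4.2, lambda: 3.1))
print(switching_hybrid(0, lambda: 4.2, lambda: 3.1))
```

Real switching hybrids may use richer criteria (for example, the confidence of each predictor); a rating-count threshold is simply the easiest one to show.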

Recommender Systems: Collaborative Filtering (CF) (memory-based / neighborhood-based, model-based), Content-based Filtering, Hybrid, Context-aware

Problem: Data Sparsity
In practice, many commercial RS are used to evaluate very large product sets. This causes the user-item matrix to be extremely sparse, which degrades recommendation performance. To make things worse, new items or users do not have past ratings; this is often termed the cold-start problem. In the domain of apps, the number of new ratings cannot keep up with the growing number of new apps.
Our solution: utilize data from the social web to drive recommender systems.

The Social Web: A New Treasure Trove

Why is the Social Web Important?
- The Internet has reached critical mass in the developed world.
- Most real-world relationships can be supported in the online world.
- Web 2.0 makes real-time and online interactions possible, i.e., we have user-generated content (UGC).
- The proliferation of mobile devices connected to mobile networks is accelerating innovation and further enabling real-time services and networks.

Our Research
- Build recommender systems for app stores.
- Predict unknown ratings for apps (especially new apps), i.e., tackle the cold-start issue.
- Use real-time social information to drive recommendations.
- Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.

Collaborative Filtering: Introduction
Collaborative filtering (CF) recommends to the active user the items that other users (with similar tastes) liked in the past. The similarity in taste of two users is calculated based on their rating history. E.g., if Alice and Bob both like Item X, and Alice likes Item Y, then Bob is more likely to like Item Y.
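
One way to compute the similarity in taste of two users from their rating histories is the Pearson correlation over the items both have rated. The sketch below assumes that choice (the slide does not fix a measure at this point); the function name and the rating data for Alice, Bob, and Carol are hypothetical.

```python
from math import sqrt


def pearson_similarity(ratings_a: dict, ratings_b: dict) -> float:
    """Pearson correlation over the items that both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((ratings_b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


# Hypothetical rating histories on a 1-5 scale.
alice = {"X": 5, "Y": 4, "Z": 1}
bob = {"X": 4, "Z": 2}
carol = {"X": 1, "Z": 5}

print(pearson_similarity(alice, bob))    # similar tastes, close to 1
print(pearson_similarity(alice, carol))  # opposite tastes, close to -1
```

Because Bob correlates strongly with Alice and has not rated Item Y, Alice's high rating of Y makes it a candidate recommendation for Bob.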

Collaborative Filtering: Introduction (example ratings)
Mark: 5/5, 4/5, 4/5, ?, 3/5?
Sergey: 4/5, 4/5, 5/5, 3/5

Collaborative Filtering: Introduction
- CF algorithms are based on the quality of items as evaluated by peers, instead of relying on content (which may be a bad indicator of quality), e.g., fake apps in app stores.
- Unlike content-based systems, CF systems can recommend items with very different content, as long as other users have already shown interest in these items.

Collaborative Filtering: Advantages & Disadvantages
Advantages
- Doesn't require content: especially useful in domains where content analysis is difficult or costly.
- Doesn't require domain knowledge: independent of content; only needs ratings (or any other information about users' preferences).
- Able to find novel items: unlike content-based filtering, the recommended items may be dissimilar in content.
Disadvantages
- Cold-start problem: when new users or items enter the system, they have no ratings. As a result, the system cannot generate any recommendations for them.
- Data sparsity: even after acquiring more ratings from the users, sparsity of the user-item matrix can still be a problem for CF.

Memory-based Collaborative Filtering: Introduction
The user-item ratings stored in the system are directly used to predict ratings for new items. There are two approaches, user-based and item-based.
- User-based approaches evaluate the interest of a user u in an item i using the ratings for this item by other users, called neighbors, that have similar rating patterns.
- Item-based approaches predict the rating of a user u for an item i based on the ratings of u for items similar to i.

Memory-based Collaborative Filtering: Introduction
With users as the rows and items as the columns of the ratings matrix, the user-based approach looks at the rows and the item-based approach looks at the columns.

Memory-based Collaborative Filtering: Introduction
The similarity between two items depends on the ratings given to them by users who have rated both. Item-item similarity is therefore computed over co-rated items only: for items i and j, the similarity $s_{ij}$ is computed from the ratings of the users who rated both i and j. Popular similarity measures include (i) Pearson correlation-based similarity and (ii) adjusted cosine similarity.
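
As an illustration of the second measure, here is a minimal sketch of adjusted cosine similarity in its usual textbook form: each rating is centred on the rating user's own mean, and only users who rated both items contribute. The function name, the data layout (user to item-to-rating dictionaries), and the sample ratings are assumptions made for this example.

```python
from math import sqrt


def adjusted_cosine(ratings: dict, item_i: str, item_j: str) -> float:
    """Adjusted cosine similarity between two items.

    ratings maps user -> {item: rating}. Only users who rated both items
    contribute, and each rating is centred on that user's mean rating.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - mean
            dj = user_ratings[item_j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    den = sqrt(den_i) * sqrt(den_j)
    return num / den if den else 0.0


# Hypothetical ratings: items "a" and "b" are liked by the same users,
# while item "c" appeals to the opposite group.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}
print(adjusted_cosine(ratings, "a", "b"))  # positive
print(adjusted_cosine(ratings, "a", "c"))  # negative
```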

Memory-based Collaborative Filtering: Introduction
Once we can calculate the similarity between items, we can predict a rating as a weighted sum: the user's ratings of similar items, weighted by the item similarities. With the predicted ratings, Top-N recommendations are easily generated.
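
The weighted-sum prediction and the Top-N step can be sketched as follows. The normalisation by the sum of absolute similarities is the common form of this formula and is assumed here; the function names and sample numbers are hypothetical.

```python
def predict_rating(user_ratings: dict, sims_to_target: dict):
    """Weighted sum of the user's ratings, weighted by similarity to the target.

    user_ratings maps item -> rating given by the user;
    sims_to_target maps item -> similarity between that item and the target.
    Returns None when the user has rated no item similar to the target.
    """
    rated = [j for j in user_ratings if j in sims_to_target]
    den = sum(abs(sims_to_target[j]) for j in rated)
    if den == 0:
        return None
    return sum(sims_to_target[j] * user_ratings[j] for j in rated) / den


def top_n(predictions: dict, n: int) -> list:
    """Items with the highest predicted ratings, best first."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


# The user rated "a" and "b"; the target item is much closer to "a".
print(predict_rating({"a": 5, "b": 3}, {"a": 0.8, "b": 0.2}))
print(top_n({"x": 4.6, "y": 2.0, "z": 3.9}, 2))
```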

Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
1) Default voting
In many CF systems, pairwise similarity is computed only from ratings in the intersection of the items that both users have rated. Focusing on intersection-set similarity neglects the global rating behavior reflected in a user's entire rating history. Default voting options:
1. Use the average of the clique (or small group) as the default vote to extend each user's rating history.
2. Use a neutral or (somewhat) negative preference for the unobserved ratings.

Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
2) Inverse User Frequency
Idea: universally liked items are not as useful in capturing similarity as less common items. The inverse frequency is defined as $f_j = \log(n / n_j)$, where $n$ is the total number of users and $n_j$ is the number of users who rated item j. If everyone has rated item j, then $f_j$ is zero.
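
A minimal sketch of the inverse user frequency computation follows. The formula is the one given above; the function name, the data layout, and the sample ratings are assumptions for illustration.

```python
from math import log


def inverse_user_frequency(ratings: dict) -> dict:
    """Return f_j = log(n / n_j) for every rated item j.

    ratings maps user -> {item: rating}; n is the number of users and
    n_j is the number of users who rated item j.
    """
    n = len(ratings)
    counts = {}
    for user_ratings in ratings.values():
        for item in user_ratings:
            counts[item] = counts.get(item, 0) + 1
    return {item: log(n / n_j) for item, n_j in counts.items()}


# Hypothetical data: every user rated "popular", only one rated "niche".
ratings = {
    "u1": {"popular": 5, "niche": 4},
    "u2": {"popular": 4},
    "u3": {"popular": 5},
    "u4": {"popular": 3},
}
f = inverse_user_frequency(ratings)
print(f["popular"], f["niche"])
```

In a similarity computation these values act as per-item weights, so agreement on the niche item counts for more than agreement on the item everyone has rated.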

Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
3) Imputation
Idea: fill in missing ratings to make the user-item ratings matrix dense, e.g., using the average ratings for the user and the item. However:
1. Imputation can be very expensive, as it significantly increases the amount of data.
2. Inaccurate imputation might distort the data.
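
As a sketch of mean-based imputation, the function below fills each missing rating with the midpoint of the user's average and the item's average. Blending the two averages equally is an assumption of this example (the slide only says the averages are used), as are the function name and the sample matrix.

```python
def impute_with_means(matrix: list) -> list:
    """Fill missing ratings (None) with the mean of the user and item averages.

    matrix is a list of rows (users) over columns (items). A row or column
    with no ratings at all falls back to the global mean.
    """
    observed = [r for row in matrix for r in row if r is not None]
    if not observed:
        raise ValueError("matrix has no observed ratings")
    global_mean = sum(observed) / len(observed)

    def mean_or_global(values):
        known = [v for v in values if v is not None]
        return sum(known) / len(known) if known else global_mean

    user_means = [mean_or_global(row) for row in matrix]
    item_means = [mean_or_global(col) for col in zip(*matrix)]
    return [
        [r if r is not None else (user_means[u] + item_means[i]) / 2
         for i, r in enumerate(row)]
        for u, row in enumerate(matrix)
    ]


# Two users, three items; None marks an unobserved rating.
sparse = [[5, None, 1], [4, 2, None]]
print(impute_with_means(sparse))
```

The dense result has as many entries as users times items, which is the cost the first caveat above refers to.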

Model-based Collaborative Filtering: Introduction
Model-based approaches use the ratings to learn a predictive model. The general idea is to model the user-item interactions with factors representing latent characteristics of the users and items in the system. The model is trained using the available data and later used to predict ratings of users for new items. Model-based approaches for recommending are numerous, e.g.:
- Bayesian networks
- Clustering
- Latent Semantic Analysis (LSA)
- Latent Dirichlet Allocation (LDA)
- Support Vector Machines (SVM)
- Singular Value Decomposition (SVD) / matrix factorization

Model-based Collaborative Filtering: Techniques used in Model-based CF
1) Clustering
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering methods can be classified into 3 categories:
1. Partitioning methods
2. Hierarchical methods
3. Density-based methods
In most situations, clustering is an intermediate step and the resulting clusters are used for further analysis. Clustering can be applied in many ways. For example, Sarwar et al. used clustering to partition the data into clusters and then used a memory-based CF algorithm, such as one based on Pearson correlation, to make predictions within each cluster.

Model-based Collaborative Filtering: Techniques used in Model-based CF
1) Clustering
Advantages
- Better scalability than typical CF methods, as predictions are made within much smaller clusters rather than over the entire database.
- The clustering computation can be run offline.
Disadvantages
- Recommendation quality is generally low.

Model-based Collaborative Filtering: Techniques used in Model-based CF
2) Latent Semantic CF Models
A latent semantic CF technique relies on a statistical modeling technique that introduces latent class variables in a mixture model setting. This allows it to discover user communities and prototypical interest profiles. Conceptually, it decomposes user preferences using overlapping user communities. It has higher accuracy and scalability than standard memory-based CF. E.g., the aspect model by Hofmann & Puzicha is a probabilistic latent-space model that models individual ratings as a convex combination of rating factors.

Model-based Collaborative Filtering: Techniques used in Model-based CF
3) Matrix Factorization
- Map both users and items to a joint latent factor space of dimensionality f. User-item interactions are modeled as inner products in that space.
- Each item i is associated with a vector $q_i$, and each user u is associated with a vector $p_u$.
- $q_i$ measures the extent to which the item possesses those factors.
- $p_u$ measures the extent of interest the user has in items that are high on the corresponding factors.
- The resulting dot product $q_i^T p_u$ captures the interaction between user u and item i, i.e., the user's overall interest in the item's characteristics.
- The estimate of user u's rating for item i: $r_{ui} = q_i^T p_u$
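
To make the model concrete, the sketch below learns the vectors $p_u$ and $q_i$ from observed ratings and predicts with their dot product. The slide states only the prediction rule; the training procedure (stochastic gradient descent on regularised squared error), all hyperparameters, the function names, and the sample ratings are assumptions of this example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_users, n_items, f=2, lr=0.02, reg=0.02,
             epochs=1000, seed=0):
    """Learn user vectors p_u and item vectors q_i by stochastic gradient descent.

    ratings is a list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(q[i], p[u])  # error of the current estimate
            for k in range(f):
                p_uk, q_ik = p[u][k], q[i][k]
                p[u][k] += lr * (err * q_ik - reg * p_uk)
                q[i][k] += lr * (err * p_uk - reg * q_ik)
    return p, q


def mse(ratings, p, q):
    """Mean squared error of the model on the given ratings."""
    return sum((r - dot(q[i], p[u])) ** 2 for u, i, r in ratings) / len(ratings)


# Hypothetical observed ratings: (user, item, rating).
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
p, q = train_mf(observed, n_users=3, n_items=3)
print(mse(observed, p, q))  # training error after fitting
print(dot(q[2], p[0]))      # estimate for user 0 on item 2, which is unrated
```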

Model-based Collaborative Filtering: Techniques used in Model-based CF
3) Matrix Factorization (using SVD)
- Capture the latent relationships between users and items.
- Use SVD to factorize the ratings matrix R, obtaining Q, S, and P, i.e., $R = Q S P^T$.
- Reduce the matrix S (a diagonal matrix) to dimension k. This produces a low-dimensional representation of the original rating matrix.
- Compute two resultant matrices: (1) $Q_k S_k^{1/2}$, whose rows are the $q_i^T$; (2) $S_k^{1/2} P_k^T$, whose columns are the $p_u$.
- The resultant matrices can be used to compute the recommendation score for any user and item. To predict a rating, calculate the dot product of the i-th row of the first matrix and the u-th column of the second, i.e., $r_{ui} = q_i^T p_u$.
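
The reason the two resultant matrices reproduce the rank-k approximation can be written out in a few lines. This derivation assumes the usual square-root split of the reduced diagonal matrix, and that rows of R are indexed by items and columns by users, as the row/column wording above implies.

```latex
\begin{align*}
R   &= Q S P^{T} && \text{(full SVD)} \\
R_k &= Q_k S_k P_k^{T} && \text{(keep the $k$ largest singular values)} \\
    &= \bigl(Q_k S_k^{1/2}\bigr)\bigl(S_k^{1/2} P_k^{T}\bigr) \\
r_{ui} &\approx (R_k)_{iu} = q_i^{T} p_u,
  \quad q_i^{T} = \text{row } i \text{ of } Q_k S_k^{1/2},
  \quad p_u = \text{column } u \text{ of } S_k^{1/2} P_k^{T}
\end{align*}
```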

Content-based Filtering: Introduction
Systems implementing a content-based recommendation approach:
1. Analyze a set of documents and/or descriptions of items previously rated by a user.
2. Build a model or profile of user interests based on the features of the objects rated by that user.
The profile is a structured representation of user interests, used to recommend new interesting items. The recommendation process consists of matching the attributes of the user profile against the attributes of a content object. Result: a relevance judgment that represents the user's level of interest in that object.

Content-based Filtering: Introduction
[Figure: architecture of a content-based recommender. Content Analyzer: turns item descriptions from the information source into structured item representations. Profile Learner: builds User A's profile from User A's training examples and implicit or explicit feedback. Filtering Component: matches User A's profile against the represented items to produce a list of recommendations for the active user.]

Content-based Filtering: Advantages & Disadvantages
Advantages
- User independence: unlike CF, does not depend on other users.
- Transparency: explanations can be provided by listing content features.
- New items: can recommend items that have not received ratings.
Disadvantages
- Limited content analysis: the available content may not be sufficient to define the distinguishing aspects of items that are necessary for eliciting user interests.
- Over-specialization (serendipity problem): recommendations have a limited degree of novelty.
- New users: a new user with no ratings has no profile to match against.

Content-based Filtering: State of the art CBF systems
Here, we describe alternative item representation techniques, as well as recommendation algorithms suitable for the described representations.
In most CBF systems, item descriptions are textual features. Textual features create a number of complications when learning a user profile, due to natural language ambiguity:
- Polysemy: the presence of multiple meanings for one word.
- Synonymy: multiple words with the same meaning.
Semantic analysis and its integration in personalization models is an innovative approach to solving these problems. Key idea: obtain a semantic interpretation of the user's information needs.

Content-based Filtering: State of the art CBF systems
Keyword-based Vector Space Model (VSM)
Most CBF systems use simple retrieval models or VSM with basic TF-IDF weighting.
- Each document is represented by a vector in an n-dimensional space. Each dimension corresponds to a term from the overall vocabulary.
- $T = \{t_1, t_2, \ldots, t_n\}$ represents the overall vocabulary (aka dictionary). T is obtained by applying standard NLP operations, e.g., tokenization, stop-word removal, and stemming.
- $d_j = \{w_{1j}, w_{2j}, \ldots, w_{nj}\}$, where $w_{kj}$ is the weight for term $t_k$ in document $d_j$.
TF-IDF is the most common weighting scheme. Its assumptions:
- Rare terms are not less relevant than frequent terms (IDF assumption).
- Multiple occurrences of a term in a document are not less relevant than single occurrences (TF assumption).
- Long documents are not preferred to short documents (normalization assumption).
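
The three assumptions map onto the three parts of a TF-IDF computation, as the sketch below tries to show. It uses one common variant (term frequency divided by the maximum frequency in the document, IDF as log(N / n_k), then cosine normalisation); the stop-word list, the function names, and the sample app descriptions are made up for this example, and stemming is left out.

```python
from collections import Counter
from math import log, sqrt

STOP_WORDS = {"a", "the", "with", "and", "for"}  # tiny illustrative list


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace, drop stop words (no stemming here)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf_vectors(docs: list) -> list:
    """Return one {term: weight} dict per document, normalised to unit length."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        max_count = max(counts.values(), default=1)
        weights = {
            t: (c / max_count) * log(n_docs / doc_freq[t])  # TF times IDF
            for t, c in counts.items()
        }
        norm = sqrt(sum(w * w for w in weights.values()))
        # Length normalisation, so long documents are not favoured.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


docs = [
    "racing game with fast cars",
    "puzzle game with words",
    "fast photo editor",
]
vectors = tfidf_vectors(docs)
print(vectors[0])  # "racing" and "cars" outweigh the more common "game" and "fast"
```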

Content-based Filtering: State of the art CBF systems
Keyword-based Vector Space Model (VSM)
To measure the closeness between two documents, we use the cosine similarity measure. In CBF, both user profiles and items are represented as weighted term vectors.
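
With profiles and items both stored as weighted term vectors, recommending reduces to ranking items by their cosine similarity to the profile. A minimal sketch follows; the sparse dictionary representation, the function names, and the sample weights are assumptions for illustration.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_items(profile: dict, items: dict) -> list:
    """Item names ordered by similarity to the user profile, best first."""
    return sorted(items, key=lambda name: cosine(profile, items[name]), reverse=True)


# Hypothetical user profile and item vectors.
profile = {"racing": 1.0, "cars": 0.5}
items = {
    "sudoku": {"puzzle": 1.0},
    "racer": {"racing": 0.8, "game": 0.2},
}
print(rank_items(profile, items))
```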

Content-based Filtering: State of the art CBF systems
Some unique keyword-based systems:
- Temporal decay: some systems incorporate a mechanism for temporal decay, i.e., the system ages the interest expressed by the user.
- Separate interest profiles for a few different topics (e.g., National, World, Business): in YourNews, the user interest profile for each topic is represented as a weighted prototype term vector extracted from the user's news view history.
- Short-term and long-term models: NewsDude learns a short-term user model based on TF-IDF and a long-term model based on a naïve Bayesian classifier.
- Domains that are not inherently text-based (e.g., movies): INTIMATE and Movies2GO use movie synopses. FOAFing the Music utilizes user profiles, music-related RSS feeds, and content-based descriptions extracted from the audio itself.
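
Temporal decay, the first mechanism in the list above, can be illustrated with an exponential half-life applied to the profile's term weights. The exponential form, the 30-day default, and the function name are assumptions of this sketch; the slide does not say how any particular system ages interests.

```python
def decay_profile(profile: dict, days_since_update: float,
                  half_life_days: float = 30.0) -> dict:
    """Age every term weight so that it halves once per half-life."""
    factor = 0.5 ** (days_since_update / half_life_days)
    return {term: weight * factor for term, weight in profile.items()}


# After one half-life, every interest carries half its original weight.
print(decay_profile({"jazz": 1.0, "news": 0.4}, days_since_update=30))
```

Fresh feedback would then be added on top of the decayed weights, so recent interests dominate older ones.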

Content-based Filtering: State of the art CBF systems
Unfortunately, when more advanced characteristics are required, keyword-based approaches show their limitations. E.g., for "French impressionism", keyword-based approaches may find only documents containing the words "French" and "impressionism"; documents about Claude Monet will not appear in the recommendations.

Content-based Filtering: State of the art CBF systems
Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Possible ways:
- Ontologies: provide the RS with cultural and linguistic background knowledge.

Content-based Filtering: State of the art CBF systems
Possible ways:
- Encyclopedic knowledge sources: Explicit Semantic Analysis (ESA) is a technique able to provide a fine-grained semantic representation of natural language texts in a high-dimensional space of natural (and comprehensible) concepts derived from Wikipedia. It is inspired by the desire to augment text representation with massive amounts of world knowledge. In fact, Wikipedia has been used to estimate similarity between movies, in order to provide more accurate predictions for the Netflix Prize competition.

Content-based Filtering: State of the art CBF systems
Possible ways:
- Topic models: topic modeling algorithms are used to discover a set of topics from a large collection of documents. A topic is a distribution over terms that is biased toward terms associated with a single theme. Topic models provide an interpretable low-dimensional representation of the documents: each document is represented as a distribution over topics.

Content-based Filtering: State of the art CBF systems
Example: MobileWalla
MobileWalla (MW) is an independent, unbiased search engine for mobile apps with semantic search capabilities. It has an objective app rating and scoring mechanism based on user and developer involvement with an app. This scoring mechanism enables MW to provide a number of other ways to discover apps, such as dynamically maintained "hot" lists and "fast rising" lists.

Content-based Filtering: Trends and Future Research
Utilizing user-generated content (UGC) in the recommendation process:
- Web 2.0
- Folksonomy: a taxonomy generated by users who collaboratively annotate and categorize resources of interest.
- Hashtags
- Tagging

Hybrid Recommender Systems: Introduction
- In order to cope with data sparsity, it is essential to resort to external sources of information.
- This is mandatory when dealing with the cold-start problem of new users and/or new items.
- Why? Because the absence of ratings hinders the possibility of using CF techniques that rely exclusively on rating information.

Combining CF and CBF: Hybrid Recommender Systems
Techniques used in Hybrid Systems
Fab recommender by Balabanović & Shoham:
- Maintains user profiles of interest in Web pages using content-based techniques.
- Uses CF techniques to identify profiles with similar tastes.
Filterbots:
- Filterbots act as artificial users that rate items using certain criteria.
- A jazzbot will give full marks to a CD because it is in the jazz category.
- A fighting-game-bot will give full marks to the iOS app Street Fighter 4 Turbo.
- Ratings generated by bots are injected into the user-item matrix.
- Standard CF algorithms are then applied to generate recommendations.
- Similar: Imputed Neighborhood Based Collaborative Filtering
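The Filterbot idea can be sketched in a few lines. This is a minimal illustration only: the toy ratings, the category table, the bot name, and the helper functions `add_filterbot`, `cosine`, and `predict` are all hypothetical, and plain user-based CF with cosine similarity stands in for whichever "standard CF algorithm" a real system would use.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"Street Fighter": 5, "Tekken": 4},
    "bob": {"Weather Pro": 4},
}
# Hypothetical item metadata that the bot inspects.
categories = {"Street Fighter": "fighting", "Tekken": "fighting",
              "Weather Pro": "weather", "New Brawler": "fighting"}

def add_filterbot(ratings, categories, bot_name, category, full_marks=5):
    """Inject an artificial user that gives full marks to every item in `category`."""
    augmented = {user: dict(items) for user, items in ratings.items()}
    augmented[bot_name] = {item: full_marks
                           for item, cat in categories.items() if cat == category}
    return augmented

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item` (user-based CF)."""
    num = den = 0.0
    for other, items in ratings.items():
        if other != user and item in items:
            sim = cosine(ratings[user], items)
            num += sim * items[item]
            den += sim
    return num / den if den else None

augmented = add_filterbot(ratings, categories, "fighting-game-bot", "fighting")
```

Without the bot, `predict(ratings, "alice", "New Brawler")` has no neighbour to draw on and returns `None`; with the bot injected, the unrated item receives a prediction for users whose tastes overlap with the bot's.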

Hybrid Recommender Systems: Techniques used in Hybrid Systems
Using external sources
Emotions
- "Handling Data Sparsity in CF using Emotion and Semantic Based Features" by Yashar Moshfeghi & Joemon Jose.
- Uses a combination of item-related emotions, semantic data, and LDA to recommend movies.
Profiles from the Social Web
- Liu et al. captured and mapped profiles of social web services to a "Taste Fabric" using ontologies of books, music, movies, etc.
- These profiles can be used as pseudo users (something like Filterbots).
Tags (#hashtags), UGC (blogs, tweets), written reviews (IMDB, blog comments, etc.)
- A number of works have utilized the structure of follower/followee relationships on Twitter, together with the textual content of tweets, to find similar users.
Context information
- Rating = R(User, Item, Context)
- For example, movie recommenders can use additional data such as time, place, and company (e.g., girlfriend, boyfriend, siblings).
- Techniques that help incorporate context include (i) Markov Chain Monte Carlo (MCMC) techniques, (ii) SVMs, and (iii) Factorization Machines (FMs).
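To make Rating = R(User, Item, Context) concrete, the sketch below scores a one-hot encoded (user, item, context) triple with the standard second-order Factorization Machine formula, w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The feature index, the parameter values, and the function names `encode` and `fm_predict` are made up for illustration; in practice the parameters would be learned from data.

```python
from itertools import combinations

def encode(user, item, context, index):
    """One-hot encode (user, item, context) as a sparse {feature_index: 1.0} vector."""
    return {index[("user", user)]: 1.0,
            index[("item", item)]: 1.0,
            index[("ctx", context)]: 1.0}

def fm_predict(x, w0, w, V):
    """Second-order FM score: bias + linear terms + pairwise factorized interactions."""
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(sorted(x), 2)
    )
    return w0 + linear + pairwise

# Hypothetical feature index and hand-picked (not learned) parameters.
index = {("user", "u1"): 0, ("item", "movie_a"): 1,
         ("ctx", "weekend"): 2, ("ctx", "weekday"): 3}
w0 = 3.0
w = [0.1, 0.2, 0.3, -0.3]
V = [[0.5, 0.0], [0.5, 1.0], [1.0, 0.0], [-1.0, 0.0]]

weekend = fm_predict(encode("u1", "movie_a", "weekend", index), w0, w, V)
weekday = fm_predict(encode("u1", "movie_a", "weekday", index), w0, w, V)
```

With these toy parameters the same user and movie score about 4.85 on a weekend and about 2.25 on a weekday, which shows how the context feature alone can shift the predicted rating.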

Structure of Presentation
- Introduction
  - Why Recommender Systems (RS)?
  - Problems in Recommending
  - Our Research
- Related Work (for each: Introduction, Advantages/Disadvantages, Techniques)
  - Collaborative Filtering
    - Memory-based (CF)
    - Model-based (CF)
  - Content-based Filtering
  - Hybrid Recommender Systems
- Preliminary Work
  - Summary of recent CIKM '12 Submission
- Future Work
- Conclusion

Preliminary Work (CIKM '12)
Predicting Ratings for New Mobile Apps by Combining Collaborative Filtering & Topic Modeling
- We propose a method that mitigates the cold-start problem by combining collaborative filtering and topic modeling.
- Goal: to predict the rating of an item for a given user.
- Our approach learns a model that can correlate similar apps (based on user ratings alone) with multi-faceted content (such as descriptions, categories, price, and company information of apps).
- We utilize: (i) clustering, and (ii) a supervised variant of LDA.

Step 1: Calculate similarities between existing apps and generate soft clusters.
[Figure: four panels (i to iv) showing apps as nodes grouped into Cluster 1, Cluster 2, and Cluster 3.]
i) Similarities between apps (shown as nodes) are calculated based on user ratings (i.e., memory-based collaborative filtering).
ii) Apps are clustered based on the calculated similarity scores.
iii) Soft clustering allows an app to be assigned to more than one cluster.
iv) Eventually, each cluster is labeled with a cluster ID.
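Step 1 can be illustrated with a small sketch. The slides do not say which soft clustering algorithm is used, so the seed-and-threshold scheme here is an assumption chosen for brevity, as are the toy ratings, the seed apps, the threshold value, and the function names `cosine` and `soft_clusters`.

```python
from math import sqrt

# Hypothetical ratings: app -> {user: rating}.
app_ratings = {
    "A": {"u1": 5, "u2": 4},
    "B": {"u1": 4},
    "C": {"u2": 4, "u3": 4},
    "D": {"u3": 5, "u4": 4},
    "E": {"u4": 5},
}

def cosine(a, b):
    """Cosine similarity between two apps' sparse rating vectors."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def soft_clusters(app_ratings, seeds, threshold=0.4):
    """Build one cluster per seed app; an app joins every cluster whose seed it
    resembles closely enough, so it may belong to more than one cluster."""
    return {
        f"Cluster {k}": sorted(app for app, r in app_ratings.items()
                               if cosine(app_ratings[seed], r) >= threshold)
        for k, seed in enumerate(seeds, start=1)
    }

clusters = soft_clusters(app_ratings, seeds=["A", "D"])
```

In this toy data app "C" shares raters with both seeds, so it lands in both clusters, which is the soft-assignment behaviour described in (iii).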

Step 2: Use soft-clustered information and app categories as labels in Labeled LDA to generate a probability distribution of labels for each app.
[Figure: cluster IDs (Cluster 1, Cluster 2, ..., Cluster K), category labels (e.g., Business, Games, Weather), and item descriptions are the inputs to Labeled LDA; the apps shown include Facebook, Instagram, and Twitter.]
- We merge the set of cluster IDs (e.g., Cluster 1, Cluster 2) and the set of categorical labels of apps (e.g., Business, Games) to form a new set of labels, S_labels.
- The set of new labels S_labels and the textual descriptions of items are used as inputs to Labeled LDA.
- Labeled LDA allows us to represent each item as a probability distribution of topics (or labels).
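The label-merging part of Step 2 can be sketched as follows. Only the construction of S_labels and the per-app label lists is shown; Labeled LDA itself is not implemented here. The app names, cluster memberships, category assignments, and the function name `build_labels` are hypothetical.

```python
# Hypothetical outputs of Step 1 (soft cluster membership) and store categories.
app_clusters = {
    "app_1": ["Cluster 1"],
    "app_2": ["Cluster 1", "Cluster 2"],
    "app_3": ["Cluster 2"],
}
app_categories = {"app_1": "Business", "app_2": "Games", "app_3": "Games"}

def build_labels(app_clusters, app_categories):
    """Return (S_labels, per-app label lists) by merging cluster IDs with categories."""
    per_app = {
        app: sorted(set(app_clusters.get(app, [])) | {app_categories[app]})
        for app in app_categories
    }
    s_labels = sorted({label for labels in per_app.values() for label in labels})
    return s_labels, per_app

s_labels, per_app = build_labels(app_clusters, app_categories)
```

Each app's label list, paired with its textual description, would then be one training document for a Labeled LDA implementation.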

Step 3: Create scalable neighborhoods using incremental clustering. Predict ratings for new apps.
- We calculate the similarity between apps based on each app's probability distribution (from Labeled LDA), and form clusters based on the computed similarity scores between the apps.
- When a new app (shown as the square) arrives, the neighborhood for the new app is selected by looking into the cluster that it is closest to.
- The predicted rating of the new app is then calculated based on the neighborhood of apps.
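A minimal sketch of the prediction in Step 3 follows. It assumes cosine similarity between label distributions, cluster centroids for picking the closest cluster, and a similarity-weighted mean of neighbours' ratings as the prediction; the slides do not specify these choices, and the incremental clustering itself is not shown. All data and function names are hypothetical.

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity between two dense distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical existing apps: (label distribution, mean rating).
apps = {
    "app_1": ([0.8, 0.1, 0.1], 4.5),
    "app_2": ([0.7, 0.2, 0.1], 4.0),
    "app_3": ([0.1, 0.1, 0.8], 2.0),
    "app_4": ([0.2, 0.1, 0.7], 3.0),
}
clusters = {"Cluster 1": ["app_1", "app_2"], "Cluster 2": ["app_3", "app_4"]}

def predict_new_app(dist, apps, clusters):
    """Pick the cluster whose centroid is closest to `dist`, then return that
    cluster and the similarity-weighted mean rating of its member apps."""
    nearest = max(
        clusters,
        key=lambda c: cosine(dist, centroid([apps[a][0] for a in clusters[c]])),
    )
    neighbours = clusters[nearest]
    weights = [cosine(dist, apps[a][0]) for a in neighbours]
    rating = sum(w * apps[a][1] for w, a in zip(weights, neighbours)) / sum(weights)
    return nearest, rating

nearest, rating = predict_new_app([0.75, 0.15, 0.10], apps, clusters)
```

Restricting the neighbourhood to one cluster keeps the work per new app proportional to the cluster size rather than to the whole catalogue.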

Preliminary Work (CIKM '12): Results
We created a hybrid recommender system by using:
- Content-independent labels (generated through a CF technique),
- Item metadata (content), and
- Topic modeling.

Preliminary Work (CIKM '12): Discussion
Apps and movies are different.
- A rating of 1 for a movie probably means that it is bad; but a rating of 1 for an app could be due to its crash-prone nature, and NOT its content.
- A rating on the App Store (good or bad) indicates that the user took the effort to download the app.
- Perhaps, instead of using 1-5 ratings, we should use unary ratings.
Unlike movies, apps are constantly evolving. Each new version may:
- Add a new feature (e.g., Retina display support)
- Fix a bug (e.g., make it compatible with iOS 5)
If we focus on these unique traits of apps, we could come up with something novel.

Future Work
Our research is motivated by:
- the availability of real-time (social) web data, and
- the types of UGC that can drive recommender systems.
We observed that:
- When a new app is freshly developed and released, it tends to have no ratings for a period of time.
- However, we can almost never fail to find tweets about newly released mobile applications on Twitter. That is, the number of tweets about an app is generally much larger than the number of ratings or reviews available on the official App Store.
Hence, an interesting way to generate item ratings is to use the sentiments of the tweets written by verified Twitter user accounts to predict would-be ratings for a new mobile application.

Future Work
Consider the following scenario:
1. A user hears about a new app, say "Furious Pigs", for the iPhone and iPad that costs $0.99.
2. The user does not know whether it is worth buying, and signs into the App Store.
3. However, he realizes that there are no ratings for the app, which is natural as the new app entered the App Store not too long ago.
4. The user then checks Twitter, and searches for the term "Furious Pigs".
5. Twitter processes the user's query, and returns a list of tweets by other users who have mentioned "Furious Pigs".
6. The user reads the tweets.
7. He also notices that one of the tweets provides a link to a blog that has reviewed the Furious Pigs app. He clicks on the link, and reads the blog's review of Furious Pigs.
8. He also notices that a local celebrity has tweeted about Furious Pigs.
9. After reading through the tweets and the blog post, the user finds that the overall sentiment about the app is relatively good.
10. He then downloads the Furious Pigs app onto his iPhone and iPad.

Future Work
The scenario illustrates the following points:
1. When there are insufficient ratings or reviews about apps at the App Store, we can still rely on tweets to receive app-related information.
2. Tweets (for apps) have a shorter delay or lag time, compared to ratings for apps.
3. As Twitter is focused on driving discovery outward to web pages (or even YouTube videos), there is a chance that we can find even more focused content about an app from a tweet's hyperlink.
4. Every Twitter user has a certain credibility score or rank. When a popular person (say, Barack Obama) endorses the Furious Pigs app, there is a high chance that the app will see an increase in downloads.

Future Work
We want to automate this process of using tweets to enhance personalized recommendations to users.

Future Work
To achieve this, we will have to solve at least the following issues:
1. Disambiguation of proper names on Twitter.
2. Twitter credibility measurement.
3. Applying sentiment analysis on Twitter.
4. Mapping Twitter profiles to user profiles in the App Store.

Future Work
To achieve this, we will have to solve at least the following issues:
1. Disambiguation of proper names on Twitter.
- Naming conflicts arise from semantic overloading of entity names.
- For example, when trying to search for tweets discussing the Facebook iPhone app, we discovered that "Facebook" is overloaded: it could refer to either the app or the website (http://www.facebook.com/).
- Therefore, we need a strategy to reliably extract Twitter posts that are related to specific apps, overcoming issues of naming conflicts.

Future Work
To achieve this, we will have to solve at least the following issues:
2. Twitter credibility measurement.
- Not all content posted on Twitter is trustworthy or useful in providing information about the query.
- It is important to predict the credibility of information in a tweet.
- Gupta & Kumaraguru adopted a supervised machine learning and relevance feedback approach using the above features, to rank tweets according to their credibility score.
- Weng et al. made use of the follower and followee relationships in Twitter, and applied an extension of the PageRank algorithm to measure the influence of users in Twitter.
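As a rough illustration of graph-based influence, the sketch below runs plain PageRank by power iteration over a follower graph, treating each "follows" edge as an endorsement of the followee. This is the basic algorithm only, not the extension used by Weng et al.; the toy graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """follows: user -> list of accounts that user follows (edges point from
    follower to followee). Returns a dict of influence scores that sum to 1."""
    users = set(follows) | {v for vs in follows.values() for v in vs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who follow nobody spread their rank uniformly (dangling nodes).
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for u, followees in follows.items():
            for v in followees:
                new_rank[v] += damping * rank[u] / len(followees)
        rank = new_rank
    return rank

# Hypothetical follower graph.
follows = {
    "ann": ["celebrity"],
    "ben": ["celebrity", "ann"],
    "cat": ["celebrity"],
    "celebrity": [],
}
rank = pagerank(follows)
```

In this toy graph the account followed by everyone ends up with the highest score, and "ann", who has one follower, outranks "cat", who has none.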

Future Work
To achieve this, we will have to solve at least the following issues:
3. Applying sentiment analysis on Twitter.
- The problem with general sentiment analysis algorithms is that most of them rely on simple terms as indicators of sentiment about a product or service.
- However, cultural factors (including Web culture), their related linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a positive or negative sentiment.
- Therefore, in order to determine the sentiment of tweets within Twitter and the app domain, we will have to learn a domain-specific model that predicts sentiment scores for new tweets about new apps.
- To do so, we will need to build and evaluate machine learning algorithms that take in both (i) existing apps and their corresponding numerical ratings, and (ii) existing tweets and the words used in the tweets, and learn a mapping between rating scores and words.
- That way, when a new tweet about a new app appears, a rating for the new app can be predicted.
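The idea of learning a mapping between rating scores and words can be illustrated with a very simple baseline: score each word by the mean rating of the apps whose tweets contain it, then average the scores of the known words in a new tweet. This is only a sketch under that assumption, not the model the slides propose to build; the training pairs and the function names `train` and `predict` are hypothetical.

```python
from collections import defaultdict

def train(tweets_with_ratings):
    """tweets_with_ratings: list of (tweet_text, rating of the app it mentions).
    Returns (word -> mean rating of tweets containing it, global mean rating)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, rating in tweets_with_ratings:
        for word in set(text.lower().split()):
            totals[word] += rating
            counts[word] += 1
    word_score = {w: totals[w] / counts[w] for w in totals}
    global_mean = sum(r for _, r in tweets_with_ratings) / len(tweets_with_ratings)
    return word_score, global_mean

def predict(text, word_score, global_mean):
    """Average the scores of known words; fall back to the global mean."""
    scores = [word_score[w] for w in set(text.lower().split()) if w in word_score]
    return sum(scores) / len(scores) if scores else global_mean

# Hypothetical training pairs: tweets about existing apps and those apps' ratings.
training = [
    ("love this game", 5.0),
    ("awesome and addictive", 5.0),
    ("keeps crashing", 1.0),
    ("crashing again awful", 1.0),
]
word_score, global_mean = train(training)
```

A tweet such as "awesome game" then maps to a high predicted rating and "awful crashing" to a low one, while a tweet with no known words falls back to the global mean. A real system would need far more data and a proper learning algorithm to handle the linguistic nuances noted above.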

Future Work
To achieve this, we will have to solve at least the following issues:
4. Mapping Twitter profiles to user profiles in the App Store.
- Unlike Twitter profiles, user profiles in the App Store are not as active; in fact, based on our findings, an average Apple App Store user rates only between 3 and 10 apps.
- In order to produce personalized recommendations (driven by the Social Web) for these existing users in the App Store, we will need to find a method for mapping Twitter profiles to the user profiles in the App Store.
- When a Twitter user posts something positive about a new app, our recommender system would then be able to recommend that new app to the existing App Store users who share a similar profile with the Twitter user.

Conclusion
- Build recommender systems for App Stores.
- Predict unknown ratings for apps (especially new apps), i.e., tackle the issue of cold-start.
- Use real-time, social information to drive recommendations.
- Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.

Thank You

Q & A