Using Social Media to Drive Recommender Systems for Mobile Apps: GRP Presentation, Jovian Lin (A0026542M)
Structure of Presentation
Introduction: Why Recommender Systems (RS)? Problems in Recommending; Our Research
Related Work: Collaborative Filtering (Memory-based CF, Model-based CF); Content-based Filtering; Hybrid Recommender Systems (each covering Introduction, Advantages/Disadvantages, and Techniques)
Preliminary Work: Summary of recent CIKM '12 Submission
Future Work
Conclusion
Invasion of the Mobile Apps
Mobile apps are soaring in popularity.
By 2015, mobile app development projects will outnumber native PC projects by 4 to 1.
Mobile devices will outnumber traditional computers by 2 to 1 in a network.
85 billion mobile app downloads; 185 billion by 2014.
Information Overload
The abundance of information on the Web, and its dynamic and heterogeneous nature, make it increasingly difficult to find what we want in a manner that best meets our requirements.
Consequence: user modeling and personalized information access are crucial, i.e., users need personalized support in sifting through large amounts of available information according to their interests and tastes.
App-splosion
Finding a relevant app is like looking for a needle in a haystack. There are three ways to discover apps:
1. Keyword search: text descriptions are short and untrusted; the user's intent is unclear (e.g., "new hottest games"); users may not know how to express their query effectively.
2. Viewing a list of apps (e.g., the top 20 most popular apps): scrolling through the various lists is like visiting an urban flea market, and the lists are not personalized.
3. Recommender Systems
Recommender Systems: Definition
Recommender systems attempt to alleviate users' information overload by filtering out items that are not relevant to the user's interests.
The recommendation problem is defined as estimating the response of a user to new items, based on historical information stored in the system, and suggesting to this user novel and original items for which the predicted response is high.
Recommender Systems: Collaborative Filtering (CF), Content-based Filtering, Hybrid
Collaborative filtering (CF) recommends to the active user the items that other users with similar tastes liked in the past. The similarity in taste of two users is calculated based on their rating history. Also called people-to-people correlation, CF is considered to be the most popular and widely implemented technique in RS.
Content-based filtering (CBF) systems make recommendations by analyzing the content of textual information and finding regularities in the content. CBF can be seen as an extension of the work done on information filtering.
The major difference between CF and CBF: collaborative filtering systems use only user-item rating data to make predictions and recommendations, whereas content-based systems rely on the features of users and items for predictions.
Hybrid recommender systems are based on the combination of CF and CBF. They try to avoid the limitations of either approach and thereby improve recommendation performance. E.g., a simple hybrid RS may switch between CF and CBF algorithms depending on the availability of user ratings.
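A minimal Python sketch of such a switching hybrid, assuming ratings are stored as nested dictionaries and that separate CF and CBF predictors already exist. The threshold `min_item_ratings` and all function names are hypothetical.

```python
from typing import Callable, Dict

# Sketch only: names and the threshold are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}
Predictor = Callable[[str, str], float]


def switching_hybrid(user: str, item: str, ratings: Ratings,
                     cf_predict: Predictor, cbf_predict: Predictor,
                     min_item_ratings: int = 5) -> float:
    """Use CF when the item has enough ratings; otherwise fall back to CBF."""
    # Count how many users have already rated this item.
    n_ratings = sum(1 for user_ratings in ratings.values() if item in user_ratings)
    if n_ratings >= min_item_ratings:
        return cf_predict(user, item)
    # Too few ratings (e.g., a brand-new app): rely on content features instead.
    return cbf_predict(user, item)
```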
Related Work: Collaborative Filtering (CF), covering Memory-based CF and Model-based CF; Content-based Filtering; Hybrid; Context-aware
Recommender Systems: Collaborative Filtering (CF), divided into Memory-based (neighborhood-based) and Model-based; Content-based Filtering; Hybrid
Problem: Data Sparsity
In practice, many commercial RS are used to evaluate very large product sets. This makes the user-item matrix extremely sparse, which degrades recommendation performance.
To make things worse, new items or users do not have past ratings. This is often termed the cold-start problem.
In the domain of apps, the number of new ratings cannot keep up with the growing number of new apps.
Our solution: utilize data from the social web to drive recommender systems.
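Both problems can be quantified directly from the rating data. A minimal Python sketch, assuming nested-dictionary ratings and a known item catalogue. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set

# Sketch only: data layout and names are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def sparsity(ratings: Ratings, items: Set[str]) -> float:
    """Fraction of user-item cells that have no rating (1.0 means empty)."""
    n_cells = len(ratings) * len(items)
    if n_cells == 0:
        return 1.0
    n_observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - n_observed / n_cells


def cold_start_items(ratings: Ratings, items: Set[str]) -> List[str]:
    """Items (e.g., newly released apps) that nobody has rated yet."""
    rated = {item for user_ratings in ratings.values() for item in user_ratings}
    return sorted(items - rated)
```

For example, two users and three apps with three observed ratings give a sparsity of 0.5. Any app with no ratings at all is a cold-start item that pure CF cannot score.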
The Social Web: A New Treasure Trove
Why is the Social Web Important?
The Internet has reached critical mass in the developed world.
Most real-world relationships can be supported in the online world.
Web 2.0 makes real-time and online interactions possible, i.e., we have user-generated content (UGC).
The proliferation of mobile devices connected to mobile networks is accelerating innovation and further enabling real-time services and networks.
Our Research
Build recommender systems for App Stores.
Predict unknown ratings for apps (especially new apps), i.e., tackle the cold-start problem.
Use real-time, social information to drive recommendations.
Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.
Collaborative Filtering: Introduction
Collaborative filtering (CF) recommends to the active user the items that other users (with similar tastes) liked in the past. The similarity in taste of two users is calculated based on their rating history.
E.g., if Alice and Bob both like Item X, and Alice likes Item Y, then Bob is more likely to like Item Y.
Collaborative Filtering: Introduction
Example (the items appear as images on the original slide, so the column labels below are placeholders; "?" marks Mark's unknown ratings, which are to be predicted):
         Item 1   Item 2   Item 3   Item 4
Mark     5/5      4/5      4/5?     3/5?
Sergey   4/5      4/5      5/5      3/5
Collaborative Filtering: Introduction
CF algorithms are based on the quality of items as evaluated by peers, instead of relying on content (which may be a bad indicator of quality, e.g., fake apps in App Stores).
Unlike content-based systems, CF systems can recommend items with very different content, as long as other users have already shown interest in these different items.
Collaborative Filtering: Advantages & Disadvantages
Advantages:
- Doesn't require content: especially useful in domains where content analysis is difficult or costly.
- Doesn't require domain knowledge: independent of content; only ratings (or any other information about users' preferences) are needed.
- Able to find novel items: unlike content-based filtering, the recommended items may be dissimilar in content.
Disadvantages:
- Cold-start problem: when new users or items enter the system, they have no ratings; as a result, the system cannot generate recommendations for them.
- Data sparsity: even after acquiring more ratings from the users, sparsity of the user-item matrix can still be a problem for CF.
Memory-based Collaborative Filtering: Introduction
The user-item ratings stored in the system are directly used to predict ratings for new items. There are two approaches: user-based and item-based.
User-based approaches evaluate the interest of a user u in an item i using the ratings of this item by other users, called neighbors, who have similar rating patterns.
Item-based approaches predict the rating of a user u for an item i based on the ratings of u for items similar to i.
Memory-based Collaborative Filtering: Introduction
In the user-item matrix, the user-based approach looks at the rows; the item-based approach looks at the columns.
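The user-based approach (comparing rows) can be sketched in Python as follows, assuming nested-dictionary ratings. It uses Pearson correlation over co-rated items and a mean-centered weighted average of the neighbors' ratings. This is a common formulation, often attributed to Resnick et al. The function names and toy data are hypothetical.

```python
import math
from typing import Dict, Optional

# Sketch only: data layout and names are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(ratings: Ratings, u: str, v: str) -> float:
    """Pearson correlation between users u and v over their co-rated items."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings[u][i] for i in common) / len(common)
    mu_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = (math.sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
           * math.sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0


def predict_user_based(ratings: Ratings, u: str, item: str) -> Optional[float]:
    """Mean-centered weighted average of the neighbors' ratings for `item`."""
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = den = 0.0
    for v, r_v in ratings.items():
        if v == u or item not in r_v:
            continue  # only neighbors who rated the target item contribute
        w = pearson(ratings, u, v)
        if w == 0.0:
            continue
        mean_v = sum(r_v.values()) / len(r_v)
        num += w * (r_v[item] - mean_v)
        den += abs(w)
    return mean_u + num / den if den else None
```

The prediction is the active user's own mean, shifted by how far the neighbors rated the item above or below their own means. In practice the result may need clipping to the rating scale.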
Memory-based Collaborative Filtering: Introduction
The similarity between two items depends on the ratings given to the items by users who have rated both of them, i.e., item-item similarity is computed by looking at co-rated items only. For items i and j, the similarity $s_{ij}$ is computed from the ratings of the users who rated both i and j.
Popular similarity measures include (i) Pearson correlation-based similarity and (ii) adjusted cosine similarity.
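Adjusted cosine similarity, as described by Sarwar et al. for item-based CF, subtracts each user's mean rating before comparing two items' co-ratings:

$s_{ij} = \frac{\sum_{u \in U}(R_{u,i}-\bar{R}_u)(R_{u,j}-\bar{R}_u)}{\sqrt{\sum_{u \in U}(R_{u,i}-\bar{R}_u)^2}\sqrt{\sum_{u \in U}(R_{u,j}-\bar{R}_u)^2}}$

Here U is the set of users who rated both i and j. Below is a minimal Python sketch, assuming nested-dictionary ratings. The names and data are hypothetical.

```python
import math
from typing import Dict

# Sketch only: data layout and names are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Item-item similarity over co-rated items, centered on each user's mean."""
    num = sq_i = sq_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:  # co-rated items only
            mean_u = sum(user_ratings.values()) / len(user_ratings)
            d_i = user_ratings[i] - mean_u
            d_j = user_ratings[j] - mean_u
            num += d_i * d_j
            sq_i += d_i * d_i
            sq_j += d_j * d_j
    den = math.sqrt(sq_i) * math.sqrt(sq_j)
    return num / den if den else 0.0
```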
Memory-based Collaborative Filtering: Introduction
Once we can calculate the similarity between items, we can predict a rating using a weighted sum. With the predicted ratings, Top-N recommendations are easily generated.
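Following Sarwar et al.'s item-based formulation, the weighted sum is:

$P_{u,i} = \frac{\sum_{j \in N} s_{ij} R_{u,j}}{\sum_{j \in N} |s_{ij}|}$

Here N is the set of items rated by u that are similar to i. The Python sketch below assumes the item-item similarities have already been computed and stored as `sim[i][j]`. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Sketch only: data layout and names are illustrative assumptions.
Similarities = Dict[str, Dict[str, float]]  # item i -> {item j: s_ij}


def predict_weighted_sum(user_ratings: Dict[str, float],
                         sims_to_i: Dict[str, float]) -> Optional[float]:
    """Weighted sum of the user's ratings on items similar to the target item."""
    num = den = 0.0
    for j, s_ij in sims_to_i.items():
        if j in user_ratings:  # only neighbors the user has actually rated
            num += s_ij * user_ratings[j]
            den += abs(s_ij)
    return num / den if den else None


def top_n(user_ratings: Dict[str, float], sim: Similarities,
          candidates: Iterable[str], n: int) -> List[Tuple[str, float]]:
    """Rank unrated candidate items by predicted rating and keep the best n."""
    scored = []
    for i in candidates:
        if i in user_ratings:
            continue
        p = predict_weighted_sum(user_ratings, sim.get(i, {}))
        if p is not None:
            scored.append((i, p))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```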
Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
1) Default voting
In many CF algorithms, pairwise similarity is computed only from ratings in the intersection of the items that both users have rated. Focusing on the intersection set neglects the global rating behavior reflected in a user's entire rating history.
Default voting extends each user's rating history by assigning default ratings to unobserved items, using either:
1. the average of the clique (or small group), or
2. a neutral or (somewhat) negative preference.
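Below is a minimal Python sketch of default voting, assuming nested-dictionary ratings. Pearson correlation is computed over the union of the two users' rated items, with a default vote filled in wherever one of them has no rating. A somewhat negative default (such as 2.0 on a 1-5 scale) corresponds to option 2 above. Passing a group average instead would correspond to option 1. The function name is hypothetical.

```python
import math
from typing import Dict

# Sketch only: the function name and default value are illustrative assumptions.


def pearson_default_vote(r_u: Dict[str, float], r_v: Dict[str, float],
                         default: float) -> float:
    """Pearson correlation over the union of rated items, using default votes."""
    items = list(r_u.keys() | r_v.keys())
    if len(items) < 2:
        return 0.0
    # Unobserved ratings are replaced by the default vote.
    xs = [r_u.get(i, default) for i in items]
    ys = [r_v.get(i, default) for i in items]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mean_x) ** 2 for x in xs))
           * math.sqrt(sum((y - mean_y) ** 2 for y in ys)))
    return num / den if den else 0.0
```

Consider `{"a": 5, "b": 4}` and `{"b": 4, "c": 1}`. The two users share only one item, so an intersection-only correlation is undefined. The default-vote version still yields a similarity score.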
Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
2) Inverse User Frequency
Idea: universally liked items are not as useful in capturing similarity as less common items. The inverse frequency is defined as $f_j = \log(n / n_j)$, where n is the total number of users and $n_j$ is the number of users who rated item j. If everyone has rated item j, then $f_j$ is zero.
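The definition translates directly into code. A minimal Python sketch, assuming nested-dictionary ratings. The function name is hypothetical.

```python
import math
from typing import Dict

# Sketch only: data layout and name are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def inverse_user_frequency(ratings: Ratings) -> Dict[str, float]:
    """f_j = log(n / n_j) for every item j that has at least one rating."""
    n = len(ratings)  # total number of users
    n_j: Dict[str, int] = {}
    for user_ratings in ratings.values():
        for item in user_ratings:
            n_j[item] = n_j.get(item, 0) + 1  # users who rated item j
    return {item: math.log(n / count) for item, count in n_j.items()}
```

In the formulation usually attributed to Breese et al., each rating on item j is multiplied by $f_j$ before user similarities are computed. As a result, items rated by everyone contribute nothing.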
Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms
3) Imputation
Idea: fill in missing ratings to make the user-item rating matrix dense, e.g., using the average ratings of the user and the item. However:
1. Imputation can be very expensive, as it significantly increases the amount of data.
2. Inaccurate imputation might distort the data.
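Below is a minimal Python sketch of mean imputation. It assumes each missing cell is filled with the average of the user's mean and the item's mean. This is one simple reading of "average ratings for user and item", and other combinations are possible. The global mean serves as a fallback. The output has one entry per user-item pair, which is the data blow-up noted in point 1.

```python
from typing import Dict, Set

# Sketch only: the (user mean + item mean) / 2 rule is an illustrative assumption.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def impute_means(ratings: Ratings, items: Set[str]) -> Ratings:
    """Return a dense matrix; missing cells get (user mean + item mean) / 2."""
    # Assumes at least one observed rating exists (for the global mean).
    all_values = [r for user_ratings in ratings.values() for r in user_ratings.values()]
    global_mean = sum(all_values) / len(all_values)
    user_mean = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    item_sum: Dict[str, float] = {}
    item_cnt: Dict[str, int] = {}
    for user_ratings in ratings.values():
        for i, r in user_ratings.items():
            item_sum[i] = item_sum.get(i, 0.0) + r
            item_cnt[i] = item_cnt.get(i, 0) + 1
    item_mean = {i: item_sum[i] / item_cnt[i] for i in item_sum}

    dense: Ratings = {}
    for u, user_ratings in ratings.items():
        row = {}
        for i in items:
            if i in user_ratings:
                row[i] = user_ratings[i]  # keep observed ratings unchanged
            else:
                mu_u = user_mean.get(u, global_mean)
                mu_i = item_mean.get(i, global_mean)
                row[i] = (mu_u + mu_i) / 2.0
        dense[u] = row
    return dense
```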
Model-based Collaborative Filtering: Introduction
Model-based approaches use the ratings to learn a predictive model. General idea: model the user-item interactions with factors representing latent characteristics of the users and items in the system. This model is trained using the available data and later used to predict ratings of users for new items.
Model-based approaches for recommending are numerous, e.g., Bayesian networks, clustering, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Support Vector Machines (SVM), and Singular Value Decomposition (SVD) / matrix factorization.
Model-based Collaborative Filtering: Techniques used in Model-based CF
1) Clustering
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering methods can be classified into 3 categories: 1. partitioning methods, 2. hierarchical methods, 3. density-based methods.
In most situations, clustering is an intermediate step, and the resulting clusters are used for further analysis. Clustering can be applied in many ways. For example, Sarwar et al. used clustering to partition the data into clusters, and then used a memory-based CF algorithm such as Pearson correlation to make predictions within each cluster.
Model-based Collaborative Filtering: Techniques used in Model-based CF
1) Clustering
Advantages: better scalability than typical CF methods, as predictions are made within much smaller clusters rather than the entire database; the clustering computation can be run offline.
Disadvantages: recommendation quality is generally low.
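Below is a simplified Python sketch of cluster-restricted prediction, assuming user cluster assignments have already been computed offline (e.g., by any partitioning method). To keep it short, it averages the ratings of same-cluster users. Sarwar et al., by contrast, applied Pearson-correlation CF within each cluster. The names and data are hypothetical.

```python
from typing import Dict, Optional

# Sketch only: a plain cluster average stands in for Pearson-weighted CF.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def predict_in_cluster(ratings: Ratings, clusters: Dict[str, int],
                       user: str, item: str) -> Optional[float]:
    """Average rating for `item` among the other users in `user`'s cluster."""
    # Only users in the same cluster contribute to the prediction.
    peers = [v for v, c in clusters.items()
             if c == clusters[user] and v != user and item in ratings.get(v, {})]
    if not peers:
        return None  # nobody in this cluster has rated the item
    return sum(ratings[v][item] for v in peers) / len(peers)
```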
Model-based Collaborative Filtering: Techniques used in Model-based CF
2) Latent Semantic CF Models
A latent semantic CF technique relies on a statistical modeling technique that introduces latent class variables in a mixture model setting. This allows it to discover user communities and prototypical interest profiles. Conceptually, it decomposes user preferences using overlapping user communities. It has higher accuracy and scalability than standard memory-based CF.
E.g., the aspect model by Hofmann & Puzicha: a probabilistic latent-space model that models individual ratings as a convex combination of rating factors.
Model-based Collaborative Filtering: Techniques used in Model-based CF
3) Matrix Factorization
Map both users and items to a joint latent factor space of dimensionality f, in which user-item interactions are modeled as inner products. Each item i is associated with a vector $q_i \in \mathbb{R}^f$, and each user u with a vector $p_u \in \mathbb{R}^f$. $q_i$ measures the extent to which the item possesses those factors; $p_u$ measures the extent of interest the user has in items that are high on the corresponding factors. The resulting dot product $q_i^T p_u$ captures the interaction between user u and item i, i.e., the user's overall interest in the item's characteristics. The estimate of user u's rating for item i is $\hat{r}_{ui} = q_i^T p_u$.
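The factor vectors are typically learned by minimizing the regularized squared error over the observed ratings, for example with stochastic gradient descent:

$\sum_{(u,i)} (r_{ui} - q_i^T p_u)^2 + \lambda(\|q_i\|^2 + \|p_u\|^2)$

Below is a minimal pure-Python sketch, assuming users and items are indexed by integers. The learning rate, regularization weight, and epoch count are illustrative values, not tuned ones.

```python
import random
from typing import List, Tuple

# Sketch only: hyperparameters and integer indexing are illustrative assumptions.
Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_mf(triples: List[Tuple[int, int, float]], n_users: int, n_items: int,
             f: int = 2, lr: float = 0.01, reg: float = 0.02,
             epochs: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn user factors P (rows p_u) and item factors Q (rows q_i) by SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(Q[i], P[u])  # r_ui - q_i^T p_u
            for k in range(f):
                p_uk, q_ik = P[u][k], Q[i][k]
                P[u][k] += lr * (err * q_ik - reg * p_uk)
                Q[i][k] += lr * (err * p_uk - reg * q_ik)
    return P, Q
```

Once training finishes, `dot(Q[i], P[u])` gives the estimate $\hat{r}_{ui}$ for any user-item pair, including pairs absent from the training data.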
Model-based Collaborative Filtering: Techniques used in Model-based CF
3) Matrix Factorization
SVD can be used to capture the latent relationships between users and items:
1. Use SVD to factorize the rating matrix R into Q, S, and P, i.e., $R = Q S P^T$.
2. Reduce the diagonal matrix S to dimension k, giving $S_k$ (with the corresponding $Q_k$ and $P_k$). This produces a low-dimensional representation of the original rating matrix.
3. Compute two resultant matrices: $Q_k S_k^{1/2}$ (i.e., $q^T$) and $S_k^{1/2} P_k^T$ (i.e., $p$).
The resultant matrices can be used to compute the recommendation score for any user and item. To predict a rating, calculate the dot product of the i-th row of $q^T$ and the u-th column of $p$, i.e., $\hat{r}_{ui} = q_i^T p_u$.
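Splitting $S_k$ as $S_k^{1/2} S_k^{1/2}$ is justified by a direct check. Multiplying the two resultant matrices reproduces the rank-k approximation of R, so each entry of that approximation is exactly the dot product used for prediction. This assumes, as the row/column convention above implies, that rows of R index items and columns index users.

```latex
\left(Q_k S_k^{1/2}\right)\left(S_k^{1/2} P_k^{T}\right)
  = Q_k S_k P_k^{T} = R_k \approx R,
\qquad
\hat{r}_{ui} = \left(R_k\right)_{iu}
  = \underbrace{\left(Q_k S_k^{1/2}\right)_{i,:}}_{q_i^{T}}
    \underbrace{\left(S_k^{1/2} P_k^{T}\right)_{:,u}}_{p_u}
  = q_i^{T} p_u .
```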
Content-based Filtering: Introduction
Systems implementing a content-based recommendation approach:
1. Analyze a set of documents and/or descriptions of items previously rated by a user.
2. Build a model or profile of user interests based on the features of the objects rated by that user.
The profile is a structured representation of user interests, used to recommend new interesting items. The recommendation process consists of matching the attributes of the user profile against the attributes of a content object. The result is a relevance judgment that represents the user's level of interest in that object.
Figure: high-level architecture of a content-based recommender system. The Content Analyzer turns item descriptions from the Information Source into structured item representations (Represented Items). The Profile Learner uses User A's training examples and implicit or explicit feedback to build User A's profile. The Filtering Component matches the profile against new items to produce a list of recommendations for the active user A, whose feedback is fed back to the Profile Learner.
Content-based Filtering: Advantages & Disadvantages
Advantages:
- User independence: unlike CF, it does not depend on other users' ratings.
- Transparency: explanations can be provided by listing content features.
- New items: it can recommend items that have not yet received ratings.
Disadvantages:
- Limited content analysis: the content may not be sufficient to define the distinguishing aspects of items that turn out to be necessary for the elicitation of user interests.
- Over-specialization (serendipity problem): recommendations have a limited degree of novelty.
- New user: a new user with no ratings cannot receive reliable recommendations.
Content-based Filtering: State-of-the-art CBF Systems
Here, we describe alternative item representation techniques, as well as recommendation algorithms suitable for these representations.
In most CBF systems, item descriptions are textual features. Textual features create a number of complications when learning a user profile, due to natural language ambiguity:
- Polysemy: the presence of multiple meanings for one word.
- Synonymy: multiple words with the same meaning.
Semantic analysis and its integration into personalization models is an innovative approach to solving these problems. Key idea: obtain a semantic interpretation of the user's information needs.
Content-based Filtering: State-of-the-art CBF Systems
Keyword-based Vector Space Model (VSM)
Most CBF systems use simple retrieval models, such as the VSM with basic TF-IDF weighting. Each document is represented by a vector in an n-dimensional space, where each dimension corresponds to a term from the overall vocabulary.
$T = \{t_1, t_2, \ldots, t_n\}$ represents the overall vocabulary (aka dictionary). T is obtained by applying standard NLP operations, e.g., tokenization, stop-word removal, and stemming.
$d_j = \{w_{1j}, w_{2j}, \ldots, w_{nj}\}$, where $w_{kj}$ is the weight of term $t_k$ in document $d_j$.
TF-IDF is the most common weighting scheme. It rests on three assumptions:
- Rare terms are not less relevant than frequent terms (IDF assumption);
- Multiple occurrences of a term in a document are not less relevant than single occurrences (TF assumption);
- Long documents are not preferred to short documents (normalization assumption).
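Below is a minimal Python sketch of TF-IDF weighting with cosine (unit-length) normalization. This common variant reflects the three assumptions above:

$w_{kj} = \frac{\mathrm{tf}_{kj}\,\log(N/n_k)}{\sqrt{\sum_s (\mathrm{tf}_{sj}\,\log(N/n_s))^2}}$

Here N is the number of documents and $n_k$ is the number of documents containing $t_k$. The sketch assumes documents have already been tokenized, stop-word filtered, and stemmed. The sample app descriptions are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# Sketch only: one common TF-IDF variant; inputs are pre-tokenized terms.


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse, unit-length TF-IDF vector {term: w_kj} per document."""
    n_docs = len(docs)
    # n_k: number of documents that contain term t_k at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency in this document
        weights = {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Cosine normalization, so long documents are not favored.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors
```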
Content-based Filtering: State-of-the-art CBF Systems
Keyword-based Vector Space Model (VSM)
To measure the closeness between two documents, we use the cosine similarity measure. In CBF, both user profiles and items are represented as weighted term vectors, so the relevance of an item to a user can be computed as the cosine similarity between the two vectors.
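Below is a minimal Python sketch of this matching step, assuming sparse term-weight dictionaries such as TF-IDF vectors. For illustration, it also assumes that the user profile is simply the sum of the vectors of items the user liked. The item names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Sketch only: the summed-vector profile is an illustrative assumption.
Vector = Dict[str, float]  # term -> weight


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(liked_items: List[Vector]) -> Vector:
    """Profile = sum of the term vectors of the items the user liked."""
    profile: Vector = {}
    for vec in liked_items:
        for term, w in vec.items():
            profile[term] = profile.get(term, 0.0) + w
    return profile


def rank_items(profile: Vector, items: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Sort candidate items by cosine similarity to the user profile."""
    scored = [(name, cosine(profile, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```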
Content-based Filtering: State-of-the-art CBF Systems
Some distinctive keyword-based systems:
- Incorporating a mechanism for temporal decay, i.e., the system ages the interests expressed by the user.
- Maintaining a separate interest profile for each of a few different topics, e.g., National, World, Business. In YourNews, the user interest profile for each topic is represented as a weighted prototype term vector extracted from the user's news view history.
- Having short-term and long-term models. NewsDude learns a short-term user model based on TF-IDF and a long-term model based on a naïve Bayes classifier.
- For domains that are not inherently text-based (e.g., movies): INTIMATE and Movies2GO use movie synopses. FOAFing the Music utilizes user profiles, music-related RSS feeds, and content-based descriptions extracted from the audio itself.
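Temporal decay can be sketched by exponentially down-weighting older interactions when building the profile. The half-life formulation below is an assumption for illustration. It is not the scheme of any particular system listed above.

```python
from typing import Dict, List, Tuple

# Sketch only: half-life exponential decay is an illustrative assumption.
Vector = Dict[str, float]  # term -> weight


def decayed_profile(events: List[Tuple[float, Vector]], now: float,
                    half_life: float) -> Vector:
    """Sum viewed items' term vectors, halving each weight per half-life of age."""
    profile: Vector = {}
    for timestamp, vec in events:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for term, w in vec.items():
            profile[term] = profile.get(term, 0.0) + decay * w
    return profile
```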
Content-based Filtering: State-of-the-art CBF Systems
Unfortunately, when more advanced characteristics are required, keyword-based approaches show their limitations. E.g., for the interest "French impressionism", keyword-based approaches will only find documents containing the words "French" and "impressionism"; documents about Claude Monet will not appear in the recommendations, even though they are relevant.
Content-based Filtering: State-of-the-art CBF Systems
Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Possible ways:
Ontologies: provide RS with cultural and linguistic background knowledge.
Content-based Filtering: State-of-the-art CBF Systems
Encyclopedic Knowledge Sources: Explicit Semantic Analysis (ESA) is a technique able to provide a fine-grained semantic representation of natural language texts in a high-dimensional space of natural (and comprehensible) concepts derived from Wikipedia. It is inspired by the desire to augment text representation with massive amounts of world knowledge. For example, Wikipedia has been used to estimate similarity between movies in order to provide more accurate predictions in the Netflix Prize competition.
Content-based Filtering: State-of-the-art CBF Systems
Topic Models: topic modeling algorithms are used to discover a set of topics from a large collection of documents. A topic is a distribution over terms that is biased toward terms associated with a single theme. Topic models provide an interpretable low-dimensional representation of documents: each document is represented as a distribution over topics.
Content-based Filtering: State-of-the-art CBF Systems
Example: MobileWalla
MobileWalla (MW) is an independent, unbiased search engine for mobile apps with semantic search capabilities. It has an objective app rating and scoring mechanism based on user and developer involvement with an app. This scoring mechanism enables MW to provide a number of other ways to discover apps, such as dynamically maintained hot lists and fast-rising lists.
Content-based Filtering: Trends and Future Research
Utilizing user-generated content (UGC) in the recommendation process: Web 2.0; folksonomy (a taxonomy generated by users who collaboratively annotate and categorize resources of interest); hashtags; tagging.
Hybrid Recommender Systems: Introduction
In order to cope with data sparsity, it is essential to resort to external sources of information. This is mandatory when dealing with the cold-start problem of new users and/or new items, because the absence of ratings rules out CF techniques that rely exclusively on rating information.
Hybrid Recommender Systems: Techniques used in Hybrid Systems
Combining CF and CBF:
- Fab recommender by Balabanović & Shoham: maintains user profiles of interest in Web pages using content-based techniques, and uses CF techniques to identify profiles with similar tastes.
- Filterbots: filterbots act as artificial users that rate items according to certain criteria. A jazzbot will give full marks to a CD because it is in the jazz category; a fighting-game-bot will give full marks to the iOS app Street Fighter 4 Turbo. Ratings generated by bots are injected into the user-item matrix, and standard CF algorithms are then applied to generate recommendations.
- Similar Imputed Neighborhood Based Collaborative Filtering.
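Below is a minimal Python sketch of a genre filterbot in the spirit of the jazzbot example. An artificial user that gives full marks to every item in a category is added to a copy of the rating matrix. After that, any standard CF algorithm can run unchanged. The function name, the 5.0 maximum rating, and the data are assumptions.

```python
from typing import Dict, Set

# Sketch only: name, rating scale, and genre lookup are illustrative assumptions.
Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def add_genre_filterbot(ratings: Ratings, item_genres: Dict[str, Set[str]],
                        genre: str, bot_name: str, max_rating: float = 5.0) -> Ratings:
    """Return a copy of `ratings` with an extra bot user rating every `genre` item."""
    augmented = {user: dict(user_ratings) for user, user_ratings in ratings.items()}
    augmented[bot_name] = {item: max_rating
                           for item, genres in item_genres.items() if genre in genres}
    return augmented
```

Because the bot rates every jazz item, even an item that no human has rated yet gains a rating. This gives neighborhood-based CF something to work with.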
Hybrid Recommender Systems: Techniques used in Hybrid Systems
Using external sources:
- Emotions: "Handling Data Sparsity in CF using Emotion and Semantic Based Features" by Yashar Moshfeghi & Joemon Jose uses a combination of item-related emotions, semantic data, and LDA to recommend movies.
- Profiles from the Social Web: Liu et al. captured and mapped profiles of social web services to a Taste Fabric using ontologies of books, music, movies, etc. These profiles can be used as pseudo-users (similar to filterbots).
- Tags (#hashtags), UGC (blogs, tweets), and written reviews (IMDB, blog comments, etc.). A number of works have utilized the structure of follower/followee relationships on Twitter, together with the textual content of tweets, to find similar users.
- Context information: Rating = R(User, Item, Context). E.g., movie recommenders can use additional data such as time, place, and company (e.g., girlfriend, boyfriend, siblings). Techniques that help incorporate context include (i) Markov Chain Monte Carlo (MCMC) techniques, (ii) SVMs, and (iii) Factorization Machines (FMs).
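A second-order factorization machine (Rendle) scores a feature vector x as:

$\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$

For example, x could be one-hot encoded user, item, and context features (such as time or company) concatenated together. The pairwise term can be computed in linear time:

$\frac{1}{2}\sum_f \big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]$

Below is a minimal Python sketch of the prediction step only. Training is omitted, and the feature layout is an assumption.

```python
from typing import List

# Sketch only: prediction step; parameters would come from training (omitted).


def fm_predict(x: List[float], w0: float, w: List[float],
               V: List[List[float]]) -> float:
    """Second-order factorization machine score for one feature vector x.

    V[i] is the k-dimensional latent vector v_i of feature i.
    """
    linear = w0 + sum(w_i * x_i for w_i, x_i in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += s * s - s_sq
    # Equals the sum over all pairs i < j of <v_i, v_j> x_i x_j.
    return linear + 0.5 * pairwise
```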
Preliminary Work (CIKM '12)
Predicting Ratings for New Mobile Apps by Combining Collaborative Filtering & Topic Modeling
We propose a method that mitigates the cold-start problem by combining collaborative filtering and topic modeling.
To predict the rating of an item for a given user, our approach learns a model that can correlate similar apps (based on user ratings alone) with multi-faceted content (such as descriptions, categories, price, and company information of apps).
We utilize (i) clustering and (ii) a supervised variant of LDA.
Step 1: Calculate similarities between existing apps and generate soft clusters.
(Figure: apps shown as nodes, grouped into Cluster 1, Cluster 2, and Cluster 3.)
i) Similarities between apps are calculated based on user ratings (i.e., memory-based collaborative filtering).
ii) Apps are clustered based on the calculated similarity scores.
iii) Soft clustering allows an app to be assigned to more than one cluster.
iv) Eventually, each cluster is labeled with a cluster ID.
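The slides do not specify the soft-clustering algorithm, so the Python sketch below is only a hypothetical stand-in that illustrates soft assignment. Given app-app similarities from CF and one seed app per cluster, an app joins every cluster whose seed it is sufficiently similar to. An app can therefore belong to more than one cluster. The seeds, threshold, and names are assumptions.

```python
from typing import Dict, List, Set

# Hypothetical sketch: seed-and-threshold soft assignment, not the paper's algorithm.
Similarity = Dict[str, Dict[str, float]]  # app -> {other app: similarity}


def soft_assign(sim: Similarity, seeds: List[str], threshold: float) -> Dict[str, Set[int]]:
    """Map each app to the set of cluster IDs (1-based) it belongs to."""
    assignment: Dict[str, Set[int]] = {}
    for app, row in sim.items():
        clusters = {k + 1 for k, seed in enumerate(seeds)
                    if app == seed or row.get(seed, 0.0) >= threshold}
        if not clusters:
            # No seed is similar enough: fall back to the single closest cluster.
            best = max(range(len(seeds)), key=lambda k: row.get(seeds[k], 0.0))
            clusters = {best + 1}
        assignment[app] = clusters
    return assignment
```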
Step 2: Use soft-clustered information and app categories as labels in Labeled LDA to generate a probability distribution of labels for each app.
(Figure: labels such as Cluster 1 through Cluster K, Business, Games, and Weather, together with the item descriptions of apps such as Facebook, Instagram, and Twitter, as inputs to Labeled LDA.)
We merge the set of cluster IDs (e.g., Cluster 1, Cluster 2) and the set of categorical labels of apps (e.g., Business, Games) to form a new set of labels, $S_{labels}$. The new label set $S_{labels}$ and the textual descriptions of items are used as inputs to Labeled LDA. Labeled LDA allows us to represent each item as a probability distribution over topics (or labels).
Step 3: Create scalable neighborhoods using incremental clustering, and predict ratings for new apps.
We calculate the similarity between apps based on each app's probability distribution (from Labeled LDA), and form clusters based on the computed similarity scores between the apps. When a new app (shown as a square in the original figure) arrives, its neighborhood is selected by looking into the cluster that it is closest to. The predicted rating of the new app is then calculated based on this neighborhood of apps.
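Below is a hypothetical Python sketch of this step. It makes three assumptions:
1. Each app's Labeled LDA output is a probability vector.
2. Closeness to a cluster is measured by cosine similarity to the cluster's mean vector.
3. The prediction is a similarity-weighted average of a user's ratings of the apps in the chosen cluster.

The paper's actual similarity measure and weighting may differ.

```python
import math
from typing import Dict, List, Optional

# Hypothetical sketch: cosine-to-centroid neighborhood and weighting are assumptions.


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict_new_app(new_dist: List[float], clusters: Dict[int, List[str]],
                    dists: Dict[str, List[float]],
                    user_ratings: Dict[str, float]) -> Optional[float]:
    """Predict a user's rating for a new app from its closest cluster of apps."""
    closest = max(clusters, key=lambda c: cosine(
        new_dist, centroid([dists[app] for app in clusters[c]])))
    num = den = 0.0
    for app in clusters[closest]:  # the neighborhood of the new app
        if app in user_ratings:
            s = cosine(new_dist, dists[app])
            num += s * user_ratings[app]
            den += s
    return num / den if den else None
```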
Preliminary Work (CIKM '12): Results
We created a hybrid recommender system by using content-independent labels (generated through a CF technique), item metadata (content), and topic modeling.
Preliminary Work (CIKM '12): Discussion
Apps and movies are different:
- A rating of 1 for a movie probably means that the movie is bad, but a rating of 1 for an app could be due to its crash-prone nature, and NOT its content.
- A rating on the App Store (good or bad) indicates that the user took the effort to download the app. Perhaps, instead of using 1-5 ratings, we should use unary ratings.
- Unlike movies, apps are constantly evolving. Each new version may add a new feature (e.g., Retina display support) or fix a bug (e.g., make the app compatible with iOS 5).
If we focus on these unique traits of apps, we could come up with something novel.
Future Work
Our research is motivated by the availability of real-time (social) web data and the types of UGC that can drive recommender systems.
We observed that when a new app is freshly developed and released, it tends to have no ratings for a period of time. However, we can almost always find tweets about newly released mobile applications on Twitter, i.e., the number of tweets about an app is generally much larger than the number of ratings or reviews available on the official App Store.
Hence, an interesting way to generate item ratings is to use the sentiments of tweets written by verified Twitter user accounts to predict would-be ratings for a new mobile application.
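Below is a hypothetical Python sketch of this idea. It assumes that some sentiment classifier has already assigned each tweet a score in [-1, 1], and that each tweet carries a flag indicating whether its author is verified. The mean sentiment of verified tweets is then mapped linearly onto the 1-5 App Store scale. The linear mapping is an illustrative assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch: sentiment scores come from an unspecified classifier.


def would_be_rating(tweets: List[Tuple[float, bool]],
                    low: float = 1.0, high: float = 5.0) -> Optional[float]:
    """Map the mean sentiment of verified users' tweets onto a [low, high] scale."""
    scores = [sentiment for sentiment, verified in tweets if verified]
    if not scores:
        return None  # no verified tweets yet: no would-be rating
    mean = max(-1.0, min(1.0, sum(scores) / len(scores)))
    # -1 maps to low, 0 to the midpoint, +1 to high.
    return low + (mean + 1.0) / 2.0 * (high - low)
```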
Future Work
Consider the following scenario:
1. A user hears about a new app, say Furious Pigs for the iPhone and iPad, that costs $0.99.
2. The user does not know whether it is worth buying, and signs into the App Store.
3. However, he realizes that there are no ratings for the app, which is natural because the app entered the App Store only recently.
4. The user then checks Twitter and searches for the term "Furious Pigs".
5. Twitter processes the user's query and returns a list of tweets from other users who have mentioned Furious Pigs.
6. The user reads the tweets.
7. He also notices that one of the tweets provides a link to a blog that has reviewed the Furious Pigs app. He clicks on the link and reads the blog's review of Furious Pigs.
8. He also notices that a local celebrity has tweeted about Furious Pigs.
9. After reading through the tweets and blog post, the user finds that the overall sentiment about the app is relatively good.
10. He then downloads Furious Pigs onto his iPhone and iPad.
Future Work The scenario illustrates the following points:
1. When there are insufficient ratings or reviews for apps on the App Store, we can still rely on tweets to obtain app-related information.
2. Tweets about apps have a shorter delay (lag time) than ratings for apps.
3. As Twitter is focused on driving discovery outward to web pages (or even YouTube videos), a tweet's hyperlink may lead us to even more focused content about an app.
4. Every Twitter user can be assigned a credibility score or rank. When a popular person (say, Barack Obama) endorses Furious Pigs, there is a high chance that the app will see an increase in downloads.
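Point 4 suggests that tweets should not all count equally. One simple way to encode this, sketched here under the assumption that a non-negative credibility score per account is already available, is a credibility-weighted mean of tweet sentiment. The function name, the score dictionary, and the default weight for unknown accounts are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def credibility_weighted_sentiment(
    mentions: Sequence[Tuple[str, float]],
    credibility: Dict[str, float],
    default_credibility: float = 0.1,
) -> Optional[float]:
    """Weighted mean of tweet sentiments, each weighted by its author's
    (assumed non-negative) credibility score. Unknown authors get a small
    default weight. Returns None if the total weight is zero."""
    weighted_sum = 0.0
    total_weight = 0.0
    for author, sentiment in mentions:
        weight = credibility.get(author, default_credibility)
        weighted_sum += weight * sentiment
        total_weight += weight
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


# Made-up credibility scores and (author, sentiment) pairs.
credibility = {"local_celebrity": 0.9, "alice": 0.3}
mentions = [("local_celebrity", 1.0), ("alice", -0.5), ("bob", -0.5)]
print(credibility_weighted_sentiment(mentions, credibility))
```

With these made-up numbers the plain mean sentiment is 0.0, while the weighted mean is about 0.54, because the celebrity's endorsement dominates.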
Future Work We want to automate this process of using tweets to enhance personalized recommendations for users.
Future Work To achieve this, we will have to solve at least the following issues:
1. Disambiguation of proper names on Twitter.
2. Twitter credibility measurement.
3. Applying sentiment analysis to Twitter.
4. Mapping Twitter profiles to user profiles in the App Store.
Future Work 1. Disambiguation of proper names on Twitter. Naming conflicts arise from the semantic overloading of entity names. For example, when searching for tweets discussing the Facebook iPhone app, we discovered that "Facebook" is overloaded: it could refer to either the app or the website (http://www.facebook.com/). Therefore, we need a strategy to reliably extract Twitter posts that are related to specific apps, overcoming such naming conflicts.
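To make the problem concrete, here is a deliberately simple, hypothetical heuristic (not a method from the literature): a tweet that mentions the entity name is kept only if it contains more app-context cue words than website-context cue words. The cue lists are invented for illustration; a practical system would learn such cues from labeled tweets.

```python
import re
from typing import List

# Hypothetical, hand-picked cue words (assumption for illustration only).
APP_CUES = {"app", "iphone", "ipad", "ios", "download", "update", "version"}
WEB_CUES = {"website", "browser", "profile", "page", "wall"}


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; dots inside tokens (e.g. 'facebook.com') are kept."""
    tokens = (tok.strip(".") for tok in re.findall(r"[a-z0-9.]+", text.lower()))
    return [tok for tok in tokens if tok]


def contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def is_about_app(tweet: str, entity: str) -> bool:
    """Keep a tweet only if it names the entity and leans toward app context."""
    tokens = tokenize(tweet)
    if not contains_phrase(tokens, tokenize(entity)):
        return False
    app_score = sum(tok in APP_CUES for tok in tokens)
    web_score = sum(tok in WEB_CUES for tok in tokens)
    return app_score > web_score


print(is_about_app("The new Facebook app for iPhone keeps crashing after the update", "Facebook"))
print(is_about_app("Check out my Facebook profile page on the website", "Facebook"))
```

Multi-word names such as "Furious Pigs" are matched as contiguous token runs, so a tweet mentioning only "pigs" is not counted.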
Future Work 2. Twitter credibility measurement. Not all content posted on Twitter is trustworthy or useful for answering a query, so it is important to predict the credibility of the information in a tweet. Gupta & Kumaraguru adopted a supervised machine learning and relevance feedback approach, using a set of tweet features, to rank tweets according to their credibility scores. Weng et al. made use of follower and followee relationships on Twitter and applied an extension of the PageRank algorithm to measure the influence of Twitter users.
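The follower/followee idea can be illustrated with plain PageRank over the follow graph, where each account passes its rank to the accounts it follows. Note that this is standard PageRank, not the topic-sensitive extension of Weng et al.; the graph format, the account names, and the parameter defaults (damping 0.85, 50 iterations) are assumptions for the sketch, and the graph is assumed to be non-empty.

```python
from typing import Dict, List


def pagerank(follows: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank on a follow graph.

    follows[u] lists the accounts that u follows; each account passes its rank
    in equal shares to the accounts it follows, so heavily followed accounts
    accumulate rank. Ranks always sum to 1."""
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Rank held by accounts that follow nobody is spread uniformly.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1.0 - damping + damping * dangling) / n for u in users}
        for u in users:
            followees = follows.get(u, [])
            for v in followees:
                new_rank[v] += damping * rank[u] / len(followees)
        rank = new_rank
    return rank


# Hypothetical follow graph: three accounts follow a celebrity.
follows = {
    "alice": ["celebrity"],
    "bob": ["celebrity", "alice"],
    "carol": ["celebrity"],
    "celebrity": [],
}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))
```

A rank like this could serve as the per-account credibility weight when aggregating tweet sentiment, although influence and credibility are not the same thing.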
Future Work 3. Applying sentiment analysis to Twitter. The problem with general sentiment analysis algorithms is that most of them rely on simple terms that express sentiment about a product or service. Such term-based approaches struggle because cultural factors (including Web culture), their related linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a positive or negative sentiment. Therefore, to determine the sentiment of tweets within Twitter and the app domain, we will have to learn a domain-specific model that predicts sentiment scores for new tweets about new apps. To do so, we will need to build and evaluate machine learning algorithms that take in both (i) existing apps and their corresponding numerical ratings, and (ii) existing tweets and the words used in them, and that learn a mapping between rating scores and words. That way, when a new app is mentioned in a new tweet, a rating for the new app can be predicted.
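A minimal sketch of learning a words-to-ratings mapping, assuming each training tweet about an existing app is labeled with that app's numerical rating: a bag-of-words linear regression fitted by stochastic gradient descent on squared error. The toy examples, learning rate, and epoch count are made up; a real model would need far more data, regularization, and held-out evaluation.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Model = Tuple[float, Dict[str, float]]  # (bias, per-word weights)


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts from a deliberately naive tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def score(model: Model, feats: Counter) -> float:
    """Linear score: bias plus weighted word counts (unknown words count 0)."""
    bias, weights = model
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())


def train(examples: List[Tuple[str, float]], epochs: int = 200, lr: float = 0.05) -> Model:
    """Fit rating ~ bias + sum(weight[word] * count) by SGD on squared error.
    Each example pairs a tweet about an existing app with that app's rating."""
    bias = sum(r for _, r in examples) / len(examples)
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for text, rating in examples:
            feats = bag_of_words(text)
            error = score((bias, weights), feats) - rating
            bias -= lr * error
            for word, count in feats.items():
                weights[word] = weights.get(word, 0.0) - lr * error * count
    return bias, weights


def predict_rating(model: Model, tweet: str) -> float:
    """Predict a star rating for a tweet about a new app, clipped to 1-5."""
    return max(1.0, min(5.0, score(model, bag_of_words(tweet))))


examples = [
    ("love this game so addictive", 5.0),
    ("great app love it", 5.0),
    ("crashes all the time terrible", 1.0),
    ("terrible waste of money", 1.0),
]
model = train(examples)
print(predict_rating(model, "love it great game"))
print(predict_rating(model, "terrible crashes"))
```

Words never seen in training get weight 0 and leave the prediction at the bias, which is one reason such a model would need a large, domain-specific training set.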
Future Work 4. Mapping Twitter profiles to user profiles in the App Store. Unlike Twitter profiles, user profiles in the App Store are not very active; in fact, based on our findings, an average Apple App Store user rates only 3 to 10 apps. To produce personalized recommendations (driven by the Social Web) for these existing App Store users, we will need a method for mapping Twitter profiles to App Store user profiles. Then, when a Twitter user posts something positive about a new app, our recommender system would be able to recommend that app to the existing App Store users who share a similar profile with that Twitter user.
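One possible mapping, sketched under the assumption that a Twitter user's tweets have already been converted into inferred per-app ratings (for example, via a sentiment model such as the one described in issue 3): compare that inferred profile with each App Store user's ratings on the apps they have in common, and notify sufficiently similar users when the Twitter user endorses a new app. The similarity measure (one minus the scaled mean absolute rating difference), the overlap requirement, the threshold, and the profile data are hypothetical choices; with only 3 to 10 ratings per App Store user, overlaps will often be tiny.

```python
from typing import Dict, List

Profile = Dict[str, float]  # app name -> rating on a 1-5 scale


def profile_similarity(a: Profile, b: Profile, min_overlap: int = 2) -> float:
    """Similarity in [0, 1] from the mean absolute rating difference on shared
    apps. Returns 0.0 when the profiles share fewer than min_overlap apps."""
    shared = set(a) & set(b)
    if len(shared) < min_overlap:
        return 0.0
    mad = sum(abs(a[app] - b[app]) for app in shared) / len(shared)
    return 1.0 - mad / 4.0  # 4 is the largest possible gap on a 1-5 scale


def users_to_notify(twitter_profile: Profile, store_users: Dict[str, Profile],
                    threshold: float = 0.75) -> List[str]:
    """App Store users similar enough to a Twitter user to receive that
    user's new-app endorsements, most similar first."""
    scored = [(profile_similarity(twitter_profile, p), uid)
              for uid, p in store_users.items()]
    return [uid for sim, uid in sorted(scored, reverse=True) if sim >= threshold]


# Hypothetical inferred Twitter profile and App Store rating profiles.
twitter_profile = {"app_a": 5.0, "app_b": 4.0, "app_c": 2.0}
store_users = {
    "u1": {"app_a": 5.0, "app_b": 5.0},
    "u2": {"app_a": 1.0, "app_c": 5.0},
    "u3": {"app_b": 4.0},
}
print(users_to_notify(twitter_profile, store_users))
```

Here only `u1` agrees closely enough with the Twitter user, while `u3` is skipped because a single shared app is too little evidence under the assumed overlap rule.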
Conclusion Build recommender systems for App Stores. Predict unknown ratings for apps (especially new apps), i.e., tackle the cold-start problem. Use real-time, social information to drive recommendations. Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.
Thank You
Q & A