Using Social Media to Drive Recommender Systems for Mobile Apps - GRP Presentation - Jovian Lin (A0026542M)

1 Using Social Media to Drive Recommender Systems for Mobile Apps - GRP Presentation - Jovian Lin (A0026542M)

2 Structure of Presentation. Introduction: Why Recommender Systems (RS)?, Problems in Recommending, Our Research. Related Work: Collaborative Filtering (memory-based CF, model-based CF), Content-based Filtering, Hybrid Recommender Systems; each is covered as Introduction, Advantages/Disadvantages, Techniques. Preliminary Work: Summary of recent CIKM '12 Submission. Future Work. Conclusion.


4 Invasion of the Mobile Apps. Mobile apps are soaring in popularity. By 2015, mobile app development projects will outnumber native PC projects by 4 to 1. Mobile devices will outnumber traditional computers by 2 to 1 in a network. 85 billion mobile app downloads; 185 billion by 2014.

5 Information Overload. The abundance of information (on the Web) and its dynamic, heterogeneous nature make it increasingly difficult to find what we want, in a manner that best meets our requirements. Consequence: user modeling and personalized information access play a crucial role, i.e., users need personalized support in sifting through large amounts of available information, according to their interests and tastes.


8 App-splosion. Finding a relevant app is like looking for a needle in a haystack. Three ways to discover apps: 1. Keyword search: smaller/untrusted text descriptions; the user's intent is unclear (e.g., "new hottest games"); users may not know how to express their query effectively. 2. View a list of apps (e.g., top 20 most popular apps): scrolling through the various lists is like visiting an urban flea market; not personalized. 3. Recommender Systems.

9 Recommender Systems. Definition: Recommender systems attempt to alleviate users' information overload by filtering out items that are not relevant to the users' interests. The recommendation problem is defined as: estimating the response of a user for new items based on historical information stored in the system, and suggesting to this user novel and original items for which the predicted response is high.

10 Recommender Systems: Collaborative Filtering (CF), Content-based Filtering, Hybrid.


12 Collaborative filtering (CF) recommends to the active user the items that other users (with similar tastes) liked in the past. The similarity in taste of two users is calculated based on their rating history. Also called people-to-people correlation, CF is considered to be the most popular and widely implemented technique in RS.

13 Content-based filtering (CBF) systems make recommendations by analyzing the content of textual information and finding regularities in the content. CBF can be seen as an extension of the work done on information filtering.

14 The major difference between CF and CBF: collaborative filtering systems use only user-item ratings data to make predictions and recommendations, whereas content-based systems rely on the features of users and items for predictions.

15 Hybrid Recommender Systems are based on the combination of CF and CBF. They try to avoid the limitations of either approach and thereby improve recommendation performance. E.g., a simple Hybrid RS may switch between using CF and CBF algorithms depending on the availability of user ratings.
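A minimal sketch of the switching idea, assuming a hypothetical rating-count threshold (`min_ratings`) and two stand-in predictor callables (`cf_predict`, `cbf_predict`). None of these names come from the slides; they only illustrate the "switch on rating availability" rule.

```python
def switching_hybrid(item, rating_counts, cf_predict, cbf_predict, min_ratings=5):
    """Use CF when the item has enough ratings, otherwise fall back to CBF.

    min_ratings is an illustrative threshold, not a recommended value.
    """
    if rating_counts.get(item, 0) >= min_ratings:
        return "CF", cf_predict(item)
    return "CBF", cbf_predict(item)


# Stand-in predictors returning fixed scores, for illustration only.
rating_counts = {"old_app": 120, "new_app": 0}
cf = lambda item: 4.2
cbf = lambda item: 3.7

print(switching_hybrid("old_app", rating_counts, cf, cbf))  # routed to CF
print(switching_hybrid("new_app", rating_counts, cf, cbf))  # routed to CBF
```

A real system would also need to decide what happens when a new user (rather than a new item) arrives; the sketch covers only the item side.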


18 Collaborative Filtering (CF): Memory-based CF, Model-based CF. Content-based Filtering. Hybrid. Context-aware.

19 Recommender Systems: Collaborative Filtering (CF), either memory-based/neighborhood-based or model-based; Content-based Filtering; Hybrid.


23 Problem: Data Sparsity. In practice, many commercial RS are used to evaluate very large product sets. This causes the user-item matrix to be extremely sparse, which affects the performance of recommendations. To make things worse, new items or users do not have past ratings. This is often termed the cold-start problem. In the domain of apps, the number of new ratings cannot keep up with the growing number of new apps. Our solution: Utilize data from the social web to drive recommender systems.

24 The Social Web: A New Treasure Trove

25 Why is the Social Web Important? The Internet has reached critical mass in the developed world. Most real-world relationships can be supported in the online world. Web 2.0 makes real-time and online interactions possible, i.e., we have user-generated content (UGC). The proliferation of mobile devices that are connected to mobile networks is accelerating innovation and further enabling real-time services and networks.

26 Our Research. Build recommender systems for App Stores. Predict unknown ratings for apps (especially new apps), i.e., tackle the issue of cold-start. Use real-time, social information to drive recommendations. Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.


31 Collaborative Filtering: Introduction. Collaborative filtering (CF) recommends to the active user the items that other users (with similar tastes) liked in the past. The similarity in taste of two users is calculated based on their rating history. E.g., if Alice and Bob both like Item X, and Alice likes Item Y, then Bob is more likely to like Item Y.

32 Collaborative Filtering: Introduction (continued). Example rating histories: Mark 5/5 4/5 4/5? 3/5? Sergey 4/5 4/5 5/5 3/5

33 Collaborative Filtering: Introduction (continued). CF algorithms are based on the quality of items as evaluated by peers, instead of relying on content (which may be a bad indicator of quality), e.g., fake apps in App Stores. Unlike content-based systems, CF systems can recommend items with very different content, as long as other users have already shown interest in these different items.

34 Collaborative Filtering: Advantages & Disadvantages. Advantages: (1) Doesn't require content: especially useful in domains where content analysis is difficult or costly. (2) Doesn't require domain knowledge: independent of content; only needs ratings (or any other information about users' preferences). (3) Able to find novel items: unlike content-based filtering, the recommended items may be dissimilar in content. Disadvantages: (1) Cold-start problem: when new users or items enter the system, they have no ratings; as a result, the system cannot generate any recommendations. (2) Data sparsity: even after acquiring more ratings from the users, sparsity of the user-item matrix can still be a problem for CF.


38 Memory-based Collaborative Filtering: Introduction. The user-item ratings stored in the system are directly used to predict ratings for new items. There are two approaches: user-based and item-based. User-based approaches evaluate the interest of a user u for an item i using the ratings for this item by other users, called neighbors, that have similar rating patterns. Item-based approaches predict the rating of a user u for an item i based on the ratings of u for items similar to i.

39 Memory-based Collaborative Filtering: Introduction. In the user-item rating matrix, the user-based approach looks at the rows; the item-based approach looks at the columns.

40 Memory-based Collaborative Filtering: Introduction. The similarity between two items depends on the ratings given to the items by users who have rated both of them. Item-item similarity is computed by looking at co-rated items only. For items $i$ and $j$, the similarity $s_{ij}$ is computed from the ratings of the users who rated both. Popular similarity measures include (i) Pearson correlation-based similarity and (ii) adjusted cosine similarity.
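A minimal sketch of adjusted cosine similarity between two items, using toy ratings (all user names, item names, and values are made up). Each rating is centered on the rating user's own mean before the cosine is taken, and only users who rated both items contribute.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. Hypothetical data for illustration.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity of items i and j over co-rating users."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            # Center on this user's mean rating to remove rating-scale bias.
            mean = sum(user_ratings.values()) / len(user_ratings)
            di, dj = user_ratings[i] - mean, user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-raters, or no variation to compare
    return num / (sqrt(den_i) * sqrt(den_j))


print(adjusted_cosine(ratings, "A", "B"))  # positive: rated alike
print(adjusted_cosine(ratings, "A", "C"))  # negative: rated oppositely
```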

41 Memory-based Collaborative Filtering: Introduction. Once we can calculate the similarity between items, we can predict a rating as a weighted sum: the user's ratings for similar items, weighted by item similarity. With the predicted ratings, Top-N recommendations are easily generated.
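The weighted-sum idea can be sketched as follows, assuming item-item similarities have already been computed. The function name, the dictionary layout, and the numbers are illustrative assumptions.

```python
def predict_weighted_sum(user_ratings, similarities, target):
    """Predict a rating for `target` as a similarity-weighted average.

    user_ratings: {item: rating} for items the user has already rated.
    similarities: {(target, item): similarity score}.
    Returns None when no rated item has a known similarity to the target.
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        s = similarities.get((target, item), 0.0)
        num += s * rating
        den += abs(s)
    return num / den if den else None


# Hypothetical user who rated A and B; we predict the unseen item C.
user_ratings = {"A": 5, "B": 3}
similarities = {("C", "A"): 0.8, ("C", "B"): 0.2}
prediction = predict_weighted_sum(user_ratings, similarities, "C")
print(prediction)
```

Sorting all unseen items by their predicted rating and keeping the first N gives the Top-N list.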

42 Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms

43 Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms. 1) Default voting. In many CF systems, pairwise similarity is computed only from ratings in the intersection of the items that both users have rated. Focusing on intersection-set similarity neglects the global rating behavior reflected in a user's entire rating history. Default voting: 1. Use the average of the clique (or small group) as the default vote to extend each user's rating history. 2. Use a neutral or (somewhat) negative preference for the unobserved ratings.

44 Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms. 2) Inverse User Frequency. Idea: universally liked items are not as useful in capturing similarity as less common items. The inverse frequency is defined as $f_j = \log(n / n_j)$, where $n$ is the total number of users and $n_j$ is the number of users who rated item $j$. If everyone has rated item $j$, then $f_j$ is zero.
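A small sketch of the inverse user frequency formula on toy data (user and item names are made up). It uses the natural logarithm; the slide does not state the base, so that choice is an assumption.

```python
from math import log


def inverse_user_frequency(ratings, item):
    """f_j = log(n / n_j); returns None if nobody has rated the item."""
    n = len(ratings)  # total number of users
    n_j = sum(1 for user_ratings in ratings.values() if item in user_ratings)
    return log(n / n_j) if n_j else None


# Hypothetical data: everyone rated "popular", only one user rated "niche".
ratings = {
    "u1": {"popular": 5, "niche": 4},
    "u2": {"popular": 4},
    "u3": {"popular": 5},
    "u4": {"popular": 3},
}

print(inverse_user_frequency(ratings, "popular"))  # rated by all users
print(inverse_user_frequency(ratings, "niche"))    # rated by one user
```

The weight is typically multiplied into each rating before similarities are computed, so that agreement on rarely rated items counts for more.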

45 Memory-based Collaborative Filtering: Extensions to Memory-based Algorithms. 3) Imputation. Idea: fill in missing ratings to make the user-item ratings matrix dense, e.g., by using the average ratings for the user and the item. However: 1. Imputation can be very expensive as it significantly increases the amount of data. 2. Inaccurate imputation might distort the data.
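One possible reading of "average ratings for user and item" is to fill each missing cell with the mean of the user's average and the item's average. The sketch below assumes that reading; the function name and the fallback to a global mean are illustrative choices.

```python
def impute_mean(ratings, items):
    """Return a dense copy of `ratings` with missing cells filled in.

    A missing rating becomes the average of the user's mean rating and the
    item's mean rating; the global mean is used when either is unavailable.
    """
    all_values = [v for row in ratings.values() for v in row.values()]
    global_mean = sum(all_values) / len(all_values)
    item_values = {i: [row[i] for row in ratings.values() if i in row] for i in items}
    dense = {}
    for user, row in ratings.items():
        user_mean = sum(row.values()) / len(row) if row else global_mean
        dense[user] = {}
        for i in items:
            if i in row:
                dense[user][i] = row[i]  # keep observed ratings untouched
            else:
                vals = item_values[i]
                item_mean = sum(vals) / len(vals) if vals else global_mean
                dense[user][i] = (user_mean + item_mean) / 2
    return dense


# Hypothetical sparse matrix: u2 has not rated item B.
sparse = {"u1": {"A": 4, "B": 2}, "u2": {"A": 5}}
dense = impute_mean(sparse, ["A", "B"])
print(dense["u2"]["B"])
```

The dense copy has users × items entries regardless of how few ratings were observed, which is the cost the slide warns about.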


49 Model-based Collaborative Filtering: Introduction. Model-based approaches use the ratings to learn a predictive model. General idea: model the user-item interactions with factors representing latent characteristics of the users and items in the system. This model is then trained using the available data, and later used to predict ratings of users for new items. Model-based approaches for recommending are numerous, e.g., Bayesian Networks, Clustering, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Support Vector Machines (SVM), and Singular Value Decomposition (SVD) / Matrix Factorization.

50 Model-based Collaborative Filtering: Techniques used in Model-based CF. 1) Clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering methods can be classified into 3 categories: 1. Partitioning methods 2. Hierarchical methods 3. Density-based methods. In most situations, clustering is an intermediate step and the resulting clusters are used for further analysis. Clustering can be applied in many ways. For example, Sarwar et al. used clustering to partition the data into clusters, and then used a memory-based CF algorithm (such as Pearson correlation) to make predictions within each cluster.

51 Model-based Collaborative Filtering: Techniques used in Model-based CF. 1) Clustering. Advantages: better scalability than typical CF methods, as predictions are made within much smaller clusters rather than over the entire database; clustering computation can be run offline. Disadvantages: recommendation quality is generally low.

52 Model-based Collaborative Filtering: Techniques used in Model-based CF. 2) Latent Semantic CF Models. A latent semantic CF technique relies on a statistical modeling technique that introduces latent class variables in a mixture model setting. This allows it to discover user communities and prototypical interest profiles. Conceptually, it decomposes user preferences using overlapping user communities. It has higher accuracy and scalability than standard memory-based CF. E.g., the aspect model by Hofmann & Puzicha, a probabilistic latent-space model which models individual ratings as a convex combination of rating factors.

53 Model-based Collaborative Filtering: Techniques used in Model-based CF. 3) Matrix Factorization. Map both users and items to a joint latent factor space of dimensionality $f$. User-item interactions are modeled as inner products in that space. Each item $i$ is associated with a vector $q_i$, while each user $u$ is associated with a vector $p_u$. $q_i$ measures the extent to which the item possesses those factors. $p_u$ measures the extent of interest the user has in items that are high on the corresponding factors. The resulting dot product $q_i^T p_u$ captures the interaction between user $u$ and item $i$, i.e., the user's overall interest in the item's characteristics. The estimate of user $u$'s rating for item $i$: $r_{ui} = q_i^T p_u$.
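A minimal sketch of the inner-product model, with the factor vectors learned by plain stochastic gradient descent on the observed ratings. The slide does not say how $q_i$ and $p_u$ are fitted, so the SGD training loop, the hyperparameters (`f`, `steps`, `lr`, `reg`), and the toy data are all assumptions made for illustration.

```python
import random


def dot(a, b):
    """Inner product q_i^T p_u of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, f=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit user vectors p and item vectors q to (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(0.1, 0.5) for _ in range(f)] for u in users}
    q = {i: [rng.uniform(0.1, 0.5) for _ in range(f)] for i in items}
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - dot(q[i], p[u])  # error of the current estimate
            for k in range(f):
                pu, qi = p[u][k], q[i][k]
                # Gradient step on squared error with L2 regularization.
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q


# Hypothetical observed ratings (user, item, rating).
ratings = [
    ("alice", "X", 5), ("alice", "Y", 4),
    ("bob", "X", 5), ("bob", "Z", 1),
    ("carol", "Y", 4), ("carol", "Z", 2),
]
p, q = train_mf(ratings)
for u, i, r in ratings:
    print(u, i, r, round(dot(q[i], p[u]), 2))
# Estimate for a pair that was never observed:
print("alice", "Z", round(dot(q["Z"], p["alice"]), 2))
```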

54 Model-based Collaborative Filtering: Techniques used in Model-based CF. 3) Matrix Factorization. Capture the latent relationships between users and items. Use SVD to factorize the ratings matrix $R$, obtaining $Q$, $S$, and $P$, i.e., $R = Q S P^T$. Reduce the matrix $S$ (a diagonal matrix) to dimension $k$. This produces a low-dimensional representation of the original rating matrix. Compute two resultant matrices: 1. $Q_k \sqrt{S_k}$ (whose rows are the $q_i^T$) 2. $\sqrt{S_k} P_k^T$ (whose columns are the $p_u$). The resultant matrices can be used to compute the recommendation score for any user and item. To predict a rating, calculate the dot product of the $i$-th row of the first matrix and the $u$-th column of the second, i.e., $r_{ui} = q_i^T p_u$.
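As a worked check, assume the two resultant matrices split the singular values evenly, i.e., $Q_k S_k^{1/2}$ and $S_k^{1/2} P_k^T$ (an assumption; this is the split commonly used in SVD-based CF). Their product then recovers the rank-$k$ approximation of $R$, which is why a single row-column dot product yields the predicted rating:

```latex
\begin{aligned}
R \approx R_k &= Q_k S_k P_k^{T}
  = \left(Q_k S_k^{1/2}\right)\left(S_k^{1/2} P_k^{T}\right), \\
r_{ui} &= \left(Q_k S_k^{1/2}\right)_{i,:}\,\left(S_k^{1/2} P_k^{T}\right)_{:,u}
  = q_i^{T} p_u .
\end{aligned}
```

Here $(\cdot)_{i,:}$ denotes the $i$-th row and $(\cdot)_{:,u}$ the $u$-th column; both are notation introduced for this check.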


59 Content-based Filtering: Introduction. Systems implementing a content-based recommendation approach: 1. Analyze a set of documents and/or descriptions of items previously rated by a user. 2. Build a model or profile of user interests based on the features of the objects rated by that user. The profile is a structured representation of user interests, used to recommend new interesting items. The recommendation process consists of matching the attributes of the user profile against the attributes of a content object. Result: a relevance judgment that represents the user's level of interest in that object.

60 Content-based Filtering: Introduction. [Figure: architecture of a content-based recommender. Components: CONTENT ANALYZER (item descriptions from the information source become structured item representations), PROFILE LEARNER (User A's training examples and feedback become User A's profile), and FILTERING COMPONENT (matches User A's profile against represented items to produce a list of recommendations). The active user's implicit or explicit feedback flows back into the system.]

61 Content-based Filtering: Advantages & Disadvantages. Advantages: User independence: unlike CF, does not depend on other users. Transparency: explanations can be provided by listing content features. New item: can recommend items that have not received ratings. Disadvantages: Limited content analysis: the analyzed content may not be sufficient to define distinguishing aspects of items that turn out to be necessary for the elicitation of user interests. Over-specialization (serendipity problem): recommendations have a limited degree of novelty. New user: a new user has given no ratings from which a profile could be built.

62 Content-based Filtering: State of the art CBF systems. Here, we describe alternative item representation techniques, as well as recommendation algorithms suitable for the described representations. In most CBF systems, item descriptions are textual features. Textual features create a number of complications when learning a user profile, due to natural language ambiguity. Polysemy: the presence of multiple meanings for one word. Synonymy: multiple words with the same meaning. Semantic analysis and its integration in personalization models is an innovative approach to solve these problems. Key idea: obtain a semantic interpretation of the user's information needs.


65 Content-based Filtering: State of the art CBF systems. Keyword-based Vector Space Model (VSM). Most CBF systems use simple retrieval models or VSM with basic TF-IDF weighting. Each document is represented by a vector in an n-dimensional space. Each dimension corresponds to a term from the overall vocabulary. $T = \{t_1, t_2, \ldots, t_n\}$ represents the overall vocabulary (aka dictionary). $T$ is obtained by applying standard NLP operations, e.g., tokenization, stop-word removal, and stemming. $d_j = \{w_{1j}, w_{2j}, \ldots, w_{nj}\}$, where $w_{kj}$ is the weight for term $t_k$ in document $d_j$. TF-IDF is the most common weighting scheme. It rests on three assumptions: rare terms are not less relevant than frequent terms (IDF assumption); multiple occurrences of a term in a document are not less relevant than single occurrences (TF assumption); long documents are not preferred to short documents (normalization assumption).

66 Content-based Filtering: State of the art CBF systems. Keyword-based Vector Space Model (VSM). Most CBF systems use simple retrieval models or VSM with basic TF-IDF weighting. Each document is represented by a vector in an n-dimensional space. To measure the closeness between 2 documents, we use the cosine similarity measure. In CBF, both user profiles and items are represented as weighted term vectors.
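A minimal sketch of TF-IDF weighting and cosine similarity on three toy app descriptions. The descriptions are invented, tokenization is a plain whitespace split (no stop-word removal or stemming), and the TF term is normalized by the most frequent term in each document; these are simplifying assumptions rather than the exact scheme of any particular system.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical app descriptions.
docs = {
    "app1": "photo filter camera share",
    "app2": "camera photo edit",
    "app3": "weather forecast rain",
}


def tfidf_vectors(docs):
    """Map each document to a sparse {term: TF-IDF weight} vector."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        max_tf = max(tf.values())
        vectors[name] = {
            t: (c / max_tf) * log(n_docs / df[t]) for t, c in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = tfidf_vectors(docs)
print(cosine(vectors["app1"], vectors["app2"]))  # share "photo" and "camera"
print(cosine(vectors["app1"], vectors["app3"]))  # no terms in common
```

A user profile can be placed in the same space (e.g., as a weighted combination of the vectors of liked items) and compared to candidate items with the same `cosine` function.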

67 Content-based Filtering: State of the art CBF systems. Some unique keyword-based systems: Incorporate a mechanism for temporal decay, i.e., the system ages the interest expressed by the user. Maintain a separate interest profile for each of a few different topics, e.g., National, World, Business, etc.; in YourNews, the user interest profile for each topic is represented as a weighted prototype term vector extracted from the user's news view history. Have short-term and long-term models: NewsDude learns a short-term user model based on TF-IDF, and a long-term model based on a naïve Bayesian classifier. For domains that are not inherently text-based (e.g., movies): INTIMATE and Movies2GO use movie synopses. FOAFing the Music utilizes user profiles, music-related RSS feeds, and content-based descriptions extracted from the audio itself.

68 Content-based Filtering: State of the art CBF systems. Unfortunately, when more advanced characteristics are required, keyword-based approaches show their limitations. E.g., given an interest in "French impressionism", keyword-based approaches may find only documents containing "French" and "impressionism". Documents about Claude Monet that do not use those words will not appear in the recommendations.

69 Content-based Filtering: State of the art CBF systems. Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Possible ways: Ontologies, which provide RS with the cultural and linguistic background.

70 Content-based Filtering: State of the art CBF systems. Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Possible ways: Encyclopedic Knowledge Sources. Explicit Semantic Analysis (ESA) is a technique able to provide a fine-grained semantic representation of natural language texts in a high-dimensional space of natural (and comprehensible) concepts derived from Wikipedia. It is inspired by the desire to augment text representation with massive amounts of world knowledge. In fact, Wikipedia has been used to estimate similarity between movies, in order to provide more accurate predictions for the Netflix Prize competition.

71 Content-based Filtering: State of the art CBF systems. Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Possible ways: Topic Models. Topic modeling algorithms are used to discover a set of topics from a large collection of documents. A topic is a distribution over terms that is biased around those associated under a single theme. Topic models provide an interpretable low-dimensional representation of the documents. Documents are represented as a distribution of topics.


73 Content-based Filtering: State of the art CBF systems. Therefore, more advanced representation strategies are needed in order to equip CBF systems with semantic intelligence. Example: MobileWalla. MobileWalla (MW) is an independent, unbiased search engine for mobile apps with semantic search capabilities. It has an objective app rating and scoring mechanism based on user and developer involvement with an app. This scoring mechanism enables MW to provide a number of other ways to discover apps, such as dynamically maintained "hot" lists and "fast rising" lists.

74 Content-based Filtering: Trends and Future Research. Utilizing user-generated content (UGC) in the recommendation process. Web 2.0. Folksonomy: a taxonomy generated by users who collaboratively annotate and categorize resources of interest. Hashtags. Tagging.


78 Hybrid Recommender Systems: Introduction. In order to cope with data sparsity, it is essential to resort to external sources of information. This is mandatory when dealing with the cold-start problem of new users and/or new items. Why? Because the absence of ratings hinders the possibility of using CF techniques that rely exclusively on rating information.

79 Hybrid Recommender Systems: Techniques used in Hybrid Systems. Combining CF and CBF: Fab recommender by Balabanovic & Shoham: maintains user profiles of interest in Web pages using content-based techniques; uses CF techniques to identify profiles with similar tastes. Filterbots: filterbots act as artificial users that rate items using certain criteria. A jazzbot will give full marks to a CD because it is in the jazz category. A fighting-game-bot will give full marks to the iOS app Street Fighter 4 Turbo. Ratings generated by bots are injected into the user-item matrix. Standard CF algorithms are then applied to generate recommendations. Similar: Imputed Neighborhood Based Collaborative Filtering.
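The filterbot idea can be sketched as follows, assuming a simple category-based bot and a dictionary-of-dictionaries user-item matrix. The helper name `make_filterbot`, the item IDs, and the 5-point "full marks" value are illustrative assumptions.

```python
def make_filterbot(name, category, full_marks=5):
    """Create an artificial user that gives full marks to one category."""
    def rate(items):
        # items: {item_id: category}
        return {item: full_marks for item, cat in items.items() if cat == category}
    return name, rate


# Hypothetical catalogue and a sparse user-item matrix with one real user.
items = {"cd_1": "jazz", "cd_2": "pop", "cd_3": "jazz"}
ratings = {"alice": {"cd_2": 4}}

# Inject the bot's ratings into the user-item matrix as one more "user".
bot_name, bot_rate = make_filterbot("jazzbot", "jazz")
ratings[bot_name] = bot_rate(items)

print(ratings["jazzbot"])
```

After injection, a standard CF algorithm runs unchanged: a real user whose ratings correlate with `jazzbot` gets the bot as a neighbor, even for items that no human has rated yet.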

80 Hybrid Recommender Systems: Techniques used in Hybrid Systems. Using external sources. Emotions: "Handling Data Sparsity in CF using Emotion and Semantic Based Features" by Yashar Moshfeghi & Joemon Jose uses a combination of item-related emotions, semantic data, and LDA to recommend movies. Profiles from the Social Web: Liu et al. captured and mapped profiles of social web services to a "Taste Fabric" using ontologies of books, music, movies, etc. These profiles can be used as pseudo users (something like Filterbots). Tags (#hashtags), UGC (blogs, tweets), written reviews (IMDB, blog comments, etc.): a number of works have utilized the structure of follower/followee relationships on Twitter, together with the textual content of tweets, to find similar users. Context information: Rating = R(User, Item, Context). E.g., movie recommenders can use additional data such as time, place, and company (e.g., girlfriend, boyfriend, siblings). Techniques that help incorporate context include (i) Markov Chain Monte Carlo (MCMC) techniques, (ii) SVMs, and (iii) Factorization Machines (FMs).


83 Preliminary Work (CIKM '12): Predicting Ratings for New Mobile Apps by Combining Collaborative Filtering & Topic Modeling. We propose a method that mitigates the cold-start problem by combining collaborative filtering and topic modeling. To predict the rating of an item for a given user, our approach learns a model that can correlate similar apps (based on user ratings alone) with multi-faceted content (such as descriptions, categories, price, and company information of apps). We utilize: (i) clustering, and (ii) a supervised variant of LDA.

84 Step 1: Calculate similarities between existing apps and generate soft clusters. [Figure: panels i to iv showing apps as nodes grouped into Cluster 1, Cluster 2, and Cluster 3.] i) Similarities between apps (shown as nodes) are calculated based on user ratings (i.e., memory-based collaborative filtering). ii) Apps are clustered based on the calculated similarity scores. iii) Soft clustering allows an app to be assigned to more than one cluster. iv) Eventually, each cluster is labeled with a cluster ID.

85 Step 2: Use soft-clustered information and app categories as labels in Labeled LDA to generate a probability distribution of labels for each app. [Figure: cluster IDs (Cluster 1, Cluster 2, ..., Cluster K), category labels, and item descriptions of apps are fed into Labeled LDA.] We merge the set of cluster IDs (e.g., Cluster 1, Cluster 2) and the set of categorical labels of apps (e.g., Business, Games) to form a new set of labels, $S_{labels}$. The set of new labels $S_{labels}$ and the textual descriptions of items are used as inputs to Labeled LDA. Labeled LDA allows us to represent each item as a probability distribution of topics (or labels).

86 Step 3: Create scalable neighborhoods using incremental clustering, and predict ratings for new apps. We calculate the similarity between apps based on each app's probability distribution (from Labeled LDA), and form clusters based on the computed similarity scores between the apps. When a new app (shown as the square) arrives, the neighborhood for the new app is selected by looking into the cluster that it is closest to. The predicted rating of the new app is then calculated based on the neighborhood of apps.
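A rough sketch of Step 3 under several assumptions that the slides do not specify: label distributions are compared with cosine similarity, the closest cluster is chosen by comparing the new app to each cluster's centroid, and the prediction is a similarity-weighted average of one user's known ratings inside that cluster. All names and numbers are hypothetical, and this is not presented as the method of the CIKM '12 submission.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two dense label distributions."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict_new_app(new_dist, clusters, label_dists, known_ratings):
    """Pick the closest cluster, then average its ratings by similarity."""
    best = max(
        clusters,
        key=lambda c: cosine(new_dist, centroid([label_dists[a] for a in clusters[c]])),
    )
    num = den = 0.0
    for app in clusters[best]:
        s = cosine(new_dist, label_dists[app])
        num += s * known_ratings[app]
        den += s
    return best, (num / den if den else None)


# Hypothetical label distributions (as if produced by Labeled LDA).
label_dists = {
    "chat_a": [0.8, 0.1, 0.1], "chat_b": [0.7, 0.2, 0.1],
    "game_a": [0.1, 0.8, 0.1], "game_b": [0.1, 0.7, 0.2],
}
clusters = {"c1": ["chat_a", "chat_b"], "c2": ["game_a", "game_b"]}
known_ratings = {"chat_a": 4.5, "chat_b": 4.0, "game_a": 2.0, "game_b": 3.0}

cluster_id, rating = predict_new_app([0.75, 0.15, 0.10], clusters, label_dists, known_ratings)
print(cluster_id, rating)
```

The new app needs no ratings of its own: only its label distribution, which can be inferred from its description, is required to place it in a neighborhood.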

87 Preliminary Work (CIKM '12): Results. We created a hybrid recommender system by using: content-independent labels (generated through a CF technique), item metadata (content), and topic modeling.

88 Preliminary Work (CIKM '12): Discussion. Apps and movies are different. A rating of 1 for a movie probably means that it is bad; but a rating of 1 for an app could be due to its crash-prone nature, and NOT its content. A rating on the App Store (good or bad) indicates that the user took the effort to download the app. Perhaps, instead of using 1-5 ratings, we should use unary ratings. Unlike movies, apps are constantly evolving. Each new version may: add a new feature (e.g., Retina display support) or fix a bug (e.g., make the app compatible with iOS 5). If we focus on these unique traits of apps, we could come up with something novel.


91 Future Work. Our research is motivated by: the availability of real-time (social) web data, and the type of UGC available to drive recommender systems. We observed that: when a new app is freshly developed and released, it tends to have no ratings for a period of time. However, we can almost never fail to find tweets about newly released mobile applications on Twitter, i.e., the number of tweets about an app is generally much larger than the number of ratings or reviews available on the official App Store. Hence, an interesting way to generate item ratings is to use the sentiments of the written tweets of verified Twitter user accounts to predict would-be ratings for a new mobile application.

92 Future Work
Consider the following scenario:
1. A user hears about a new app, say "Furious Pigs" for the iPhone and iPad, that costs $[price lost in transcription].
2. The user does not know whether it is worth buying, and signs into the App Store.
3. However, he realizes that there are no ratings for the app, which is natural because the app entered the App Store only recently.
4. The user then goes to Twitter and searches for the term "Furious Pigs".
5. Twitter processes the user's query and returns a list of tweets from other users who have mentioned "Furious Pigs".
6. The user reads the tweets.
7. He notices that one of the tweets links to a blog that has reviewed the Furious Pigs app. He clicks on the link and reads the review.
8. He also notices that a local celebrity has tweeted about Furious Pigs.
9. After reading through the tweets and the blog post, the user finds that the overall sentiment toward the app is relatively good.
10. He then downloads Furious Pigs onto his iPhone and iPad.

93 Future Work
The scenario illustrates the following points:
1. When there are insufficient ratings or reviews for an app on the App Store, we can still rely on tweets for app-related information. (see next slide)

94 (slide content not captured in the transcription)

95 Future Work
The scenario illustrates the following points:
1. When there are insufficient ratings or reviews for an app on the App Store, we can still rely on tweets for app-related information.
2. Tweets about apps have a shorter delay, or lag time, than ratings for apps.
3. As Twitter is focused on driving discovery outward to web pages (or even YouTube videos), there is a chance that we can find even more focused content about an app through a tweet's hyperlink.
4. Every Twitter user has a certain credibility score or rank. When a popular person (say, Barack Obama) endorses the app Furious Pigs, there is a high chance that Furious Pigs will see an increase in downloads.

96 Future Work
We want to automate this process of using tweets to enhance personalized recommendations to users.

97 Future Work
To achieve this, we will have to solve at least the following issues:
1. Disambiguation of proper names on Twitter.
2. Twitter credibility measurement.
3. Apply Sentiment Analysis on Twitter.
4. Mapping Twitter profiles to user profiles in the App Store.

98 Future Work
To achieve this, we will have to solve at least the following issues:
1. Disambiguation of proper names on Twitter.
- Naming conflicts arise from semantic overloading of entity names.
- For example, when trying to search for tweets discussing the Facebook iPhone app, we discovered that "Facebook" is overloaded: it could refer to either the app or the website (http://…).
- Therefore, we need a strategy to reliably extract Twitter posts that are related to specific apps, overcoming naming conflicts.
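One naive baseline for this extraction step is sketched below. It is only an illustration, not a strategy proposed in the slides: a tweet is kept if it mentions the app name and also contains at least one cue word suggesting a mobile-app context. The cue list `APP_CUES` and the function name `is_app_related` are assumptions made for this example.

```python
import re
from typing import Iterable

# Assumed cue words hinting that a tweet is about a mobile app rather than a website.
APP_CUES = frozenset({"app", "iphone", "ipad", "ios", "download", "update", "version"})


def is_app_related(tweet: str, app_name: str, cues: Iterable[str] = APP_CUES) -> bool:
    """Return True if the tweet names the app and contains at least one app cue word."""
    text = tweet.lower()
    if app_name.lower() not in text:
        return False
    # Split into lowercase alphanumeric tokens so that "app" does not match "apple".
    tokens = set(re.findall(r"[a-z0-9]+", text))
    return any(cue in tokens for cue in cues)


about_app = is_app_related("The new Facebook app update crashes on my iPhone", "Facebook")
about_site = is_app_related("Saw your photos on Facebook yesterday", "Facebook")
```

A keyword filter like this will miss many relevant tweets and admit some irrelevant ones, so it is best read as a starting point against which a learned disambiguation model could be compared.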

99 Future Work
To achieve this, we will have to solve at least the following issues:
2. Twitter credibility measurement.
- Not all content posted on Twitter is trustworthy or useful in providing information about the query.
- It is important to predict the credibility of the information in a tweet.
- Gupta & Kumaraguru adopted a supervised machine learning and relevance feedback approach using the above features to rank tweets according to their credibility score.
- Weng et al. made use of the follower and followee relationships in Twitter, and applied an extension of the PageRank algorithm to measure the influence of users in Twitter.
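As a rough illustration of graph-based influence scoring, the sketch below runs plain PageRank over a follower graph. It is not the extension used by Weng et al.; it only shows the underlying idea that a user followed by many users (or by influential users) receives a higher score. The graph, the user names, and the parameter values are hypothetical.

```python
from typing import Dict, List


def pagerank(follows: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank where each user passes rank to the accounts they follow."""
    users = set(follows)
    for followees in follows.values():
        users.update(followees)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            followees = follows.get(u, [])
            if followees:
                share = damping * rank[u] / len(followees)
                for v in followees:
                    new_rank[v] += share
            else:
                # A user who follows nobody spreads rank evenly over all users.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical follower graph: key follows every user in the value list.
follows = {"ann": ["celebrity"], "ben": ["celebrity"], "celebrity": ["ann"]}
scores = pagerank(follows)
```

In this toy graph the account followed by both other users ends up with the highest score, and the account with no followers ends up with the lowest.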

100 Future Work
To achieve this, we will have to solve at least the following issues:
3. Apply Sentiment Analysis on Twitter.
- The problem with general sentiment analysis algorithms is that most of them rely on simple terms that express sentiment about a product or service.
- However, cultural factors (including Web culture), their related linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a positive or negative sentiment.
- Therefore, to determine the sentiment of tweets within Twitter and the app domain, we will have to learn a domain-specific model that predicts sentiment scores for new tweets about new apps.
- To do so, we will need to build and evaluate machine learning algorithms that take in both (i) existing apps and their corresponding numerical ratings, and (ii) existing tweets and the words used in the tweets, and learn a mapping between rating scores and words.
- That way, when a new tweet about a new app appears, a rating for the new app can be predicted.
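The mapping between words and rating scores could take many forms. The following is a deliberately simple sketch, assuming training pairs of (tweet text, rating of the app the tweet is about): each word is scored by the mean rating of the tweets it appears in, and a new tweet is scored by averaging the scores of its known words. The function names and the training data are hypothetical, and a real system would likely use a stronger model.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_word_scores(examples: List[Tuple[str, float]]) -> Tuple[Dict[str, float], float]:
    """Score each word by the mean rating of the tweets that contain it."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for tweet, rating in examples:
        for word in set(tokenize(tweet)):
            totals[word] += rating
            counts[word] += 1
    word_scores = {w: totals[w] / counts[w] for w in totals}
    global_mean = sum(r for _, r in examples) / len(examples)
    return word_scores, global_mean


def predict_rating(tweet: str, word_scores: Dict[str, float], global_mean: float) -> float:
    """Average the scores of known words; fall back to the global mean."""
    known = [word_scores[w] for w in set(tokenize(tweet)) if w in word_scores]
    return sum(known) / len(known) if known else global_mean


# Hypothetical training data: tweets about existing apps and those apps' ratings.
training = [
    ("love this game so addictive", 5.0),
    ("crashes constantly total waste", 1.0),
    ("fun game but crashes sometimes", 3.0),
]
word_scores, global_mean = fit_word_scores(training)
new_app_rating = predict_rating("addictive game", word_scores, global_mean)
```

Because the model is trained on app-related tweets only, words such as "crashes" acquire a domain-specific score, which is the point the slide makes about needing a model specific to Twitter and the app domain.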

101 Future Work
To achieve this, we will have to solve at least the following issues:
4. Mapping Twitter profiles to user profiles in the App Store.
- Unlike Twitter profiles, user profiles in the App Store are not very active; in fact, based on our findings, an average Apple App Store user rates only between 3 and 10 apps.
- To produce personalized recommendations (driven by the Social Web) for these existing App Store users, we will need to find a method for mapping Twitter profiles to user profiles in the App Store.
- When a Twitter user posts something positive about a new app, our recommender system would then be able to recommend that new app to the existing App Store users who share a similar profile with the Twitter user.
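One possible way to compare the two kinds of profiles, shown here only as a sketch, is to represent each profile as a set of apps (apps tweeted about positively on one side, apps rated on the other) and to measure overlap with Jaccard similarity. The representation, the threshold, and all names below are assumptions for illustration; the slides do not specify a mapping method.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_store_users(twitter_apps: Set[str],
                        store_profiles: Dict[str, Set[str]],
                        threshold: float = 0.5) -> List[str]:
    """Return App Store users whose rated apps overlap enough with the Twitter user's apps."""
    return sorted(
        user for user, apps in store_profiles.items()
        if jaccard(twitter_apps, apps) >= threshold
    )


# Hypothetical profiles: apps a Twitter user praised, and apps each store user rated.
twitter_apps = {"app_a", "app_b", "app_c"}
store_profiles = {
    "u1": {"app_a", "app_b"},
    "u2": {"app_d"},
    "u3": {"app_a", "app_b", "app_c", "app_d"},
}
targets = similar_store_users(twitter_apps, store_profiles)
```

When the Twitter user later posts something positive about a new app, that app could be recommended to the users in `targets`. With only 3 to 10 rated apps per store user, set overlap will often be sparse, so the threshold would need tuning on real data.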

103 Conclusion
- Build recommender systems for App Stores.
- Predict unknown ratings for apps (especially new apps), i.e., tackle the issue of cold-start.
- Use real-time, social information to drive recommendations.
- Use contextual cues (e.g., location, time, public events, weather) to rank personalized recommendations.

104 Thank You

105 Q & A


Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

More information

Factorization Machines

Factorization Machines Factorization Machines Steffen Rendle Department of Reasoning for Intelligence The Institute of Scientific and Industrial Research Osaka University, Japan rendle@ar.sanken.osaka-u.ac.jp Abstract In this

More information

Topic Extrac,on from Online Reviews for Classifica,on and Recommenda,on (2013) R. Dong, M. Schaal, M. P. O Mahony, B. Smyth

Topic Extrac,on from Online Reviews for Classifica,on and Recommenda,on (2013) R. Dong, M. Schaal, M. P. O Mahony, B. Smyth Topic Extrac,on from Online Reviews for Classifica,on and Recommenda,on (2013) R. Dong, M. Schaal, M. P. O Mahony, B. Smyth Lecture Algorithms to Analyze Big Data Speaker Hüseyin Dagaydin Heidelberg, 27

More information

Social Media Channels and Their Uses

Social Media Channels and Their Uses How and When to Use Social Media Channels to Strategically Support Government Goals October 2012 Prepared by Craig Thomler Managing Director Delib Australia Pty Ltd Email: craig@delib.net.au Phone: 0411

More information

Synchronous and asynchronous video conferencing tools in an online-course:! Supporting a community of inquiry!

Synchronous and asynchronous video conferencing tools in an online-course:! Supporting a community of inquiry! Synchronous and asynchronous video conferencing tools in an online-course:! Supporting a community of inquiry! David Wicks, Seattle Pacific University! Andrew Lumpe, Seattle Pacific University! Janiess

More information

Which universities lead and lag? Toward university rankings based on scholarly output

Which universities lead and lag? Toward university rankings based on scholarly output Which universities lead and lag? Toward university rankings based on scholarly output Daniel Ramage and Christopher D. Manning Computer Science Department Stanford University Stanford, California 94305

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Seman&c Web: Benefits For Clinical Decision Support At The Bedside. Emory Fry, MD SemTechBiz 2013

Seman&c Web: Benefits For Clinical Decision Support At The Bedside. Emory Fry, MD SemTechBiz 2013 Seman&c Web: Benefits For Clinical Decision Support At The Bedside Emory Fry, MD SemTechBiz 2013 Clinical Decision Support (CDS) A system providing knowledge and person specific or popula8on informa8on

More information

Doing Big Data Projects: What s the Best Team Process Methology?

Doing Big Data Projects: What s the Best Team Process Methology? Doing Big Data Projects: What s the Best Team Process Methology? October 2015 1 Executive Summary What s the Best Team Process Methology? September 2015 2 Executive Summary What s the Best Team Process

More information

Scalus Winter School Storage Systems

Scalus Winter School Storage Systems Scalus Winter School Storage Systems Flash Memory André Brinkmann Flash Memory Floa:ng gate of a flash cell is electrically isolated Applying high voltages between source and drain accelerates electrons

More information

BPO. Accerela*ng Revenue Enhancements Through Sales Support Services

BPO. Accerela*ng Revenue Enhancements Through Sales Support Services BPO Accerela*ng Revenue Enhancements Through Sales Support Services What is BPO? Business Process Outsorcing (BPO) is the process of outsourcing specific business func6ons to a third- party service provider

More information

Can Cloud Hos+ng Providers Really Replace. Your Cri(cal IT Infrastructure?

Can Cloud Hos+ng Providers Really Replace. Your Cri(cal IT Infrastructure? Can Cloud Hos+ng Providers Really Replace Your Cri(cal IT Infrastructure? Housekeeping Welcome to Align s Webinar Can Cloud Hos+ng Providers Really Replace Your Cri(cal IT Infrastructure? Informa+on for

More information

Text Analytics. A business guide

Text Analytics. A business guide Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application

More information

Intinno: A Web Integrated Digital Library and Learning Content Management System

Intinno: A Web Integrated Digital Library and Learning Content Management System Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master

More information

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

More information

Informa.on Systems in Organiza.ons

Informa.on Systems in Organiza.ons Informa.on Systems in Organiza.ons MIS 2101 Week 7 / Chapter 7 Enhancing Business Processes Using Enterprise Informa.on Systems Photo: Objet Mathema+que by Man Ray, 1934 Chapter 7 Learning Objec.ves Core

More information

Honeycomb Crea/ve Works is financed by the European Union s European Regional Development Fund through the INTERREG IVA Cross- border Programme

Honeycomb Crea/ve Works is financed by the European Union s European Regional Development Fund through the INTERREG IVA Cross- border Programme Honeycomb Crea/ve Works is financed by the European Union s European Regional Development Fund through the INTERREG IVA Cross- border Programme managed by the Special EU Programmes Body. Web Analy*cs In

More information

Semantically Enhanced Web Personalization Approaches and Techniques

Semantically Enhanced Web Personalization Approaches and Techniques Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering Map- reduce, Hadoop and The communica3on bo5leneck Yoav Freund UCSD / Computer Science and Engineering Plan of the talk Why is Hadoop so popular? HDFS Map Reduce Word Count example using Hadoop streaming

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Crunching Big Data with R And Hadoop!

Crunching Big Data with R And Hadoop! 1 Crunching Big Data with R And Hadoop! Strata/Hadoop World NYC 2012 Flash drives with tutorial materials are near the door, please start downloading the tutorial materials onto your laptop. There is a

More information

Insider s Guide to Digital Media Measurement Sen5ment Analysis Symposium 2015

Insider s Guide to Digital Media Measurement Sen5ment Analysis Symposium 2015 Insider s Guide to Digital Media Measurement Sen5ment Analysis Symposium 2015 Presented By Stephen D. Rappaport, Global Digital Advisor, Sunstar Inc. Senior Consultant SDR Consul5ng E. steve@sdrconsul5ngllc.com

More information