Recommender Systems Seminar, Topic: Application. Thanh Tung Do, TU Darmstadt, 28 January 2014

2 Agenda
- Google News personalization: Scalable Online Collaborative Filtering (algorithms, system components)
- Metaphor: A System for Related Search Recommendation (inputs, implementation, and evaluation)

3 Google News personalization

4 Google News personalization: "Show me something interesting"
- Problem: Google News has a large database of stories -> personalize for signed-in users
- Algorithms: a mix of memory-based and model-based algorithms

6 MinHash
- Similarity -> overlap of click histories
- Jaccard coefficient: $S(u_i, u_j) = \frac{|C_{u_i} \cap C_{u_j}|}{|C_{u_i} \cup C_{u_j}|}$, where $C_u$ is the set of stories clicked by user $u$
- Distance function: $D(u_i, u_j) = 1 - S(u_i, u_j)$
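A minimal Python sketch of the similarity and distance on this slide, assuming each click history is a set of story IDs and the standard Jaccard definition (intersection size over union size). The function names are illustrative and not taken from the talk.

```python
def jaccard_similarity(clicks_i, clicks_j):
    """S(u_i, u_j): overlap of two click histories, as a value in [0, 1]."""
    a, b = set(clicks_i), set(clicks_j)
    union = a | b
    if not union:
        # Assumption: two empty histories are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(clicks_i, clicks_j):
    """D(u_i, u_j) = 1 - S(u_i, u_j)."""
    return 1.0 - jaccard_similarity(clicks_i, clicks_j)


if __name__ == "__main__":
    u_i = {"Story A", "Story B"}
    u_j = {"Story A", "Story C"}
    print(jaccard_similarity(u_i, u_j))  # one shared story out of three clicked
    print(jaccard_distance(u_i, u_j))
```

Computing this for every pair of users does not scale to a large user base, which is why the following slides replace the exact comparison with hashing.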

7 LSH (Locality-Sensitive Hashing)
- Idea: hash the data points with several hash functions -> discover all pairs with similarity greater than s
- Computing the MinHash score:
  - Start with the matrix representation of the sets, with entries in {0, 1}: clicked or not
  - Randomly permute the rows of the matrix
  - The MinHash is the first row with a one
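The permutation procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production method: it shuffles an explicit list of all stories, which only works for small catalogs, and the names `minhash` and `estimate_similarity` are hypothetical. The fraction of permutations on which two users get the same MinHash estimates their Jaccard similarity.

```python
import random


def minhash(clicked, all_stories, seed):
    """Index of the first row with a one after a seeded random row permutation."""
    order = list(all_stories)
    random.Random(seed).shuffle(order)  # same seed -> same permutation for every user
    for row, story in enumerate(order):
        if story in clicked:
            return row
    return None  # the user clicked nothing


def estimate_similarity(clicks_i, clicks_j, all_stories, n_permutations=200):
    """Fraction of permutations on which both users share the same MinHash."""
    matches = 0
    for seed in range(n_permutations):
        h_i = minhash(clicks_i, all_stories, seed)
        h_j = minhash(clicks_j, all_stories, seed)
        if h_i is not None and h_i == h_j:
            matches += 1
    return matches / n_permutations


if __name__ == "__main__":
    stories = ["Story A", "Story B", "Story C", "Story D"]
    u_i = {"Story A", "Story B"}
    u_j = {"Story A", "Story C"}
    # Should be close to the exact Jaccard value of 1/3.
    print(estimate_similarity(u_i, u_j, stories))
```

Users whose MinHash values agree on several permutations can then be placed in the same cluster without comparing every pair directly.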

8 Example

          u_i  u_j
Story A    1    1
Story B    1    0
Story C    0    1
Story D    0    0   -> irrelevant

9 MinHash clustering using MapReduce

11 PLSI (Probabilistic Latent Semantic Indexing)
- Idea: make user and item conditionally independent
- Introduce a hidden variable Z with |Z| = L

12 PLSI with the EM algorithm
- Question: which items are of interest to which users?
- EM learns the maximum-likelihood parameters of the model iteratively
- But EM on a single machine is not feasible -> use MapReduce
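To make the iterative estimation concrete, here is a small single-machine sketch of EM for PLSI in plain Python. It assumes the usual PLSI mixture, p(s|u) = sum over z of p(z|u) p(s|z), and a click log of (user, story) pairs; the function names, the random initialization, and the iteration count are illustrative choices, and the MapReduce sharding mentioned on the slide is not shown.

```python
import random


def plsi_em(clicks, n_topics, n_iter=10, seed=0):
    """Fit p(z|u) and p(s|z) on (user, story) click pairs with EM."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in clicks})
    stories = sorted({s for _, s in clicks})

    def random_distribution(n):
        weights = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z_u = {u: random_distribution(n_topics) for u in users}
    p_s_z = [dict(zip(stories, random_distribution(len(stories))))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # E-step: responsibility of each hidden state z for each click (u, s).
        q = {}
        for u, s in clicks:
            weights = [p_z_u[u][z] * p_s_z[z][s] for z in range(n_topics)]
            total = sum(weights)
            q[(u, s)] = [w / total for w in weights]
        # M-step: re-estimate p(s|z) from the responsibilities.
        for z in range(n_topics):
            total = sum(q[(u, s)][z] for u, s in clicks)
            for story in stories:
                mass = sum(q[(u, s)][z] for u, s in clicks if s == story)
                p_s_z[z][story] = mass / total
        # M-step: re-estimate p(z|u); every user here has at least one click.
        for user in users:
            rows = [q[(u, s)] for u, s in clicks if u == user]
            p_z_u[user] = [sum(r[z] for r in rows) / len(rows)
                           for z in range(n_topics)]
    return p_z_u, p_s_z


def p_story_given_user(p_z_u, p_s_z, user, story):
    """p(s|u) = sum_z p(z|u) * p(s|z)."""
    return sum(p_z_u[user][z] * p_s_z[z][story] for z in range(len(p_s_z)))


if __name__ == "__main__":
    log = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u3", "C"), ("u3", "D")]
    p_z_u, p_s_z = plsi_em(log, n_topics=2)
    print(p_story_given_user(p_z_u, p_s_z, "u2", "B"))
```

Both steps are sums over independent clicks, which is what makes the computation a natural fit for splitting across machines.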

13 MapReducing the EM algorithm

14 Using PLSI with dynamic datasets
- New user / item -> new model -> not real time
- Therefore: Google News approximates PLSI by treating each value of z like a cluster
- Track the activity of each cluster on each story
- User click -> update the click count of the story with respect to the user's clusters -> compute P(s | z) in real time
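A rough Python sketch of the counting idea on this slide, assuming the user's cluster memberships are already known from an earlier offline model and are passed in as a mapping from cluster ID to weight. The class name and its methods are hypothetical, and the estimate shown (story clicks from a cluster divided by all clicks from that cluster) is one plausible reading of the slide rather than the exact formula used by Google News.

```python
from collections import defaultdict


class ClusterClickStats:
    """Per-cluster click counters that can be updated on every click."""

    def __init__(self):
        self.story_counts = defaultdict(lambda: defaultdict(float))
        self.cluster_totals = defaultdict(float)

    def record_click(self, user_clusters, story):
        """Credit a click to every cluster the user belongs to, by weight."""
        for cluster, weight in user_clusters.items():
            self.story_counts[cluster][story] += weight
            self.cluster_totals[cluster] += weight

    def p_story_given_cluster(self, story, cluster):
        """Estimate P(s | z) as the share of the cluster's clicks on the story."""
        total = self.cluster_totals.get(cluster, 0.0)
        if total == 0.0:
            return 0.0
        return self.story_counts[cluster].get(story, 0.0) / total


if __name__ == "__main__":
    stats = ClusterClickStats()
    stats.record_click({"z1": 1.0}, "Story A")
    stats.record_click({"z1": 1.0}, "Story B")
    stats.record_click({"z1": 1.0, "z2": 0.5}, "Story A")
    print(stats.p_story_given_cluster("Story A", "z1"))
```

Because only counters change on a click, new stories become recommendable without refitting the model.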

16 Covisitation
- Covisited stories: clicked by the same user within a short time
- Represented as a graph stored as an adjacency list
- Take the user's recent click history: for each story, find all covisited stories
- Score: the normalized number of covisitations, summed over all items in the history
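The covisitation graph and score can be sketched as follows in Python. The click-log layout (user, story, timestamp), the `window` parameter, and the normalization (dividing by the total covisitation weight of the history items) are assumptions made for this illustration; the slide only states that the score is normalized.

```python
from collections import defaultdict


def build_covisitation(click_log, window):
    """Adjacency list: graph[s1][s2] = number of covisits of s1 and s2."""
    by_user = defaultdict(list)
    for user, story, timestamp in click_log:
        by_user[user].append((timestamp, story))

    graph = defaultdict(lambda: defaultdict(int))
    for clicks in by_user.values():
        clicks.sort()  # order each user's clicks by time
        for i, (t1, s1) in enumerate(clicks):
            for t2, s2 in clicks[i + 1:]:
                if t2 - t1 > window:
                    break  # later clicks are even further away in time
                if s1 != s2:
                    graph[s1][s2] += 1
                    graph[s2][s1] += 1
    return graph


def covisitation_score(graph, history, candidate):
    """Share of the history's covisitation weight that points to the candidate."""
    hits = sum(graph.get(s, {}).get(candidate, 0) for s in history)
    total = sum(sum(graph.get(s, {}).values()) for s in history)
    return hits / total if total else 0.0


if __name__ == "__main__":
    log = [("u1", "A", 0), ("u1", "B", 5),
           ("u2", "A", 0), ("u2", "B", 3), ("u2", "C", 100)]
    graph = build_covisitation(log, window=10)
    print(covisitation_score(graph, ["A"], "B"))
```

Since the graph is updated from click pairs alone, it reflects new stories as soon as they receive clicks.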

17 Candidate generation
Candidates are determined on the basis of:
- News edition
- Language of the user
- Categories set by the user
- Timeliness of the story

18 System components
- Data tables (user ID vs. story ID): user table, story table
  - Cluster information
  - Cluster statistics
  - Click history
  - Covisitation statistics
- Servers: News Frontend (NFE), News Statistics Server (NSS), News Personalization Server (NPS)

19 System setup: recommendation request

20 System setup: update statistics request

21 Metaphor: A System for Related Search Recommendation

22 Metaphor
- Based on a series of signals
- Correlating queries on time, clicks, and content
- Signals:
  1. Collaborative filtering
  2. Query-result-query
  3. Partial matches
  4. Length bias

23 Dataflow

24 1. Input: Collaborative filtering
- Searches done in the same session are treated as related
- Problem: popular queries -> dampen popular queries with a TF-IDF measure

25 2. Input: Query-result-query
- Same query -> different results
- Problem: how to find intersecting result sets?

26 Calculating QRQ
- Prevent bias from a few members
- Aggregate all similar pairs
- Attach a count to each unique pair, C(q, r)
- Aggregate all pairs with the same clicked result r -> G(r), and rank queries by pair count
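The aggregation steps above can be sketched in Python as follows. The log layout (member, query, clicked result) and the function name `qrq_related` are assumptions for illustration. Counting each pair at most once per member is one possible way to limit the influence of a few very active members; the slide does not say how the real system does this.

```python
from collections import Counter, defaultdict


def qrq_related(click_log, query, top_k=5):
    """Rank other queries that led to clicks on the same results as `query`."""
    # C(q, r): number of distinct members who clicked result r after query q.
    pair_counts = Counter((q, r) for (_member, q, r) in set(click_log))

    # G(r): all queries that share the clicked result r, with their counts.
    groups = defaultdict(dict)
    for (q, r), count in pair_counts.items():
        groups[r][q] = count

    scores = Counter()
    for queries in groups.values():
        if query not in queries:
            continue
        for other, count in queries.items():
            if other != query:
                scores[other] += count
    return [q for q, _ in scores.most_common(top_k)]


if __name__ == "__main__":
    log = [
        ("m1", "java", "r1"), ("m1", "java", "r1"),  # repeat click by one member
        ("m2", "jvm", "r1"), ("m3", "jvm", "r1"),
        ("m4", "scala", "r1"), ("m5", "python", "r2"),
    ]
    print(qrq_related(log, "java"))
```

In the example, "jvm" ranks above "scala" because two distinct members reached the shared result through it.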

27 3. Input: Partial matches
- The basic principle is straightforward.
- Problem: how to identify a set of queries that overlap meaningfully?
- Solution: group unique queries together and count their occurrences.

28 M: the total number of unique queries. Q(t): the total number of queries that contain the token t.
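The formula that used M and Q(t) did not survive on this slide. These two quantities are the ingredients of a standard inverse document frequency, so the Python sketch below shows log(M / Q(t)) purely as an assumption about how tokens might be weighted; it is not confirmed by the slide, and `token_weights` is a hypothetical name.

```python
import math
from collections import Counter


def token_weights(unique_queries):
    """Assumed IDF-style weight per token: log(M / Q(t))."""
    m = len(unique_queries)  # M: total number of unique queries
    q_t = Counter()          # Q(t): number of queries containing token t
    for query in unique_queries:
        for token in set(query.lower().split()):
            q_t[token] += 1
    return {token: math.log(m / count) for token, count in q_t.items()}


if __name__ == "__main__":
    queries = ["software engineer", "software developer", "data engineer", "nurse"]
    for token, weight in sorted(token_weights(queries).items()):
        print(token, round(weight, 3))
```

Under this weighting, a token shared by many queries contributes little to an overlap, while a rare token contributes more.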

29 4. Input: Length bias
- Click behavior -> a feature
- Users click suggested queries that are slightly longer than the original query, but not much longer -> drift
- q_p, q_n: previous query and next query
- The function l(q) returns the length of the query q.

30 Implementation
- Kafka: a publish-subscribe system for event collection and dissemination

31 Offline evaluation

32 Online evaluation
- Coverage: the fraction of queries that have recommendations for a given signal.
- Impressions: the number of times recommendations were displayed; analogous to the trigger rate of the module on the search page.
- Clicks: each click represents a vote that the particular recommendation is relevant.
- Click-through rate: the fraction of clicks over impressions.
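As a small worked illustration of these definitions, the Python sketch below computes coverage and click-through rate from raw counts. The input layout (a list of issued queries, a mapping from query to its recommendations, and two counters) and the function name are assumptions; the numbers in the example are made up.

```python
def online_metrics(all_queries, recommendations, impressions, clicks):
    """Coverage and click-through rate for one signal."""
    unique_queries = set(all_queries)
    # A query counts as covered if the signal returns at least one recommendation.
    covered = {q for q in unique_queries if recommendations.get(q)}
    coverage = len(covered) / len(unique_queries) if unique_queries else 0.0
    ctr = clicks / impressions if impressions else 0.0
    return {
        "coverage": coverage,
        "impressions": impressions,
        "clicks": clicks,
        "ctr": ctr,
    }


if __name__ == "__main__":
    metrics = online_metrics(
        all_queries=["a", "b", "c", "a"],
        recommendations={"a": ["x"], "b": []},
        impressions=200,
        clicks=30,
    )
    print(metrics)
```

Reporting coverage next to click-through rate matters because a signal can have a high click-through rate while firing for only a few queries.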

33 Evaluation: online

34 Conclusion
- Google News personalization with a mixture of algorithms:
  - Model-based collaborative filtering: MinHash, PLSI
  - Memory-based collaborative filtering
- Metaphor with 4 signals:
  - Collaborative filtering
  - Query-result-query
  - Partial matches
  - Length bias

35 Thank you for your attention
