Optimizing Search Engines using Clickthrough Data. Presented by Kajal Miyan, Seminar Series 891, Michigan State University. *Slides adapted from presentations by Thorsten Joachims (author) and Shui-Lung Chuang.
The Goal Reiterated
The Reference Papers
- "Evaluating Retrieval Performance using Clickthrough Data", Technical Report, Cornell University, 2002
- "Optimizing Search Engines using Clickthrough Data", KDD 2002
Thorsten Joachims, Department of Computer Science, Cornell University
Outline Things about clickthrough data Evaluating search engines using clickthrough data Optimizing search engines using clickthrough data SVM-light Open issues Conclusion Discussion
Inspiration: Search Engines vs. Learning Search Engines. Recent use of SVM-light with impressive results.
Heterogeneity and Homogeneity in Web Search: different users, but similar behaviour patterns.
Problem Definition: optimization of web search result ranking. Ranking algorithms are mainly based on similarity: similarity between query keywords and page text keywords, and similarity between pages (PageRank). No consideration of users' personal preferences.
E.g., query "Goa": a tourist destination in India that I want to visit. There is room for improvement by incorporating user behaviour data (user feedback): use previous implicit search information to enhance result ranking.
Types of user feedback
- Explicit feedback: the user explicitly judges the relevance of results for the query
  - Costly in time and resources; costly for the user; limited effectiveness
  - Direct evaluation of results
- Implicit feedback: extracted from log files
  - Large amount of information
  - Indirect evaluation of results through click behaviour
Implicit feedback (Categories)
- Clicked results
  - Absolute relevance: clicked result -> relevant (risky: depends on the quality of user behaviour)
  - Percentage of result clicks for a query; frequently clicked groups of results; links followed from the result page
  - Relative relevance: clicked result -> more relevant than non-clicked results (more reliable)
- Time
  - Between clicks (e.g., fast clicks may indicate bad results)
  - Spent on a result page (e.g., a long time may indicate a relevant page)
  - To first click, scrolling (e.g., a long delay may indicate confusing results)
Implicit feedback (Categories, cont.)
- Query chains: sequences of reformulated queries to improve the results of an initial search
  - Detection: query similarity, result-set similarity, time
  - Connection between results of different queries: enhancement of a bad query with another query from the chain; comparison of result relevance across different queries
- Scrolling
  - Time (quality of results); scrolling behaviour (quality of results); percentage of the page viewed (results viewed)
- Other features
  - Save, print, copy, etc. of a result page (maybe relevant); exit type (e.g., closing the window may indicate poor results)
Joachims' approach
- Clickthrough data: relative relevance, as indicated by user behaviour studies
- Method: training of SVM ranking functions
  - Training input: inequalities over query result rankings
  - Trained function: a weight vector over the examined features
  - The trained vector assigns weights to the examined features
- Experiments: comparison of the method with existing search engines
Search Engine Logs (example): "Where is the Web page of ICDM 2002?" Query terms tried: ICDM; ICDM 02; ICDM 2002. Clicked-through URLs: http://kis.maebashi-it.ac.jp/icdm02/, http://www.wi-lab.com/icdm02, http://www.computer.org/.../pr01754.htm. The log records (1) the query terms, (2) the returned URLs, and (3) the clickthroughs.
Clickthrough Data
Clickthrough data can be thought of as triplets (q, r, c):
- the query q (e.g., "support vector machine")
- the ranking r presented to the user
- the set c of links the user clicked on (e.g., c = {link1, link3, link7})
Clickthrough data provide user feedback for relevance judgment.
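To make the triplet concrete, here is a minimal Python sketch of how a (q, r, c) record might be represented; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ClickthroughRecord:
    """One logged search interaction: query, presented ranking, clicked links."""
    query: str                     # q: the query words
    ranking: list[str]             # r: URLs in the order presented
    clicks: set[str] = field(default_factory=set)  # c: URLs the user clicked

# The example from the slide: clicks on links 1, 3, and 7
record = ClickthroughRecord(
    query="support vector machine",
    ranking=[f"link{i}" for i in range(1, 11)],
    clicks={"link1", "link3", "link7"},
)
```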
A Mechanism to Record Clickthrough Data
- query-log: the query words and the presented ranking
- click-log: query-id and clicked URL (recorded via a proxy server)
This process should be transparent to the user and should not influence system performance.
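A minimal sketch of the click-log half of this mechanism, assuming a Flask-based redirect proxy (the framework, endpoint, and file names are all illustrative, not from the paper): result links point at the proxy, which logs the query-id and URL and then forwards the user.

```python
# Hypothetical click-logging proxy: result links point at
# /click?qid=...&url=...; we append to the click-log, then redirect,
# so the process stays transparent to the user.
import csv, time
from flask import Flask, request, redirect

app = Flask(__name__)

@app.route("/click")
def log_click():
    qid, url = request.args["qid"], request.args["url"]
    with open("click-log.csv", "a", newline="") as f:
        csv.writer(f).writerow([time.time(), qid, url])  # click-log: query-id, clicked URL
    return redirect(url, code=302)  # send the user on with no visible delay
```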
The Problem to Start With: which search engine provides better results, Google or MSNSearch? This is a problem of statistical inference (a hypothesis test). Users are only rarely willing to give explicit feedback. Clickthrough data seem to provide implicit user feedback; are they suitable for relevance judgment? I.e., does Click = Relevance?
EXP1: Regular Clickthrough Data Clicks heavily depend on the ranking (presentation bias)
EXP2: Unbiased Clickthrough Data
The criteria for collecting unbiased clickthrough data to compare search engines:
- Blind test: the interface should hide the random variables underlying the hypothesis test, to avoid biasing the user's response
- Click preference: the interface should be designed so that a click demonstrates a particular judgment of the user
- Low usability impact: the interface should not substantially lower the productivity of the user
EXP2: Unbiased Clickthrough Data. The top l links of the combined ranking contain the top k_a and k_b links from rankings A and B, with |k_a − k_b| ≤ 1.
Computing the Combined Ranking
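A sketch of one way to compute such a combined ranking, following the constraint from the previous slide (any top prefix draws near-equally from A and B, and a link appearing in both rankings is placed once). This is a deterministic simplification; the paper's procedure randomizes which ranking contributes first.

```python
def combine(a, b, length):
    """Interleave rankings a and b so that every prefix of the result contains
    the top k_a of a and top k_b of b with |k_a - k_b| <= 1."""
    combined, ia, ib = [], 0, 0
    while len(combined) < length and (ia < len(a) or ib < len(b)):
        # Take next from the ranking that has contributed fewer links so far
        # (the paper flips a coin on ties; we always let A lead for brevity).
        take_a = (ia <= ib and ia < len(a)) or ib >= len(b)
        if take_a:
            candidate, ia = a[ia], ia + 1
        else:
            candidate, ib = b[ib], ib + 1
        if candidate not in combined:  # duplicates across A and B appear once
            combined.append(candidate)
    return combined

# e.g., combine(["a1", "a2", "a3"], ["b1", "a1", "b2"], 5)
# -> ["a1", "b1", "a2", "b2", "a3"]
```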
Experiment: Google vs. MSNSearch
- Experiment data gathered from 3 users, 9/25–10/18, 2001
- 180 queries and 211 clicks (1.17 clicks/query, 2.31 words/query)
- The top k links for each query were manually judged for relevance
Questions to examine: does the clickthrough evaluation agree with the manual relevance judgments? I.e., does Click = Preference?
Theoretical Analysis: Assumption 1. Intuitively, this assumption formalizes that users click on relevant links more frequently than on non-relevant links, with a difference of at least ε.
Theoretical Analysis: Assumption 2. Intuitively, the assumption states that the only reason a user clicks on a particular link is the relevance of that link, and not other influencing factors connected with a particular retrieval function.
Statistical Hypothesis Test: two-tailed paired t-test; binomial sign test. Please refer to the paper for details.
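For illustration, a binomial sign test on paired win counts can be run with SciPy; the counts below are made up for the example, not taken from the paper.

```python
from scipy.stats import binomtest

# Suppose that, over the queries where one side got strictly more clicks,
# engine A won 34 times and engine B won 16 times (illustrative numbers).
wins_a, wins_b = 34, 16
result = binomtest(wins_a, wins_a + wins_b, p=0.5, alternative="two-sided")
print(result.pvalue)  # a small p-value rejects "both engines equally good"
```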
Clickthrough vs. Relevance Google vs. MSNSearch (77% vs. 63%) Google vs. Default (85% vs. 18%) MSNSearch vs. Default (91% vs. 12%)
Is Assumption 1 Valid? Assumption 1: User clicks on more relevant links than non-relevant links on average
Is Assumption 2 Valid?
An Illustrative Scenario
Click ≠ Absolute Relevance Judgment
Clickthrough data as a triplet (q, r, c):
- The presented ranking r depends on the query q, as determined by the retrieval function of the search engine
- The set c of clicked-on links depends on both the query q and the presented ranking r (e.g., highly ranked links are more likely to be clicked)
A click on a particular link therefore cannot be seen as an absolute relevance judgment.
Clickthrough data (Studies)
Why not absolute relevance? User behaviour is influenced by the initial ranking.
Study: rank and viewership — the percentage of queries for which a user viewed the search result presented at a particular rank.
Conclusion: most of the time, users view only the first few results.
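A rough sketch of how such a viewership-by-rank statistic could be computed from click logs, under the (strong) assumption that a user viewed every result ranked at or above their deepest click; the input format is hypothetical.

```python
from collections import Counter

def rank_view_rates(click_logs, max_rank=10):
    """Fraction of queries in which the user viewed each rank, approximating
    'viewed' as 'at or above the deepest clicked rank'."""
    viewed = Counter()
    for clicked_ranks in click_logs:          # one set of clicked ranks per query
        deepest = max(clicked_ranks, default=0)
        for rank in range(1, min(deepest, max_rank) + 1):
            viewed[rank] += 1
    n = len(click_logs)
    return {rank: viewed[rank] / n for rank in range(1, max_rank + 1)}

# e.g., three queries with clicks at ranks {1}, {1, 3}, {7}
print(rank_view_rates([{1}, {1, 3}, {7}], max_rank=7))
```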
Ranking, NOT Classification: classifying results into relevant and non-relevant is neither reliable nor feasible here; a ranking formulation should be used instead (SVM-light).
Click = Relative Preference Judgment
Assuming the user scanned the ranking from top to bottom, a clicked link is judged more relevant than every non-clicked link ranked above it (a sketch of this extraction follows below).
E.g., c = {link1, link3, link7} yields (r*: the ranking preferred by the user):
link3 <r* link2
link7 <r* link2
link7 <r* link4
link7 <r* link5
link7 <r* link6
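A small sketch of this preference-extraction rule, reproducing the example on the slide:

```python
def preference_pairs(ranking, clicks):
    """A clicked link is preferred over every non-clicked link ranked above it."""
    pairs = []
    for i, link in enumerate(ranking):
        if link in clicks:
            pairs.extend((link, skipped)            # link preferred over skipped
                         for skipped in ranking[:i]
                         if skipped not in clicks)
    return pairs

ranking = [f"link{i}" for i in range(1, 8)]
print(preference_pairs(ranking, {"link1", "link3", "link7"}))
# [('link3', 'link2'), ('link7', 'link2'), ('link7', 'link4'),
#  ('link7', 'link5'), ('link7', 'link6')]
```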
A Framework for Learning Retrieval Functions
r* is the optimal ordering; r_f(q) is the ordering produced by retrieval function f on query q. Both r* and r_f(q) are binary relations over D × D, where D = {d_1, ..., d_m} is the document collection; d_i <r d_j means (d_i, d_j) ∈ r.
Kendall's τ (vs. average precision), with P the number of concordant pairs and Q the number of discordant pairs:
    τ(r_a, r_b) = (P − Q) / (P + Q) = 1 − 2Q / (m choose 2)
For a fixed but unknown distribution Pr(q, r*), the goal is to learn a retrieval function with maximum expected Kendall's τ.
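For concreteness, Kendall's τ can be computed directly from the pair counts; a naive O(m²) sketch:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two strict rankings of the same m documents:
    tau = (P - Q) / (P + Q), P/Q = concordant/discordant pair counts."""
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(rank_a, 2):
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1     # the pair is ordered the same way in both
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

print(kendall_tau(list("abcd"), list("abdc")))  # -> 0.666... (one swapped pair)
```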
An SVM Algorithm for Learning Ranking Functions. Given an independently and identically distributed training sample S of size n containing queries q with their target rankings r*, the learner selects a ranking function f from a family of ranking functions F that maximizes the empirical τ on S.
The Ranking SVM Algorithm
Consider the class of linear ranking functions: d_i is ranked ahead of d_j iff w·Φ(q, d_i) > w·Φ(q, d_j), where w is a weight vector adjusted by learning and Φ(q, d) is a mapping onto features describing the match between query q and document d.
The goal is to find a w such that the maximum number of the following inequalities is fulfilled:
    ∀(d_i, d_j) ∈ r*: w·Φ(q, d_i) > w·Φ(q, d_j)
The Categorization SVM
Learning a hypothesis h such that P(error(h)) ≤ train_error(h) + complexity(h).
Example: h(d) = sign(w·d + b), i.e., +1 if w·d + b > 0, and −1 otherwise.
[Figure: points of the two classes separated by the hyperplane defined by w]
Learning as Optimization Problem 1:
    minimize:   (1/2) w·w
    subject to: ∀i: y_i [w·d_i + b] ≥ 1
where y_i = +1 (−1) if d_i is in class + (−).
The Ranking SVM Algorithm (cont.)
Optimization Problem 1 is convex and has no local optima. By rearranging the constraints as
    w·(Φ(q, d_i) − Φ(q, d_j)) ≥ 1 − ξ_{i,j}
the problem becomes a classification SVM on the pairwise difference vectors Φ(q, d_i) − Φ(q, d_j).
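A sketch of this reduction using scikit-learn's LinearSVC as the pairwise classifier. This is an approximation: LinearSVC's squared hinge loss differs slightly from the SVM-light formulation used in the paper, and the function and variable names here are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_ranking_svm(features, pref_pairs, C=1.0):
    """Ranking SVM as classification on difference vectors: each preference
    'better over worse' yields Phi(q,better) - Phi(q,worse) labeled +1, plus
    the mirrored vector labeled -1 so the classifier sees both classes."""
    X, y = [], []
    for better, worse in pref_pairs:
        diff = features[better] - features[worse]   # features: doc -> Phi(q, d)
        X.extend([diff, -diff])
        y.extend([+1, -1])
    svm = LinearSVC(C=C, fit_intercept=False)  # no bias term: it cancels in rankings
    svm.fit(np.array(X), np.array(y))
    return svm.coef_.ravel()                   # the learned weight vector w

# Ranking new documents for a query: sort candidates by the score w . Phi(q, d)
# w = train_ranking_svm(features, pairs)
# ranked = sorted(candidates, key=lambda d: w @ features[d], reverse=True)
```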
Experiment Setup: Meta Search. The Striver meta-search engine presents combined results; the underlying search engines are Google, MSNSearch, Excite, Altavista, and Hotbot.
Features
Experiment Results
Learned Weights of Features
Open issues
- Trade-off between the amount of training data and its homogeneity: clustering algorithms to find homogeneous groups of users
- Adaptation to the properties of a particular document collection: shipping an off-the-shelf search engine that learns after deployment; incremental online learning/feedback algorithms
- Protection from spamming???
- Personalizing my search choices!!!
Possible extensions
- Utilization of time feedback: how long the user browsed a clicked page
- Other types of feedback: scrolling, exit type
- Combination with absolute-relevance clickthrough feedback: percentage of result clicks for a query, links followed from the result page
- Query chains: improvement of the detection method
- Link association rules for frequently clicked groups of results
- Query/link clustering
- Continuous training of ranking functions
Conclusions
The first work (evaluating search engines) is crucial: the feasibility of using clickthrough data to evaluate retrieval performance has been verified, and clickthrough data (less effort) perform as well as manual relevance judgments (more effort) in this task. The second work (the ranking SVM) shows an interesting use of clickthrough data.
Negative comments: the approaches have not been validated at a larger scale, so whether the techniques work in real cases is still uncertain. That is perhaps the reason why, though the paper has been out since 2002, Google is still in business...
Discussion Is MILN similarly capable???