A survey on click modeling in web search
Lianghao Li
Hong Kong University of Science and Technology
Outline
1. An overview of web search marketing
2. An overview of click modeling
3. A survey on click models
   - Unbiased hypothesis
   - Position bias hypothesis
   - Depend on click pattern
   - Depend on user intent
4. Future work
Search engine marketing
Generalized second-price auction
Search advertising demo
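Why do we need click prediction?
Revenue is highly sensitive to the accuracy of click probability prediction. Search engines rank ads by expected revenue:
$$E[\text{revenue}] = P_{ad}(\text{click}) \cdot \mathrm{GSP}(ad)$$
where $\mathrm{GSP}(ad)$ is the per-click price set by the generalized second-price auction.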
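The ranking rule above can be illustrated with a minimal sketch. The `Ad` class, the ad names, and all numbers are hypothetical, and `price` stands in for the per-click GSP payment, which a real system would compute from the auction rather than take as given.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    p_click: float  # predicted click probability P_ad(click)
    price: float    # hypothetical per-click payment, standing in for GSP(ad)


def expected_revenue(ad: Ad) -> float:
    # E[revenue] = P_ad(click) * GSP(ad)
    return ad.p_click * ad.price


def rank_ads(ads):
    # Highest expected revenue first.
    return sorted(ads, key=expected_revenue, reverse=True)


ads = [Ad("A", 0.02, 3.0), Ad("B", 0.10, 1.0), Ad("C", 0.05, 1.5)]
ranking = [ad.name for ad in rank_ads(ads)]
```

Ad A has the highest price but the lowest click probability, so it ranks last; an error in `p_click` therefore changes both the ranking and the revenue.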
How to predict click behavior?
Click-through logs help!
Figure: Ranking presented for the query "support vector machine".
How to predict click behavior?
Predict clicks by counting:
$$P_{ad}(\text{click}) = \frac{\#\ \text{of clicks}}{\#\ \text{of impressions}}$$
However, this estimate is far from satisfactory:
- clicks are biased by user browsing behavior
- long tail and cold start problems
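A small sketch of the counting estimator shows why the long tail and cold start matter. The function name and the counts are made up for illustration.

```python
def ctr_by_counting(clicks: int, impressions: int):
    """Empirical CTR; returns None when there is no data (cold start)."""
    if impressions == 0:
        return None
    return clicks / impressions


head_ctr = ctr_by_counting(480, 10_000)  # frequent query: stable estimate
tail_ctr = ctr_by_counting(1, 3)         # rare query: one click dominates
new_ctr = ctr_by_counting(0, 0)          # new ad: nothing to count
```

With three impressions, a single click moves the estimate from 0 to about 0.33, so counts alone are unreliable for rare queries and undefined for new ads.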
How to predict click behavior? Long tail and cold start problems
Long tail query demo: Google vs. Bing
A unified framework for click modeling
Problem definition
Definition 1 (Click modeling): Let the random variable u denote a user, q a query issued by the user, a an ad, and r the position of the ad. The binary variable c is 1 if the ad is clicked and 0 otherwise. Let L denote the impression list and S the click sequence. Click modeling aims to explain observed click events. The shorthand is: $P(c, q, a, u, r, L, S)$
Goals of click modeling
1. To estimate the actual ad relevance from biased click-through logs
2. To predict $P(c = 1 \mid q, a, u, r, L, S)$ for future impressions
An overview of click models
Hypotheses in click modeling
To model click events, we have to incorporate proper browsing hypotheses (i.e., a generative process). The main hypotheses include:
- Unbiased hypothesis: $P(c \mid q, a, u, r, L, S) = P(c \mid q, a)$
- Position bias hypothesis: $P(c \mid q, a, u, r, L, S) = P(c \mid q, a, r)$
- Depend on click pattern: $P(c \mid q, a, u, r, L, S) = P(c \mid q, a, r, S)$
- A further hypothesis conditions on the impression list: $P(c \mid q, a, u, r, L, S) = P(c \mid q, a, r, L)$
- Depend on user intent: $P(c \mid q, a, u, r, L, S) = P(c \mid q, a, u, r)$
Unbiased hypothesis: Basic hypothesis
Basic hypothesis: There is no bias associated with the observed clicks. This leads to the simplest model:
$$P(c \mid q, a, u, r, L, S) = P(c \mid q, a)$$
Remark: Under the basic hypothesis, the click probability is determined by the relevance between query q and ad a.
Position bias hypothesis: Examination hypothesis
Examination hypothesis (WWW 07, Richardson et al.)
The examination hypothesis assumes that an ad is clicked only if it is both examined (i.e., e = 1) and relevant:
$$\begin{aligned}
P(c = 1 \mid q, a, u, r, L, S) &= P(c = 1 \mid q, a, r) && \text{(independence assumption)} \\
&= \sum_{e \in \{0,1\}} P(c = 1 \mid e, q, a, r)\, P(e \mid q, a, r) \\
&= P(c = 1 \mid e = 1, q, a)\, P(e = 1 \mid r) && \text{(examination hypothesis)}
\end{aligned}$$
Novelty: The first attempt to model position bias.
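The factorization above can be sketched in a few lines. The examination probabilities in `EXAM_PROB` and the click log are hypothetical, and `estimate_relevance` uses a clicks-over-expected-examinations estimator, which is one common way to apply the hypothesis and not necessarily the estimator used by Richardson et al.

```python
# Hypothetical position bias P(e = 1 | r) for positions 1..3.
EXAM_PROB = {1: 1.0, 2: 0.5, 3: 0.25}


def click_probability(relevance: float, position: int) -> float:
    # P(c = 1 | q, a, r) = P(c = 1 | e = 1, q, a) * P(e = 1 | r)
    return relevance * EXAM_PROB[position]


def estimate_relevance(impressions) -> float:
    """impressions: list of (position, clicked) pairs for one (query, ad)."""
    clicks = sum(clicked for _, clicked in impressions)
    expected_exams = sum(EXAM_PROB[pos] for pos, _ in impressions)
    return clicks / expected_exams


log = [(1, 1), (2, 0), (2, 1), (3, 0), (3, 0)]
naive_ctr = sum(clicked for _, clicked in log) / len(log)  # 0.4
relevance = estimate_relevance(log)                        # 0.8
```

The naive CTR underestimates relevance here because most impressions were at low positions that are examined less often.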
Position bias hypothesis: Examination hypothesis
Examination hypothesis (WWW 07, Richardson et al.)
The position bias $P(e = 1 \mid r)$ can be measured experimentally by presenting users with the same ad at various positions on the page and observing the user clicks.
Remark: In the examination hypothesis, the position bias is modeled by the query-independent examination probability $P(e \mid r)$ and is thereby eliminated from the relevance estimation.
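A minimal sketch of that measurement follows, assuming that position 1 is always examined so that the CTR ratio relative to position 1 gives $P(e = 1 \mid r)$. The normalization and the click counts are assumptions made for illustration only.

```python
def estimate_examination(stats_by_position):
    """stats_by_position maps position -> (clicks, impressions) for ONE ad.

    Under the examination hypothesis CTR(r) = relevance * P(e = 1 | r), so
    dividing by CTR(1) cancels the relevance term.
    """
    ctr = {pos: clicks / imps for pos, (clicks, imps) in stats_by_position.items()}
    top_ctr = ctr[1]  # assumption: P(e = 1 | r = 1) = 1
    return {pos: value / top_ctr for pos, value in sorted(ctr.items())}


stats = {1: (200, 1000), 2: (100, 1000), 3: (50, 1000)}
exam = estimate_examination(stats)
```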
Depend on click pattern: Cascade hypothesis
Cascade hypothesis (WSDM 08, Craswell et al.)
The cascade hypothesis assumes that a user scans the ads sequentially, without skipping any, until she clicks on an ad, and does not examine any further ads after the click:
$$\begin{aligned}
P(e_1 = 1) &= 1 \\
P(e_i = 1 \mid e_{i-1} = 0) &= 0 \\
P(e_i = 1 \mid e_{i-1} = 1) &= 1 - c_{i-1}
\end{aligned}$$
Novelty: The first attempt to model click patterns.
Depend on click pattern: Cascade hypothesis
Cascade hypothesis (WSDM 08, Craswell et al.)
The probability of a click sequence in which the kth ad is clicked is:
$$\begin{aligned}
P(c = 1 \mid r = k, q, a, u, L, S) &= P(c = 1 \mid r = k, q, a, L, S) && \text{(independence assumption)} \\
&= P(c = 1 \mid r = k, q, a) \prod_{i=1}^{k-1} P(c = 0 \mid r = i, q, a) && \text{(cascade hypothesis)}
\end{aligned}$$
Here each factor with $r = i$ refers to the ad shown at position i.
Remark: This model is quite restrictive since it allows at most one click per query session.
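The product form above can be computed in one pass. This is a minimal sketch in which `relevances[i]` stands for the click probability of the ad at position i+1 once the user reaches it; the values are hypothetical.

```python
def cascade_click_probs(relevances):
    """Click probability per position under the cascade hypothesis."""
    probs = []
    reach = 1.0  # probability that no earlier ad was clicked
    for rel in relevances:
        probs.append(reach * rel)  # r_k * prod_{i<k} (1 - r_i)
        reach *= 1.0 - rel
    return probs


probs = cascade_click_probs([0.5, 0.5, 0.5])
```

Because the user stops at the first click, the per-position click probabilities sum to at most 1, which is the "at most one click" restriction noted in the remark.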
Depend on click pattern: Multiple-click model
Multiple-click model (WSDM 09, Guo et al.)
Novelty: Enables multiple clicks in a session by adding a decision phase in which the user chooses whether to continue examining results after a click.
Figure: The user model of the dependent click model
Depend on click pattern: Multiple-click model
Multiple-click model (WSDM 09, Guo et al.)
The probability of examination and click is given by:
$$\begin{aligned}
P(e = 1 \mid r = 1) &= 1 \\
P(c = 1 \mid r = i) &= P(e = 1 \mid r = i)\, P(c = 1 \mid e = 1, r = i) \\
P(e = 1 \mid r = i + 1) &= \lambda_i\, P(c = 1 \mid r = i) + P(c = 0 \mid r = i)
\end{aligned}$$
The probability of a click sequence in which the kth ad is clicked is:
$$\begin{aligned}
P(c = 1 \mid r = k, q, a, u, L, S) &= P(c = 1 \mid r = k, q, a, L, S) && \text{(independence assumption)} \\
&= P(c = 1 \mid r = k, q, a) \prod_{i=1}^{k-1} \big[ \lambda_i\, P(c = 1 \mid r = i, q, a) + P(c = 0 \mid r = i, q, a) \big]
\end{aligned}$$
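The recursion can be sketched as a forward pass. This sketch assumes the conditional reading of the model: given that position i was examined, the user continues with probability $\lambda_i$ after a click and with probability 1 after a skip. The function name and the numbers are illustrative.

```python
def dcm_marginals(relevances, lambdas):
    """Return (examination, click) probabilities for each position.

    relevances[i]: P(c = 1 | e = 1) at position i + 1 (hypothetical values).
    lambdas[i]:    probability of continuing after a click at position i + 1.
    """
    exam, click = [], []
    p_exam = 1.0  # the first position is always examined
    for rel, lam in zip(relevances, lambdas):
        exam.append(p_exam)
        click.append(p_exam * rel)
        # Continue with probability lam after a click, 1 after a skip.
        p_exam *= rel * lam + (1.0 - rel)
    return exam, click


exam, click = dcm_marginals([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
cascade_exam, _ = dcm_marginals([0.5, 0.5, 0.5], [0.0, 0.0, 0.0])
```

Setting every $\lambda_i$ to 0 recovers the cascade model, since the user then never continues after a click.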
Depend on click pattern: Dynamic Bayesian Network
Dynamic Bayesian Network (WWW 09, Chapelle and Zhang)
Novelty: The first attempt to model post-click behavior.
Key idea: Model both perceived relevance (before the click) and post-click relevance.
Figure: The DBN used for click modeling.
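Depend on click pattern: Dynamic Bayesian Network
Dynamic Bayesian Network (WWW 09, Chapelle and Zhang)
The following equations describe the model, where at position i the variable $E_i$ indicates examination, $A_i$ attraction, $C_i$ a click, and $S_i$ satisfaction:
$$\begin{aligned}
A_i = 1,\ E_i = 1 &\Leftrightarrow C_i = 1 \\
P(A_i = 1) &= a_u \\
P(S_i = 1 \mid C_i = 1) &= s_u \\
C_i = 0 &\Rightarrow S_i = 0 \\
S_i = 1 &\Rightarrow E_{i+1} = 0 \\
P(E_{i+1} = 1 \mid E_i = 1, S_i = 0) &= \gamma \\
E_i = 0 &\Rightarrow E_{i+1} = 0
\end{aligned}$$
where $\gamma$ is the probability that a user examines the next result if she is not satisfied with the current result.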
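The equations can be read as a generative process for one session. The sketch below simulates it, assuming the first position is always examined; the function name, parameter values, and seed are hypothetical.

```python
import random


def simulate_dbn_session(attractiveness, satisfaction, gamma, rng):
    """Sample the click vector of one session from the DBN equations.

    attractiveness[i] = a_u and satisfaction[i] = s_u for the result shown at
    position i + 1. Assumption of this sketch: E_1 = 1.
    """
    clicks = []
    examined = True
    for a_u, s_u in zip(attractiveness, satisfaction):
        attracted = rng.random() < a_u                    # P(A_i = 1) = a_u
        clicked = examined and attracted                  # C_i = 1 iff A_i = E_i = 1
        satisfied = clicked and rng.random() < s_u        # P(S_i = 1 | C_i = 1) = s_u
        clicks.append(int(clicked))
        if examined:
            # S_i = 1 stops the session; otherwise continue with probability gamma.
            examined = (not satisfied) and rng.random() < gamma
    return clicks


rng = random.Random(0)
always_satisfied = simulate_dbn_session([1.0] * 3, [1.0] * 3, 1.0, rng)
never_satisfied = simulate_dbn_session([1.0] * 3, [0.0] * 3, 1.0, rng)
```

Unlike the cascade model, an unsatisfied user keeps browsing after a click, so a session can contain several clicks.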
Depend on click pattern: Dynamic Bayesian Network
Experiment
Data: 58,000,000 sessions and 682,000 unique queries from the click logs of the UK market
X-axis: number of training sessions occurring at Position 1
Y-axis: MSE between the true CTRs and the predicted CTRs
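The MSE metric is the mean of squared differences between true and predicted CTRs. A minimal sketch with made-up values:

```python
def mse(true_ctrs, predicted_ctrs):
    """Mean squared error between two equally long, non-empty CTR lists."""
    if not true_ctrs or len(true_ctrs) != len(predicted_ctrs):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_ctrs, predicted_ctrs)]
    return sum(squared) / len(squared)


error = mse([0.10, 0.20], [0.10, 0.40])
```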
Temporal click model (SIGIR 09, Xu et al.)
Key idea (externality): An ad may receive fewer clicks when it is co-displayed with high-quality ads.
Novelty: The first attempt to model ad externality.
Data study 1
Data: Ad impression sequences with exactly two ads. Two data sets are constructed by collecting one month of ad impressions (tens of millions) shown at north and at south, respectively.
Ground truth: Empirical CTR as the measure of ad quality.
Experiment setting: Group impressions with similar ad quality at Position 1 into one bin and plot the average CTR at Position 2, and vice versa.
Temporal click model
Data study 2
Experiment setting: Group impressions with similar CTR at Position 1 into one bin and plot the percentage of events in which the first click occurred at Position 2, and vice versa.
Temporal click model
Figure: The first click influenced by ad quality
Temporal click model (SIGIR 09, Xu et al.)
Positional rationality hypothesis:
1. Users examine both ads together to assess their qualities.
2. If the ad at Position 2 is much better than the ad at Position 1, users click the ad at Position 2 first.
Temporal click model
The proposed method
Input: click-through log of the ad impression sequence $A = \langle a_1, a_2 \rangle$.
Output: the predicted CTR of the ads.
Generative process:
Temporal click model
Graphical model:
- $E$: examination variable, $E \in \{0, 1\}$
- $R_a$: ad quality variable, $R_a \in [0, 1]$
- $U_a$: position bias variable, $U_a \in [0, 1]$
- $F$: random variable for the first pick, $F \in \{a_1, a_2\}$
- $S$: random variable for the re-pick, $S \in \{a_1, a_2\}$
- $C_i$: click random variable for the $i$th click, $C_i \in \{0, 1\}$
Temporal click model
Experiments
Data set: 0.3 million unique queries and 0.1 billion sessions shown at north; 1.1 million queries and 0.65 billion sessions shown at south.
Evaluation: MSE between true CTRs and predicted CTRs.
Baselines:
1. Naive CTR statistics (NS), which estimates CTR by counting
2. Bayesian browsing model (BBM) (KDD 09, Liu et al.)
Temporal click model
Experiment results
- Both TCM and BBM are significantly better than NS for all query frequencies.
- TCM is noticeably better than BBM on less frequent queries but shows similar performance on frequent ones.
Relational click prediction (WSDM 12, Xiong et al.)
Key idea: Click events are influenced by the similarity between co-displayed ads.
Novelty: The first attempt to model the influence of similarity.
Figure: Two ad lists for the query "itunes account".
Relational click prediction
Data study
Data: 0.7 million unique queries and 0.6 million unique ads from one month of click logs
Experiment setting:
1. Group ads into specific contexts, i.e., triples $T = \langle q, a, r \rangle$, where q, a and r represent the query, ad and position, respectively.
2. Select a triple T that appears in multiple pageviews $l = \langle q, \text{ad list} \rangle$.
3. Calculate the similarity between ad a and the other ads in each l.
4. Compute the empirical CTR of T in each l, and compare it with the average CTR of T over all pageviews.
Evaluation:
$$\Delta CTR_{T,l} = \frac{CTR_{T,l} - CTR_T}{CTR_T}$$
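The evaluation measure of this data study is the relative deviation of a triple's CTR in one pageview from its overall CTR. The sketch below computes it for one triple, assuming the overall CTR is pooled over all pageviews (total clicks over total impressions); the pageview labels and counts are hypothetical.

```python
def relative_ctr_change(ctr_in_pageview: float, ctr_overall: float) -> float:
    # (CTR_{T,l} - CTR_T) / CTR_T
    return (ctr_in_pageview - ctr_overall) / ctr_overall


# Hypothetical (clicks, impressions) of one triple T in two pageviews l.
stats = {"similar_neighbors": (10, 1000), "dissimilar_neighbors": (30, 1000)}

total_clicks = sum(clicks for clicks, _ in stats.values())
total_imps = sum(imps for _, imps in stats.values())
ctr_overall = total_clicks / total_imps  # pooled CTR_T (an assumption)

delta = {
    label: relative_ctr_change(clicks / imps, ctr_overall)
    for label, (clicks, imps) in stats.items()
}
```

A negative value means the ad was clicked less often in that pageview than it is on average in the same query and position.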
Relational click prediction
The X-axis denotes the similarity between a and the other co-displayed ads. The Y-axis denotes the average $\Delta CTR_l$ over different triples.
Relational click prediction
Data study
$\Delta CTR_l$ is negatively correlated with the similarity between the given ad and its surrounding ads.
The intuition is that when the surrounding ads are similar to the given ad in content (or topic), they are likely to distract the user's attention.
Relational click prediction
The proposed method
Key idea: Model the ads in an ad list jointly instead of treating them independently.
$$\big( P(c \mid q, a, u, L, r = 1), \ldots, P(c \mid q, a, u, L, r = n) \big)^{T} = F(X, R)$$
where $X = \{x_1, \ldots, x_n\}$ contains the feature vectors $x_i$ extracted from $\langle q, a, u, L, r = i \rangle$, and $R$ encodes the relations between ads.
Relational click prediction
Graphical model:
Figure: A continuous CRF model for relational click prediction.
Relational click prediction
Let $Y = \{y_1, \ldots, y_n\}$ denote the predicted CTRs of the ads. The probability distribution of the output Y conditioned on the input X is defined as
$$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big\{ \sum_{i} h(y_i, X; w) + \beta \sum_{i} \sum_{j > i} g(y_i, y_j, X) \Big\}$$
where h is the vertex feature function representing the dependence between the CTR and the input feature vectors, and g is the edge feature function representing the pairwise relationship between ads.
Relational click prediction
Individual modeling
For simplicity, they define the vertex feature function as
$$h(y_i, X; w) = -\big(y_i - f(x_i; w)\big)^2$$
where $f(x_i; w)$ is the output of any conventional click model.
Relational modeling
As discussed, if two ads are very similar to each other, their click probabilities both become lower. To encode this intuition, they define the edge feature function as
$$g(y_i, y_j, X) = -s_{i,j}\,(y_i + y_j)$$
where $s_{i,j}$ is the term similarity between ads i and j.
Relational click prediction
The whole model
Combining all the feature functions gives the overall conditional probability distribution:
$$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big\{ -\sum_{i} \big(y_i - f(x_i; w)\big)^2 - \beta \sum_{i} \sum_{j > i} s_{i,j}\,(y_i + y_j) \Big\}$$
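To see what the combined model predicts, assume the exponent has the form $-\sum_i (y_i - f(x_i; w))^2 - \beta \sum_i \sum_{j>i} s_{i,j}(y_i + y_j)$. The signs are a reconstruction and should be checked against the paper. Setting the derivative with respect to each $y_i$ to zero gives the most probable output $y_i = f(x_i; w) - \frac{\beta}{2} \sum_{j \neq i} s_{i,j}$. The sketch below evaluates this closed form; the function name, similarity matrix, and $\beta$ are hypothetical.

```python
def crf_map_prediction(local_preds, similarity, beta):
    """Most probable CTR vector under the assumed continuous CRF.

    local_preds[i]: f(x_i; w), the output of a conventional click model.
    similarity[i][j]: symmetric similarity s_ij between ads i and j.
    """
    n = len(local_preds)
    adjusted = []
    for i in range(n):
        neighbor_similarity = sum(similarity[i][j] for j in range(n) if j != i)
        adjusted.append(local_preds[i] - 0.5 * beta * neighbor_similarity)
    return adjusted


local = [0.10, 0.08, 0.05]
sim = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
preds = crf_map_prediction(local, sim, beta=0.05)
```

The two similar ads are both pushed down by the same amount, while the dissimilar third ad keeps its local prediction. A real system would also need to keep the outputs inside [0, 1], which this sketch does not enforce.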
Relational click prediction
Experiments
Data set: 0.7 million unique queries and 0.6 million unique ads from one month of click logs
Extracted features: historical COEC, relevance of the ad to the query, attractiveness of the ad title and description, reputation of the advertiser, etc.
Baselines: Logistic Regression (LOCAL) and a variant of the proposed method that uses no edge features.
Evaluation: MSE between the true CTR and the predicted CTR
Relational click prediction
The proposed method significantly outperforms the baselines. It performs better at lower positions than at higher positions, which is consistent with the cascade assumption.
Figure: Results of NMSE.
Relational click prediction
They further study the performance of click prediction at different levels of similarity in the ad lists. As the similarity between ads increases, the performance of the CRF also increases.
Figure: NMSE results at different similarity levels.
Depend on user intent: Task-centric click model
Task-centric click model (KDD 11, Zhang et al.)
Novelty: The first attempt to model user behavior across multiple query sessions.
Key ideas: Users tend to express their information needs incrementally, and tend to click fresh documents that were not shown before.
Depend on user intent Depend on user intent: Task-centric click model Task-centric click model (KDD 11, Zhang et al.) Figure: The macro model of TCM.
Depend on user intent Depend on user intent: Task-centric click model Task-centric click model (KDD 11, Zhang et al.) Figure: The micro model of TCM.
Depend on user intent Depend on user intent: Task-centric click model Graphical model:
Depend on user intent User intent demo
Future work
- Click modeling for long tail queries: crowdsourcing?
- Automatic feature construction: deep learning?
- Evaluation metrics
- Click modeling for web search on mobile devices
  - very different user browsing behavior
  - possibly a totally different business model
References
- WWW 07, Richardson et al.: Predicting clicks: estimating the click-through rate for new ads
- WSDM 08, Craswell et al.: An experimental comparison of click position-bias models
- WSDM 09, Guo et al.: Efficient multiple-click models in web search
- WWW 09, Chapelle and Zhang: A dynamic Bayesian network click model for web search ranking
- SIGIR 09, Xu et al.: Temporal click model for sponsored search
- WSDM 12, Xiong et al.: Relational click prediction for sponsored search
- KDD 11, Zhang et al.: User-click modeling for understanding and predicting search-behavior
Thanks