Ad Retrieval Systems in vitro and in vivo: Knowledge-Based Approaches to Computational Advertising Evgeniy Gabrilovich gabr@yahoo-inc.com 1
Thank you! The KSJ Award Panel, my colleagues and collaborators, and my Ph.D. advisor, Prof. Shaul Markovitch 2
Outline Why advertising? Why computational advertising? A brief history of approaches to Web advertising Research questions In vitro: adaptation of classical IR to ad retrieval In vivo: lessons learned from observing how advertisers create ads and how users interact with them Future research directions: better targeting via modeling of user behavior; fully automated ad creation Technologies described might or might not be in actual use at Yahoo! 3
What changed in 100 years: measurability and reach Half the money I spend on advertising is wasted; the trouble is I don't know which half. -- John Wanamaker (attributed) [1838-1922] No more coupon codes Flexible ad targeting + conversion tracking Experimentation rules! 4
What is Computational Advertising? A new scientific sub-discipline that provides the foundation for building online ad retrieval platforms: find the optimal ad for a given user in a specific context. Computational advertising draws on information retrieval, microeconomics, statistical modeling & machine learning, and large-scale text analysis 5
US online advertising spending (source: emarketer.com, November 2010). Year: online spend (online % of total media spend): 2009: $22.7B (13.9%); 2010: $25.8B (15.3%); 2011: $28.5B (16.7%); 2012: $32.6B (18.3%); 2013: $36.0B (19.8%); 2014: $40.5B (21.5%) 6
Not all ads are commercial! Ray Bradbury (CHI 2009, Hoffmann et al.) 7
Ads as another source of content for enriching Web search results I do not regard advertising as entertainment or an art form, but as a medium of information. 8
Textual advertising Sponsored Search: ads driven by search keywords, a.k.a. keyword-driven ads or paid search Content Match: ads driven by the content of a web page, a.k.a. context-driven ads or contextual ads Textual advertising on the Web is strongly related to NLP and information retrieval 9
Sponsored search example Textual ads driven by keyword search 10
Content match example 11
Content match example #2 12
Anatomy of an ad. Example ad: Tutorial at SIGIR 2010, Information Retrieval Challenges in Computational Advertising. Components: Title; Creative; Display URL: research.yahoo.com/tutorials/sigir; Landing URL: http://research.yahoo.com/sigir10_compadv; Bid phrases: {SIGIR 2010, computational advertising, Evgeniy Gabrilovich, ...}; Bid: $0.10 13
Beyond keyword matching Matching ads is relatively simple for explicitly bid keywords (exact match), but covering only those is not enough: advertisers need volume! Broad match (or advanced match): suppose your ad is "Low prices on Seattle hotels". Naïve approach: bid on all queries that contain the word Seattle. Problem queries: "Seattle's Best Coffee Chicago", "Alaska cruises start point". Ideally: bid on queries related to Seattle as a travel destination. The system should facilitate concept-level ad matching 14
An ultra-brief history of approaches to Web advertising The old school: database-style ad matching Exact match (query = bid phrase) Broad match via query rewrites Content match: reduce the problem to exact match (extract bid phrases from pages) Essentially record lookup Learning to rank The new approach: knowledge-based ad retrieval Elaborate query expansion Ad indexing and scoring using all the info available Bid phrases, title, creative, URL, landing page, etc. Akin to document indexing in IR 2nd-pass relevance reordering (re-ranking) Using features not available to the 1st-pass model (e.g., set-level features, click history) 15
Research questions How to index the ad corpus? How to select relevant ads? Should we show ads at all? What really happens after an ad click? What is the interplay between the organic and sponsored results? Can we generate entire ad campaigns automatically? 16
Feature generation for ad retrieval How to select relevant ads Problem: Ads are short, queries are even shorter (Future work: how to circumvent this inherent limitation) Solution: Expand them via feature construction Query expansion using Web search results SIGIR 2007, w. Broder et al.; ACM TWEB 2009, Gabrilovich et al. Ad expansion (bid phrases as queries) SIGIR 2008, w. Radlinski et al. Next: Interaction-based features & more 17
Query expansion using Web search results How to select relevant ads Humans often find it hard to readily see what the query is about But looking at the search results makes things easy Let computers do the same thing Infer query intent from the top algorithmic search results Our approach: cross-corpora pseudo relevance feedback Construct additional features from Web search results to retrieve better ads 18
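A minimal sketch of this cross-corpora pseudo relevance feedback idea, assuming the snippets of the top algorithmic results are already fetched: promote their most salient terms into additional features for ad retrieval. The term weighting and the toy IDF handling are illustrative assumptions, not the exact pipeline of the SIGIR 2007 / TWEB 2009 work (which also classified the results onto a topical taxonomy, as on the next slide).

```python
import re
from collections import Counter

def top_expansion_terms(snippets, num_terms=10, idf=None):
    """Pick the most salient terms from the top search-result snippets and
    use them as extra features for ad retrieval (pseudo relevance feedback)."""
    idf = idf or {}
    counts = Counter()
    for snippet in snippets:
        counts.update(re.findall(r"[a-z]+", snippet.lower()))
    scored = {t: c * idf.get(t, 1.0) for t, c in counts.items() if len(t) > 2}
    return [t for t, _ in sorted(scored.items(), key=lambda kv: -kv[1])[:num_terms]]

# Hypothetical usage: snippets would come from the top algorithmic results.
snippets = [
    "Davy Byrne's is a famous pub in Dublin, Ireland, featured in Ulysses.",
    "Guide to Dublin pubs: Davy Byrnes, Temple Bar nightlife, and brew pubs.",
]
expanded = ["davy", "byrne"] + top_expansion_terms(snippets, num_terms=5)
print(expanded)  # original query terms plus e.g. 'dublin', 'pubs', ...
```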
How to select relevant ads Example: "Davy Byrne's" Categories: 1. Travel/Destinations/Europe/Ireland/Dublin 2. Entertainment/Bars and Night Clubs/Brew Pubs 19
If we know it is about Dublin pubs, then ads are easy to find How to select relevant ads 20
How to select relevant ads 1st-pass retrieval Enriched feature space beyond the bag of words (SIGIR 2007, Broder et al.; CIKM 2008, w. Broder et al.) Feature generation to help bridge the lexical mismatch Candidate ads to be re-ranked 21
Learning when (not) to advertise (CIKM 2008, w. Broder et al.) We do not have to show ads! Roughly half of the queries have no ads Should we show ads at all Repeatedly showing non-relevant ads can have detrimental long-term effects Modeling actual (short- and long-term) costs of showing non-relevant ads is very difficult We want to predict when (not) to show ads 22
Should we show ads at all Two approaches: thresholding vs. machine learning Global threshold on relevance scores of individual ads Only show ads with scores above the threshold Problem: scores are not necessarily comparable across queries! Learn a binary prediction model for sets of ads Features defined over sets of ads rather than individual ads Relevance (word overlap, cosine similarity between ad and query/page, etc.) Result-set cohesiveness (coefficient of variation of ad scores, result-set clarity, entropy) 23
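A minimal sketch, assuming a scored set of candidate ads, of the kind of set-level features listed above (mean score, coefficient of variation, unigram entropy as a cohesiveness proxy); result-set clarity is omitted, and the exact feature definitions in the CIKM 2008 paper may differ.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def set_level_features(ad_scores, ad_texts):
    """Features over the whole candidate ad set, not individual ads; these
    would feed a binary show/no-show classifier."""
    avg = mean(ad_scores)
    coeff_var = pstdev(ad_scores) / avg if avg > 0 else 0.0
    words = Counter(w for text in ad_texts for w in text.lower().split())
    total = sum(words.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in words.values())
    return {"mean_score": avg,
            "score_coeff_of_variation": coeff_var,   # score dispersion
            "adset_unigram_entropy": entropy}        # cohesiveness proxy

print(set_level_features(
    ad_scores=[0.82, 0.79, 0.35],
    ad_texts=["Dublin pub crawl tours", "Dublin hotels near Temple Bar",
              "Cheap flights to Chicago"]))
```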
Incorporating click history (WSDM 2010, Hillard et al.) Should we show ads at all Binary classifier (relevant / non-relevant ads) Baseline: text overlap features (query/ad) Click history (query/ad) with back-off Click propensity via a query/ad translation model: Bayes' rule gives $p(D \mid Q) = p(Q \mid D)\,p(D)/p(Q)$, and IBM Model 1 gives $p_{trans}(Q \mid D) = \prod_{j=1}^{m} \sum_{i=0}^{n} p_{trans}(q_j \mid d_i)$, where $p_{trans}(q_j \mid d_i)$ is estimated by counting clicks for query-word/ad-word pairs in the logs: $p_{trans}(q_j \mid d_i) = \mathrm{count}_{logs}(q_j, d_i) / \mathrm{count}_{logs}(d_i)$ Cold start (i.e., no click history) is OK Using click data to overcome synonymy: Query = "running gear", Ad = "Best jogging shoes" Results: query coverage down 9%, ads per query down 12%, CTR up 10%; same number of clicks on fewer ads 24
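To make the translation model concrete, here is a hedged sketch: estimate p_trans(q | d) from click-log co-occurrence counts of query-word/ad-word pairs, then score a query against an ad creative in IBM Model 1 style (product over query words of summed translation probabilities). The toy counts, the per-ad-word normalization, and the epsilon back-off are illustrative assumptions, not the WSDM 2010 implementation.

```python
from collections import defaultdict

# Hypothetical click-log co-occurrence counts of (query word, ad word) pairs.
click_counts = {("running", "jogging"): 40, ("running", "shoes"): 25,
                ("gear", "shoes"): 30, ("gear", "jogging"): 5}

def translation_table(counts):
    """p_trans(q | d): normalize the click counts per ad word."""
    totals = defaultdict(float)
    for (_, d), c in counts.items():
        totals[d] += c
    return {(q, d): c / totals[d] for (q, d), c in counts.items()}

def model1_score(query_words, ad_words, table, eps=1e-6):
    """IBM Model 1 style p(Q|D): product over query words of the summed
    translation probabilities from the ad words (eps as a crude back-off)."""
    score = 1.0
    for q in query_words:
        score *= sum(table.get((q, d), eps) for d in ad_words)
    return score

table = translation_table(click_counts)
# Synonymy example from the slide: query "running gear" vs. ad "Best jogging shoes".
print(model1_score(["running", "gear"], ["jogging", "shoes"], table))
```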
Should we show ads at all Incorporating multi-modal interaction data (SIGIR 2010, Guo & Agichtein) Ready to buy or just browsing? Classifying research- and purchase-oriented sessions Inferring eye gaze position from observable actions Keystrokes, GUI (scroll/click), mouse movement, browser (new tab, close, back/forward) Research vs. purchase classification (in lab): F1 = 0.96 Ad clickthrough in sessions classified as Purchase is more than 2x that of sessions classified as Research Predicting future ad clicks: F1 improved from 0.07 to 0.17 (+141%) 25
How to index the ad corpus Semantically-aware ad indexing and retrieval (WWW 2010, w. Bendersky et al.) 26
Structure of online ad campaigns: the ad schema How to index the ad corpus Hierarchy: Advertiser → Accounts (Account 1, Account 2) → Ad campaigns (e.g., "Buy appliances on Black Friday", "New Year deals on lawn & garden tools") → Ad groups (e.g., "Kitchen appliances") → Creatives (e.g., "Brand name appliances. Compare prices and save money. www.appliances-r-us.com") and bid phrases (e.g., {Miele, KitchenAid, Cuisinart, ...}) What is the appropriate indexing unit? What is the right approach to document length normalization? Use information from higher levels to alleviate data sparsity at children nodes 27
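For reference, a minimal sketch of the ad schema as data structures; the class and field names are illustrative assumptions, not Yahoo!'s actual data model. Text from higher levels (campaign or account names) can be propagated down to alleviate sparsity at the leaves.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative field names only, not Yahoo!'s actual schema.

@dataclass
class AdGroup:
    name: str                       # e.g. "Kitchen appliances"
    creatives: List[str]            # e.g. "Brand name appliances. Compare prices and save money"
    bid_phrases: List[str]          # e.g. ["Miele", "KitchenAid", "Cuisinart"]
    display_url: str = "www.appliances-r-us.com"

@dataclass
class Campaign:
    name: str                       # e.g. "Buy appliances on Black Friday"
    ad_groups: List[AdGroup] = field(default_factory=list)

@dataclass
class Account:
    campaigns: List[Campaign] = field(default_factory=list)

@dataclass
class Advertiser:
    name: str
    accounts: List[Account] = field(default_factory=list)
```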
How to index the ad corpus Possible indexing approaches 1. Term index (Cartesian product of all creatives and bid terms): huge index, small focused documents 2. Creative index (a creative is coupled with all the bid terms in the ad group): two-stage retrieval (first choose the creative, then pick the term); bid terms are duplicated across creatives 3. Ad group index: indexing units are entire ad groups; three-stage retrieval (first choose the ad group, then the creative, and finally pick the term); most compact index 28
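A small sketch of what the three indexing granularities could look like for one ad group; building each indexing unit by simple text concatenation is an assumption for illustration, not the paper's exact construction.

```python
from itertools import product

ad_group = {
    "creatives": ["Brand name appliances. Compare prices and save money"],
    "bid_phrases": ["Miele", "KitchenAid", "Cuisinart"],
}

def term_index_docs(group):
    """One small document per (creative, bid phrase) pair:
    huge index, small focused documents."""
    return [f"{c} {b}" for c, b in product(group["creatives"], group["bid_phrases"])]

def creative_index_docs(group):
    """One document per creative, carrying all bid phrases of the ad group;
    bid phrases are duplicated across creatives."""
    phrases = " ".join(group["bid_phrases"])
    return [f"{c} {phrases}" for c in group["creatives"]]

def ad_group_index_docs(group):
    """One document per ad group: the most compact index."""
    return [" ".join(group["creatives"] + group["bid_phrases"])]

for build in (term_index_docs, creative_index_docs, ad_group_index_docs):
    print(build.__name__, "->", len(build(ad_group)), "document(s)")
```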
How to index the ad corpus Retrieval speed vs. relevance: are we trading effectiveness for efficiency? Term index yields the most relevant ads, yet is least efficient (20x slower than the ad group index) Ad group index is most efficient (2x faster than the creative index), yet least effective 29
How to index the ad corpus Using learning-to-rank techniques: structured re-ranking Step 1: Retrieve an initial set of candidates using the ad group index Step 2: Re-rank the candidate set using structural features (don't ignore the structure by scoring creatives and terms independently) Ad group score, creative-term pair score # bid terms in the ad group Unigram entropy (cohesiveness) of the ad group Ratio of query words covered by the ad group text Fraction of the titles / terms / URLs that contain at least one query term Other features are possible! More effective than the term index Still very efficient! 30
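A hedged sketch of some of the structural features above, computed over one retrieved ad group before re-ranking; the field names and exact feature definitions are simplified assumptions rather than the WWW 2010 formulation.

```python
import math
from collections import Counter

def ad_group_rerank_features(query, ad_group):
    """Structural features for one retrieved ad group (illustrative fields)."""
    q_words = set(query.lower().split())
    group_text = " ".join(ad_group["titles"] + ad_group["bid_phrases"]).lower()
    words = Counter(group_text.split())
    total = sum(words.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in words.values())
    coverage = len(q_words & set(words)) / len(q_words) if q_words else 0.0
    titles_hit = sum(any(w in t.lower().split() for w in q_words)
                     for t in ad_group["titles"])
    return {
        "num_bid_terms": len(ad_group["bid_phrases"]),
        "unigram_entropy": entropy,          # cohesiveness of the ad group
        "query_word_coverage": coverage,     # ratio of query words covered
        "frac_titles_with_query_term": titles_hit / max(len(ad_group["titles"]), 1),
    }

group = {"titles": ["Kitchen appliances sale", "Brand name appliances"],
         "bid_phrases": ["Miele", "KitchenAid", "kitchen appliances"]}
print(ad_group_rerank_features("cheap kitchen appliances", group))
```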
What happens after an ad click What happens after an ad click? Quantifying the impact of landing pages in Web advertising (CIKM 2009, w. Becker et al.) 31
What happens after an ad click Conceptually: context transfer Search engine result page → Click! → Landing page → User's activity on the advertiser's Web site → Conversion (e.g., purchase of the product or service being advertised) 32
What happens after an ad click All landing pages are not created equal (and neither are the corresponding conversion rates) Homepage: main page of the advertiser's site (25%) Category browse: main page of a sub-section of the advertiser's site, which describes a category of related products (38%) Search transfer: search within the advertiser's site OR on other Web pages (26%) Other: terminal pages (e.g., promotion pages or forms) (11%) 33
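Purely for illustration, a rough URL-based heuristic for the four landing page types; the CIKM 2009 study classified landing pages from richer signals than the URL, and the example domain below is made up.

```python
from urllib.parse import urlparse, parse_qs

def landing_page_type(url):
    """Very rough URL heuristics for the four landing page types."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    query = parse_qs(parsed.query)
    if any(k in query for k in ("q", "query", "search", "keywords")):
        return "search_transfer"
    if not path:
        return "homepage"
    if "/" not in path and not any(ch.isdigit() for ch in path):
        return "category_browse"        # e.g. /dishwashers
    return "other"                      # deep/terminal pages, forms, promotions

for url in ("http://appliances-r-us.example.com/",
            "http://appliances-r-us.example.com/dishwashers",
            "http://appliances-r-us.example.com/search?q=miele",
            "http://appliances-r-us.example.com/promo/2011/order-form.html"):
    print(url, "->", landing_page_type(url))
```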
What happens after an ad click Taxonomy of landing pages: example screenshots of each type (Homepage 25%, Category browse 38%, Search transfer 26%, Other 11%) 34-36
What happens after an ad click Using the landing page taxonomy Picking the right landing page type for each ad Improving the conversion rate Improving advertisers' ROI! 37
What happens after an ad click Landing page type usage vs. conversion: breakdown by query frequency Observed conversion rates are in sharp contrast with usage frequency of the different page types! 38
What happens after an ad click Landing page type usage vs. conversion: breakdown by query price As the price goes up, so does the conversion rate Again, observed conversion rates are in sharp contrast with usage frequency across a range of query prices! 39
What is the interplay... Competing for users attention: On the interplay between organic and sponsored search results (WWW 2010, w. Danescu-Niculescu-Mizil et al.) 40
What is the interplay... Ads as another source of content for enriching Web search results 41
What is the interplay... The interplay between ads and organic results... "What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it." Herbert Simon, Designing Organizations for an Information-Rich World, 1971 Is there competition for clicks between ads and organic results? Do users prefer ads that are similar to the organic results, or do they prefer diversity? We found that the answers to these questions depend on the type of the query See also Ashkan et al. (ECIR 2009, CIKM 2009) 42
What is the interplay... Relation between the CTR of ads and the CTR of organic results Negative correlation (competition): users are only willing to spend limited time and effort on each query Positive correlation (depends on the quality of results): easy query ("online radio") → decent ads and organic results → clicks on both; hard query ("what should I do today?") → poor results on both sides → no clicks on either Independence (null hypothesis): users consider ads and organic results as two independent sources of information 43
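These hypotheses can be distinguished with a simple per-query correlation between ad CTR and organic CTR, as in this sketch; the CTR values are made up, and statistics.correlation requires Python 3.10+.

```python
from statistics import correlation   # Python 3.10+

# Made-up per-query CTRs, purely to illustrate the test.
ad_ctr      = [0.08, 0.02, 0.11, 0.01, 0.05]
organic_ctr = [0.55, 0.71, 0.48, 0.74, 0.62]

r = correlation(ad_ctr, organic_ctr)
print(f"Pearson r = {r:.2f}")
# r < 0 is consistent with competition for clicks,
# r > 0 with quality-driven positive correlation,
# r close to 0 with the independence null hypothesis.
```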
What is the interplay... Findings: competition + positive correlation 44
Decoupling the forces What is the interplay... Users are willing to invest limited effort in each query → competition In order to single out the competition effect, we tried to explicitly model the amount of effort the user is willing to invest Low effort = navigational queries [Broder, 2002] (27% of queries): Pandora radio, Bank of America High effort = non-navigational queries: meaning of life, academia vs. industry 45
What is the interplay... Competition clearly exists for navigational queries We also examined different degrees of navigationality: the less navigational the query, the less competition we observed 46
What is the interplay... Another viewpoint: Do users prefer ads that are more similar to the organic results or more diverse ads? Well, both... So we need to dig deeper again 47
Let's break it down by navigationality again What is the interplay... 48
Counterintuitive? What is the interplay... 49
What is the interplay... Responsive and incidental ads Responsive ads: directly address the user's information need; likely to be similar to the organic results Incidental ads: tangentially related to the user's information need; unreasonable as organic results, but OK as ads; likely to be different from the organic results Example: query = "free internet radio" Responsive: Pandora Internet Radio Incidental: Discount Bose Computer Speakers 50
Now it all makes sense... What is the interplay... 51
Research frontier: towards automatic ad generation Revisiting the anatomy of an ad, each component could be generated automatically from a rich product description: Title & Creative: text summarization / NL generation given a rich product description Display URL & Landing URL: (possibly personalized) feed based on the entire product description, with full-text indexing Bid phrases: use mono-lingual MT to generate bid phrases from the product description [w. Ravi et al., 2010] Bid: bid generation for advanced match based on exact-match bids and conversion data [w. Broder et al., 2011] Plus: matching the landing page to the query type (to maximize conversion) [w. Becker et al., 2009] 52
Key messages 1. Computational advertising = a principled way to find the best match between a user in a context and a suitable ad 2. Advertising is a form of information 3. Finding the best ad is an IR problem, but classical IR needs significant adaptation 4. Plenty to be learned, problems are far from solved! Vibrant research area Numerous open research questions in other areas of Comp. Adv.: audience selection, guaranteed delivery, personalization, more interactive ads, ... 53
Thank you! gabr@yahoo-inc.com http://research.yahoo.com/~gabr 54
This talk is Copyright Yahoo! 2011. Yahoo! and the Author retain all rights, including copyright and distribution rights. No publication or further distribution in full or in part is permitted without explicit written permission. The opinions expressed herein are the responsibility of the author and do not necessarily reflect the opinion of Yahoo! Inc. This talk benefitted from the contributions of many colleagues and co-authors at Yahoo! and elsewhere. Their help is gratefully acknowledged. 55