Measuring Ad Effectiveness Using Geo Experiments

Jon Vaver, Jim Koehler
Google Inc.

Abstract

Advertisers have a fundamental need to quantify the effectiveness of their advertising. For search ad spend, this information provides a basis for formulating strategies related to bidding, budgeting, and campaign design. One approach that Google has successfully employed to measure advertising effectiveness is geo experiments. In these experiments, non-overlapping geographic regions are randomly assigned to a control or treatment condition, and each region realizes its assigned condition through the use of geo-targeted advertising. This paper describes the application of geo experiments and demonstrates that they are conceptually simple, have a systematic and effective design process, and provide results that are easy to interpret.

1 Introduction

Every year, advertisers spend billions of dollars on online advertising to influence consumer behavior. One of the benefits of online advertising is access to a variety of metrics that quantify related consumer behavior, such as paid clicks, website visits, and various forms of conversions. However, these metrics do not indicate the incremental impact of the advertising. That is, they do not indicate how the consumer would have behaved in the absence of the advertising. In order to understand the effectiveness of advertising, it is necessary to measure the behavioral changes that are directly attributable to the ads.

A variety of experimental and observational methods have been developed to quantify advertising's incremental impact (see [1], [5], [3], [2]). Each method has its own set of advantages and disadvantages.

Observational methods of measurement impose the least amount of disruption on an advertiser's ongoing campaigns. In an observational study, ad effectiveness is assessed by observing consumer behavior in the presence of the advertising over a period of time. The analyses associated with these studies tend to be complex, and their results may be viewed with more skepticism, because there is no control group. That is, a statistical model is used to infer the behavior of a
comparable set of consumers without ad exposure, as opposed to directly observing their behavior via an unexposed control group. At Google, observational methods have been used to measure the ad effectiveness of display advertising in the Google Content Network [1] and Google Search [2].

The most rigorous method of measurement is a randomized experiment. One application of randomized experiments that is used to analyze search ad effectiveness is a traffic experiment. At Google, these are performed using the AdWords Campaign Experiments (ACE) tool [3]. In these experiments, each incoming search is assigned to a control or treatment condition and the subsequent user behavior associated with each condition is compared to determine the incremental impact of the advertising. These experiments are very effective at providing an understanding of consumer behavior at the query level. However, they do not account for changes in user behavior that occur further downstream from the search.
For example, conversion level behavior may involve multiple searches and multiple opportunities for ad exposures, and a traffic experiment does not follow individual users to track their initial control/treatment assignment or observe their longer-term behavior.

An alternative approach is to vary the control/treatment condition at the cookie level. In a cookie experiment, each cookie belongs to the same control/treatment group across time. However, ad serving consistency is still a concern with cookie experiments because some users may have multiple cookies due to cookie churn and their use of multiple devices to perform online research. Cookie experiments have been used at Google to measure display ad effectiveness [5].

This paper describes one additional method for measuring ad effectiveness: the geo experiment. In these experiments, a region (e.g., a country) is partitioned into a set of geographic areas, which we call geos. These geos are randomly assigned to either a treatment or control condition and geo-targeting is used to serve ads accordingly. A linear model is used to estimate the return on ad spend.

2 Geo Experiment Description

Online advertising can impact a variety of consumer behaviors. In this paper, we refer to the behavior of interest as the response metric. The response metric might be, for example, clicks (paid as well as organic), online or offline sales, website visits, newsletter sign-ups, or software downloads. The results of an experiment come in the form of return on ad spend (ROAS), which is the incremental impact that the ad spend had on the response metric. For example, the ROAS for sales indicates the incremental revenue generated per dollar of ad spend. This metric indicates the revenue that would not have been realized without the ad spend.

A geo experiment begins with the identification of a set of geos, or geographic areas, that partition a region of interest. For a national advertiser, this region may be an entire country. There are two primary requirements for these geos. First, it must be possible to serve ads according to a geographically based
control/treatment prescription with reasonable accuracy. Second, it must be possible to track the ad spend and the response metric at the geo level. Ad serving inconsistency is a concern due to finite ad serving accuracy, as well as the possibility that consumers will travel across geo boundaries. The location and size of the geos can be used to mitigate these issues. It is not generally feasible to use geos as small as, for example, postal codes. The generation of geos for geo experiments is beyond the scope of this paper. In the United States, one possible set of geos is the 210 DMAs (Designated Market Areas) defined by Nielsen Media, which are broadly used as a geo-targeting unit by many advertising platforms.

The next step is to randomly assign each geo to a control or treatment condition. Randomization is an important component of a successful experiment as it guards against potential hidden biases. That is, there could be fundamental, yet unknown, differences between the geos and how they respond to the treatment. Randomization ensures that these potential differences are equally distributed, statistically speaking, across the treatment and control groups. It also may be helpful to constrain this random assignment in order to better balance the control and treatment geos across one or more characteristics or demographic variables. For example, we have found that grouping the geos by size prior to assignment can reduce the width of the confidence interval of the ROAS measurement by 10% or more.

Each experiment contains two distinct time periods: pretest and test (see Figure 1). During the pretest period there are no differences in campaign structure across geos (e.g., bidding strategy, keyword set, ad creatives, etc.). In this time period, all geos operate at the same baseline level and the incremental differences between the treatment and control geos in the ad spend and response metric are zero.

Google Inc. Confidential and Proprietary

During the test period the campaigns for the
treatment geos are modified. This modification generates a nonzero differential in the ad spend in the treatment geos relative to the control geos. That is, the ad spend differs from what it would have been if the campaign had not been modified. This differential will be negative if the campaign change causes the ad spend to decrease in the treatment geos (e.g., campaigns turned off), and positive if the change causes an increase in ad spend (e.g., bids increased or keywords added).

Figure 1: Diagram of a geo experiment. Ad spend is modified in one set of geos during the test period, while it remains unchanged in another. There may be some delay before the corresponding change in a response metric is fully realized.

This ad spend differential will generate a corresponding differential in the response metric, perhaps with some time delay, \(\nu\). Offline sales is an example of a response metric that is likely to have a positive value of \(\nu\). It takes time for consumers to complete their research, make a decision, and then visit a store to make their purchase. The test period extends beyond the end of the ad spend change by \(\nu\) to fully capture these incremental sales.

3 Linear Model

After an experiment is executed, the results are analyzed using the following linear model:

\[ y_{i,1} = \beta_0 + \beta_1 y_{i,0} + \beta_2 \delta_i + \epsilon_i \tag{1} \]

where \(y_{i,1}\) is the aggregate of the response metric during the test period for geo \(i\), \(y_{i,0}\) is the aggregate of the response metric during the pretest period for geo \(i\), \(\delta_i\) is the difference between the actual ad spend in geo \(i\) and the ad spend that would have occurred without the experiment, and \(\epsilon_i\) is the error term. This model is fit using weights \(w_i = 1/y_{i,0}\) in order to control for heteroscedasticity caused by the differences in geo size.

The first two parameters in the model, \(\beta_0\) and \(\beta_1\), are used to account for seasonal differences in the response metric across the pretest and test periods. The parameter of primary interest is \(\beta_2\), which is the return on ad spend (ROAS) of the response metric.

The values of \(y_{i,1}\) and \(y_{i,0}\) (e.g., offline sales) are generated by the advertiser's reporting system. The geo level ad
spend is available through AdWords. If there is no ad spend during the pretest period then the ad spend differential, \(\delta_i\), required by Equation 1 is simply the ad spend during the test period. However, if the ad spend is positive during the pretest period and is either increased or decreased, as depicted in Figure 1, then the ad spend differential is found by fitting a second linear model:

\[ s_{i,1} = \gamma_0 + \gamma_1 s_{i,0} + \mu_i \tag{2} \]

Here, \(s_{i,1}\) is the ad spend in geo \(i\) during the test period, \(s_{i,0}\) is the ad spend in geo \(i\) during the pretest period, and \(\mu_i\) is the error term. This model is fit with weights \(w_i = 1/s_{i,0}\) using only the control geos (\(C\)). This ad spend model characterizes the impact of seasonality on ad spend from the pretest period to the test period, and it is used as a counterfactual¹ to calculate the ad spend differential. The ad spend differential in the control and treatment geos (\(T\)) is found using the following prescription:

\[ \delta_i = \begin{cases} s_{i,1} - (\gamma_0 + \gamma_1 s_{i,0}) & \text{for } i \in T \\ 0 & \text{for } i \in C \end{cases} \tag{3} \]

The zero ad spend differential in the control geos reflects the fact that these geos continue to operate at the baseline level during the test period.

¹ The counterfactual is the ad spend that would have occurred in the absence of the treatment.
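The fitting procedure of Equations 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy is available, the function names (`wls`, `spend_differential`, `estimate_roas`) are hypothetical, and the six-geo data set is synthetic and noise-free so that a known ROAS of 3.0 is recovered.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - X_i @ b)^2."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def spend_differential(s0, s1, is_treatment):
    """Equations 2 and 3: fit the spend model on control geos only, then
    subtract the counterfactual spend in treatment geos (zero in control)."""
    c = ~is_treatment
    Xc = np.column_stack([np.ones(c.sum()), s0[c]])
    g0, g1 = wls(Xc, s1[c], 1.0 / s0[c])
    return np.where(is_treatment, s1 - (g0 + g1 * s0), 0.0)

def estimate_roas(y0, y1, delta):
    """Equation 1 with weights 1 / y0; returns (beta0, beta1, beta2)."""
    X = np.column_stack([np.ones(len(y0)), y0, delta])
    return wls(X, y1, 1.0 / y0)

# Synthetic, noise-free example (all numbers are made up for illustration).
y0 = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])  # pretest response
s0 = 0.1 * y0                                              # pretest ad spend
is_treatment = np.array([False, True, False, True, False, True])
extra_spend = np.array([0.0, 5.0, 0.0, 10.0, 0.0, 15.0])   # true differential
s1 = 2.0 + 1.1 * s0 + extra_spend                          # test-period spend

delta = spend_differential(s0, s1, is_treatment)
y1 = 10.0 + 1.2 * y0 + 3.0 * delta                         # true ROAS is 3.0
beta = estimate_roas(y0, y1, delta)
print("estimated ROAS (beta_2):", round(float(beta[2]), 6))
```

With real data the error terms are nonzero, so the estimate of \(\beta_2\) would carry uncertainty rather than match the true value as it does here.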
4 Example Results

Return On Ad Spend for Clicks

One issue that is of primary concern to advertisers is the potential cannibalization of cost-free organic clicks by paid search clicks (i.e., users will click on a paid search link when they would have clicked on an organic search link). Although perhaps unlikely, it is also possible that the co-occurrence of a paid link and an organic link will make an organic click more likely. Cost per click (CPC) does not provide the advertiser with a complete picture of advertising impact because of competing effects such as these. A more useful metric is the cost per incremental click (CPIC), which can be measured with a geo experiment.

One of Google's advertisers ran an experiment to measure the effectiveness of their search advertising campaign. During this experiment, which lasted several weeks, the advertiser's search ads were shown in half of the geos. Figure 2 shows the result of fitting the linear model in Equation 1 with successively longer sets of test period data to find the ROAS for clicks. At first, the confidence interval of this metric is large, but it decreases quickly as more test period data are accumulated. Each dollar of ad spend generates 1/3 of an incremental click or, equivalently, the CPIC is $3. In this case, the reported CPC in AdWords is $2.40, which underestimates CPIC by 20%.² So, the paid clicks do displace some organic clicks, but certainly not the bulk of them.

Figure 2: Measurement of return on ad spend for clicks as a function of test period length. The uncertainty in this estimate decreases until the ad spend returns to normal levels in all of the geos. [Plot: ROAS for clicks, its 95% confidence interval, and the confidence interval width versus time since test start; the end of incremental ad spend is marked.]

To further illustrate the ability of paid search advertising to generate incremental clicks, Figure 3 shows the cumulative incremental ad spend across the test period along with the cumulative incremental clicks. The number of incremental clicks is zero at the beginning of the test period and increases steadily with time along with the incremental ad spend. However, once the ad spend in the test geos returns to a pretest level, the accumulation of incremental ad spend stops. At the same time, the accumulation of incremental clicks stops as well. This behavior indicates that, in this case, search advertising does not increase the number of clicks beyond the day in which the ad spend occurred.

As mentioned in Section 2, the impact of ad spend is not as time-limited for all response metrics. Figure 4 is analogous to Figure 3, except the response metric is offline sales. Even after the ad spend differential returns to normal, the impact of the ad spend continues to generate incremental sales for some period of time before fading.

² In [2] the authors define IAC (incremental ad clicks) as the fraction of paid clicks that are incremental. IAC = CPC / CPIC, so IAC = 80% in this example.

5 Design

Design is a crucial aspect of running an effective geo experiment. Before beginning a test, it is helpful to understand how characteristics such as experiment length, test fraction, and magnitude of ad spend differential will impact the uncertainty of the ROAS measurement. This understanding allows for the design of an effective and efficient experiment. Fortunately, it is possible to make such assessments for the linear model in Equation 1.
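One way to build intuition for the effect of the ad spend differential's magnitude is to fit Equation 1 to simulated data at several spend levels. The sketch below is illustrative only and rests on assumptions not taken from the paper: the data are synthetic, the names (`fit_roas`, `base_delta`, `std_errors`) are hypothetical, NumPy is assumed to be available, and the standard error comes from the textbook weighted least squares covariance estimate, the residual variance times the inverse of \(X^T W X\).

```python
import numpy as np

def fit_roas(y0, y1, delta):
    """Fit Equation 1 by weighted least squares with w_i = 1 / y_{i,0}.

    Returns the ROAS estimate (beta_2) and its estimated standard error.
    """
    w = 1.0 / y0
    X = np.column_stack([np.ones(len(y0)), y0, delta])
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y1))
    resid = y1 - X @ beta
    # Weighted residual variance with N - 3 degrees of freedom.
    sigma2 = np.sum(w * resid**2) / (len(y0) - X.shape[1])
    cov = sigma2 * np.linalg.inv(XtWX)
    return beta[2], np.sqrt(cov[2, 2])

# Synthetic geos: every second geo is a treatment geo (illustrative only).
rng = np.random.default_rng(0)
n_geos = 40
y0 = rng.uniform(100.0, 1000.0, n_geos)
is_treatment = np.arange(n_geos) % 2 == 1
base_delta = np.where(is_treatment, 0.05 * y0, 0.0)
noise = rng.normal(0.0, np.sqrt(y0))  # noise variance grows with geo size
true_roas = 0.5

std_errors = {}
for scale in (1, 2, 4):
    delta = scale * base_delta
    y1 = 5.0 + 1.1 * y0 + true_roas * delta + noise
    _, std_errors[scale] = fit_roas(y0, y1, delta)
    print(f"spend scale {scale}: ROAS std. error = {std_errors[scale]:.4f}")
```

Because the same noise is reused at every spend level, the residuals do not change when \(\delta_i\) is rescaled, so doubling the spend differential halves the standard error of the ROAS in this setup.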
Figure 3: Cumulative incremental ad spend and clicks across the test period. The accumulation of incremental clicks stops as soon as the ad spend returns to the pretest level in all geos. [Plot: cumulative incremental spend and cumulative incremental clicks versus time since test start; the end of incremental ad spend is marked.]

Figure 4: Cumulative incremental sales across the length of the test period. Incremental sales continue to be generated even after the ad spend returns to pretest levels in all geos. [Plot: cumulative incremental spend and cumulative incremental revenue versus time since test start; the end of incremental ad spend is marked.]

For an experiment with \(N\) geos, let \(\bar{y}_0 = (1/N) \sum_{i=1}^{N} y_{i,0}\) and \(\bar{\delta} = (1/N) \sum_{i=1}^{N} \delta_i\). Linear theory indicates that the variance of \(\beta_2\) from Equation 1 is

\[ \operatorname{var}(\beta_2) = \frac{\sigma_\epsilon^2}{\left(1 - \rho_{y\delta}^2\right) \sum_{i=1}^{N} w_i (\delta_i - \bar{\delta})^2} \tag{4} \]

where \(\sigma_\epsilon^2\) is the residual variance, and

\[ \rho_{y\delta}^2 = \frac{\left[ \sum_{i=1}^{N} w_i (y_{i,0} - \bar{y}_0)(\delta_i - \bar{\delta}) \right]^2}{\sum_{i=1}^{N} w_i (y_{i,0} - \bar{y}_0)^2 \; \sum_{i=1}^{N} w_i (\delta_i - \bar{\delta})^2} \tag{5} \]

(see Appendix). Using a set of geo-level pretest data in the response variable, it is possible to use this expression to estimate the width of the ROAS confidence interval for a specified design scenario.

The first step in the process is to select a consecutive set of days from the pretest data to create pseudo pretest and test periods. The lengths of the pseudo pretest and test periods should match the lengths of the corresponding periods in the hypothesized experiment. For example, an experiment with a 14 day pretest period and a 14 day test period should have pseudo pretest and test periods that are each 14 days long. The data from the pseudo pretest period are used to estimate \(y_{i,0}\) and \(w_i\) in Equation 4.

The next step is to randomly assign each geo to the treatment or control group. We have found that confidence interval estimates are narrower by about 10% when this random assignment is constrained in the following manner. The geos are ranked according to \(y_{i,0}\). Then, this
ranked list of geos is partitioned into groups of size \(M\), where the test fraction is \(1/M\). One geo from each group is randomly selected for assignment to the treatment group.

It may be possible to directly estimate the value of \(\delta_i\) at the geo level. For example, if the ad spend will be turned off in the treatment geos, then the magnitude of \(\delta_i\) is just the average daily ad spend for treatment geo \(i\) times the number of days in the experiment. Otherwise, an aggregate ad spend differential, \(\Delta\), can be hypothesized and the geo-level ad spend differential can be estimated using

\[ \delta_i = \begin{cases} \Delta \left( y_{i,0} / \sum_{j} y_{j,0} \right) & \text{for } i \in T \\ 0 & \text{for } i \in C \end{cases} \tag{6} \]

The last value to estimate in Equation 4 is \(\sigma_\epsilon\). This estimate is generated by considering the reduced linear model:

\[ y_{i,1} = \hat{\beta}_0 + \hat{\beta}_1 y_{i,0} + \hat{\epsilon}_i \tag{7} \]

This model has the same form as Equation 1 except the ad spend differential term has been dropped. Fitting this model using the pseudo pretest and test period data results in a residual variance of \(\sigma_{\hat{\epsilon}}^2\), which is used to approximate \(\sigma_\epsilon^2\).

To avoid any peculiarities associated with a particular random assignment, Equation 4 is evaluated for many random control/treatment assignments. In addition, different partitions of the pretest data are used to create the pseudo pretest and test periods by circularly shifting the data in time by a randomly selected offset. The half width estimate for the ROAS confidence interval is \(2\sqrt{\overline{\operatorname{var}}(\beta_2)}\), where \(\overline{\operatorname{var}}(\beta_2)\) is the average variance of \(\beta_2\) across all of the random assignments. This process can be repeated across a number of different scenarios to evaluate and compare designs.

Note that if a limited set of pretest data is available, circular shifting of the data makes it possible to analyze scenarios with extended test periods. However, doing so requires data points to be used multiple times in generating each estimate of \(\operatorname{var}(\beta_2)\), and the example below demonstrates that this reuse of the data leads to estimates that are overly optimistic.

Figure 5 shows the confidence interval prediction as a function of experiment length for the click example from Section 4. The dashed line corresponds to the predicted confidence interval half width and the solid line corresponds to results from the experiment. For this comparison, the ad spend differential from the experiment was used as input to the prediction. The predictions are quite accurate beyond the very beginning of the test period. Additionally, they maintain this accuracy until the combined length of the hypothesized pretest and test periods becomes longer than the (deliberately) limited set of pretest data used to generate the estimates. The good match between these two curves demonstrates that the absolute size of the confidence interval can be predicted quite well, at least as long as the ad spend differential can be accurately predicted.

Figure 5: ROAS confidence interval prediction across the length of the test period. The prediction is quite good until the test period becomes long enough that some of the pretest data must be used multiple times to generate each estimate of \(\operatorname{var}(\beta_2)\). [Plot: 95% confidence interval half width versus time since test start for the analysis results and the prediction; the beginning of multiple use of pretest data and the end of incremental ad spend are marked.]

6 Concluding Remarks

Measuring ad effectiveness is a challenging problem. Currently, there is no single methodology that works well in all situations. However, geo experiments are worthy of consideration in many situations because they provide the rigor of a randomized experiment, they are easy to understand, they provide results that are easy to interpret, and they have a systematic and effective design process. Geo experiments can be applied to measure a variety of user behavior and can be used with any advertising medium that allows for geo-targeted advertising. Furthermore, these experiments do not require the tracking of
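individual user behavior over time and therefore avoid privacy concerns that may be associated with alternative approaches.

Acknowledgments

We thank those who reviewed this paper (with special thanks to Tony Fagan and Lizzy Van Alstine for their many helpful suggestions), others at Google who made this work possible, and the forward looking advertisers who shared their data with us.

References

[1] D. Chan, et al. Evaluating Online Ad Campaigns in a Pipeline: Causal Models at Scale. Proceedings of ACM SIGKDD 2010, pp. 7-15.
[2] D. Chan, et al. Incremental Clicks Impact Of Search Advertising. research.google.com/pubs/archive/37161.pdf, 2011.
[3] Google Ads Team. AdWords Campaign Experiments. Sept. 1, 2011. Ad Innovations. http://www.google.com/ads/innovations/ace.html
[4] M. H. Kutner, et al. Applied Linear Statistical Models. New York: McGraw-Hill/Irwin, 2005.
[5] T. Yildiz, et al. Measuring and Optimizing Display Advertising Impact Through Experiments. In preparation (research.google.com).

7 Appendix

To derive Equation 4, consider the centered versions of the variables \(y_{i,1}\), \(y_{i,0}\), and \(\delta_i\) from Equation 1: \(\tilde{y}_{i,1} = y_{i,1} - \bar{y}_1\), \(\tilde{y}_{i,0} = y_{i,0} - \bar{y}_0\), and \(\tilde{\delta}_i = \delta_i - \bar{\delta}\) for \(i = 1, \ldots, N\), where \(\bar{y}_j = (1/N) \sum_i y_{i,j}\). With these translations, the relevant linear model becomes

\[ \tilde{y}_{i,1} = \beta_1 \tilde{y}_{i,0} + \beta_2 \tilde{\delta}_i + \epsilon_i \tag{8} \]

Or,

\[ Y = X\beta + \epsilon \tag{9} \]

where

\[ Y = \begin{bmatrix} \tilde{y}_{1,1} \\ \vdots \\ \tilde{y}_{N,1} \end{bmatrix}, \quad X = \begin{bmatrix} \tilde{y}_{1,0} & \tilde{\delta}_1 \\ \vdots & \vdots \\ \tilde{y}_{N,0} & \tilde{\delta}_N \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_N \end{bmatrix} \]

With the model in this form, the variance-covariance matrix of the weighted least squares estimated regression coefficients is:

\[ \operatorname{var}(\beta) = \sigma_\epsilon^2 (X^T W X)^{-1} \tag{10} \]

(see [4]), where \(W\) is a diagonal matrix containing the weights \(w_i\),

\[ W = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & w_N \end{bmatrix} \tag{11} \]

Now,

\[ \operatorname{var}(\beta) = \sigma_\epsilon^2 \begin{bmatrix} \sum_i w_i \tilde{y}_{i,0}^2 & \sum_i w_i \tilde{y}_{i,0} \tilde{\delta}_i \\ \sum_i w_i \tilde{y}_{i,0} \tilde{\delta}_i & \sum_i w_i \tilde{\delta}_i^2 \end{bmatrix}^{-1} \tag{12} \]

and the last component of this matrix is the variance of \(\beta_2\),

\[ \operatorname{var}(\beta_2) = \frac{\sigma_\epsilon^2 \sum_i w_i \tilde{y}_{i,0}^2}{\left( \sum_i w_i \tilde{y}_{i,0}^2 \right) \left( \sum_i w_i \tilde{\delta}_i^2 \right) - \left( \sum_i w_i \tilde{y}_{i,0} \tilde{\delta}_i \right)^2} \tag{13} \]

Using Equation 5,

\[ \left( \sum_i w_i \tilde{y}_{i,0}^2 \right) \left( \sum_i w_i \tilde{\delta}_i^2 \right) \left( 1 - \rho_{y\delta}^2 \right) = \left( \sum_i w_i \tilde{y}_{i,0}^2 \right) \left( \sum_i w_i \tilde{\delta}_i^2 \right) - \left( \sum_i w_i \tilde{y}_{i,0} \tilde{\delta}_i \right)^2 \tag{14} \]

which, after substituting into Equation 13, leads to Equation 4.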
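As a numerical sanity check on this derivation, the closed form in Equation 4 (with the correlation term from Equation 5) can be compared against the last diagonal element of the matrix expression in Equation 10. The sketch below assumes NumPy is available; the function names are hypothetical, and the six geos, their spend differentials, and the residual variance are made-up values chosen only for illustration.

```python
import numpy as np

def var_beta2_closed_form(y0, delta, w, sigma2):
    """Equation 4, using the squared weighted correlation of Equation 5."""
    yc = y0 - y0.mean()        # centered pretest response
    dc = delta - delta.mean()  # centered ad spend differential
    s_yy = np.sum(w * yc**2)
    s_dd = np.sum(w * dc**2)
    s_yd = np.sum(w * yc * dc)
    rho2 = s_yd**2 / (s_yy * s_dd)
    return sigma2 / ((1.0 - rho2) * s_dd)

def var_beta2_matrix_form(y0, delta, w, sigma2):
    """Equation 10: last diagonal entry of sigma^2 * (X^T W X)^{-1},
    with X built from the centered variables as in the Appendix."""
    X = np.column_stack([y0 - y0.mean(), delta - delta.mean()])
    cov = sigma2 * np.linalg.inv(X.T @ (w[:, None] * X))
    return cov[1, 1]

# Illustrative values only (not data from the paper).
y0 = np.array([120.0, 340.0, 560.0, 80.0, 910.0, 450.0])
delta = np.array([0.0, 30.0, 0.0, 10.0, 0.0, 45.0])
w = 1.0 / y0
sigma2 = 2.5

closed_form = var_beta2_closed_form(y0, delta, w, sigma2)
matrix_form = var_beta2_matrix_form(y0, delta, w, sigma2)
print("Equation 4: ", closed_form)
print("Equation 10:", matrix_form)
print("approximate half width:", 2.0 * np.sqrt(closed_form))
```

The two values agree to floating-point precision, which mirrors the algebra in Equations 12 through 14.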