Challenges in Measuring Online Advertising Systems
Saikat Guha
Microsoft Research India, Bangalore, India

Bin Cheng, Paul Francis
Max Planck Institute for Software Systems, Kaiserslautern-Saarbruecken, Germany

ABSTRACT
Online advertising supports many Internet services, such as search, email, and social networks. At the same time, there are widespread concerns about the privacy loss associated with user targeting. Yet, very little is publicly known about how ad networks operate, especially with regard to how they use user information to target users. This paper takes a first principled look at measurement methodologies for ad networks. It proposes new metrics that are robust to the high levels of noise inherent in ad distribution, identifies measurement pitfalls and artifacts, and provides mitigation strategies. It also presents an analysis of how three different classes of advertising (search, contextual, and social networks) use user profile information today.

Categories and Subject Descriptors
C.4 [Performance of Systems]: Measurement techniques; K.4.1 [Computers and Society]: Public Policy Issues, Privacy

General Terms
Experimentation, Measurement

Keywords
Advertising, Privacy, Behavioral Targeting, Contextual, Churn, Similarity, Google, Facebook

1. INTRODUCTION
Online advertising is a key economic driver in the Internet economy, funding a wide variety of websites and services. At the same time, ad networks gather a great deal of user information, for instance users' search histories, web browsing behaviors, online social networking profiles, and mobile locations [6-8]. As a result, there are widespread concerns about loss of user privacy.
In spite of all this, very little is publicly known about how ad networks use user information to target ads to users. For instance, Google recently started allowing advertisers to target ads based not just on keywords and demographics, but on user interests as well [9]. Knowing how well Google and others are able to determine user characteristics is an important consideration in the ongoing public debate about user privacy. If an ad network is able to accurately target users, we can deduce that the ad network is able to determine user characteristics (though the converse does not follow).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IMC, November 3, 2, Melbourne, Australia. Copyright 2 ACM //...$..

Given the goal of determining how well ad networks can target users, the high-level methodology is straightforward: create two clients that emulate different values of a given user characteristic (e.g. location or gender), and then measure whether the two clients receive different sets of ads as a result. If the ads are identical, we can trivially conclude the ad network doesn't use that user characteristic for targeting (although it might still be storing the data). But the outcome is unclear if the two sets are different: the difference may genuinely be due to the difference in characteristic, or it may be due to noise.

As it turns out, the level of noise in measuring ads is extremely high. Even queries launched simultaneously from two identically configured clients on the same subnet can produce wildly different ads over multiple timescales. As we show later, some of this noise is systemic (e.g.
DNS load-balancing), and can therefore be eliminated through proper experiment design. Other noise has a temporal component, which likely reflects ad churn, the constant process of old ads being deactivated and new ads being activated. We design a metric that mitigates the noise from churn.

Overall this paper makes two contributions. First, we present the detailed design of a measurement methodology for online advertising that is robust to the high levels of noise inherent in today's systems. We present a set of guidelines for researchers who wish to study advertising systems. Second, we present an analysis of the key factors that determine ad targeting on Google and on Facebook.

2. MEASUREMENT METHODOLOGY
In this section we present the detailed design of our measurement methodology. We face four key challenges: 1) comparing individual ads, 2) collecting a representative snapshot of ads, 3) quantifying differences between snapshots while being robust to noise, and 4) avoiding measurement artifacts arising from the experiment design. We present our design decisions and justify each using measurement data. The methodology we design applies to text-ads in any context including ads in search results, contextual ads on webpages
and webmail systems, and ads on online social networking pages; additionally, many qualitative aspects of our design may also apply to banner ads.

Table 1: Examples that complicate uniquely identifying ads.

Instance A:
  Red Prom Dresses
  Win a Free Dress for Prom 2. Find New Trends; Great Prices!
  DavidsProm.com
  MD5(RedirURL): 8ebc... 45dc
  Dest:... detail.jsp?i=2462

Instance B:
  Red Prom Dresses
  Beautiful Designer Prom Dresses to Fit Every Figure; Price Range.
  DavidsProm.com
  MD5(RedirURL): d3
  Dest:... detail.jsp?i=2462

Instance C:
  Baby Doll Prom Dresses
  Win a Free Dress for Prom 2. Find New Trends; Great Prices!
  DavidsProm.com
  MD5(RedirURL): 99d... fbf
  Dest:... detail.jsp?i=223

Instance D:
  Red Prom Dresses
  Shop JCPenney For Colorful Prom Gowns; Dresses From Top Designers.
  JCPenney.com/dresses
  MD5(RedirURL): c2d... ce2c
  Dest:... X6.aspx?ItemId=7bd2fb

2.1 Comparing Individual Ads
Before we can compare sets of ads, we first need the ability to identify the same ad in different scenarios. The problem is hard because ad networks typically do not reveal the unique ID for the ad (except for Facebook), and some (like Bing) even go so far as to obfuscate or encrypt parameters in the click URL, denying scraping-based measurement approaches any visibility into internal parameters. The only data available consistently across ad networks is the content of the ad, and even there the extensive ability to customize it (e.g. on Google) makes it hard to identify different instances of the same ad. Since slight variations of the same ad defeat simple equality tests, heuristics must be used and their false positive and false negative behavior must be understood.

Consider, for example, the four ads illustrated in Table 1. Instances A and B are semantically equivalent, mutually exclusive (i.e. never both served for the same request), and lead to the same page on the advertiser's website (despite having different redirect URLs; see footnote 1).
Instances A and C appear to be from the same template, but are semantically different and lead to different pages. Instances A and D are from different advertisers altogether. Ideally, instances A and B would be considered equivalent, and different from both C and D. This cannot be achieved with simple equality tests on any single ad attribute (title, summary, display URL, redirect URL); other examples not presented here demonstrate the presence of false positives or false negatives for equality tests on combinations of attributes.

Footnote 1: The display URL (e.g. DavidsProm.com) is the URL displayed to the user. The redirect URL (RedirURL) is the URL the user is actually redirected to when he clicks the ad. The redirect URL is longer and less user-friendly, containing deep path information and URL parameters. The destination URL (Dest) where the user is eventually taken may be different from the redirect URL when multiple redirects are involved.

Table 2: False positives (FP) and false negatives (FN) of various approaches for uniquely identifying ads.
Columns: % FP and % FN for the "All Fashion" dataset, then % FP and % FN for the "Dresses only" dataset.
  RedirURL
  DisplayURL
  Title + DisplayURL    45 5
  Title + Summary

Experiment 1: Since comparison errors are unavoidable, we analyze the false positives and false negatives of different approaches to comparing ads, both to pick the best approach and to provide a bound on analysis errors. We consider 4 approaches: 1) equality of the redirect URL, 2) equality of the display URL, 3) equality of the ad title and display URL, and 4) equality of the ad title and summary text with all occurrences of the search keywords masked. Note: we cannot consider the destination URL for the ad since that is revealed only after the ad is clicked (and clicking on the ad in an automated manner constitutes fraud). We apply each approach to all pairs of ads in two datasets of ads scraped from Google search results.
For estimating false positives, we manually check a sample of pairs flagged as equal by the comparison approach. For false negatives, we manually check pairs flagged as different by the comparison approach; to focus manual analysis on pairs likely to be false negatives, we examine those flagged as equal by one of the other approaches, thus providing a lower bound. The first dataset contains ads for search queries related to fashion in general (clothing, shoes, accessories, etc.), while the second dataset restricts the queries to dress-related ones only.

Table 2 summarizes the false positives and false negatives for each comparison approach based on manual analysis of ad-pairs flagged by each approach. Except when comparing display URLs, there are (almost) no false positives. The false positives for display URLs arise from some stores using the same display URL for a wide range of products (e.g. target.com instead of target.com/mens). The display URL does, however, have significantly lower false negatives in both datasets. This is because advertisers often run multiple variations of the same ad, with a different title, summary, and redirect URL, to measure how each variation performs; comparing ads based on these attributes is therefore more error prone. The display URL, in contrast, is more stable across variations of the same ad.

False negatives have the effect of over-counting the number of unique ads and, in the process, increasing noise. False positives, on the other hand, do not increase noise, but under-count the number of unique ads. In this paper we perform all analysis relative to a control experiment (affected equally by false positives or negatives) to mitigate the effect of over-counting or under-counting. This leaves added noise (from false negatives) as the only differentiating factor. In this paper we therefore use the display URL, which has significantly lower false negatives and slightly higher false positives, to compute uniqueness of ads.
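As an illustration of display-URL-based uniqueness, the following is a minimal sketch rather than the tooling used in the paper. The `Ad` record, the helper names, and the normalization steps (lower-casing, stripping a scheme, a leading `www.`, and a trailing slash) are all assumptions introduced here; the text only states that ads are compared by display URL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    """Hypothetical record holding the attributes scraped for one ad."""
    title: str
    summary: str
    display_url: str
    redirect_url: str


def display_url_key(ad: Ad) -> str:
    """Normalize the display URL (assumed normalization, not from the paper)."""
    url = ad.display_url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def unique_ads(ads):
    """Group ads by normalized display URL; each key counts as one unique ad."""
    groups = {}
    for ad in ads:
        groups.setdefault(display_url_key(ad), []).append(ad)
    return groups
```

Applied to the four instances of Table 1, this grouping would place A, B, and C under one key and D under another: A and B are correctly merged, while C is merged as a false positive, which is the trade-off described above.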
2.2 Taking a Snapshot
There typically are more ads than can be displayed on a single page (search result, or adbox in a webpage). A single query therefore reveals incomplete information. The actual subset returned depends on the ad network, but likely takes into account the ad value (based on auctions), and frequency capping (not showing the same ad to the same user too often). Reloading the page multiple times typically reveals more ads, but runs the risk of capturing inaccurate snapshots in the presence of ad churn.

Experiment 2: To determine how many times the page should be reloaded to capture accurate and complete snapshots we perform the following experiment. We reload the Google search results for a given query every 5 seconds for 5 minutes; the experiment is repeated for over 2 different search queries (chosen randomly from a set of 5 keywords scraped from Google Product Search).

Figure 1: Reloading the page initially reveals more ads that didn't originally fit. Beyond a point, however, ad churn overtakes diminishing returns of reloading the page. (Axes: CDF of unique ads vs. number of reloads, with error bars.)

Figure 1 plots the CDF of unique ads for the median search query, with error bars marking off the 95th and 5th percentiles. The graph comprises a steep increase (5 reloads) where we discover most ads that didn't fit the first time, after which point diminishing returns kick in. Instead of flattening out, however, beyond requests the graph becomes linear. This gradual linear increase, we believe, is evidence of a constant rate of ad churn where new ads are constantly activated and old ones deactivated. The high rate of churn ( 4% per minute depending on the search keyword) is consistent with [5] where we found that roughly only 6% of ads are stable (hour-to-hour, and day-to-day) while the rest change rapidly. The knee of the graph (around reloads) represents the point at which the diminishing returns of reloading the page are overtaken by ad churn. In our experiments we therefore balance completeness and churn by reloading the page times when collecting snapshots. We also keep track of the number of times each ad was seen (if multiple reloads contain the ad) and the positions where the ad was seen.

2.3 Quantifying Change
The simplest approach to comparing two snapshots is to compute the set overlap. The Jaccard index quantifies this overlap as $\frac{|A \cap B|}{|A \cup B|}$ where $A$ and $B$ are the sets of unique ads in the two snapshots; 0 implies no overlap and 1 implies identical snapshots. While simple, the Jaccard index is highly susceptible to noise (fleeting ads); each unique ad is weighed equally whether it was seen only once or seen many times.

Figure 2(a): Identical setup (higher is better). Series: Weight: views; Weight: value; Weight: log(views); Overlap.
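The plain Jaccard index on two snapshots can be sketched as follows. Representing each snapshot as a collection of ad identifiers (for example, display URLs) and returning 1.0 when both snapshots are empty are assumptions made for this example.

```python
def jaccard_index(snapshot_a, snapshot_b):
    """Return |A intersect B| / |A union B| for two collections of ad identifiers."""
    a, b = set(snapshot_a), set(snapshot_b)
    if not a and not b:
        # Assumed convention: two empty snapshots are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)
```

Because every identifier counts once, an ad seen in a single reload shifts the score as much as an ad seen in every reload, which is the sensitivity to fleeting ads noted above.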
The typical way of dealing with noise is to look at aggregate behavior. Taking the union of multiple snapshots, however, makes the situation worse by also aggregating the noise. The extended Jaccard index (also called cosine similarity) addresses this limitation by interpreting the two sets as vectors in n-dimensional space (where each set element defines one dimension), and the coefficient of the vector in that dimension is some weight function ($w$) based on the element. The metric is defined as $\frac{\bar{A} \cdot \bar{B}}{\|\bar{A}\| \, \|\bar{B}\|}$ where $\bar{A} = [w_{A,e}]$; $w_{A,e}$ is some non-zero weight if ad $e$ exists in $A$, or 0 if it doesn't. As before, the metric evaluates to 0 for dissimilar snapshots, and to 1 for identical snapshots.

We explore three approaches to picking the non-zero weight: 1) the number of page reloads containing the ad, 2) the logarithm of the same, and 3) the number of page reloads scaled by the value of the ad; the value, defined in [4], is based on the ad's position with ads near the top of the page getting more weight than those near the bottom. By taking the number of times the ad was seen into account, each weight function also effectively attenuates noise when snapshots are aggregated together.

Figure 2(b): Different setup (lower is better). Series: Weight: views; Weight: value; Weight: log(views); Overlap.
Figure 2: CDF of similarity scores computed by the four metrics tracked over 8 days.

Experiment 3: To determine which approach performs the best we conducted the following experiment. We simultaneously collect snapshots from two browser instances configured identically and on the same machine (in New York); we expect the snapshots to be substantially the same. We also simultaneously collect snapshots from a machine set up at a remote location (in San Francisco) where we expect to see some differences in the set of ads. Snapshots (for 5 queries) are collected every 5 minutes for a period of 8 days.
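The weighted metric of Section 2.3 can be sketched as below, taking the cosine-similarity form named in the text. The function names and the dictionary representation (ad identifier mapped to the number of reloads containing it) are assumptions for this example. One further assumption is labeled in the code: the logarithmic weight is computed as log(1 + views) so that an ad seen exactly once still receives a non-zero weight; the text does not specify how that case is handled.

```python
import math
from collections import Counter


def aggregate(snapshots):
    """Sum per-ad view counts across several snapshots (dicts of ad -> views)."""
    total = Counter()
    for snapshot in snapshots:
        total.update(snapshot)
    return total


def log_weight(views):
    """Assumed form of the logarithmic weight: log(1 + views), non-zero for views >= 1."""
    return math.log1p(views)


def extended_jaccard(views_a, views_b, weight=log_weight):
    """Cosine similarity between two weighted ad vectors."""
    wa = {ad: weight(v) for ad, v in views_a.items() if v > 0}
    wb = {ad: weight(v) for ad, v in views_b.items() if v > 0}
    dot = sum(w * wb[ad] for ad, w in wa.items() if ad in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an empty snapshot is dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Passing `weight=float` gives the plain view-count weighting, so the weighting schemes can be compared on the same aggregated counts.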
We then aggregate hour's worth of data (2 snapshots) and compare the performance of all four metrics: the Jaccard index, and the extended Jaccard index with the three weight functions.

Figure 2(a) plots the CDF of the computed metric value for the case where we expect snapshots to be identical; the closer to y = the better. As expected, the plain Jaccard index performs poorly. The other three perform quite well, with the logarithmic weight function trailing slightly due to the reduced influence of highly-stable ads. Figure 2(b) plots the CDF for the case where we expect snapshots to be different; the greater the difference between the corresponding lines in 2(a) and 2(b) the better. The logarithmic weight clearly outperforms the other two here. This is because for the other two weight functions, and especially so for weights based on the ad value, the long tail of ads is drowned out by a handful of highly-stable highly-ranked ads, which tend to be from large companies (e.g. eBay, Amazon) that target broadly across many demographics, interests, and locations.

In our analysis we use the extended Jaccard index with logarithmic weights to quantify the similarity or difference between snapshots. The metric in practice is both robust to
noise, as well as sensitive to changes in the underlying set of ads.

Figure 3: Artifact of DNS load-balancing by ad network. Series: same datacenter (A vs. B); different datacenter (A vs. C).

2.4 Avoiding Artifacts
During the course of our experiments we identified several measurement artifacts that stemmed from poor interactions between lower level protocols and the operational architecture of the ad network. We discuss two artifacts and how they can be eliminated: the first pertains to DNS load-balancing, and the second to distributed data collection from multiple machines.

In one case we observed discrepancies between data collected by three identically configured browser instances all running on the same machine. Snapshots for the first two browser instances (A and B) are virtually identical, while those for the third instance (C) are very different (Figure 3). We discovered the ad network domain is DNS load-balanced and the browsers did not share a DNS cache (and therefore each queried DNS independently). Instances A and B were communicating with different IP addresses in the same /24 (same city), while C was communicating with an IP address in the same /16 but different /24 (different nearby city), which we take to mean two different datacenters. Thus even for identical requests from the same source, the choice of datacenter, we found, dramatically affects the ads served. This artifact is, of course, easily avoided by configuring a static entry in the hosts file so all instances reach the same datacenter.

Even with static DNS entries, we sometimes (but not always) observed discrepancies when running identical browser instances on different machines. The 5th percentile similarity score dropped to .87 for different machines (compared to .99 in the same-machine case). We noticed the cookies assigned to the two instances were different; when we synchronized the cookie values, the noise disappeared.
In another case, we measured similar levels of noise when we added an HTTP proxy in front of the two machines (which downgraded HTTP/1.1 to 1.0). We believe these artifacts are due to black-box frontend load-balancer behavior at the ad network: we suspect that the load-balancer, based on at least the IP address, HTTP version, and cookies, directs requests to different backend servers, each of which has a slightly different cache of ads. The only way to mitigate this source of noise, we believe, is to ensure as many header fields are held constant as possible. Ideally, traces for an experiment are all collected from a single IP address, not behind a proxy, with cookies synchronized across browser instances. Applying these techniques significantly reduces the base noise level, but does not eliminate it. We therefore measure the noise during our experiments to detect anomalies and to establish a level of confidence in our results.

3. ANALYSIS
In this section we use the above methodology to explore specific questions regarding how ads are targeted in three different contexts: search, websites, and online social networks. These questions include, among others, whether behavioral targeting affects search ads, whether past searches affect ads on websites, and what profile data affects social network ads.

That said, since ad targeting is a black-box where we can reliably control only a small set of inputs, we are restricted in the questions we can answer. An example of a question we cannot answer is whether Google learns the user's gender by observing which search results the user clicks and then uses it to target ads; this is because we cannot reliably affect or verify the gender learned by Google's (black-box) algorithm, if it indeed does so at all. For questions where we can reliably affect the inputs to the ad selection algorithm, our experimental methodology is as follows.
For each experiment we configure two (or more) measurement instances to differ by exactly one input parameter, and configure two measurement instances identically to serve as the noise-level control. If the similarity score between the control pair is high (i.e. low noise) but that between two differently configured instances is low, we conclude that the input parameter in which the two instances differ affects the choice of ads. For scalability and repeatability, all experiments are scripted using the Chickenfoot browser automation framework [2].

3.1 Search Ads
Search ads have typically been targeted based on keywords in the search query. It is therefore expected that keyword-based targeting dominates search. The question we ask is: to what extent does behavioral targeting affect search ads? Behavioral targeting refers to using the user's browsing habits to influence ad selection.

Experiment 4: We set up four browser instances: the first two (A and B), which also serve as our control, disable DoubleClick's DART cookie [3] that Google uses for behavioral targeting. The third (C) and fourth (D) have cookies enabled, but are seeded with different user personae (see footnote 3). C was seeded with long-term interests in Autos & Vehicles, while D was seeded with interests in Shopping. We then perform Google searches for 73 random product-related queries for a period of 5 days.

Figure 4(a) illustrates to what extent behavioral targeting affected search ads. As is evident from the figure, for keywords where the data is not too noisy (i.e. control score is high), there is no appreciable difference in the ads served whether the behavioral targeting cookie is disabled or enabled (A vs. C), or for two users with different interests (C vs. D). To understand why, we looked at the ads served. 73% of them contained the whole search query somewhere in the ad, and 97% of them contained at least one word from the search query. It is therefore clear that ads are selected primarily based on keywords.
Furthermore, the average number of unique ads for our search queries is 8. Since all the ads matching the keyword can be shown to the user in a short period, there may be little reason to pick and choose any further.

Footnote 3: Google normally learns short-term and long-term user interests completely automatically. It also allows users to view and modify the learned interests ( ads/preferences), which we use to create (or verify Google learned) different personae.

Figure 4: (a) Behavioral targeting doesn't appear to affect search ads. (b) Location affects website ads. (c) The effect of user behavior (browsing, search, clicks) on website ads is indistinguishable from noise in the system. Series in (a): (A vs. B); cookie enabled (A vs. C); different interests (C vs. D); x-axis: search term (sorted). Series in (b): (A vs. B); different coast (A vs. C); different country (A vs. D). Series in (c): (A vs. B); different interests (A vs. C); different recent searches (A vs. D); different recent clicks (A vs. E).

3.2 Website Ads
Website ads have typically been based on the context of the page and are hence also known as contextual advertising. We ask here to what extent the user's location affects the choice of ads, and to what extent the user's behavior (browsing behavior, recent searches, and recent clicks on products) affects website ads. We measure a set of 5 websites that show Google ads; the websites are picked randomly from the set of websites visited by CoDeeN [1] users. While we are able to answer the location question, we are able to present only weak evidence towards the lack of use of behavioral data due to noise.

Experiment 5: To understand the impact of user location, we set up four browsers: A and B, the control pair, in the same city in the US (New York), C in a different city on the other coast (San Francisco), and D in a different country (Germany). Figure 4(b) plots the similarity scores between the different instances. As one might expect, location affects the set of ads, but interestingly, there is (relatively) little difference between cities on opposite coasts (median similarity of .9). This is higher than we expected given that the long tail of ads appears to contain local mom-and-pop retailers, although in retrospect, these retailers may nevertheless conduct business nationwide.
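The control-pair comparison used throughout Section 3 can be sketched as a simple decision rule. This is only an illustration: the function name, the use of medians, and both thresholds (`min_control`, `min_gap`) are hypothetical choices made here, since the text compares similarity distributions graphically rather than against fixed cut-offs.

```python
from statistics import median


def parameter_affects_ads(control_scores, test_scores,
                          min_control=0.8, min_gap=0.1):
    """Compare similarity scores of a control pair against a test pair.

    Returns None when the control pair itself is too dissimilar (high noise),
    True when the test pair is clearly less similar than the control pair,
    and False otherwise. Both thresholds are hypothetical.
    """
    control = median(control_scores)
    test = median(test_scores)
    if control < min_control:
        return None  # too noisy to draw a conclusion
    return (control - test) >= min_gap
```

The `None` outcome corresponds to cases where the control similarity is itself low, in which case no conclusion can be drawn about the varied parameter.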
Experiment 6: To understand the effect of user behavior, we set up five browsers: instances A and B, the control pair, are configured identically except for the cookie that is needed for tracking user behavior. For C we browse 3 out of the 5 websites in the query set until Google learns the set of long-term interests associated with those websites (which we verified; see footnote 3). For D and E we additionally browse random websites and perform Google searches on 5 product-related keywords shortly before collecting each snapshot (but don't click any result for D); we verified (see footnote 3) that Google learned short-term interests for the random websites visited. Finally, for E we additionally click on product results before collecting each snapshot.

Figure 4(c) plots the similarity score between A and the other instances. While at first glance it might appear that browsing behavior, recent searches, and recent product clicks all result in different sets of ads, the problem is that even for the control pair similarity is very low, indicative of high levels of noise. We compared the fraction of ads shown to A and D that contained one or more of the search query terms and found no difference. The same held for ads containing the interests associated with instance C, and product names or categories used for instance E. We therefore believe that Google does not currently use recent browsing, search, or click behavior in picking website ads, but due to the high noise cannot definitively make the case for it. That said, the measurement methodology developed here will allow us to monitor the evolution of these systems.

As to the origin of the noise, we speculate this is because contextual ad systems have not yet been optimized for relevancy to the same extent as search ad systems, especially considering Google started collecting this data only since 29 [9].
As contextual ad targeting improves over time, we expect the similarity score of the control pair to increase, and depending on whether the scores for various user behaviors stay the same or increase in the future, we hope to conclude with certainty whether or not they are used.

3.3 Online Social Network Ads
We next turn our attention to ads on online social networking sites, specifically Facebook. We seek to understand which pieces of profile information (gender, age, education, sexual-preference, etc.) Facebook uses today.

Experiment 7: We set up three (or more) Facebook profiles: profiles A and B are set up identically (control), while profiles C onwards differ from A in the value of the profile parameter of interest; the number of unique profiles depends on the number of values that parameter can take (e.g. 2 for gender, 5 for education, etc.). When not being varied, the gender was set to female, the age was set to 3, the location was set to New York, and the remaining fields were left empty.

Figure 5 plots the time-series of similarity scores for six different profile parameters. In short, Facebook uses all profile elements we checked. All the plots show some sort of diurnal behavior where around midnight US east-coast time the similarity score changes abruptly. Similar diurnal artifacts were observed in [5], and as in that paper, we believe the cause is the daily reactivation of ads that exhausted their daily budget the previous day.

On the issue of profile elements, it appears that the two primary factors affecting ads are the user's age and their gender. While there exist values for education and relationship status that affect ads shown, not all values affect ads equally. For instance, there is little difference between ads targeted to users without any listed education and users in high-school.
Similarly, if the gender is male, or the relationship-status is married, relationship-status has only a small impact on ads; the greatest impact of relationship-status on ads is seen for women who are engaged.

Figure 5: Age group, gender, education, interests, location and relationship-status all appear to affect ads on Facebook. Panels: (a) Age group, (b) Gender, (c) Education, (d) Interests, (e) Location, (f) Relationship Status; the x-axis of each panel spans dates 4/22 to 4/29.

Figure 6: Sexual-preference affects ads on Facebook, but more so for males than females. Panels: (a) Male, (b) Female.

Experiment 8: Lastly, we set up six Facebook profiles to check the impact of sexual-preference: a highly-sensitive personal attribute. Two profiles (male control) are for males interested in females, two (female control) for females interested in males, and one test profile of a male interested in males and one of a female interested in females. The age and location were set to 25 and Washington D.C. respectively.

Figure 6 plots the similarity scores for week of data. While there is more noise in general, unlike in Figure 4(c) there is a measurable difference between the control and test pairs; we further manually verified based on ad content that this difference is qualitative in nature (e.g. ads for gay bars were never shown for the control profiles, but shown often for the test profiles). The median similarity score for gay women was .5 higher than for gay men, indicating that advertisers target more strongly to the latter demographic.

Alarmingly, we found ads where the ad text was completely neutral to sexual-preference (e.g. for a nursing degree in a medical college in Florida) that were targeted exclusively to gay men. The danger with such ads, unlike the gay bar ad where the target demographic is blatantly obvious, is that the user reading the ad text would have no idea that by clicking it he would reveal to the advertiser both his sexual-preference and a unique identifier (cookie, IP address, or email address if he signs up on the advertiser's site).
Furthermore, such deceptive ads are not uncommon; indeed exactly half of the 66 ads shown exclusively to gay men (more than 5 times) during our experiment did not mention "gay" anywhere in the ad text.

Overall we find that while location affects Google ads, behavioral targeting does not today appear to significantly affect either search or website ads on Google. Location, user demographics and interests, and sexual-preference all affect Facebook ads. As these systems evolve, our methodology can track changes in their use of user data. We thus inform, and hope to keep informed, the ongoing public debate about user privacy.

4. RELATED WORK
There is little past work in studying ad networks through measurement. In [5] we presented an ad hoc measurement result for Google ads; however, based on our experience presented here, we now believe that result significantly underestimated the number of ads and didn't properly account for noise.

5. SUMMARY
We have presented the first principled and robust methodology for measurement-based studies of online ad networks. We also inform the ongoing privacy debate regarding what user data is used today for targeting search ads, contextual ads, and ads on online social networks. Like most measurement studies, however, this analysis is a snapshot in time. Moving forward, we hope that the methodology we have developed can continue to be used to broaden our knowledge of online advertising as well as to track trends in the future.
6. REFERENCES
[1] CoDeeN: A Content Distribution Network for PlanetLab.
[2] M. Bolin, M. Webber, P. Rha, T. Wilson, and R. C. Miller. Automation and Customization of Rendered Web Pages. In Proceedings of the Eighteenth Annual ACM Symposium on User Interface Software and Technology (UIST '05), Seattle, WA, Oct. 2005.
[3] DoubleClick. DART for Advertisers. 2009.
[4] J. Feng, H. K. Bhargava, and D. M. Pennock. Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms. INFORMS Journal on Computing, 19(1):137-148, Jan. 2007.
[5] S. Guha, A. Reznichenko, K. Tang, H. Haddadi, and P. Francis. Serving Ads from localhost for Performance, Privacy, and Profit. In Proceedings of the 8th Workshop on Hot Topics in Networks (HotNets '09), New York, NY, Oct. 2009.
[6] B. Krishnamurthy and C. Wills. On the Leakage of Personally Identifiable Information Via Online Social Networks. In Proceedings of the Second ACM SIGCOMM Workshop on Online Social Networks (WOSN '09), Barcelona, Spain, Aug. 2009.
[7] B. Krishnamurthy and C. Wills. Privacy Leakage in Mobile Online Social Networks. In Proceedings of the Third Workshop on Online Social Networks (WOSN '10), Boston, MA, June 2010.
[8] B. Krishnamurthy and C. E. Wills. Cat and Mouse: Content Delivery Tradeoffs in Web Access. In Proceedings of the 15th International Conference on World Wide Web (WWW '06), Edinburgh, Scotland, 2006.
[9] K. Opsahl. Google Begins Behavioral Targeting Ad Program. Mar. 2009.
