Profiling and Behavioural Advertising Security Seminar 2014: Privacy Maarten Derks & Nick Heijmink
Contents What is behavioural advertising Customer profiling Privacy concerns What are the rules/laws Do Not Track The EU Cookie Law Privacy enhancing technologies: EU Cookie Law issues/improvements Do Not Track issues/improvements Client-side solutions Server/infrastructure-side solutions
Behavioural advertising Tracking customers Find user interest Interest used for targeting purposes Tracking The collection and aggregation of behavioural data of a user Targeting The use of this data during ad selection Advertisement only shown to persons interested Fewer ad impressions are wasted
Example behavioural advertising
Customer profiling What is customer profiling? According to Nancy J. King and Pernille Wegener Jessen: an automatic data processing technique that consists of applying a profile to an individual in order to take decisions concerning him or her; or for analysing or predicting personal preferences, behaviours and attitudes.
Customer profiling technical view In a technical sense, profiling is: A computerized method Involving data mining from data warehouses Which makes, or should make, it possible to place individuals in a particular category With a certain degree of probability, and with a certain induced error rate In order to take individual decisions relating to them
How do they get this data First-party trackers - Analytic Third-party trackers - Profiling Assigns a machine with a unique identification number (something like "4c812db2922...") stored inside a cookie associated with the web browser First party cookies can become third party cookies Using multiple websites with same tracking network
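The cookie mechanism on this slide can be sketched in a few lines of Python. This is a toy model, not the code of any real tracking network: the class name `ThirdPartyTracker`, the cookie name `uid` and the `.example` sites are all assumptions made for illustration. It shows how one identifier stored in the browser lets a tracker embedded on several websites link the visits together.

```python
import uuid


class ThirdPartyTracker:
    """Toy tracker embedded on several websites (hypothetical, for illustration)."""

    def __init__(self):
        # tracker ID -> list of sites on which that browser was seen
        self.visits = {}

    def handle_request(self, cookies, site):
        # Reuse the ID stored in the cookie, or assign a fresh one on first contact.
        tracker_id = cookies.get("uid") or uuid.uuid4().hex
        cookies["uid"] = tracker_id
        self.visits.setdefault(tracker_id, []).append(site)
        return tracker_id


tracker = ThirdPartyTracker()
browser_cookies = {}  # the browser's cookie jar for the tracker's domain
first_id = tracker.handle_request(browser_cookies, "news.example")
second_id = tracker.handle_request(browser_cookies, "shop.example")
print(first_id == second_id, tracker.visits[first_id])
```

Clearing `browser_cookies` between the two requests would give the second visit a new ID, which is why cookie deletion disrupts this kind of tracking.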
Removing cookies? 17 percent of Internet users delete cookies on a weekly basis 12 percent do so on a monthly basis 10 percent make it a daily habit
Mobile tracking Generally an individual communication device Games/mobile apps, Web browsing, Contact data, Device information Geographic location at a particular time (location based advertisement)
What do they store? Age, Age range, Date of birth, Education, Exact date of birth, Gender, Marital status, Home ownership, Own or rent, Estimated income, Exact income, Ethnicity, Presence of children, Number of children, Age range of children, Age of children, Gender of children, Language preference, Religion, Veteran in household, Voter party, Professional certificates (teacher, etc.), Education level, Full name, Email address, City, State, ZIP, ZIP + 4, Home Address, Land-line phone, Social IDs / social media handles and aliases, Mobile phone number, Carrier, Device type, Email address, Vehicle make, model and year, VIN, Estimated vehicle value, Vehicle lifestyle indicator, Model and brand affinity, Used vehicles, Antiques, Apparel (women, men & child), Art, Average direct mail purchase amounts, Museums, Audio books, Auto parts, auto accessories, Beauty and cosmetics, Bible purchaser, Bird owner, Books, Estimated income, Estimated household income, Home value, Length of residence, Purchase date, Purchase price, Purchase amount, Most recent interest rate type, Most recent loan type code, Sales transaction code, Most recent lender code, Purchase lender code, Most recent lender name, Purchase lender name, Fuel source, Loan to value, Purchase interest rate type, Most recent interest rate, Purchase interest rate, Pool or spa, Home-year built, Air conditioning, Boat ownership, Plane ownership, Motorcycle ownership, Bankruptcy, Beacon score, Credit score-actual, Certificates of deposit/money market funds, Estimated household income ranges, Income producing assets indicator, Estimated net worth ranges, IRAs, Life insurance, Low-end credit scores, Mutual funds/annuities, Summarized credit score or modelled credit score by neighborhood, Payday loan purchaser, Number of credit lines, Tax liens, Card data, Card holder, Frequent credit card user, New retail card holders, Underbanked or thin file, Stocks or bonds, Average online purchase, Average offline purchase, etc.
Applications of customer profiling?
Applications of customer profiling Targeted marketing Finding new locations for stores Analysis of risks and fraud Updating price based on interest
How do they create a profile? Creating a profile based on raw real world data Summarize available, relevant information Reduce the information into a set of categories Algorithm of the filter differs, most of them secret
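Since the real filtering algorithms are mostly secret, the following is only a guess at the general shape of the "summarize and reduce to categories" step. The site-to-category table and the function name `build_profile` are hypothetical. The sketch counts how often each category occurs in the raw data and keeps the most frequent ones as the profile.

```python
from collections import Counter

# Hypothetical mapping from visited sites to interest categories.
SITE_CATEGORIES = {
    "cars.example": "automotive",
    "dealer.example": "automotive",
    "recipes.example": "food",
}


def build_profile(visited_sites, top_n=2):
    """Reduce raw visits to the top_n most frequent interest categories."""
    counts = Counter(
        SITE_CATEGORIES[site] for site in visited_sites if site in SITE_CATEGORIES
    )
    return [category for category, _ in counts.most_common(top_n)]


raw_visits = ["cars.example", "recipes.example", "dealer.example", "unknown.example"]
profile = build_profile(raw_visits)
print(profile)
```

Sites without a known category are simply dropped here; a real system would presumably use a much richer classifier than a lookup table.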
Targeting Get cookie from user Get profile from user using the cookie Search in database for correlating advertisement Show advertisement
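The four targeting steps can be sketched as a single lookup function. Everything named here is an assumption for illustration: the cookie name `uid`, the in-memory `PROFILES` and `ADS` tables, and the fallback to a generic banner when no profile or matching ad exists.

```python
# Hypothetical server-side tables: cookie ID -> interests, and the ad inventory.
PROFILES = {"user-123": {"automotive", "travel"}}
ADS = [
    {"name": "cooking course", "category": "food"},
    {"name": "city-trip deal", "category": "travel"},
]
DEFAULT_AD = {"name": "generic banner", "category": None}


def select_ad(cookies):
    user_id = cookies.get("uid")                # 1. get cookie from user
    interests = PROFILES.get(user_id, set())    # 2. get profile using the cookie
    for ad in ADS:                              # 3. search for a correlating ad
        if ad["category"] in interests:
            return ad                           # 4. show advertisement
    return DEFAULT_AD


print(select_ad({"uid": "user-123"})["name"])
print(select_ad({})["name"])
```

A visitor without a cookie falls through to the untargeted default, which is the "wasted impression" case that behavioural advertising tries to avoid.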
Customer targeting Do you think that consumer profiling and targeting should be allowed? Should advertisers be able to use profiling to predict that a consumer will take advantage of a coupon for online gambling when the profile includes consumers who are likely to be/get addicted to gambling? What if weight-loss aids are promoted to consumers in a profile who have a high probability of having eating disorders, for whom weight-loss aids may create substantial health risks?
Privacy concerns Interference with personal data What will they do with my data? Which data do they collect? How long will they store my data? Which profile is linked to me? Who will see/use my data?
Privacy concerns
Regulation of behavioural advertising All individuals in the EU have privacy and data protection rights under the Data Protection Directive 95/46/EC Article 5: Member States shall, within the limits of the provisions of this Chapter, determine more precisely the conditions under which the processing of personal data is lawful. Personal data should be processed on legitimate processing grounds. Collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes. Processing of data for historical, statistical or scientific purposes shall not be considered as incompatible.
Personal data may be processed if The data subject has unambiguously given his consent. Processing is necessary for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract. Processing is necessary for compliance with a legal obligation to which the controller is subject. Processing is necessary in order to protect the vital interests of the data subject. Processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller or in a third party to whom the data are disclosed. Processing is necessary for the purposes of the legitimate interests pursued by the controller or by the third party or parties to whom the data are disclosed.
Do Not Track Web browsers Plug-ins Three types of Do Not Track: - Domain blocking - Opt-out cookies - HTTP headers
Do Not Track Domain blocking - Blocks all contact with user-specified domains. Opt-out cookies - Inform the domains, by means of a cookie, that the user does not want to be tracked. HTTP headers - Inform the domains that the user does not want to be tracked, using the DNT header field introduced by the W3C. Disadvantage: The last two types require the user to trust that the target domain complies. Blocking 1st party cookies: it is very hard to log in anywhere Blocking 3rd party cookies: no adverse effects on surfing
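From the server's point of view, honouring the last two types comes down to a check like the one below. The `DNT: 1` request header is the signal defined in the W3C Do Not Track work; the opt-out cookie name `optout` and the function name are assumptions, since each ad network picks its own cookie. Nothing forces a domain to run such a check, which is exactly the trust problem described above.

```python
def tracking_allowed(headers, cookies):
    """Return False if the request carries a Do Not Track signal."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    if normalised.get("dnt") == "1":
        return False
    if cookies.get("optout") == "1":  # hypothetical opt-out cookie name
        return False
    return True


print(tracking_allowed({"DNT": "1"}, {}))
print(tracking_allowed({"User-Agent": "demo"}, {"optout": "1"}))
print(tracking_allowed({"User-Agent": "demo"}, {}))
```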
EU Cookie Law Privacy legislation that requires websites to get consent from visitors to store or retrieve any information on a computer, smartphone or tablet Users should be provided with clear and precise information about the purposes of cookies.
Set the type of cookie
BREAK
Privacy Enhancing Technologies (PET) EU Cookie Law issues/improvements Do Not Track issues/improvements Client-side solutions - Plugin-based client side profiling - Native client-side profiling Server/infrastructure-side solutions
EU Cookie Law issues Cookies are a low-level mechanism that cannot be easily explained Unclear which cookies could be given a pass and which needed to be explicitly given permission No standardized mechanism for seeking permission Improvements: - Use the term tracking rather than talking about cookies - Use a strictly controlled syntax for summarizing tracking habits
Example grammar
  tracking            ::= necessary-statement* tracking-statement* excuses
  necessary-statement ::= "We record your" what "using" method+ "so you can" benefit "."
  what                ::= ( "login" | "shopping cart" )+
  tracking-statement  ::= who-tracks-you anonymously-or-not "using" method+ "."
  who-tracks-you      ::= "We track you" | ( "Our advertisers" | company-name ) "may"? "track you" ( "on our behalf" )?
  anonymously-or-not  ::= "anonymously" | "personally"
  method              ::= "server logs" | "the" cookie-name "cookies" ( "(unless you disable third-party cookies)" )? | "invisible images"
  excuses             ::=
Source: http://alleged.org.uk/pdc/2012/07/07.html
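To show what a sentence produced by such a controlled syntax looks like, here is a small sketch that renders one tracking statement. It reads the grammar's alternatives as choices between fixed phrases. The function name, the `company` parameter and the use of " and " to join several methods are assumptions and not part of the cited grammar.

```python
def tracking_statement(anonymous, methods, company=None):
    """Render one tracking statement from a few fixed phrases."""
    if not methods:
        raise ValueError("the grammar requires at least one method")
    # who-tracks-you: the site itself, or a named company acting for it.
    who = "We track you" if company is None else f"{company} may track you on our behalf"
    how = "anonymously" if anonymous else "personally"
    return f"{who} {how} using {' and '.join(methods)}."


print(tracking_statement(True, ["server logs"]))
print(tracking_statement(False, ["invisible images"], company="ExampleAds"))
```

Because every site would draw on the same small set of phrases, statements from different sites become directly comparable.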
Do Not Track issues Incorporation of privacy-protecting features in web browsers Two categories require the user to trust that the target domain will comply None of the categories meet requirements from regulatory framework by the Federal Trade Commission (FTC)! Different balance between ease-of-use, universality and enforceability Failed to win the endorsement of advertising industry Focus on behavioural advertising, neglects non-advertising tracking Improvements?
Client-side solutions Goal: reduce or prevent user tracking, while allowing advertising network to retain all or most of the revenue gains achieved from targeting How? By client-side aggregation of personal data Major concerns over behavioural advertising include the user's lack of control over the data collection and retention Allows the user to be targeted while leaving user in possession of their data Strong alternative to binary solutions like Do Not Track Two types: - Plugin-based client-side profiling - Native client-side profiling
Plugin-based client-side profiling Makes use of a browser plugin installed on the user's machine Plugin maintains a collection of the user's browsing and behavioural data and uses it to facilitate targeting during ad selection We will discuss three examples: Privad Adnostic RePriv
Plugin-based client-side profiling: Privad Developed by the Max Planck Institute for Software Systems (MPI-SWS) Goal: complete user privacy User behaviour monitored by plugin that stores profile on client machine Ad server sends large set of potential advertisements to plugin Plugin selects ad to achieve targeting, based on local profile Ad impressions and clicks are sent encrypted through third-party dealer, which anonymizes the source http://adresearch.mpi-sws.org/
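The local selection step can be illustrated with a short sketch. It is not Privad's actual algorithm: the keyword-overlap scoring, the data layout and the function name `select_ad_locally` are assumptions, and the encryption and dealer parts are left out entirely. The point is only that the profile never leaves the client, while the ad server sends the same candidate set regardless of who is asking.

```python
def select_ad_locally(local_profile, candidate_ads):
    """Pick the candidate whose keywords overlap most with the local profile."""

    def overlap(ad):
        return len(local_profile & ad["keywords"])

    best = max(candidate_ads, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None  # nothing relevant: show no targeted ad
    return best


local_profile = {"cycling", "travel"}  # stays on the client machine
candidate_ads = [                      # sent by the ad server to the plugin
    {"id": "ad-1", "keywords": {"cooking"}},
    {"id": "ad-2", "keywords": {"cycling", "travel"}},
    {"id": "ad-3", "keywords": {"travel"}},
]
chosen = select_ad_locally(local_profile, candidate_ads)
print(chosen["id"])
```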
Plugin-based client-side profiling: Privad
Plugin-based client-side profiling: Adnostic Developed by Stanford University and New York University Browser plugin selects ad based on locally constructed profile In contrast to Privad: ad impressions kept hidden, but ad clicks not Less vulnerable to click fraud, but reveals targeting attributes of user Plugin makes selection out of 10-20 ads sent by ad network Information about selection is encrypted and aggregated, occasionally decrypted by a trusted third party https://crypto.stanford.edu/adnostic/
Plugin-based client-side profiling: Adnostic
Plugin-based client-side profiling: Downsides Users have to install a plugin Requires ad platform to comply Both plugins make fraud detection difficult Increases network traffic and load times Advertiser budget constraints: - Estimating when an ad's budget will expire - Could result in ads being shown too many/few times Both approaches take control over tracking and targeting out of the hands of the advertising network
RePriv Developed by Microsoft Research Constructs user profiles from raw browsing data on the client machine Sends them to the ad network to facilitate targeting server-side Allows the ad network to view user data and perform personalization User has option to review sent data, and either approve or disapprove Solves difficulties regarding fraud, budgets and innovation Downside: reveals user attributes, requires trust https://research.microsoft.com/en-us/projects/repriv/
Native client-side profiling: Client-only profiles Stores behavioural information on the client, but no browser plugin required Gives users control over data while allowing platforms to target advertisements without making significant structural changes to the current delivery mechanisms User behaviour maintained in aggregated form, along with cache of raw recent behaviour in the browser cookie associated with the ad network Only record of user behaviour is maintained on the client in the cookie, leaving the user with the option of deleting their profile any time Downside: Relies on policy compliance by ad networks, not enforceable
Client-only profiles Integration with bidding systems Use of machine learning for predicting a user's future interests Revenue impact analysis: Targeting performance comparable to that of server-side profiling, but without the need to track user behaviour server-side
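One way to picture a profile that lives only in the cookie is the sketch below. The JSON layout, the field names `counts` and `recent`, and the cache size `MAX_RECENT` are assumptions for illustration, not the format proposed in the literature. The ad network reads the cookie on each request, updates the aggregate counts and the short cache of raw recent behaviour, and writes the cookie back without keeping a server-side copy.

```python
import json

MAX_RECENT = 3  # hypothetical size of the raw recent-behaviour cache


def update_profile_cookie(cookie_value, category):
    """Return the new cookie value after observing one more event."""
    if cookie_value:
        profile = json.loads(cookie_value)
    else:
        profile = {"counts": {}, "recent": []}  # no cookie yet, or it was deleted
    # Aggregated form: one counter per interest category.
    profile["counts"][category] = profile["counts"].get(category, 0) + 1
    # Raw cache: keep only the most recent events.
    profile["recent"] = (profile["recent"] + [category])[-MAX_RECENT:]
    return json.dumps(profile, sort_keys=True)


cookie = ""
for event in ["sports", "travel", "sports", "food"]:
    cookie = update_profile_cookie(cookie, event)
print(cookie)
```

Deleting the cookie resets `cookie` to the empty string, so the next update starts from an empty profile. That is the control the slide refers to, and it works only as long as the ad network really keeps no copy.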
Server/infrastructure side solutions Goal: minimizing data retention needs while maintaining efficacy of algorithmic predictions for product placement How? By segregation of data into: 1. Identifying Information Component 2. Tracking Information Component 3. Optimization Information Component This facilitates anonymous tracking Use of inferencing algorithms to select relevant advertisements: Markov model, generalized regression, user similarity,... Downside: re-identification of anonymous data is possible under certain circumstances
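The three-way segregation might be pictured as three separate stores, as in the sketch below. This is a loose interpretation and not the design from the literature: the store layout, the random pseudonym and the function name `record_event` are all assumptions. Identity is kept apart from behaviour, behaviour is keyed only by a pseudonym, and the optimization data is aggregated with no per-user key at all.

```python
import secrets
from collections import Counter

identifying = {}          # account -> pseudonym (identifying information component)
tracking = {}             # pseudonym -> observed events (tracking information component)
optimization = Counter()  # aggregate category counts (optimization information component)


def record_event(account, category):
    """Store one event so that no single store links identity to behaviour."""
    pseudonym = identifying.setdefault(account, secrets.token_hex(8))
    tracking.setdefault(pseudonym, []).append(category)
    optimization[category] += 1
    return pseudonym


p1 = record_event("alice@example.org", "travel")
p2 = record_event("alice@example.org", "food")
record_event("bob@example.org", "travel")
print(p1 == p2, tracking[p1], optimization["travel"])
```

Anyone who can join the first two stores can re-identify a user, which is one concrete form of the re-identification risk noted on the slide.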
Server/infrastructure side solutions
Conclusion We know what behavioural advertising is and how it works There are laws created for data processing Multiple regulations in place, but they are not very effective Enforcing and checking policies is difficult Client-side solutions available that require minimal changes to ad network infrastructure (but do require the user to take action) Server-side solutions require adoption by ad networks
Questions?
Literature
- http://www.youronlinechoices.com
- http://www.clickz.com/clickz/news/1691871/study-consumers-delete-cookies-surprising-rate
- Bilenko, M., Richardson, M., & Tsai, J. Y. (2011, July). Targeted, not tracked: Client-side solutions for privacy-friendly behavioral advertising. In The 11th Privacy Enhancing Technologies Symposium (PETS 2011).
- Toubiana, V., Narayanan, A., Boneh, D., Nissenbaum, H., & Barocas, S. (2010, February). Adnostic: Privacy Preserving Targeted Advertising. In NDSS.
- Thaw, D., Gupta, N., & Agrawala, A. Privacy-Friendly Design for Online Behavioral Advertising Systems.
- Dixon, P., & Gellman, R. (2014, April). The Scoring of America: How Secret Consumer Scores Threaten Your Privacy and Your Future. http://www.worldprivacyforum.org/wpcontent/uploads/2014/04/wpf_scoring_of_america_april2014_fs.pdf