BEHAVIORAL ONLINE MARKET RESEARCH: BEST PRACTICES
October 2010
Beginning in November 2009, the Center for Democracy & Technology (CDT) hosted a series of meetings of its Internet Privacy Working Group (IPWG) to discuss privacy issues related to the burgeoning area of online data collection for eventual use in the aggregate rather than for behavioral targeting. Most companies with an online presence use this so-called online market research to help them understand site and Internet trends. While best practices have formed around the collection of consumer information for the purposes of online behavioral advertising, there is little guidance for companies engaging in the wide range of practices that constitute online market research. IPWG decided that it would be useful to create a set of best practices for entities engaging in what we define as the collection of behavioral online market research data. This document contains those best practices.
This document has been crafted with input from companies that engage in market research, consumer advocates, and market research industry groups. The document seeks to outline a path by which privacy concerns associated with online market research data collection practices can be addressed in a way that empowers both consumers and companies.
DEFINITIONS
General Definitions
Active consent: An affirmative action taken by a data subject that demonstrates agreement to participate in online market research. For example, a data subject has actively consented when she has taken an action in response to a clear and prominent dialog box in which there is no design bias for the user interaction (UI) element used to promote consent (e.g., no pre-checked box, no size difference between the accept and decline buttons).
Behavioral online market research: Online market research that involves the collection of behavioral online market research data.
Behavioral online market research data: Information about a data subjectʼs Internet activities, collected for the purpose of generating online market research findings, that is observed or inferred or has not been reported by the data subject for the purpose of online market research. Publicly reported data is not considered behavioral online market research data. However, any individual-level data (including publicly reported data) appended to behavioral or self-reported online market research data is considered behavioral online market research data.
Examples of behavioral online market research data include, but are not limited to, clickstream data, location data, user-generated content, social network connections, and assumptions about a participantʼs attitudes or beliefs that are collected, in whole or in part, for the purpose of generating online market research findings and are either based on observed data that was not self-reported by the data subject or are self-reported by the data subject for purposes other than online market research. Data that is combined with any of these types of behavioral online market research data or with any self-reported online market research data, phone book information, Tweets, or other publicly reported information, is considered behavioral online market research data.
Clickstream data: Data collected about a data subjectʼs website visits or other online activities for the purpose of generating online market research findings, with or without the data subjectʼs awareness. Some current examples are the data subjectʼs IP address and cookies, the date and time of the activity, the URL of a requested site, the data subjectʼs browser and operating system types, the links the data subject clicks on, and the referring URL.
Communication content: The substance of a transmission destined for one or more specified individuals (as opposed to a site, service, or application). Current examples include the subject and body of an email and the content of a voice call or text chat.
Data subject: An individual or group of individuals such as a household that provides online market research data. A data subject may not realize she is participating in market research and may not have consented to participate.
Device: A computer, cell phone, PDA, or other machine capable of accessing the Internet.
Derived data: Data relating to an analysis about a data subject derived from the data subjectʼs clickstream data, purchase data, user-generated data, communication content, or other data.
Internet activities: Activities of an individual or group of individuals (such as a household) that take place in whole or in part over the Internet, where the Internet is defined as the system of interconnected networks that use the Internet Protocol for communication with resources or endpoints (including computers, webservers, hosts, or other devices) that are reachable, directly or through a proxy or gateway, via a globally unique Internet address assigned by the Internet Assigned Numbers Authority (IANA). To be considered part of the Internet for this document, an Internet endpoint must either be identified by a unique address assigned through the IANA or its delegate registry, or be reachable through a private address assigned by a broadband service provider. This definition shall not include addresses created by a data subject or on a data subjectʼs premises for the data subjectʼs internal network purposes. We do not intend for this definition of the Internet to encompass private intranets generally inaccessible to users of the Internet, or private networks that are typically created within residences behind carrier-provided gateways. For example, the following might be generated by Internet activities: clickstream data, communication content, purchase data, user-generated content, and data relating to user interactions.
Online behavioral advertising: The collection of data from a particular computer or device regarding Web viewing behaviors over time and across Web sites for the purpose of using such data to predict user preferences or interests to deliver advertising to that computer or device based on the preferences or interests inferred from such Web viewing behaviors.
Ongoing data collection: The collection of either self-reported market research data or behavioral market research data either continuously or at multiple instances, such that data from one instance or period in time is associated, at the individual level, with data from another instance or period in time or with data from other sources.
Online market research: The collection and analysis of data generated by, derived from, or provided via Internet activities regarding opinions, needs, awareness, knowledge, views, experiences or behaviors of a population in which no sales, promotional or marketing efforts are involved and through which there is no attempt to influence a participantʼs attitudes or behavior. This definition does not encompass political research. The collection of data about a userʼs Internet activities for any use other than operational purposes or online behavioral advertising likely falls under the umbrella of online market research. For example, online market research might be conducted using clickstream data, communication content, derived data, resident data accessed during Internet activities (such as a contact list stored on a mobile device and accessed by an application), purchase data, user-generated data, or location data collected for analytics purposes or to otherwise measure an audience. Online market research might be conducted using a combination of online market research data and offline data.
Offline data: Data that is not related to a data subjectʼs Internet activities. Current examples include catalog purchases and credit scores.
Operational purpose: A purpose reasonably necessary to facilitate, improve, or safeguard the logistical or technical ability of an entity to provide goods or services, manage its operations, comply with legal obligations, or protect against risks and threats, including:
- providing, operating, or improving a product or service used, requested, or authorized by an individual, including the ongoing provision of customer service and support;
- analyzing data related to use of the product or service for purposes of improving the entityʼs products, services, or operations;
- basic business functions such as accounting, inventory and supply chain management, quality assurance, and internal auditing;
- protecting or defending the rights or property, including intellectual property, of the covered entity against actual or potential security threats, fraud, theft, unauthorized transactions, or other illegal activities;
- preventing imminent danger to the personal safety of an individual or group of individuals;
- complying with a Federal, State, or local law, rule, or other applicable legal requirement, including disclosures pursuant to a court order, subpoena, summons, or other properly executed compulsory process.
This term shall not include the use of collected data for marketing or advertising purposes, or any use of or disclosure of collected data for such purposes.
Participant: An individual or group of individuals (such as a household) that has actively consented to providing online market research data. All participants are data subjects, but not all data subjects are participants.
Publicly reported data: Data that there is a reasonable basis to believe is lawfully made available to the general public from: (i) Federal, State, or local government records; (ii) widely distributed media; or (iii) disclosures to the general public that are required to be made by Federal, State, or local law. A reasonable basis can be determined by checking that the information is of a type that is available to the general public and that the individual has not taken steps to keep the information from being made available to the general public. Examples of publicly reported data include public phone books, Twitter posts that are not locked, and messages left on public comment boards.
Purchase data: Data relating to an individualʼs purchase or acquisition of goods or services, such as the items purchased, date and time of purchase, payment information, and shipping information. Purchase data is sometimes a type of data generated by Internet activity and is sometimes a type of offline data.
Single-instance data collection: The collection of either self-reported market research data or behavioral market research data over a short, pre-determined period of time (length of a survey, one hour of Web browsing, etc.). The information about the data subject in single-instance data collection is limited to the information provided by the data subject during that pre-determined period of time; such data is not later supplemented with data appends corresponding to that specific individual.
Self-reported online market research data: User-generated data that a data subject actively and knowingly reports for the purpose of online market research.
Examples of self-reported online market research data include survey responses, self-reporting of behaviors, and self-reporting of attitudes. In each of these cases, the data subject must be reasonably expected to understand that she is self-reporting for the purposes of online market research. Examples of data types that are NOT considered self-reported online market research data include, but are not limited to, clickstream data, location data, user-generated content, social network connections, and derived data based in part or in whole on observed data that was not self-reported by the data subject.
Sensitive data: Data related to an individual that is generally recognized as calling for some measure of special treatment. Examples may include data related to health or medical conditions, finances, sexual behavior or orientation, or mobile location data.
Supplemental market research data: Information used to supplement online market research data that, from the perspective of the entity collecting online market research data, is not directly observed from, or reported to it by, the data subject. Examples of supplemental market research data include data lists purchased by the market researcher from third parties, demographic information associated with a data subjectʼs zip codes, or online market research purchased from another entity.
User-generated data: Data generated knowingly by a data subject. Some current examples are search terms, input into online form fields, and posts on public forums.
User interaction: The transfer of data between a data subject (or his or her device) and a site, service, or application.
Identifiability Definitions
These definitions measure data identifiability from the perspective of the entity collecting data for online market research (as opposed to an outside observer or consulting statistician, for example). How easy or hard it may be for such an entity to use data to identify a data subject depends on the other data sources available to the entity, the capabilities of the entity, and the time, effort, and cost required to identify data subjects.
Aggregate data: Data about multiple data subjects that cannot reasonably be expected to directly or inferably identify any data subject.
Directly identifiable data: Data that directly and overtly identifies an individual or group of individuals (such as a household), such as name, address, email address, phone number, government identifier, or financial identifier.
Inferably identifiable data: Data from which the identity of an individual or group of individuals (such as a household) can be reasonably inferred, including combinations of data elements or data sets that would not, on their own, identify a data subject.
Pseudonymous data: Data associated with a unique identifier that does not directly identify an individual or group of individuals, such as a household. Note that all inferably identifiable data is pseudonymous, but not all pseudonymous data is necessarily inferably identifiable.
Guidelines
This section provides a set of general best practices for guiding the collection of data to be used in behavioral online market research (single-instance or ongoing) as defined above.
Practices that have not traditionally been considered market research may in fact fall within the scope of behavioral online market research; indeed, these guidelines were designed specifically to augment, rather than to replace, well-established standards for more traditional variants of market research. The collection of self-reported online market research data is outside the scope of these guidelines. [1]
[1] Parties interested in best practices for single-instance collection of self-reported online market research data should read http://www.mra-net.org/resources/documents/codemrstandards.pdf, http://www.esomar.org/uploads/pdf/esomar_codes&guidelines_internet_v6.pdf, and http://www.imro.org/profstds/.
Principles
The purpose of this document is to set out best practices for entities participating in online market research or using data generated by online market research. This document has been structured around a set of Fair Information Practice principles (FIPs) and is designed to promote compliance with these principles. The focus of this document is on the first six of the principles listed below.
- Purpose Specification
- Data Minimization
- Use Limitation
- Transparency
- Data Quality and Integrity
- Individual Participation
- Security
- Accountability and Auditing
Purpose Specification
Transparency of purpose: Entities collecting or using behavioral online market research data should only allow the data to be used for the purposes disclosed to data subjects prior to commencement of collection.
Incentives and behavioral online market research: A data subject should not be compelled to participate in behavioral online market research by the offering of essential products or services that are exclusively and solely available through research participation. A current example would be conditioning the receipt of Internet service on the data subjectʼs participation in behavioral online market research or conditioning participation in a social network on participation in behavioral online market research. Note that this does not exclude the offering of survey incentives to reward data subjects for their participation that are either non-essential or are otherwise available for purchase.
Transparency
Disclosure of practices: Entities that, according to the definitions above, are engaged in behavioral online market research should state in clear and unambiguous language on their Web sites, in privacy policies, and in any clear and prominent notices (see below) that they are engaged in behavioral online market research. Entities should include in these notices language like "we use your data for the purpose of behavioral online market research." Vague language like "we use this data to improve our product" or "we use this data for marketing purposes" is insufficient.
The privacy policy: An important element of proper notification involves the privacy policy.
Any collection of behavioral online market research data should be explained using easy-to-understand language in the privacy policy that governs the Web site, application, toolbar, or other entities involved in data collection. The privacy policy should also include a prominent and meaningfully labeled link or mechanism through which participants can opt out of data collection. The privacy policy should include the following information: who is collecting this data, how this data will be shared and used, how long this data will be stored in directly identifiable or inferably identifiable form, precisely what types of data are being collected for use in behavioral online market research, what types of individual-level supplemental data may be appended to collected data, how long the data collection phase will last, how the data subject can access collected data, and how the data subject can opt out of collection of this data or its use in behavioral online market research.
Clear and prominent notice: The privacy policy is a necessary but insufficient form of notice. On all Web sites, applications, toolbars, or other spaces through which data collection will occur, the data subject should also be presented with a clear and prominent, concise, easy-to-understand and accessible notice, or a clear, prominent, and meaningfully labeled link to such a notice, that describes who is collecting this data, how this data will be shared and used, how long this data will be stored in directly identifiable or inferably identifiable form, precisely what types of data are being collected for use in behavioral online market research, what types of individual-level supplemental data may be appended to collected data, how long the data collection phase will last, how the data subject can access collected data, and how the data subject can opt out of collection of this data or its use in behavioral online market research. This notice may be combined with other notices about data collection practices (for example, notices about collection for use in targeted advertising or notices about collection by other online market research entities), provided that the required information can still be presented to the data subject in an easy-to-understand and prominent way.
As discussed above, notices should explain, in easy-to-understand language, what types of data are being collected from the data subject.
For example, the notice should describe which, if any, of the following data types is being collected or appended to collected data: clickstream data, communication content, derived data, offline data, publicly available data, purchase data, sensitive data, supplemental data, user-generated data, civic location data, geodetic location data, mobile location data, fixed location data, nomadic location data. If the meaning of the relevant terms is not self-evident to the typical data subject, then the notice should explain in simple language what the relevant terms mean.
Individual Participation/Data Quality and Integrity
Opt out: If an entity is collecting behavioral online market research data, then this entity is responsible for ensuring that the data subject is presented with a clear and prominent, concise, easy-to-understand and accessible notice on all Web pages, application interfaces, or other spaces through which this data collection occurs. This notice should either describe or serve as a link to a notice that describes who is collecting this data, how this data will be shared and used, precisely what types of data are being collected, how long data will remain in directly identifiable or inferably identifiable form, and what types of supplemental, directly identifiable data may be combined with this information. This notice should include a mechanism by which the data subject can opt out of collection of this data or its use in behavioral online market research.
Collection of directly identifiable information, inferably identifiable information, and sensitive information: No entity should collect directly identifiable, inferably identifiable, and/or sensitive behavioral market research data unless the data subject has actively consented to collection for the specified purpose. The UI through which the data subject may actively consent should include a clear and prominent, concise, and easy-to-understand notice that describes who is collecting this data, how this data will be shared and used, how long this data will be stored in directly identifiable or inferably identifiable form, and precisely what types of data are being collected for use in behavioral online market research.
Collection of all or substantially all of a data subjectʼs activity: Entities should not collect for behavioral market research purposes all or substantially all of a data subjectʼs Internet activities on a particular application without the active consent of the data subject. The UI through which the data subject may actively consent should include a clear and prominent, concise, and easy-to-understand notice that describes who is collecting this data, how this data will be shared and used, how long this data will be stored in directly identifiable or inferably identifiable form, and precisely what types of data are being collected for use in behavioral online market research.
Honoring the data subjectʼs choice: All entities that collect behavioral online market research data should provide a clear and prominent, easy-to-use, and accessible method that gives the data subject control over whether her data can be collected for this purpose. In the case of ongoing market research, the data subjectʼs choice expressed using this method should be (1) available for the data subject to view and change, and (2) persistently honored until the data subject decides to alter that choice. A data subjectʼs decision not to participate in a particular instance of behavioral online market research needs to be implemented using technologies that will protect the persistence of the decision. Additionally, the data subject should have a simple way to determine her participation status. It must be possible for the data subject to communicate a decision to not participate in behavioral online market research over the Internet.
For example, it is insufficient to require data subjects to send a letter by postal mail in order to opt out of participation. This requirement does not preclude entities from offering other opt-out mechanisms, such as postal mail or text message.
Revocation of consent: The entity collecting behavioral online market research data should provide data subjects who are participating in ongoing market research with easy-to-use and easy-to-find mechanisms for canceling participation in behavioral online market research. Canceling participation must be possible on the Internet. For example, it is insufficient to require the data subject to send a letter by postal mail in order to cancel participation. This requirement does not preclude entities from offering other opt-out mechanisms, such as postal mail or text message. If a participant (a data subject who has actively consented) revokes consent, then all data collected about the participant should be de-identified, i.e., rendered neither directly identifiable nor inferably identifiable. If a non-participant opts out of data collection, then all information still linked to that data subject or her device should be deleted. The entity collecting behavioral online market research data should also inform the data subject of the existence of this consent revocation mechanism. This provision should not be interpreted as a requirement that an entity storing behavioral online market research data in the aggregate form delete a data subjectʼs contribution to that data upon request.
Access: If data collected for the purpose of behavioral online market research is stored at an individual level and remains associated with the data subject or the data subjectʼs device, then the data subject should be provided mechanisms for appropriate access to and correction of this data or the relevant derived data. If it is not possible for the entity storing this data to provide access, then the entity must delete the data (including derived data) associated with the data subject or the data subjectʼs device. Entities must also, upon request, de-identify, i.e., render neither directly identifiable nor inferably identifiable, any data about a participant (a data subject who has actively consented) that is linked to that participant or her device. Entities must, upon request, delete any data about a non-participant that is still linked to that data subject or her device. Requests for access, correction, deletion or de-identification should be responded to in a timely manner.
Data Minimization
Minimizing collection: Only data directly relevant and necessary to a specified purpose should be collected. Minimizing the collection of consumer data reduces the privacy risks associated with behavioral online market research.
Minimizing data retention periods: Data should be kept only as long as necessary to complete the purpose for which it was collected, unless otherwise required by law. Directly and inferably identifiable data should be removed from the data set as soon as possible.
Aggregation: Entities collecting data for behavioral online market research should aggregate data as soon as is practical and should retain it for no longer than necessary to accomplish the specified purpose. No directly or inferably identifiable data should be retained in that form more than thirteen months after the end of the collection phase of the research. No claim should be made that data is in de-identified or anonymous form until that data is in a form such that it is neither directly identifiable nor inferably identifiable.
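As an illustration only, the thirteen-month limit in the Aggregation guideline can be expressed as a date check. The following is a minimal sketch in Python and is not part of the IPWG guidelines: the function names are hypothetical, and counting the limit in calendar months from the last day of the collection phase is an assumption, since the guideline does not say how the period should be measured.

```python
import calendar
from datetime import date

# Limit stated in the Aggregation guideline.
RETENTION_MONTHS = 13


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so
    31 January plus one month gives the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def identifiable_retention_deadline(collection_end: date) -> date:
    """Last day on which identifiable data may remain in that form (assumed reading)."""
    return add_months(collection_end, RETENTION_MONTHS)


def is_overdue(collection_end: date, today: date) -> bool:
    """True if directly or inferably identifiable data has been kept past the limit."""
    return today > identifiable_retention_deadline(collection_end)


if __name__ == "__main__":
    end_of_collection = date(2010, 10, 15)  # hypothetical example date
    print(identifiable_retention_deadline(end_of_collection))
    print(is_overdue(end_of_collection, date(2011, 11, 16)))
```

A check like this covers only the outer limit. The guideline also asks entities to aggregate data as soon as is practical, which in most cases will be well before the deadline computed here.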
Use Limitation
Material changes: If collecting directly identifiable information, inferably identifiable information, sensitive information, or all/substantially all of a data subjectʼs Internet activities for the purposes of behavioral online market research, then the active consent of the data subject is required prior to making material changes to either data collection methods/scope or data use practices going forward as they relate to behavioral online market research.
If conducting a type of behavioral online market research that does not require the active consent of the data subject, then prior to making material changes to either data collection methods/scope or data use practices going forward, the entity should present the data subject with a clear and prominent, concise, easy-to-understand, and accessible notice, or a prominent and meaningfully labeled link to a notice that describes these changes and how the data subject can opt out of collection.
An entity seeking to make material changes to data use practices for any previously collected data must provide an additional, distinct mechanism for active consent and an accompanying concise, easy-to-understand notice explaining the new uses for previously collected data. This entity must receive active consent from the data subject before using the data in ways that the data subject had not previously consented to.
Appending data: Entities collecting data for use in behavioral online market research or otherwise conducting behavioral online market research should not append directly identifiable, inferably identifiable, or sensitive supplemental market research data to data collected through behavioral online market research unless the data subject has provided active consent for such appending.
For more information, contact Erica Newland at enewland@cdt.org or (202) 637-9800.