Privacy leakage vs. Protection measures: the growing disconnect



Similar documents
How To Find Out If Pii Is Leaked Via Anosn (For A Free Download)

Privacy leakage on the Internet

3 Approach. 3.1 Web Site Overview

WEBSITE PRIVACY POLICY. Last modified 10/20/11

Cookie Policy. Introduction About Cookies

1. Important Information

Information Collected. Type of Information Collected. We may collect two general types of information when you use the Site:

Web Tracking for You. Gregory Fleischer

Cookie Policy. Introduction About Cookies

Opinion 04/2012 on Cookie Consent Exemption

Host Fingerprinting and Tracking on the Web: Privacy and Security Implications

Beasley Broadcast Group, Inc. Privacy Policy

Privacy Policy/Your California Privacy Rights Last Updated: May 28, 2015 Introduction

Leonardo Hotels Group Page 1

Unless otherwise stated, our SaaS Products and our Downloadable Products are treated the same for the purposes of this document.

SKoolAide Privacy Policy

AdvancedMD Online Privacy Statement

WHAT INFORMATION IS COLLECTED AT MOTOROLA.COM.VN AND/OR MOTOROLA.VN AND HOW IS IT PROCESSED AND USED?

Device Fingerprinting and Fraud Protection Whitepaper

1. The information we collect and how we collect it.

Ad Tech for Privacy Counsel November 5, Marc M. Groman, President & CEO Network Advertising Initiative

PRIVACY POLICY. Mil y Un Consejos Network. Mil y Un Consejos Network ( Company or we or us or our ) respects the privacy of

Elo Touch Solutions Privacy Policy

RDM on Demand Privacy Policy

FitCause Privacy Policy

Lifesize Cloud Privacy Statement

Zubi Advertising Privacy Policy

If you have any questions about our privacy practices, please refer to the end of this privacy policy for information on how to contact us.

PRIVACY POLICY. Your Personal Information will be processed by Whistle Sports in the United States.

Privacy Policy. Effective Date: November 20, 2014

A guide to affilinet s tracking technology

ATIS Open Web Alliance. Jim McEachern Senior Technology Consultant ATIS

Green Pharm is committed to your privacy. We disclose our information practices below and we agree to notify you of:

Collection and Use of Information

Information We Collect and Store as You Access and Use the Site

Privacy Policy. MSI may collect information from you on a voluntary basis when you:

SAP Splash Privacy Statement

Privacy Policy and Notice of Information Practices

DESTINATION MELBOURNE PRIVACY POLICY

We may collect the following types of information during your visit on our Site:

Anonymity on the Internet Over Proxy Servers

Adaptive Business Management Systems Privacy Policy

Zep Inc.: Global Online Privacy Notice

AIG INSURANCE COMPANY OF CANADA Privacy Principles

2. What personal information do we collect and hold?

Privacy Policy. Introduction. Scope of Privacy Policy. 1. Definitions

ShoreTel Advanced Applications Web Utilities

Is preventing browser fingerprinting a lost cause?

TargetingMantra Privacy Policy

1.5.5 Cookie Analysis

Recon and Mapping Tools and Exploitation Tools in SamuraiWTF Report section Nick Robbins

Frequently Asked Questions for the USA TODAY e-newspaper

Internet Banking System Web Application Penetration Test Report

Index. AdWords, 182 AJAX Cart, 129 Attribution, 174

ETHICAL ELECTRIC PRIVACY POLICY. Last Revised: December 15, 2015

Facebook Smart Card FB _1800

INTRODUCTION We respect your privacy and are committed to protecting it through our compliance with this privacy policy.

Privacy Policy. PortfolioTrax, LLC v1.0. PortfolioTrax, LLC Privacy Policy 2

PRIVACY POLICY. The Policy is incorporated into Terms of Use and is subject to the terms laid down therein.

Use of Web Measurement and Customization Technologies

girlsdrivebetter.com is a trading style of Policywise Ltd, a limited liability company registered in England and Wales number

LIVE CHAT CLOUD SECURITY Everything you need to know about live chat and communicating with your customers securely

Web Traffic Capture Butler Street, Suite 200 Pittsburgh, PA (412)

PRIVACY POLICY. I. Introduction. II. Information We Collect

Verified Volunteers. A division of SterlingBackcheck. Privacy Policy. Last Updated: November 5, 2014

Privacy Policy I. THE INFORMATION WE COLLECT AND HOW WE USE IT

Transcription:

Privacy leakage vs. Protection measures: the growing disconnect Balachander Krishnamurthy (AT&T Labs Research) Konstantin Naryshkin (Worcester Polytechnic Institute) Craig E. Wills (Worcester Polytechnic Institute) AT&T Labs Research 1

Presenter Balachander Krishnamurthy http://www.research.att.com/~bala/papers AT&T Labs Research 2

Ongoing work since 2005 Cat and Mouse: Content delivery tradeoffs in Web access (WWW 06) Generating a privacy footprint (IMC 06) Privacy Loss and impact of Privacy Protection (SOUPS 07) Characterizing privacy in OSNs (WOSN 08) Privacy Diffusion on the Web: A Longitudinal Perspective (WWW 09) On the leakage of PII via OSNs (WOSN 09) Moral: People read WSJ Privacy issues on Twitter (ICA 10) PII leakage in mobile OSNs (WOSN 10) Situation has steadily worsened with each paper... AT&T Labs Research 3

This study: Popular Non-OSN sites with user accounts Why? Millions of users create accounts with varying degrees of personal information Hitherto unexamined Beyond traditional leakage, potential to examine linkage Examine various protection measures effectiveness in combating leakage/linkage Focus on direct leakage of bits of private information to third party aggregators AT&T Labs Research 4

What is leakage? Depends on the viewpoint! User: Personal information shared with any site other than first party First party: Contracted work outsourced to third party (e.g. analytics) Tracking by third party for marketing/demographic information A first party site can legitimately state that a third party working on its behalf obtaining user data is not leakage. Risks can be lowered by explicit statements in privacy policies covering all data shared in any transaction. But not all first parties have unambiguous, clear statements and thus from user s POV the concern of leakage remains. AT&T Labs Research 5

Categories and choice of sites for study 10 popular + 1 sensitive category + OSNs = 12 categories (120 sites) Site must have a minimum of 100K registered users (most have millions) Select among 17 Alexa (sub)categories and end up with: Arts, Employment, Video Game News, Photosharing, News, Travel, Shopping, Relationships, Generations and Age Groups, and Sports. Sensitive category: Health OSN category: beyond popularity, mainly for contrast AT&T Labs Research 6

Data gathering Use Fiddler Web proxy to capture HTTP requests/responses Ignore SSL (tiny fraction) Steps: create account, confirm verification message, create profile In many sites we can use existing accounts in Facebook, Google, or Twitter but we created accounts directly whenever possible AT&T Labs Research 7

Interacting with sites Actions available only to registered users, and some common actions available to all users Share content with friends via email or OSN links when feasible Create a set of sensitive strings from user s profile (name, email, address). Include search keywords sent to Health and Travel sites Stored HTTP request/response and POST data and then look for any of these strings transmitted to 3d-parties Eliminate false positives by hand (e.g., 90210 being part of long number string) Obviously we only find lower bound of leakage, only for set of actions done, and only what is not encrypted/modified AT&T Labs Research 8

Results: executive summary Worse than our (jaded) expectations Search strings to healthcare sites and travel details going to aggregators was a tad surprising as was extent of leakage (9 of 10 popular sites in these categories) Potential of linkage is high through cookies and other linkable elements Existing techniques are largely ineffective Recall our caveat about what is leakage Next: selected results in more detail on per-action class basis AT&T Labs Research 9

Action: Account creation/confirmation User s email address leaked via a Sports category website via Referer header to a doubleclick.net server. GET http://ad.doubleclick.net/adj/... Referer: http://submit.sports.com/...?email=jdoe@email.com Cookie: id=35c192bcfe0000b1... We will come back to this example when talking about linkage via email address later. AT&T Labs Research 10

Action: Account login/navigation Some sites store private information in site-specific first-party cookies Email, Name, Zipcode leakage Via 1st-party cookies to hidden 3d-Party GET http://metrics.agegroups.site/b/ss/..global/... Host: metrics.agegroups.site Referer: http://www.agegroups.site Cookie:...e=jdoe@email.com&f=John&l=Doe&...&p=12201... metrics.agegroups.site is AGEGROUPS.122.2o7.net i.e., Omniture (2nd largest aggregator, now part of Adobe). Hidden DNS issue is not addressed by any protection measures domain cookie assumes metrics.agegroups.site is same as AGEGROUPS.site and is sent to Omniture. AT&T Labs Research 11

Action: entering content User s input is often sent as parameters in a GET request instead of POST. Fetching embedded 3d-party objects in page result in sending user output in Referer header of subsequent request. Example showing age, zip code and gender of a user on PHOTOSHARE site being leaked to specificlick.net GET http://afe.specificclick.net/?l=7654&sz=200x250... Referer: http://a.photo.com/.../age=30/zip=12201/gender=m/... AT&T Labs Research 12

Action: Searching for Sensitive Terms Users have higher expectation of privacy here Search in Health site: sensitive string to quantserve via Referer header GET http://pixel.quantserve.com/pixel;r=1423312787... Referer: http://search.health.com/search.jsp?q=pancreatic+cancer Search in Travel site: travel itinerary origin/destination/dates sent to doubleclick GET http://pix04.revsci.net/...travelsite Referer: http://fls.doubleclick.net/...u11=economy/coach;u3=bos; u4=20110415 20110417;u1=Flight;u2=MCO... AT&T Labs Research 13

Sensitivity and identifiability of leaked bits High address(3) homephone(2) Identifiability Medium Lesssignificant email(15) fullname(20) More-significant age(21) zip(18) job(4) gender(16) activities(3) city(16) employer(1) travel-ssearch(9) Low Low Medium Sensitivity health-ssearch(9) High Numbers in paren are number of sites leaking that bit of data. Search strings in Health and Travel are shared in 9 (out of 10) cases. AT&T Labs Research 14

Leakage across categories Action Sites w/ Account View/ Direct Create Login/ Edit Input Sens. Category Leakage Account Navig. Profile Content Search Health 9 0 1 0 0 9 Travel 9 0 1 0 0 9 Employment 8 0 2 2 7 0 OSN 7 0 3 5 0 0 Arts 7 0 3 4 1 0 Relationships 7 0 3 2 2 0 News 5 0 5 0 0 0 PhotoShare 4 3 3 0 1 0 Sports 4 1 2 0 1 0 Shopping 3 0 2 0 2 0 AgeGroups 2 0 1 1 0 0 VideoGames 2 0 1 1 0 0 Tot. Sites/Cat. 67/12 4/2 27/12 15/6 14/6 18/2 67 (56%) of 120 sites directly leak private info to at least one 3rd-party. AT&T Labs Research 15

What is linkage? Aggregators receive data from multiple sites (sources). Similar or identical identifying information of users is often present across sites. Aggregators are in a position to link data to get a broader picture about users. Linking can be done via Globally unique IDs and even in the absence of cookies (via browser fingerprinting) Are they linking? We don t know. We report. You decide. AT&T Labs Research 16

Linking via Globally Unique IDs 48% of sites leaked userid to a 3d-party Potential linkage of records through email address GET http://ad.doubleclick.net/activity;... Referer: http://f.nexac.com/...http://www.employment.com/... na fn=john&na ln=doe&na zc=12201& na cy=albany&na st=ny&na a1=24 Main St.& na em=jdoe@email.com... Cookie: id=22a348d29e12001d... Same email address was present in our first example of leakage from a Sports site. Email leaked by 13% of sites many users use same email across sites. AT&T Labs Research 17

Linking in the absence of cookies Cookies are just one vector of linkage. Other vectors include client IP addresses and browser fingerprints. One fingerprint is the set of plugins a user has installed on their browser. Omniture gathers this in the Request-URI in several browsers (Firefox, Chrome). A good use of the browser fingerprint is for the first party to know what versions of plugins users have installed. A possibly risky use is the ability that third parties now have to link users across sites even in the absence of cookies, as the specific set of plugins and versions is likely to be unique. AT&T Labs Research 18

Summary of existing protection measures block: Blocking requests to targeted 3d-parties (AdBlock, IE9, block-hidden) NoCook and Nojs: Refusing all or 3d-party cookies, blocking JavaScript Referer: filtering (modifying/removing) protocol headers Anon: Anonymizing user s IP address (via a proxy, Tor) Opt-out: Opting out of tracking by using NAI opt-out cookie DNT header: Request not to be tracked FTC Dec 10 report: input to legislation, advocated Privacy By Design (PBD) PBD: embed privacy early, make default private, enable access to user data AT&T Labs Research 19

How effective are these techniques/proposals? Protection Measure Leakage/Linkage Scenario block block-hidden nocook no3rdcook nojs referer anon opt-out a) User visit Hidden third party 3rd-party tracking linkage 1st-party tracking linkage b) Leakage via Referer Leakage via cookies Leakage via JS c) Linkage via IP addr Linkage via Flash cookies Linkage via fingerprint Linkage via GUID Linkage w/ Other Sources a) Expected, b) Known, c) Potential AT&T Labs Research 20

Shortcomings Most of the protection measures are ineffective (including current proposals) block seems to be a user s best friend. Turning of JavaScript has negative consequences (not always..) DNT: Enforcement, fines, and reverse engineering technology that can analytically demonstrate users are not being tracked are possible way forward. Economic acquisitions are accelerating Right to be forgotten is needed Hidden parties should be brought to the front. Privacy policies have to be less opaque. 3d-parties can be more demonstrative of their non-linkage guarantees. AT&T Labs Research 21

Conclusion Vectors of leakage abound in non-osn sites. Notion of sensitive and identifiable bits of information. Additional concerns of possible linkage. User s point of view should carry more weight. First party sites can contribute more. AT&T Labs Research 22