HKU Big Data and Privacy Workshop
Privacy Risks of Big Data Analytics: From a Regulator's Point of View
30 November 2015
Henry Chang, IT Advisor
Office of the Privacy Commissioner for Personal Data, Hong Kong
Big Data Analytics and Mobile Apps
1. Data protection principles
2. Big data analytics and privacy
OECD Privacy Framework Principles
Personal data flow through the IT system: Collection; Storage, Use or Processing; Retention/Erasure
Principles: Collection Limitation, Data Quality, Purpose Specification, Use Limitation, Security Safeguards, Openness, Individual Participation, Accountability
Big Data Analytics and Mobile Apps
1. Data protection principles
2. Big data analytics and privacy
Big Data and Privacy: Failures of Big Data Analytics
Failures of Big Data Analytics: Google Flu Prediction
It does not always work. Compared with CDC data, it underestimated flu levels by half in 2009 and overestimated them by half in 2012.
Predictor of flu or predictor of winter? A black-box approach makes it hard for people to judge.
Failures of Big Data Analytics: US Presidential Election
Past performance does not guarantee future results.
Colorado professors built a data model that correctly "backward predicted" the results of the eight US presidential elections since 1980, but it failed to "forward predict" the 2012 election.
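The gap between backward and forward prediction is essentially overfitting: a flexible enough model can reproduce any history. The following is a minimal sketch with made-up numbers and a deliberately extreme model (a polynomial passing through every past point); it is not the Colorado professors' model, and `lagrange_predict`, `past_x` and `past_y` are hypothetical names for illustration only.

```python
from fractions import Fraction


def lagrange_predict(xs, ys, x):
    """Evaluate, at x, the unique polynomial of degree len(xs) - 1 through every (xs[i], ys[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)  # exact rational arithmetic, no rounding
        total += term
    return total


# Made-up vote-share percentages for eight past elections numbered 1..8 (NOT real data).
past_x = list(range(1, 9))
past_y = [51, 49, 53, 47, 50, 52, 48, 51]

# "Backward prediction": the curve reproduces every past result exactly.
print(all(lagrange_predict(past_x, past_y, x) == y for x, y in zip(past_x, past_y)))  # True

# "Forward prediction" for election 9: an impossible negative vote share.
print(float(lagrange_predict(past_x, past_y, 9)))  # -35.0
```

With as many free parameters as past elections, a perfect backward record carries no information about the next result.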
Big Data and Privacy: Privacy Risks of Big Data Analytics
Privacy Risks of Big Data Analytics
1. Sense of rights violation or surprise
2. Re-identification
3. Negative impact/discrimination
Big Data and Privacy: Correct prediction can still be creepy
The Surprise of Big Data Analytics: Target's Pregnancy Prediction
If it works in this way
The Surprise of Big Data Analytics: Target's Pregnancy Prediction
Target learnt this lesson: Then we started, in the same mailer, mixing baby items with other things we know they would never buy, like a lawn mower. As long as the pregnant woman doesn't know she has been spied on, it works and she would use the coupons.
Privacy Risks of Big Data Analytics
1. Sense of rights violation or surprise
2. Re-identification
3. Negative impact/discrimination
Big Data and Privacy: The myth of anonymisation
The Myth of Anonymisation
AOL released anonymised search records of 650,000 people covering a three-month period.
User 4417749 was identified as Ms Arnold of Lilburn, Georgia, from the keywords she had entered.
Her searches also included "nicotine effect", "dry mouth", "hand tremors" and "bipolar disorder". Do we need to worry about Ms Arnold's physical and mental health?
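As the "User 4417749" label suggests, replacing a name with a number still leaves every query from the same person attached to that one number. A minimal sketch of that kind of pseudonymisation, with an invented log and hypothetical helper names (`pseudonymise`, `profiles`), shows why it is not anonymisation: the queries still add up to a profile of one individual.

```python
from collections import defaultdict
from itertools import count


def pseudonymise(log):
    """Replace each user name with a sequential number; each query stays attached to that number."""
    next_id = count(1)
    ids = {}
    result = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(next_id)
        result.append((ids[user], query))
    return result


def profiles(pseudonymised_log):
    """Group queries by pseudonymous ID, which is what lets a reader piece one person together."""
    grouped = defaultdict(list)
    for user_id, query in pseudonymised_log:
        grouped[user_id].append(query)
    return dict(grouped)


# Hypothetical search log; names and queries are invented for illustration.
log = [
    ("user_a", "gardeners in springfield"),
    ("user_b", "cheap flights"),
    ("user_a", "houses for sale on elm street springfield"),
    ("user_a", "people with surname smith"),
]

# User 1's three queries (a town, a street, a surname) remain grouped under one number.
print(profiles(pseudonymise(log)))
```

No single query identifies anyone; it is the grouping under a stable number that narrows the search to one household.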
The Myth of Anonymisation: Anonymised Massachusetts State Employee Hospital Records
State employee hospital records were released for research.
The Governor reassured the public that the data had been de-identified.
A researcher re-identified the Governor's own record by matching date of birth, gender and ZIP code against a voter database that cost US$20.
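The re-identification step is a plain database join on the attributes both datasets share. A minimal sketch with invented records (not the Massachusetts data) and a hypothetical `link` helper: any "de-identified" record whose date of birth, gender and ZIP code match exactly one entry in a named public list gets its name back.

```python
# Hypothetical "de-identified" hospital records: names removed, quasi-identifiers kept.
hospital_records = [
    {"dob": "1950-01-15", "sex": "M", "zip": "00101", "diagnosis": "condition A"},
    {"dob": "1962-03-14", "sex": "F", "zip": "00102", "diagnosis": "condition B"},
    {"dob": "1971-11-02", "sex": "M", "zip": "00103", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names plus the same three attributes.
voter_roll = [
    {"name": "Voter One", "dob": "1950-01-15", "sex": "M", "zip": "00101"},
    {"name": "Voter Two", "dob": "1962-03-14", "sex": "F", "zip": "00104"},
    {"name": "Voter Three", "dob": "1980-01-01", "sex": "F", "zip": "00102"},
]

QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) for each record whose quasi-identifiers match exactly one public entry."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    reidentified = []
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a single match attaches a name to the "anonymous" record
            reidentified.append((names[0], record["diagnosis"]))
    return reidentified


print(link(hospital_records, voter_roll))  # [('Voter One', 'condition A')]
```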
The Myth of Anonymisation
How much data do you need to identify someone?
87% of the US population can be identified by ZIP code, gender and date of birth; 53% by place, gender and date of birth; and 18% by county, gender and date of birth.
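Figures like these measure the share of people who are the only one with a given combination of attributes (the "k = 1" case in k-anonymity terms). A minimal sketch of that measurement on an invented six-person dataset, using a hypothetical `fraction_unique` helper: coarsening ZIP code to county lowers the share, in the same direction as the 87% versus 18% figures, though the toy numbers are not meant to reproduce them.

```python
from collections import Counter
from fractions import Fraction


def fraction_unique(records, keys):
    """Share of records whose combination of `keys` occurs exactly once in the dataset."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return Fraction(unique, len(records))


# Hypothetical toy population; each county contains one or more ZIP codes.
people = [
    {"zip": "00101", "county": "County A", "sex": "F", "dob": "1944-05-02"},
    {"zip": "00101", "county": "County A", "sex": "M", "dob": "1944-05-02"},
    {"zip": "00102", "county": "County A", "sex": "F", "dob": "1944-05-02"},
    {"zip": "00102", "county": "County A", "sex": "F", "dob": "1980-09-12"},
    {"zip": "00201", "county": "County B", "sex": "F", "dob": "1980-09-12"},
    {"zip": "00201", "county": "County B", "sex": "F", "dob": "1980-09-12"},
]

print(fraction_unique(people, ("zip", "sex", "dob")))     # 2/3
print(fraction_unique(people, ("county", "sex", "dob")))  # 1/3
```

Coarser attributes protect more people, but they also make the data less useful for analysis, which is the trade-off the next slide points to.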
The Myth of Anonymisation
The only way to make data anonymous is to make it useless.
Professor Paul Ohm (University of Colorado Law School)
Privacy Risks of Big Data Analytics
1. Sense of rights violation or surprise
2. Re-identification
3. Negative impact/discrimination
Big Data and Privacy: Before we look at discrimination, let's look at the reality of big data analytics
The Reality of Big Data Analytics
Big data analytics: Correlation ≠ Causation
The (Academic) Reality of Big Data Analytics
US spending on science, space and technology reveals suicides by hanging, strangulation and suffocation?
The (Academic) Reality of Big Data Analytics
The number of Nicolas Cage films reveals swimming-pool drownings?
The (Academic) Reality of Big Data Analytics
The divorce rate in Maine reveals per capita consumption of margarine?
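These examples share a pattern: two series that merely trend together over the same years look strongly correlated. A minimal sketch with invented data (not the actual spending, film, divorce or margarine figures) and hypothetical helper names: the raw levels give a coefficient close to +1, but once the shared trend is removed by taking year-on-year changes, the relationship reverses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def year_on_year_changes(series):
    """First differences: strips out a trend that two series merely happen to share."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two made-up yearly series that both rise over the same decade, for unrelated reasons.
series_a = [10.5, 11.5, 14.5, 15.5, 18.5, 19.5, 22.5, 23.5, 26.5, 27.5]
series_b = [9.5, 12.5, 13.5, 16.5, 17.5, 20.5, 21.5, 24.5, 25.5, 28.5]

print(round(pearson(series_a, series_b), 2))  # 0.99: strong "correlation" in the levels
print(round(pearson(year_on_year_changes(series_a), year_on_year_changes(series_b)), 2))  # -1.0
```

Detrending is one common sanity check; it does not establish causation in either direction.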
The Reality of Big Data Analytics
But do we really care about the difference between correlation and causation?
The (Commercial) Reality of Big Data Analytics
Would you care about correlation or causation if you were the Samaritans at this point?
The (Commercial) Reality of Big Data Analytics
Would you care about correlation or causation if you were a poolside lifeguard at this time?
The (Commercial) Reality of Big Data Analytics
What would you do if you were a margarine producer in the US and learnt about the divorce rate in Maine at this time?
The Reality of Big Data Analytics
The Reality of Big Data Analytics
Marketers are not interested in theories; they are interested in results. So if it works, what's the problem?
If users of table feet protectors pay back their loans promptly, what's wrong with lending to them?
The problem lies with the have-nots, those who are not targeted. You have denied them things to which they might otherwise be entitled.
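The cost to the have-nots can be made concrete. A minimal sketch with invented applicants and a hypothetical rule (`approve`) that lends only to buyers of table feet protectors: the rule may look profitable, yet it turns away people who would have repaid, simply because they do not show the correlated behaviour.

```python
# Hypothetical applicants. "bought_pads" is the proxy the lender can see;
# "repays" is the outcome the lender cares about but cannot see in advance.
applicants = [
    {"id": 1, "bought_pads": True, "repays": True},
    {"id": 2, "bought_pads": True, "repays": True},
    {"id": 3, "bought_pads": False, "repays": True},
    {"id": 4, "bought_pads": False, "repays": True},
    {"id": 5, "bought_pads": False, "repays": False},
    {"id": 6, "bought_pads": True, "repays": False},
]


def approve(applicant):
    """Hypothetical lending rule: approve only people who show the correlated purchase."""
    return applicant["bought_pads"]


approved = [a["id"] for a in applicants if approve(a)]
# The have-nots: reliable borrowers turned away only because they lack the proxy.
denied_reliable = [a["id"] for a in applicants if not approve(a) and a["repays"]]

print(approved)         # [1, 2, 6]
print(denied_reliable)  # [3, 4]
```

The lender only ever observes the approved group, so the people in `denied_reliable` never show up in its results.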
The Reality of Big Data Analytics
Is there a solution to this? We need to know what big data is and isn't good at.
It IS: a pattern matcher; a source of recommendations.
It ISN'T: a substitute for proper data collection and analysis, and theory generation (big data hubris); a hands-free predictor.
Privacy Challenge of Big Data Analytics
Risks recap:
1. The (unintended) impacts on people when it is working;
2. The risks of re-identifying people from anonymised sensitive data; and
3. The people who are not targeted.
There is a human being behind all that data analytics and decision-making. Can things be redressed when something goes wrong?
Big Data Analytics