Privacy Techniques for Big Data
- Ezra Warner
- 10 years ago
Transcription
1 Privacy Techniques for Big Data
The Pros and Cons of Syntactic and Differential Privacy Approaches
Dr Roksana Boreli
SMU, Singapore, May 2015

Introductions
- NICTA: Australia's National Centre of Excellence in Information and Communication Technology
- 700 staff, ~300 PhD students
- Presenter: Research Leader, Mobile Systems research group
- Interests: privacy enhancing technologies, wireless comms and network protocols
[Figure labels: VRL, ATP, NRL, CRL, QRL]
2 Outline
- Privacy
- De-identification techniques
- Experiences from a recently completed project
- Evaluating the privacy-utility trade-offs
- Question time

Personal data is collected by services and apps
- Used for targeted advertising
3 The growing importance of privacy
- Regulatory environment: legislation protecting the collection, storage and use of personal data (PII)
- Consumer attitudes to privacy: increasing awareness
- Media attention: high potential for negative publicity
- Maximise the opportunity: how to minimise the risks while preserving data utility for analytics

The meaning of PII
- PII: personally identifiable information
- Privacy regulations are based on PII: personal vs non-personal information
- Numerous examples of re-identification of anonymised data
- The fallacy of PII: any information could be PII, and should be protected
4 Regulatory guidelines for de-identification
- Australian guidelines: frequency and dominance rules for data aggregates
- US HIPAA regulation (health data), Safe Harbor method:
  1. Redact identifiers
  2. Generalise (mask) location and dates
  3. Residual information should not lead to re-identification
- HIPAA: the Health Insurance Portability and Accountability Act of 1996

Netflix privacy breach
- Dataset for the Netflix Prize contest: 17,770 movie titles; 480,189 users with random customer IDs; ratings 1-5
- For each movie we have the ratings as (MovieID, CustomerID, Rating, Date)
- Given auxiliary information (random chats, IMDB), the identity of a recommender can be uncovered with high probability.
- Source: Narayanan and Shmatikov, Robust De-anonymization of Large Sparse Datasets.
5 De-identifying mobility data
- Based on analysis of anonymised call information of ~1.5 million users in a western country (1 hour precision): 4 spatio-temporal points are enough to uniquely identify 95% of the individuals.
- Source: Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen and Vincent D. Blondel, Unique in the Crowd: the privacy bounds of human mobility. Scientific Reports 3:1376, March.

Technologies from research domain
1. Anonymisation
2. Obfuscation, differential privacy
3. Cryptographic solutions
More ongoing research topics
6 Anonymisation
- Scenario: storage or release of private information
- Simple: removal of PII and sensitive information
- Easy to reverse, using side information

Customer data
Name            Gender   Age   Post code   Monthly bill
Oliver Brown    Male     43    3067        $198
Emily Taylor    Female   37    3040        $45
William Walker  Male     19    3825        $146
Jack Harris     Male     26    3028        $35
Emma Anderson   Female   42    3195        $30
Lily White      Female   55    3067        $72
Lucas Johnson   Male     59    3818        $79

Census data
Name            Gender   Age   Post code
Jane Eyre       Female   43    2066
Emily Taylor    Female   37    3040
Rob Reed        Male     59    2100
Jack Johnson    Male     48    3860

Anonymisation: Syntactic approaches
- Data set based rules
- Frequency rule: each aggregation result is derived from at least k records
- Reference: confidentialise data: the basic principles
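To illustrate why removing names alone is easy to reverse, the sketch below joins the customer rows (with names removed) against the census rows on gender, age and post code. The rows are the ones listed on the slide; the function name `link` and the exact-match rule are assumptions made for illustration, not the method used in the talk.

```python
# Customer data after removing names: (gender, age, post code, monthly bill in $)
released = [
    ("Male", 43, "3067", 198),
    ("Female", 37, "3040", 45),
    ("Male", 19, "3825", 146),
    ("Male", 26, "3028", 35),
    ("Female", 42, "3195", 30),
    ("Female", 55, "3067", 72),
    ("Male", 59, "3818", 79),
]

# Side information (census data): (name, gender, age, post code)
census = [
    ("Jane Eyre", "Female", 43, "2066"),
    ("Emily Taylor", "Female", 37, "3040"),
    ("Rob Reed", "Male", 59, "2100"),
    ("Jack Johnson", "Male", 48, "3860"),
]


def link(released_rows, census_rows):
    """Return (name, bill) for every census row that matches exactly one released row."""
    matches = []
    for name, gender, age, postcode in census_rows:
        hits = [
            bill
            for g, a, p, bill in released_rows
            if (g, a, p) == (gender, age, postcode)
        ]
        if len(hits) == 1:  # a unique match re-identifies the customer
            matches.append((name, hits[0]))
    return matches


reidentified = link(released, census)
print(reidentified)  # [('Emily Taylor', 45)]
```

Only one census row matches here, but a single unique match is enough to attach a name to a supposedly anonymous bill.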
7 Anonymisation: Syntactic approaches
k-anonymity
- Any combination of values of the selected attributes (features) must be shared by a group of at least k users
- [Figure: generalisation example. Post code: 476**, Age: 2*, Gender: * (Male, Female)]
- Source: L. Sweeney. k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10 (7).

k-anonymity (k=2)
Gender   Age   Post code   Monthly bill
Male     -     30**        $198
Female   -     30**        $45
Male     -     38**        $146
Male     -     30**        $35
Female   -     31**        $30
Female   -     30**        $72
Male     -     38**        $79
Female   -     31**        $121
Male     -     38**        $82
Female   -     31**        $155

- l-diversity: within each group of k, ensure that there is a mix of specific values of the sensitive attribute
- t-closeness, etc.
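The k=2 claim for the table above can be checked mechanically. The following is a minimal sketch, assuming gender, age and post code are the quasi-identifiers and the monthly bill is the sensitive attribute; the function names are hypothetical, and the diversity check is the simplest "distinct values" variant of l-diversity rather than any specific definition from the talk.

```python
from collections import Counter, defaultdict

# Generalised table from the slide: (gender, age, post code, monthly bill in $)
rows = [
    ("Male", "-", "30**", 198),
    ("Female", "-", "30**", 45),
    ("Male", "-", "38**", 146),
    ("Male", "-", "30**", 35),
    ("Female", "-", "31**", 30),
    ("Female", "-", "30**", 72),
    ("Male", "-", "38**", 79),
    ("Female", "-", "31**", 121),
    ("Male", "-", "38**", 82),
    ("Female", "-", "31**", 155),
]


def anonymity_level(table, n_quasi):
    """Largest k for which the table is k-anonymous on its first n_quasi columns."""
    groups = Counter(row[:n_quasi] for row in table)
    return min(groups.values())


def distinct_diversity(table, n_quasi, sensitive_index):
    """Smallest number of distinct sensitive values found in any group."""
    values = defaultdict(set)
    for row in table:
        values[row[:n_quasi]].add(row[sensitive_index])
    return min(len(v) for v in values.values())


k = anonymity_level(rows, n_quasi=3)
l = distinct_diversity(rows, n_quasi=3, sensitive_index=3)
print(k, l)  # 2 2
```

The smallest groups are (Male, 30**) and (Female, 30**), each with two records, which is why the table is 2-anonymous but not 3-anonymous.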
8 Data obfuscation
- Scenario: releasing aggregate data information
- Differential privacy requires that computations be insensitive to changes in any particular individual's record. Consequently, being opted in or out of the database should make little difference to a person's privacy.
- [Figure: neighbouring datasets A and B with |A Δ B| = 1; a mechanism M gives M(A) ≈ M(B)]

Differential privacy
- Add calibrated noise to sensitive data, e.g. noise generated by a Laplace function
- The ε parameter sets the privacy strength
- Example table columns: Name, Gender, Age, Post code, Monthly bill. Rows as extracted: Oliver Brown, Male, $198; Emily Taylor, Female, $45
- Average bill + noise => noisy average bill: $91.4
- Sources: C. Dwork. Differential privacy. In ICALP (2), pages 1-12, 2006. C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proc. of the 3rd TCC.
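The "add calibrated noise" step can be sketched for an average bill as follows. This is a minimal illustration under stated assumptions, not the configuration used in the talk: the number of records is treated as public, neighbouring datasets differ in the value of one record, and bills are clamped to a range [lower, upper] chosen by the analyst, which gives a sensitivity of (upper - lower) / n. The bills are the seven customer values from the earlier slide; the function names and the bounds 0 and 200 are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` with Laplace noise calibrated to epsilon.

    Assumes len(values) is public and each value is clamped to [lower, upper],
    so changing one record moves the mean by at most (upper - lower) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must not be empty")
    if rng is None:
        rng = random.Random()
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clamped)
    true_mean = sum(clamped) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


bills = [198, 45, 146, 35, 30, 72, 79]  # monthly bills from the customer table
noisy_average = dp_mean(bills, lower=0, upper=200, epsilon=1.0)
print(f"Noisy average bill: ${noisy_average:.1f}")
```

A smaller ε means a larger noise scale and therefore stronger privacy at the cost of a less accurate average; with only seven records the noise is large relative to the mean, which is the privacy-utility trade-off in miniature.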
9 Evaluating the utility of privacy techniques
- Industry collaboration project
- POC implementation of selected privacy techniques: redaction (masking), anonymisation and obfuscation
- Evaluate the solution for a set of analytics scenarios: quantify utility and privacy

Approach
- Use cases (analytics)
- Data sets: original and after PET
- Privacy mechanisms
- Metrics
  - Analytics: RMSE, MRE, lift, compared to the results using original data
  - Privacy: uniqueness, entropy, differential privacy ε
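Some of the metrics listed above can be sketched in a few lines. The slides do not define them, so the formulas here are common conventions taken as assumptions: RMSE as root mean squared error, MRE as mean relative error against the original result, and uniqueness as the fraction of records whose attribute combination appears only once. The sample numbers are hypothetical.

```python
import math
from collections import Counter


def rmse(original, protected):
    """Root mean squared error between results on original and protected data."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(original, protected)) / len(original)
    )


def mre(original, protected):
    """Mean relative error; assumes no original value is zero."""
    return sum(abs(o - p) / abs(o) for o, p in zip(original, protected)) / len(original)


def uniqueness(records):
    """Fraction of records whose attribute combination occurs exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


# Hypothetical analytics results before and after applying a privacy mechanism
original_results = [100.0, 200.0]
protected_results = [110.0, 180.0]
print(rmse(original_results, protected_results))
print(mre(original_results, protected_results))
print(uniqueness([("Male", "30**"), ("Male", "30**"), ("Female", "31**")]))
```

Lower RMSE and MRE mean the protected data preserves more analytic utility, while lower uniqueness means fewer records can be singled out.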
