Privacy & Big Data: Enable Big Data Analytics with Privacy by Design Datenschutz-Vereinigung von Luxemburg Ronald Koorn DRAFT VERSION 8 March 2014
Agenda: What is 'Big Data'? Privacy Implications, Privacy by Design, Take aways, Q&A 1
What is Big Data? Evolution or revolution? 2
What is Big Data? Big data is high-volume, high-velocity & high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight & decision-making (Gartner). Volume + Velocity + Variety = Value. Big Data spans both unstructured and structured data. Every day more than 2.5 exabytes (2.5 × 10^18 bytes) of data is created. 3
Big Data context Individual vs. Business Compliance Legal Corporate Social Responsibility Integrity & Fraud Protection Business Process Management Big Data & Privacy Records Mgmt. / Data retention Security Data Quality Controls/ Auditing 4
The Big Data market: still in its infancy & immature 5
Examples of Big Data applications: Analysis of Earnings per Antenna. Measurement of revenue per antenna cell on a single day. Green dots indicate the antennae with a relatively large amount of revenue, while the red dots indicate antennae with relatively low revenue. Dots in the sea indicate antennae on oil rigs. Measurement of the average revenue per antenna versus the number of connections. The size of the dots indicates the total sum of revenue. Outliers can be spotted easily to identify issues with costly antennae. Based on this information it is possible to prioritize maintenance of infrastructure by identifying the most important antennae. 8
Examples of Big Data applications: Modeling Mortgage Risks. Risk modeling for a bank's collection of mortgages based on life-changing events. By combining public information from an online house-selling website (funda.nl) with internal bank data on transactions, a model has been built to describe and predict the length of mortgages at that bank (blue curve). Analysis of the difference between the house-selling price and the mortgage at the bank. The red curve corresponds to customers in so-called special treatment for negative financial reasons, while the blue curve corresponds to normal customers who have not been in this special treatment. 10
Big Data & Privacy Regulation Analytics Trending Fraud Insights BIG DATA Data Management Process improvements Data Protection & Privacy Purpose limitation Profiling Outsourcing/ external partners Data minimization Rights of data subjects Privacy Impact Assessments Privacy by Design 12
Privacy Requirements for Big Data projects (Privacy Principles: Big Data Implications)
DPA notification: DPA to be notified (depending on use)
Transparency: Data subjects (and works council) to be notified (esp. automated decision-making)
Purpose Binding: Limit further incompatible use
Legitimate Grounds: Consent to individual profiling
Data Quality: Match & enrich in/external data sources
Rights of Data Subjects: Data subjects to be notified and right to object / opt-out
Security: Data warehouse is single point of failure/success; strong access controls. Logging, monitoring & data destruction required
Outsourcing / Onward transfer: Data ownership, contracts 13
Big Data & compatible use. We have a lot of customer data in various data silos. What data are we allowed to use, and how are we allowed to use it? Analyse on a case-by-case basis. Data accumulated by the controller, received from the customer, may be combined and used without consent for purposes that are not incompatible with the original purposes of the respective collection and processing (excl. traffic data and communication content). Insurance company. Compatible use: profiling for marketing purposes. Incompatible use: profiling for decision making on insurance policies. Processing and combining of data from external sources shall be assessed more restrictively. Credit union. Compatible use: adding publicly available data on a person's default history. Incompatible use: adding acquired health data for inclusion in the assessment of mortgage eligibility. If profiling may have legal effect on data subjects, explicit consent is required; other restrictions on profiling also apply. Other data protection provisions on the duties of the controller, the rights of data subjects and on direct marketing always apply. 14
Big Data & Profiling. General restrictions to profiling: The data subjects shall be informed of profiling. The data subjects have the right to object to profiling. Profiling for e.g. marketing purposes is restricted by the purpose limitation. Profiling that has the effect of discriminating against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, sexual orientation or gender identity, or that results in measures which have such effect, is prohibited. Profiling shall not be based solely on special categories of data. Profiling of children under 13 poses particular concerns. Additional restrictions to profiling that may have legal or comparable effects: A person may be subjected to profiling which leads to measures producing legal effects concerning the data subject, or which similarly significantly affects the interests, rights or freedoms of the data subject concerned, only if the processing: (a) is necessary for the entering into, or performance of, a contract; (b) is expressly authorized by a Union or Member State law; or (c) is based on the data subject's consent. Such profiling shall not be based solely or predominantly on automated processing and shall include human assessment, including an explanation of the decision reached after such an assessment. Privacy Impact Assessment required beforehand. 15
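The conditions for profiling with legal or comparable effects can be read as a checklist. The following is a minimal sketch of such a checklist, not legal advice and not a method given in the slides: the class `ProfilingCase`, the function `blocking_issues` and the string labels for the grounds are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Grounds (a)-(c) as listed on the slide; the string labels are hypothetical.
PERMITTED_GROUNDS = {"contract", "law", "consent"}


@dataclass
class ProfilingCase:
    """Hypothetical description of one profiling activity with legal effects."""
    legal_ground: Optional[str]        # "contract", "law", "consent" or None
    includes_human_assessment: bool    # not solely/predominantly automated
    pia_completed: bool                # Privacy Impact Assessment done beforehand
    solely_special_categories: bool    # based only on special categories of data


def blocking_issues(case: ProfilingCase) -> List[str]:
    """Return the slide's conditions that the case fails (empty list = none found)."""
    issues = []
    if case.legal_ground not in PERMITTED_GROUNDS:
        issues.append("no permitted ground (contract, law or consent)")
    if not case.includes_human_assessment:
        issues.append("decision lacks human assessment")
    if not case.pia_completed:
        issues.append("no Privacy Impact Assessment beforehand")
    if case.solely_special_categories:
        issues.append("based solely on special categories of data")
    return issues


# Illustrative cases (made-up values).
mortgage_scoring = ProfilingCase("consent", True, True, False)
automated_rejection = ProfilingCase(None, False, True, False)

print(blocking_issues(mortgage_scoring))
print(blocking_issues(automated_rejection))
```

An empty result only means that none of the conditions encoded here is violated; the general restrictions (information, right to object, purpose limitation, non-discrimination) still need a separate case-by-case assessment.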
Why is Privacy & Information Security getting so difficult? Regulatory Proliferation of laws and regulations Rise of compliance red tape & costs Laws applicable regardless of the company's main establishment Nonregulatory Call for greater accountability & better data governance Implementation of Privacy by Design Proliferation of policy papers, best practices & guidelines Enforcement Sensitive issues Rising appetite of the enforcement authorities to confront the private sector, especially the IT sector Creation of quasi-precedent and custom-based data protection system Employees Privacy / Data protection International data transfers & Outsourcing post-NSA revelations Marketing & Big Data Online & Mobile Services Proliferation of data breaches Innovation trends Personalization of services Maximization of benefits flowing from data usage Building consumer trust is an ultimate asset 16
Privacy by Design: Privacy Impact Assessment Phase 1 Determine Scope & Approach Phase 2 Inventory Dataflows, Sources, Elements & Results Phase 3 Identify Applicable Legislation & Expectations Phase 4 Map Regulatory Requirements on Big Data Analytics Phase 5 Identify Big Data Privacy Risks Phase 6 Select Privacy by Design solutions Phase 7 Implementation & Auditing 18
Privacy by Design Appropriate & proportionate technical & organisational measures and procedures implemented in such a way that the processing will meet the requirements of the Privacy regulation & ensure the protection of the rights of data subjects Or, more concretely 1. Proactive not Reactive Preventative not Remedial 2. Privacy as the Default Setting 3. Privacy Embedded into Design 4. Full Functionality Positive-Sum, not Zero-Sum 5. End-to-End Security Full Lifecycle Protection 6. Visibility & Transparency Keep it Open 7. Respect User Privacy Keep it User-Centric* From compliance to default mode of operation * Cavoukian - Reed: Big Privacy: Bridging Big Data and the Personal Data Ecosystem through Privacy by Design (2013) 19
Privacy by Design: Security is insufficient for Big Data Privacy Privacy Privacy Policy Notification Subject Rights Purpose Binding Legitimate Grounds Data Quality Lawful transfer (incl. outside EU) Security Policy Classification Logical Security Physical Security Availability Compliance Incident Mgmt. Security Security Organisation Personnel Security IT Services Mgmt. System Development 20
Privacy by Design & Default: effectiveness of Privacy Enhancing Technologies (PET). Level 1 General PET Controls: Role-based Access Controls; Encryption. Level 2 Data Separation: Pseudonymization; Separation of data in domains; Use of Trusted Third Party. Level 3 Privacy Management Systems: Advanced data management tools; P3P / EPAL; Privacy Rights Management. Level 4 Data Anonymization: No recording of personal data; Anonymization / data deletion at collection; Requirements on system design. 21
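As an illustration of the Level 2 idea of separating data into domains, the sketch below splits each record into an identity domain and an analytics domain that are linked only by a random pseudonym. It is a simplified assumption of how such a split could look: the function `separate_domains`, the field names and the sample records are hypothetical, and in practice the identity domain would be held by a separate party such as a Trusted Third Party.

```python
import secrets


def separate_domains(records, identifying_fields):
    """Split records into an identity domain and an analytics domain.

    The two domains share only a random pseudonym, so the analytics
    domain on its own contains no directly identifying fields.
    """
    identity_domain = {}   # pseudonym -> identifying fields (kept apart)
    analytics_domain = []  # rows used for analysis, keyed by pseudonym
    for record in records:
        pseudonym = secrets.token_hex(16)  # random, not derived from the data
        identity_domain[pseudonym] = {
            field: record[field] for field in identifying_fields
        }
        analytics_row = {
            key: value
            for key, value in record.items()
            if key not in identifying_fields
        }
        analytics_row["pseudonym"] = pseudonym
        analytics_domain.append(analytics_row)
    return identity_domain, analytics_domain


# Made-up sample data for illustration only.
customers = [
    {"name": "A. Example", "email": "a@example.org", "region": "Utrecht", "revenue": 120.0},
    {"name": "B. Example", "email": "b@example.org", "region": "Luxembourg", "revenue": 80.0},
]

identity_domain, analytics_domain = separate_domains(customers, {"name", "email"})
for row in analytics_domain:
    print(sorted(row.keys()))
```

The remaining fields (here `region` and `revenue`) can still act as quasi-identifiers, so a split like this reduces exposure but does not make the analytics domain anonymous.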
Privacy by Design: Anonymization vs. Pseudonymization. Anonymization: No privacy legislation applicable. Full anonymity is challenging with substantive data sets. Less useful for combining internal and external data. Limited security requirements remaining after de-identification. Pseudonymization: EU Regulation still applicable; in some countries the EU Directive is not. Useful for longitudinal data analytics & research. Close to the data source. Use of a Trusted Third Party is an effective solution. No indirect inheritance possible (non-reversible, i.e. one-way hashing). Susceptible to inference attacks (potential re-identification). 22
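The one-way hashing mentioned for pseudonymization can be sketched as follows. This is a minimal example under stated assumptions, not the approach prescribed in the slides: it uses a keyed hash (HMAC-SHA256 from the Python standard library) instead of a plain hash, because an unkeyed hash of a short or guessable identifier can be reversed by simply hashing all candidate values. The function `pseudonymize`, the key and the sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym with a keyed one-way hash."""
    normalized = identifier.strip().lower()  # same person -> same pseudonym
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice held close to the data source or by a
# Trusted Third Party, never stored alongside the pseudonymized data.
key = b"example-key-do-not-use-in-production"

# Made-up records for illustration only.
records = [
    {"customer_id": "NL-000123", "monthly_spend": 42.50},
    {"customer_id": "NL-000123", "monthly_spend": 39.90},
    {"customer_id": "NL-000456", "monthly_spend": 17.25},
]

pseudonymized = [
    {
        "pseudonym": pseudonymize(r["customer_id"], key),
        "monthly_spend": r["monthly_spend"],
    }
    for r in records
]

# The same customer maps to the same pseudonym, which keeps
# longitudinal analysis possible without the original identifier.
print(pseudonymized[0]["pseudonym"] == pseudonymized[1]["pseudonym"])
```

Because the pseudonym is stable, records about one person can still be linked over time, which is also why the result remains personal data and stays open to inference attacks when combined with other attributes.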
Effects of upcoming EU Data Protection Regulation on Big Data US-based data warehouse with EU citizens within scope Privacy Impact Assessment mandatory Data minimization & profiling only achievable with anonymization / pseudonymization Separate & explicit consent for (big) data analysis may be required (incl. parental consent for minors) Extensive data breach notification required Data portability easier to provide Right to be erased (for all registrations, incl. transferred data) Privacy by Design required Penalties can be prohibitive 25
Take Aways Big Data projects are better safeguarded by following 5 keys: 1. Accountability / Data Governance 2. Privacy Impact Assessment 3. Anonymization / Pseudonymization 4. Organizational, contractual, technical controls (entire data lifecycle!) 5. Address public opinion & obtain consent / opt-in 26
Interested in discussion? Big Data is too important to leave to (data) nerds Privacy is too important to leave (solely) to lawyers 27
Thank you koorn.ronald@kpmg.nl KPMG IT Advisory P.O. Box 43004 3540 AA Utrecht The Netherlands Tel. +31 (0)30 658 2150