Privacy Committee of South Australia. Privacy and Open Data Guideline. Version 1




Privacy Committee of South Australia
Privacy and Open Data Guideline
Version 1

Executive Officer
Privacy Committee of South Australia
c/o State Records of South Australia
GPO Box 2343
ADELAIDE SA 5001
Phone (08) 8204 8786
privacy@sa.gov.au

Table of Contents

- Introduction
- Information privacy in the South Australian Government
- Information Security Management Framework
- The privacy risks of open data
- Managing the risks of identification in open datasets
- Assessing the risks
- The likelihood of identification
- Determining the potential consequences of identification
- Mitigating risks through de-identification
- Removing identifiers
- Pseudonymisation
- Reducing the precision of the data
- Aggregation
- Tools to assist in de-identification
- Testing de-identification and reassessing the risk
- SA NT DataLink
- Where do I get more information?
- Other relevant documents
- Acknowledgements
- Appendix 1 privacy risk flow chart

With the exception of the Government of South Australia brand and logo, this work is licensed under a Creative Commons Attribution (BY) 3.0 Australia Licence. To attribute this material, cite the Privacy Committee of South Australia, Government of South Australia, 2014.

6 February 2014 Version 1 Page 2 of 12

Introduction

This Guideline aims to assist agencies to understand and address the risks to privacy when considering the public release of government datasets through the Government's Declaration of Open Data policy. The Guideline has been developed to ensure compliance with the South Australian Government's Information Privacy Principles Instruction (IPPI).

The South Australian Government is committed to government data being open by default and has directed that agencies release government data proactively, publish it in accessible formats and make it available online. In making its data open by default, the Government must also maintain high standards of privacy in the data it releases.

Open Data means non-personal corporate data. Personal information of private citizens will not be released through Open Data. Examples of information to be released under the Open Data program include a table of government spending on infrastructure projects or a dataset consisting of geocodes for public facilities. However, other agency data intended for release may not have such a clear-cut distinction between non-personal and personal information and may include de-identified personal information. De-identification of personal information is the removal or obscuring of personal identifiers and personal information so that identification of the individuals who are the subject of the information is no longer possible.

Information privacy in the South Australian Government

Information privacy in the South Australian Government is guided by the IPPI. The IPPI is an Instruction of Cabinet that is issued as Premier and Cabinet Circular No 12. It is the responsibility of the Principal Officer of a public sector agency to ensure that their agency complies with the IPPI. The IPPI sets out ten Information Privacy Principles (IPPs) that guide the way South Australian government agencies collect, store, use and disclose personal information. This Guideline should be read in conjunction with the IPPI.

Under the IPPI, the term personal information means: "Information or an opinion, whether true or not, relating to a natural person or the affairs of a natural person whose identity is apparent, or can reasonably be ascertained, from the information or opinion."

Personal information is, therefore, any information that can be linked to an identifiable living person. This definition of personal information includes sensitive information. It could include information detailing the person's name, address, date of birth, financial or health status, ethnicity, gender, religion, alleged behaviour, licensing details, or a combination of such details. The important question to ask in determining whether information is personal information is whether it can identify a particular individual.

For the purposes of the IPPI a natural person is taken to be a living person. The IPPI does not extend to the information of the deceased. However, agencies should consider very carefully the status of the information of the deceased. In some cases there may be other legal restrictions on the publication of information of identified deceased individuals (eg a confidentiality or secrecy provision in a relevant Act).

Given the IPPI only applies to personal information, and not to data that has been de-identified, care must be taken to determine that the information is properly de-identified and is not reasonably able to be re-identified.

Information Security Management Framework

Privacy classification is an important component of the Government's Information Security Management Framework (ISMF). Agencies should ensure that their information systems maintain standards of information security proportionate to the sensitivity of the information; this includes ensuring appropriate classification of information. Agencies are required to comply with the ISMF and any information security procedures developed by their agency. It is recommended that, once a privacy risk assessment has been undertaken and mitigation techniques have been applied, an appropriate ISMF classification is applied. Agencies can seek further advice regarding information security from their Agency Security Adviser or their IT Security Adviser.

The privacy risks of open data

While there are significant economic, democratic and social benefits to the release of government data, it can pose risks to the privacy of personal information. The primary risk to privacy in the release of government data is the identification of individuals. That is, releasing data that is personal information, or that can easily be made into personal information by linking it with other information.

The harms of identification of an individual in the release of a government dataset can be significant. Such identification could reasonably be anticipated to:

- cause humiliation, embarrassment or anxiety for the individual (for example, from a release of health data it might be concluded that an individual accessed treatment for a sensitive sexual health condition)
- impact on the employment or relationships of individuals
- affect decisions made about an individual or their ability to access services, such as their ability to obtain insurance
- result in financial loss or detriment
- pose a risk to safety, such as identifying a victim of violence or a witness to a crime.

The nature and extent of harm would depend on the type of data released and the extent of any identification of individuals.

There are two key types of identification risk associated with the release of government data: spontaneous recognition of an individual and deliberate recognition.

Spontaneous recognition is the risk that identification is made without any deliberate attempt to identify a person. This can result from the release of a dataset that includes the data of individuals with rare characteristics. The risk of identification is proportionate to the rarity of the characteristic.

Deliberate recognition is the risk associated with a malicious or deliberate attempt to identify a person from the released dataset. This can result from list matching, or matching common characteristics in the released dataset to other publicly available datasets or information. It can also result from targeting a particular individual using a characteristic in the dataset already known by the person attempting to identify them.

Assessing the risks of identification of individuals in the release of government data is one of the necessary steps an agency must take to mitigate those risks to an acceptable level when deciding whether to release data.

Managing the risks of identification in open datasets

Assessing the risks

The first step to managing privacy risks in the release of a public sector dataset is to undertake an initial assessment of the risk of making that data publicly accessible. Assessing this risk will require a detailed consideration of the data to be released. Methods used to assess this risk include:

- determining any specific unique identifying variables, such as name
- cross-tabulation of other variables to determine unique combinations that may enable a person to be identified, such as a combination of age, income and postcode
- acquiring knowledge of other publicly available datasets and information that could be used for list matching.

The level of privacy risk will depend on the likelihood that identification could occur from the release of the data and the consequences of such a release. The level of risk will determine what steps the agency takes to mitigate the privacy risks.

The likelihood of identification

The most obvious factor to consider in the likelihood of identification is the presence of obvious identifying variables in the data, such as a name, date of birth or street address. Even in the absence of such variables the following factors need to be considered:

- Motivation to attempt identification: Consider whether an individual or organisation would receive any tangible benefit (malicious or otherwise) from identification of individuals in the dataset.
- Level of detail disclosed by the data: The more detail included, the more likely identification becomes. Where the dataset contains multiple variables for the same record-subject, identification could be made through the combination of those variables.
- Presence of rare characteristics: If there are rare or remarkable characteristics for a record-subject the chances of identification are increased. For example, a 19-year-old girl who is widowed is likely to be noticeable in the data.
- Presence of other information: Even if the dataset itself does not include any data that would identify an individual it may include variables that can be matched with other information or datasets to identify a person.

Determining the potential consequences of identification

It is important to think about the potential consequences that might arise from the identification of individuals within a dataset. This includes the harm that might be suffered by those individuals identified and the impacts on the agency and government of such a data breach.

Once the level of risk has been determined the agency can consider the appropriate approach to mitigate that risk.

Mitigating risks through de-identification

A number of techniques can be applied to properly de-identify the dataset and mitigate any risks of identification of an individual. Consideration should be given to obtaining the proper approvals to undertake de-identification within the agency.

Removing identifiers

The most basic method of de-identification is to remove obvious identifying variables from the data, such as an individual's name or address. For example, consider the following data:

Name         Address                       Postcode  Age  Gender  Profession  Annual Salary
Barry Johns  10 Smith Street Woodville SA  5011      52   Male    Instructor  $75,000

By removing basic identifiers this can become:

Postcode  Age  Gender  Profession  Annual Salary
5011      52   Male    Instructor  $75,000

While on the face of it this data has been stripped of its identifiers, it retains a relatively high potential for re-identification: the data still exists on an individual level and other, potentially identifying, information has been retained. For example, some South Australian postcodes have very small populations, and combining this data with other publicly available information can make re-identification a relatively easy task.

While it may be tempting for agencies to strip out all potentially identifying information, doing so could render the data meaningless. The fact that somewhere in Australia there is a driving instructor who earns $75,000 may have limited potential use.
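The residual risk described above can be checked mechanically. The following Python sketch is illustrative only and is not a method prescribed by this Guideline: it removes direct identifiers and then lists the combinations of remaining variables that are held by exactly one record. The column names, the function names and the second and third records are assumptions made for the example.

```python
from collections import Counter

# Assumed column names for direct identifiers; adjust to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "address"}

# Illustrative records: the first follows the example above, the others are made up.
RECORDS = [
    {"name": "Barry Johns", "address": "10 Smith Street Woodville SA",
     "postcode": "5011", "age": 52, "gender": "Male",
     "profession": "Instructor", "salary": 75000},
    {"name": "Person B", "address": "1 Example Road",
     "postcode": "5000", "age": 34, "gender": "Female",
     "profession": "Instructor", "salary": 61000},
    {"name": "Person C", "address": "2 Example Road",
     "postcode": "5000", "age": 34, "gender": "Female",
     "profession": "Instructor", "salary": 64000},
]


def remove_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the direct identifier columns."""
    return [{k: v for k, v in rec.items() if k not in identifiers}
            for rec in records]


def unique_combinations(records, variables):
    """Return each combination of the given variables held by exactly one record."""
    counts = Counter(tuple(rec[v] for v in variables) for rec in records)
    return [combo for combo, n in counts.items() if n == 1]


stripped = remove_identifiers(RECORDS)
at_risk = unique_combinations(stripped, ["postcode", "age", "gender", "profession"])
print(at_risk)
```

In this made-up sample only the first record has a unique combination of postcode, age, gender and profession, so it is the one most exposed to list matching even though the name and address are gone.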

Pseudonymisation

A related method of de-identification is pseudonymisation, which involves consistently replacing recognisable identifiers with artificially generated identifiers, such as a coded reference or pseudonym. In the example above, Barry Johns would be assigned a randomly generated reference:

Individual reference  Postcode  Age  Gender  Profession  Annual Salary
SR23597               5011      52   Male    Instructor  $75,000

Pseudonymisation allows different information about an individual, often held in different datasets, to be correlated without the consequence of direct identification of the individual. For example, the information above could be correlated with:

Individual reference  Marital status  Number of children  Highest level of education attained  Number of cars owned by household
SR23597               Divorced        2                   Diploma                              3

However, pseudonymisation also has a relatively high potential for re-identification, as the data exists on an individual level with other potentially identifying information being retained. Also, because pseudonymisation is generally used when an individual is tracked over more than one dataset, if re-identification does occur more personal information will be revealed concerning the individual.

Reducing the precision of the data

Rendering personally identifiable information less precise can make the possibility of re-identification more remote. Dates of birth or ages can be replaced by age groups; specific salaries can be replaced by salary ranges. For example, Barry Johns' data now becomes:

Individual reference  Postcode  Age range  Gender  Profession  Annual Salary range
SR23597               5011      50-60      Male    Instructor  $60,000 - $80,000

Related techniques include suppression of cells with low values or conducting statistical analysis to determine whether particular values can be correlated to individuals. In such cases it may be necessary to apply the frequency rule by setting a threshold for the minimum number of units contributing to any cell. Common threshold values are 3, 5 and 10.
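As a rough illustration of the techniques described above, the Python sketch below pseudonymises a name field, bands numeric values into ranges and applies the frequency rule to a table of cell counts. It is a minimal sketch under stated assumptions, not a prescribed implementation: the function names, field names and the hexadecimal reference format are hypothetical, and the mapping from names to references would itself need to be protected or destroyed.

```python
import secrets


def pseudonymise(records, id_field="name", ref_field="reference"):
    """Replace id_field with a random reference, reusing the same reference
    whenever the same identifier appears again."""
    mapping = {}   # identifier -> reference; sensitive, keep it out of the release
    used = set()
    result = []
    for rec in records:
        ident = rec[id_field]
        if ident not in mapping:
            ref = "SR" + secrets.token_hex(4)
            while ref in used:   # very unlikely, but references must stay unique
                ref = "SR" + secrets.token_hex(4)
            used.add(ref)
            mapping[ident] = ref
        out = {k: v for k, v in rec.items() if k != id_field}
        out[ref_field] = mapping[ident]
        result.append(out)
    return result, mapping


def to_band(value, width):
    """Map a number to the band of the given width that contains it."""
    low = (value // width) * width
    return f"{low}-{low + width}"


def apply_frequency_rule(cell_counts, threshold=3):
    """Suppress (set to None) any cell with fewer than `threshold` contributing units."""
    return {cell: (n if n >= threshold else None)
            for cell, n in cell_counts.items()}
```

With these definitions, `to_band(52, 10)` gives the age range used above and `to_band(75000, 20000)` gives the corresponding salary range (without currency formatting). Suppressing a cell is only one option; merging it into a wider range is the alternative the frequency rule allows.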
For example, applying a threshold value of 3 to the following table, the cell indicating the number of driving instructors aged 35-40 has a value less than 3 and may be suppressed or aggregated into a bigger range.

Age    Postcode  Number of Instructors  Average Annual Salary
25-30  5011      20                     $50,000
35-40  5011      2                      $60,000
45-50  5011      10                     $65,000

More advanced techniques include introducing random values or adding noise. They may also include altering the underlying data in a small way so that original values cannot be known with certainty but the aggregate results are unaffected.

Aggregation

Individual data can be combined to provide information about groups or populations. The larger the group and the less specific the data is about them, the less potential there will be for identifying an individual within the group. An example of aggregated data would be:

Initial data:

Profession  State            Annual Salary  Number of drivers
Instructor  South Australia  $49,500        200
Instructor  South Australia  $40,000        10,000
Instructor  South Australia  $45,000        2,000
Instructor  South Australia  $56,000        3,748
Instructor  South Australia  $58,000        11,414
Instructor  South Australia  $66,000        31,203

Aggregated data:

Profession  State            Annual Salary  Number of drivers
Instructor  South Australia  <$50,000       12,200
Instructor  South Australia  >$55,000       46,365

Tools to assist in de-identification

A range of tools and software packages are available to assist in the task of de-identifying datasets. These tools provide an automated method of applying a particular de-identification
method and may assist an agency to determine with more precision the success of the de-identification method applied and the privacy risk of public release of the dataset.

Some examples of these tools are listed below. The Privacy Committee does not endorse the use of any particular tool and provides these examples for information only. Agencies should conduct their own research to determine any tools suited to their de-identification task. It is noted that these tools have been developed in other jurisdictions and, therefore, some of their functionality may not be applicable to the Australian data environment.

- Mu-ARGUS, Statistics Netherlands
- Privacy Analytics Risk Assessment Tool
- SUDA, University of Manchester
- Cornell Anonymisation Toolbox
- University of Texas Anonymisation Toolbox

Testing de-identification and reassessing the risk

It is good privacy practice to test the methods that the agency has employed to mitigate the privacy risks of publishing the dataset. Primarily this will involve attempting to re-identify individuals from the de-identified dataset. This type of testing is sometimes referred to as penetration testing.

In testing the de-identification method by attempting to re-identify the dataset, consideration should be given to all the factors considered in the initial assessment of identification risk, including the:

- presence of unique or clearly identifying variables
- presence of rare characteristics
- cross-tabulation of variables to identify unique or rare combinations of variables for the same data subject
- availability of other data that could be linked with the dataset and lead to identification.

The test should meet the following criteria:

- The test should attempt to identify particular individuals and one or more private attributes relating to those individuals.
- The test may employ any method which is reasonably likely to be used by a motivated intruder, that is, a person motivated to find out identified information.
- The test should use any lawfully obtainable data source which is reasonably likely to be used to identify particular individuals in the datasets.

The agency should consider whether it is necessary to engage any specialist knowledge or expertise to properly test its methods of de-identification. Testing would need to be conducted by trusted parties in secure environments to avoid any inadvertent disclosure of personal information.

The agency should reassess the risk of identification once it has tested the vulnerability of its dataset. If the risk is now at an acceptable level and a person could not be identified from the dataset then the agency can publish the dataset. If the risk of identification remains high the agency would have to consider whether it can employ further methods of de-identification to mitigate this risk. If the risk of identification is not able to be mitigated to an acceptable level, the dataset should not be released. See Appendix 1 for a privacy risk flow chart.

SA NT DataLink

It may not be possible to eliminate all the risks, and agencies must consider whether the data is suitable for release through an Open Data scheme. Another way of allowing secure access to the information is through the use of on-site data laboratories. One such data laboratory is SA NT DataLink. The Privacy Committee recommends that agencies only use data laboratories where that data is available at the custodian level or, where this is not possible, the privacy and ongoing security needs of the information are assured. Agencies should undertake a Privacy Impact Assessment where this is the case.

SA NT DataLink is an unincorporated joint venture comprising South Australian and Northern Territory Government and non-government organisations. It enables the linkage of administrative and clinical datasets to allow population-level health, social, education and economic research and evidence-based policy development to be undertaken with de-identified data, minimising risks to individual privacy when compared to traditional sample-based research using identified data.

Data linkage through SA NT DataLink is supported by the Privacy Committee through the granting of a number of exemptions. The exemptions allow State Government agencies to disclose limited identifying variables, such as name, date of birth and address, to SA NT DataLink for inclusion in its Master Linkage File to enable the creation of links between multiple government datasets. The exemptions are subject to strict conditions on the governance of data.
For further information about SA NT DataLink please see https://www.santdatalink.org.au/animation. Alternatively, SA NT DataLink may be contacted by telephone on 8302 1604 or by email at santdatalink@unisa.edu.au.

Where do I get more information?

This Guideline has been issued by the Privacy Committee of South Australia. The Committee exists to:

- advise on measures that should be taken to protect personal information
- refer written complaints received about breaches of privacy to the relevant authority
- consider agency requests for exemption from compliance with the IPPI.

Further information about the IPPI is available at www.archives.sa.gov.au/privacy.

Other relevant documents

- Short Guide to the Information Privacy Principles

Acknowledgements

In developing this guidance the Privacy Committee has utilised the significant work of other Australian and international privacy authorities on the issue of online privacy. This includes the Office of the Queensland Information Commissioner, Dataset Publication and De-identification Techniques.

Appendix 1 privacy risk flow chart

Does the dataset contain personal information?

- No, the data does not relate to individuals: Data can be published.
- Yes, individuals can be identified: It is likely the IPPI prevents you from disclosing the data. You will need to consider whether the data can be de-identified.
- Not clear? Is it reasonably likely someone can be identified from the data? What other data is available that could be linked to your dataset?

Assess the identification risk:

1. Undertake an initial assessment of the privacy risk that considers:
   - specific unique identifying variables, such as name
   - cross-tabulation of other variables to determine unique combinations, such as age, income and postcode
   - availability of other publicly available datasets and information that could be used for list matching
   - the likelihood of an individual being identified from the data
   - the consequences of such identification.
2. Consider what methods can be employed to reduce the risk, such as removing identifiers, pseudonymisation, reducing the precision of the data, and aggregation.
3. Test the selected method of de-identification on the dataset.
4. Re-assess the privacy risk.

Can individuals be identified from this data?

- Yes: Consider making further adjustments to the data and reassess the risk. If it is not possible to mitigate the risk to an acceptable level do not publish the data.
- No: Data can be published. Regularly review the published dataset for privacy risks.
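Step 3 of the flow chart, testing the de-identification, can be approximated with a simple list-matching exercise. The Python sketch below is one possible way to do this and is not drawn from the Guideline itself: it links a de-identified dataset to another lawfully obtainable dataset on shared variables and reports the records that match exactly one person. The function names and field names are hypothetical, and a real test would be run by trusted parties in a secure environment.

```python
from collections import defaultdict


def linkage_test(released, auxiliary, keys):
    """Return (released_record, auxiliary_record) pairs where the released
    record matches exactly one auxiliary record on all of `keys`."""
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person)
    matches = []
    for rec in released:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:   # a single candidate means likely re-identification
            matches.append((rec, candidates[0]))
    return matches


def reidentification_rate(released, auxiliary, keys):
    """Share of released records that the linkage test pins to a single person."""
    if not released:
        return 0.0
    return len(linkage_test(released, auxiliary, keys)) / len(released)
```

A rate above zero does not by itself decide the outcome at step 4; it indicates which records need further adjustment before the risk is reassessed.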