The De-identification of Personally Identifiable Information
Khaled El Emam (PhD)
www.privacyanalytics.ca | 855.686.4781 | info@privacyanalytics.ca
251 Laurier Avenue W, Suite 200, Ottawa, ON, Canada K1P 5J6
De-identification Works
http://www.plosone.org/article/info%3adoi%2f10.1371%2fjournal.pone.0028071
Anonymization = Risk Management
Direct & Quasi-identifiers
Examples of direct identifiers: name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, VID, license plate number, email address, photograph, biometrics, SSN, SIN, device number, clinical trial record number.
Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates, number of children, high-level diagnoses and procedures.
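The two categories are handled differently: direct identifiers are removed or masked, while quasi-identifiers stay in the data and are managed through risk measurement and transformation. A minimal sketch of that split, assuming records are Python dicts; the field names and the two sets are hypothetical and cover only a few of the examples above.

```python
# Hypothetical field names; the categories follow the examples on the slide.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "mrn", "health_card_number"}
QUASI_IDENTIFIERS = {"sex", "date_of_birth", "postal_code", "language", "marital_status"}

def split_record(record: dict) -> tuple[dict, dict, dict]:
    """Split one record into direct identifiers, quasi-identifiers and other fields."""
    direct = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
    quasi = {k: v for k, v in record.items() if k in QUASI_IDENTIFIERS}
    other = {k: v for k, v in record.items()
             if k not in DIRECT_IDENTIFIERS and k not in QUASI_IDENTIFIERS}
    return direct, quasi, other

record = {"name": "Jane Doe", "mrn": "A123", "sex": "F",
          "date_of_birth": "1980-04-12", "postal_code": "A1B 2C3", "diagnosis": "J45"}
direct, quasi, other = split_record(record)

# Direct identifiers are dropped; quasi-identifiers are kept but still need
# risk measurement and, if necessary, generalization or suppression.
released = {**quasi, **other}
print(sorted(released))
```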
Anonymization Landscape
HIPAA Safe Harbor Method
Safe Harbor direct identifiers and quasi-identifiers:
1. Names
2. ZIP codes (except the first three digits)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full-face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
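A few of the Safe Harbor items can be sketched as record transformations: some fields are removed, ZIP codes are truncated, and dates are reduced to the year. This is a partial sketch, assuming dict records with hypothetical field names; it covers only items 1 to 4, 7 and 8, and the comments flag two conditions of the actual rule that it does not implement.

```python
from datetime import date

# Hypothetical field names, for illustration only.
REMOVED_FIELDS = ("name", "telephone", "ssn", "mrn")

def truncate_zip(zip_code: str) -> str:
    """Keep only the first three digits of a ZIP code."""
    # Not implemented: Safe Harbor also requires the 3-digit prefix to be
    # replaced with 000 when that area contains 20,000 or fewer people.
    return zip_code[:3]

def safe_harbor_sketch(record: dict) -> dict:
    """Apply a few Safe Harbor items to one record (not the full list of 18)."""
    out = dict(record)
    for field in REMOVED_FIELDS:
        out.pop(field, None)  # items 1, 4, 7 and 8: removed outright
    if "zip" in out:
        out["zip"] = truncate_zip(out["zip"])  # item 2
    if "birth_date" in out:
        # Item 3: keep the year only. Not implemented: ages over 89 must
        # also be aggregated into a single "90 or older" category.
        out["birth_year"] = out.pop("birth_date").year
    return out

record = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "02139",
          "birth_date": date(1980, 4, 12), "diagnosis": "J45"}
print(safe_harbor_sketch(record))
```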
Expert Determination (Statistical) Method
A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
I. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
II. Documents the methods and results of the analysis that justify such determination.
Spectrum of Identifiability
[Figure: cell sizes of 1 and 3; two matching indirect identifiers in three cells within a dataset]
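A cell is the group of records that share the same values on the quasi-identifiers, and its size is the number of records in that group; a cell of size 1 is a unique record. A minimal sketch of counting cell sizes, assuming dict records and hypothetical field names, with two quasi-identifiers matching across three records:

```python
from collections import Counter

def cell_sizes(records, quasi_identifiers):
    """Map each combination of quasi-identifier values to its number of records."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

# Hypothetical records: three share the same two quasi-identifier values.
records = [
    {"sex": "F", "birth_year": 1980, "diagnosis": "J45"},
    {"sex": "F", "birth_year": 1980, "diagnosis": "E11"},
    {"sex": "F", "birth_year": 1980, "diagnosis": "I10"},
    {"sex": "M", "birth_year": 1975, "diagnosis": "J45"},
]
sizes = cell_sizes(records, ["sex", "birth_year"])
for combo, size in sizes.items():
    print(combo, size)  # ('F', 1980) has size 3; ('M', 1975) is unique (size 1)
```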
Spectrum of Identifiability
There is a range of operational precedents, based on situational context and mitigating controls.
[Figure: precedent thresholds (values shown: 2, 3, 5, 8, 10, 11, 16, 20) on a scale from little de-identification to significant de-identification]
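One way to compare a dataset against such a precedent, assuming the precedent is a minimum cell size k and risk is measured as the maximum re-identification probability (1 divided by the smallest cell size, so a precedent of k corresponds to a maximum risk of 1/k). The observed cell sizes below are hypothetical.

```python
def smallest_cell(sizes):
    """Smallest cell size in a dataset."""
    return min(sizes)

def max_risk(sizes):
    """Maximum re-identification probability, taken here as 1 / smallest cell size."""
    return 1 / smallest_cell(sizes)

def meets_precedent(sizes, min_cell_size):
    """True when every cell holds at least min_cell_size records."""
    return smallest_cell(sizes) >= min_cell_size

observed = [3, 5, 12, 4]  # hypothetical cell sizes from one dataset
print(f"observed max risk: {max_risk(observed):.3f}")
for k in (3, 5, 11):
    print(f"cell size {k}: max risk {1 / k:.3f}, met: {meets_precedent(observed, k)}")
```

Comparing integer cell sizes directly, rather than comparing floating-point probabilities, avoids rounding surprises at the boundary.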
Managing Re-identification Risk
De-identification Process
1. Set risk threshold: based on the characteristics of the data recipient, the data, and precedents, a quantitative threshold is set. This is an iterative process: the mitigating controls in place can be strengthened to get a more forgiving threshold.
2. Measure risk: based on plausible attacks, appropriate metrics are selected and used to measure the actual re-identification risk of the data.
3. Apply transformations: if the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.
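The three steps form a loop: measure, transform, measure again. A minimal sketch, assuming the threshold is expressed as a minimum cell size k, age is generalized into progressively wider bands, and any remaining small cells are suppressed. The function names, band widths and records are hypothetical, and the dataset is assumed to be non-empty.

```python
from collections import Counter

def cell_sizes(records, qis):
    return Counter(tuple(r[q] for q in qis) for r in records)

def smallest_cell(records, qis):
    return min(cell_sizes(records, qis).values())

def generalize_age(records, width):
    """Generalization: replace each exact age with the start of its age band."""
    return [{**r, "age": (r["age"] // width) * width} for r in records]

def suppress_small_cells(records, qis, k):
    """Suppression: drop records whose cell has fewer than k members."""
    sizes = cell_sizes(records, qis)
    return [r for r in records if sizes[tuple(r[q] for q in qis)] >= k]

def deidentify(records, qis, k, widths=(1, 5, 10)):
    """Threshold k is a minimum cell size (max risk 1/k).
    Measure risk after each generalization level; stop once the threshold is met."""
    candidate = records
    for width in widths:
        candidate = generalize_age(records, width)
        if smallest_cell(candidate, qis) >= k:
            return candidate
    # Generalization alone did not meet the threshold: suppress what remains.
    return suppress_small_cells(candidate, qis, k)

records = [
    {"sex": "F", "age": 23}, {"sex": "F", "age": 24}, {"sex": "F", "age": 21},
    {"sex": "M", "age": 37}, {"sex": "M", "age": 38}, {"sex": "M", "age": 52},
]
released = deidentify(records, ["sex", "age"], k=2)
print(len(released), sorted({(r["sex"], r["age"]) for r in released}))
```

Here 10-year bands still leave the 52-year-old male in a cell of one, so that record is suppressed and five records are released. Trying generalization first and suppressing only the leftovers keeps more records, at the cost of coarser values.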
Automation
Enabling Post-marketing and Public Health Surveillance
Large EMR Vendor
Challenge: wants to anonymize data on 535,595 patients from general practices; the longitudinal data needs to be used for ongoing and on-demand analytics.
Solution: PARAT CORE, with PARAT integrated in the ETL pipeline.
Why Privacy Analytics: de-identified data would allow:
1. Post-marketing surveillance of adverse events
2. Public health surveillance
3. Prescription pattern analysis
4. Health services analysis
Customer profile: EMR vendor with more than 2,664 clinics and 5,850 physicians using the system in family clinics and walk-in clinics. The data set spans more than five years of all clinical, prescription, laboratory, scheduling, and billing data.
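Integrating de-identification into an ETL pipeline means the transformation runs as a stage between extraction and loading, so analysts only ever see transformed rows. PARAT's own interfaces are not described here; the sketch below is generic, and every function and field name in it is hypothetical. It handles only row-level transformations: measuring cell sizes needs a view of the whole extract, so risk measurement would have to run over each full batch before release.

```python
from typing import Callable, Iterable, Iterator

def extract(source: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for reading rows from the EMR database."""
    yield from source

def deidentify_rows(rows: Iterable[dict], drop: set,
                    transforms: dict[str, Callable]) -> Iterator[dict]:
    """Row-level de-identification stage between extract and load."""
    for row in rows:
        out = {k: v for k, v in row.items() if k not in drop}
        for field, fn in transforms.items():
            if field in out:
                out[field] = fn(out[field])
        yield out

def load(rows: Iterable[dict], sink: list) -> None:
    """Stand-in for writing rows to the analytics store."""
    sink.extend(rows)

source = [{"patient_name": "Jane Doe", "mrn": "A123",
           "birth_date": "1980-04-12", "drug": "ibuprofen"}]
warehouse: list = []
load(deidentify_rows(extract(source), drop={"patient_name", "mrn"},
                     transforms={"birth_date": lambda d: d[:4]}), warehouse)
print(warehouse)
```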
GI Protocol
Two-arm protocol: GI events after taking NSAIDs, with and without a PPI
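A simplified descriptive sketch of the two arms, assuming each patient record carries a hypothetical NSAID start date, a PPI flag and a GI event date. The protocol depends on events occurring after NSAID exposure, so any de-identification of event dates has to preserve their relative order. No confounding adjustment is attempted here.

```python
from datetime import date

def gi_event_rates(patients):
    """Share of NSAID users with a GI event after NSAID start, per arm."""
    counts = {"NSAID + PPI": [0, 0], "NSAID only": [0, 0]}  # [with event, total]
    for p in patients:
        start = p.get("nsaid_start")
        if start is None:
            continue  # not an NSAID user: in neither arm
        arm = "NSAID + PPI" if p.get("ppi") else "NSAID only"
        event = p.get("gi_event_date")
        counts[arm][0] += int(event is not None and event > start)
        counts[arm][1] += 1
    return {arm: (hit / n if n else None) for arm, (hit, n) in counts.items()}

# Hypothetical patients; an event before NSAID start does not count.
patients = [
    {"nsaid_start": date(2011, 1, 5), "ppi": True, "gi_event_date": None},
    {"nsaid_start": date(2011, 2, 1), "ppi": True, "gi_event_date": date(2011, 1, 1)},
    {"nsaid_start": date(2011, 1, 9), "ppi": False, "gi_event_date": date(2011, 2, 2)},
    {"nsaid_start": date(2011, 4, 2), "ppi": False, "gi_event_date": date(2011, 1, 1)},
    {"nsaid_start": None, "ppi": False, "gi_event_date": date(2011, 5, 5)},
]
print(gi_event_rates(patients))
```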
Chlamydia Protocol
Females aged 14-24 (inclusive) who were tested for, and who tested positive for, chlamydia in the previous 12 months
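This protocol filters on sex, age and test dates, all of which are quasi-identifiers, so the de-identified data must keep them precise enough to apply the criteria (for example, 10-year age bands would not line up with the 14-24 range). A minimal sketch, assuming hypothetical record fields and approximating "the previous 12 months" as 365 days:

```python
from datetime import date, timedelta

def chlamydia_counts(patients, as_of):
    """Count females aged 14-24 inclusive tested, and testing positive, in the window."""
    window_start = as_of - timedelta(days=365)  # approximation of 12 months
    tested = positive = 0
    for p in patients:
        if p["sex"] != "F" or not 14 <= p["age"] <= 24:
            continue
        tests = [t for t in p["chlamydia_tests"] if window_start <= t["date"] <= as_of]
        if tests:
            tested += 1
            if any(t["positive"] for t in tests):
                positive += 1
    return tested, positive

# Hypothetical patients.
patients = [
    {"sex": "F", "age": 19, "chlamydia_tests": [{"date": date(2012, 3, 1), "positive": True}]},
    {"sex": "F", "age": 22, "chlamydia_tests": [{"date": date(2012, 6, 1), "positive": False}]},
    {"sex": "F", "age": 25, "chlamydia_tests": [{"date": date(2012, 6, 1), "positive": True}]},
    {"sex": "M", "age": 20, "chlamydia_tests": [{"date": date(2012, 6, 1), "positive": True}]},
    {"sex": "F", "age": 17, "chlamydia_tests": [{"date": date(2010, 6, 1), "positive": True}]},
]
print(chlamydia_counts(patients, as_of=date(2012, 12, 31)))
```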
Contact
kelemam@privacyanalytics.ca | @kelemam | www.privacyanalytics.ca