De-Identification of Clinical Data
Sepideh Khosravifar, CISSP
Info Security Analyst IV
TEPR Conference 2008
Ft. Lauderdale, Florida
May 17-21, 2008
Background
One of the major challenges facing Medical Informatics is creating data sets for research and testing that maintain patient confidentiality. De-identification is a required element of information integration, reducing the risks of unauthorized disclosure.
Anonymization
Anonymization is the process that removes the association between a data set and the data subject. It can be done in the following ways:
(1) Removing or transforming identifying characteristics in the data set so that the association is not unique and relates to more than one data subject.
(2) Increasing the population in the data subjects set so that the association between the data set and the data subject is not unique.
Source: ISO/IEC DTS 25237
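Approach (1) can be illustrated with a minimal Python sketch. The field names `zip` and `age`, the sample records, and the choice to truncate ZIP codes and bucket ages by decade are all assumptions made for illustration; they are not taken from ISO/IEC DTS 25237. The sketch coarsens the identifying characteristics and then measures the size of the smallest group of records that share the same values, which should be greater than one if no association is unique.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: keep 3 ZIP digits, bucket age by decade."""
    decade = (record["age"] // 10) * 10
    return (record["zip"][:3], f"{decade}-{decade + 9}")

def smallest_group(records, key=lambda r: (r["zip"], r["age"])):
    """Return the size of the smallest group of records sharing one key."""
    counts = Counter(key(r) for r in records)
    return min(counts.values())

# Hypothetical sample data, fabricated for this example.
records = [
    {"zip": "33301", "age": 34},
    {"zip": "33304", "age": 37},
    {"zip": "33316", "age": 31},
    {"zip": "33312", "age": 39},
]

before = smallest_group(records)                  # raw values
after = smallest_group(records, key=generalize)   # generalized values
print("smallest group before:", before, "after:", after)
```

In this sample every raw record is unique, while after generalization all four records share the same values, so no single record relates to only one data subject.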
Pseudonymization
Pseudonymization is a particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms. It provides a means for information to be linked to the same person across multiple data records without revealing the identity of the person as a data subject.
Source: ISO/IEC DTS 25237
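One common way to obtain a stable pseudonym is a keyed hash of the patient identifier. The sketch below assumes HMAC-SHA256 from the Python standard library; the key, the `PSN-` prefix, the field names, and the sample records are hypothetical, and the specification cited above does not prescribe this particular technique. It shows the linking property: two records for the same person receive the same pseudonym, and the original identifier is no longer present.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be held only by the
# pseudonymization service and never stored with the data.
SECRET_KEY = b"example-key-for-illustration-only"

def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier with HMAC-SHA256."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PSN-" + digest[:16]

# Fabricated records for this example.
records = [
    {"patient_id": "MRN-001", "test": "HbA1c"},
    {"patient_id": "MRN-002", "test": "Lipid panel"},
    {"patient_id": "MRN-001", "test": "CBC"},
]

# Drop the identifier and keep only the pseudonym plus the clinical data.
pseudonymized = [
    {"pseudonym": pseudonym(r["patient_id"]), "test": r["test"]}
    for r in records
]

for row in pseudonymized:
    print(row)
```

The first and third rows carry the same pseudonym, so they can still be linked for research without exposing the medical record number.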
Re-identification
Pseudonymization through a trusted third party can support re-identification where the implementation requires it, for example to support case investigation and other public health event detection and management. Reasons for re-identification that should be considered include:
  Verification and validation of data integrity
  Checking for suspected duplicate records
  Enabling requests for additional data
  Linking to supplement research information variables
  Compliance audits
  Informing data subjects or their care providers of significant findings
  Facilitating follow-up research
  Law enforcement
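The trusted-third-party arrangement can be sketched as a small service that alone holds the mapping between identities and pseudonyms. Everything in this example is hypothetical: the class name `TrustedThirdParty`, the random-token pseudonyms, the `ALLOWED_REASONS` subset, and the audit log are illustrative assumptions, not part of any cited standard. The point is only that re-identification is possible, but only through the party holding the mapping and only for a recognized reason.

```python
import secrets

# Hypothetical subset of the reasons listed above.
ALLOWED_REASONS = {
    "data integrity check",
    "duplicate check",
    "compliance audit",
    "follow-up research",
}

class TrustedThirdParty:
    """Holds the only copy of the identity <-> pseudonym mapping."""

    def __init__(self):
        self._by_identity = {}
        self._by_pseudonym = {}
        self.audit_log = []

    def pseudonymize(self, identity: str) -> str:
        """Return the existing pseudonym for an identity, or issue a new one."""
        if identity not in self._by_identity:
            token = secrets.token_hex(8)  # random, not derived from identity
            self._by_identity[identity] = token
            self._by_pseudonym[token] = identity
        return self._by_identity[identity]

    def reidentify(self, token: str, reason: str) -> str:
        """Reverse a pseudonym, but only for an allowed, logged reason."""
        if reason not in ALLOWED_REASONS:
            raise PermissionError(f"re-identification not permitted for: {reason}")
        self.audit_log.append((token, reason))
        return self._by_pseudonym[token]

ttp = TrustedThirdParty()
token = ttp.pseudonymize("MRN-001")
print(ttp.reidentify(token, "compliance audit"))
```

Researchers would receive only the token; a request to reverse it goes back to the third party, which records why it was honored.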
Issues Requiring Consideration
Frequency and types of errors in the de-identification method. De-identification tools are subject to at least two types of errors:
(a) Undermarking: failure to remove information that constitutes one of the 18 HIPAA Safe Harbor data elements.
(b) Overmarking: removal of more information than is required, rendering records less useful and informative.
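Both error types can be counted by comparing what a tool marked against a manually annotated reference. The sketch below is a hypothetical evaluation helper, not a method from the presentation: `gold` is assumed to be the set of token positions a human reviewer flagged as identifiers, and `marked` the set the tool removed. Recall falls as undermarking rises, and precision falls as overmarking rises.

```python
def marking_errors(gold: set, marked: set) -> dict:
    """Compare tool output with a reference annotation of identifier tokens."""
    correct = gold & marked
    return {
        "undermarked": sorted(gold - marked),   # identifiers the tool missed
        "overmarked": sorted(marked - gold),    # non-identifiers removed
        "recall": len(correct) / len(gold) if gold else 1.0,
        "precision": len(correct) / len(marked) if marked else 1.0,
    }

# Fabricated sentence for this example.
tokens = ["Mr.", "Smith", "seen", "on", "03/14/2008", "for", "asthma"]
gold = {1, 4}     # "Smith" and the date are identifiers
marked = {1, 6}   # the tool removed "Smith" and, wrongly, "asthma"

result = marking_errors(gold, marked)
print("missed:", [tokens[i] for i in result["undermarked"]])
print("removed unnecessarily:", [tokens[i] for i in result["overmarked"]])
```

Here the missed date is an undermarking error that leaves the record identifiable, while removing "asthma" is an overmarking error that strips clinically useful content.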
NHIN Anonymization Guidelines
HIPAA de-identification [45 CFR 164.514(b)(2)(i)], each followed by the corresponding Anonymization Guidelines:
(A) Names;
    3) Replace patient, contact, next of kin, provider, technician and any other person name data with fabricated data.
    10) Replace all employer, practice, laboratory, etc. names with fabricated names.
(B) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code
    6) Replace all geographic location data (patient, provider, etc.) smaller than a state with fabricated data, including street address, city, county, precinct and zip code.
NHIN Anonymization Guidelines
(C) All elements of dates (except year) for dates directly related to an individual, including admission date, discharge date, date of death; birth date, all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
    2) Replace all registration data columns with fabricated data (except for: gender_code, date_of_birth)
    a) Offset all result dates by a random number of days (between 1 and 90) into the past.
    a) Offset date_of_birth by a random number of days between 1 and 90 into the past.
(D) Telephone numbers;
    5) Replace all telephone and fax numbers with a fabricated number, for example 222-555-1111. Use the fictitious exchange code 555 in all cases.
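The date-offset and telephone guidelines above can be sketched in a few lines of Python. The function names and the use of a seeded `random.Random` instance are assumptions for illustration. The guidelines as quoted do not say whether one offset is shared by all dates of a patient, so this sketch draws an independent offset on each call, and it does not address the aggregation of ages over 89.

```python
import random
from datetime import date, timedelta

def offset_into_past(d: date, rng: random.Random) -> date:
    """Shift a date 1 to 90 days (inclusive) into the past."""
    return d - timedelta(days=rng.randint(1, 90))

def fabricated_phone(rng: random.Random) -> str:
    """Build a fabricated number that always uses the 555 exchange code."""
    area = rng.randint(200, 999)
    line = rng.randint(0, 9999)
    return f"{area}-555-{line:04d}"

rng = random.Random(2008)  # seeded only to make the example repeatable

# Fabricated values for this example.
result_date = date(2008, 5, 17)
shifted = offset_into_past(result_date, rng)
phone = fabricated_phone(rng)

print(result_date, "->", shifted)
print("phone:", phone)
```

If intervals between a patient's events matter for the research use, applying one offset per patient rather than per date would preserve them; that is a design choice the quoted guidelines leave open.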
NHIN Anonymization Guidelines
(E) Fax numbers;
    5) Replace all telephone and fax numbers with a fabricated number, for example 222-555-1111. Use the fictitious exchange code 555 in all cases.
(F) Electronic mail addresses;
    7) Replace all email addresses, URLs and IP addresses with fabricated data.
(G) Social security numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
NHIN Anonymization Guidelines
(H) Medical record numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
(I) Health plan beneficiary numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
(J) Account numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
NHIN Anonymization Guidelines
(K) Certificate/license numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
(L) Vehicle identifiers and serial numbers, including license plate numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
(M) Device identifiers and serial numbers;
    4) Replace all Social Security Numbers, order numbers, account numbers, patient ID numbers, Medicare/Medicaid numbers, certificate or licensing numbers, etc. with fabricated numbers.
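Guideline 4 asks for fabricated numbers in place of real identifiers. One way to do that while keeping downstream systems working is to preserve the layout of each value. The helper `fabricate_like` below is a hypothetical sketch of that idea: digits become random digits, letters become random uppercase letters, and separators are kept. It does not guarantee that the output differs from the input or that it is not a real number belonging to someone else, so a production process would need additional checks.

```python
import random
import string

def fabricate_like(value: str, rng: random.Random) -> str:
    """Replace each digit or letter with a random one, keeping separators."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # keep hyphens, spaces and other punctuation
    return "".join(out)

rng = random.Random(4)  # seeded only to make the example repeatable

# Fabricated input values for this example.
for original in ["123-45-6789", "MRN-0042871", "ACCT 99-1733"]:
    print(original, "->", fabricate_like(original, rng))
```

Because the layout is preserved, field-length and format validation in test systems continues to pass on the fabricated values.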
NHIN Anonymization Guidelines
(N) Web Universal Resource Locators (URLs);
    7) Replace all email addresses, URLs and IP addresses with fabricated data.
(O) Internet Protocol (IP) address numbers;
    7) Replace all email addresses, URLs and IP addresses with fabricated data.
(P) Biometric identifiers, including finger and voice prints;
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.
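For guideline 7, fabricated network identifiers are safest when they cannot collide with real ones. The sketch below assumes the reserved example domains (`example.com`, `example.org`) and the IPv4 documentation range 192.0.2.0/24, which are set aside for exactly this kind of use; the guidelines as quoted do not name them, and the function names are hypothetical.

```python
import random

def fabricated_email(rng: random.Random) -> str:
    """Fabricated address on a domain reserved for examples."""
    return f"user{rng.randint(1000, 9999)}@example.com"

def fabricated_url(rng: random.Random) -> str:
    """Fabricated URL on a domain reserved for examples."""
    return f"https://www.example.org/page{rng.randint(1000, 9999)}"

def fabricated_ipv4(rng: random.Random) -> str:
    """Fabricated address inside the 192.0.2.0/24 documentation range."""
    return f"192.0.2.{rng.randint(1, 254)}"

rng = random.Random(7)  # seeded only to make the example repeatable

email = fabricated_email(rng)
url = fabricated_url(rng)
ip = fabricated_ipv4(rng)
print(email, url, ip)
```

Using reserved names and ranges means a test message sent by mistake cannot reach a real mailbox or host.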
NHIN Anonymization Guidelines
(Q) Full face photographic images and any comparable images; and
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.
(R) Any other unique identifying number, characteristic, or code
    2a) Retain gender_code.
    11) Replace any other data that can be considered part of the HIPAA 18 individual identifiers.
HITSP Pseudonymize Transaction: Patient Pseudo Identifying Information
Person Identifier Cross-Reference (PIX) Manager Query
Patient Identity Feed
Standards
Health Insurance Portability and Accountability Act (HIPAA)
Health Level Seven (HL7)
Integrating the Healthcare Enterprise (IHE) IT Infrastructure Technical Framework (ITI-TF)
International Organization for Standardization (ISO) Health Informatics - Pseudonymization, Technical Specification # 25237
Summary
Data de-identification systems can help accomplish organizations' goals of improving quality of care, promoting research, and protecting privacy. However, producing anonymous data that remains specific enough to be useful is often a very difficult task. Although new technology offers some good choices, technical solutions alone remain inadequate. Technology must work with policy for the most effective solutions.
References
ISO/IEC DTS 25237, Pseudonymization Practices for the Protection of Personal Health Information and Health Related Services
HITSP Pseudonymize Transaction Ready for Implementation V2.1
Nationwide Health Information Network (NHIN)
Contact Information
Sepideh Khosravifar, CISSP
For Department of Veterans Affairs
SAIC - Information Security Analyst IV
khosravifars@saic.com
858-826-5447 office
Questions?