Practical De-identification Methods. Khaled El Emam, Privacy Analytics Inc.
Definition of De-identified Data
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.
Direct Identifiers
• Fields that would uniquely identify individuals in a database
• Name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, SSN, SIN, implanted device number
Quasi-Identifiers
• Sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, aboriginal identity, total years of schooling, marital status, criminal history, total income, visible minority status, activity difficulties/reductions, profession, event dates (such as admission, discharge, procedure, death, specimen collection, visit/encounter), codes (such as diagnosis codes, procedure codes, and adverse event codes), country of birth, birth weight, and birth plurality
De-identification Standards
• The HIPAA Privacy Rule specifies two de-identification standards (45 CFR 164.514):
  • Safe Harbor
  • Statistical method (also known as the expert statistician method)
HIPAA Safe Harbor
Safe Harbor Direct Identifiers and Quasi-identifiers
1. Names
2. ZIP Codes (except first three)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
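A minimal Python sketch of how the Safe Harbor list might be applied to a single record: listed identifier fields are dropped, ZIP codes are cut to their first three digits, and dates are reduced to the year. The field names, the `_date` suffix convention, and the function name `safe_harbor_sketch` are all hypothetical. The sketch does not cover free text, photographs, or item 18, and the rule itself carries further conditions (for example on sparsely populated ZIP areas and on ages over 89) that are not handled here.

```python
import re

# Hypothetical field names; a real schema would need its own mapping.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address",
}


def safe_harbor_sketch(record):
    """Return a copy of `record` with listed fields dropped and
    ZIP codes and dates coarsened."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the field entirely
        if field == "zip":
            out[field] = str(value)[:3]  # keep only the first three digits
        elif field.endswith("_date"):
            # Assumes ISO-style "YYYY-MM-DD" strings; keep only the year.
            match = re.match(r"(\d{4})", str(value))
            out[field[: -len("_date")] + "_year"] = match.group(1) if match else None
        else:
            out[field] = value
    return out


example = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip": "90210",
    "admission_date": "2009-03-14",
    "sex": "F",
}
cleaned = safe_harbor_sketch(example)
```

Here `cleaned` keeps only `zip` (as `"902"`), `admission_year` (as `"2009"`), and `sex`.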
Statistical Method (HIPAA)
• A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
  • Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
  • Documents the methods and results of the analysis that justify such determination
Example – CA Hospital Discharges
• Context: data release to a data analytics company that will sign a data use agreement and follows good practices for managing sensitive health information
• There were ~2.1m patients who had ~3m visits
• Risk threshold = 0.2, measured as the average risk across all patients
• Variables:
  • Year of birth
  • Gender
  • Year of admission
  • Days since last visit
  • Length of stay
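The slide does not give the formula behind "average risk". One common definition, assumed here, groups records that share the same quasi-identifier values into equivalence classes, assigns each record a risk of 1 / (size of its class), and averages that over all records. The sketch below uses hypothetical column names modelled on the variables above and treats each row as one patient, which simplifies the multi-visit structure of the actual data.

```python
from collections import Counter


def average_risk(records, quasi_identifiers):
    """Average re-identification risk under the assumed definition:
    each record's risk is 1 / (size of its equivalence class)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)  # equivalence class -> number of records
    total = sum(1.0 / class_sizes[k] for k in keys)
    return total / len(keys)


# Hypothetical column names and toy data.
QUASI_IDENTIFIERS = ["year_of_birth", "gender", "year_of_admission"]
RISK_THRESHOLD = 0.2  # threshold from the example above

records = (
    [{"year_of_birth": 1950, "gender": "F", "year_of_admission": 2008}] * 3
    + [{"year_of_birth": 1972, "gender": "M", "year_of_admission": 2009}] * 2
)

risk = average_risk(records, QUASI_IDENTIFIERS)
acceptable = risk <= RISK_THRESHOLD
```

With three records in one class and two in another, the average is (3 · 1/3 + 2 · 1/2) / 5 = 0.4, which is above the 0.2 threshold, so this toy dataset would need further generalization (for example, coarser birth-year bands) before release.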
More Information
www.privacyanalytics.ca
@PrivacyAnalytic