6.3 Genomics and chronic inflammation
Pseudonymisation of Patient Data
D. Voets
Pseudonymisation: “Basic Privacy Protection”
Pseudonymisation is a powerful and secure Privacy Enhancing Technique (PET) reconciling the two following conflicting requirements:
• The adequate protection of individuals or organizations with respect to their identity and privacy
• The possibility of linking data associated with pseudo-IDs irrespective of the collection time (cf. longitudinal studies) and collection place (cf. multi-center studies)
Simplified:
• Pseudonymisation translates a given identifier into a pseudo-identifier by using secure, dynamic and (preferably ir-)reversible cryptographic techniques
Note that “pseudonymisation” and “anonymisation” terminology is not universal
Pseudonymisation: Use
• Pseudonymisation can be used…
  • in all applications in which sensitive data is processed while the true identity of the data subjects is not strictly necessary
  • Typically there is a clear separation between a nominative data realm and an anonymous realm
    Reference application: research databases composed for statistical purposes and data mining
  • Also for privacy protection during sub-processes in a data processing chain
    e.g. lab tests in a hospital, outsourced administration
• … but should not be reduced to a simple translation of identifiers
  • A careful Privacy Risk Assessment is needed to define the data protection requirements and policies
  • Privacy protection can include other PETs
Pseudonymisation Concept
A simple example, a drug-related study:
• At several data sources, medical records of treatment are collected
• At certain intervals, collected records are gathered (all over the country)
Example record:
Name: John Doe
DOB: July 5, 1973
POB (ZIP): 7951
Address: ……
Tel. nr.: ……
Start of Treatment: 14/10/2003
Date of Treatment: 24/10/2003
Medication: …
Dosage: …
Blood pressure: …
Cholesterol Level: …
…
Field categories marked on the slide:
• Used for processing into a Pseudo-ID
• Unnecessary (for the study) identifying data
• Useful research data
Pseudonymisation Concept
Source record:
Name: John Doe
DOB: July 5, 1973
POB (ZIP): 7951
Address: ……
Tel. nr.: ……
Start of Treatment: 14/10/2003
Date of Treatment: 24/10/2003
Health condition measurements:
Medication: CureAll
Dosage: 10cc
Blood pressure: …
Cholesterol Level: …
….
Identifier Calculation: JFH6UHRJ4MZAQQ9
Privacy Protecting Extraction (based on a privacy risk assessment):
Relative DoT: day 10
Medication: CureAll
Dosage: 10cc
Blood pressure: …
Cholesterol Level: …
….
Pseudonymisation Concept
At the end, the original record is listed in the research database as:
Patient: JFH6UHRJ4MZAQQ9
Relative DoT: day 10
Medication: CureAll
Dosage: 10cc
Blood pressure: …
Cholesterol Level: …
….
The clinical data of “John Doe” is now gathered for research, but his privacy is protected:
• Researchers only know the medical treatment profile of a patient “JFH6UHRJ4MZAQQ9”, which is enough information for their job
• John does not have to worry that his participation in the study leaks to his bank, which is handling his application for a loan
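The extraction step in the John Doe example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `IDENTIFYING_FIELDS` set and the `extract_research_record` helper are hypothetical, and the pseudo-ID is passed in as a given value rather than calculated. Which fields to drop would in practice follow from the privacy risk assessment.

```python
from datetime import date

# Hypothetical field names; the set of identifying fields is an assumption
# of this sketch, not the outcome of a real privacy risk assessment.
IDENTIFYING_FIELDS = {"name", "dob", "pob_zip", "address", "tel"}
ABSOLUTE_DATE_FIELDS = {"start_of_treatment", "date_of_treatment"}


def extract_research_record(record: dict, pseudo_id: str) -> dict:
    """Return a research record without identifying fields or calendar dates."""
    research = {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS and key not in ABSOLUTE_DATE_FIELDS
    }
    research["patient"] = pseudo_id
    # Replace the calendar dates by an offset from the start of treatment.
    delta = record["date_of_treatment"] - record["start_of_treatment"]
    research["relative_dot_day"] = delta.days
    return research


record = {
    "name": "John Doe",
    "dob": date(1973, 7, 5),
    "pob_zip": "7951",
    "start_of_treatment": date(2003, 10, 14),
    "date_of_treatment": date(2003, 10, 24),
    "medication": "CureAll",
    "dosage": "10cc",
}
research_record = extract_research_record(record, "JFH6UHRJ4MZAQQ9")
print(research_record)
```

With the dates from the example (14/10/2003 and 24/10/2003) the relative date of treatment comes out as day 10, and no calendar date or name reaches the research database.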
Pseudonymisation: Implementation
Diagram labels: Sources, Trusted Third Party, Data Collection Site
• Pseudonymisation systems:
  • Batch Data oriented for data collection
    Used for the collection of medical data from a large number of distributed sources, each having a local database management system (added value: data collection system)
  • Interactive Pseudonymisation
    “on-the-fly” translation of Pseudo-IDs to Nominative IDs (and back again), effectively splitting up the world into a “nominative” and “anonymous” data realm
    • Uses Web technology
    • Extremely transparent and flexible
Pseudonymisation: Need for a TTP
Why a ‘Trusted Third Party’?
• Best practice in Privacy Protection (in most cases the only way to ensure a correct, secure implementation)
• When one communicating party does not trust the other (a TTP is an independent party trusted by both data source and data collector)
• Certification: avoids being “self-certified” (the TTP certifies operating procedures)
• Expertise about regulations, technological implementation and policies is concentrated at the trust service provider
• Additional privacy measures (PETs) and features possible, e.g.:
  • Controlled reversibility
  • Segmentation of data-streams
  • Controlled database perturbation
  • …
Nominative Browsing of an Anonymous Database
Diagram labels: Sources, Custodix Privacy Protection Server, Data Collection Site
• Irreversible Pseudonym Generation
  Name: John Doe; DOB: July 5, 1973; POB (ZIP): 7951; Address: ……; Tel. nr.: ……
  Pseudo-ID: AQWJFK68
• Secure Vault: LSJCN4575CNGJ82384C1N33AQ1038XMDIK2D
  Encrypted Storage “Secure Vaults”
• Research records:
  Day 1: Medication: …, Dosage: …, Blood pressure: …, …
  Day 2: Medication: …, Dosage: …, Blood pressure: …, …
Singapore Epidemiology Project
The Singapore National Disease Registries System (NDRS)
The objectives are:
• To establish registry processes for the following five registries: cancer, cardiac, stroke, renal, myopia
• To provide an integrated system to ensure the quality of the data collected and the security and confidentiality of patient records
• To provide a flexible long-term solution
Initiative by the Singapore Health Promotion Board (HPB), at national level
Partners:
• National Disease Registry Organisation (NDRO) (Data Provider)
• NCS (Data Management)
• Custodix (Data Security and confidentiality)
Providing a Privacy Protection Toolbox
Periodontitis Research Data
• Types of data:
  • Clinical and genetic data from clinicians
  • Oral microbiology data from diagnostic laboratories
  • Environmental data (e.g. smoking, stress, …)
  • Administrative data
  • X-Ray images
• Primary focus on identifying administrative data:
  • name, date of birth
  • Possibly race (for outliers)
• Limit impact on usability of data for researchers
Why pseudonymise?
• Strictly speaking, researchers don’t need to see administrative data:
  • Considered good practice
  • Security through Privacy
  • Data ready for export to other studies
  • Students are not allowed to see personal information
Identification needed only under specific conditions & by specific persons:
• When a patient requests access to his or her data (right enforced by law)
• When a patient needs to be notified about a certain condition
• Data validation (correcting input errors)
Offers clear advantages:
• Stable ID for longitudinal follow-up
• Secure and safe way to link records related to the same patient
• Not all persons operating on the DB need to be bound by legal agreements
Transformations
• Pseudo-ID generated from multiple identifying data fields
  • Must be stable in time (e.g. address is a bad idea)
  • Taking into account the input space (dictionary attacks!)
  • Decreasing error probability (misspelled names, …)
• “Micro vaults” containing “as-is” administrative data
  • Based on public key or hybrid cryptography
  • Allows fine-grained access (encryption at the information element level)
  • Only private key(s) need to be managed
  • Easier and less error-prone than managing a (database) access control infrastructure
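One common way to meet the pseudo-ID requirements listed above is a keyed hash (HMAC) over normalised, time-stable fields. The sketch below is an assumption for illustration, not the scheme used by the system described in these slides: the `normalise` and `pseudo_id` helpers, the choice of HMAC-SHA256, the field selection and the 16-character truncation are all hypothetical. The secret key is what counters dictionary attacks, since names and birth dates form a small input space that could otherwise be enumerated.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Reduce spelling noise: strip accents, spaces and punctuation, upper-case."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def pseudo_id(secret_key: bytes, name: str, dob_iso: str, pob_zip: str) -> str:
    """Keyed hash over stable identifying fields (address is deliberately left out)."""
    fields = [normalise(name), normalise(dob_iso), normalise(pob_zip)]
    # A separator that cannot survive normalise() keeps field boundaries unambiguous.
    message = "\x1f".join(fields).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    # Truncated for readability only; a real system would choose the length deliberately.
    return digest[:16].upper()


demo_key = b"demo key held by the trusted third party"  # placeholder, not a real key
id_a = pseudo_id(demo_key, "John Doe", "1973-07-05", "7951")
id_b = pseudo_id(demo_key, "  john  DOE ", "1973-07-05", "7951")
print(id_a, id_a == id_b)
```

The same person entered with different capitalisation or spacing maps to the same pseudo-ID, while anyone without the key cannot recompute pseudo-IDs from a list of candidate names.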
Correction of misspelled names
• Use of phonetic techniques to improve capture of person names
  • Data entry by exchange students of different nationalities, who speak different languages and spell names in different ways
  • Adaptation of well-known algorithms like metaphone, double metaphone, soundex, …
• Algorithms expected to be dependent on:
  • Nationality of the data subject
  • Linguistic background of the person doing the data entry
• Validation of the (modified) algorithms will be done using a list of common (real-life) data entry errors
• Care must be taken to avoid collisions (decreases input space)
• E.g. Heynderickx
  • Alternative spellings: Hendrix, Heyndriks, Hendriks, …
  • All spellings are translated to the same phonetic code “HNTRKS”
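To make the idea concrete, here is a minimal sketch of standard American Soundex, one of the well-known algorithms named above. It is not the adapted algorithm planned for the project, and it produces a different code from the “HNTRKS” shown on the slide, but it illustrates how spelling variants collapse to one phonetic code. The `soundex` function name is an assumption of this sketch.

```python
# Consonant groups of standard American Soundex.
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of `name` (ASCII letters only)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code to ""
    return (result + "000")[:4]


for spelling in ("Heynderickx", "Hendrix", "Heyndriks", "Hendriks"):
    print(spelling, soundex(spelling))  # each spelling prints H536
```

The collision risk mentioned above is easy to reproduce: `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so a phonetic code on its own shrinks the input space and is best combined with other fields.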
Work to be done
• Meeting on ACTA premises mid-August
• Agenda:
  • Presentation of application(s) and network infrastructure
  • Work out technical solution (will require development of custom plug-ins)
  • Estimate of integration effort
Thank you for your attention!
Custodix NV
Verlorenbroodstr. 120
B-9820 Merelbeke
Belgium
http://www.custodix.com/ or info@custodix.com
REMINDER: INFOBIOMED PRIVACY PROTECTION SURVEY
• Very few answers received so far (mostly from the same research group)
• Currently not relevant enough to synthesize into a report
• Online: http://survey.custodix.com/index.php?sid=2
• Paper copy available upon request
Network Setup
Diagram labels: Intranet, RDBMS, Backup Server, Server, Web Service, Sources