6.3 Genomics and chronic inflammation Pseudonymisation of Patient Data


Presentation Transcript


  1. 6.3 Genomics and chronic inflammation
     Pseudonymisation of Patient Data
     D. Voets

  2. Pseudonymisation: “Basic Privacy Protection”
     Pseudonymisation is a powerful and secure Privacy Enhancing Technique (PET) reconciling the following two conflicting requirements:
     • The adequate protection of individuals or organizations with respect to their identity and privacy
     • The possibility of linking data associated with pseudo-IDs irrespective of the collection time (cf. longitudinal studies) and collection place (cf. multi-center studies)
     Simplified:
     • Pseudonymisation translates a given identifier into a pseudo-identifier by using secure, dynamic and (preferably ir-)reversible cryptographic techniques
     Note that “pseudonymisation” and “anonymisation” terminology is not universal
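The two requirements on this slide can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) as the cryptographic technique, which the slides do not specify; the key, the identifier format, the second collection date and the function name `pseudo_id` are all hypothetical. Because the translation is deterministic, records collected at different times and places receive the same pseudo-ID and can be linked without revealing who the patient is.

```python
import hashlib
import hmac

# Hypothetical secret held by the party performing the pseudonymisation.
SECRET_KEY = b"example-key-not-for-production"


def pseudo_id(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Translate an identifier into a fixed-length pseudo-identifier.

    HMAC-SHA256 is an assumption made for this sketch; it is a one-way
    (irreversible) mapping for anyone who does not hold the key.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16].upper()


# Two records about the same person, collected at different sites and times
# (illustrative values only).
record_site_a = {"patient": "John Doe|1973-07-05", "collected": "2003-10-24", "site": "A"}
record_site_b = {"patient": "John Doe|1973-07-05", "collected": "2004-02-11", "site": "B"}

pid_a = pseudo_id(record_site_a["patient"])
pid_b = pseudo_id(record_site_b["patient"])

# Same person, same pseudo-ID: the records can be linked longitudinally.
print(pid_a == pid_b)
```

A different key yields unrelated pseudo-IDs, so two studies that must not be linkable could simply be given different keys.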

  3. Pseudonymisation: Use
     Pseudonymisation can be used…
     • in all applications in which sensitive data is processed while the true identity of the data subjects is not strictly necessary
       • Typically there is a clear separation between a nominative data realm and an anonymous realm
         Reference application: research databases composed for statistical purposes and data mining
       • Also for privacy protection during sub-processes in a data processing chain
         e.g. lab tests in a hospital, outsourced administration
     • … but should not be reduced to a simple translation of identifiers
       • A careful Privacy Risk Assessment is needed to define the data protection requirements and policies
       • Privacy protection can include other PETs

  4. Pseudonymisation Concept
     A simple example, a drug-related study:
     • At several data sources, medical records of treatment are collected
     • At certain intervals, collected records are gathered (all over the country)
     Used for processing into a Pseudo-ID:
       Name: John Doe
       DOB: July 5, 1973
       POB (ZIP): 7951
     Unnecessary (for the study) identifying data:
       Address: ……
       Tel. nr.: ……
     Useful research data:
       Start of Treatment: 14/10/2003
       Date of Treatment: 24/10/2003
       Medication: …
       Dosage: …
       Blood pressure: …
       Cholesterol Level: …
       …

  5. Pseudonymisation Concept
     Original record:
       Name: John Doe
       DOB: July 5, 1973
       POB (ZIP): 7951
       Address: ……
       Tel. nr.: ……
       Start of Treatment: 14/10/2003
       Date of Treatment: 24/10/2003
       Medication: CureAll
       Dosage: 10cc
       Blood pressure: …
       Cholesterol Level: …
       ….
     Identifier Calculation:
       JFH6UHRJ4MZAQQ9
     Health condition measurements, after Privacy Protecting Extraction (based on a privacy risk assessment):
       Relative DoT: day 10
       Medication: CureAll
       Dosage: 10cc
       Blood pressure: …
       Cholesterol Level: …
       ….

  6. Pseudonymisation Concept
     At the end, the original record is listed in the research database as:
       Patient: JFH6UHRJ4MZAQQ9
       Relative DoT: day 10
       Medication: CureAll
       Dosage: 10cc
       Blood pressure: …
       Cholesterol Level: …
       ….
     The clinical data of “John Doe” is now gathered for research, but his privacy is protected
     • Researchers only know the medical treatment profile of a patient “JFH6UHRJ4MZAQQ9”, which is enough information for their job
     • John does not have to worry that his participation in the study leaks to his bank, which is handling John’s application for a loan
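The extraction step in the John Doe example can be sketched as follows. This is only an illustration under stated assumptions: the field names, the set `IDENTIFYING_FIELDS` and the function `extract` are hypothetical, and the pseudo-ID is taken from the slide rather than computed. The sketch drops the identifying fields and replaces the absolute treatment dates with a day offset relative to the start of treatment, which is how 24/10/2003 becomes "day 10".

```python
from datetime import date, datetime

# Hypothetical field names; which fields count as identifying would follow
# from the privacy risk assessment, not from this sketch.
IDENTIFYING_FIELDS = {"name", "dob", "pob_zip", "address", "tel"}
DATE_FIELDS = {"start_of_treatment", "date_of_treatment"}


def parse_dmy(text: str) -> date:
    """Parse a dd/mm/yyyy date as used in the example record."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def extract(record: dict, pseudo_id: str) -> dict:
    """Build the research record: pseudo-ID, relative date, research data."""
    start = parse_dmy(record["start_of_treatment"])
    treated = parse_dmy(record["date_of_treatment"])
    research = {
        "patient": pseudo_id,
        "relative_dot_days": (treated - start).days,
    }
    for field, value in record.items():
        if field in IDENTIFYING_FIELDS or field in DATE_FIELDS:
            continue  # identifying data and absolute dates never leave the source
        research[field] = value
    return research


source_record = {
    "name": "John Doe",
    "dob": "July 5, 1973",
    "pob_zip": "7951",
    "start_of_treatment": "14/10/2003",
    "date_of_treatment": "24/10/2003",
    "medication": "CureAll",
    "dosage": "10cc",
}

research_record = extract(source_record, pseudo_id="JFH6UHRJ4MZAQQ9")
print(research_record)
```

Storing only the offset keeps the treatment timeline usable for researchers while removing calendar dates that could help re-identify a patient.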

  7. Pseudonymisation: Implementation
     Diagram labels: Sources, Trusted Third Party, Data Collection Site
     Pseudonymisation systems:
     • Batch Data oriented for data collection
       Used for the collection of medical data from a large number of distributed sources, each having a local database management system (added value: data collection system)
     • Interactive Pseudonymisation
       “On-the-fly” translation of Pseudo-IDs to Nominative IDs (and back again), effectively splitting up the world into a “nominative” and an “anonymous” data realm
       • Uses Web technology
       • Extremely transparent and flexible

  8. Pseudonymisation: Need for a TTP
     Why a ‘Trusted Third Party’?
     • Best practice in Privacy Protection (in most cases the only way to ensure a correct, secure implementation)
     • When one communicating party does not trust the other (a TTP is an independent party trusted by data source and data collector)
     • Certification: avoids being “self-certified” (TTP certifies operating procedures)
     • Expertise about regulations, technological implementation and policies is concentrated at the trust service provider
     • Additional privacy measures (PETs) and features possible, e.g.:
       • Controlled reversibility
       • Segmentation of data-streams
       • Controlled database perturbation
       • …

  9. Nominative Browsing of an Anonymous Database
     Diagram labels: Custodix Privacy Protection Server, Data Collection Site, Sources
     Irreversible Pseudonym Generation:
       Name: John Doe
       DOB: July 5, 1973
       POB (ZIP): 7951
       Address: ……
       Tel. nr.: ……
       Pseudo-ID: AQWJFK68
     Secure Vault: LSJCN4575CNGJ82384C1N33AQ1038XMDIK2D
     Encrypted Storage “Secure Vaults”
     Records:
       Day 1 Medication: … Dosage: … Blood pressure: … …
       Day 1 Medication: … Dosage: … Blood pressure: … …
       Day 2 Medication: … Dosage: … Blood pressure: … …
       Day 2 Medication: … Dosage: … Blood pressure: … …

  10. Singapore Epidemiology Project
      The Singapore National Disease Registries System (NDRS)
      The objectives are:
      • To establish registry processes for the following five registries: cancer, cardiac, stroke, renal, myopia
      • To provide an integrated system to ensure the quality of the data collected and the security and confidentiality of patients’ records
      • To provide a flexible long-term solution
      Initiative by Singapore Health Promotion Board HPB (national level)
      Partners:
      • National Disease Registry Organisation (NDRO) (Data Provider)
      • NCS (Data Management)
      • Custodix (Data Security and confidentiality)
        Providing a Privacy Protection Toolbox

  11. Periodontitis Research Data
      • Types of data:
        • Clinical and genetic data from clinicians
        • Oral microbiology data from diagnostic laboratories
        • Environmental data (e.g. smoking, stress, …)
        • Administrative data
        • X-ray images
      • Primary focus on identifying administrative data:
        • Name, date of birth
        • Possibly race (for outliers)
      • Limit impact on usability of data for researchers

  12. Why pseudonymise?
      Strictly speaking, researchers don’t need to see administrative data:
      • Considered good practice
      • Security through Privacy
      • Data ready for export to other studies
      • Students are not allowed to see personal information
      Identification needed only under specific conditions & by specific persons:
      • When a patient requests access to his data (right enforced by law)
      • When a patient needs to be notified about a certain condition
      • Data validation (correcting input errors)
      Offers clear advantages:
      • Stable ID for longitudinal follow-up
      • Secure and safe way to link records related to the same patient
      • Not all persons operating on the DB need to be bound to legal agreements

  13. Transformations
      • Pseudo-ID generated from multiple identifying data fields
        • Must be stable in time (e.g. address is a bad idea)
        • Taking into account input space (dictionary attacks!)
        • Decreasing error probability (misspelling names …)
      • “Micro vaults” containing ‘as-is’ administrative data
        • Based on public key or hybrid cryptography
        • Allows fine-grained access (encryption at the information element level)
        • Only private key(s) need to be managed
        • Easier and less error-prone than managing (database) access control infrastructure
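The points about pseudo-ID generation can be made concrete with a short sketch. It is an assumption-laden illustration, not the method used in the project: the choice of name and date of birth as stable fields, the normalisation rules, the HMAC-SHA256 construction, the key and all function names are hypothetical. The second half shows why the input space matters: an unkeyed hash of a date of birth can be reversed by simply trying every date, whereas a keyed construction cannot be enumerated without the key.

```python
import hashlib
import hmac
import unicodedata
from datetime import date, timedelta

# Hypothetical secret; in practice it would be managed by the trusted party.
SECRET_KEY = b"example-key-not-for-production"


def normalise(text: str) -> str:
    """Reduce spelling variation: strip accents, case, spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.upper() if c.isalnum())


def pseudo_id(name: str, dob: date, key: bytes = SECRET_KEY) -> str:
    """Derive a pseudo-ID from fields that are stable in time (no address)."""
    message = f"{normalise(name)}|{dob.isoformat()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16].upper()


def dictionary_attack(target_hash: str, first_year: int = 1900, last_year: int = 2010):
    """Recover a date of birth from an unkeyed SHA-256 hash by enumeration."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        if hashlib.sha256(day.isoformat().encode("utf-8")).hexdigest() == target_hash:
            return day
        day += timedelta(days=1)
    return None


# Formatting differences in the name do not change the pseudo-ID.
same = pseudo_id("John Doe", date(1973, 7, 5)) == pseudo_id(" john  DOE ", date(1973, 7, 5))

# An unkeyed hash of a small input space is reversed in about 40,000 tries.
leaked = hashlib.sha256(b"1973-07-05").hexdigest()
recovered = dictionary_attack(leaked)
print(same, recovered)
```

Combining several fields enlarges the input space, and the secret key removes the attacker's ability to test guesses at all; the normalisation step addresses the error-probability bullet only for formatting differences, not for genuine misspellings.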

  14. Correction of misspelled names
      • Use of phonetic techniques to improve capture of person names
        • Data entry by exchange students with different nationalities, who speak different languages, … resulting in different ways of spelling names
        • Adaptation of well-known algorithms like metaphone, double metaphone, soundex, …
        • Algorithms expected to be dependent on:
          • Nationality of the data subject
          • Lingual background of the person doing the data entry
      • Validation of the (modified) algorithms used will be done by using a list of common (real-life) data entry errors
      • Care must be taken to avoid collisions (decreases input space)
      • E.g. Heynderickx
        • Alternative spellings: Hendrix, Heyndriks, Hendriks, …
        • All spellings are translated to the same phonetic code “HNTRKS”
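The effect of a phonetic code on the Heynderickx example can be reproduced with a small sketch. It uses plain American Soundex, one of the algorithms named on the slide, rather than the adapted metaphone-style algorithm that produces “HNTRKS”, so the code it yields (H536) differs from the one on the slide; the function name `soundex` and the lookup table are written for this illustration only.

```python
# Soundex digit for each coded consonant; vowels, Y, H and W carry no digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
        if len(result) == 4:
            break
    return result.ljust(4, "0")


spellings = ["Heynderickx", "Hendrix", "Heyndriks", "Hendriks"]
codes = {spelling: soundex(spelling) for spelling in spellings}
print(codes)  # every spelling maps to H536
```

The same property that absorbs misspellings also causes the collisions the slide warns about: unrelated names such as Robert and Rupert share the code R163, which shrinks the effective input space of the pseudo-ID.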

  15. Work to be done
      • Meeting on ACTA premises mid-August
      • Agenda:
        • Presentation of application(s) and network infrastructure
        • Work out technical solution (will require development of custom plug-ins)
        • Estimate of integration effort

  16. Thank you for your attention!
      Custodix NV
      Verlorenbroodstr. 120
      B-9820 Merelbeke
      Belgium
      http://www.custodix.com/ or info@custodix.com

  17. REMINDER: INFOBIOMED PRIVACY PROTECTION SURVEY
      • Very few answers received so far (mostly from the same research group)
      • Currently not relevant enough to synthesize into a report
      • Online: http://survey.custodix.com/index.php?sid=2
      • Paper copy available upon request

  18. Network Setup
      Diagram labels: Intranet RDBMS Backup Server Server Web Service Sources
