
Li Xiong CS573 Data Privacy and Security



  1. Healthcare privacy and security: Genomic data privacy Li Xiong CS573 Data Privacy and Security

  2. Genomic data privacy • Genomic data are increasingly collected, stored, and shared in research and clinical environments • Genomic data are person-specific (but there exists no public registry that maps genomes to names of individuals) • Genomic data are not specified as an identifying patient attribute under the HIPAA Privacy Rule and may be released for public research purposes • How can person-specific DNA data be shared such that they cannot be linked to the explicit identity of the person they came from?

  3. Data sharing scenario • John Smith is admitted to a local hospital, which stores his clinical and DNA information • John visits other hospitals • Each hospital forwards certain DNA data to a research group, labeled with the institution and the patients' pseudonyms • Each hospital sends identified discharge records to a state-controlled database

  4. Data at a specific location • Identified table of patient demographics • De-identified DNA sequences • Can we uniquely link identified data to DNA data?

  5. Data at multiple locations • Each site has an identified table and de-identified DNA sequences • Can we uniquely link identified data to DNA data?

  6. Trails • The set of locations each patient visited is called a trail • The trails can be tracked and matched to link DNA data to identified data
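
The trail idea above can be sketched in a few lines of Python. This is only an illustration: the function name `build_trails`, the hospital labels, and the patient and DNA identifiers are all hypothetical, and a trail is represented here simply as the set of locations at which an entity appears.

```python
from collections import defaultdict

def build_trails(releases):
    """Map each entity to the set of locations that released a record for it.

    `releases` maps a location name to the entity ids seen at that location.
    """
    trails = defaultdict(set)
    for location, entities in releases.items():
        for entity in entities:
            trails[entity].add(location)
    return {entity: frozenset(locs) for entity, locs in trails.items()}

# Hypothetical identified releases (names) and de-identified releases (DNA).
identified = {"H1": ["John", "Mary"], "H2": ["John"], "H3": ["Mary", "Bob"]}
dna = {"H1": ["dna_a", "dna_b"], "H2": ["dna_a"], "H3": ["dna_b", "dna_c"]}

name_trails = build_trails(identified)
dna_trails = build_trails(dna)

for entity, trail in {**name_trails, **dna_trails}.items():
    print(entity, sorted(trail))
```

In this toy data, John and `dna_a` share the trail {H1, H2}, which is the kind of coincidence that trail matching exploits.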

  7. REIDIT-Complete • Re-identification of data in trails (REIDIT) for complete publishing • If a trail has a unique match, then a re-identification has occurred
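
A minimal sketch of the unique-match rule, assuming complete publishing means that every location releases both the identified and the DNA record for each visit, so matching trails are exactly equal. The function name `reidit_complete` and the sample data are hypothetical, and this is not presented as the original REIDIT-C implementation.

```python
from collections import defaultdict

def reidit_complete(name_trails, dna_trails):
    """Link a name to a DNA record when both are the only holders of a trail."""
    names_by_trail = defaultdict(list)
    for name, trail in name_trails.items():
        names_by_trail[frozenset(trail)].append(name)

    dna_by_trail = defaultdict(list)
    for dna_id, trail in dna_trails.items():
        dna_by_trail[frozenset(trail)].append(dna_id)

    matches = {}
    for trail, names in names_by_trail.items():
        dna_ids = dna_by_trail.get(trail, [])
        # A re-identification needs exactly one name and one DNA record.
        if len(names) == 1 and len(dna_ids) == 1:
            matches[names[0]] = dna_ids[0]
    return matches

# Hypothetical trails: Bob and Ann share a trail, so they stay ambiguous.
name_trails = {"John": {"H1", "H2"}, "Mary": {"H1", "H3"},
               "Bob": {"H3"}, "Ann": {"H3"}}
dna_trails = {"dna_a": {"H1", "H2"}, "dna_b": {"H1", "H3"},
              "dna_c": {"H3"}, "dna_d": {"H3"}}

matches = reidit_complete(name_trails, dna_trails)
print(matches)  # {'John': 'dna_a', 'Mary': 'dna_b'}
```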

  8. Results

  9. REIDIT-C re-identification • Re-identifiability is related to the average number of people per location

  10. Reserved publishing • Data releasers can reserve certain information • N is reserved to P vs. P is reserved to N

  11. REIDIT-Incomplete • REIDIT for reserved publishing • For each trail in the track with incomplete trails, if it has exactly one supertrail in the other track, then a re-identification has occurred • Remove the re-identified supertrail; this matters because one trail can be a supertrail of many trails • Repeat the process until no new re-identifications are found
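
The iterative procedure above can be sketched as follows. The sketch assumes that a supertrail is a trail whose location set contains the incomplete trail's location set, and that the owner of every incomplete trail also appears in the complete track. The function name `reidit_incomplete` and the data are hypothetical.

```python
def reidit_incomplete(incomplete_trails, complete_trails):
    """Repeatedly link each incomplete trail that has exactly one supertrail."""
    remaining_inc = {k: frozenset(v) for k, v in incomplete_trails.items()}
    remaining_com = {k: frozenset(v) for k, v in complete_trails.items()}
    matches = {}
    changed = True
    while changed:
        changed = False
        for inc_id, inc_trail in list(remaining_inc.items()):
            supertrails = [com_id for com_id, com_trail in remaining_com.items()
                           if inc_trail <= com_trail]
            if len(supertrails) == 1:
                matches[inc_id] = supertrails[0]
                # Removing the matched supertrail can make other trails unique.
                del remaining_inc[inc_id]
                del remaining_com[supertrails[0]]
                changed = True
    return matches

# Hypothetical data: the DNA trails are missing some visited locations.
complete_trails = {"John": {"H1", "H2"}, "Mary": {"H1", "H3"}, "Bob": {"H3"}}
incomplete_trails = {"dna_a": {"H2"}, "dna_b": {"H1"}, "dna_c": {"H3"}}

matches = reidit_incomplete(incomplete_trails, complete_trails)
print(matches)  # {'dna_a': 'John', 'dna_b': 'Mary', 'dna_c': 'Bob'}
```

Here `dna_b` initially has two supertrails (John and Mary) and only becomes unique once John's trail has been removed, which illustrates why the removal step matters.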

  12. REIDIT-Incomplete results • [Figure] Curves for reserving probabilities 0.0, 0.1, 0.5, and 0.9 • Hospitals ranked by number of patients

  13. Can masking location help? Not necessarily!

  14. Comments and open issues • Can k-anonymity solve the problem? • Pseudonyms are subject to dictionary attacks; how can data be linked without pseudonyms? • Genomic protection methods that incorporate the utility of the genomic data
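
One way to make the k-anonymity question concrete is to measure how many entities share each trail. The sketch below applies the standard k-anonymity idea to trails as an assumption of this example, not as a method stated in the slides. The function names `trail_anonymity` and `is_k_anonymous` and the data are hypothetical.

```python
from collections import Counter

def trail_anonymity(trails):
    """Return, for each entity, how many entities share its exact trail."""
    counts = Counter(frozenset(trail) for trail in trails.values())
    return {entity: counts[frozenset(trail)] for entity, trail in trails.items()}

def is_k_anonymous(trails, k):
    """True if every trail is shared by at least k entities."""
    return all(size >= k for size in trail_anonymity(trails).values())

# Hypothetical trails: only Bob and Ann hide behind each other.
trails = {"John": {"H1", "H2"}, "Mary": {"H1", "H3"},
          "Bob": {"H3"}, "Ann": {"H3"}}

sizes = trail_anonymity(trails)
print(sizes)                      # {'John': 1, 'Mary': 1, 'Bob': 2, 'Ann': 2}
print(is_k_anonymous(trails, 2))  # False
```

Trails with an anonymity set of size 1 are exactly the ones a unique trail match would re-identify. Whether enforcing a minimum set size preserves enough utility is the open issue raised above.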

  15. De-identification e.g. Utah Resource for Genetic and Epidemiologic Research (RGE)
