Healthcare privacy and security: Genomic data privacy. Li Xiong CS573 Data Privacy and Security. Genomic data privacy. Genomic data are increasingly collected, stored, and shared in research and clinical environments
Healthcare privacy and security: Genomic data privacy Li Xiong CS573 Data Privacy and Security
Genomic data privacy • Genomic data are increasingly collected, stored, and shared in research and clinical environments • Genomic data are person-specific (there exists no public registry that maps genomes to names of individuals) • Genomic data are not specified as an identifying patient attribute under the HIPAA Privacy Rule and may be released for public research purposes • How can person-specific DNA be shared such that it cannot be associated with its explicit identity?
Data sharing scenario • John Smith is admitted to a local hospital, which stores his clinical and DNA information • John visits other hospitals • Each hospital forwards certain DNA data to a research group, labeled with the institution and the patients' pseudonyms • Each hospital sends identified discharge records to a state-controlled database
Data at a specific location • Identified table of patient demographics • De-identified DNA sequences • Can we uniquely link identified data to DNA data?
Data at multiple locations • Each site has an identified table and de-identified DNA sequences • Can we uniquely link identified data to DNA data?
Trails • The set of locations each patient visited is called a trail • The trails can be tracked and matched to link DNA data to identified data
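The notion of a trail can be made concrete with a small sketch. It assumes each location releases a set of record IDs (names for identified data, pseudonyms for DNA data); the function name `build_trails` and the sample data are hypothetical and not part of the slides.

```python
from collections import defaultdict


def build_trails(releases):
    """Map each record ID to the set of locations that released it.

    `releases` maps a location name to the set of IDs it published.
    The returned dict maps each ID to its trail (a set of locations).
    """
    trails = defaultdict(set)
    for location, ids in releases.items():
        for record_id in ids:
            trails[record_id].add(location)
    return dict(trails)


# Hypothetical identified releases from two hospitals.
identified_releases = {
    "Hospital1": {"John", "Mary"},
    "Hospital2": {"John"},
}
identified_trails = build_trails(identified_releases)
```

Running the same function on the pseudonymized DNA releases gives a second set of trails, and the two sets are what a trail-matching attack compares.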
REIDIT-Complete • Re-identification of data in trails (REIDIT) for complete publishing • If there is a unique trail match, then a re-identification occurred
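The unique-trail-match rule can be sketched as follows. This is a minimal illustration, not the published REIDIT-C implementation: it assumes both tracks are complete, that trails are given as sets of locations, and that a match counts as unique only when exactly one name and exactly one pseudonym share a trail. The name `reidit_complete` and the sample data are hypothetical.

```python
from collections import defaultdict


def reidit_complete(identified, dna):
    """Link names to pseudonyms whose trails match uniquely.

    `identified` maps name -> set of locations visited.
    `dna` maps pseudonym -> set of locations visited.
    Returns a dict name -> pseudonym for unique trail matches only.
    """
    names_by_trail = defaultdict(list)
    for name, trail in identified.items():
        names_by_trail[frozenset(trail)].append(name)

    pseudonyms_by_trail = defaultdict(list)
    for pseudonym, trail in dna.items():
        pseudonyms_by_trail[frozenset(trail)].append(pseudonym)

    links = {}
    for trail, names in names_by_trail.items():
        pseudonyms = pseudonyms_by_trail.get(trail, [])
        # A re-identification occurs only when the match is one-to-one.
        if len(names) == 1 and len(pseudonyms) == 1:
            links[names[0]] = pseudonyms[0]
    return links


# Hypothetical data: only John's trail is unique in both tracks.
identified = {"John": {"H1", "H2"}, "Mary": {"H1"}, "Bob": {"H1"}}
dna = {"p1": {"H1", "H2"}, "p2": {"H1"}, "p3": {"H1"}}
links = reidit_complete(identified, dna)
```

In this example Mary and Bob share the trail {H1}, so neither can be linked to p2 or p3, while John is linked to p1.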
REIDIT-C re-identification • Re-identifiability is related to the average number of people per location
Reserved publishing • Data releasers can reserve certain information • N is reserved to P vs. P is reserved to N
REIDIT - Incomplete • REIDIT for reserved publishing • For each trail in the track with incomplete trails, if there is only one supertrail, then a re-identification occurred • Remove the re-identified supertrail • Important because a trail can be a supertrail to many trails • Repeat the process
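The iterative supertrail procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the published REIDIT-I implementation: one track holds incomplete trails, the other holds complete trails, and a complete trail is a supertrail of an incomplete one when it contains all of its locations. The name `reidit_incomplete` and the sample data are hypothetical.

```python
def reidit_incomplete(incomplete, complete):
    """Link incomplete trails to their unique supertrails, iteratively.

    `incomplete` maps ID -> set of locations (some visits reserved).
    `complete` maps ID -> set of locations (all visits published).
    Returns a dict incomplete ID -> complete ID.
    """
    remaining_sub = {k: frozenset(v) for k, v in incomplete.items()}
    remaining_super = {k: frozenset(v) for k, v in complete.items()}
    links = {}

    changed = True
    while changed:  # repeat until a full pass makes no new link
        changed = False
        for sub_id, sub_trail in list(remaining_sub.items()):
            candidates = [
                super_id
                for super_id, super_trail in remaining_super.items()
                if sub_trail <= super_trail  # subset test
            ]
            if len(candidates) == 1:
                links[sub_id] = candidates[0]
                del remaining_sub[sub_id]
                # Removing the supertrail may make other trails unique.
                del remaining_super[candidates[0]]
                changed = True
    return links


# Hypothetical data: DNA trails are incomplete, identified trails complete.
identified = {
    "John": {"H1", "H2"},
    "Mary": {"H1", "H3"},
    "Bob": {"H1", "H2", "H3"},
}
dna = {"p1": {"H1"}, "p2": {"H2", "H3"}, "p3": {"H3"}}
links = reidit_incomplete(dna, identified)
```

Here p2 has Bob as its only supertrail; once Bob is removed, p3 can only match Mary, and after that p1 can only match John. This shows why removing each re-identified supertrail and repeating matters.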
REIDIT-Incomplete • Probability of reserving information: 0.0, 0.1, 0.5, 0.9 • Hospitals ranked by number of patients
Can masking location help? Not necessarily!
Comments and open issues • Can k-anonymity solve the problem? • Pseudonyms are subject to dictionary attacks; how can the data be linked without pseudonyms? • Genomic protection methods that incorporate the utility of the genomic data
De-identification e.g. Utah Resource for Genetic and Epidemiologic Research (RGE)