
Privacy-Preserving Genomic Data Sharing

This presentation discusses privacy models and algorithms for anonymizing and releasing genomic data while preserving data utility. The focus is on tree-based approaches and future directions for improving privacy-preserving genomic data sharing.





Presentation Transcript


  1. Task 1: Privacy-Preserving Genomic Data Sharing. Presented by Noman Mohammed, School of Computer Science, McGill University, 24 March 2014.

  2. Reference: N. Mohammed, R. Chen, B. C. M. Fung, and P. S. Yu. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 493-501, 2011.

  3. Outline: Privacy Models; Algorithm for Relational Data; Algorithm for Genomic Data; Conclusion

  4. Overview: Anonymization algorithm; Data utility; Privacy model

  5. k-Anonymity [Samarati & Sweeney, PODS 1998]. Quasi-identifier (QID): the set of attributes that can be used for re-identification. k-anonymity: each record is indistinguishable from at least k-1 other records in the table with respect to the QID.
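
The k-anonymity condition above can be checked mechanically. The following is a minimal sketch, assuming records are dictionaries and the QID is a list of attribute names; the toy table and its values are hypothetical and not taken from the slides.

```python
from collections import Counter

def is_k_anonymous(records, qid, k):
    """Return True if every QID value combination occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy table; attribute names and values are illustrative only.
table = [
    {"Job": "Professional", "Age": "[30-40)", "Class": "Y"},
    {"Job": "Professional", "Age": "[30-40)", "Class": "N"},
    {"Job": "Artist", "Age": "[18-30)", "Class": "Y"},
    {"Job": "Artist", "Age": "[18-30)", "Class": "Y"},
]

print(is_k_anonymous(table, ["Job", "Age"], 2))  # True
print(is_k_anonymous(table, ["Job", "Age"], 3))  # False
```

Each QID group here has exactly two records, so the table is 2-anonymous but not 3-anonymous.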

  6. Differential Privacy [DMNS, TCC 06]

  7. Differential Privacy. D and D' are neighbors if they differ on at most one record. A non-interactive privacy mechanism A gives ε-differential privacy if, for all neighbors D and D' and for any possible sanitized database D*: Pr_A[A(D) = D*] ≤ exp(ε) × Pr_A[A(D') = D*]

  8. Laplace Mechanism. ∆f = max_{D,D'} ||f(D) − f(D')||_1, where the maximum is over neighboring D and D'. For a counting query f, ∆f = 1. For example, for a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) maintains ε-differential privacy.
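
As an illustration of the Laplace mechanism for a counting query, the sketch below adds Laplace(1/ε) noise to a count and numerically checks the ε-differential privacy inequality for two neighboring counts. The function names, the value ε = 0.5, and the counts 10 and 11 are assumptions chosen for the example.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng=random):
    """A counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

def laplace_pdf(x, mu, scale):
    """Density of the Laplace distribution centred at mu."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

epsilon = 0.5
# Neighboring datasets change a count by at most 1 (here: 10 vs 11).
# The output densities should never differ by more than a factor exp(epsilon).
worst_ratio = max(
    laplace_pdf(x / 10.0, 10, 1 / epsilon) / laplace_pdf(x / 10.0, 11, 1 / epsilon)
    for x in range(0, 210)
)
print(worst_ratio <= math.exp(epsilon) + 1e-12)  # True

print(noisy_count(10, epsilon, random.Random(0)))  # one noisy answer
```

The check only samples a grid of outputs, so it is a sanity test of the bound rather than a proof.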

  9. Outline: Privacy Models; Algorithm for Relational Data; Algorithm for Genomic Data; Conclusion

  10. Non-interactive Framework: 0 + Lap(1/ε)

  11. Non-interactive Framework: 0 + Lap(1/ε). For high-dimensional data, the noise is too large.
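
One way to see why the noise dominates for high-dimensional data: if Laplace noise is added to every cell of a contingency table, the number of cells grows exponentially with the number of attributes while the number of records stays fixed. The sketch below is only an arithmetic illustration; the record count (10,000), the domain size (4 values per attribute), and ε = 1 are assumptions.

```python
import math

def avg_count_per_cell(num_records, domain_sizes):
    """Return (number of contingency-table cells, average true count per cell)."""
    cells = 1
    for size in domain_sizes:
        cells *= size
    return cells, num_records / cells

epsilon = 1.0
noise_std = math.sqrt(2) / epsilon  # standard deviation of Laplace(1/epsilon)

for d in (2, 5, 10, 20):
    cells, avg = avg_count_per_cell(10_000, [4] * d)
    # Last column: is the typical true count already smaller than the noise?
    print(d, cells, avg, avg < noise_std)
```

With ten such attributes there are already more cells than records, so most noisy counts consist almost entirely of noise.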

  12. Non-interactive Framework

  13. Anonymization Algorithm. Taxonomy tree for Job: Any_Job → {Professional → {Engineer, Lawyer}, Artist → {Dancer, Writer}}. Taxonomy tree for Age: [18-65) → {[18-40) → {[18-30), [30-40)}, [40-65)}.
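
A taxonomy tree such as the Job hierarchy on this slide can be represented as a parent-to-children mapping, and a specialization step then splits the records under a node among its children. This is a minimal sketch under that assumption; the record list is hypothetical and the function names are not from the presentation.

```python
# Job taxonomy from the slide, as a parent -> children mapping.
TAXONOMY = {
    "Any_Job": ["Professional", "Artist"],
    "Professional": ["Engineer", "Lawyer"],
    "Artist": ["Dancer", "Writer"],
}

def leaves(node, taxonomy):
    """Return all leaf values below (or equal to) a taxonomy node."""
    children = taxonomy.get(node)
    if not children:
        return [node]
    out = []
    for child in children:
        out.extend(leaves(child, taxonomy))
    return out

def specialize(records, attr, node, taxonomy):
    """Split records generalized to `node` into one partition per child of `node`."""
    parts = {child: [] for child in taxonomy[node]}
    for r in records:
        for child in taxonomy[node]:
            if r[attr] in leaves(child, taxonomy):
                parts[child].append(r)
                break
    return parts

# Hypothetical raw records.
records = [{"Job": j} for j in ["Engineer", "Lawyer", "Dancer", "Engineer", "Writer"]]
parts = specialize(records, "Job", "Any_Job", TAXONOMY)
print({k: len(v) for k, v in parts.items()})  # {'Professional': 3, 'Artist': 2}
```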

  14. Candidate Selection. We favor the specialization with the maximum Score value. First utility function: ∆u = (formula not captured in the transcript). Second utility function: ∆u = 1.
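
The slide does not say how the preference for high Score values is made private. The sketch below assumes the exponential mechanism, which the cited reference (Mohammed et al., 2011) uses for this step: a candidate is drawn with probability proportional to exp(ε × Score / (2∆u)). The candidate names and Score values are hypothetical.

```python
import math
import random

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=random):
    """Pick a candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical candidates for specialization and their Score values.
candidates = ["Any_Job", "[18-65)"]
scores = [42, 17]

# sensitivity=1.0 corresponds to the second utility function (∆u = 1).
choice = exponential_mechanism(candidates, scores, epsilon=1.0, sensitivity=1.0,
                               rng=random.Random(7))
print(choice)
```

A larger ε concentrates the choice on the highest-scoring candidate; a smaller ε makes the selection closer to uniform.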

  15. Anonymization Algorithm: O(Apr × |D| log |D|); O(|candidates|); O(|D|); O(|D| log |D|); O(1)

  16. Anonymization Algorithm: O(Apr × |D| log |D|); O((Apr + h) × |D| log |D|); O(|candidates|); O(|D|); O(|D| log |D|); O(1)

  17. Outline: Privacy Models; Algorithm for Relational Data; Algorithm for Genomic Data; Conclusion

  18. case_chr2_29504091_30044866
      rs11686243: AG AG AA AG GG AA AG AA GG AG AA AA AA AA AA AA AA GG GG AG AG AA AG GG AA AA GG AG AG AG GG AG AA AA AG AG AG AG AG AG AA AG GG AG AA GG GG GG GG AG AG AG AG AA GG GG GG AG AA AG GG AG AA GG GG AG AG AG AG AG AA AA AG AG AG AA AG AG AG AG GG AG AG AG GG GG AG AG GG AG AG AG AA AA GG AG AA GG AA AA AG GG AG AG AG AG AG AG AG AG GG GG AA AG AG AG AG AA AG GG AG GG AA AG GG AG AG AG AA AG AG AG GG AG GG AG GG AG AG AG GG AG AG GG GG AG AG GG AA GG AA AG AG AG AG GG AG AA AG GG GG AG AG AG AG AG GG AG AG AA AG AA AA AG GG AA AG AG GG AG GG AG AG GG GG AG AG AA AG AG AG GG AG GG GG AG AG GG AG GG
      rs4426491: CC CC CC CT CT CC CT CC CT CT CC CC CC CC CC CC CC CT CT CT CC CC CT CT CC CC CT CC CT CC CT CC CC CC CT ….
      rs4305230: CC CC CC CT CT CC CT CC TT CT CC CC CC CC CC CC CC TT TT CT CC CC CT CT CC CC CT CC CT CC TT CC CC CC CT ….

  19. Raw Data

  20. Blocks/Attributes. Unique combinations: AG CC, AA CC, AG CT, GG CT. Taxonomy tree: Any → {AG CC, AA CC, AG CT, GG CT}.
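
The tree on this slide can be built directly from the values observed in a block: collect the distinct genotype combinations and hang them under a single root. This is a minimal sketch assuming each individual's block value is a space-joined string; the input list is illustrative and the function name is hypothetical.

```python
def two_level_tree(block_values):
    """Root 'Any' with one leaf per distinct genotype combination, in first-seen order."""
    seen = dict.fromkeys(block_values)  # dict keys preserve insertion order
    return {"Any": list(seen)}

# Illustrative per-individual values for one block.
block = ["AG CC", "AA CC", "AG CT", "AG CC", "GG CT", "AA CC"]
print(two_level_tree(block))  # {'Any': ['AG CC', 'AA CC', 'AG CT', 'GG CT']}
```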

  21. Taxonomy Trees for Attributes. The SNP data was split evenly into N/6 blocks (attributes), where N is the number of SNPs.
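
Splitting N SNPs evenly into N/6 blocks gives six SNPs per block. The sketch below shows one way to form such blocks and to turn a block into a single attribute value per individual. The SNP ids `snp0` to `snp11` are hypothetical; the two-SNP example uses the first five genotypes of rs11686243 and rs4426491 from the data shown earlier, purely for illustration.

```python
def split_into_blocks(snp_ids, block_size=6):
    """Split SNP ids into consecutive blocks (N/6 blocks when N is a multiple of 6)."""
    return [snp_ids[i:i + block_size] for i in range(0, len(snp_ids), block_size)]

def block_values(genotypes, block):
    """Join each individual's genotypes over the SNPs of one block into one value."""
    num_individuals = len(genotypes[block[0]])
    return [" ".join(genotypes[snp][i] for snp in block) for i in range(num_individuals)]

snp_ids = [f"snp{i}" for i in range(12)]  # hypothetical ids, N = 12
blocks = split_into_blocks(snp_ids)
print(len(blocks), [len(b) for b in blocks])  # 2 [6, 6]

genotypes = {
    "rs11686243": ["AG", "AG", "AA", "AG", "GG"],
    "rs4426491": ["CC", "CC", "CC", "CT", "CT"],
}
values = block_values(genotypes, ["rs11686243", "rs4426491"])
print(values)  # ['AG CC', 'AG CC', 'AA CC', 'AG CT', 'GG CT']
```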

  22. Hierarchy Tree for Chr2

  23. Hierarchy Tree for Chr10

  24. Genomic Data: Block 1 Any Block 3 Any AG CC AA CC CC GG CT AG

  25. Anonymized Data

  26. Heterogeneous Healthcare Data: Relational Data; Genomic Data

  27. Conclusions
      • Privacy-Preserving Genomic Data Release
      • Tree-based approach is promising
      • Future work
        • Partitioning the SNPs to generate blocks
        • Utility function for specialization
        • Two-level tree vs. multi-level hierarchy trees
        • Single-dimension vs. multi-dimensional partitioning

  28. Thank You!
