
Algorithm Safe Privacy-Preserving Data Publishing



Presentation Transcript


  1. Algorithm Safe Privacy-Preserving Data Publishing Xin Jin George Washington University Nan Zhang George Washington University Gautam Das University of Texas at Arlington

  2. Outline • Introduction • Algorithm-safe Data Publishing Model • Amendment Toolset: Look-ahead Partitioning and Stratified Pick-up • Experimental Results • Conclusion

  3. Privacy-Preserving Data Publishing • Share individual records to enable analytical tasks (e.g., aggregate query answering, data mining) while protecting individuals' private information.

  4. What Is Algorithm-based Disclosure? • Algorithm-based disclosure occurs in existing methods (e.g., [WFW+07], [LLV07], [MGK+07]). • An example using ℓ-diversity. [Figure: a 2-diversity table]

  5. What If an Adversary Knows the Algorithm? [Figure: published table; 1st conjectured original data; better output table]

  6. What If an Adversary Knows the Algorithm? [Figure: published table; 2nd conjectured original data; better output table]

  7. What If an Adversary Knows the Algorithm? [Figure: published table; 3rd conjectured original data]

  8. Algorithm-safe Data Publishing (ASP) • Q: How likely is it that Eve has HIV? [Figure: a naïve user answers from background knowledge and the published table; a smart user answers from background knowledge, the published table, and the algorithm; under ASP, both answers are equal]

  9. Algorithm-safe Data Publishing (ASP) • Problem Definition: For each tuple (i.e., row) $t_i = \langle q, s \rangle$ in the original data $T$, $\Pr\{t_i[SA] = s' \mid t_i[QI] = q, K\} = \Pr\{t_i[SA] = s' \mid t_i[QI] = q, K, A\}$ for each $s'$ in the domain of SA, where $K$ is background knowledge and $A$ is the data publishing algorithm.
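The gap this definition closes can be seen on a small hypothetical example (not taken from the slides). Assume five tuples t1 to t5 are published as one group whose SA multiset is {HIV, HIV, HIV, flu, cold}, and assume a hypothetical "minimal" 2-diversity algorithm A that would instead publish the finer split {t1, t2} | {t3, t4, t5} whenever both halves are 2-diverse. A naive user treats every assignment of the multiset to the tuples as equally likely; a smart user who knows A also discards every assignment for which A would have published the finer split. The sketch enumerates both sets of possible worlds and compares the posteriors.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy data: t1..t5 published as ONE group with this SA multiset.
SA_MULTISET = ("HIV", "HIV", "HIV", "flu", "cold")

def is_2_diverse(values):
    """Distinct 2-diversity: at least two different SA values."""
    return len(set(values)) >= 2

def publishes_single_group(world):
    """Hypothetical 'minimal' algorithm A: publish the finer split
    {t1, t2} | {t3, t4, t5} if both halves are 2-diverse, else one group."""
    return not (is_2_diverse(world[:2]) and is_2_diverse(world[2:]))

def posterior(worlds, tuple_index, value):
    """Fraction of equally likely worlds in which the tuple has this SA value."""
    hits = sum(1 for w in worlds if w[tuple_index] == value)
    return Fraction(hits, len(worlds))

# Naive user: every assignment of the published multiset to t1..t5 is possible.
naive_worlds = sorted(set(permutations(SA_MULTISET)))
# Smart user: also knows A, so keeps only worlds in which A outputs one group.
smart_worlds = [w for w in naive_worlds if publishes_single_group(w)]

print(len(naive_worlds), len(smart_worlds))  # 20 8
print(posterior(naive_worlds, 0, "HIV"))     # 3/5
print(posterior(smart_worlds, 0, "HIV"))     # 3/4
```

Knowing A moves Pr{t1[SA] = HIV} from 3/5 to 3/4 (and Pr{t3[SA] = HIV} from 3/5 to 1/2), so this toy algorithm violates the equality above; this is the kind of inference behind the minimality attack [WFW+07].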

  10. Necessary Condition #1: QI*-Independence [Figure: the data publisher queries SA by QI through an oracle over the original data; the oracle holds the QI-SA correlation but returns only a safe QI-SA correlation; output: ASP published table] • QI*-Independence: the generated QI* is conditionally independent of the original SA, given the combination of QI and the published SA*.

  11. Necessary Condition #2: SA*-Independence [Figure: as on slide 10, but the oracle returns a perturbed safe QI-SA correlation, and an impossible QI-SA correlation is also shown; output: ASP published table] • SA*-Independence: the generated SA* is conditionally independent of the original SA, given the combination of QI, QI*, and the impossible QI-SA correlation.
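Both conditions are statements of conditional independence between discrete random variables. The sketch below is a generic checker, not the authors' screening tool: it assumes the publishing process has been modeled as an explicit joint distribution over outcome tuples (here, hypothetically, (QI, SA, SA*, QI*)) and tests whether X is independent of Y given Z through the identity $P(x,y,z)\,P(z) = P(x,z)\,P(y,z)$ for all values.

```python
from collections import defaultdict
from fractions import Fraction

def marginal(joint, idx):
    """Sum the joint distribution down to the variables at positions idx."""
    m = defaultdict(Fraction)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

def conditionally_independent(joint, x, y, z):
    """True iff X is independent of Y given Z (x, y, z are tuples of variable
    positions); checks P(x,y,z) * P(z) == P(x,z) * P(y,z) everywhere."""
    p_xyz = marginal(joint, x + y + z)
    p_xz, p_yz, p_z = marginal(joint, x + z), marginal(joint, y + z), marginal(joint, z)
    for xz, pxz in p_xz.items():
        xv, zv = xz[:len(x)], xz[len(x):]
        for yz, pyz in p_yz.items():
            yv = yz[:len(y)]
            if yz[len(y):] != zv:
                continue  # only compare cells with the same conditioning value
            if p_xyz.get(xv + yv + zv, Fraction(0)) * p_z[zv] != pxz * pyz:
                return False
    return True

# Hypothetical outcomes (QI, SA, SA*, QI*) with QI and SA* held fixed.
QI, SA, SA_STAR, QI_STAR = 0, 1, 2, 3
half = Fraction(1, 2)
# Leaky publisher: the generalization it picks (QI*) reveals the original SA.
leaky = {("q", "HIV", "{HIV, flu}", "coarse"): half,
         ("q", "flu", "{HIV, flu}", "fine"): half}
# Safe publisher: QI* is chosen without looking at SA.
safe = {("q", "HIV", "{HIV, flu}", "coarse"): half,
        ("q", "flu", "{HIV, flu}", "coarse"): half}

print(conditionally_independent(leaky, (QI_STAR,), (SA,), (QI, SA_STAR)))  # False
print(conditionally_independent(safe, (QI_STAR,), (SA,), (QI, SA_STAR)))   # True
```

The leaky publisher fails the QI*-independence check because its choice of generalization depends on SA, which is the channel an algorithm-aware adversary exploits; the same function can test SA*-independence by changing which positions are passed as x and z.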

  12. How to Achieve ASP Model? • Play the Role of Oracle • Satisfy QI*-Independence • Never perturb SA • Worst-case Eligibility Test • Look-ahead partitioning

  13. A Mondrian Method [LDR06] to Achieve ℓ-diversity (ℓ = 2) [Figure: tuples t1 to t8]

  14. A Mondrian Method to Achieve ℓ-diversity [Figure: tuples t1 to t8 labeled with SA values S1 to S5; cut at x = 5]

  15. A Mondrian Method to Achieve ℓ-diversity [Figure: the same tuples; cuts at x = 5 and y = 5]

  16. A Mondrian Method to Achieve ℓ-diversity [Figure: the same tuples; cut at x = 5]
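For reference, the recursive median-cut idea behind Mondrian [LDR06] can be sketched as follows, using distinct ℓ-diversity (at least ℓ distinct SA values per group) as the condition that makes a cut allowable. This is a simplified sketch under stated assumptions: QI attributes are numeric, dimensions are tried in order of decreasing raw range (Mondrian itself normalizes ranges), and the records are hypothetical. Note that whether a cut is taken depends on the actual SA values on each side of it, which is the kind of QI*-SA dependence that QI*-independence rules out.

```python
from typing import List, Tuple

Record = Tuple[Tuple[float, ...], str]  # (QI vector, SA value)

def is_l_diverse(group: List[Record], ell: int) -> bool:
    """Distinct l-diversity: at least ell different SA values in the group."""
    return len({sa for _, sa in group}) >= ell

def qi_range(group: List[Record], d: int) -> float:
    values = [qi[d] for qi, _ in group]
    return max(values) - min(values)

def mondrian(group: List[Record], ell: int) -> List[List[Record]]:
    """Recursively cut at the median of the widest QI dimension while both
    halves stay l-diverse; return the resulting anonymous groups."""
    dims = range(len(group[0][0]))
    for d in sorted(dims, key=lambda dim: qi_range(group, dim), reverse=True):
        values = sorted(qi[d] for qi, _ in group)
        median = values[len(values) // 2]
        left = [r for r in group if r[0][d] < median]
        right = [r for r in group if r[0][d] >= median]
        # The cut is allowable only if both sides are non-empty and l-diverse.
        if left and right and is_l_diverse(left, ell) and is_l_diverse(right, ell):
            return mondrian(left, ell) + mondrian(right, ell)
    return [group]  # no allowable cut: publish this group as is

# Hypothetical 2-D QI points (x, y) with SA values, loosely echoing t1..t8 / S1..S5.
data = [((1, 1), "S1"), ((2, 6), "S2"), ((3, 3), "S1"), ((4, 8), "S3"),
        ((6, 2), "S4"), ((7, 7), "S5"), ((8, 4), "S4"), ((9, 9), "S5")]
groups = mondrian(data, ell=2)
for g in groups:
    print(sorted(sa for _, sa in g))
```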

  17. Look-Ahead Partitioning [Figure: tuples t1 to t8]

  18. Look-Ahead Partitioning [Figure: tuples t1 to t8 labeled with SA values S1 to S5]

  19. Look-Ahead Partitioning [Figure: the same tuples; cut at x = 5]

  20. Amendment Toolset • Look-Ahead Partitioning: Execute a partitioning step only if the worst (i.e., most skewed) scenario of QI-SA correlation is still eligible to achieve the given privacy guarantee (e.g., ℓ-diversity). • Can be extended to other algorithms such as Hilb [GKKM07], Incognito [LDR05], MASK [WFW+07], etc. • Limitation: May harm utility due to large groups. • Stratified Pick-up: Take the anonymous groups as input and attempt to further partition each group iteratively, based solely on the distinctness of SA values.
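A minimal sketch of the worst-case eligibility test, assuming distinct ℓ-diversity as the privacy guarantee and assuming a candidate cut is described only by the sizes of the parts it would create (the helper names are hypothetical). The most skewed scenario for a part of size k is the one that fills it with the group's most frequent SA values, so the test asks whether even that part keeps at least ℓ distinct values. The answer depends only on the group's SA multiset and the part sizes, so the decision to cut does not depend on how SA values are attached to individual QI values.

```python
from collections import Counter
from typing import Iterable, List

def min_distinct_in_part(sa_counts: List[int], k: int) -> int:
    """Fewest distinct SA values that any k tuples of the group can contain:
    the most skewed part is filled with the most frequent SA values first."""
    taken = distinct = 0
    for count in sorted(sa_counts, reverse=True):
        if taken >= k:
            break
        taken += count
        distinct += 1
    return distinct

def worst_case_eligible(group_sa: Iterable[str], part_sizes: List[int], ell: int) -> bool:
    """Allow a cut only if EVERY way of spreading the group's SA values over
    parts of these sizes leaves each part with at least ell distinct values."""
    sa = list(group_sa)
    assert sum(part_sizes) == len(sa), "part sizes must cover the group"
    counts = list(Counter(sa).values())
    return all(min_distinct_in_part(counts, k) >= ell for k in part_sizes)

# Hypothetical groups: the cut is judged on the SA multiset only.
print(worst_case_eligible(["HIV", "HIV", "HIV", "flu", "cold"], [2, 3], ell=2))  # False
print(worst_case_eligible(["S1", "S2", "S3", "S4"], [2, 2], ell=2))              # True
```

In the first call the actual data might well split into two 2-diverse halves, but because some assignment of the same multiset would not, the test refuses the cut; this conservatism is the source of the larger groups noted under Limitation.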

  21. Stratified Pick-Up [Figure: tuples t1 to t8 labeled with SA values S1 to S5]
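The slides describe Stratified Pick-up only at a high level, so the sketch below is one hypothetical reading of it, not the authors' procedure. Within an anonymous group, tuples are bucketed (stratified) by SA value, and new subgroups are formed by repeatedly picking one tuple from each of the ℓ largest strata, so every subgroup holds ℓ distinct SA values; tuples left over once fewer than ℓ strata remain are added to the last subgroup. Only SA values drive the decisions, as the slide requires; the records are hypothetical (QI, SA) pairs.

```python
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[Tuple[int, ...], str]  # (QI vector, SA value)

def stratified_pick_up(group: List[Record], ell: int) -> List[List[Record]]:
    """Hypothetical Stratified Pick-up: split one anonymous group into
    subgroups with at least ell distinct SA values each, using SA values only."""
    assert ell >= 1
    strata = defaultdict(list)  # SA value -> tuples carrying it
    for record in group:
        strata[record[1]].append(record)
    subgroups = []
    while True:
        nonempty = sorted((s for s in strata.values() if s), key=len, reverse=True)
        if len(nonempty) < ell:
            break
        # One tuple from each of the ell largest strata gives ell distinct SA values.
        subgroups.append([s.pop() for s in nonempty[:ell]])
    leftovers = [r for s in strata.values() for r in s]
    if not subgroups:  # the group cannot be split further
        return [list(group)]
    subgroups[-1].extend(leftovers)  # adding tuples never lowers diversity
    return subgroups

group = [((1, 1), "A"), ((2, 2), "A"), ((3, 3), "B"),
         ((4, 4), "B"), ((5, 5), "C"), ((6, 6), "D")]
for sub in stratified_pick_up(group, ell=2):
    print(sorted(sa for _, sa in sub))
```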

  22. Experiment Setup • Adult Dataset (http://archive.ics.uci.edu/ml/) • 45,222 tuples • SA: Education • Census Dataset (http://ipums.org) • 300K tuples • SA: Occupation

  23. Effect of Amendment Toolset

  24. Time Performance

  25. Conclusion • We show that algorithm-based disclosure is much more significant than previously studied. • We rigorously define the Algorithm-Safe Data Publishing (ASP) model. • We propose a screening tool for algorithm-based disclosure based on two necessary conditions. • We explore amendments to problematic methods (those "diagnosed" with algorithm-based disclosure).

  26. References
  • [WFW+07] Wong, R. C., Fu, A. W., Wang, K., and Pei, J. Minimality Attack in Privacy-Preserving Data Publishing.
  • [LLV07] Li, N., Li, T., and Venkatasubramanian, S. t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity.
  • [MGK+07] Machanavajjhala, A., Gehrke, J., Kifer, D., and Venkitasubramaniam, M. ℓ-Diversity: Privacy Beyond k-Anonymity.
  • [ZJB07] Zhang, L., Jajodia, S., and Brodsky, A. Information Disclosure under Realistic Assumptions: Privacy versus Optimality.
  • [GKKM07] Ghinita, G., Karras, P., Kalnis, P., and Mamoulis, N. Fast Data Anonymization with Low Information Loss.
  • [LDR06] LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. Mondrian Multidimensional k-Anonymity.
  • [LDR05] LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. Incognito: Efficient Full-Domain k-Anonymity.
  • [XT06] Xiao, X. and Tao, Y. Anatomy: Simple and Effective Privacy Preservation.

  27. Thank You
