Algorithm Safe Privacy-Preserving Data Publishing

Xin Jin (George Washington University), Nan Zhang (George Washington University), Gautam Das (University of Texas at Arlington)

Outline • Introduction • Algorithm-safe Data Publishing Model • Amendment Toolset: Look-ahead Partitioning and Stratified Pick-up • Experimental Results • Conclusion

Privacy-Preserving Data Publishing • Share individual records to enable analytical tasks (e.g., aggregate query answering, data mining) while protecting individuals' private information.

What is Algorithm-based Disclosure? • Algorithm-based disclosure arises in existing methods (e.g., [WFW+07] [LLV07] [MGK+07]). • An example using ℓ-diversity. [Figure: a 2-diversity table]
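As a reference point for the ℓ-diversity example, the check below is a minimal sketch that assumes the simplest variant, distinct ℓ-diversity (every anonymous group carries at least ℓ distinct SA values). The function name `is_l_diverse` and the `(group_id, sa_value)` row layout are hypothetical choices for illustration, not taken from the slides.

```python
from collections import defaultdict


def is_l_diverse(rows, l):
    """Return True if every group has at least `l` distinct SA values.

    rows: iterable of (group_id, sa_value) pairs (hypothetical layout).
    Assumes *distinct* l-diversity; other variants (entropy, recursive)
    would need a different test.
    """
    distinct_sa = defaultdict(set)
    for group_id, sa_value in rows:
        distinct_sa[group_id].add(sa_value)
    return all(len(values) >= l for values in distinct_sa.values())


published = [
    ("g1", "HIV"), ("g1", "flu"),
    ("g2", "flu"), ("g2", "cold"),
]
print(is_l_diverse(published, 2))
```

A table that passes this check can still leak information once the adversary reasons about how the publisher chose the groups, which is the point of the slides that follow.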

What If an Adversary Knows the Algorithm? [Figure: published table, 1st conjectured original data, better output table]

What If an Adversary Knows the Algorithm? [Figure: published table, 2nd conjectured original data, better output table]

What If an Adversary Knows the Algorithm? [Figure: published table, 3rd conjectured original data]

Algorithm-safe Data Publishing (ASP) • Q: How likely is it that Eve has HIV? [Figure: a naïve user answers from the published table and background knowledge; a smart user also uses the algorithm; the two answers are equal]

Algorithm-safe Data Publishing (ASP) • Problem Definition: For each tuple (i.e., row) ti = <q, s> in the original data T, the following holds: Pr{ti[SA] = s' | ti[QI] = q, K} = Pr{ti[SA] = s' | ti[QI] = q, K, A} for each s' in the domain of SA, where K is the background knowledge and A is the data publishing algorithm.
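To make the two sides of this equality concrete, the following sketch enumerates every original table consistent with a small published group and compares the adversary's posterior with and without knowledge of the algorithm. Everything in it is an assumption made for illustration: the five tuples, the uniform prior over candidate tables, the treatment of the published table as part of K, and the deliberately unsafe `toy_publish` algorithm (publish per-QI groups if each has 2 distinct SA values, otherwise merge them). It is not the authors' example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy data: QI value of each tuple, and the SA multiset
# published for the single merged ("coarse") group.
QI = {"t1": "a", "t2": "a", "t3": "b", "t4": "b", "t5": "b"}
PUBLISHED_SA = ["HIV", "HIV", "flu", "flu", "flu"]
L = 2


def toy_publish(assignment):
    """Hypothetical algorithm A: 'fine' if every QI group already has
    at least L distinct SA values, otherwise 'coarse' (merge all)."""
    groups = {}
    for tuple_id, qi_value in QI.items():
        groups.setdefault(qi_value, []).append(assignment[tuple_id])
    fine_ok = all(len(set(values)) >= L for values in groups.values())
    return "fine" if fine_ok else "coarse"


def candidate_tables():
    """All distinct ways to assign the published SA values to tuples."""
    tuple_ids = sorted(QI)
    return [dict(zip(tuple_ids, p)) for p in set(permutations(PUBLISHED_SA))]


def posterior(target, value, know_algorithm):
    """Pr{target[SA] = value}, assuming a uniform prior over candidates."""
    worlds = candidate_tables()
    if know_algorithm:
        # Keep only originals for which A would really emit 'coarse'.
        worlds = [w for w in worlds if toy_publish(w) == "coarse"]
    hits = sum(1 for w in worlds if w[target] == value)
    return Fraction(hits, len(worlds))


print(posterior("t1", "HIV", know_algorithm=False))  # 2/5
print(posterior("t1", "HIV", know_algorithm=True))   # 1/4
```

Because the two posteriors differ, this toy algorithm violates the ASP condition: observing that the publisher fell back to the merged group tells the adversary that at least one QI group was not diverse.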

Necessary Condition #1: QI*-Independence • QI*-Independence: The generated QI* is conditionally independent of the original SA, given a combination of QI and the published SA*. [Figure: data publisher, oracle, original data, "query SA by QI" query, safe QI-SA correlation, ASP published table]

Necessary Condition #2: SA*-Independence • SA*-Independence: The generated SA* is conditionally independent of the original SA, given a combination of QI, QI* and the impossible QI-SA correlation. [Figure: data publisher, oracle, original data, "query SA by QI" query, safe QI-SA correlation, perturbed safe QI-SA correlation, impossible QI-SA correlation, ASP published table]

How to Achieve the ASP Model? • Play the Role of Oracle • Satisfy QI*-Independence • Never perturb SA • Worst-case Eligibility Test • Look-ahead partitioning

A Mondrian Method [LDR06] to Achieve ℓ-diversity (ℓ = 2) [Figure: tuples t1 to t8 plotted in the QI space with SA values S1 to S5; successive frames show cuts at x = 5 and y = 5]

Look-Ahead Partitioning [Figure: the same tuples t1 to t8 with SA values S1 to S5; the final frame shows a cut at x = 5]

Amendment Toolset • Look-Ahead Partitioning: Execute a partitioning only if the worst-case (i.e., most skewed) scenario of QI-SA correlation still achieves the given privacy guarantee (e.g., ℓ-diversity). • Can be extended to other algorithms such as Hilb [GKKM07], Incognito [LDR05], MASK [WFW+07], etc. • Limitation: May harm utility because it produces large groups. • Stratified Pick-up: Take the anonymous groups as input and attempt to further partition each group iteratively, based solely on the distinctness of SA values.
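The worst-case eligibility test behind look-ahead partitioning can be sketched as follows. The sketch assumes distinct ℓ-diversity and a binary cut whose side sizes are fixed by QI alone; the helper names `min_distinct` and `worst_case_eligible` are hypothetical. The key property is that the decision uses only the sizes of the two sides and the group's SA multiset, never which tuple holds which SA value.

```python
from collections import Counter


def min_distinct(sa_counts, size):
    """Fewest distinct SA values that any `size` tuples of the group
    can carry: fill the subset from the most frequent values first."""
    covered = 0
    ordered = sorted(sa_counts.values(), reverse=True)
    for k, count in enumerate(ordered, start=1):
        covered += count
        if covered >= size:
            return k
    raise ValueError("size exceeds the number of tuples in the group")


def worst_case_eligible(sa_values, left_size, l):
    """Allow a cut only if BOTH sides keep >= l distinct SA values under
    the most skewed possible assignment of SA values to the two sides."""
    right_size = len(sa_values) - left_size
    if left_size <= 0 or right_size <= 0:
        return False
    counts = Counter(sa_values)
    return (min_distinct(counts, left_size) >= l
            and min_distinct(counts, right_size) >= l)


# A skewed group: some 2-tuple side could hold two identical values.
print(worst_case_eligible(["HIV", "HIV", "flu", "flu", "flu"], 2, 2))  # False
# All values distinct: any 2/2 cut is safe for l = 2.
print(worst_case_eligible(["S1", "S2", "S3", "S4"], 2, 2))             # True
```

The test is conservative by design: it rejects cuts that would in fact be fine for the actual data, which is consistent with the limitation noted above that groups may end up larger than necessary.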

Stratified Pick-Up [Figure: tuples t1 to t8 with SA values S1 to S5]
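One possible reading of stratified pick-up is sketched below; it is an interpretation for illustration, not the authors' exact procedure. It assumes distinct ℓ-diversity, buckets a group's tuples by SA value, and repeatedly picks one tuple from each of the ℓ fullest buckets to form a smaller sub-group. Leftover tuples are spread round-robin over the sub-groups, which cannot reduce the number of distinct values in any of them. The function name and the `(tuple_id, sa_value)` layout are hypothetical.

```python
from collections import defaultdict


def stratified_pickup(group, l):
    """Split one anonymous group into sub-groups that each keep at
    least `l` distinct SA values, looking only at SA values.

    group: list of (tuple_id, sa_value) pairs (hypothetical layout).
    """
    buckets = defaultdict(list)
    for row in group:
        buckets[row[1]].append(row)

    subgroups = []
    while True:
        nonempty = sorted((b for b in buckets.values() if b),
                          key=len, reverse=True)
        if len(nonempty) < l:
            break
        # Pick one tuple from each of the l fullest strata.
        subgroups.append([bucket.pop() for bucket in nonempty[:l]])

    if not subgroups:
        return [list(group)]  # cannot split; keep the group as it is

    leftovers = [row for bucket in buckets.values() for row in bucket]
    for i, row in enumerate(leftovers):
        subgroups[i % len(subgroups)].append(row)
    return subgroups


group = [("t1", "HIV"), ("t2", "flu"), ("t3", "HIV"),
         ("t4", "flu"), ("t5", "cold")]
for sub in stratified_pickup(group, 2):
    print(sub)
```

Drawing from the fullest buckets first keeps the strata balanced for longer, so more sub-groups can be formed before fewer than ℓ non-empty buckets remain.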

Experiment Setup • Adult Dataset (http://archive.ics.uci.edu/ml/) • 45,222 tuples • SA: Education. • Census Dataset (http://ipums.org) • 300K tuples • SA: Occupation

Effect of Amendment Toolset

Time Performance

Conclusion • We show that algorithm-based disclosure is more significant than previously studied. • We rigorously define the Algorithm-Safe data Publishing (ASP) model. • We propose a screening tool for algorithm-based disclosure based on two necessary conditions. • We explore amendments to problematic methods (those “diagnosed” with algorithm-based disclosure).

References
[WFW+07] Wong, R. C. and Fu, A. W. and Wang, K. and Pei, J. Minimality Attack in Privacy-Preserving Data Publishing.
[LLV07] Li, N. and Li, T. and Venkatasubramanian, S. t-Closeness: Privacy Beyond k-anonymity and ℓ-diversity.
[MGK+07] Machanavajjhala, A. and Gehrke, J. and Kifer, D. and Venkitasubramaniam, M. ℓ-diversity: Privacy Beyond k-anonymity.
[ZJB07] Zhang, L. and Jajodia, S. and Brodsky, A. Information Disclosure under Realistic Assumptions: Privacy versus Optimality.
[GKKM07] Ghinita, G. and Karras, P. and Kalnis, P. and Mamoulis, N. Fast Data Anonymization with Low Information Loss.
[LDR06] LeFevre, K. and DeWitt, D. J. and Ramakrishnan, R. Mondrian Multidimensional k-anonymity.
[LDR05] LeFevre, K. and DeWitt, D. J. and Ramakrishnan, R. Incognito: Efficient Full-domain k-anonymity.
[XT06] Xiao, X. and Tao, Y. Anatomy: Simple and Effective Privacy Preservation.

Thank You
