Preservation of Proximity Privacy in Publishing Numerical Sensitive Data J. Li, Y. Tao, and X. Xiao, SIGMOD '08 Presented by Hongwei Tian
Outline • What is PPDP • Existing Privacy Principles • Proximity Attack • (ε, m)-anonymity • Determine ε and m • Algorithm • Experiments and Conclusion
Privacy-Preserving Data Publishing (PPDP) • A true story from Massachusetts, 1997 • GIC released "anonymized" medical records of state employees • For 20 dollars, Latanya Sweeney bought the voter registration list and linked it to the GIC data by zip code, birth date, and sex • The result: the medical record of Governor Weld was re-identified
PPDP • Privacy • Sensitive information about individuals must be protected in the published data • Pushes toward more heavily anonymized data • Utility • The published data should remain useful • Pushes toward more accurate data
PPDP • Anonymization Techniques • Generalization • Specific value -> general value • Maintains the semantic meaning • 78256 -> 7825*, UTSA -> University, 28 -> [20, 30] (sketched below) • Perturbation • One value -> another random value • Huge information loss -> poor utility
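To make the generalization idea concrete, here is a minimal Python sketch; the mapping rules and the values are illustrative, not from the paper:

```python
# Generalization replaces a specific value with a semantically
# consistent, more general one. Illustrative rules only.

def generalize_zip(zipcode: str, keep: int = 4) -> str:
    """78256 -> 7825*: mask the trailing digits of a zip code."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

def generalize_age(age: int, width: int = 10) -> str:
    """28 -> [20, 30]: map an age to a coarser interval."""
    lo = (age // width) * width
    return f"[{lo}, {lo + width}]"

print(generalize_zip("78256"))  # 7825*
print(generalize_age(28))       # [20, 30]
```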
PPDP • Example of Generalization
Some Existing Privacy Principles • Generalization • SA – Categorical • k-anonymity • l-diversity, (α, k)-anonymity, m-invariance, … • (c, k)-safety, Skyline-privacy • … • SA – Numerical • (k, e)-anonymity, Variance Control • t-closeness • δ-presence • …
Next… • What is PPDP • Existing Privacy Principles • Proximity Attack • (ε, m)-anonymity • Determine εand m • Algorithm • Experiments and Conclusion
(ε, m)-anonymity • I(t) • the private neighborhood of tuple t • absolute: I(t) = [t.SA − ε, t.SA + ε] • relative: I(t) = [t.SA·(1 − ε), t.SA·(1 + ε)] • P(t) • the risk of a proximity breach for tuple t • P(t) = x / |G|, where x is the number of tuples in t's equivalence class G whose SA values fall in I(t)
(ε, m)-anonymity • Example (absolute, t1.SA = 1000): ε = 20 • I(t1) = [980, 1020] • x = 3, |G| = 4 • P(t1) = 3/4 (see the sketch below)
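A small Python sketch of the absolute definitions above; the SA values are made up, chosen so the numbers match the slide:

```python
def neighborhood(sa, eps):
    """Absolute private neighborhood I(t) = [t.SA - eps, t.SA + eps]."""
    return (sa - eps, sa + eps)

def breach_probability(t_sa, group_sa, eps):
    """P(t) = x / |G|, where x counts tuples of G whose SA falls in I(t)."""
    lo, hi = neighborhood(t_sa, eps)
    x = sum(1 for v in group_sa if lo <= v <= hi)
    return x / len(group_sa)

G = [1000, 990, 1010, 1100]                 # hypothetical SA values of G
print(neighborhood(1000, eps=20))           # (980, 1020)
print(breach_probability(1000, G, eps=20))  # 0.75, i.e. x = 3, |G| = 4
```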
(ε, m)-anonymity • Principle • Given a real value ε and an integer m ≥ 1, a generalized table T∗ fulfills absolute (relative) (ε, m)-anonymity if P(t) ≤ 1/m for every tuple t ∈ T. • Larger ε and m mean a stricter privacy requirement (a checker is sketched below)
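Checking the principle over a set of published groups is then a direct loop; a self-contained sketch using absolute neighborhoods and the hypothetical values from the previous example:

```python
def satisfies_em_anonymity(groups, eps, m):
    """True iff P(t) <= 1/m for every tuple t of every published group
    (absolute neighborhoods [t.SA - eps, t.SA + eps])."""
    for sa_values in groups:
        n = len(sa_values)
        for t in sa_values:
            x = sum(1 for v in sa_values if abs(v - t) <= eps)
            if x * m > n:  # equivalent to P(t) = x / n > 1 / m
                return False
    return True

# The group from the previous slide violates (20, 2)-anonymity: P = 3/4.
print(satisfies_em_anonymity([[1000, 990, 1010, 1100]], eps=20, m=2))  # False
```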
(ε, m)-anonymity • What is the meaning of m? • |G| ≥ m • The best situation: for any two tuples ti and tj in G, tj.SA ∉ I(ti) and ti.SA ∉ I(tj) • Similar to l-diversity when the equivalence class has l tuples with distinct SA values
(ε, m)-anonymity • How to ensure that tj.SA does not fall in I(ti)? • Sort all tuples in G in ascending order of their SA values • Then it suffices that | j – i | ≥ max{ |left(tj,G)|, |right(ti,G)| }, where left(t,G) / right(t,G) denote the tuples on t's left / right in the sorted order whose SA values fall in I(t)
(ε, m)-anonymity • Let maxsize(G) = max∀t∈G { max{ |left(t,G)|, |right(t,G)| } } • | j – i | ≥ maxsize(G)
(ε, m)-anonymity • Partitioning • Sort the tuples in G in ascending order of their SA values • Hash the i-th tuple into the j-th bucket using j = (i mod maxsize(G)) + 1 • Thus no two tuples (SA values) in the same bucket fall into each other's neighborhood (sketched below)
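A runnable sketch of the sort-and-hash partitioning. One caveat: for the slide's hash j = (i mod maxsize(G)) + 1 to guarantee separation at the boundaries, I read |left(t,G)| and |right(t,G)| as counting t itself; whether that matches the paper's exact definition is an assumption.

```python
def maxsize(sa_sorted, eps):
    """maxsize(G): the largest number of tuples (counting t itself) that
    lie inside I(t) on one side of t in the SA-sorted order."""
    g = 1
    for i, t in enumerate(sa_sorted):
        left = sum(1 for v in sa_sorted[:i + 1] if v >= t - eps)  # incl. t
        right = sum(1 for v in sa_sorted[i:] if v <= t + eps)     # incl. t
        g = max(g, left, right)
    return g

def partition(sa_values, eps):
    """Sort G by SA, then hash the i-th tuple into bucket
    j = (i mod maxsize(G)) + 1; same-bucket tuples end up at least
    maxsize(G) positions apart and never fall in each other's I(t)."""
    sa_sorted = sorted(sa_values)
    g = maxsize(sa_sorted, eps)
    buckets = {}
    for i, v in enumerate(sa_sorted, start=1):
        buckets.setdefault(i % g + 1, []).append(v)
    return buckets

print(partition([10, 40, 20, 25, 50, 30], eps=6))
# {2: [10, 25, 40], 1: [20, 30, 50]} -> each bucket gives P(t) = 1/3
```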
(ε, m)-anonymity • (6, 2)-anonymity • Privacy is breached: P(t3) = 3/4 > 1/m = 1/2 • Partitioning is needed • An ascending order by SA values is already in place • g = maxsize(G) = 2 • j = (i mod 2) + 1 • New P(t3) = 1/2
Determine ε and m • Given ε and m • Check whether an equivalence class G satisfies (ε, m)-anonymity • Theorem: G has at least one (ε, m)-anonymous generalization iff m · maxsize(G) ≤ |G| • Scan the sorted tuples in G once to find maxsize(G) • This predicts whether G can be partitioned or not (see the sketch below)
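With maxsize from the partitioning sketch above, the feasibility test becomes a one-liner. The slide leaves the iff-condition blank; m · maxsize(G) ≤ |G| is my reconstruction (it makes every hash bucket hold at least m tuples, and it is tight on the (6, 2) example, where 2 · 2 ≤ 4):

```python
def can_partition(sa_values, eps, m):
    """Reconstructed theorem: G admits an (eps, m)-anonymous partitioning
    iff m * maxsize(G) <= |G| (each bucket then has >= m tuples)."""
    return m * maxsize(sorted(sa_values), eps) <= len(sa_values)

print(can_partition([10, 40, 20, 25, 50, 30], eps=6, m=2))  # True
print(can_partition([10, 40, 20, 25, 50, 30], eps=6, m=4))  # False: 4*2 > 6
```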
Algorithm • Step 1: Splitting • Mondrian, ICDE 2006 • Splitting is based only on QI attributes • Iteratively find the median of the frequency sets on one selected QI dimension to cut G into G1 and G2, making sure G1 and G2 both remain legal to partition (a simplified sketch follows)
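A simplified sketch of the splitting step, reusing can_partition from above; tuples are (qi_vector, sa_value) pairs and dims lists the QI dimension indices. The real Mondrian chooses dimensions and frequency-set medians more carefully; this version just tries each dimension in turn:

```python
def split(tuples, eps, m, dims):
    """Top-down Mondrian-style splitting (simplified). Cut the group at
    the median of a QI dimension as long as both halves stay legal,
    i.e. both can still be partitioned into (eps, m)-anonymous buckets."""
    for d in dims:
        tuples = sorted(tuples, key=lambda t: t[0][d])  # order by QI dim d
        mid = len(tuples) // 2
        g1, g2 = tuples[:mid], tuples[mid:]
        if g1 and g2 \
           and can_partition([t[1] for t in g1], eps, m) \
           and can_partition([t[1] for t in g2], eps, m):
            return split(g1, eps, m, dims) + split(g2, eps, m, dims)
    return [tuples]  # no legal cut on any dimension: stop splitting
```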
Algorithm • Splitting ((6, 2)-anonymity) [figure: example tuples with SA values 10, 40, 20, 25, 50, 30]
Algorithm • Step 2: Partitioning • After Step 1 stops • Check every group G produced by splitting • Release G directly if it satisfies (ε, m)-anonymity • Otherwise partition G, then release the new buckets (the two steps are combined in the sketch below)
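Putting the two steps together, sketched with the helpers defined earlier (satisfies_em_anonymity, can_partition, split, partition); the micro-table is hypothetical:

```python
def publish(table, eps, m, dims):
    """Step 1: split on QI attributes. Step 2: for each resulting group,
    release its SA values directly if the group already satisfies
    (eps, m)-anonymity; otherwise partition it and release the buckets."""
    released = []
    for g in split(table, eps, m, dims):
        sa = [t[1] for t in g]
        if satisfies_em_anonymity([sa], eps, m):
            released.append(sa)
        else:
            released.extend(partition(sa, eps).values())
    return released

# Hypothetical micro-table: ((age,), income) pairs, (6, 2)-anonymity.
table = [((25,), 10), ((31,), 40), ((27,), 20), ((40,), 25),
         ((38,), 50), ((52,), 30)]
print(publish(table, eps=6, m=2, dims=[0]))  # groups/buckets of SA values
```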
Algorithm • Partitioning ((6, 2)-anonymity) [figure: the same example, SA values 10, 40, 20, 25, 50, 30, hashed into buckets]
Next… • What is PPDP • Existing Privacy Principles • Proximity Attack • (ε, m)-anonymity • Determine ε and m • Algorithm • Experiments and Conclusion
Experiments • Real database SAL (http://ipums.org) • Attributes: Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively • 500K tuples • Compared against a perturbation method (OLAP, SIGMOD 2005)
Experiments - Utility • Count queries, with a workload of 1000 queries (a sketch of the evaluation follows)
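The slides don't show the query details; below is a hedged sketch of how count-query utility is typically measured on generalized data, assuming a uniform-spread estimate over each generalized interval (a common convention, not necessarily the paper's exact estimator). All data and query ranges are made up:

```python
import random

random.seed(1)

# Hypothetical published groups: (generalized Age interval, group size).
groups = [((20, 30), 3), ((25, 45), 4), ((40, 60), 5)]
# Hypothetical original Age values behind those groups.
original = [21, 24, 29, 26, 33, 38, 44, 41, 47, 52, 55, 59]

def true_count(lo, hi):
    return sum(1 for a in original if lo <= a <= hi)

def est_count(lo, hi):
    """Uniform-spread estimate: each group contributes in proportion to
    the overlap between its generalized interval and the query range."""
    est = 0.0
    for (glo, ghi), size in groups:
        overlap = max(0, min(ghi, hi) - max(glo, lo))
        est += size * overlap / (ghi - glo)
    return est

# A workload of 1000 random count queries, as in the experiments.
errors = []
for _ in range(1000):
    lo = random.randint(16, 75)
    hi = lo + random.randint(1, 15)
    t = true_count(lo, hi)
    if t > 0:
        errors.append(abs(est_count(lo, hi) - t) / t)
print(f"average relative error over {len(errors)} answerable queries:",
      sum(errors) / len(errors))
```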
Conclusion • Discussed most of the existing privacy principles in PPDP • Identified the proximity attack and proposed (ε, m)-anonymity to prevent it • Verified experimentally that the method is effective and efficient