
Balancing Privacy and Utility: Comparing Randomization and K-Degree Anonymization in Social Network Publishing

This study compares randomization and K-degree anonymization schemes for privacy-preserving social network publishing, evaluating their utility preservation and resistance to identity disclosure and link privacy attacks. Real-world network data is analyzed to quantify risks and benefits of each scheme, highlighting the trade-offs involved in protecting privacy while maintaining data utility.



Presentation Transcript


  1. Comparisons of Randomization and K-degree Anonymization Schemes for Privacy Preserving Social Network Publishing Xiaowei Ying, Kai Pan, Xintao Wu, Ling Guo Univ. of North Carolina at Charlotte SNA-KDD June 28, 2009, Paris, France

  2. Motivation • Privacy Preserving Social Network Publishing • node-anonymization • cannot guarantee identity/link privacy due to subgraph queries. • Backstrom et al. WWW07, Hay et al. UMass TR07 • edge randomization • Random Add/Del, Random Switch • K-anonymity generalization • Hay et al. VLDB08, K-degree Liu&Terzi SIGMOD08, Zhou&Pei ICDE08 • Utility preserving randomization • Spectral feature preserving Ying&Wu SDM08 • Real space feature preserving based on Markov Chain Ying&Wu SDM09, Hanhijarvi et al. SDM09

  3. Motivation • Attacks based on Background Knowledge • Attributes of vertices • Vertex degrees • Specific link relationships between target individuals • Neighborhoods of target individuals • Embedded subgraphs • Graph metric

  4. Focus • We quantify identity disclosure and link disclosure under vertex degree attacks for Rand Add/Del. • Identity disclosure is measured as the prob. of correctly linking a target individual to an anonymized node given the degree of the target individual. • Link disclosure is measured as the prob. that a sensitive link exists between two individuals given their known degrees (details skipped). • We compare Rand Add/Del with K-degree generalization in terms of utility preservation (under the same privacy disclosure threshold, i.e., 1/K).
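To make the identity disclosure measure concrete, here is a minimal sketch of the degree-based risk on a graph that has only been node-anonymized (no edge perturbation). It assumes the attacker picks uniformly among all nodes whose degree matches the target's, so the risk is 1 / (number of nodes with that degree). This baseline, the function name `degree_reidentification_risk`, and the toy edge list are illustrative assumptions, not the formulas from the slides.

```python
from collections import Counter


def degree_reidentification_risk(edges):
    """Map each node to 1 / (number of nodes sharing its degree).

    Assumption: the attacker knows only the target's degree and guesses
    uniformly among the nodes with that degree. Isolated nodes do not
    appear in the edge list and are therefore not scored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    same_degree = Counter(degree.values())  # degree value -> how many nodes have it
    return {node: 1.0 / same_degree[d] for node, d in degree.items()}


# Hypothetical toy graph: "a" has degree 3, "b" and "c" degree 2, "d" degree 1.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
risk = degree_reidentification_risk(toy_edges)
```

Nodes with a unique degree (here "a" and "d") are re-identified with certainty, which is the situation randomization or K-degree generalization is meant to prevent.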

  5. Political books network • Network of US political books (105 nodes, 441 edges) • Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". http://www-personal.umich.edu/~mejn/netdata/

  6. Degree variation due to randomization

  7. Re-identification risks • Applying Bayes' theorem • The attacker does not know the original degree distribution.

  8. Estimate original degree sequence • [Figure: original degree sequence, degree sequence after randomization, and estimated degree sequence; add & delete 10% of edges]

  9. Node re-identification risks • Nodes’ prior and posterior risks Given an individual α with degree dα and a randomized graph • Prior risk: • Posterior risks

  10. Re-identification risks • Re-identification risk decreases as k increases. • The Add/Del strategy can efficiently reduce the risk.

  11. Protection vs. randomization k • Node’s absolute and relative protection measures • Absolute measure • Relative measure

  12. Comparison • K-degree generalization (Liu&Terzi SIGMOD08) • Construct a K-degree anonymous graph, in which every node shares its degree with at least K-1 other nodes. • Random Add/Del • Determine the perturbation magnitude k that satisfies identity disclosure < 1/K, and then perturb the graph using k.
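The two schemes compared above can be sketched as follows. The first function checks the K-degree anonymity condition stated on the slide. The second is a minimal Rand Add/Del sketch, assuming the scheme adds k false edges and deletes k true edges chosen uniformly at random, so the edge count is preserved. The function names, the `seed` parameter, and the 4-node ring are illustrative assumptions; the procedure for choosing k from the 1/K threshold is not shown in the transcript and is not reproduced here.

```python
import random
from collections import Counter
from itertools import combinations


def is_k_degree_anonymous(nodes, edges, K):
    """True if every node shares its degree with at least K-1 other nodes."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())  # degree value -> number of nodes
    return all(c >= K for c in counts.values())


def rand_add_del(nodes, edges, k, seed=None):
    """Delete k true edges and add k false edges, chosen uniformly at random.

    Assumption: the graph is simple and undirected; edges are returned
    as a set of frozensets.
    """
    rng = random.Random(seed)
    present = {frozenset(e) for e in edges}
    absent = [frozenset(p) for p in combinations(nodes, 2)
              if frozenset(p) not in present]
    if k > len(present) or k > len(absent):
        raise ValueError("k exceeds the number of available edges or non-edges")
    # Sort before sampling so a fixed seed gives a reproducible result.
    deleted = set(rng.sample(sorted(present, key=sorted), k))
    added = set(rng.sample(absent, k))
    return (present - deleted) | added


# Hypothetical 4-node ring: every node has degree 2.
ring_nodes = [0, 1, 2, 3]
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
original = {frozenset(e) for e in ring_edges}
perturbed = rand_add_del(ring_nodes, ring_edges, k=1, seed=0)
```

In the ring, all four nodes share degree 2, so the graph is 4-degree anonymous without any modification. Rand Add/Del still perturbs it because it treats all edges equally rather than targeting nodes that violate the anonymity condition.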

  13. Utility features • Largest eigenvalue of the adjacency matrix: λ1 • Second smallest eigenvalue of the Laplacian matrix: μ2 • Harmonic mean of shortest distance • Modularity (community structure) • Transitivity (clustering coefficient) • Subgraph centrality
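Two of the utility features listed above can be computed with the standard library alone. The sketch below assumes the usual definitions: transitivity as 3 × (number of triangles) / (number of connected triples), and the harmonic mean of shortest distance as the inverse of the average of 1/d(i, j) over all ordered node pairs, with unreachable pairs contributing 0. The slides do not show their exact formulas, so these definitions and all function names are assumptions.

```python
from collections import deque
from itertools import combinations


def adjacency(nodes, edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def transitivity(adj):
    """3 * triangles / connected triples (0.0 if there are no triples)."""
    closed = 0   # each triangle is counted once at each of its 3 corners
    triples = 0  # paths of length 2, counted at their centre node
    for nbrs in adj.values():
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0


def harmonic_mean_distance(adj):
    """n(n-1) / sum over ordered pairs of 1/d(i, j), using BFS distances."""
    n = len(adj)
    inv_sum = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inv_sum += sum(1.0 / d for d in dist.values() if d > 0)
    return n * (n - 1) / inv_sum if inv_sum else float("inf")


# Hypothetical toy graphs: a triangle and a 3-node path.
triangle = adjacency([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
path = adjacency([0, 1, 2], [(0, 1), (1, 2)])
```

Computing these values on the original graph and again on the perturbed graph gives a simple way to quantify how much structure a given anonymization scheme destroys.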

  14. Observation • Both Rand Add/Del and K-degree generalization degrade structural properties. • K-degree generally preserves structural features better. • K-degree modifies edges only for the subset of nodes that violate K-degree anonymity, while Rand Add/Del treats all nodes/edges equally for randomization. • We can improve Rand Add/Del by dividing the graph into blocks and applying randomization on each block. (next slide) • We expect Rand Add/Del to be more robust to other attacks. (ongoing work) • We expect reconstruction methods can be designed on the purely randomized graph to recover features accurately. (ongoing work)

  15. Block Add/Del

  16. Conclusion • Quantify how well Rand Add/Del can protect node identity and link privacy under the vertex degree background knowledge attack • Compare with K-degree generalization scheme in terms of utility preservation Future Work • Other background knowledge attacks • Other randomization schemes • Reconstruction methods on the randomized graph

  17. Thank You! Questions? Acknowledgments This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.
