This study compares randomization and K-degree anonymization schemes for privacy-preserving social network publishing, evaluating their utility preservation and resistance to identity disclosure and link privacy attacks. Real-world network data is analyzed to quantify risks and benefits of each scheme, highlighting the trade-offs involved in protecting privacy while maintaining data utility.
Comparisons of Randomization and K-degree Anonymization Schemes for Privacy Preserving Social Network Publishing Xiaowei Ying, Kai Pan, Xintao Wu, Ling Guo Univ. of North Carolina at Charlotte SNA-KDD June 28, 2009, Paris, France
Motivation • Privacy Preserving Social Network Publishing • node-anonymization • cannot guarantee identity/link privacy due to subgraph queries. • Backstrom et al. WWW07, Hay et al. UMass TR07 • edge randomization • Random Add/Del, Random Switch • K-anonymity generalization • Hay et al. VLDB08, K-degree Liu&Terzi SIGMOD08, Zhou&Pei ICDE08 • Utility preserving randomization • Spectral feature preserving Ying&Wu SDM08 • Real space feature preserving based on Markov Chain Ying&Wu SDM09, Hanhijarvi et al. SDM09
Motivation • Attacks based on Background Knowledge • Attributes of vertices • Vertex degrees • Specific link relationships between target individuals • Neighborhoods of target individuals • Embedded subgraphs • Graph metric
Focus • We quantify identity disclosure and link disclosure under vertex-degree attacks for Rand Add/Del. • Identity disclosure is measured as the probability of correctly linking a target individual to an anonymized node, given the degree of the target individual. • Link disclosure is measured as the probability that a sensitive link exists between two individuals, given their known degrees (details skipped). • We compare Rand Add/Del with K-degree generalization in terms of utility preservation (under the same privacy disclosure threshold, i.e., 1/K).
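A minimal sketch of the Rand Add/Del perturbation, assuming an undirected simple graph given as a node list and an edge list. The function name `rand_add_del` and the choice to draw the added edges only from pairs that were non-edges in the original graph are illustrative assumptions, not details taken from the talk.

```python
import random
from itertools import combinations


def rand_add_del(nodes, edges, k, seed=None):
    """Delete k random existing edges and add k random non-edges.

    Hypothetical helper: edges are undirected and returned as a set of
    frozensets, so the total number of edges is unchanged.
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    # All node pairs that are not connected in the original graph.
    non_edges = [frozenset(p) for p in combinations(nodes, 2)
                 if frozenset(p) not in edge_set]
    if k > len(edge_set) or k > len(non_edges):
        raise ValueError("k is too large for this graph")
    # Sort first so the sample is reproducible for a fixed seed.
    existing = sorted(edge_set, key=sorted)
    deleted = set(rng.sample(existing, k))
    added = set(rng.sample(non_edges, k))
    return (edge_set - deleted) | added
```

With `k = 2` on a five-edge path, the result still has five edges, three of which come from the original graph.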
Political books network Network of US political books (105 nodes, 441 edges) Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". http://www-personal.umich.edu/~mejn/netdata/
Re-identification risks • Applying Bayes' theorem • The attacker does not know the original degree distribution.
Estimate original degree sequence • [Figure: original degree sequence, degree sequence after randomization, and estimated degree sequence; add & delete 10% of edges]
Node re-identification risks • Nodes' prior and posterior risks, given an individual α with degree d_α and a randomized graph • Prior risk: • Posterior risk:
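As a point of reference, here is a minimal sketch of the simplest degree-based re-identification risk, assuming the attacker matches a known degree against the true (unperturbed) degrees. Each node's risk is one over the number of nodes sharing its degree. This is an illustrative baseline with hypothetical function names, not the prior or posterior risk defined in the talk.

```python
from collections import Counter


def degree_sequence(nodes, edges):
    """Return a dict mapping each node to its degree (undirected graph)."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def naive_degree_risk(nodes, edges):
    """Risk of node v = 1 / (number of nodes with the same degree as v)."""
    deg = degree_sequence(nodes, edges)
    counts = Counter(deg.values())
    return {v: 1.0 / counts[d] for v, d in deg.items()}
```

In a star graph the center is the only node with its degree, so its risk is 1, while each leaf hides among the other leaves.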
Re-identification risks • Re-identification risk decreases as k increases. • The Add/Del strategy can effectively reduce the risk.
Protection vs. randomization k • Node’s absolute and relative protection measures • Absolute measure • Relative measure
Comparison • K-degree generalization (Liu&Terzi SIGMOD08) • Construct a K-degree anonymous graph, in which every node has the same degree as at least K-1 other nodes. • Random Add/Del • Determine the perturbation magnitude k that satisfies identity disclosure < 1/K, and then perturb the graph using k.
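The K-degree anonymity condition above can be checked directly from a degree sequence. The sketch below only verifies the property; it does not implement the Liu&Terzi construction, and the function names are hypothetical.

```python
from collections import Counter


def is_k_degree_anonymous(degrees, K):
    """True if every degree value is shared by at least K nodes."""
    counts = Counter(degrees)
    return all(c >= K for c in counts.values())


def max_identity_disclosure(degrees):
    """Worst-case 1/(size of a degree group) over all nodes.

    Assumes the attacker knows only the target's degree and that
    `degrees` is non-empty.
    """
    counts = Counter(degrees)
    return 1.0 / min(counts.values())
```

A graph that is K-degree anonymous has a worst-case degree-based disclosure of at most 1/K, which is the threshold used for the comparison.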
Utility features • Largest eigenvalue of the adjacency matrix: λ1 • Second smallest eigenvalue of the Laplacian matrix: μ2 • Harmonic mean of shortest distance • Modularity (community structure) • Transitivity (clustering coefficient) • Subgraph centrality
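Two of the listed features can be computed with the standard library alone. The sketch below assumes the usual definitions: transitivity as the fraction of connected triples that are closed, and the harmonic mean of shortest distance as n(n-1) divided by the sum of 1/d(i,j) over ordered pairs, with unreachable pairs contributing zero. The function names are hypothetical, and the talk may use slightly different conventions.

```python
from collections import deque


def adjacency(nodes, edges):
    """Build an adjacency dict of sets for an undirected simple graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def transitivity(adj):
    """Closed triples / all connected triples (0.0 if there are none)."""
    closed = 0   # each triangle is counted once per centre node, i.e. 3 times
    triples = 0  # pairs of neighbours around each centre node
    for nbrs in adj.values():
        nb = list(nbrs)
        triples += len(nb) * (len(nb) - 1) // 2
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                if nb[j] in adj[nb[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


def harmonic_mean_distance(adj):
    """n(n-1) / sum of inverse shortest-path lengths over ordered pairs."""
    n = len(adj)
    inv_sum = 0.0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inv_sum += sum(1.0 / d for d in dist.values() if d > 0)
    return n * (n - 1) / inv_sum if inv_sum else float("inf")
```

Comparing these values before and after perturbation gives a simple measure of how much utility a scheme loses.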
Observation • Both Rand Add/Del and K-degree generalization decrease structural properties. • K-degree generally preserves structural features better. • K-degree chooses a subset of nodes (those that violate K-degree anonymity) for edge modification, while Rand Add/Del treats all nodes/edges equally for randomization. • We can improve Rand Add/Del by dividing the graph into blocks and applying randomization on each block. (next slide) • We expect Rand Add/Del to be more robust to other attacks. (ongoing work) • We expect that reconstruction methods can be designed on the purely randomized graph to recover features accurately. (ongoing work)
Conclusion • Quantify how well Rand Add/Del can protect node identity and link privacy under the vertex degree background knowledge attack • Compare with K-degree generalization scheme in terms of utility preservation Future Work • Other background knowledge attacks • Other randomization schemes • Reconstruction methods on the randomized graph
Thank You! Questions? Acknowledgments This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.