This paper explores the challenges of publishing social networks while preserving privacy. It presents graph generation techniques that preserve data utility while satisfying feature constraints. The paper discusses both uniform switch procedures and graph generators with feature range and distribution constraints. The privacy risks introduced by these constraints are examined, and potential attacks on the released graph are discussed. The paper concludes with a summary of the graph generation techniques and their implications for privacy.
Graph Generation with Prescribed Feature Constraints Xiaowei Ying, Xintao Wu Univ. of North Carolina at Charlotte 2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada
Motivation Publishing social networks: Privacy vs. Utility • Privacy issue: anonymization is not enough Active/passive attacks [Backstrom et al., WWW07] Subgraph attacks [M. Hay et al., VLDB08] • K-anonymity in social networks [B. Zhou et al., ICDE08] [K. Liu et al., SIGMOD08] • Randomization approach Local topology is changed – reduces re-identification risk Links are randomized – link privacy is protected
Motivation Publishing social networks: Privacy vs. Utility • Randomization approach -- Pure randomization cannot preserve many topological features [Ying SDM08]: -- the largest eigenvalue of the adjacency matrix -- the second smallest eigenvalue of the Laplacian matrix -- harmonic mean of shortest distance -- transitivity How to generate graphs preserving data utility?
Motivation • Generate graphs for testing data mining results -- Generate a set of graph samples s.t. a feature of the samples satisfies a specified distribution.
Switch and Uniform Graph Generator Uniform switch procedure [Taylor, 1981] -- Preserves the degree sequence/distribution • Accessibility: every graph with the given degree sequence can be reached • Uniformity: all such graphs are generated with the same probability • Application: empirically learning the properties of graph features given the degree seq.
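A minimal sketch of one switch step, assuming the usual formulation: pick two edges (a, b) and (c, d), rewire them to (a, d) and (c, b), and skip the move if it would create a self-loop or a duplicate edge. The names `switch_once` and `degrees` and the edge-set representation (tuples with the smaller node first) are illustrative choices, not taken from the slides.

```python
import random


def norm(u, v):
    """Store an undirected edge as a tuple with the smaller node first."""
    return (u, v) if u < v else (v, u)


def switch_once(edges, rng):
    """Try one degree-preserving switch in place; return True if the graph changed."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    # Flip the second edge half of the time so both rewirings can be proposed.
    if rng.random() < 0.5:
        c, d = d, c
    if len({a, b, c, d}) < 4:
        return False  # the two edges share a node: rewiring would be degenerate
    e1, e2 = norm(a, d), norm(c, b)
    if e1 in edges or e2 in edges:
        return False  # would create a duplicate edge
    edges.difference_update({norm(a, b), norm(c, d)})
    edges.update({e1, e2})
    return True


def degrees(edges):
    """Degree of every node that appears in the edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg
```

Each accepted switch removes one edge end and adds one edge end at each of the four nodes, so every node keeps its degree.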
Graph Generator with FRC How to generate a graph: • with the given degree sequence • with the feature range constraint (FRC): uniformity for accessible graphs
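One way to sketch a generator with a feature range constraint is to run the switch procedure and reject any switch that moves the feature outside the range. This is an assumption about the mechanism, not a transcription of the authors' algorithm. The triangle count stands in for the feature S only because it is easy to compute; `propose_switch`, `triangle_count` and `frc_walk` are hypothetical names.

```python
import random
from itertools import combinations


def propose_switch(edges, rng):
    """Return a new edge set after one degree-preserving switch, or None if invalid."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:
        c, d = d, c
    if len({a, b, c, d}) < 4:
        return None
    e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if e1 in edges or e2 in edges:
        return None
    return (edges - {tuple(sorted((a, b))), tuple(sorted((c, d)))}) | {e1, e2}


def triangle_count(edges):
    """Stand-in feature S: number of triangles (edges are (small, large) tuples)."""
    nodes = sorted({u for e in edges for u in e})
    return sum(
        1
        for x, y, z in combinations(nodes, 3)
        if (x, y) in edges and (y, z) in edges and (x, z) in edges
    )


def frc_walk(edges, feature, lo, hi, steps, rng):
    """Random walk over graphs with fixed degrees and lo <= feature(G) <= hi."""
    current = set(edges)
    if not lo <= feature(current) <= hi:
        raise ValueError("starting graph violates the feature range constraint")
    for _ in range(steps):
        candidate = propose_switch(current, rng)
        # Keep the current graph when the switch is invalid or leaves the range.
        if candidate is not None and lo <= feature(candidate) <= hi:
            current = candidate
    return current
```

Because the proposal is symmetric and rejected moves leave the graph unchanged, such a walk is uniform only over the graphs it can actually reach inside the range, which matches the "uniformity for accessible graphs" wording on the slide.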
Graph Generator and Privacy Issues Privacy risks introduced by FRC Attackers know: • The released graph preserves the true degree sequence • The true graph has its feature S within range R What can attackers do? With the released graph, attackers can explore the graph space
Graph Generator and Privacy Issues • Graph space: {G: with the given degree seq. & S(G) ∈ R} • Uniformly sample the graph space: Attacker's confidence on link (i,j)
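The formula for the attacker's confidence did not survive on this slide. A natural estimator, assumed here, is the fraction of uniformly sampled graphs that contain the link (i,j). The sketch takes the samples as a list of edge sets; `link_confidence` is a hypothetical name.

```python
from collections import Counter


def link_confidence(samples):
    """Estimate confidence on each link as the fraction of samples containing it.

    `samples` is a list of edge sets; each edge is a (small, large) node tuple.
    Node pairs that never appear are absent from the result (confidence 0).
    """
    counts = Counter()
    for edges in samples:
        counts.update(edges)
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}


# Toy input: four sampled graphs on nodes 0..2 (made up for illustration).
samples = [
    {(0, 1), (1, 2)},
    {(0, 1), (0, 2)},
    {(0, 1), (1, 2)},
    {(1, 2), (0, 2)},
]
confidence = link_confidence(samples)
```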
FRC Can Jeopardize Privacy -- A real network example Network of US political books (105 nodes, 441 edges) Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". http://www-personal.umich.edu/~mejn/netdata/
FRC Can Jeopardize Privacy -- A real network example Polbook network 105 nodes, 441 edges The attacker simply takes the t node pairs with the highest probabilities as candidate links Top candidates can seriously jeopardize privacy! Some features jeopardize privacy, while others do not
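The top-t attack can be sketched as a sort over per-pair probabilities. The `hit_rate` helper, which measures how many candidates are true links, is an assumed evaluation metric and not one confirmed by the slides; the probabilities below are made-up toy values.

```python
def top_candidates(confidence, t):
    """Return the t node pairs with the highest confidence (ties broken by pair)."""
    ranked = sorted(confidence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:t]]


def hit_rate(candidates, true_edges):
    """Fraction of candidate pairs that are links in the true graph."""
    if not candidates:
        return 0.0
    return sum(1 for pair in candidates if pair in true_edges) / len(candidates)


# Toy values for illustration only.
confidence = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.7, (2, 3): 0.4}
true_edges = {(0, 1), (2, 3)}
candidates = top_candidates(confidence, 2)
```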
FRC Can Jeopardize Privacy -- More real network examples Polbook network 105 nodes, 441 edges Enron email network 151 nodes, 869 edges
FRC Can Jeopardize Privacy -- A theoretical result
FRC Can Jeopardize Privacy -- A theoretical result Conclusion: If the FRC specifies a sub-space close to the true graph, privacy is seriously breached
Graph Generator with FDC Feature Distribution Constraint (FDC) Uniform generator: • gives the natural distribution f(x) of feature S, highly skewed in the range How to generate graphs: • with the given degree seq. • whose feature value has the target distribution g(x) [Figures: natural distribution f(x); target distribution g(x)]
Graph Generator with FDC • Based on Metropolis-Hastings method • Accept ratio depends on target distr. g(x) & natural distr. f(x)
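The acceptance ratio itself is not shown on the slide. A standard Metropolis-Hastings form consistent with the description, assumed here, weights each graph by g(S(G))/f(S(G)) and accepts a symmetric proposal with probability min(1, [g(x')/f(x')] / [g(x)/f(x)]). Under that weighting the feature has marginal distribution proportional to f(x)·g(x)/f(x) = g(x). All names below are hypothetical; `propose` is any symmetric proposal supplied by the caller, such as a degree-preserving switch.

```python
import random


def acceptance_probability(x_cur, x_new, f, g):
    """MH acceptance for a symmetric proposal and graph weight g(S)/f(S).

    f must be positive for every feature value the walk can reach.
    """
    w_cur = g(x_cur) / f(x_cur)
    w_new = g(x_new) / f(x_new)
    if w_cur == 0:
        return 1.0  # always leave a state that the target gives zero weight
    return min(1.0, w_new / w_cur)


def mh_walk(start, propose, feature, f, g, steps, rng):
    """Run the chain and return the visited states, one per step."""
    current, visited = start, []
    for _ in range(steps):
        candidate = propose(current, rng)  # may return None for an invalid move
        if candidate is not None:
            p = acceptance_probability(feature(current), feature(candidate), f, g)
            if rng.random() < p:
                current = candidate
        visited.append(current)
    return visited
```

In practice f(x) is not known in closed form; one assumption-laden option is to estimate it empirically from a long run of the uniform generator before running the chain.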
Graph Generator with FDC Evaluation [Figures: natural distribution; target distribution]
Summary • Graph generator with feature range constraint • Attackers can sample the graph space near the true graph and breach privacy. • Graph generator with feature distribution constraint • Generates a set of graph samples for statistical testing
Thank You! Questions? Acknowledgments This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.
Graph Generator and Privacy Issues • Example: graphs with degree sequence {3,2,2,2,3}. • Are nodes 1 and 5 connected? [Figures: published graph; true graph]
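The example is small enough to check by brute force. The sketch below enumerates every labeled simple graph with degree sequence {3,2,2,2,3} and counts how many contain the link between nodes 1 and 5 (indices 0 and 4 in the code). Treating all such graphs as equally likely is the uniform-sampling assumption from the earlier slides; `graphs_with_degrees` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations


def graphs_with_degrees(degrees):
    """Enumerate all labeled simple graphs whose node degrees equal `degrees`."""
    n = len(degrees)
    pairs = list(combinations(range(n), 2))
    m, odd = divmod(sum(degrees), 2)
    if odd:
        return []  # a degree sum must be even
    found = []
    for edges in combinations(pairs, m):
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg == list(degrees):
            found.append(frozenset(edges))
    return found


graphs = graphs_with_degrees([3, 2, 2, 2, 3])
with_link = sum(1 for g in graphs if (0, 4) in g)
confidence = Fraction(with_link, len(graphs))
print(len(graphs), with_link, confidence)  # prints: 7 6 6/7
```

Seven graphs share this degree sequence and six of them connect nodes 1 and 5, so the degree sequence alone already gives an attacker confidence 6/7 in that link.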
Graph Generator with FDC Problem of generator with FRC: Uniform generator: • gives the natural distribution of feature S • highly skewed in the range • generates biased feature values [Figure labels: real-world graph; range]