Reconstruction from Randomized Graph via Low Rank Approximation. Leting Wu, Xiaowei Ying, Xintao Wu. Dept. Software and Information Systems, Univ. of N.C. – Charlotte.
Outline • Background & Motivation • Low Rank Approximation on Graph Data • Reconstruction from Randomized Graph • Evaluation • Privacy Issue
Background
• When network data is published or outsourced for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
• Graph Randomization/Perturbation:
  • Random Add/Del edges (number of edges unchanged)
  • Random Switch edges (nodes’ degrees unchanged)
• Feature-preserving randomization:
  • Spectrum-preserving randomization
  • Feature preserving via Markov-chain-based graph generation
• Clustering: grouping subgraphs into supernodes
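The two basic randomization schemes listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `rand_add_del`, `rand_switch`, and `degrees`, the edge-list representation, and the retry limit are all assumptions made for the example.

```python
import random

def rand_add_del(edges, n, k, seed=0):
    """Delete k existing edges and add k edges absent from the original graph.

    The total number of edges is unchanged. Nodes are labeled 0..n-1.
    """
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    removed = rng.sample(sorted(edge_set), k)
    non_edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if (u, v) not in edge_set]
    added = rng.sample(non_edges, k)
    return (edge_set - set(removed)) | set(added)

def rand_switch(edges, k, seed=0):
    """Try to perform k switches (a,b),(c,d) -> (a,d),(c,b).

    Every node keeps its degree. The retry limit is an arbitrary safeguard.
    """
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    done, attempts = 0, 0
    while done < k and attempts < 100 * k + 100:
        attempts += 1
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in edge_set or e2 in edge_set:
            continue  # the switch would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {e1, e2}
        done += 1
    return edge_set

def degrees(edge_set, n):
    """Degree of each node 0..n-1."""
    deg = [0] * n
    for u, v in edge_set:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Rand Add/Del keeps only the edge count fixed, while Rand Switch additionally keeps every node's degree fixed, which is why the two schemes distort graph features differently.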
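Motivation • We focus on whether we can reconstruct a graph from the randomized graph such that the reconstructed graph is closer to the original graph than the randomized one is. [Figure: Our Focus]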
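Adjacency Matrix & Its Eigen-Decomposition
Matrix Representation of Network
• Adjacency Matrix A (symmetric)
• Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $\lambda_i$ is the $i$-th eigenvalue of A and $x_i$ is the corresponding unit eigenvector
Questions:
• How do the eigenpairs relate to graph topology?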
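As a small illustration of the eigen-decomposition of a symmetric adjacency matrix, the sketch below uses NumPy's `numpy.linalg.eigh`. The 4-node example graph and the variable names are assumptions chosen for the example.

```python
import numpy as np

# Adjacency matrix of a small undirected graph (a triangle plus a pendant node).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)

# Reorder so that the largest eigenvalue comes first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rebuild A as the sum of lambda_i * x_i * x_i^T over all eigenpairs.
A_rebuilt = sum(lam * np.outer(x, x) for lam, x in zip(eigvals, eigvecs.T))
```

Because the graph has no self-loops, the trace of A is zero, so the eigenvalues sum to zero and any graph with at least one edge has both positive and negative eigenvalues.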
Leading Eigenpairs vs. Graph Topology
• What is the role of positive and negative eigenpairs in graph topology?
• Without loss of generality, we partition the node set into two groups, and the adjacency matrix can be partitioned as $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ (with $A_{21} = A_{12}^T$) represents the edges between the groups
Leading Eigenpairs vs. Graph Topology [Figure panels: Original, r = 2, r = 1]
Leading Eigenpairs vs. Graph Topology [Figure panels: Original, r = 1, r = 2]
Leading Eigenpairs vs. Graph Topology [Figure panels: Original, r = 1, r = 4, r = 2]
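Low Rank Approximation on Graph Data
• Low Rank Approximation: $A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, where $(\lambda_i, x_i)$ are the $r$ leading eigenpairs of A. This provides the best rank-$r$ approximation to A
• To keep the structure of an adjacency matrix, discretize as follows: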
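The low rank approximation followed by discretization can be sketched as below. The slide's own discretization rule did not survive in this transcript, so thresholding at 0.5 is an assumption, as is ranking eigenpairs by absolute eigenvalue (the ordering that yields the best rank-r approximation in Frobenius norm). The function names are hypothetical.

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of a symmetric matrix from r leading eigenpairs.

    Assumption: "leading" means largest absolute eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:r]
    return (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T

def discretize(A_r, threshold=0.5):
    """Turn a real-valued matrix back into a 0/1 adjacency matrix.

    Assumption: entries >= threshold become edges; the slide's rule is unknown.
    """
    B = (A_r >= threshold).astype(int)
    np.fill_diagonal(B, 0)       # no self-loops
    return np.maximum(B, B.T)    # enforce symmetry against rounding noise

# Example: two disjoint triangles have eigenvalues 2, 2, -1, -1, -1, -1.
A = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for u in group:
        for v in group:
            if u != v:
                A[u, v] = 1.0

A_hat = discretize(low_rank_approx(A, 2))
```

In this example the two leading eigenpairs already capture both communities: every within-triangle entry of the rank-2 matrix equals 2/3, so thresholding recovers the original graph.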
Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)
Determine Number of Eigen-pairs
• Question: How to choose an optimal rank r for reconstruction?
• Solution: Choose as the indicator since it is closely related to the other features and there exists an explicit moment estimator
• where m is the number of edges, k is the number of edges added/deleted,
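One plausible reading of this rank-selection step is: estimate the original value of an indicator feature, then pick the rank whose reconstruction matches that estimate most closely. The sketch below follows that reading only. The slide's actual indicator and moment estimator are not recoverable from this transcript, so the `indicator` function and `target` value are supplied by the caller, and the edge count used in the demo is merely a stand-in (Rand Add/Del leaves it unchanged). The 0.5 threshold and all function names are assumptions.

```python
import numpy as np

def reconstruct(A_tilde, r, threshold=0.5):
    """Rank-r approximation of the randomized matrix, then 0/1 thresholding."""
    vals, vecs = np.linalg.eigh(A_tilde)
    top = np.argsort(np.abs(vals))[::-1][:r]
    A_r = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    B = (A_r >= threshold).astype(int)
    np.fill_diagonal(B, 0)
    return np.maximum(B, B.T)

def choose_rank(A_tilde, indicator, target, max_rank):
    """Return the rank whose reconstruction has indicator closest to target."""
    best_r, best_gap = 1, float("inf")
    for r in range(1, max_rank + 1):
        gap = abs(indicator(reconstruct(A_tilde, r)) - target)
        if gap < best_gap:
            best_r, best_gap = r, gap
    return best_r

def edge_count(B):
    """Stand-in indicator: number of undirected edges."""
    return int(B.sum()) // 2

# Demo graph: two disjoint triangles (6 edges).
A_demo = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for u in group:
        for v in group:
            if u != v:
                A_demo[u, v] = 1.0

r_best = choose_rank(A_demo, edge_count, target=6, max_rank=6)
```

Any indicator with a reliable estimate of its original value could be plugged in the same way; the sweep itself does not depend on which feature is used.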
Effect of Noise (Political Blogs) • The method works well up to a certain level of noise • Even with a high level of noise, the reconstructed features are still closer to the original than the randomized ones
Reconstructed Features on Three Real Network Data Sets • Reconstruction Quality • When , the reconstructed features are closer to the original ones than the randomized ones • All positive for the three data sets
Privacy Issue • Question 1: Can this reconstruction be used by attackers? • Define the normalized Frobenius distance between A and as [Figure panels: Political Books, Enron, Political Blogs; y-axis: Normalized F Norm]
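A distance of this kind can be computed as in the sketch below. The slide's exact formula is missing from this transcript, so normalizing by the Frobenius norm of the original matrix A is an assumption, and the function name is hypothetical.

```python
import numpy as np

def normalized_frobenius_distance(A, B):
    """||A - B||_F / ||A||_F for an original matrix A and a comparison matrix B.

    Assumption: the normalizer is the Frobenius norm of A.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.linalg.norm(A - B, ord="fro") / np.linalg.norm(A, ord="fro")

A = np.array([[0, 1], [1, 0]])
d_same = normalized_frobenius_distance(A, A)
d_empty = normalized_frobenius_distance(A, np.zeros((2, 2)))
```

Under this normalization, a value near 0 means the compared graph almost coincides with the original, and comparing the value for the reconstructed graph against that for the randomized graph indicates whether reconstruction moves closer to the original edge set.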
Privacy Issue • Question 2: Which types of graphs would have their privacy breached? • For low rank graphs which have , the distance between the reconstructed graph and the original graph can be very small
Synthetic Low Rank Graphs • For a set of synthetic low rank graphs generated from Political Blogs, the reconstruction works well in terms of both the distance and the features
Conclusion • We show the relationship between graph topological structure and the eigenpairs of the adjacency matrix • We propose a low rank approximation based reconstruction algorithm with a novel solution to determine the optimal rank • For most social networks, our algorithm does not incur further disclosure risks to individual privacy, except for networks with low ranks or a small number of dominant eigenvalues
Thank You! Questions? Acknowledgments This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.