Privacy and Spectral Analysis on Social Network Randomization Xiaowei Ying, Leting Wu, Xintao Wu University of North Carolina at Charlotte
Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (three methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks on feature preserving randomization • Reconstruction from Randomized Graphs • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work
Background & Motivation Social Network • Network of US political books • (105 nodes, 441 edges) • Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". • Friendship in Karate club [Zachary, 77] • Biological association network of dolphins [Lusseau et al., 03] • Collaboration network of scientists [Newman, 06]
Background & Motivation Publish/outsource data for mining/analysis • Data owner: holds the original graph data and releases it to the public, a third party, or a research institution • Data miner: discovers patterns/features of the data (utility) -- find central nodes, community partition, link prediction • Attacker: breaches sensitive information in the data (privacy) -- identity of nodes (and sensitive attributes), sensitive relation between two individuals
Background & Motivation Privacy issues in publishing social network data: Anonymization alone is not enough to protect privacy. Active/passive attacks [1], subgraph attacks [2]. [1] L. Backstrom et al., Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. WWW07 [2] M. Hay et al., Resisting Structural Reidentification in Anonymized Social Networks, VLDB08
Background & Motivation • Privacy Preserving Social Network Publishing • Node anonymization • cannot guarantee identity/link privacy due to subgraph queries. • K-anonymity generalization • The released graph has at least k nodes with the same degree/subgraph/neighborhood • [Liu&Terzi SIGMOD08, Zhou&Pei ICDE08, Chen VLDB09] • Graph (edge) randomization • Random Add/Del & Random Switch • Utility preserving randomization • Super graph generalization • Generalize nodes into super nodes, and edges into super edges
Background & Motivation Graph Randomization/Perturbation: • Random Add/Del edges (no. of edges unchanged) • Random Switch edges (nodes’ degree unchanged)
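The two perturbation strategies on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: edges are stored as tuples `(i, j)` with `i < j`, and the helper names (`rand_add_del`, `rand_switch`, `degrees`) and the 8-node ring are hypothetical.

```python
import random
from itertools import combinations

def rand_add_del(edges, n, k, rng):
    """Delete k true edges, then add k false edges (edge count unchanged)."""
    original = set(edges)
    deleted = set(rng.sample(sorted(original), k))
    # False edges are drawn from pairs that were unlinked in the original graph.
    non_edges = [p for p in combinations(range(n), 2) if p not in original]
    added = set(rng.sample(non_edges, k))
    return (original - deleted) | added

def rand_switch(edges, k, rng, max_tries=10000):
    """Apply up to k switches (t,w),(u,v) -> (t,v),(u,w); degrees unchanged."""
    g = set(edges)
    done = 0
    for _ in range(max_tries):
        if done == k:
            break
        (t, w), (u, v) = rng.sample(sorted(g), 2)
        new1, new2 = tuple(sorted((t, v))), tuple(sorted((u, w)))
        if len({t, w, u, v}) < 4 or new1 in g or new2 in g:
            continue  # invalid switch, try another pair of edges
        g -= {(t, w), (u, v)}
        g |= {new1, new2}
        done += 1
    return g

def degrees(edges, n):
    """Degree of every node 0..n-1."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

# Toy graph (hypothetical): an 8-node ring.
ring = {tuple(sorted((i, (i + 1) % 8))) for i in range(8)}
perturbed = rand_add_del(ring, 8, 2, random.Random(0))
switched = rand_switch(ring, 3, random.Random(0))
```

Add/Del keeps only the total number of edges fixed, while Switch keeps every node's degree fixed, which is why the two strategies are analyzed separately in the rest of the deck.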
Background & Motivation Graph Randomization/Perturbation: • Data privacy: How does graph randomization prevent privacy disclosure? • Data utility: How does the graph structure change due to randomization? How can graph structural features be better preserved?
Background & Motivation Numerous topological measures of networks • Harmonic mean of shortest distance • Transitivity (clustering coefficient) • Subgraph centrality • Modularity (community structure) • And many others
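Two of the measures listed above are easy to compute directly. The sketch below uses the standard textbook definitions (transitivity as three times the number of triangles over the number of connected triples, and the harmonic mean of shortest distances over all ordered node pairs); the function names and the 4-node toy graph are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def neighbor_sets(edges, n):
    """Adjacency lists as sets, for an undirected graph on nodes 0..n-1."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs

def transitivity(nbrs):
    """3 * (#triangles) / (#connected triples)."""
    closed = 0   # linked neighbor pairs; each triangle is counted 3 times
    triples = 0  # neighbor pairs around each center node
    for nv in nbrs:
        triples += len(nv) * (len(nv) - 1) // 2
        closed += sum(1 for a, b in combinations(sorted(nv), 2) if b in nbrs[a])
    return closed / triples if triples else 0.0

def harmonic_mean_distance(nbrs):
    """Inverse of the average of 1/d(i, j) over all ordered pairs i != j."""
    n = len(nbrs)
    inv_sum = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inv_sum += sum(1.0 / d for d in dist.values() if d > 0)
    return n * (n - 1) / inv_sum

# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = neighbor_sets([(0, 1), (1, 2), (0, 2), (2, 3)], 4)
t = transitivity(toy)
h = harmonic_mean_distance(toy)
```

Unreachable pairs contribute zero to the inverse-distance sum, which is the usual reason the harmonic mean is preferred over the plain average distance on graphs that may be disconnected.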
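Background & Motivation Spectral measures: adjacency matrix • Adjacency matrix A (symmetric): $a_{ij} = 1$ if nodes i and j are linked, 0 otherwise • Adjacency spectrum: the eigenvalues of A, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$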
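Background & Motivation • Laplacian matrix and spectrum: $L = D - A$, where D is the diagonal degree matrix, with eigenvalues $0 = \mu_1 \le \mu_2 \le \dots \le \mu_n$ • Normal matrix and spectrum: $N = D^{-1/2} A D^{-1/2}$ and its eigenvalues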
Background & Motivation Many topological features are related to spectral measures: • No. of triangles: • Subgraph centrality: • Graph diameter: • k disconnected parts in the graph ⇔ k 0’s in the Laplacian spectrum.
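The formulas on this slide did not survive in the transcript, so the following is a sketch of two well-known identities that match the bullets: the number of triangles equals $\frac{1}{6}\sum_i \lambda_i^3$ for the adjacency eigenvalues $\lambda_i$, and the number of connected components equals the multiplicity of 0 in the Laplacian spectrum. It assumes NumPy is available; the toy graph and variable names are hypothetical.

```python
import numpy as np

def adjacency_matrix(edges, n):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Toy graph: triangle {0,1,2}, plus a separate triangle {3,4,5} with pendant node 6.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (5, 6)]
A = adjacency_matrix(edges, 7)

# Triangles from the adjacency spectrum: (1/6) * sum of cubed eigenvalues.
lam = np.linalg.eigvalsh(A)
triangles_spectral = float(np.sum(lam ** 3) / 6.0)
triangles_direct = float(np.trace(A @ A @ A) / 6.0)

# Average subgraph centrality (Estrada's definition): (1/n) * sum of exp(eigenvalue).
mean_subgraph_centrality = float(np.mean(np.exp(lam)))

# Connected components from the Laplacian spectrum: count the zero eigenvalues.
L = np.diag(A.sum(axis=1)) - A
mu = np.linalg.eigvalsh(L)
n_components = int(np.sum(np.abs(mu) < 1e-9))
```

Comparing `triangles_spectral` with `triangles_direct` is a quick sanity check that the spectrum carries the same information as the walk count $\mathrm{tr}(A^3)$.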
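Background & Motivation Two important eigenvalues: $\lambda_1$ (the largest eigenvalue of the adjacency matrix) and $\mu_2$ (the second smallest eigenvalue of the Laplacian matrix) • The maximum degree, chromatic number, clique number, etc. are related to $\lambda_1$; • The epidemic threshold for virus propagation in the network is related to $\lambda_1$ [Wang et al., KDD03]; • $\mu_2$ indicates the community structure of the graph: clear community structure ⇔ $\mu_2$ ≈ 0.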
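As a small numerical illustration, assume the two eigenvalues on this slide are the largest adjacency eigenvalue $\lambda_1$ and the second smallest Laplacian eigenvalue $\mu_2$ (the symbols are missing from the transcript). The sketch below builds a hypothetical graph with two obvious communities (two 5-cliques joined by one edge) and checks the standard facts that $\lambda_1$ lies between the average and the maximum degree and that $\mu_2$ is close to 0. It assumes NumPy.

```python
import numpy as np
from itertools import combinations

n = 10
A = np.zeros((n, n))
# Two 5-node cliques (nodes 0-4 and 5-9) connected by the single bridge (4, 5).
for group in (range(0, 5), range(5, 10)):
    for i, j in combinations(group, 2):
        A[i, j] = A[j, i] = 1.0
A[4, 5] = A[5, 4] = 1.0

deg = A.sum(axis=1)
lambda_1 = float(np.linalg.eigvalsh(A)[-1])             # largest adjacency eigenvalue
mu_2 = float(np.linalg.eigvalsh(np.diag(deg) - A)[1])   # second smallest Laplacian eigenvalue

# In the model of Wang et al., the epidemic threshold is 1 / lambda_1.
epidemic_threshold = 1.0 / lambda_1
```

For comparison, a complete graph on 10 nodes has $\mu_2 = 10$, so a value below 0.5 here reflects the near-disconnected, two-community structure.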
Basic Facts of Graph Spectrum • The Laplacian eigenvalues (Graph from: A. Capocci et al., Detecting communities in large networks)
Basic Facts of Graph Spectrum • The Laplacian eigenvectors
Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (three methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks on feature preserving randomization • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work
Link Privacy: Prior & Posterior Beliefs Quantify the attacker's belief (assume that node identities are known) • Prior probabilities: • Posterior probability for node pair (i, j): • Privacy is seriously jeopardized when the posterior probability is much larger than the prior probability
Link Privacy: Prior & Posterior Beliefs Method I [Ying, Wu, SDM08] • Add & Del k links • Switch k times
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • A common phenomenon: in real-world graphs similar nodes tend to connect to each other
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Even after moderate randomization, the phenomenon still exists:
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09]
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Add/Del: • True links are deleted w.p. • False links are added w.p. • With Bayes’ theorem
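The Bayes step can be illustrated with a simplified model. The probabilities on the slide are not preserved in the transcript, so the sketch below assumes that Add/Del with k links deletes each true link with probability k/m and adds each false link with probability k/(N − m), where m is the number of edges and N = n(n − 1)/2 is the number of node pairs. It ignores the node-similarity information that Method II uses, and all names are hypothetical.

```python
def link_beliefs(n, m, k):
    """Prior and posterior beliefs that a node pair is truly linked.

    Assumed model: each true link is deleted w.p. k/m and each false
    link is added w.p. k/(N - m), independently per pair.
    """
    total_pairs = n * (n - 1) // 2
    prior = m / total_pairs
    p_del = k / m
    p_add = k / (total_pairs - m)
    # Bayes' theorem, conditioning on what the attacker sees in the released graph.
    post_if_link_seen = prior * (1 - p_del) / (
        prior * (1 - p_del) + (1 - prior) * p_add)
    post_if_no_link_seen = prior * p_del / (
        prior * p_del + (1 - prior) * (1 - p_add))
    return prior, post_if_link_seen, post_if_no_link_seen

# Polbooks-sized example: 105 nodes, 441 edges, k = 44 links added and deleted.
prior, post_seen, post_missing = link_beliefs(105, 441, 44)
```

Under this model the posterior for an observed link simplifies to (m − k)/m and for an unobserved pair to k/(N − m), so the attacker's belief in an observed link stays far above the prior m/N unless k is large.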
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Evaluation (add/del 50% true links)
Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • The total sum of prior and posterior probabilities is the same (figure legend: prior prob., posterior prob. I, posterior prob. II)
Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Intuition: the degree sequence specifies a graph space, and the true graph is just one member of the space. • Example: a switched graph with degree sequence {3,2,2,2,3} • Are nodes 1 and 5 connected?
Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Graph space = {G: with a given degree sequence} • Impractical to enumerate all members in the space • Sample the graph space through Markov chain:
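The sampling idea can be sketched as a lazy switching chain: starting from the released graph, repeatedly attempt random edge switches (which keep every degree fixed), record a sample every few steps, and use the fraction of samples containing a pair as the attacker's posterior belief for that pair. This is a minimal sketch, not the SDM09 algorithm itself. The starting graph below is a hypothetical 5-node graph with degree sequence {3,2,2,2,3} (nodes are numbered 0 to 4, so the slide's nodes 1 and 5 are 0 and 4), and the step counts are arbitrary choices.

```python
import random

def switch_step(g, rng):
    """One lazy step of the switching chain on a frozenset of edges (i < j)."""
    (a, b), (c, d) = rng.sample(sorted(g), 2)
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible rewirings at random
    e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if len({a, b, c, d}) < 4 or e1 in g or e2 in g:
        return g  # invalid move: stay at the current graph
    return (g - {(a, b), tuple(sorted((c, d)))}) | {e1, e2}

def estimate_link_probability(start, pair, n_samples=200, steps=20, seed=0):
    """Fraction of sampled graphs (same degree sequence) that contain `pair`."""
    rng = random.Random(seed)
    g = frozenset(start)
    hits = 0
    for _ in range(n_samples):
        for _ in range(steps):
            g = switch_step(g, rng)
        hits += pair in g
    return hits / n_samples, g

def degree_sequence(g, n):
    deg = [0] * n
    for i, j in g:
        deg[i] += 1
        deg[j] += 1
    return deg

# Hypothetical released graph with degree sequence [3, 2, 2, 2, 3].
released = [(0, 1), (0, 2), (0, 4), (1, 4), (2, 3), (3, 4)]
belief, last_sample = estimate_link_probability(released, (0, 4))
```

Rejecting invalid moves by staying in place (rather than redrawing) keeps the chain symmetric, which is the usual argument for approximately uniform sampling over the graphs reachable by switches.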
Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Evaluation Polbooks (r=8%) Enron (r=8%)
Node Identity Privacy • Identity privacy: • Re-identify nodes in the anonymous graphs • based on some background information (e.g. degree) • Randomization reduces attackers' beliefs (figure panels: Polbooks degree distribution; after randomization)
Node Identity Privacy • Nodes’ prior and posterior risks Given an individual α with degree dα and a randomized graph • Prior risk: • Posterior risks
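The risk formulas on this slide are not preserved in the transcript. One simple reading of the prior risk, offered here only as an assumption, is the attacker's chance of picking the right node among all nodes that share the target's degree. The function name and the degree list are hypothetical.

```python
from collections import Counter

def degree_based_risk(degrees, d_alpha):
    """Assumed prior risk: 1 / (number of nodes whose degree equals d_alpha)."""
    candidates = Counter(degrees)[d_alpha]
    return 1.0 / candidates if candidates else 0.0

# Hypothetical degree list of a small anonymized graph.
degrees = [3, 2, 2, 2, 3, 5]
risk_common = degree_based_risk(degrees, 2)   # three candidates share degree 2
risk_unique = degree_based_risk(degrees, 5)   # a unique degree is fully exposed
```

After randomization the target's degree may have changed, so an attacker has to weigh candidates over a range of degrees; that posterior computation is not sketched here.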
Node Identity Privacy Ongoing work: • Compare the randomization and k-anonymity approaches: -- to achieve the same privacy protection level, which approach achieves better utility? • Combine identity privacy and node privacy. • Node identity privacy under different background information (e.g., sub-graph, neighborhood). • K-degree generalization [Liu et al.]
Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (three methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks on feature preserving randomization • Reconstruction from Randomized Graphs • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work
Feature Preserving Randomization Topological and spectral features change considerably as the perturbation increases. Can we better preserve the network structure? (Network of US political books, 105 nodes and 441 edges)
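Features in Social Network Data Two important eigenvalues: $\lambda_1$ (the largest eigenvalue of the adjacency matrix) and $\mu_2$ (the second smallest eigenvalue of the Laplacian matrix) • The maximum degree, chromatic number, clique number, etc. are related to $\lambda_1$; • The epidemic threshold for virus propagation in the network is related to $\lambda_1$ [Wang et al., KDD03]; • $\mu_2$ indicates the community structure of the graph: clear community structure ⇔ $\mu_2$ ≈ 0.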
Spectrum Preserving Randomization Spectrum preserving approach [Ying, Wu, SDM08] Intuition: since spectrum is related to many graph topological features, can we preserve more structural features by controlling the movement of eigenvalues?
Spectrum Preserving Randomization Spectral Switch (apply to adjacency matrix): To increase the eigenvalue: To decrease the eigenvalue:
Spectrum Preserving Randomization Spectral Switch (apply to Laplacian matrix): To decrease the eigenvalue: To increase the eigenvalue:
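The switch conditions on the two Spectral Switch slides are missing from the transcript. A standard Rayleigh-quotient argument gives conditions of the following kind, shown here as a sketch rather than as the authors' exact rule. For a switch (t,w),(u,v) → (t,v),(u,w), let x be the unit principal eigenvector of the adjacency matrix and y the unit Fiedler vector of the Laplacian of a connected graph. Then $\lambda_1' \ge \lambda_1 + 2(x_t - x_u)(x_v - x_w)$ and $\mu_2' \le \mu_2 - 2(y_t - y_u)(y_v - y_w)$, so a positive score guarantees that $\lambda_1$ increases (respectively that $\mu_2$ decreases), while a negative score is only a first-order hint in the other direction. The code assumes NumPy; the function names and toy graph are hypothetical.

```python
import numpy as np

def lambda_1(A):
    """Largest adjacency eigenvalue."""
    return float(np.linalg.eigvalsh(A)[-1])

def mu_2(A):
    """Second smallest Laplacian eigenvalue."""
    return float(np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)[1])

def switch_scores(A, t, w, u, v):
    """Scores for the switch (t,w),(u,v) -> (t,v),(u,w) on a connected graph."""
    x = np.linalg.eigh(A)[1][:, -1]                          # principal eigenvector
    y = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)[1][:, 1]  # Fiedler vector
    adj_score = float((x[t] - x[u]) * (x[v] - x[w]))
    lap_score = float((y[t] - y[u]) * (y[v] - y[w]))
    return adj_score, lap_score

def apply_switch(A, t, w, u, v):
    """Return a copy of A with edges (t,w),(u,v) replaced by (t,v),(u,w)."""
    B = A.copy()
    B[t, w] = B[w, t] = B[u, v] = B[v, u] = 0.0
    B[t, v] = B[v, t] = B[u, w] = B[w, u] = 1.0
    return B

# Toy connected graph: an 8-node ring with chords (0,3) and (2,6).
A = np.zeros((8, 8))
for i, j in [(i, (i + 1) % 8) for i in range(8)] + [(0, 3), (2, 6)]:
    A[i, j] = A[j, i] = 1.0

adj_score, lap_score = switch_scores(A, 0, 1, 4, 5)
B = apply_switch(A, 0, 1, 4, 5)   # (0,1),(4,5) -> (0,5),(4,1)
```

A spectrum preserving randomizer could use the sign of these scores to choose, among candidate switches, one that pushes the eigenvalue back toward its original value.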
Spectrum Preserving Randomization Evaluation: (Network of US political books, 105 nodes and 441 edges)
Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] The data owner puts feature range constraints on switching • Feature range constraints: • The data owner publishes the feature range constraint.
Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] • Markov chain with feature range constraint (uniformity for accessible graphs)
Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] • Problem: accessibility is not guaranteed • We propose the relaxed algorithm with feature range constraint (accessibility, approximate uniformity) • The relaxed algorithm also has applications in testing the significance of data mining results
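The feature range constraint can be sketched as an acceptance test inside the switching chain: a proposed switch is applied only if the feature S(G) of the resulting graph stays inside the published range. The sketch below is the strict variant, not the relaxed algorithm mentioned on the slide. It uses the number of triangles as an example feature; the function names, the range, and the toy graph are hypothetical.

```python
import random

def count_triangles(g):
    """Number of triangles: common neighbors summed over edges, divided by 3."""
    nbrs = {}
    for i, j in g:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    return sum(len(nbrs[i] & nbrs[j]) for i, j in g) // 3

def constrained_switching(edges, feature, low, high, steps, seed=0):
    """Random switches, accepted only when low <= feature(G) <= high."""
    rng = random.Random(seed)
    g = frozenset(edges)
    for _ in range(steps):
        (a, b), (c, d) = rng.sample(sorted(g), 2)
        if rng.random() < 0.5:
            c, d = d, c
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if len({a, b, c, d}) < 4 or e1 in g or e2 in g:
            continue  # not a valid switch
        candidate = (g - {(a, b), tuple(sorted((c, d)))}) | {e1, e2}
        if low <= feature(candidate) <= high:
            g = candidate  # accept only graphs inside the feature range
    return g

def degree_sequence(g, n):
    deg = [0] * n
    for i, j in g:
        deg[i] += 1
        deg[j] += 1
    return deg

# Toy graph: an 8-node ring with chords (0,2), (0,3), (4,6); it has 3 triangles.
start = {tuple(sorted((i, (i + 1) % 8))) for i in range(8)} | {(0, 2), (0, 3), (4, 6)}
t0 = count_triangles(start)
released = constrained_switching(start, count_triangles, t0 - 1, t0 + 1, steps=500)
```

Because some graphs in the range may be reachable only through graphs outside it, this strict chain can get stuck in one part of the space, which is the accessibility problem the slide refers to.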
Attacks on Feature Preserving Randomization The data owner puts feature range constraints on switching • Feature range constraints: • Can attackers use the feature constraints to breach link privacy?
Attacks in Feature Preserving Randomization Markov chain approach [Ying, Wu, SDM09] • Markov chain with feature range constraint Graph space = {G: with a given deg. seq. & S(G) in R} • Starting with the randomized data, repeat the switch procedure many times and get one sample graph • Generate N graphs
Attacks in Utility Preserving Randomization Markov chain approach [Ying, Wu, SDM09] • Evaluation (figure panels: Polbooks (r=8%), Enron (r=8%)) Future work: what causes the difference? Which features will (not) disclose privacy?
Reconstruction from Randomized Graphs • Motivation • Low Rank Approximation on Graph Data • Reconstruction from Randomized Graph • Privacy Issue • SDM10 paper
Motivation • We focus on whether we can reconstruct a graph from s.t. Our Focus
Revisit of LRA in Numerical Data • Spectral filter: derive an estimate of U from the perturbed data • Calculate the covariance matrix of the perturbed data, which is symmetric and positive definite • Apply spectral decomposition to the covariance matrix • Derive the eigenvalue information from the covariance matrix of the noise V and choose a proper number of dimensions, r • Project the perturbed data onto the subspace spanned by the first r eigenvectors to obtain the estimated data set
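The spectral filtering steps above can be sketched with NumPy. The slide's symbols are lost in the transcript, so this is an illustration under assumptions: the perturbed data is U + V with i.i.d. noise of known variance, and r is chosen as the number of covariance eigenvalues above twice the noise variance (a hypothetical threshold, not the authors' rule).

```python
import numpy as np

def spectral_filter(perturbed, noise_var):
    """Estimate the original data by projecting onto the top-r eigenvectors."""
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean
    cov = np.cov(centered, rowvar=False)        # symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # spectral decomposition
    keep = eigvals > 2.0 * noise_var            # assumed rule for choosing r
    Q = eigvecs[:, keep]                        # eigenvectors of the kept dimensions
    return centered @ Q @ Q.T + mean, int(keep.sum())

# Synthetic example: rank-2 data in 10 dimensions plus Gaussian noise.
rng = np.random.default_rng(0)
U = rng.normal(size=(500, 2)) @ (3.0 * rng.normal(size=(2, 10)))
V = rng.normal(scale=0.5, size=U.shape)         # noise with variance 0.25
U_hat, r = spectral_filter(U + V, noise_var=0.25)

err_perturbed = float(np.linalg.norm(V))            # error of the released data
err_filtered = float(np.linalg.norm(U_hat - U))     # error after filtering
```

The filter helps here because the signal occupies only a few directions, so discarding the remaining directions removes most of the noise; the same intuition motivates trying low rank approximation on randomized graphs.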