
Privacy and Spectral Analysis on Social Network Randomization

Privacy and Spectral Analysis on Social Network Randomization. Xiaowei Ying, Leting Wu, Xintao Wu, University of North Carolina at Charlotte. Framework: Background & Motivation; Privacy in Randomized Graph; Link privacy (3 methods to quantify link privacy); Node privacy.


Presentation Transcript


  1. Privacy and Spectral Analysis on Social Network Randomization Xiaowei Ying, Leting Wu, Xintao Wu University of North Carolina at Charlotte

  2. Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (3 methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks to feature preserving randomization • Reconstruction from Randomized Graphs • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work

  3. Background & Motivation

  4. Background & Motivation Social Network • Network of US political books • (105 nodes, 441 edges) • Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative". • Friendship in Karate club [Zachary, 77] • Biological association network of dolphins [Lusseau et al., 03] • Collaboration network of scientists [Newman, 06]

  5. Background & Motivation Publish/outsource data for mining/analysis • Data Owner • The original graph data • Public / Third party / Research Inst. • release • Data miner: discover patterns/features of the data (utility) • -- find central nodes, community partition, link prediction • Attacker: breach sensitive information in the data (privacy) • -- identity of nodes (and sensitive attributes), sensitive relation between two individuals

  6. Background & Motivation Privacy issues in publishing social network data: Anonymization alone is not enough to protect privacy. Active/passive attacks [1], subgraph attacks [2]. [1] L. Backstrom et al., Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. WWW07 [2] M. Hay et al., Resisting Structural Reidentification in Anonymized Social Networks, VLDB08

  7. Background & Motivation • Privacy Preserving Social Network Publishing • Node-anonymization • cannot guarantee identity/link privacy due to subgraph queries. • K-anonymity generalization • The released graph has at least k nodes with the same degree/subgraph/neighborhood • [Liu&Terzi SIGMOD08, Zhou&Pei ICDE08, Chen VLDB09] • Graph (edge) randomization • Random Add/Del & Random Switch • Utility preserving randomization • Super graph generalization • Generalize nodes into super nodes, and edges into super edges

  8. Background & Motivation Graph Randomization/Perturbation: • Random Add/Del edges (no. of edges unchanged) • Random Switch edges (nodes’ degree unchanged)
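
The two perturbation schemes on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `rand_add_del`, `rand_switch` and `degrees`, the edge representation (sorted node-index pairs) and the retry limit are all assumptions made for the example.

```python
import random
from itertools import combinations


def degrees(edges, n):
    """Degree of every node 0..n-1."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def rand_add_del(edges, n, k, rng):
    """Delete k true edges and add k false edges (edge count unchanged)."""
    edges = set(edges)
    non_edges = [e for e in combinations(range(n), 2) if e not in edges]
    deleted = rng.sample(sorted(edges), k)
    added = rng.sample(non_edges, k)
    return (edges - set(deleted)) | set(added)


def rand_switch(edges, k, rng):
    """Apply k successful switches (every node keeps its degree)."""
    edges = set(edges)
    done = attempts = 0
    while done < k:
        attempts += 1
        if attempts > 1000 * k:  # assumed safety limit for this sketch
            raise RuntimeError("no valid switch found")
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Skip switches that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edges or new2 in edges:
            continue
        edges -= {(a, b), (c, d)}
        edges |= {new1, new2}
        done += 1
    return edges
```

Rand Add/Del keeps only the total number of edges, while Rand Switch replaces edges (a, b) and (c, d) by (a, d) and (c, b), so each endpoint keeps its degree.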

  9. Background & Motivation Graph Randomization/Perturbation: • Data privacy: How does graph randomization prevent privacy disclosure? • Data utility: How does the graph structure change due to randomization? How can graph structural features be better preserved?

  10. Background & Motivation Numerous topological measures of networks • Harmonic mean of shortest distance • Transitivity (clustering coefficient) • Subgraph centrality • Modularity (community structure) • And many others

  11. Background & Motivation Spectral measures – adjacency matrix • Adjacency Matrix A (symmetric) • Adjacency Spectrum

  12. Background & Motivation • Laplacian Matrix and Spectrum: • Normal Matrix and Spectrum
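
For reference, the standard definitions of the three matrices and their spectra can be written as below. The symbols $D$ (diagonal degree matrix), $\lambda_i$, $\mu_i$ and $\nu_i$ are assumed notation, since the slide text does not name them, and the normal matrix is taken in its usual symmetric form, which requires every node to have degree at least 1.

```latex
\begin{align*}
A &= (a_{ij})_{n \times n}, \qquad a_{ij} = 1 \text{ if nodes } i \text{ and } j \text{ are linked, } 0 \text{ otherwise}, \\
\text{adjacency spectrum:} &\quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n, \\
L &= D - A, \\
\text{Laplacian spectrum:} &\quad 0 = \mu_1 \le \mu_2 \le \dots \le \mu_n, \\
N &= D^{-1/2} A D^{-1/2}, \\
\text{normal spectrum:} &\quad \nu_1 \ge \nu_2 \ge \dots \ge \nu_n .
\end{align*}
```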

  13. Background & Motivation Many topological features are related to spectral measures: • No. of triangles: • Subgraph centrality: • Graph diameter: • k disconnected parts in the graph ⇔ k 0’s in the Laplacian spectrum.
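
Two of the relations listed on this slide have well-known closed forms in terms of the adjacency eigenvalues $\lambda_1, \dots, \lambda_n$. They are given here as a sketch under the assumption that the slide used the usual definitions (for subgraph centrality, the network average):

```latex
\begin{align*}
\text{number of triangles} &= \frac{1}{6}\operatorname{tr}(A^3) = \frac{1}{6}\sum_{i=1}^{n} \lambda_i^{3}, \\
\text{subgraph centrality} &= \frac{1}{n}\operatorname{tr}\!\left(e^{A}\right) = \frac{1}{n}\sum_{i=1}^{n} e^{\lambda_i}.
\end{align*}
```

The factor $1/6$ arises because each triangle contributes six closed walks of length 3 to $\operatorname{tr}(A^3)$ (three starting nodes, two directions).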

  14. Background & Motivation Two important eigenvalues: λ1 (the largest eigenvalue of the adjacency matrix) and μ2 (the second smallest eigenvalue of the Laplacian matrix) • The maximum degree, chromatic number, clique number etc. are related to λ1; • The epidemic threshold for virus propagation in the network is related to λ1 [Wang et al., KDD03]; • μ2 indicates the community structure of the graph: clear community structure ⇔ μ2 ≈ 0.

  15. Basic Facts of Graph Spectrum • The Laplacian eigenvalues Graph from: A. Capocci et al., Detecting communities in large networks

  16. Basic Facts of Graph Spectrum • The Laplacian eigenvectors

  17. Privacy in Randomized Graph

  18. Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (3 methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks to feature preserving randomization • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work

  19. Link Privacy: Prior & Posterior Beliefs Quantify attacker’s belief (assume that node identities are known) • Prior probabilities: • Posterior probability for node pair (i, j): • Privacy is seriously jeopardized when:

  20. Link Privacy: Prior & Posterior Beliefs Method I [Ying, Wu, SDM08] • Add & Del k links • Switch k times
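
For the Add/Del case, the attacker's beliefs can be worked out by simple counting. The sketch below is an illustration under stated assumptions, not the formulas from the cited paper: it assumes the attacker knows n, m and k, and that the k deleted links and k added links are chosen uniformly at random. The Switch case is not covered. The function name `add_del_posteriors` is hypothetical.

```python
from fractions import Fraction


def add_del_posteriors(n, m, k):
    """Link beliefs after deleting k true links and adding k false links.

    n: number of nodes, m: number of edges, k: links added and deleted.
    Returns (prior, posterior given observed link, posterior given no link).
    """
    pairs = n * (n - 1) // 2          # all node pairs
    prior = Fraction(m, pairs)        # belief without the released graph
    # An observed link is true unless it is one of the k added links.
    post_given_link = Fraction(m - k, m)
    # An observed non-link hides a true link if it is one of the k deleted.
    post_given_no_link = Fraction(k, pairs - m)
    return prior, post_given_link, post_given_no_link
```

For the political books network (105 nodes, 441 edges) the prior is 441/5460 = 21/260, so even a heavily perturbed released graph can leave the belief for an observed link far above the prior.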

  21. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • A common phenomenon: in real-world graphs similar nodes tend to connect to each other

  22. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Even after moderate randomization, the phenomenon still exists:

  23. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09]

  24. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Add/Del: • True links are deleted w.p. • False links are added w.p. • With Bayes’ theorem
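
The Bayes step on this slide can be sketched as follows. The deletion and addition probabilities are passed in as parameters because their values are not given in the slide text, and the per-pair prior (for example a similarity-based estimate) is likewise an assumption of this example; `posterior_link` is a hypothetical name.

```python
from fractions import Fraction


def posterior_link(prior, p_del, p_add, observed):
    """P(true link | observation) by Bayes' theorem.

    prior:    belief that the pair is truly linked (e.g. from node similarity)
    p_del:    probability that a true link is deleted
    p_add:    probability that a false link is added
    observed: True if the pair is linked in the randomized graph
    """
    if observed:
        like_true, like_false = 1 - p_del, p_add
    else:
        like_true, like_false = p_del, 1 - p_add
    num = prior * like_true
    return num / (num + (1 - prior) * like_false)
```

With a prior of 1/2, a deletion probability of 1/2 and an addition probability of 1/10, an observed link yields a posterior of 5/6 and an observed non-link yields 5/14.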

  25. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • Evaluation (add/del 50% true links)

  26. Link Privacy: Prior & Posterior Beliefs Method II [Ying, Wu, PAKDD09] • The total sum of prior and posterior probabilities is the same: prior prob. posterior prob. I posterior prob. II

  27. Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Intuition: degree sequence specifies a graph space, and the true graph is just one member of the space. • Example: switch graph with degree sequence {3,2,2,2,3} • Are nodes 1 and 5 connected?
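
For a degree sequence this small, the graph space can be enumerated by brute force. The sketch below assumes simple labeled graphs and an attacker who treats every member of the space as equally likely; `graphs_with_degrees` is a hypothetical helper name.

```python
from itertools import combinations


def graphs_with_degrees(deg_seq):
    """Yield every simple labeled graph (as an edge set) with the given degrees."""
    n = len(deg_seq)
    m = sum(deg_seq) // 2
    for edges in combinations(combinations(range(n), 2), m):
        deg = [0] * n
        for a, b in edges:
            deg[a] += 1
            deg[b] += 1
        if deg == list(deg_seq):
            yield set(edges)


space = list(graphs_with_degrees([3, 2, 2, 2, 3]))
linked = sum((0, 4) in g for g in space)   # nodes 1 and 5 are indices 0 and 4
print(len(space), linked)                  # prints: 7 6
```

Under these assumptions the space has 7 members, and nodes 1 and 5 are linked in 6 of them, so the attacker's belief in that link is 6/7 from the degree sequence alone.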

  28. Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Graph space = {G: with a given degree sequence} • Impractical to enumerate all members in the space • Sample the graph space through Markov chain:
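
When enumeration is impractical, the space can be sampled with a switch-based Markov chain. The following is a minimal sketch, not the algorithm from the cited paper: the helper names, the number of switch attempts between samples and the sample count are all illustrative assumptions.

```python
import random


def switch_step(edges, rng):
    """One switch attempt; the graph is left unchanged if the switch is invalid."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    # Pick one of the two possible rewirings with equal probability.
    if rng.random() < 0.5:
        new1, new2 = (a, d), (c, b)
    else:
        new1, new2 = (a, c), (b, d)
    new1, new2 = tuple(sorted(new1)), tuple(sorted(new2))
    if new1[0] == new1[1] or new2[0] == new2[1]:
        return  # would create a self-loop
    if new1 in edges or new2 in edges:
        return  # would create a duplicate edge
    edges -= {(a, b), (c, d)}
    edges |= {new1, new2}


def estimate_link_prob(start_edges, pair, n_samples, steps, rng):
    """Fraction of sampled graphs in which `pair` is linked."""
    edges = set(start_edges)
    hits = 0
    for _ in range(n_samples):
        for _ in range(steps):
            switch_step(edges, rng)
        hits += pair in edges
    return hits / n_samples


# One graph with degree sequence {3,2,2,2,3}; nodes 1..5 are indices 0..4.
start = {(0, 2), (0, 3), (0, 4), (1, 3), (1, 4), (2, 4)}
est = estimate_link_prob(start, (0, 4), n_samples=2000, steps=20,
                         rng=random.Random(1))
```

Rejected switches leave the chain in place, which keeps the proposal symmetric, so the samples are approximately uniform over the reachable graphs. For this degree sequence the exact value under uniform sampling is 6/7, and the estimate should land close to it.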

  29. Link Privacy: Prior & Posterior Beliefs Method III [Ying, Wu, SDM09] • Evaluation Polbooks (r=8%) Enron (r=8%)

  30. Node Identity Privacy • Identity Privacy: • Re-identify nodes in the anonymous graphs • based on some background information (e.g. degree) • Randomization reduces attackers’ beliefs Polbooks: degree distribution After randomization

  31. Node Identity Privacy • Nodes’ prior and posterior risks Given an individual α with degree dα and a randomized graph • Prior risk: • Posterior risks

  32. Node Identity Privacy Ongoing work: • Compare the randomization and k-anonymity approaches: -- to achieve the same privacy protection level, which approach can achieve better utility? • Combine identity privacy and node privacy. • Node identity privacy issue under different background information (e.g., sub-graph, neighborhood). • K-degree generalization [Liu et al.]

  33. Feature Preserving Randomization

  34. Framework • Background & Motivation • Privacy in Randomized Graph • Link privacy (3 methods to quantify link privacy) • Node privacy • Feature Preserving Randomization • Spectrum preserving randomization • General feature preserving randomization (Markov chain based) • Attacks to feature preserving randomization • Reconstruction from Randomized Graphs • Spectrum Based Fraud Detection • A spectral framework to quantify non-randomness of social networks • Spectrum based fraud detection • Future Work

  35. Feature Preserving Randomization Topological and spectral features change considerably as the perturbation increases. Can we better preserve the network structure? (Network of US political books, 105 nodes and 441 edges)

  36. Features in Social Network Data Two important eigenvalues: λ1 (the largest eigenvalue of the adjacency matrix) and μ2 (the second smallest eigenvalue of the Laplacian matrix) • The maximum degree, chromatic number, clique number etc. are related to λ1; • The epidemic threshold for virus propagation in the network is related to λ1 [Wang et al., KDD03]; • μ2 indicates the community structure of the graph: clear community structure ⇔ μ2 ≈ 0.

  37. Spectrum Preserving Randomization Spectrum preserving approach [Ying, Wu, SDM08] Intuition: since spectrum is related to many graph topological features, can we preserve more structural features by controlling the movement of eigenvalues?

  38. Spectrum Preserving Randomization Spectral Switch (apply to adjacency matrix): To increase the eigenvalue: To decrease the eigenvalue:

  39. Spectrum Preserving Randomization Spectral Switch (apply to Laplacian matrix): To decrease the eigenvalue: To increase the eigenvalue:

  40. Spectrum Preserving Randomization Evaluation: (Networks of US political books, 105 nodes and 441 edges)

  41. Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] Data owner puts feature range constraints in switching • Feature range constraints: • The data owner publishes the feature range constraint.

  42. Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] • Markov chain with feature range constraint (uniformity for accessible graphs)
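
A switch chain with a feature range constraint can be sketched by rejecting any switch that moves the feature S(G) outside the allowed range R = [lo, hi]. In this illustration the feature is the number of triangles, which is an assumption chosen for simplicity; the function names and parameters are hypothetical, and this is not the relaxed algorithm from the cited paper.

```python
import random


def triangles(edges, n):
    """Number of triangles, used here as the example feature S(G)."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Each triangle is counted once per edge, i.e. three times.
    return sum(len(adj[a] & adj[b]) for a, b in edges) // 3


def constrained_switch(edges, n, lo, hi, steps, rng):
    """Run `steps` switch attempts, keeping S(G) inside [lo, hi]."""
    edges = set(edges)
    for _ in range(steps):
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        if rng.random() < 0.5:
            new1, new2 = (a, d), (c, b)
        else:
            new1, new2 = (a, c), (b, d)
        new1, new2 = tuple(sorted(new1)), tuple(sorted(new2))
        if new1[0] == new1[1] or new2[0] == new2[1]:
            continue  # would create a self-loop
        if new1 in edges or new2 in edges:
            continue  # would create a duplicate edge
        candidate = (edges - {(a, b), (c, d)}) | {new1, new2}
        # Reject the switch if it moves the feature out of the allowed range.
        if lo <= triangles(candidate, n) <= hi:
            edges = candidate
    return edges
```

Because every intermediate graph must satisfy the constraint, some graphs in the constrained space may be unreachable from the starting graph, which is the accessibility issue that motivates a relaxed variant.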

  43. Markov Chain Based Feature Preserving Randomization Markov chain generation [Ying, Wu, SDM09] • Problem: accessibility is not guaranteed • We propose the relaxed algorithm with feature range constraint (accessibility, approximate uniformity) • The relaxed algorithm also has applications in testing the significance of data mining results

  44. Attacks to Feature Preserving Randomization Data owner puts feature range constraints in switching • Feature range constraints: • Can attackers utilize the feature constraints to breach link privacy?

  45. Attacks in Feature Preserving Randomization Markov chain approach [Ying, Wu, SDM09] • Markov chain with feature range constraint Graph space = {G: with a given deg. seq. & S(G) in R} • Starting with the randomized data, repeat the switch procedure many times and get one sample graph • Generate N graphs

  46. Attacks in Utility Preserving Randomization Markov chain approach [Ying, Wu, SDM09] • Evaluation Polbooks (r=8%) Enron (r=8%) Future work: what causes the difference? Which features will (not) leak privacy?

  47. Reconstruction from Randomized Graphs • Motivation • Low Rank Approximation on Graph Data • Reconstruction from Randomized Graph • Privacy Issue • SDM10 paper

  48. Motivation • We focus on whether we can reconstruct a graph from s.t. Our Focus

  49. Revisit of LRA in Numerical Data • Spectral filtering derives an estimate of U from the perturbed data • Calculate the covariance matrix, which is symmetric and positive definite • Apply spectral decomposition to • Derive the eigenvalue information from the covariance matrix of noise V and choose a proper number of dimensions, r • Let and , obtain the estimated data set using
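
The steps on this slide follow the usual pattern of spectral filtering for additively perturbed numerical data. The block below is a sketch of that general pattern, not a reconstruction of the slide's own formulas: the symbols $Y$, $C$, $Q$, $\Lambda$, $Q_r$ and $\hat{U}$, the perturbation model $Y = U + V$, and the use of column-centered data with $m$ records are all assumptions.

```latex
\begin{align*}
Y &= U + V && \text{(perturbed data = original data + noise)} \\
C &= \tfrac{1}{m}\, Y^{\top} Y && \text{(covariance of the column-centered perturbed data)} \\
C &= Q \Lambda Q^{\top} && \text{(spectral decomposition, eigenvalues in decreasing order)} \\
Q_r &= [\, q_1, \dots, q_r \,] && \text{(eigenvectors of the } r \text{ leading eigenvalues)} \\
\hat{U} &= Y Q_r Q_r^{\top} && \text{(projection onto the retained subspace)}
\end{align*}
```

The number of retained dimensions $r$ is chosen by comparing the eigenvalues of $C$ with the range of eigenvalues expected from the noise covariance alone.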
