
De-anonymizing Social Networks

De-anonymizing Social Networks. Presenter: Lijie Zhang Advisor: Weining Zhang. Outlines. Motivation Attack Model De-anonymization Algorithm Experiments Conclusions. Motivation. Social network (SN) owner publishes graph data for sharing


Presentation Transcript


  1. De-anonymizing Social Networks Presenter: Lijie Zhang Advisor: Weining Zhang

  2. Outlines • Motivation • Attack Model • De-anonymization Algorithm • Experiments • Conclusions

  3. Motivation • Social network (SN) owner publishes graph data for sharing • Academic and government data-mining: phone call networks • Advertising: • Third-party applications: 550,000 Facebook applications • Private information on SNs: • Node attributes: node degree in a sexual network • Edge presence: a single call, romantic relationship

  4. Motivation • SN owner publishes anonymized graph: • Nodes have no identifying attributes • Propose a model for re-identifying nodes in the anonymized graph: • Re-identification: learn the entity to which a node belongs • Entity: an account, a real person, a group, an organization

  5. Outlines • Motivation • Attack Model • De-anonymization Algorithm • Experiments • Conclusions

  6. Model – Social Network • Social Network S: • A directed graph G=(V,E) • A set of node attributes X: name, telephone number • A set of edge attributes Y: type of relationship • Attribute values are treated as coming from a discrete domain

  7. Model – Data Release • A sanitized subset of nodes and edges in S • Computation: • Vsan: subset of V • Xsan: subset of X including sensitive attributes • Ysan: subset of Y including sensitive attributes • Published attributes by themselves are insufficient for re-identification • Compute the induced subgraph on Vsan • Remove some edges and add fake edges
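
The release steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sanitize`, the parameters `remove_frac` and `add_frac`, and the uniform random choice of which edges to drop or fake are all hypothetical, since the slide does not say how edges are selected.

```python
import random

def sanitize(edges, v_san, remove_frac=0.1, add_frac=0.1, seed=0):
    """Induced subgraph on v_san, then drop some true edges and add fake ones.

    edges: set of directed (u, v) pairs; nodes must be sortable.
    remove_frac / add_frac: hypothetical knobs, as fractions of the induced edge count.
    """
    rng = random.Random(seed)
    v_san = set(v_san)
    # Induced subgraph: keep edges whose endpoints are both released.
    induced = {(u, v) for (u, v) in edges if u in v_san and v in v_san}
    # Remove a random fraction of the true edges.
    n_remove = int(remove_frac * len(induced))
    removed = set(rng.sample(sorted(induced), n_remove))
    kept = induced - removed
    # Add fake edges between released nodes that were not connected before.
    candidates = sorted((u, v) for u in v_san for v in v_san
                        if u != v and (u, v) not in induced)
    n_add = min(int(add_frac * len(induced)), len(candidates))
    fake = set(rng.sample(candidates, n_add))
    return kept | fake
```

Node and edge attributes are left out here; only the topology is perturbed.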

  8. Model – Attacker • Purpose: extract sensitive information about specific individuals from anonymized SN graphs • Attacker’s knowledge • Aggregate auxiliary information • Individual auxiliary information

  9. Aggregate auxiliary information • Large-scale information from other data sources and social networks whose membership overlaps with the target network Ssan • Gaux={Vaux, Eaux} • AuxX and AuxY: probability distributions of each node attribute in Vaux and edge attribute in Eaux, respectively (prior knowledge).

  10. Individual auxiliary information • Identifiable details about a small number of individuals from the target network Ssan and possibly relationships between them

  11. Model – Breaching Privacy • Extract sensitive information about specific individuals from Ssan • Re-identify nodes from the target SN Ssan • Re-identification: find a mapping μ between nodes in Vaux and nodes in Vsan • μG: ground-truth mapping • Re-identification of a node vaux succeeds if μ(vaux) = μG(vaux)

  12. Model – Breaching Privacy • Re-identification algorithm: • Input: Ssan and Saux • Output: a probabilistic mapping μ̃, where μ̃(vaux, vsan) is the probability that vaux maps to vsan • Mapping adversary:

  13. Model – Breaching Privacy • Privacy breach: privacy of vsan is breached w.r.t adversary Adv and privacy parameter , if

  14. Model – Measuring Success of an Attack • Let Vmapped = {v ∈ Vaux : μG(v) ≠ ⊥}, where μG is the ground-truth mapping. The success rate of a de-anonymization algorithm outputting a probabilistic mapping μ̃, w.r.t. a centrality measure ν, is the probability that μ sampled from μ̃ maps a node v to μG(v) if v is selected from Vmapped according to ν
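
One way to read this definition is as a centrality-weighted average of the probability that each node is mapped to its true counterpart. The sketch below follows that reading; the function name `success_rate`, the dictionary layout, and the normalization by total centrality are assumptions made for illustration.

```python
def success_rate(prob_mapping, ground_truth, centrality):
    """Centrality-weighted probability that a sampled mapping agrees with ground truth.

    prob_mapping[v]: dict {candidate target node: probability} for auxiliary node v.
    ground_truth[v]: the true target node, or None if v has no counterpart.
    centrality[v]:   non-negative weight of v (for example, its degree).
    """
    # Only nodes that actually have a counterpart in the target graph count.
    mapped = [v for v, target in ground_truth.items() if target is not None]
    total = sum(centrality[v] for v in mapped)
    if total == 0:
        return 0.0
    hit = sum(prob_mapping.get(v, {}).get(ground_truth[v], 0.0) * centrality[v]
              for v in mapped)
    return hit / total
```

With degree as the centrality measure, correctly re-identifying a few highly connected nodes counts for more than correctly re-identifying many isolated ones.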

  15. Outlines • Motivation • Attack Model • De-anonymization Algorithm • Experiments • Conclusions

  16. De-anonymization Algorithm • Seed identification: apply individual auxiliary information • Propagation: apply aggregate auxiliary information

  17. Algorithm - Seed Identification • Input: • The target graph • A clique of k nodes that are present in both the auxiliary and the target graphs • The degree values of the k nodes • k(k−1)/2 common-neighbor counts, one per pair of clique nodes • Error parameter ε • Output: a k-clique in the target graph whose node degrees and common-neighbor counts match (within a factor of 1 ± ε)
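
A brute-force version of this search is sketched below. It is not the search procedure used in the referenced work: it treats the target graph as undirected, enumerates every k-subset of nodes, and is only practical for tiny graphs. The names `within` and `find_seed_clique` and the adjacency-dictionary layout are assumptions, and the tolerance is read as a multiplicative factor of 1 ± ε.

```python
from itertools import combinations, permutations

def within(actual, expected, eps):
    """True if actual lies within a factor of 1 +/- eps of expected."""
    return (1 - eps) * expected <= actual <= (1 + eps) * expected

def find_seed_clique(adj, degrees, common, eps):
    """Return the unique ordered k-clique matching the auxiliary data, else None.

    adj:     undirected target graph as {node: set(neighbors)}.
    degrees: degrees[i] is the auxiliary degree of seed i.
    common:  common[(i, j)] with i < j is the auxiliary common-neighbor count.
    """
    k = len(degrees)
    matches = []
    for nodes in combinations(sorted(adj), k):
        # The candidate must be a clique in the target graph.
        if any(v not in adj[u] for u, v in combinations(nodes, 2)):
            continue
        # Try every assignment of clique nodes to seed positions.
        for order in permutations(nodes):
            if not all(within(len(adj[order[i]]), degrees[i], eps)
                       for i in range(k)):
                continue
            if all(within(len(adj[order[i]] & adj[order[j]]), common[(i, j)], eps)
                   for i, j in combinations(range(k), 2)):
                matches.append(order)
    # Ambiguous or missing matches are rejected rather than guessed.
    return matches[0] if len(matches) == 1 else None
```

Returning `None` when more than one clique matches mirrors the requirement that the seeds be identified uniquely.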

  18. Algorithm - Propagation • Inputs: G1, G2, and a partial seed mapping between them • Output: μ • Iteratively find new mappings using the topological structure of the network and the feedback from previously constructed mappings.

  19. Algorithm - Propagation
      function propagationStep(lgraph, rgraph, mapping)
        for lnode in lgraph.nodes:
          scores[lnode] = matchScores(lgraph, rgraph, mapping, lnode)
          if eccentricity(scores[lnode]) < theta: continue
          rnode = (pick node from rgraph.nodes where scores[lnode][node] = max(scores[lnode]))
          scores[rnode] = matchScores(rgraph, lgraph, invert(mapping), rnode)
          if eccentricity(scores[rnode]) < theta: continue
          reverse_match = (pick node from lgraph.nodes where scores[rnode][node] = max(scores[rnode]))
          if reverse_match != lnode: continue
          mapping[lnode] = rnode

  20. Algorithm - Propagation • Eccentricity: measures how much a node “stands out” from the rest of the nodes in a graph • Reject the match if the eccentricity of the set of mapping scores is below a threshold θ (theta in the pseudocode)
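
The propagation step and the eccentricity test can be turned into a small runnable Python sketch. Several details are assumptions, because the slides do not spell them out: eccentricity is taken to be (largest score − second-largest score) / standard deviation, a candidate's score is the number of already-mapped neighbors it shares with the node being matched, damped by the square root of the candidate's degree, graphs are undirected adjacency dictionaries, and nodes that are already mapped are skipped.

```python
import math
import statistics

def eccentricity(values):
    """(largest - second largest) / standard deviation; 0.0 when undefined."""
    values = list(values)
    if len(values) < 2:
        return 0.0
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    top, second = sorted(values, reverse=True)[:2]
    return (top - second) / sd

def match_scores(lgraph, rgraph, mapping, lnode):
    """Score each unmapped rgraph node by mapped neighbors it shares with lnode."""
    taken = set(mapping.values())
    scores = {r: 0.0 for r in rgraph if r not in taken}
    for lnbr in lgraph[lnode]:
        if lnbr not in mapping:
            continue
        for rnode in rgraph[mapping[lnbr]]:
            if rnode in scores:
                # Assumed damping: high-degree candidates earn less per shared neighbor.
                scores[rnode] += 1 / math.sqrt(len(rgraph[rnode]))
    return scores

def propagation_step(lgraph, rgraph, mapping, theta):
    """One pass over lgraph; adds matches that are mutual best and eccentric enough."""
    for lnode in sorted(lgraph):
        if lnode in mapping:
            continue
        scores = match_scores(lgraph, rgraph, mapping, lnode)
        if not scores or eccentricity(scores.values()) < theta:
            continue
        rnode = max(scores, key=scores.get)
        # Reverse check: rnode's best match in lgraph must be lnode.
        inverse = {r: l for l, r in mapping.items()}
        rscores = match_scores(rgraph, lgraph, inverse, rnode)
        if not rscores or eccentricity(rscores.values()) < theta:
            continue
        if max(rscores, key=rscores.get) != lnode:
            continue
        mapping[lnode] = rnode
    return mapping
```

Calling `propagation_step` repeatedly until the mapping stops growing gives the iterative behavior described on the earlier propagation slide. A node with only one remaining candidate is left unmapped in this sketch, because eccentricity is undefined for a single score.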

  21. Algorithm - Propagation • Complexity: O((|E1|+|E2|)d1d2) • d1: a bound on the degree of the nodes in V1 • d2: a bound on the degree of the nodes in V2

  22. Outlines • Motivation • Attack Model • De-anonymization Algorithm • Experiments • Conclusions

  23. Experiments – Data Sets • Twitter, Flickr, LiveJournal:

  24. Experiments – Seed Identification • Evaluate the feasibility of seed identification by measuring how much auxiliary information is needed to identify a unique node in the target graph • LiveJournal graph: auxiliary and target • Construct 4-cliques, and treat a 4-clique in the target graph as a match as long as each degree and common-neighbor count matches within a factor of 1 ± ε

  25. Experiments – Seed Identification

  26. Experiments – Propagation • Evaluate the robustness against perturbation and seed selection • Pairs of subgraphs (V1,V2) of a real-world SN, each with over 100,000 nodes • One serves as the auxiliary SN, the other as the target SN • Perturbation strategy: the two subgraphs overlap in 25% of their nodes and 50% of their edges
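
The slide does not say how overlap is measured. One plausible reading, assumed here, is the Jaccard coefficient: shared items divided by all items. In the sketch below, edge overlap is computed only over edges whose endpoints lie in both subgraphs, which is also an assumption; the names `jaccard` and `overlap` are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap(v1, e1, v2, e2):
    """Return (node overlap, edge overlap) for two subgraphs.

    Edge overlap is restricted to edges between nodes present in both subgraphs.
    """
    shared = set(v1) & set(v2)
    e1_shared = {(u, v) for (u, v) in e1 if u in shared and v in shared}
    e2_shared = {(u, v) for (u, v) in e2 if u in shared and v in shared}
    return jaccard(v1, v2), jaccard(e1_shared, e2_shared)
```

For example, two five-node subgraphs that share two nodes have a node overlap of 2/8 = 0.25 under this measure.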

  27. Evaluate the robustness against perturbation and seed selection

  28. Experiments – Propagation • Mapping between two real-world social networks: Flickr and Twitter • Finding ground truth: • Exact matches in either the username or the name field • 27,000 mappings • Human inspection found the ground-truth error rate to be under 5%.

  29. Mapping between two real-world social networks • Seeds: 150 pairs of nodes selected from • Results: • 30.8% of the mappings were re-identified correctly, 12.1% were identified incorrectly, and 57% were not identified. • 41% of the incorrectly identified mappings (5% overall) were mapped to nodes which are at a distance 1 from the true mapping. • 55% of the incorrectly identified mappings (6.7% overall) were mapped to nodes where the same geographic location was reported. • The above two categories overlap; of all the incorrect mappings, only 27% (or 3.3% overall) fall into neither category and are completely erroneous.
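
The "overall" figures in parentheses are each category's share of the incorrect mappings multiplied by the 12.1% overall error rate, and the size of the overlap between the two categories follows from inclusion-exclusion. The short check below reproduces that arithmetic; the function names are hypothetical.

```python
def overall_share(pct_of_incorrect, incorrect_pct=12.1):
    """Convert a share of the incorrect mappings into a share of all mappings (percent)."""
    return pct_of_incorrect * incorrect_pct / 100

def in_both_categories(pct_a, pct_b, pct_neither):
    """Inclusion-exclusion: percent of incorrect mappings that fall in both categories."""
    return pct_a + pct_b - (100 - pct_neither)

# Shares of the incorrect mappings reported on the slide.
distance_one, same_location, neither = 41, 55, 27
for share in (distance_one, same_location, neither):
    print(share, "% of incorrect ->", overall_share(share), "% overall")
print("in both categories:", in_both_categories(distance_one, same_location, neither), "%")
```

Under these figures, 41 + 55 − 73 = 23% of the incorrect mappings are both at distance 1 from the true node and share its reported location.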

  30. Conclusions • Anonymity is not sufficient for privacy when dealing with social networks. • Demonstrate feasibility of successful re-identification based solely on the network topology and assuming that the target graph is completely anonymized.

  31. Reference • [1]  Arvind Narayanan and Vitaly Shmatikov, “De-anonymizing Social Networks”, IEEE Security & Privacy '09.
