Introduction • Social networking services are a fast-growing business • Facebook, Twitter, Google+, LiveJournal, YouTube, … • When users participate in online social network activities, their privacy faces potentially serious threats • Create personal portfolio, post current location, … • Countermeasures • Naïve anonymization: removing “Personally Identifiable Information (PII)” • Edge modification • k-anonymity and its variants • Still vulnerable to powerful structure-based de-anonymization attacks • Narayanan-Shmatikov attack (IEEE S&P 2009) • Srivatsa-Hicks attack (ACM CCS 2012) • Others
Narayanan-Shmatikov attack (IEEE S&P 2009) • Anonymized data: Twitter (crawled in late 2007) • A microblogging service • 224K users, 8.5M edges • Auxiliary data: Flickr (crawled in late 2007/early 2008) • A photo-sharing service • 3.3M users, 53M edges • Result: 30.8% of the users are successfully de-anonymized • Heuristics used for the Twitter-to-Flickr user mapping: eccentricity, edge directionality, node degree, revisiting nodes, reverse match
Srivatsa-Hicks (ACM CCS 2012) • Anonymized data • Mobility traces: St Andrews, Smallblue, and Infocom 2006 • Auxiliary data • Social networks: Facebook and DBLP • De-anonymize mobility traces using the corresponding social networks • Over 80% of users can be successfully de-anonymized
Other structural de-anonymization attacks • Backstrom et al. attack (WWW 2007) • Both active attacks and passive attacks • Narayanan et al. attack (IJCNN 2011) • A simplified version of the Narayanan-Shmatikov attack (IEEE S&P 2009) • For breaching link privacy • Pedarsani et al. attack (Allerton 2013) • A Bayesian-method-based attack
Limitations of existing attacks • Not scalable • E.g., the Backstrom et al. attack (WWW 2007) needs to create Sybil users before the anonymized data is released, which is neither controllable nor scalable • E.g., the Srivatsa-Hicks attack (CCS 2012) has a complexity of O(k!·n^3), where k is the number of seeds, which is not scalable • High computational cost • E.g., the Narayanan-Shmatikov attack (S&P 2009) has a complexity of O(n^k + n^4) • Not general • E.g., the Narayanan-Shmatikov attack (S&P 2009) is designed for directed graphs • E.g., the Pedarsani et al. attack (Allerton 2013) works well on sparse graphs but poorly on dense graphs
Our contributions • Defined and measured three de-anonymization metrics • Structural similarity, relative distance similarity, and inheritance similarity • Proposed a Unified Similarity (US) based De-Anonymization (DA) framework • Iteratively de-anonymize data with an accuracy guarantee • Generalized DA to an Adaptive De-Anonymization (ADA) framework • To de-anonymize large-scale data without knowledge of the overlap size between the anonymized data and the auxiliary data • Applied the proposed de-anonymization attacks to real-world datasets • Successfully de-anonymized three mobility traces: St Andrews, Infocom06, and Smallblue • Successfully de-anonymized three social network datasets: ArnetMiner, Google+, and Facebook
Outline • Background • Preliminaries and Model • De-anonymization • Generalized Scalable De-anonymization • Experiments • Conclusion and Future Work
Preliminaries and Model • Anonymized data graph • Auxiliary data graph • Attack Model • A de-anonymization attack is a mapping of users from the anonymized graph to the auxiliary graph
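One possible way to write this attack model down is sketched below. The notation is an assumption made for illustration (the slide's own symbols did not survive): $G^a$ and $G^u$ stand for the anonymized and auxiliary graphs, $\sigma$ for the mapping, and $\perp$ for "no match".

```latex
% Assumed notation, not taken from the slides.
G^a = (V^a, E^a), \qquad G^u = (V^u, E^u)

\sigma : V^a \to V^u \cup \{\perp\}, \qquad
\sigma(i) =
\begin{cases}
  j \in V^u & \text{if anonymized user } i \text{ is mapped to auxiliary user } j,\\
  \perp     & \text{if } i \text{ is left unmapped.}
\end{cases}
```

Allowing $\perp$ reflects that the two graphs need not contain exactly the same set of users.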
Datasets – mobility traces • Mobility traces (anonymized data) and social networks (auxiliary data), the same as in the Srivatsa-Hicks attack (ACM CCS 2012) • Preprocess mobility traces to construct anonymized contact graphs (see Srivatsa and Hicks's paper for details) • Use the social network as auxiliary data to de-anonymize mobility traces
Datasets – social networks • ArnetMiner • A coauthor network • A weighted graph with weight indicating the number of coauthored papers • 1,127 authors and 6,690 “coauthor” relationships • Google+ • Two Google+ datasets crawled on July 19 and August 6 in 2011, denoted by JUL and AUG, respectively • JUL: 5,200 users, 7,062 connections • AUG: 5,200 users, 7,813 connections • Facebook • 63,731 users • 1,269,502 friend relationships
De-anonymization • High-Level Description • Seed selection • Mapping propagation • Seed selection • Identify a small number of seed mappings from the anonymized graph to the auxiliary graph • Bootstrap the de-anonymization • Mapping propagation • De-anonymize the anonymized graph using multiple similarity measurements
Mapping Propagation • Metrics • Structural Similarity • Relative Distance Similarity • Inheritance Similarity • Unified Similarity • We also defined the weighted version of these metrics by considering the weights on edges • Propagation framework
Structural Similarity • Degree centrality • The number of ties that a node has in a graph
Structural Similarity • Closeness centrality • How close a node is to the other nodes in a graph
Structural Similarity • Betweenness centrality • A node’s global structural importance within a graph
Structural Similarity • Defined as the cosine similarity between two nodes’ degree, closeness, and betweenness centralities
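The definition above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes unweighted, undirected graphs stored as adjacency dicts, normalizes each centrality to the range [0, 1], and uses Brandes' algorithm for betweenness. All function names are hypothetical.

```python
from collections import deque
from math import sqrt

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj, node):
    # Number of ties, divided by the maximum possible number (n - 1).
    n = len(adj)
    return len(adj[node]) / (n - 1) if n > 1 else 0.0

def closeness_centrality(adj, node):
    # Reachable nodes divided by the sum of hop distances to them.
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness_centrality(adj):
    """Brandes' algorithm (unweighted, undirected), normalized to [0, 1]."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    norm = (n - 1) * (n - 2) / 2 if n > 2 else 1.0
    # Each undirected pair was counted from both endpoints, hence the / 2.
    return {v: b / 2 / norm for v, b in score.items()}

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = sqrt(sum(a * a for a in x)), sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def centrality_vector(adj, node):
    betw = betweenness_centrality(adj)
    return [degree_centrality(adj, node), closeness_centrality(adj, node), betw[node]]

def structural_similarity(adj_a, i, adj_u, j):
    """Cosine similarity of the two nodes' (degree, closeness, betweenness) vectors."""
    return cosine(centrality_vector(adj_a, i), centrality_vector(adj_u, j))
```

For example, on the path graph `{"a": ["b"], "b": ["a", "c"], "c": ["b"]}` the two end nodes have identical centrality vectors, so their similarity is 1, while an end node compared with the middle node scores lower. Recomputing betweenness on every call is wasteful; a real implementation would cache it per graph.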
Relative Distance Similarity • Defined as the cosine similarity between two nodes’ distance vectors to seeds
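A minimal sketch of this metric, under stated assumptions: distances are raw hop counts, the seed lists `seeds_a` and `seeds_u` are aligned so that `seeds_a[k]` is already mapped to `seeds_u[k]`, and an unreachable seed is given the stand-in distance `len(adj)`. None of these choices come from the slides, and the names are hypothetical.

```python
from collections import deque
from math import sqrt

def hop_distances(adj, source):
    """Breadth-first hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_vector(adj, node, seeds):
    # Assumption: an unreachable seed gets len(adj), larger than any real hop count.
    dist = hop_distances(adj, node)
    return [dist.get(s, len(adj)) for s in seeds]

def relative_distance_similarity(adj_a, i, seeds_a, adj_u, j, seeds_u):
    """Cosine similarity of the two nodes' distance vectors to the aligned seeds."""
    x = distance_vector(adj_a, i, seeds_a)
    y = distance_vector(adj_u, j, seeds_u)
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = sqrt(sum(a * a for a in x)), sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0
```

On two four-node paths with the end nodes as seeds, a node one hop from the first seed matches its counterpart with similarity 1 and matches the node on the other side with similarity 0.8.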
Inheritance Similarity • Characterizes the knowledge provided by the current mapping results • Two nodes with more common mapped neighbors have a higher inheritance similarity score
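The idea of "common mapped neighbors" can be illustrated with a deliberately simplified stand-in. The slides do not give the formula, so the normalization below (dividing by the larger of the two degrees) is an assumption, and `inheritance_similarity` and `mapping` are hypothetical names.

```python
def inheritance_similarity(adj_a, i, adj_u, j, mapping):
    """Fraction of neighbors that the current mapping already pairs up.

    mapping: dict from anonymized nodes to auxiliary nodes (results so far).
    Simplified stand-in, not the formula from the original work.
    """
    # Images in the auxiliary graph of i's already-mapped neighbors.
    mapped_images = {mapping[n] for n in adj_a[i] if n in mapping}
    # Those images that are also neighbors of j count as common mapped neighbors.
    common = len(mapped_images & set(adj_u[j]))
    largest_degree = max(len(adj_a[i]), len(adj_u[j]))
    return common / largest_degree if largest_degree else 0.0
```

With this scoring, a candidate pair whose neighborhoods are fully matched by earlier rounds scores 1, and the score falls as fewer of their neighbors have been mapped to each other.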
Unified Similarity (US) • Considers the structural similarity, relative distance similarity, and inheritance similarity • (Equation not recovered: US combines the structural similarity, relative distance similarity, and inheritance similarity using weights)
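One plausible reading of this combination is a weighted sum. The symbols below are assumptions introduced for illustration: $S$, $D$, and $I$ denote the structural, relative distance, and inheritance similarities of an anonymized node $i$ and an auxiliary node $j$, and $c_S$, $c_D$, $c_I$ are the weights.

```latex
% Assumed form and symbols, not taken from the slides.
\mathrm{US}(i, j) = c_S \, S(i, j) + c_D \, D(i, j) + c_I \, I(i, j),
\qquad c_S, c_D, c_I \ge 0
```

If the weights are additionally chosen to sum to 1 and each component lies in $[0, 1]$, then $\mathrm{US}(i, j)$ also lies in $[0, 1]$, which makes scores comparable across rounds.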
US based De-Anonymization (DA) Framework • Step 1: Identify seeds using existing techniques • Step 2: Calculate two candidate node sets Ca and Cu from the anonymized graph and the auxiliary graph, respectively • Step 3: Calculate the US of each user in Ca to every user in Cu, and construct a weighted bipartite graph from Ca and Cu based on the calculated US scores • Step 4: Seek a maximum weighted bipartite matching • Step 5: Decide whether to accept each node de-anonymization result in the bipartite matching • Go to Step 2 if the end condition is not reached
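Steps 3 to 5 of one propagation round might look like the sketch below. It is an illustration under assumptions: the US scores are given as a dict keyed by `(anonymized_node, auxiliary_node)`, the matching is found by brute force over permutations (fine only for tiny candidate sets; a Hungarian-style solver would be used in practice), and acceptance is a simple score threshold, which is a hypothetical rule rather than the criterion from the original work.

```python
from itertools import permutations

def max_weight_matching(cand_a, cand_u, score):
    """Brute-force maximum weighted bipartite matching.

    Requires len(cand_a) <= len(cand_u); tries every assignment.
    """
    best, best_total = {}, float("-inf")
    for perm in permutations(cand_u, len(cand_a)):
        total = sum(score[(a, u)] for a, u in zip(cand_a, perm))
        if total > best_total:
            best_total, best = total, dict(zip(cand_a, perm))
    return best

def propagation_round(cand_a, cand_u, score, threshold):
    """Match the candidate sets, then keep only pairs scoring at least threshold."""
    matching = max_weight_matching(cand_a, cand_u, score)
    return {a: u for a, u in matching.items() if score[(a, u)] >= threshold}

# Toy US scores for two anonymized candidates and three auxiliary candidates.
scores = {
    (1, "x"): 0.9, (1, "y"): 0.8, (1, "z"): 0.1,
    (2, "x"): 0.85, (2, "y"): 0.2, (2, "z"): 0.3,
}
accepted = propagation_round([1, 2], ["x", "y", "z"], scores, threshold=0.5)
```

Note that the matching maximizes the total score, so node 1 is paired with "y" rather than its individually best candidate "x", because giving "x" to node 2 yields a higher overall weight.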
Generalized Scalable De-anonymization • Core Matching Subgraph (CMS)
Adaptive De-Anonymization (ADA) • Identify the initial CMS • Run DA on the initial CMS • Update the CMS, or end
Conclusion and Future Work • Conclusion • Proposed and examined several structural similarity metrics • Designed a new scalable structural de-anonymization framework for mobility traces and social networks • Validated the proposed de-anonymization framework on multiple mobility traces and social networks • Future work • More experiments on large-scale datasets • De-anonymizability quantification (partially done in our ACM CCS 2014 paper) • Secure data publishing system
Thank you and the presenter Qin Liu! Shouling Ji sji@gatech.edu http://users.ece.gatech.edu/sji/