Narayanan-Shmatikov attack (IEEE S&P 2009) • A. Narayanan and V. Shmatikov, De-anonymizing Social Networks, IEEE S&P 2009. • Anonymized data: Twitter (crawled in late 2007) • A microblogging service • 224K users, 8.5M edges • Auxiliary data: Flickr (crawled in late 2007/early 2008) • A photo-sharing service • 3.3M users, 53M edges • Result: 30.8% of the users are successfully de-anonymized
[Figure: user mapping from the anonymized Twitter graph to the auxiliary Flickr graph, driven by the heuristics eccentricity, edge directionality, node degree, revisiting nodes, and reverse match]
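Among the heuristics listed in the figure above, eccentricity is the one that decides whether a proposed match stands out enough from the other candidates to be accepted. Below is a minimal Python sketch of that idea, assuming eccentricity is computed as (best score − second-best score) divided by the standard deviation of all candidate scores; the scoring input and the threshold theta are illustrative assumptions, not values from the paper.

```python
import statistics

def eccentricity(scores):
    """How much the best score stands out: (max - second max) / std dev."""
    if len(scores) < 2:
        return 0.0
    ordered = sorted(scores, reverse=True)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return 0.0
    return (ordered[0] - ordered[1]) / sigma

def accept_match(candidate_scores, theta=0.5):
    """Accept the top-scoring candidate only if it is sufficiently eccentric.

    candidate_scores maps candidate auxiliary nodes to similarity scores;
    theta is an illustrative threshold, not a value from the paper."""
    if not candidate_scores:
        return None
    best = max(candidate_scores, key=candidate_scores.get)
    if eccentricity(list(candidate_scores.values())) >= theta:
        return best
    return None
```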
Srivatsa-Hicks Attacks (ACM CCS 2012) • M. Srivatsa and M. Hicks, De-anonymizing Mobility Traces: using Social Networks as a Side-Channel, ACM CCS 2012. • Anonymized data • Mobility traces: St Andrews, Smallblue, and Infocom 2006 • Auxiliary data • Social networks: Facebook and DBLP • Over 80% of the users can be successfully de-anonymized
Motivation • Question 1: Why can structural data be de-anonymized? • Question 2: What are the conditions for successful data de-anonymization? • Question 3: What portion of users can be de-anonymized in a structural dataset? [1] P. Pedarsani and M. Grossglauser, On the Privacy of Anonymized Networks, KDD 2011. [2] L. Yartseva and M. Grossglauser, On the Performance of Percolation Graph Matching, COSN 2013. [3] N. Korula and S. Lattanzi, An Efficient Reconciliation Algorithm for Social Networks, VLDB 2014.
Motivation • Question 1: Why can structural data be de-anonymized? • Question 2: What are the conditions for successful data de-anonymization? • Question 3: What portion of users can be de-anonymized in a structural dataset? Our Contribution: address the above three open questions under a practical data model.
Outline • Introduction and Motivation • System Model • De-anonymization Quantification • Evaluation • Implication 1: Optimization based De-anonymization (ODA) Practice • Implication 2: Secure Data Publishing • Conclusion
System Model • Anonymized Data • Auxiliary Data • De-anonymization • Measurement
System Model • Anonymized Data • Auxiliary Data • De-anonymization • Measurement • Quantification • Configuration model: the conceptual underlying graph G can have an arbitrary degree sequence that follows any distribution
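A rough sketch of what the configuration-model assumption buys: the underlying graph can be drawn from any degree sequence. The Python snippet below uses networkx's configuration_model to do this; the heavy-tailed example sequence is an arbitrary illustration, not the distribution used in the paper.

```python
import random
import networkx as nx

def sample_configuration_graph(degree_sequence):
    """Draw a simple graph whose degrees (approximately) follow the given sequence."""
    # networkx pairs up half-edges ("stubs") uniformly at random;
    # the result is a multigraph, so we strip parallel edges and self-loops.
    multigraph = nx.configuration_model(degree_sequence, seed=42)
    graph = nx.Graph(multigraph)                               # drop parallel edges
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))   # drop self-loops
    return graph

# Example: a heavy-tailed degree sequence (sum must be even for stub pairing).
degrees = [max(1, int(random.paretovariate(2.5))) for _ in range(1000)]
if sum(degrees) % 2 == 1:
    degrees[0] += 1
G = sample_configuration_graph(degrees)
print(G.number_of_nodes(), G.number_of_edges())
```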
Outline • Introduction and Motivation • System Model • De-anonymization Quantification • Evaluation • Implication 1: Optimization based De-anonymization (ODA) Practice • Implication 2: Secure Data Publishing • Conclusion
De-anonymization Quantification • Perfect de-anonymization quantification • Structural similarity condition • Graph/data size condition
De-anonymization Quantification • (1 − ε)-Perfect de-anonymization quantification • Graph/data size condition • Structural similarity condition
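For concreteness, the objects behind these conditions can be written out as follows. This is a notational sketch consistent with the structure-based de-anonymization literature (minimum edge-error matching), not necessarily the slides' exact symbols: a de-anonymization is a node mapping from the anonymized graph to the auxiliary graph, its quality is scored by an edge-error function, and (1 − ε)-perfect de-anonymization allows at most an ε fraction of users to be mis-mapped.

```latex
% Illustrative notation (assumed, not taken verbatim from the slides).
% G^a = (V^a, E^a): anonymized graph;  G^u = (V^u, E^u): auxiliary graph.
% A de-anonymization is a node mapping
\sigma : V^a \to V^u
% Its edge error counts the edges on which the two graphs disagree under \sigma:
\Delta_\sigma = \bigl|\{ (u,v) \in E^a : (\sigma(u),\sigma(v)) \notin E^u \}\bigr|
              + \bigl|\{ (x,y) \in E^u : (\sigma^{-1}(x),\sigma^{-1}(y)) \notin E^a \}\bigr|
% A structure-based attacker picks the minimum-edge-error mapping:
\hat{\sigma} = \arg\min_{\sigma} \Delta_\sigma
% Perfect de-anonymization: \hat{\sigma} agrees with the true mapping \sigma_0 on all users.
% (1-\epsilon)-perfect de-anonymization: it agrees on at least a (1-\epsilon) fraction:
\bigl|\{ v \in V^a : \hat{\sigma}(v) = \sigma_0(v) \}\bigr| \;\ge\; (1-\epsilon)\,\lvert V^a \rvert
```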
Outline • Introduction and Motivation • System Model • De-anonymization Quantification • Evaluation • Implication 1: Optimization based De-anonymization (ODA) Practice • Implication 2: Secure Data Publishing • Conclusion
Evaluation • Datasets [Table: the 26 real-world structural datasets used in the evaluation]
Evaluation • Perfect de-anonymization condition • Structural similarity condition • Graph/data size condition
Evaluation • (1 − ε)-Perfect de-anonymization condition • Projection/sampling condition • Structural similarity condition • Graph/data size condition
Evaluation • (1 − ε)-Perfect de-anonymizability • Structural similarity condition
Evaluation • (1 − ε)-Perfect de-anonymizability • How many users can be successfully de-anonymized?
Outline • Introduction and Motivation • System Model • De-anonymization Quantification • Evaluation • Implication 1: Optimization based De-anonymization (ODA) Practice • Implication 2: Secure Data Publishing • Conclusion
Optimization based De-anonymization (ODA) • Our quantification implies • An optimal de-anonymization solution exists • However, it is difficult to find • ODA iterates two steps: (1) select candidate users from the unmapped users with the top degrees; (2) map the candidate users by minimizing the edge-error function
Optimization based De-anonymization (ODA) • ODA features: 1. Cold start (seed-free) 2. Can be used by other attacks for landmark (seed) identification 3. Optimization based • Space complexity and time complexity (analysis given in the paper)
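A minimal, seed-free sketch of the two-step loop described above: repeatedly take the highest-degree unmapped users of both graphs as candidates and map them so that the edge error is minimized. The batch size, number of rounds, brute-force candidate matching, and simplified edge-error measure below are illustrative assumptions; the actual ODA algorithm and its space/time complexity are given in the paper.

```python
from itertools import permutations

def edge_error(mapping, edges_anon, edges_aux):
    """Count anonymized edges whose endpoints are both mapped but whose
    image is not an edge in the auxiliary graph (a simplified edge-error measure)."""
    errors = 0
    for u, v in edges_anon:
        if u in mapping and v in mapping:
            if (mapping[u], mapping[v]) not in edges_aux and \
               (mapping[v], mapping[u]) not in edges_aux:
                errors += 1
    return errors

def oda_sketch(adj_anon, adj_aux, edges_anon, edges_aux, batch=4, rounds=10):
    """Seed-free greedy mapping: in each round, match the top-degree unmapped
    users of the two graphs by trying every assignment of the small candidate
    batch and keeping the one with the smallest edge error.
    batch and rounds are illustrative parameters."""
    mapping = {}
    for _ in range(rounds):
        # Candidate users: highest-degree unmapped nodes in each graph.
        cand_anon = sorted((v for v in adj_anon if v not in mapping),
                           key=lambda v: len(adj_anon[v]), reverse=True)[:batch]
        cand_aux = sorted((v for v in adj_aux if v not in mapping.values()),
                          key=lambda v: len(adj_aux[v]), reverse=True)[:batch]
        k = min(len(cand_anon), len(cand_aux))
        if k == 0:
            break
        cand_anon = cand_anon[:k]
        best_assignment, best_err = None, None
        for perm in permutations(cand_aux, k):
            trial = dict(mapping)
            trial.update(zip(cand_anon, perm))
            err = edge_error(trial, edges_anon, edges_aux)
            if best_err is None or err < best_err:
                best_assignment, best_err = perm, err
        mapping.update(zip(cand_anon, best_assignment))
    return mapping
```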
ODA Evaluation • Dataset • Google+ (4.7M users, 90.8M edges): random sampling is used to obtain the anonymized and auxiliary graphs • Gowalla: • Anonymized graphs: constructed from 6.4M check-ins <UserID, latitude, longitude, timestamp, location ID> generated by 0.2M users • Auxiliary graph: the Gowalla social graph of the same 0.2M users (1M edges) • Results: landmark identification
ODA Evaluation • Dataset • Google+ (4.7M users, 90.8M edges): random sampling is used to obtain the anonymized and auxiliary graphs • Gowalla: • Anonymized graphs: constructed from 6.4M check-ins <UserID, latitude, longitude, timestamp, location ID> generated by 0.2M users • Auxiliary graph: the Gowalla social graph of the same 0.2M users (1M edges) • Results: de-anonymization
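As an illustration of how an anonymized contact graph could be built from check-in records of the form <UserID, latitude, longitude, timestamp, location ID>, the sketch below links two users whenever they check in at the same location within a fixed time window. The co-location rule and the one-hour window are assumptions for illustration, not necessarily the construction used in the paper.

```python
from collections import defaultdict
from datetime import timedelta

def build_colocation_graph(checkins, window=timedelta(hours=1)):
    """checkins: iterable of (user_id, lat, lon, timestamp, location_id) tuples.
    Returns a set of undirected edges {user_a, user_b} linking users who checked
    in at the same location within `window` of each other (illustrative rule)."""
    by_location = defaultdict(list)
    for user, _lat, _lon, ts, loc in checkins:
        by_location[loc].append((ts, user))

    edges = set()
    for visits in by_location.values():
        visits.sort()                        # order visits by timestamp
        for i, (ts_i, user_i) in enumerate(visits):
            for ts_j, user_j in visits[i + 1:]:
                if ts_j - ts_i > window:     # later visits are even farther apart
                    break
                if user_i != user_j:
                    edges.add(frozenset((user_i, user_j)))
    return edges
```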
Outline • Introduction and Motivation • System Model • De-anonymization Quantification • Evaluation • Implication 1: Optimization based De-anonymization (ODA) Practice • Implication 2: Secure Data Publishing • Conclusion
Secure Structural Data Publishing • Structural information is important • Based on our quantification • Secure structural data publishing is difficult, at least theoretically • Open problem …
Conclusion • We proposed the first quantification framework for structural data de-anonymization under a practical data model • We conducted a large-scale de-anonymizability evaluation of 26 real-world structural datasets • We designed a cold-start, optimization-based de-anonymization algorithm Acknowledgement We thank the anonymous reviewers very much for their valuable comments!
Thank you! Shouling Ji sji@gatech.edu http://users.ece.gatech.edu/sji/