Introduction

• Social networking services are a fast-growing business
  • Facebook, Twitter, Google+, LiveJournal, YouTube, …
• When users participate in online social network activities, their privacy faces potentially serious threats
  • E.g., creating personal portfolios, posting current locations, …
• Countermeasures
  • Naïve anonymization: removing Personally Identifiable Information (PII)
  • Edge modification
  • k-anonymity and its variants
• These countermeasures remain vulnerable to powerful structure-based de-anonymization attacks
  • Narayanan-Shmatikov attack (IEEE S&P 2009)
  • Srivatsa-Hicks attack (ACM CCS 2012)
  • Others

Narayanan-Shmatikov attack (IEEE S&P 2009)
• Anonymized data: Twitter (crawled in late 2007)
  • A microblogging service
  • 224K users, 8.5M edges
• Auxiliary data: Flickr (crawled in late 2007/early 2008)
  • A photo-sharing service
  • 3.3M users, 53M edges
• Result: 30.8% of the users are successfully de-anonymized
[Figure: user mapping from Twitter to Flickr, guided by the heuristics eccentricity, edge directionality, node degree, revisiting nodes, and reverse match]
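The eccentricity heuristic listed on this slide is defined in Narayanan and Shmatikov's paper as (max − second max) / standard deviation of the candidate scores, i.e., how far the best candidate stands out from the rest. A minimal Python sketch, assuming the candidate scores are already computed; using the population standard deviation is our assumption, and `accept_best` with its `theta` value is a hypothetical acceptance rule for illustration:

```python
import statistics


def eccentricity(scores):
    """Narayanan-Shmatikov eccentricity of a list of candidate scores.

    Returns (max - second max) / standard deviation. A high value means the
    top candidate clearly stands out; 0.0 means the top two are tied.
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    best, second = sorted(scores, reverse=True)[:2]
    spread = statistics.pstdev(scores)  # population std dev (our assumption)
    if spread == 0:
        return 0.0  # all scores equal: nothing stands out
    return (best - second) / spread


def accept_best(scores, theta=0.5):
    """Hypothetical rule: keep the top candidate only if it stands out by
    at least `theta` standard deviations."""
    return eccentricity(scores) >= theta
```

Rejecting low-eccentricity matches is what lets the attack skip ambiguous nodes instead of guessing.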

Srivatsa-Hicks attack (ACM CCS 2012)
• Anonymized data
  • Mobility traces: St Andrews, Smallblue, and Infocom 2006
• Auxiliary data
  • Social networks: Facebook and DBLP
• De-anonymizes mobility traces using the corresponding social networks
• Over 80% of users can be successfully de-anonymized

Other structural de-anonymization attacks
• Backstrom et al. attack (WWW 2007)
  • Both active and passive attacks
• Narayanan et al. attack (IJCNN 2011)
  • A simplified version of the Narayanan-Shmatikov attack (IEEE S&P 2009)
  • Used to breach link privacy
• Pedarsani et al. attack (Allerton 2013)
  • An attack based on a Bayesian method

Limitations of existing attacks
• Not scalable
  • E.g., the Backstrom et al. attack (WWW 2007) must create Sybil users before the anonymized data are released, which is neither controllable nor scalable
  • E.g., the Srivatsa-Hicks attack (CCS 2012) has a complexity of $O(k!\,n^3)$, where $k$ is the number of seeds
• High computational cost
  • E.g., the Narayanan-Shmatikov attack (S&P 2009) has a complexity of $O(nk + n^4)$
• Not general
  • E.g., the Narayanan-Shmatikov attack (S&P 2009) is designed for directed graphs
  • E.g., the Pedarsani et al. attack (Allerton 2013) works well on sparse graphs but poorly on dense graphs

Our contributions
• Defined and measured three de-anonymization metrics
  • Structural similarity, relative distance similarity, and inheritance similarity
• Proposed a Unified Similarity (US) based De-Anonymization (DA) framework
  • Iteratively de-anonymizes data with an accuracy guarantee
• Generalized DA to an Adaptive De-Anonymization (ADA) framework
  • De-anonymizes large-scale data without knowledge of the overlap size between the anonymized data and the auxiliary data
• Applied the proposed de-anonymization attacks to real-world datasets
  • Successfully de-anonymized three mobility traces: St Andrews, Infocom06, and Smallblue
  • Successfully de-anonymized three social network datasets: ArnetMiner, Google+, and Facebook

Outline
• Background
• Preliminaries and Model
• De-anonymization
• Generalized Scalable De-anonymization
• Experiments
• Conclusion and Future Work

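Preliminaries and Model
• Anonymized data graph
• Auxiliary data graph
• Attack model
  • A de-anonymization attack is a mapping of users from the anonymized graph to the auxiliary graph, i.e., $\sigma: V^a \rightarrow V^u$, where $V^a$ and $V^u$ denote the user sets of the anonymized graph and the auxiliary graph, respectively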

Datasets – mobility traces
• Mobility traces serve as anonymized data and social networks as auxiliary data (the same setting as the Srivatsa-Hicks attack (ACM CCS 2012))
• The mobility traces are preprocessed into anonymized contact graphs (see Srivatsa and Hicks's paper for details)
• The social networks are used as auxiliary data to de-anonymize the mobility traces

Datasets – social networks
• ArnetMiner
  • A coauthor network
  • A weighted graph whose edge weights indicate the number of coauthored papers
  • 1,127 authors and 6,690 coauthor relationships
• Google+
  • Two Google+ datasets crawled on July 19 and August 6, 2011, denoted JUL and AUG, respectively
  • JUL: 5,200 users, 7,062 connections
  • AUG: 5,200 users, 7,813 connections
• Facebook
  • 63,731 users
  • 1,269,502 friend relationships

De-anonymization
• High-level description: two phases
  • Seed selection
  • Mapping propagation
• Seed selection
  • Identify a small number of seed mappings from the anonymized graph to the auxiliary graph
  • Bootstrap the de-anonymization
• Mapping propagation
  • De-anonymize the anonymized graph using multiple similarity measurements

Mapping Propagation
• Metrics
  • Structural Similarity
  • Relative Distance Similarity
  • Inheritance Similarity
  • Unified Similarity
  • We also define weighted versions of these metrics that take edge weights into account
• Propagation framework

Structural Similarity
• Degree centrality
  • The number of ties (edges) a node has in a graph
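Degree centrality can be read directly off an adjacency list. A minimal sketch, assuming an undirected, unweighted graph stored as a dict that maps each node to the set of its neighbors, and assuming the common (n − 1) normalization (the slides do not say whether raw or normalized degree is used):

```python
def degree_centrality(adj):
    """Normalized degree centrality deg(v) / (n - 1) for an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}  # a lone node has no possible ties
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


# Usage: in a 4-node star, the center is tied to every other node.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
centrality = degree_centrality(star)
```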

Structural Similarity
• Closeness centrality
  • How close a node is to the other nodes in a graph
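Closeness centrality follows from breadth-first search distances. A minimal sketch, assuming an unweighted, undirected graph (dict of neighbor sets). For disconnected graphs it uses the within-component variant (number of reachable nodes divided by the sum of distances to them), which is our choice rather than something stated on the slide:

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Closeness = (number of reachable nodes) / (sum of distances to them).

    Computed within each node's connected component; isolated nodes get 0.0.
    """
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result
```

On the path 0–1–2, the middle node is one hop from both ends and so scores highest.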

Structural Similarity
• Betweenness centrality
  • A node's global structural importance within a graph, measured by how often the node lies on shortest paths between other nodes
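Betweenness centrality is usually computed with Brandes' algorithm, which we use here for illustration; the slides do not say which algorithm the authors used. A minimal sketch, assuming an unweighted, undirected graph stored as a dict of neighbor sets and returning unnormalized scores:

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized).

    `adj` maps each node to the set of its neighbors. Every shortest path is
    discovered once from each endpoint, so the totals are halved at the end.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}
```

In a 4-node star, the center lies on the shortest path between each of the three leaf pairs, so its betweenness is 3.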

Structural Similarity
• Defined as the cosine similarity between two nodes' centrality vectors, each consisting of the node's degree, closeness, and betweenness centralities
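The definition above reduces to a cosine over three-component vectors. A minimal sketch, assuming each node's (degree, closeness, betweenness) tuple has already been computed; treating similarity with an all-zero vector as 0 is our convention:

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # similarity with a zero vector is 0


def structural_similarity(cent_a, cent_u):
    """Structural similarity between an anonymized node and an auxiliary node.

    Each argument is a (degree, closeness, betweenness) tuple for one node.
    """
    return cosine_similarity(cent_a, cent_u)
```

Because the three centralities live on different scales (raw betweenness can dwarf the others), normalizing each centrality before taking the cosine is probably advisable; that is our suggestion, not something the slide specifies.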

Relative Distance Similarity
• Defined as the cosine similarity between two nodes' distance vectors to the seeds
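A minimal sketch of relative distance similarity. It assumes unweighted graphs (dicts of neighbor sets), hop distances from BFS, and seed lists aligned so that `seeds_a[i]` in the anonymized graph is mapped to `seeds_u[i]` in the auxiliary graph. Assigning an unreachable seed the distance `len(adj)` is a hypothetical convention of ours:

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_vector(adj, node, seeds):
    """Hop distances from `node` to each seed, in seed order.

    Assumption: an unreachable seed gets distance len(adj).
    """
    dist = bfs_distances(adj, node)
    return [dist.get(s, len(adj)) for s in seeds]


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def relative_distance_similarity(adj_a, v, seeds_a, adj_u, u, seeds_u):
    """Cosine between v's distances to seeds_a and u's distances to seeds_u."""
    if len(seeds_a) != len(seeds_u):
        raise ValueError("seed lists must be aligned one-to-one")
    return cosine_similarity(distance_vector(adj_a, v, seeds_a),
                             distance_vector(adj_u, u, seeds_u))
```

For example, on the path 0–1–2–3 with seeds 0 and 3 in both graphs, node 1 has distance vector [1, 2] and node 2 has [2, 1], giving a similarity of 4/5.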

Inheritance Similarity
• Characterizes the knowledge provided by the current mapping results
• Two nodes that have more common mapped neighbors have a higher inheritance similarity score
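The idea of "common mapped neighbors" can be sketched as follows: map the already de-anonymized neighbors of an anonymized node `v` through the current mapping and count how many of them are neighbors of the auxiliary candidate `u`. The normalization by the larger of the two degrees is a hypothetical choice for illustration; the slide does not give the exact formula:

```python
def inheritance_similarity(adj_a, v, adj_u, u, mapping):
    """Share of v's already-mapped neighbors that land on u's neighbors.

    adj_a, adj_u: dicts of neighbor sets for the anonymized/auxiliary graphs.
    mapping: current de-anonymization results, anonymized node -> auxiliary node.
    Hypothetical normalization: common mapped neighbors / max(deg(v), deg(u)).
    """
    deg = max(len(adj_a[v]), len(adj_u[u]))
    if deg == 0:
        return 0.0
    mapped = {mapping[w] for w in adj_a[v] if w in mapping}
    return len(mapped & adj_u[u]) / deg
```

The score grows as propagation proceeds, because each accepted mapping can raise the inheritance similarity of the newly mapped node's neighbors.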

Unified Similarity (US)
• Combines structural similarity, relative distance similarity, and inheritance similarity
[Figure: US as a weighted combination of structural similarity, relative distance similarity, and inheritance similarity]
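The slide's "Weights" annotation suggests that US weights the three metrics and adds them up. A linear weighted sum is the simplest reading, but it is our assumption, and the equal default weights below are hypothetical placeholders rather than the authors' values:

```python
def unified_similarity(s_struct, s_dist, s_inherit, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of the three similarity scores.

    `weights` = (c_S, c_D, c_I) are hypothetical placeholders, not the paper's values.
    """
    if min(weights) < 0:
        raise ValueError("weights must be non-negative")
    c_s, c_d, c_i = weights
    return c_s * s_struct + c_d * s_dist + c_i * s_inherit
```

Tuning the weights changes which kind of evidence (local structure, position relative to the seeds, or agreement with earlier mappings) dominates each round.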

US-based De-Anonymization (DA) Framework
• Step 1: identify seeds using existing techniques
• Step 2: compute two candidate node sets Ca and Cu from the anonymized graph and the auxiliary graph, respectively
• Step 3: compute the US score between each user in Ca and every user in Cu, and construct a weighted bipartite graph between Ca and Cu using these scores
• Step 4: find a maximum weighted bipartite matching
• Step 5: decide whether to accept each node de-anonymization result in the bipartite matching
• If the end condition is not reached, go to Step 2
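Steps 3 to 5 can be sketched in Python under several assumptions. The US scores are precomputed into a matrix `scores[i][j]` (i-th user in Ca, j-th user in Cu) and are non-negative, so a matching that covers the smaller side has maximum total weight. Step 4 uses the classic Hungarian (Kuhn-Munkres) algorithm on negated scores, chosen here for illustration. Step 5 is modeled as a hypothetical fixed `threshold`, since the slide does not state the acceptance rule. The names `hungarian_min_cost` and `da_matching_round` are ours:

```python
def hungarian_min_cost(cost):
    """Min-cost assignment for an n x m cost matrix with n <= m.

    Returns match[i] = column assigned to row i (potentials method, O(n^2 m)).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (m + 1)  # column potentials (1-based)
    p = [0] * (m + 1)    # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def da_matching_round(scores, threshold):
    """Steps 3-5: max-weight bipartite matching, then keep confident pairs."""
    if not scores or not scores[0]:
        return []
    n, m = len(scores), len(scores[0])
    if n <= m:
        match = hungarian_min_cost([[-w for w in row] for row in scores])
        pairs = [(i, match[i]) for i in range(n)]
    else:  # more anonymized candidates than auxiliary ones: transpose
        match = hungarian_min_cost([[-scores[i][j] for i in range(n)] for j in range(m)])
        pairs = [(match[j], j) for j in range(m)]
    return [(i, j) for i, j in pairs if scores[i][j] >= threshold]
```

With scores [[0.9, 0.8], [0.85, 0.1]], a greedy pass would take (0, 0) first, while the matching picks (0, 1) and (1, 0) for a higher total.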

Generalized Scalable De-anonymization
• Core Matching Subgraph (CMS)

Adaptive De-Anonymization (ADA)
• Identify the initial CMS
• Run DA on the initial CMS
• Update the CMS, or end
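The ADA loop on this slide (identify the initial CMS, run DA on it, then update the CMS or end) can be sketched as a control loop. Everything the slides do not define, including how a CMS is represented, identified, and updated, is passed in as hypothetical callables; `max_rounds` is a safety cap we added:

```python
def adaptive_de_anonymize(identify_initial_cms, run_da, update_cms, max_rounds=100):
    """Hypothetical ADA driver.

    identify_initial_cms() -> cms
    run_da(cms, mapping) -> dict of new anonymized -> auxiliary mappings
    update_cms(cms, mapping) -> next cms, or None to end
    """
    mapping = {}
    cms = identify_initial_cms()
    for _ in range(max_rounds):
        mapping.update(run_da(cms, mapping))
        cms = update_cms(cms, mapping)
        if cms is None:  # end condition reached
            break
    return mapping
```

Keeping the loop separate from the DA step mirrors the slide: DA only ever runs on the current CMS, which is how the framework avoids needing the full overlap between the two graphs up front.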

Experiments – de-anonymize mobility traces

Experiments – de-anonymize ArnetMiner

Experiments – de-anonymize Google+

Experiments – de-anonymize Facebook

Conclusion and Future Work
• Conclusion
  • Proposed and examined several structural similarity metrics
  • Designed a new scalable structural de-anonymization framework for mobility traces and social networks
  • Validated the proposed de-anonymization framework on multiple mobility traces and social networks
• Future work
  • More experiments on large-scale datasets
  • De-anonymizability quantification (partially done in our ACM CCS 2014 paper)
  • A secure data publishing system

Thank you! Special thanks to the presenter, Qin Liu.
Shouling Ji
sji@gatech.edu
http://users.ece.gatech.edu/sji/
