Privacy Vulnerability of Published Anonymous Mobility Traces

Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip (Purdue University); Nageswara S. V. Rao (Oak Ridge National Laboratory)

Motivation: Collecting mobility traces • Mobile network applications • Traffic monitoring, road surface sensing, radiation and chemical detection • Mobility traces are collected and published to assist the design, analysis, and evaluation of mobile networks • E.g., CRAWDAD

Motivation: Privacy vulnerability • Measures are carried out to protect the privacy of the participants • Traces are identified using a random but consistent and unique identifier that is not correlated to the real ID • Spatial and temporal granularities are reduced • Example: raw record <11:32:12, Chris Ma, (41.89840,-87.61999)> is published as <11:30~11:35, ID-271, (41.89~41.90,-87.62~-87.61)>
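The two protection measures on this slide can be illustrated with a minimal sketch. It assumes a 5-minute time bin and a 0.01-degree cell, which is what the example record suggests. The function names and the `pseudonyms` table are hypothetical and are not taken from the presentation.

```python
import math

def coarsen(value, step):
    """Return the (low, high) bounds of the bin of width `step` containing `value`."""
    low = math.floor(value / step) * step
    return low, low + step

def hhmm(seconds):
    """Format seconds since midnight as HH:MM."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}"

def anonymize(record, pseudonyms, time_bin_s=300, cell_deg=0.01):
    """Replace the real ID with a pseudonym and reduce time/space granularity.

    `time_bin_s` and `cell_deg` are assumed values chosen to match the
    example record; `pseudonyms` is a hypothetical real-ID -> trace-ID map.
    """
    timestamp, real_id, (lat, lon) = record
    h, m, s = (int(part) for part in timestamp.split(":"))
    start = (h * 3600 + m * 60 + s) // time_bin_s * time_bin_s
    lat_lo, lat_hi = coarsen(lat, cell_deg)
    lon_lo, lon_hi = coarsen(lon, cell_deg)
    return (f"<{hhmm(start)}~{hhmm(start + time_bin_s)}, {pseudonyms[real_id]}, "
            f"({lat_lo:.2f}~{lat_hi:.2f},{lon_lo:.2f}~{lon_hi:.2f})>")

raw = ("11:32:12", "Chris Ma", (41.89840, -87.61999))
published = anonymize(raw, {"Chris Ma": "ID-271"})
print(published)
```

The pseudonym is consistent across all records of one participant, which is why a single successful match exposes the whole trace.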

Motivation: Privacy vulnerability • These measures are not enough! • Participants can be openly observed • Participants may leak their location information (snapshots of time and location pairs, termed side information) • Web blogs, status updates in social networks, tweets, casual conversations, etc. • An adversary who tries to identify the complete trace (movement history) of one or more participants may succeed with high probability

Our contributions • Comprehensive study of attack strategies • Various ways of collecting side information • Analytical proof of the optimality of the attack strategy • Quantitative simulation results • Privacy implications of the characteristics of real traces and synthetic traces • Synthetic nodes are more sparsely placed • More easily identified but more difficult to encounter

Agenda • Problem formulation • Analytical derivation • Experimental analysis • Conclusion

Problem formulation: trace sampling and publication • A sampled record <t, R.B., (x,y)> is published as <t', ID_i, (x',y')>

Problem formulation • An adversary tries to identify the complete movement history of the participant(s) • collects side information and compares with the published traces • Possible attack scenarios • Adversary infers the location of a victim indirectly (passive adversary) • Adversary observes the movement of the victims physically (active adversary)

Passive adversary: infers snapshots of victim • Special case: reference times are sampling times

Passive adversary: infers snapshots of victim • General case: reference times are not sampling times

Passive adversary: infers snapshots of victim • General case: reference times are not sampling times • Infers the possible location of the node at reference times using a general mobility model (preferences of the nodes, physical constraints)

Attack approaches of passive adversary • Use of a Bayesian approach to determine the trace that best matches the inferred location information • (Figure: published traces compared with noisy side information)

Attack approaches of passive adversary • For the special case (reference time = sampling time), with the assumption that noise is i.i.d. (equation not preserved in the transcript) • For the general case, with the assumptions that noise is i.i.d. and movement is Markovian (equation not preserved in the transcript)

Attack approaches of passive adversary • Maximum Likelihood Estimator (MLE) approach • Minimum Square (MSQ) approach • Basic (BAS) approach • Weighted Exponential (EXP) approach • When noise is Gaussian, MLE and MSQ are equivalent
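The stated equivalence of MLE and MSQ under Gaussian noise can be checked with a small sketch for the special case (side information taken at sampling times). It assumes i.i.d. isotropic 2-D Gaussian noise. The data layout, function names, and toy coordinates are illustrative assumptions, not the authors' implementation. BAS and EXP are left out because the slides do not define them.

```python
import math

def msq_score(trace, side_info):
    """Sum of squared distances between side information and the trace.

    trace: {time: (x, y)}; side_info: {time: (x, y)} with times present in trace.
    """
    return sum((trace[t][0] - x) ** 2 + (trace[t][1] - y) ** 2
               for t, (x, y) in side_info.items())

def gaussian_log_likelihood(trace, side_info, sigma):
    """Log-likelihood of the side information under i.i.d. isotropic Gaussian noise.

    Equals -msq_score / (2 sigma^2) plus a constant that is the same for every
    trace, so maximizing it is the same as minimizing msq_score.
    """
    n = len(side_info)
    return (-msq_score(trace, side_info) / (2 * sigma ** 2)
            - n * math.log(2 * math.pi * sigma ** 2))

def best_match_msq(traces, side_info):
    return min(traces, key=lambda tid: msq_score(traces[tid], side_info))

def best_match_mle(traces, side_info, sigma):
    return max(traces, key=lambda tid: gaussian_log_likelihood(traces[tid], side_info, sigma))

# Toy published traces (hypothetical coordinates in km) and two noisy snapshots.
traces = {
    "A": {0: (0, 0), 1: (1, 0), 2: (2, 0)},
    "B": {0: (0, 0), 1: (0, 1), 2: (0, 2)},
    "C": {0: (5, 5), 1: (5, 6), 2: (5, 7)},
}
side_info = {1: (0.2, 0.9), 2: (-0.1, 2.3)}

print(best_match_msq(traces, side_info))
print(best_match_mle(traces, side_info, sigma=0.5))
```

Note that `best_match_msq` needs no noise parameters, whereas the likelihood needs the noise type and `sigma`.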

Active adversary: observes victims physically • Adversary is one of the participants

Active adversary: observes victims physically • Adversary stays at a (popular) position

Active adversary: observes victims physically • Adversary travels between popular locations

Problem formulation • Why the two different cases? • Active • Needs to consider how to collect the side information physically as time evolves • Adversary tries to identify as many victims as possible: plot of k-anonymity as a function of time • Passive • Snapshots of the victim are inferred (not collected) and less accurate in general • Adversary tries to identify one victim only: plot of correctness as a function of the number of pieces of side information

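Attack strategy of active adversary • Algorithm of the attack (in action) • Candidate table (real ID: trace IDs), shown in three successive states with observations at times t1 and t2 • 1: A, B, C; 2: A, B, C; 3: A, B, C • 1: A, B, C; 2: A, B, C; 3: A, B, C • 1: A, B; 2: A, B; 3: A, B, C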
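The candidate table on this slide suggests an elimination procedure. The following is a minimal sketch under assumptions: each observation is a (time, cell, real ID) triple, and a trace stays a candidate only if it was published in the observed cell at the observed time. The function name, data layout, and cell coordinates are invented for illustration. The slides do not spell out the exact algorithm.

```python
def refine_candidates(candidates, observations, published):
    """Shrink each victim's candidate set using physical observations.

    candidates:   {real_id: set of trace IDs still possible}
    observations: iterable of (time, cell, real_id) seen by the adversary
    published:    {trace_id: {time: cell}} (the anonymized traces)
    """
    refined = {victim: set(ids) for victim, ids in candidates.items()}
    for time, cell, victim in observations:
        consistent = {tid for tid, trace in published.items()
                      if trace.get(time) == cell}
        refined[victim] &= consistent
    return refined

# Hypothetical published traces: all three share a cell at t1, only A and B at t2.
published = {
    "A": {1: (0, 0), 2: (1, 1)},
    "B": {1: (0, 0), 2: (1, 1)},
    "C": {1: (0, 0), 2: (3, 3)},
}
start = {victim: {"A", "B", "C"} for victim in (1, 2, 3)}

after_t1 = refine_candidates(start, [(1, (0, 0), 1), (1, (0, 0), 2), (1, (0, 0), 3)], published)
after_t2 = refine_candidates(after_t1, [(2, (1, 1), 1), (2, (1, 1), 2)], published)

# k-anonymity of a victim = number of traces it could still be.
k_anonymity = {victim: len(ids) for victim, ids in after_t2.items()}
print(after_t2, k_anonymity)
```

In this toy run the observation at t1 removes nothing because every trace is in the same cell, while the observation at t2 reduces victims 1 and 2 to {A, B}, as in the final state of the table.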

Experimental analysis • Basic information • Real traces • 536 San Francisco taxicabs • 2348 Shanghai Grid buses • Synthetic traces • Using map size and average speed computed from taxi cab traces • Random waypoint (with different maximum trip lengths) • Random walk • Spatial granularity = 1 km • Temporal granularity = 1 minute (unless stated otherwise)
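For readers unfamiliar with the synthetic model, here is a minimal random waypoint sketch with a maximum trip length, sampled once per minute and reported at 1 km cell granularity. The map size, speed, and trip length below are placeholder values, not the ones computed from the taxicab traces. The generator itself is an assumption about how such traces are typically produced.

```python
import math
import random

def random_waypoint(steps, map_km, speed_km_per_min, max_trip_km, rng):
    """Return one trace as a list of 1 km grid cells, one per 1-minute step.

    The node starts at a uniformly random location (no preferred locations),
    picks a waypoint at most `max_trip_km` away, moves toward it at constant
    speed, and picks a new waypoint on arrival.
    """
    x, y = rng.uniform(0, map_km), rng.uniform(0, map_km)
    tx, ty = x, y  # no waypoint yet, so one is chosen on the first step
    cells = []
    for _ in range(steps):
        if (x, y) == (tx, ty):
            angle = rng.uniform(0, 2 * math.pi)
            dist = rng.uniform(0, max_trip_km)
            # Clip the waypoint to the map, which can only shorten the trip.
            tx = min(max(x + dist * math.cos(angle), 0.0), map_km)
            ty = min(max(y + dist * math.sin(angle), 0.0), map_km)
        remaining = math.hypot(tx - x, ty - y)
        if remaining <= speed_km_per_min:
            x, y = tx, ty
        else:
            x += speed_km_per_min * (tx - x) / remaining
            y += speed_km_per_min * (ty - y) / remaining
        # Spatial granularity of 1 km: report only the containing cell.
        cells.append((min(int(x), map_km - 1), min(int(y), map_km - 1)))
    return cells

trace = random_waypoint(steps=60, map_km=20, speed_km_per_min=0.5,
                        max_trip_km=5, rng=random.Random(0))
print(trace[:5])
```

A random walk can be viewed as the limiting case in which every trip lasts a single step.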

Characteristics of the traces: Distance between traces • Real traces are closer to each other on average • Bus traces have a broader range • For synthetic traces, the shorter the trip length, the farther apart they are in general

Significant observations • Synthetic traces lack preferred locations and start from random initial locations • Nodes are more sparsely distributed in the network • Implications: • For an adversary in general • Can easily identify the trace of a synthetic node since no other traces share a similar path • For an active adversary • May take longer to encounter each synthetic node

Attack performance: Passive adversary (special case) • Special case: side information inferred at sampling times of traces • Correct assumption of noise (Gaussian) • Cab traces • Observations • MLE and MSQ perform equally well • BAS gives the fewest wrong conclusions initially

Attack performance: Passive adversary (special case) • Random waypoint traces • Most efficient attack • Traces have very different paths

Attack performance: Passive adversary (special case) • Incorrect assumption of noise • Assumption: Uniform • Actual: Gaussian • Cab traces • Observations • MLE performs much worse

Attack performance: Passive adversary (general case) • General case: side information at times different from trace sampling times • Worst-case scenario: all times are different • Infer the location of the victim using the mobility model • Gaussian noise (no noise as best performance bound) • Cab traces

Summary: Passive adversary • For the passive adversary • MLE and MSQ give the best performance among the four approaches in terms of the fraction of correct conclusions • Since MLE relies on knowledge of the type of noise and its magnitude, MSQ is the preferred, more robust attack approach

Attack performance: Active adversary as one of the mobile nodes • Higher attack efficiency for real traces • Mobile nodes are more likely to visit the same set of locations at the same time • Synthetic nodes are more sparsely distributed in the network • (Figure note: 1 time step = 1 minute)

Attack performance: Active adversary who stays at one of the cells • Observations • Comparing real traces and synthetic traces • Attacks on real traces are more efficient: k-anonymity drops more quickly • Popular cells in real traces and random waypoint traces are more clustered together • Being at a popular cell does not necessarily result in higher attack efficiency • (Figure panels: cabs, buses, random walk, random waypoint)

Attack performance: Active adversary who moves among popular cells • The ability to move among popular cells improves attack efficiency • Improvement is more significant if node movements are more localized • Visiting more cells does not necessarily improve efficiency • (Figure panels: cabs, buses, random walk, random waypoint)

Conclusion • Study how privacy leaks through trace publication • Under different adversary strategies to collect side information • Using different mobile traces with different characteristics • Experimentally show that the adversary is able to identify the trace of a victim from the published set with high probability
