Privacy Vulnerability of Published Anonymous Mobility Traces • Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip (Purdue University) • Nageswara S. V. Rao (Oak Ridge National Laboratory)
Motivation: Collecting mobility traces • Mobile network applications • Traffic monitoring, road surface sensing, radiation and chemical detection • Mobility traces are collected and published to assist the design, analysis, and evaluation of mobile networks • E.g., CRAWDAD
Motivation: Privacy vulnerability • Measures are carried out to protect the privacy of the participants • Traces are identified using a random but consistent and unique identifier that is not correlated with the real ID • Spatial and temporal granularities are reduced • Example: the raw record <11:32:12, Chris Ma, (41.89840,-87.61999)> is published as <11:30~11:35, ID-271, (41.89~41.90,-87.62~-87.61)>
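The two protective measures on this slide can be illustrated with a minimal sketch. The function names, the `ID-` label format, and the bin sizes (5 minutes, 0.01 degrees) are assumptions chosen to reproduce the example record; the slides do not state how the published traces were actually produced.

```python
import math
import secrets


def make_pseudonyms(real_ids):
    """Map each real ID to a random, consistent, unique label (hypothetical format)."""
    labels, used = {}, set()
    for rid in real_ids:
        if rid in labels:
            continue
        while True:
            label = f"ID-{secrets.randbelow(10**6):06d}"
            if label not in used:
                break
        used.add(label)
        labels[rid] = label
    return labels


def coarsen(value, step):
    """Return the (low, high) bounds of the bin of width `step` containing `value`."""
    low = math.floor(value / step) * step
    return (round(low, 6), round(low + step, 6))


def anonymize(record, pseudonyms, time_step_s=300, cell_deg=0.01):
    """Replace the real ID and reduce temporal and spatial granularity.

    record: (seconds since midnight, real ID, (latitude, longitude)).
    The default bin sizes are assumptions, not values from the slides.
    """
    t, real_id, (lat, lon) = record
    return (
        coarsen(t, time_step_s),
        pseudonyms[real_id],
        (coarsen(lat, cell_deg), coarsen(lon, cell_deg)),
    )


if __name__ == "__main__":
    names = make_pseudonyms(["Chris Ma"])
    raw = (11 * 3600 + 32 * 60 + 12, "Chris Ma", (41.89840, -87.61999))
    print(anonymize(raw, names))
```

Because the same label is reused for every record of a participant, the published trace stays linkable end to end, which is exactly what the attacks in the following slides exploit.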
Motivation: Privacy vulnerability • These measures are not enough! • Participants can be openly observed • Participants may leak their location information (snapshots of time and location pairs, termed side information) • Web blogs, status in social networks, tweets, casual conversations, etc. • An adversary who tries to identify the complete trace (movement history) of one or more participants may succeed with high probability
Our contributions • Comprehensive study of attack strategies • Various ways of collecting side information • Analytical proof of the optimality of the attack strategy • Quantitative simulation results • Privacy implications of the characteristics of real and synthetic traces • Synthetic nodes are more sparsely placed • They are more easily identified but more difficult to meet
Agenda • Problem formulation • Analytical derivation • Experimental analysis • Conclusion
Problem formulation: trace sampling and publication • A sampled record <t, R.B., (x, y)> is published as <t', ID_i, (x', y')>
Problem formulation • An adversary tries to identify the complete movement history of the participant(s) • collects side information and compares with the published traces • Possible attack scenarios • Adversary infers the location of a victim indirectly (passive adversary) • Adversary observes the movement of the victims physically (active adversary)
Passive adversary: infers snapshots of victim • Special case: reference times are sampling times
Passive adversary: infers snapshots of victim • General case: reference times are not sampling times • Infers the possible location of the node at reference times using a general mobility model (preferences of the nodes, physical constraints)
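To make the general case concrete, here is a minimal sketch of inferring a location at a reference time that falls between two sampling times. It assumes straight-line, constant-speed movement between consecutive samples as a stand-in for the mobility model; the slides do not specify the model, and `infer_location` is a hypothetical name.

```python
from bisect import bisect_right


def infer_location(trace, t_ref):
    """Estimate (x, y) at time t_ref from a trace of (t, x, y) samples.

    Assumption (not from the slides): the node moves in a straight line at
    constant speed between samples. `trace` must be sorted by strictly
    increasing t. Outside the sampled interval the nearest sample is used.
    """
    times = [t for t, _, _ in trace]
    if t_ref <= times[0]:
        return trace[0][1:]
    if t_ref >= times[-1]:
        return trace[-1][1:]
    j = bisect_right(times, t_ref)          # first sample strictly after t_ref
    t0, x0, y0 = trace[j - 1]
    t1, x1, y1 = trace[j]
    w = (t_ref - t0) / (t1 - t0)            # fraction of the interval elapsed
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


if __name__ == "__main__":
    trace = [(0, 0.0, 0.0), (60, 6.0, 0.0), (120, 6.0, 12.0)]
    print(infer_location(trace, 30))   # halfway along the first segment
```

A richer model would return a distribution over locations (reflecting node preferences and physical constraints) rather than a single point.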
Attack approaches of passive adversary • A Bayesian approach is used to determine the trace that best matches the inferred location information • [Figure: published traces compared against noisy side information]
Attack approaches of passive adversary • Special case (reference time = sampling time): the best match is derived under the assumption that the noise is i.i.d. • General case: the best match is derived under the assumptions that the noise is i.i.d. and the movement is Markovian
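The formulas for this slide did not survive in the transcript. As a hedged sketch of what a Bayesian match looks like in the special case, assuming a uniform prior over the N published traces and the notation defined in the comments (all of which is an assumption, not the authors' notation):

```latex
% Assumed notation (not from the slides):
%   L_i(t_k) : location of published trace i at reference time t_k
%   s_k      : side information (noisy location) for time t_k, k = 1..m
%   f        : density of the i.i.d. noise n_k = s_k - L_v(t_k),
%              where v is the victim's trace
\[
  \Pr(i \mid s_1,\dots,s_m) \;\propto\; \Pr(i)\prod_{k=1}^{m} f\bigl(s_k - L_i(t_k)\bigr),
  \qquad
  \hat{\imath} \;=\; \arg\max_{i}\; \prod_{k=1}^{m} f\bigl(s_k - L_i(t_k)\bigr)
  \quad \text{for a uniform prior } \Pr(i) = 1/N .
\]
```

The product form follows directly from the i.i.d. noise assumption: each piece of side information contributes one independent factor.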
Attack approaches of passive adversary • Maximum Likelihood Estimator (MLE) approach • Minimum Square (MSQ) approach • Basic (BAS) approach • Weighted Exponential (EXP) approach • When the noise is Gaussian, MLE and MSQ are equivalent
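The equivalence of MLE and MSQ under Gaussian noise can be checked with a small sketch. It assumes zero-mean isotropic Gaussian noise with standard deviation `sigma` on 2-D locations and side information taken at the sampling times; the function names are hypothetical. BAS and EXP are not defined in the slides, so they are not sketched here.

```python
import math


def msq_score(trace_locs, side_info):
    """Sum of squared distances between a trace and the side information."""
    return sum(
        (sx - x) ** 2 + (sy - y) ** 2
        for (x, y), (sx, sy) in zip(trace_locs, side_info)
    )


def gaussian_log_likelihood(trace_locs, side_info, sigma):
    """Log-likelihood of the side information under isotropic Gaussian noise.

    Per point: log f(r) = -log(2*pi*sigma^2) - |r|^2 / (2*sigma^2).
    """
    n = len(side_info)
    return -msq_score(trace_locs, side_info) / (2 * sigma**2) - n * math.log(
        2 * math.pi * sigma**2
    )


def msq_attack(traces, side_info):
    """Pick the trace ID with the minimum squared error."""
    return min(traces, key=lambda tid: msq_score(traces[tid], side_info))


def mle_attack(traces, side_info, sigma):
    """Pick the trace ID with the maximum Gaussian likelihood."""
    return max(
        traces, key=lambda tid: gaussian_log_likelihood(traces[tid], side_info, sigma)
    )


if __name__ == "__main__":
    traces = {
        "A": [(0, 0), (1, 0), (2, 0)],
        "B": [(0, 5), (1, 5), (2, 5)],
        "C": [(9, 9), (9, 8), (9, 7)],
    }
    side_info = [(0.2, 4.6), (1.1, 5.3), (1.8, 5.1)]  # noisy snapshots of one victim
    print(msq_attack(traces, side_info), mle_attack(traces, side_info, sigma=0.5))
```

The log-likelihood is a constant minus the squared error divided by 2σ², so maximizing one is the same as minimizing the other for any fixed `sigma`. MSQ, however, needs no knowledge of the noise type or magnitude.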
Active adversary: observes victims physically • Adversary is one of the participants
Active adversary: observes victims physically • Adversary stays at a (popular) position
Active adversary: observes victims physically • Adversary travels between popular locations
Problem formulation • Why the two different cases? • Active • Needs to consider how to collect the side information physically as time evolves • Adversary tries to identify as many victims as possible – plot of k-anonymity as function of time • Passive • Snapshots of victim are inferred (not collected) and less accurate in general • Adversary tries to identify one victim only – plot of correctness as function of pieces of side information
Attack strategy of active adversary • Algorithm of the attack (in action) • [Figure: candidate trace IDs (A, B, C) for each real ID (1, 2, 3) at successive times t1 and t2; the candidates for real IDs 1 and 2 shrink from {A, B, C} to {A, B}, while real ID 3 keeps {A, B, C}]
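The candidate-elimination idea in the figure can be sketched as follows. This is an illustrative reading, not the authors' algorithm: it assumes the adversary records which real IDs it sees in which cell at which time, and keeps for each victim only the published trace IDs that were in that cell at that time. The function name and data layout are hypothetical.

```python
def active_attack(published, real_ids, observations):
    """Narrow down the candidate trace IDs of each observed victim.

    published:    {trace_id: {time: cell}} from the anonymized trace set
    real_ids:     iterable of victims the adversary wants to identify
    observations: list of (time, cell, [real IDs seen there by the adversary])
    Returns {real_id: set of candidate trace IDs}; the size of each set is
    that victim's k-anonymity.
    """
    candidates = {rid: set(published) for rid in real_ids}
    for t, cell, seen in observations:
        # Trace IDs whose published record places them in this cell at time t.
        present = {tid for tid, path in published.items() if path.get(t) == cell}
        for rid in seen:
            candidates[rid] &= present
    return candidates


if __name__ == "__main__":
    published = {
        "A": {1: (0, 0), 2: (1, 1)},
        "B": {1: (0, 0), 2: (2, 2)},
        "C": {1: (5, 5), 2: (5, 5)},
    }
    observations = [(1, (0, 0), [1, 2])]  # victims 1 and 2 seen together at t1
    result = active_attack(published, [1, 2, 3], observations)
    print({rid: sorted(c) for rid, c in result.items()})
```

With the single observation above, victims 1 and 2 are each left with candidates {A, B} and victim 3 with {A, B, C}, matching the state shown in the figure. The sketch omits any propagation step (for example, removing a trace from other victims' candidates once it has been uniquely matched).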
Experimental analysis • Basic information • Real traces • 536 San Francisco taxicabs • 2348 Shanghai Grid buses • Synthetic traces • Using map size and average speed computed from taxi cab traces • Random waypoint (with different maximum trip lengths) • Random walk • Spatial granularity = 1 km • Temporal granularity = 1 minute (unless stated otherwise)
Characteristics of the traces: distance between traces • Real traces are closer to each other on average • Bus traces have a broader range • For synthetic traces, the shorter the trip length, the farther apart they are from each other in general
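One way to quantify how close traces are to each other is sketched below. The slides do not define the distance measure, so this assumes the mean Euclidean distance between two traces over their common sampling times, averaged over all pairs; both function names are hypothetical.

```python
import math
from itertools import combinations


def trace_distance(a, b):
    """Mean Euclidean distance between two traces at their common times.

    a, b: {time: (x, y)}. This definition is an assumption for illustration.
    """
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("traces share no sampling times")
    return sum(math.dist(a[t], b[t]) for t in common) / len(common)


def mean_pairwise_distance(traces):
    """Average trace_distance over all unordered pairs of traces."""
    pairs = list(combinations(traces.values(), 2))
    return sum(trace_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    traces = {
        "A": {0: (0, 0), 1: (0, 0)},
        "B": {0: (3, 4), 1: (3, 4)},
        "C": {0: (0, 0), 1: (6, 8)},
    }
    print(mean_pairwise_distance(traces))
```

Under such a measure, a smaller average means more traces share similar paths, which makes an individual trace harder to single out from side information.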
Significant observations • Lack of preferred locations and random initial location of the synthetic traces • Nodes are more sparsely distributed in the network • Implications: • For adversary in general • Can easily identify the trace of a synthetic node since no other traces share similar path • For active adversary • May take longer time to meet with each synthetic node
Attack performance: passive adversary (special case) • Special case: side information inferred at the sampling times of the traces • Correct assumption of noise (Gaussian) • Cab traces • Observations • MLE and MSQ perform equally well • BAS gives the fewest wrong conclusions initially
Attack performance: passive adversary (special case) • Random waypoint traces • Most efficient attack • Traces have very different paths
Attack performance: passive adversary (special case) • Incorrect assumption of noise • Assumption: uniform • Actual: Gaussian • Cab traces • Observations • MLE performance degrades substantially
Attack performance: passive adversary (general case) • General case: side information at times different from the trace sampling times • Worst-case scenario: all times are different • Infer the location of the victim using the mobility model • Gaussian noise (no noise as the best-performance bound) • Cab traces
Summary: passive adversary • MLE and MSQ give the best performance among the four approaches in terms of the fraction of correct conclusions • Since MLE relies on knowledge of the type of noise and its magnitude, MSQ is the preferred, more robust attack approach
Attack performance: active adversary as one of the mobile nodes • Higher attack efficiency for real traces • Mobile nodes are more likely to visit the same set of locations at the same time • Synthetic nodes are more sparsely distributed in the network • [Figure: 1 time step = 1 minute]
Attack performance: active adversary who stays at one of the cells • Observations • Comparing real traces and synthetic traces • Attacks on real traces are more efficient: k-anonymity drops more quickly • Popular cells in real traces and random waypoint traces are more aggregated together • Being at a popular cell does not necessarily result in higher attack efficiency • [Figure panels: cabs, buses, random walk, random waypoint]
Attack performance: active adversary who moves among popular cells • The ability to move among popular cells improves attack efficiency • The improvement is more significant if node movements are more localized • Visiting more cells does not necessarily improve efficiency • [Figure panels: cabs, buses, random walk, random waypoint]
Conclusion • Studied how privacy leaks through trace publication • Under different adversary strategies for collecting side information • Using different mobility traces with different characteristics • Experimentally showed that the adversary is able to identify the trace of a victim from the published set with high probability