Cipher Techniques to Protect Anonymized Mobility Traces from Privacy Attacks

Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip, and Nageswara S. V. Rao

Motivations • Mobility traces are published to assist the study of mobile networks and their applications • Simply removing identities and reducing the spatial and temporal granularity of the traces is not enough to protect privacy

Proposed Privacy Protection Approaches • Granularity reduction • Reduce granularity until a trace cannot be distinguished from k-1 other traces • The granularity can become so coarse that the traces are useless (e.g., we are all inside Singapore today) • Differential privacy • Provides a guarantee on privacy protection – even the most powerful adversary, who knows all but one record in the traces, cannot gain information about the remaining record from the answer • Allows only a limited type and number of queries, which return statistics of the traces and never individual traces (what if we want to know how one node interacts with others?)

Problem Definition • Mobility traces, each recording a series of times and the corresponding locations of a mobile node, are published • Traces are anonymized and their location granularity is reduced • An adversary with snapshots (time and location pairs) of a victim tries to identify the victim's complete mobility history from the traces • She also has general knowledge about the region and the general preferences of the mobile nodes, but NOT those of the victim

Some Possible Protection Approaches • Noise addition • The magnitude of the noise is limited in order to preserve the usefulness of the traces • Strong ciphers • Encrypt the whole mobility trace • The key is needed to access the trace • To enforce privacy, even legitimate users cannot have the key! • The traces are then useless for any application

Our Protection Approaches • Reduce the linkability between the published traces and the side information possessed by the adversary • Cipher techniques • Location cipher • Use symbols to represent locations (consistently) • Instead of saying I am in Novotel, say I am in location A • Zero-time cipher • Publish time relative to a (concealed) absolute time (consistently) • Instead of saying at 5pm I am in Novotel, say at the n+12-th hour I am in Novotel • Combining the two ciphers • Say at the n+12-th hour I am in location A
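
The two ciphers can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the place names other than Novotel, the symbol format, and the choice of hour 5 as the concealed reference time are all assumptions made for the example. "Consistently" is read here as using one mapping and one reference time for every published trace.

```python
import random

def make_mapping(locations, seed):
    """Build a secret one-to-one mapping from locations to opaque symbols.
    (Illustrative only: the seeded shuffle stands in for any secret mapping.)"""
    rng = random.Random(seed)
    symbols = [f"loc{i}" for i in range(len(locations))]
    rng.shuffle(symbols)
    return dict(zip(sorted(locations), symbols))

def location_cipher(trace, mapping):
    """Replace every location by its symbol, using the same mapping throughout."""
    return [(t, mapping[loc]) for t, loc in trace]

def zero_time_cipher(trace, t0):
    """Publish each time relative to the concealed absolute time t0."""
    return [(t - t0, loc) for t, loc in trace]

def publish(traces, mapping, t0):
    """Apply both ciphers to every trace with the same mapping and the same t0."""
    return {
        trace_id: zero_time_cipher(location_cipher(trace, mapping), t0)
        for trace_id, trace in traces.items()
    }

# Hypothetical traces: (hour of day, place). 17 is 5pm.
traces = {
    "trace1": [(17, "Novotel"), (18, "Airport"), (20, "Novotel")],
    "trace2": [(17, "Airport"), (19, "Harbour")],
}
all_locations = {loc for trace in traces.values() for _, loc in trace}
mapping = make_mapping(all_locations, seed=42)  # kept secret by the publisher
published = publish(traces, mapping, t0=5)      # t0 plays the role of n

for trace_id, trace in published.items():
    print(trace_id, trace)
```

With the concealed reference hour set to 5, the 5pm visit is published at relative hour 12, matching the "n+12-th hour" example, and both visits to Novotel carry the same symbol.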

Attack Strategies of the Adversary

Adversary’s Attack – Breaking the Ciphers • Assumptions • The adversary knows the cipher techniques used • The adversary knows the region where the traces were collected • The adversary knows the physical constraints and general preferences of the mobile nodes (again, NOT those of the victim)

Adversary’s Attack – Breaking the Location Cipher (Order-0) • Breaking the location cipher • Frequency analysis (Order-0 Markov model) • Knowing the region where the traces were collected, the adversary can rank the locations inside the region by popularity using general knowledge • Compare this ranking with the ranking of the published location IDs • Challenges • The popularity of locations is less distinct than the frequency of letters in text • The adversary may only be able to tell the top few locations apart
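
The order-0 attack described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names and the toy data are assumptions, and the adversary's popularity ranking is simply supplied as a list of her top few locations.

```python
from collections import Counter

def rank_symbols(published_traces):
    """Rank published location symbols by how often they appear in the traces."""
    counts = Counter(sym for trace in published_traces for _, sym in trace)
    # Sort by descending count; ties are broken by symbol name for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ordered]

def guess_mapping(published_traces, popular_locations):
    """Pair the i-th most frequent symbol with the i-th most popular location.
    Only as many symbols are guessed as the adversary has ranked locations."""
    return dict(zip(rank_symbols(published_traces), popular_locations))

# Hypothetical published traces: (relative time, symbol).
published_traces = [
    [(0, "A"), (1, "B"), (2, "A")],
    [(0, "A"), (1, "C"), (2, "B")],
]
# Hypothetical general knowledge: only the top two locations are clear.
popular_locations = ["Downtown", "Airport"]

print(guess_mapping(published_traces, popular_locations))
```

Because `zip` stops at the shorter input, symbol `C` is left unguessed, which mirrors the challenge that only the top few locations may be distinguishable.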

Frequency Analysis (Order-0) • [Figure: frequency distributions, with panels for 235 grid cells in San Francisco and for English text]

Breaking the Location Cipher – Frequency Analysis (Higher-Order) • Breaking the location cipher • Frequency analysis (higher-order Markov model) • Use general preferences and physical constraints to learn higher-order trajectory patterns • Challenges • Higher-order knowledge may not be much more beneficial • It is noisier to learn • It is less specific
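
As a rough illustration of what higher-order statistics look like, the sketch below counts order-1 transitions (consecutive pairs of symbols) in published traces. The function name and the toy traces are assumptions; how the adversary would compare these counts with her general knowledge is not specified in the slides and is left out.

```python
from collections import Counter

def transition_counts(traces):
    """Count how often symbol a is immediately followed by symbol b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical published traces, reduced to their sequences of location symbols.
symbol_traces = [
    ["A", "B", "A", "B"],
    ["B", "C"],
]
counts = transition_counts(symbol_traces)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

The number of possible pairs grows with the square of the number of locations, so each pair is observed less often than a single location; this is one way to see why higher-order statistics are noisier to learn.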

Adversary’s Attack – Breaking the Zero-Time Cipher • Sub-string matching • Since traces are published with time relative to a concealed absolute time, the order of location visits is preserved • With snapshots collected about a mobile node, the adversary can determine the similarity between each published trace and the victim's movements

Breaking the Zero-Time Cipher – Substring Matching • Example • 1 snapshot • 2 snapshots • More snapshots give more accurate results • [Figure: example traces shown as sequences of location IDs, matched against 1 and 2 snapshots]
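
The matching idea can be sketched as below, under simplifying assumptions that are not stated in the slides: each trace is a list of location IDs sampled at unit time steps, snapshots are noise-free, and snapshot times fall on the sampling grid. The function names and the toy traces are hypothetical.

```python
def matches(trace, snapshots):
    """Return True if some time shift aligns every snapshot with the trace.
    trace: list of location IDs, indexed by relative time step.
    snapshots: list of (absolute_time, location_id) pairs."""
    t_first, loc_first = snapshots[0]
    for i, loc in enumerate(trace):
        if loc != loc_first:
            continue
        shift = i - t_first  # candidate alignment: index = absolute time + shift
        if all(
            0 <= t + shift < len(trace) and trace[t + shift] == l
            for t, l in snapshots
        ):
            return True
    return False

def candidates(traces, snapshots):
    """List the traces that remain consistent with all snapshots."""
    return [name for name, trace in traces.items() if matches(trace, snapshots)]

# Hypothetical published traces (location IDs in the clear, times relative).
traces = {
    "a": [2, 2, 1, 1, 5, 5, 7, 7],
    "b": [5, 5, 7, 7, 4, 4, 7, 7],
    "c": [1, 1, 2, 2, 5, 5, 5, 5],
}

print(candidates(traces, [(100, 5)]))                      # 1 snapshot
print(candidates(traces, [(100, 5), (102, 7)]))            # 2 snapshots
print(candidates(traces, [(100, 5), (102, 7), (104, 4)]))  # 3 snapshots
```

In this toy example one snapshot leaves all three traces as candidates, two snapshots leave two, and three snapshots leave only one, which illustrates why more snapshots give more accurate results.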

Calculating the Similarity • Infer the possible locations of the victim at the time instants of the side information • Use a Bayesian approach to determine the trace that best matches the side information

The Bayesian Approach • Maximum Likelihood Estimation (MLE) • Assumes that the distribution of the noise is known
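
A maximum-likelihood selection of this kind might look like the sketch below. The slides only say that the noise distribution is assumed known; the independent, zero-mean, isotropic Gaussian noise with standard deviation `sigma` used here is an assumption made for the example, as are the function names and the data. Each trace is given as its locations at the snapshot times, so no location inference is needed.

```python
import math

def log_likelihood(trace_points, observed_points, sigma):
    """Log-likelihood of the observations given a trace, assuming independent
    2-D isotropic Gaussian noise with known standard deviation sigma."""
    ll = 0.0
    for (tx, ty), (ox, oy) in zip(trace_points, observed_points):
        d2 = (tx - ox) ** 2 + (ty - oy) ** 2
        ll += -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return ll

def most_likely_trace(traces, observed_points, sigma):
    """Return the ID of the trace that maximizes the likelihood."""
    return max(
        traces,
        key=lambda name: log_likelihood(traces[name], observed_points, sigma),
    )

# Hypothetical candidate traces: (x, y) positions at the two snapshot times.
candidate_traces = {
    "a": [(0.0, 0.0), (1.0, 1.0)],
    "b": [(5.0, 5.0), (6.0, 6.0)],
}
# Hypothetical noisy side information about the victim at the same times.
observed = [(0.2, -0.1), (1.1, 0.9)]

best = most_likely_trace(candidate_traces, observed, sigma=0.5)
print(best)
```

With equal noise at every snapshot, maximizing this likelihood is the same as picking the trace with the smallest total squared distance to the observations.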

Experimental Analysis – Overview • Basic information about the traces • 536 San Francisco taxi cabs • Special case: the sampling times of the traces coincide with those of the side information • No inference of location is needed

Experimental Analysis – Metrics • Performance-quantifying metrics • % of correct conclusions • % of runs in which the algorithm correctly returns the victim’s trace (or a trace identical to the victim’s trace) with the highest similarity value • % of incorrect conclusions • % of runs in which the algorithm misidentifies the victim’s trace

Performance of Cipher Techniques – Using both the time and location ciphers is the most secure • [Figure: % of incorrect conclusions and % of correct conclusions]

Knowing more may not help! (to break the location cipher) • [Figure: % of incorrect conclusions and % of correct conclusions]

Conclusion • Presented two cipher techniques to protect published traces (when they need to be published) • Each cipher technique helps on its own, while using both together gives the best protection (verified in experiments)
