Hiding Stars with Fireworks: Location Privacy through Camouflage

Explore "CacheCloak," a prediction engine that obscures user locations by surrounding them with other paths, enhancing privacy without compromising functionality. Learn about its benefits and system evaluation results.


Presentation Transcript

  1. Hiding Stars with Fireworks: Location Privacy through Camouflage • From: ACM MobiCom 2009 • Authors: J. Meyerowitz, R. Choudhury

  2. Outline • Introduction • Location-only service structure • Limitations of existing work • Predictive privacy • CacheCloak: prediction engine • System evaluation • Results and analysis • Distributed CacheCloak • Discussions • Conclusion

  3. 1. Introduction (1/2) • The proliferation of GPS and ubiquitous wireless connectivity has enabled new location-based services (LBSs) • Examples • Location-based advertisements • Geolife: a location-based to-do system • Zero-sum game between privacy & functionality • Existing privacy methods reduce spatial accuracy, add delay, etc.

  4. 1. Introduction (2/2) • Main idea of “CacheCloak” • Where other methods try to obscure the user’s path by hiding parts of it, we obscure the user’s location by surrounding it with other users’ paths • Mechanism

  5. 2. Location-only service structure (1/2) • Trusted LBS vs. untrusted LBS • Whether the user's identity must be revealed • Depends on the kind of service (e.g., a banking application) • Goal • To structure services so that only the user’s location needs to be sent to the LBS • Example: Geolife • A to-do list entry of “Buy Milk @ Farmer’s Market” • Alerts the user when nearby

  6. 2. Location-only service structure (2/2) • Query frequency and privacy • “Breadcrumbs dropped by Hansel and Gretel” • Infrequently vs. too frequently

  7. 3. Limitations of existing work (1/5) • 3.1 K-Anonymity • Definition • Ensures that the user cannot be individually identified from a group of k users • By sending a sufficiently large “k-anonymous region” that encloses k users in space • Alternate formulation: CliqueCloak • Similar, but expands the region in time, by forcing some users to wait until enough anonymization can take place • Flaw • Reduces the quality of the user’s localization in space or time • Not suited to scenarios with low user density
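
The cost of a k-anonymous region at low density can be illustrated with a minimal sketch. It assumes a simplified cloaking rule that is not taken from the slides: the region is the smallest axis-aligned square centred on the user that encloses k users. The function name and the centred-square rule are hypothetical; real k-anonymity schemes choose regions differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def k_anonymous_half_width(me: Point, others: List[Point], k: int) -> float:
    """Half-width of the smallest square centred on `me` that encloses
    k users, counting `me` as one of them (illustrative rule only)."""
    if k < 1 or k > len(others) + 1:
        raise ValueError("k must be between 1 and the total number of users")
    if k == 1:
        return 0.0
    # Chebyshev distance = half-width of the centred square reaching that user.
    dists = sorted(max(abs(x - me[0]), abs(y - me[1])) for x, y in others)
    # `me` is already inside, so the (k-1)-th nearest other user fixes the size.
    return dists[k - 2]
```

With sparse users the k-th neighbour is far away, so the reported region grows and spatial accuracy drops, which is the flaw named on the slide.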

  8. 3. Limitations of existing work (2/5) • 3.2 Pseudonyms and Mix Zones • Pseudonyms • “Breadcrumbs” – however, frequent updating may expose a pattern of closely spaced queries, allowing one to easily follow the user • A user’s trail may also be revealed if the user sends distinguishable queries to the LBS

  9. 3. Limitations of existing work (3/5) • Mix Zones • Definition: a mix zone occurs whenever two users occupy the same place at the same time • Problem: such space-time intersections are rare; users much more commonly pass through the same place at different times • Figure 1: the attacker cannot say whether they turned or went straight

  10. 3. Limitations of existing work (4/5) • 3.3 Path Confusion • Extends mix zones by resolving the same-place same-time problem, by incorporating a delay in the anonymization • t0 < t1 < t0+tdelay • Ex: t0=7:05pm, t1=7:09pm, and tdelay=10 min (so t0+tdelay=7:15pm) • Thus, the users’ trails of locations are exposed at 7:15pm, ensuring confusion at the LBS • Problem: similar to CliqueCloak • By introducing an initial delay, realtime operation is compromised

  11. 3. Limitations of existing work (5/5) • 3.4 Limitations

  12. 4. Predictive privacy (1/4) • Main idea • Because the user will eventually pass through an intersection, we can prefetch responses to the user’s continuous location queries • Based on path confusion • Keeps the accuracy benefits of path confusion • But without incurring its delay

  13. 4. Predictive privacy (2/4) • Figure 2: “cache hit” • Flat cylinders: responses CacheCloak has retrieved from the LBS and cached on its server • Raised cylinders: the user receives cached responses from CacheCloak for each of its current locations

  14. 4. Predictive privacy (3/4) • Figure 3: “cache miss” • CacheCloak makes a prediction that extrapolates the user’s path until it reaches existing paths with cached data • This does not degrade the spatial or temporal accuracy of any single data point that the user sees

  15. 4. Predictive privacy (4/4) • Figure 4: View of LBS • The LBS cannot determine what triggered the new set of queries along Main Street • The ends of the predictions, where they join existing paths, are where CacheCloak provides the anonymity
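
The cache hit and cache miss behaviour described on these slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `CloakSketch`, the `lbs` and `predict_next` callables, and the `max_len` cap are all hypothetical, and the sketch only extrapolates forward along the predicted path.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int]


class CloakSketch:
    """Toy cloaking proxy: answers from cache, prefetches a predicted path on a miss."""

    def __init__(self, lbs: Callable[[Pixel], str],
                 predict_next: Callable[[Pixel], Pixel], max_len: int = 1000):
        self.cache: Dict[Pixel, str] = {}
        self.lbs = lbs                      # stand-in for the untrusted LBS
        self.predict_next = predict_next    # stand-in for the prediction engine
        self.max_len = max_len              # assumed cap on path length
        self.lbs_queries: List[Pixel] = []  # what the LBS gets to see

    def query(self, pixel: Pixel) -> str:
        if pixel in self.cache:             # cache hit: the LBS sees nothing new
            return self.cache[pixel]
        # Cache miss: extend the predicted path until it meets cached data.
        path = [pixel]
        cur = pixel
        while len(path) < self.max_len:
            cur = self.predict_next(cur)
            if cur in self.cache:
                break
            path.append(cur)
        # The whole predicted path is requested from the LBS and cached.
        for p in path:
            self.cache[p] = self.lbs(p)
            self.lbs_queries.append(p)
        return self.cache[pixel]
```

After a miss, later queries along the predicted path are served from the cache, so the LBS only ever sees the batch of path-wide queries and not the user's actual progress.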

  16. 5. CacheCloak: Prediction engine (1/3) • Pixellate • Each 10m × 10m “pixel” is assigned an 8 × 8 historical counter matrix C • Each element cij of the matrix represents the number of times a user has entered from side i and exited toward side j • Make prediction

  17. 5. CacheCloak: Prediction engine (2/3) • First-order Markov model • (1) P(j|i): the probability that a user will exit side j given an entry from side i • (2) P(j): the probability that a user will exit side j without any knowledge of the entering side • Subsequent pixels are “colored” in the direction of the most likely next pixel, max( P(j|i) for j = 1...8 )

  18. 5. CacheCloak: Prediction engine (3/3) • Benefits of this Markov model • (1) The mobility predictions are based entirely on previously observed user movements • (2) It gives a very rapid and computationally cheap way to create predictions, as only one time-step of state is maintained throughout the prediction
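
A minimal sketch of how the two probabilities could be estimated from one pixel's 8 × 8 counter matrix, assuming simple frequency estimates (row-normalised counts for P(j|i), column totals for P(j)). The function names, the 0-based side indices, and the fallback to P(j) when a row is empty are assumptions made for illustration.

```python
from typing import List, Optional

SIDES = 8  # each pixel has 8 entry/exit sides, indexed 0..7 here


def exit_given_entry(C: List[List[int]], i: int) -> Optional[List[float]]:
    """P(j|i): exit-side distribution given entry from side i, or None if unseen."""
    total = sum(C[i])
    if total == 0:
        return None
    return [c / total for c in C[i]]


def exit_marginal(C: List[List[int]]) -> Optional[List[float]]:
    """P(j): exit-side distribution with no knowledge of the entry side."""
    total = sum(sum(row) for row in C)
    if total == 0:
        return None
    return [sum(C[i][j] for i in range(SIDES)) / total for j in range(SIDES)]


def most_likely_exit(C: List[List[int]], i: Optional[int] = None) -> Optional[int]:
    """Side j maximising P(j|i); falls back to P(j) if i is unknown or unseen."""
    probs = exit_given_entry(C, i) if i is not None else None
    if probs is None:
        probs = exit_marginal(C)
    if probs is None:
        return None  # no history at all for this pixel
    return max(range(SIDES), key=lambda j: probs[j])
```

Only the current pixel and entry side are needed at each step, which is the single time-step of state the slide refers to.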

  19. 6. System evaluation (1/5) • 6.1 Simulation • Utilize a trace-based simulation • Repeat to generate the historical data matrix

  20. 6. System evaluation (2/5) • 6.2 Attacker Model • The eight elements of p(x, y), pk(x, y) for k = 1, 2, .., 8, represent the probability that a user is occupying pixel (x, y) and entered from side k • ∑k pk(x,y) = 1 • Assumption: the attacker’s historical data is of the same quality as CacheCloak’s own

  21. 6. System evaluation (3/5) • 6.2 Attacker Model • Update of Attacker’s Model.

  22. 6. System evaluation (4/5) • 6.2 Attacker Model • An example: • Start from (x0,y0) • Only two directions: k0 and inv(k0) • So, P( k0|inv(k0) ) = P( inv(k0)|k0 ) = 0.5; otherwise P(j|i) = 0 • After the first time-step of diffusion • ∑k pk(x0,y0) = 0, but ∑k pk = 0.5 at each of (x0+dxk0 , y0+dyk0) and (x0+dxinv(k0), y0+dyinv(k0)) • That is, the attacker is simulating a 50/50 chance that the user went left or right out of a driveway
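
One diffusion step of such an attacker model can be sketched as below. This is an illustrative reading, not the paper's update rule: the side ordering in `OFFSETS`, the `inv` mapping, and the state layout are assumptions, and the driveway example uses a simplified transition table that sends the user toward k0 or inv(k0) with probability 0.5 each regardless of entry side, which is one way to obtain the 50/50 split described above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed side ordering: side k points toward the neighbour at OFFSETS[k].
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

State = Dict[Tuple[int, int, int], float]  # (x, y, entry side k) -> probability


def inv(k: int) -> int:
    """Opposite side: leaving through side k means entering the neighbour via inv(k)."""
    return (k + 4) % 8


def diffuse(state: State, trans: Callable[[int, int, int], List[float]]) -> State:
    """Move every bit of probability mass one pixel, weighted by P(j | entry side)."""
    new: Dict[Tuple[int, int, int], float] = defaultdict(float)
    for (x, y, k), p in state.items():
        for j, pj in enumerate(trans(x, y, k)):
            if pj > 0:
                dx, dy = OFFSETS[j]
                new[(x + dx, y + dy, inv(j))] += p * pj
    return dict(new)


def pixel_mass(state: State, x: int, y: int) -> float:
    """Total probability that the user is in pixel (x, y), summed over entry sides."""
    return sum(p for (px, py, _), p in state.items() if (px, py) == (x, y))


def driveway_example(k0: int = 0) -> State:
    """Start at (0, 0); only k0 and inv(k0) are possible exits, 0.5 each."""
    def two_way(x: int, y: int, entry: int) -> List[float]:
        probs = [0.0] * 8
        probs[k0] = 0.5
        probs[inv(k0)] = 0.5
        return probs
    return diffuse({(0, 0, inv(k0)): 1.0}, two_way)
```

Calling `driveway_example()` leaves no mass at the origin and 0.5 in each of the two neighbouring pixels along the road.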

  23. 6. System evaluation (5/5) • 6.3 Privacy Metrics • The next step past the diffusive attacker model is creating a quantitative measure of privacy • Location entropy • With the per-pixel probability P(x,y) = ∑k pk(x,y), entropy is defined as the number of bits $S = -\sum_{x,y} P(x,y)\,\log_2 P(x,y)$ • A user equally likely to be at any of $2^S$ locations has S bits of entropy, but the inverse isn’t always true • Ex: if P(x1,y1) = 0.5 while P(x2,y2) = 0.1, P(x2+1,y2) = 0.1, .., P(x2+3,y2+1) = 0.1 (five pixels at 0.1, so that the probabilities sum to 1), then S ≈ 2.16, yet the attacker effectively faces only about 1 bit of uncertainty
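
The entropy definition can be checked numerically with a short sketch. The function name is hypothetical, and the example assumes five pixels at probability 0.1 so that the distribution sums to 1.

```python
import math
from typing import Iterable


def location_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits of a per-pixel location distribution.
    Zero-probability pixels contribute nothing to the sum."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Uniform over 2**3 = 8 locations gives 3 bits.
uniform_bits = location_entropy([1 / 8] * 8)

# One pixel at 0.5 plus five pixels at 0.1 (assumed count): a bit over 2 bits,
# even though half of the mass sits in a single pixel.
skewed_bits = location_entropy([0.5] + [0.1] * 5)
```

The skewed case shows why entropy alone can overstate privacy, which motivates the peak counting used later in the results.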

  24. 7. Results and analysis (1/11) • About this section • We show evidence for strong anonymization • The average user’s trip from location to location takes approximately 10 minutes (per Gruteser & Hoh)

  25. 7. Results and analysis (2/11) • 7.1 Evaluation • Simulation: run over 19 computers in parallel • Density: 1 user/3.6 km² to 1 user/0.72 km² • Total: 2,850 diffusive measurements • Result: 95% performed and 5% culled • Figure 6: Distribution of users vs. distance traversed in 20 minutes. Most users traversed around 1 to 2 miles (2 ~ 3.5 km)

  26. 7. Results and analysis (3/11) • 7.2 Mean and Minimum Entropy • Discusses three cases: mean, worst, and best • Time evolution of entropy (bits) • The worst case is especially valuable to examine • Figure 7: Mean entropy over time for different user densities

  27. 7. Results and analysis (4/11) • Figure 8: Worst-case entropy over time for different user densities • Figure 9: Best-case entropy over time for different user densities

  28. 7. Results and analysis (5/11) • 7.3 Density Variations • Shows that CacheCloak is robust even at extremely low densities • vs. K-anonymity: would suffer significant degradation of accuracy • vs. Path confusion: would have very few path intersections to work with • Figure 10: Entropy after 10 minutes of attempted tracking for different user densities

  29. 7. Results and analysis (6/11) • 7.4 Typical Case • Shows that the effect of density on entropy increases with time • Figure 11: One arbitrary user’s entropy over time in different density conditions

  30. 7. Results and analysis (7/11) • 7.4 Typical Case • Paths branching at an intersection vs. paths converging • Increase vs. decrease of entropy • Figure 12: Three arbitrary users’ entropies over time (n = 50)

  31. 7. Results and analysis (8/11) • 7.4 Typical Case • Intuitively, one’s ability to track a user cannot become worse • Figure 13: The time evolution of a random user’s entropy over 30 minutes

  32. 7. Results and analysis (9/11) • 7.5 Peak Counting • Peak: a location the attacker is most certain the user has gone to • Increase: as time progresses, the number of locations a user might have traveled to increases • Drop: as more and more possible locations must be considered, fewer of them stay above the threshold • Figure 14: Average number of locations a user might be according to a 0.05 threshold
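
Peak counting can be sketched as counting the pixels whose probability reaches a threshold. The function name is hypothetical, and treating a probability exactly equal to the threshold as a peak is an assumption.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]


def count_peaks(prob_by_pixel: Dict[Pixel, float], threshold: float) -> int:
    """Number of pixels whose probability is at least `threshold`."""
    return sum(1 for p in prob_by_pixel.values() if p >= threshold)


# Toy distributions at three moments of the attacker's diffusion.
concentrated = {(0, 0): 1.0}                            # user just seen
branching = {(i, 0): 0.25 for i in range(4)}            # a few likely places
spread_out = {(i, 0): 0.1 for i in range(10)}           # many unlikely places

peaks_over_time = [count_peaks(d, 0.2) for d in (concentrated, branching, spread_out)]
```

With a 0.2 threshold the count goes from 1 to 4 and then to 0: it rises while the probability splits over a few branches and drops once it is spread too thinly for any pixel to clear the threshold.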

  33. 7. Results and analysis (10/11) • Figure 15: Average number of locations a user might be according to a 0.1 threshold • Figure 16: Average number of locations a user might be according to a 0.2 threshold

  34. 7. Results and analysis (11/11) • The number of peaks for a given threshold decreases as the number of users increases • This shows that more users offer greater opportunity to hide • Figure 17: Variation of number of peaks left after 10 minutes at different densities and thresholds

  35. 8. Distributed CacheCloak (1/4) • “What if the users don’t wish to trust CacheCloak?” • The mechanism described so far relies on a trusted server • Distributed vs. Centralized

  36. 8. Distributed CacheCloak (2/4) • Centralized form • LBS responses for the entire path are cached • The responses are forwarded according to the current (x,y) coordinate of the mobile user • Figure 18: Mobility prediction based on historical mobility patterns and the bit-mask of caches

  37. 8. Distributed CacheCloak (3/4) • Distributed form • Mobile: caches the path-wide responses • Server: only needs to maintain the global bit-mask & history matrix from all users in the system • Figure 19: The historical patterns and the global bit-mask are periodically shared with the mobile device

  38. 8. Distributed CacheCloak (4/4) • Advantage • Distributed CacheCloak receives no more information about a user’s location than the untrusted LBS does • Flaw • Greater computational burden on the mobile device and a greater bandwidth burden on the last-hop wireless connection

  39. 9. Discussions (1/5) • 9.1 Pedestrian Users • Difficulty • Getting enough historical mobility data to bootstrap the prediction system • One solution • Take realistic source-destination pairs from Google Maps and parse the routes to get the mobility patterns (planned as future work)

  40. 9. Discussions (2/5) • 9.2 Bootstrapping CacheCloak • Difficulty • A new LBS starts with zero users, so early adopters have few others to hide among • Providing privacy for the first few users & gaining a critical mass of users for the system • Solution • CacheCloak has been shown to work well with very sparse populations • And it can be used initially with simulation-based historical data

  41. 9. Discussions (3/5) • 9.3 Coping with Distinguishable Queries • Problem • Queries sent to the LBS by different users must be indistinguishable (to support more kinds of service) • Solution of CacheCloak • Does not require an instantaneous response by the LBS • The only content of the query is the location of the user and the direction of travel

  42. 9. Discussions (4/5) • 9.3 Coping with Distinguishable Queries • Consideration • Query: • Response: • While R1, R2 ⊆ Rg, and CacheCloak generalizes query Qg

  43. 9. Discussions (5/5) • 9.3 Coping with Distinguishable Queries • Example • Query: • Response:

  44. 10. Conclusion • No compromise • between accuracy, realtime operation, and continuous operation • Entropy monitoring • Even with a priori knowledge of historical mobility patterns, an attacker cannot track a user over a significant amount of time • Density simulations • Show that CacheCloak can work in extremely sparse systems • Anonymization • Does not need to suppress any of the user’s location information
