Hiding Stars with Fireworks: Location Privacy through Camouflage From: ACM MobiCom 2009 Authors: J. Meyerowitz, R. Choudhury
Outline • Introduction • Location-only service structure • Limitations of existing work • Predictive privacy • CacheCloak: prediction engine • System evaluation • Results and analysis • Distributed CacheCloak • Discussions • Conclusion
1. Introduction (1/2) • The proliferation of GPS and ubiquitous wireless connectivity has enabled new location-based services (LBSs) • Examples • Location-based advertisements • Geolife: a location-based to-do system • Zero-sum game between privacy & functionality • Reduce spatial accuracy, increase delay time, etc.
1. Introduction (2/2) • Main idea of “CacheCloak” • Where other methods try to obscure the user’s path by hiding parts of it, we obscure the user’s location by surrounding it with other users’ paths • Mechanism
2. Location-only service structure (1/2) • Trusted LBS vs. untrusted LBS • Whether the user's identity is revealed • Depends on the kind of service (e.g., a banking application) • Goal • To structure services so that only the user's location need be sent to the LBS • Example: Geolife • A to-do item of “Buy Milk @ Farmer’s Market” • Alert the user when nearby
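The Geolife example above can be sketched as a location-only service: the query carries just a position, and the service fires an alert when that position comes within range of a stored to-do location. This is a minimal Python sketch; the function names, coordinates, and 200 m radius are illustrative assumptions, not details from the paper.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def alerts(user_pos, todos, radius_m=200.0):
    """Return tasks whose stored location is within radius_m of the user.
    Note the query contains only a location -- no user identity."""
    lat, lon = user_pos
    return [task for (tlat, tlon, task) in todos
            if haversine_m(lat, lon, tlat, tlon) <= radius_m]

todos = [(36.0014, -78.9382, "Buy Milk @ Farmer's Market")]
print(alerts((36.0010, -78.9380), todos))  # user ~50 m away: alert fires
```

Because the service only ever sees bare locations, anonymizing those locations (as CacheCloak does) is enough to protect the user.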
2. Location-only service structure (2/2) • Query frequency and privacy • “Breadcrumbs dropped by Hansel and Gretel” • Querying infrequently degrades the service; querying too frequently leaves a trail of breadcrumbs that exposes the user’s path
3. Limitations of existing work (1/5) • 3.1 K-Anonymity • Definition • Ensure that the user cannot be individually identified from a group of k users • By sending a sufficiently large “k-anonymous region” that encloses k users in space • Alternate formulation: CliqueCloak • Similar, but expands the region in time by forcing some users to wait until enough anonymization can take place • Flaw • Reduces the quality of the user’s localization in space or time • Not suited to low-user-density scenarios
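The k-anonymous region idea above can be sketched in a few lines: instead of the user's exact point, the anonymizer sends the bounding box enclosing the user and their k−1 nearest neighbors. This is a hedged toy sketch on planar coordinates, not the paper's (or any cited system's) actual algorithm.

```python
def k_anonymous_region(user, others, k):
    """Return ((min_x, min_y), (max_x, max_y)) enclosing the user and
    the k-1 users nearest to them; the LBS learns only this region."""
    ux, uy = user
    # pick the k-1 nearest other users by squared distance
    nearest = sorted(others, key=lambda p: (p[0] - ux) ** 2 + (p[1] - uy) ** 2)[:k - 1]
    pts = [user] + nearest
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys)), (max(xs), max(ys))

others = [(2, 1), (9, 9), (1, 3), (8, 2)]
print(k_anonymous_region((0, 0), others, k=3))  # -> ((0, 0), (2, 3))
```

The flaw noted above is visible here: at low user density the k−1 nearest users are far away, so the box (and the localization error) grows.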
3. Limitations of existing work (2/5) • 3.2 Pseudonyms and Mix Zones • Pseudonyms • “Breadcrumbs” – however, frequent updating may expose a pattern of closely spaced queries, allowing one to easily follow the user • A user’s trail may also be revealed if the user sends distinguishable queries to the LBS
3. Limitations of existing work (3/5) • 3.2 Mix Zones • Definition: anonymization occurs whenever two users occupy the same place at the same time • A problem arises from the rarity of space-time intersections • Users crossing the same place at different times is much more common • Figure 1: the attacker cannot say whether they turned or went straight
3. Limitations of existing work (4/5) • 3.3 Path Confusion • Extends mix zones by resolving the same-place same-time problem – by incorporating a delay in the anonymization • t0 < t1 < t0 + tdelay • Ex: t0 = 7:05pm, t1 = 7:09pm, and tdelay = 10 min (so t0 + tdelay = 7:15pm) • Thus, the users’ trails of locations are exposed at 7:15pm, ensuring confusion at the LBS • Problem: similar to CliqueCloak • By the introduction of a delay, realtime operation is compromised
3. Limitations of existing work (5/5) • 3.4 Limitations
4. Predictive privacy (1/4) • Main idea • Because the user will eventually pass through an intersection, we can prefetch responses to the user’s continuous location queries • Based on path confusion • Keeps the benefits of its accuracy • But without incurring its delay
4. Predictive privacy (2/4) • Figure 2: “cache hit” • Flat cylinders: responses CacheCloak retrieved from the LBS and cached on its server • Raised cylinders: the user receives cached responses from CacheCloak for each of its current locations
4. Predictive privacy (3/4) • Figure 3: “cache miss” • CacheCloak makes a prediction that extrapolates the user’s path until it connects to existing paths with cached data • Without degrading the spatial or temporal accuracy of any single data point that the user sees
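The cache-hit/cache-miss flow of Figures 2 and 3 can be sketched as follows. This is a minimal Python sketch; the `CacheCloak` class, its `predict_path`/`query_lbs` interfaces, and the toy straight-line predictor are hypothetical stand-ins, not the paper's actual implementation.

```python
class CacheCloak:
    def __init__(self, predict_path, query_lbs):
        self.cache = {}                    # pixel -> cached LBS response
        self.predict_path = predict_path   # pixel -> list of predicted future pixels
        self.query_lbs = query_lbs         # pixel -> fresh LBS response

    def respond(self, pixel):
        if pixel in self.cache:            # cache hit: no LBS query at all
            return self.cache[pixel]
        # cache miss: prefetch along the predicted path until it merges
        # with an already-cached path, so the LBS only ever sees whole paths
        for p in [pixel] + self.predict_path(pixel):
            if p in self.cache:
                break
            self.cache[p] = self.query_lbs(p)
        return self.cache[pixel]

# toy predictor: the user keeps driving east for three more pixels
cc = CacheCloak(predict_path=lambda p: [(p[0] + i, p[1]) for i in range(1, 4)],
                query_lbs=lambda p: f"ads near {p}")
print(cc.respond((0, 0)))   # miss: prefetches (0,0)..(3,0) from the LBS
print(cc.respond((1, 0)))   # hit: served from cache, LBS never queried
```

The key property shown: on a hit the user gets an accurate, immediate answer while the LBS learns nothing new about when or where the user actually is.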
4. Predictive privacy (4/4) • Figure 4: view of the LBS • The LBS cannot determine what triggered the new set of queries along Main Street • At the ends of the predictions, CacheCloak provides the anonymity
5. CacheCloak: Prediction engine (1/3) • Pixellate • Each 10m × 10m “pixel” is assigned an 8 × 8 historical counter matrix C • Each element cij of the matrix represents the number of times a user has entered from side i and exited toward side j • Make predictions from these counts
5. CacheCloak: Prediction engine (2/3) • First-order Markov model • (1) P(j|i): a user will exit side j given an entry from side i • (2) P(j): a user will exit side j without any knowledge of the entering side • Subsequent pixels are “colored” in the direction of the most likely next pixel: max P(j|i) over j = 1…8
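The prediction step above can be sketched directly from the counter matrix: normalize row i of C to get P(j|i), and take the argmax. This is a minimal sketch; the fallback to the side-independent distribution P(j) when an entry side has no history is an assumption about how case (2) is used, not a detail from the slides.

```python
import numpy as np

def predict_exit(C, entry_side):
    """C[i, j] counts entries from side i that exited toward side j.
    Predict the exit side as argmax_j P(j | i) = argmax_j C[i, j] / sum_j C[i, j]."""
    row = C[entry_side].astype(float)
    if row.sum() == 0:
        # no history for this entry side: fall back to P(j) over all entries
        row = C.sum(axis=0).astype(float)
    return int(np.argmax(row / row.sum()))

C = np.zeros((8, 8), dtype=int)
C[0, 4] += 7   # seven users entered side 0 and went straight out side 4
C[0, 2] += 3   # three users entered side 0 and turned toward side 2
print(predict_exit(C, 0))  # -> 4, the most likely exit for entry side 0
```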
5. CacheCloak: Prediction engine (3/3) • Benefits of this Markov model • (1) The mobility predictions are based entirely on previously observed user movements • (2) It allows a very rapid and computationally cheap way to create predictions, as only one time-step of state is maintained throughout the prediction
6. System evaluation (1/5) • 6.1 Simulation • Utilize a trace-based simulation • Repeat to generate the historical data matrix
6. System evaluation (2/5) • 6.2 Attacker Model • The eight elements of p(x, y), pk(x, y) for k = 1, 2, …, 8, represent the probability that a user is occupying pixel (x, y) and entered from side k • Total probability over the map sums to one: ∑x,y ∑k pk(x, y) = 1 • Assumption: the attacker’s historical data is of the same quality as CacheCloak’s
6. System evaluation (3/5) • 6.2 Attacker Model • Update of Attacker’s Model.
6. System evaluation (4/5) • 6.2 Attacker Model • An example • Start from (x0, y0) • Only two directions: k0 and inv(k0) • So P(k0|inv(k0)) = P(inv(k0)|k0) = 0.5; otherwise P(j|i) = 0 • After the first time-step of diffusion • ∑k pk(x0, y0) = 0, but ∑k pk = 0.5 at (x0+dxk0, y0+dyk0) and at (x0+dxinv(k0), y0+dyinv(k0)) • That is, the attacker is simulating a 50/50 chance that the user went left or right out of a driveway
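The diffusion step above can be shown on a toy one-dimensional road: certainty about one pixel diffuses into a 50/50 split over its two neighbors after one time-step, exactly the driveway example. This is an illustrative sketch, not the paper's attacker implementation; pixel indices and the transition table are assumptions.

```python
def diffuse(prob, transitions):
    """One attacker time-step. prob: {pixel: probability};
    transitions: pixel -> list of (next_pixel, P(next | pixel))."""
    out = {}
    for pixel, p in prob.items():
        for nxt, q in transitions[pixel]:
            out[nxt] = out.get(nxt, 0.0) + p * q
    return out

# user known with certainty to be at x = 0 on a straight road
prob = {0: 1.0}
# every pixel has exactly two exits, each with probability 0.5
transitions = {x: [(x - 1, 0.5), (x + 1, 0.5)] for x in range(-5, 6)}
prob = diffuse(prob, transitions)
print(prob)  # {-1: 0.5, 1: 0.5}: the 50/50 left-or-right split from the slide
```

Each further call to `diffuse` spreads the probability mass over more pixels, which is what drives the attacker's entropy up over time.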
6. System evaluation (5/5) • 6.3 Privacy Metrics • The next step past the diffusive attacker model is creating a quantitative measure of privacy • Location entropy • With P(x, y) = ∑k pk(x, y), define the entropy in bits as S = −∑x,y P(x, y) log2 P(x, y) • Ex: if P(x1, y1) = 0.5 at one location and P = 0.1 at five other locations, then S ≈ 2.16 bits, yet the attacker effectively has only ~1 bit of uncertainty, since half the probability sits at one location • 2^S equally likely locations give S bits of entropy, but the inverse isn’t always true
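The entropy formula and the example above can be checked directly (a minimal sketch; the probability lists are just the slide's example distribution):

```python
import math

def location_entropy(P):
    """S = -sum P log2 P over nonzero location probabilities."""
    return -sum(p * math.log2(p) for p in P if p > 0)

# one location at 0.5 plus five locations at 0.1 each (sums to 1.0)
P = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
print(round(location_entropy(P), 2))            # -> 2.16 bits
# 2^S equally likely locations do give S bits, e.g. 4 locations -> 2 bits:
print(round(location_entropy([0.25] * 4), 2))   # -> 2.0 bits
```

This illustrates why entropy alone can overstate privacy: the skewed distribution scores ≈2.16 bits, but an attacker betting on the 0.5 location is right half the time.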
7. Results and analysis (1/11) • About this section • We will show the evidence for strong anonymization • The average user’s trip from location to location takes approximately 10 minutes (per Gruteser & Hoh)
7. Results and analysis (2/11) • 7.1 Evaluation • Simulated over 19 computers in parallel • Density: 1 user/3.6 km² to 1 user/0.72 km² • Total: 2,850 diffusive measurements • Result: 95% performed and 5% culled • Figure 6: distribution of users vs. distance traversed in 20 minutes; most users traversed around 1 to 2 miles (2 ~ 3.5 km)
7. Results and analysis (3/11) • 7.2 Mean and Minimum Entropy • Discuss three cases: mean, worst, and best • Time evolution & entropy (bits) • However, it will be valuable to see more about the worst case • Figure 7: mean entropy over time for different user densities
7. Results and analysis (4/11) • Figure 8: worst-case entropy over time for different user densities • Figure 9: best-case entropy over time for different user densities
7. Results and analysis (5/11) • 7.3 Density Variations • Shows CacheCloak is robust even at extremely low densities • vs. K-anonymity: significant degradation of accuracy • vs. Path confusion: also has very few path intersections to work with • Figure 10: entropy after 10 minutes of attempted tracking for different user densities
7. Results and analysis (6/11) • 7.4 Typical Case • Shows that the effect of density on entropy increases with time • Figure 11: one arbitrary user’s entropy over time in different density conditions
7. Results and analysis (7/11) • 7.4 Typical Case • Branching at intersections vs. paths converging • Increase vs. decrease of entropy • Figure 12: three arbitrary users’ entropies over time (n = 50)
7. Results and analysis (8/11) • 7.4 Typical Case • Intuitively, one’s ability to track a user cannot become worse over time • Figure 13: the time evolution of a random user’s entropy over 30 minutes
7. Results and analysis (9/11) • 7.5 Peak Counting • Peak: a location where the attacker is most certain a user has gone • Increase: as time progresses, the number of locations a user might have traveled to increases • Drop: as more and more possible locations must be considered, probability spreads and fewer peaks clear the threshold • Figure 14: average number of locations a user might be at, according to a 0.05 threshold
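The peak-counting metric above reduces to counting pixels whose probability exceeds a threshold. This is a simplified stand-in sketch (the paper counts peaks in the full 2-D probability map; the toy map and values here are illustrative):

```python
def count_peaks(prob_map, threshold):
    """Count locations the attacker considers likely: pixels whose
    probability meets or exceeds the threshold."""
    return sum(1 for p in prob_map.values() if p >= threshold)

# toy attacker probability map after some diffusion
prob_map = {(0, 0): 0.30, (1, 0): 0.08, (2, 1): 0.12, (3, 1): 0.04}
print(count_peaks(prob_map, 0.05))  # -> 3 candidate locations
print(count_peaks(prob_map, 0.20))  # -> 1: a stricter threshold leaves fewer peaks
```

As diffusion spreads probability over more pixels, individual values fall below the threshold, producing the rise-then-drop behavior in Figures 14–16.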
7. Results and analysis (10/11) • Figure 15: average number of locations a user might be at, according to a 0.1 threshold • Figure 16: average number of locations a user might be at, according to a 0.2 threshold
7. Results and analysis (11/11) • The number of peaks for a given threshold decreases with increasing users, showing that more users offer greater opportunity to hide • Figure 17: variation of the number of peaks left after 10 minutes at different densities and thresholds
8. Distributed CacheCloak (1/4) • “What if the users don’t wish to trust CacheCloak?” • The previous mechanism relies on a trusted server • Distributed vs. centralized
8. Distributed CacheCloak (2/4) • Centralized form • LBS responses for the entire path are cached • And the responses are forwarded according to the current (x, y) coordinate of the mobile user • Figure 18: mobility prediction based on historical mobility patterns and the bit-mask of caches
8. Distributed CacheCloak (3/4) • Distributed form • Mobile: caches the path-wide responses itself • Server: only needs to maintain the global bit-mask & history matrix from all users in the system • Figure 19: the historical patterns and the global bit-mask are periodically shared with the mobile device
8. Distributed CacheCloak (4/4) • Advantage • Distributed CacheCloak receives no more information about a user’s location than the untrusted LBS does • Flaw • A greater computational burden on the mobile device and a greater bandwidth burden on the last-hop wireless connection
9. Discussions (1/5) • 9.1 Pedestrian Users • Difficulty • Getting enough historical mobility data to bootstrap the prediction system • One solution • Generate routes from realistic source-destination pairs on Google Maps and parse them to get the mobility patterns (planned future work)
9. Discussions (2/5) • 9.2 Bootstrapping CacheCloak • Difficulty • An early adopter faces zero users for a new LBS • Privacy for the first few users & gaining a critical mass of users for the system • Solution • CacheCloak has been shown to work well with very sparse populations • And it can be used initially with simulation-based historical data
9. Discussions (3/5) • 9.3 Coping with Distinguishable Queries • Problem • To support more kinds of service, queries from different users must be indistinguishable to the LBS • How CacheCloak helps • It does not require an instantaneous response from the LBS • And the only content of a query is the location of the user and the direction of travel
9. Discussions (4/5) • 9.3 Coping with Distinguishable Queries • Consideration • Query: • Response: • where R1, R2 ⊆ Rg, and CacheCloak generalizes query Qg
9. Discussions (5/5) • 9.3 Coping with Distinguishable Queries • Example • Query: • Response:
10. Conclusion • No compromise • among accuracy, realtime operation, and continuous operation • Entropy monitoring • An attacker with a priori knowledge of historical mobility patterns cannot track a user over a significant amount of time • Density simulations • Show that CacheCloak can work in extremely sparse systems • Anonymization • No need to suppress any of the user’s location information