Privately Querying Location-based Services with SybilQuery
Pravin Shankar, Vinod Ganapathy, and Liviu Iftode
Department of Computer Science, Rutgers University
{spravin, vinodg, iftode}@cs.rutgers.edu
Location-based Services (LBSes)
How is the traffic on the road ahead? Where is the nearest restaurant?
Implicit assumption:
• Users agree to reveal their locations in exchange for access to services
IBM Frontiers of Cloud Computing 2010
Privacy concerns while querying an LBS
• With two weeks of GPS data from a user’s car, one can infer the user’s home address (median error < 60 m) [Krumm ’07]
• 5% of people are uniquely identified by their home and work locations, even when these are known only at the census-tract level [Golle and Partridge ’09]
Querying an LBS
[Figure: a client traveling from Home to Work sends its locations loc1, loc2, …, locn to the LBS]
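Basic Idea
[Figure: along with each real location on the trip from Home to Work (loc1, loc2, …, locn), the client sends Sybil locations (loc1' and loc1'', loc2' and loc2'', …, locn' and locn'') on trips from Home' to Work' and from Home'' to Work''; the LBS receives all of them]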
What the LBS sees
Which of these is the real user?
Outline
• Introduction
• SybilQuery Overview
• Design Challenges
• Implementation
• Evaluation and Results
• Conclusions and Future Work
SybilQuery Overview
• Basic idea: achieve privacy using synthetic (Sybil) queries
• For each real user trip, the system generates:
  • k-1 Sybil start and end points (termed endpoints)
  • k-1 Sybil paths
• For each real query made, the system generates k-1 Sybil queries
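A minimal sketch of the per-query step, assuming each Sybil query reuses the real query's search keyword and that the client sends all k queries in a random order; the slides do not say how query content or ordering is chosen. `Query` and `build_query_batch` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """One LBS request: a location plus what is being searched for."""
    lat: float
    lon: float
    keyword: str


def build_query_batch(real, sybil_locations, rng=random):
    """Mix the real query with k-1 Sybil queries in random order.

    Assumption: every Sybil query reuses the real query's keyword, so the
    queries differ only in location. Returns the shuffled batch and the
    position of the real query, which only the client keeps.
    """
    batch = [real] + [Query(lat, lon, real.keyword) for lat, lon in sybil_locations]
    order = list(range(len(batch)))
    rng.shuffle(order)
    shuffled = [batch[i] for i in order]
    return shuffled, order.index(0)
```

Because only the client knows the returned index, it can show the user the real response and discard the k-1 Sybil responses.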
SybilQuery Design
SybilQuery Challenges
• Endpoint generation:
  • How to automatically generate synthetic endpoints similar to a pair of real endpoints?
• Path generation:
  • How to choose the waypoints of the Sybil path?
• Query generation:
  • How to simulate motion along the Sybil path?
Endpoint Generator
• Produces synthetic endpoints that resemble the real source and destination
• High-level idea:
  • Tag locations with features
  • Identify clusters of locations that share similar features
• Feature used in SybilQuery: traffic statistics
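The slide describes clustering locations by shared features. The sketch below swaps clustering for a simpler filter, assuming each candidate location already has a traffic-density score: it keeps the locations whose density lies within a relative `tolerance` of the real endpoint's and samples k-1 of them. `pick_sybil_endpoints` and `tolerance` are hypothetical.

```python
import random


def pick_sybil_endpoints(real_cell, density, k, tolerance=0.2, rng=random):
    """Pick k-1 locations whose traffic density resembles the real endpoint's.

    `density` maps a location id to its traffic density. A location counts
    as similar when its density is within `tolerance` (relative) of the
    real endpoint's density.
    """
    target = density[real_cell]
    similar = [
        cell
        for cell, tau in density.items()
        if cell != real_cell and abs(tau - target) <= tolerance * target
    ]
    if len(similar) < k - 1:
        raise ValueError("not enough locations with similar traffic density")
    # Random sampling keeps the Sybil endpoints from being a fixed function
    # of the real endpoint.
    return rng.sample(similar, k - 1)
```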
Tagging locations with traffic statistics
• Naïve approach: annotate locations with descriptive tags
  • E.g., “parking lot”, “downtown office building”, “freeway”
  • A laborious manual task
• Our approach: automatically compute features using a database of regional traffic statistics
  • Dataset: month-long GPS traces from the San Francisco Cabspotter project (530 unique cabs; 529,533 trips)
  • Compute the traffic density τ_l for each location l from the dataset
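One way to compute a per-location traffic density from trip records, assuming τ_l is simply the number of trip start and end points that fall in location l, with locations approximated by a fixed grid of `cell_deg`-degree cells (the deck's backup slides describe a QuadTree instead). The function names and the cell size are illustrative.

```python
import math
from collections import Counter


def cell_of(lat, lon, cell_deg=0.01):
    """Map a coordinate to a fixed-size grid cell (row, col)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def traffic_density(trips, cell_deg=0.01):
    """Count trip start and end points per grid cell.

    `trips` is an iterable of ((lat0, lon0), (lat1, lon1)) pairs,
    e.g. pickups and drop-offs extracted from cab GPS traces.
    """
    tau = Counter()
    for start, end in trips:
        tau[cell_of(*start, cell_deg)] += 1
        tau[cell_of(*end, cell_deg)] += 1
    return tau
```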
Path Generator
• Consults an off-the-shelf navigation service
  • Our implementation uses the Microsoft Multimap API to obtain waypoints
• Users may not always follow the shortest path to their destination
  • Detours, road closures, user intention
• Computes multiple paths (of varying lengths) to the destination
• Uses a probability distribution to choose among them
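A sketch of randomized path choice, assuming the navigation service has already returned several candidate paths with their lengths. The value of `p_shortest` and the inverse-length weighting of detours are placeholders; the slides only say that a probability distribution over non-shortest paths is used.

```python
import random


def choose_path(paths, p_shortest=0.7, rng=random):
    """Pick one candidate path for a Sybil trip.

    `paths` is a list of (length_km, waypoints). With probability
    `p_shortest` the shortest path is used; otherwise one of the longer
    alternatives is drawn, favouring shorter detours.
    """
    ranked = sorted(paths, key=lambda p: p[0])
    if len(ranked) == 1 or rng.random() < p_shortest:
        return ranked[0]
    detours = ranked[1:]
    weights = [1.0 / length for length, _ in detours]
    return rng.choices(detours, weights=weights, k=1)[0]
```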
Query Generator
• Triggered each time the user queries the LBS
• Simulates the motion of users along the Sybil paths
• Uses current traffic conditions to simulate user movement more accurately
  • E.g., simulate slower movement if traffic is congested
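A sketch of how a Sybil position could be advanced between queries, assuming waypoints in a local metric (x, y) frame and one positive speed per path segment derived from current traffic (lower speeds on congested segments). How SybilQuery obtains those speeds is not specified here; `position_at` is a hypothetical helper.

```python
import math


def position_at(path, speeds, elapsed_s):
    """Where a simulated driver is after `elapsed_s` seconds.

    `path` is a list of (x, y) waypoints in metres and `speeds` gives a
    speed in m/s for each segment between consecutive waypoints.
    """
    remaining = elapsed_s
    for (p0, p1), v in zip(zip(path, path[1:]), speeds):
        seg_time = math.dist(p0, p1) / v  # slower traffic -> longer segment time
        if remaining <= seg_time:
            t = remaining / seg_time if seg_time else 0.0
            return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
        remaining -= seg_time
    return path[-1]  # trip finished: stay at the destination
```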
Endpoint caching
• Attack 1: If a real path P that the user travels frequently (e.g., a commute) is associated with multiple sets of Sybil paths over time:
  • P recurs while the Sybil paths change, so P can be statistically identified as the real path
• Attack 2: If, shortly after arriving at the first destination, the user travels to a new location:
  • Since the consecutive real paths share an endpoint, they could be identified
• Solution: endpoint caching
  • For the most common trips, Sybil endpoints are cached
  • If the user makes multiple trips from one common endpoint (e.g., home/office), the corresponding Sybil endpoints are cached
  • When the user embarks on a multi-destination trip, the Sybil start points are cached, so the endpoint of one trip is the start point of the following trip
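The caching rules can be sketched as a small cache keyed by real endpoint. This version caches every endpoint it sees rather than only the most common ones, and `generate` stands in for the endpoint generator; both are simplifying assumptions, and `EndpointCache` is a hypothetical name.

```python
class EndpointCache:
    """Reuse the same Sybil endpoints for recurring real endpoints."""

    def __init__(self, generate):
        # generate: real endpoint -> list of k-1 Sybil endpoints (hypothetical)
        self._generate = generate
        self._cache = {}
        self._last_sybil_ends = None  # Sybil destinations of the previous trip

    def sybil_endpoints(self, endpoint):
        """Return cached Sybil endpoints, generating them on first use."""
        if endpoint not in self._cache:
            self._cache[endpoint] = self._generate(endpoint)
        return self._cache[endpoint]

    def plan_trip(self, start, end, continuing=False):
        """Return k-1 (Sybil start, Sybil end) pairs for one real trip.

        For a multi-destination trip (`continuing=True`), each Sybil trip
        starts where the previous Sybil trip ended, mirroring the real user.
        """
        if continuing and self._last_sybil_ends is not None:
            starts = self._last_sybil_ends
        else:
            starts = self.sybil_endpoints(start)
        ends = self.sybil_endpoints(end)
        self._last_sybil_ends = ends
        return list(zip(starts, ends))
```

Repeating the home-to-office trip therefore yields the same Sybil trips every time, which removes the statistical signal behind Attack 1.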
Providing path continuity
• Attack: if a real trip ends before some Sybil trips end:
  • A naive system stops sending queries
  • The LBS can then differentiate the real path from the Sybil paths
• SybilQuery guards against this by being an “always on” tool:
  • It continues to simulate movement along Sybil paths even after the user’s real trip is complete
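A sketch of the “always on” behaviour, assuming each simulated trip (real or Sybil) is described only by its duration and queries are sent at a fixed interval. Each trip keeps querying until its own duration has elapsed, so the session does not end at the moment the real trip does. `query_schedule` is hypothetical.

```python
def query_schedule(durations_s, interval_s):
    """Yield (time, active_trip_ids) for each query tick.

    `durations_s` maps trip id -> trip duration in seconds, with the real
    trip as one of the entries. Ticks continue until the longest trip ends.
    """
    end = max(durations_s.values())
    tick = 0
    while tick * interval_s <= end:
        t = tick * interval_s  # integer ticks avoid float drift
        yield t, sorted(tid for tid, d in durations_s.items() if t <= d)
        tick += 1
```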
SybilQuery Implementation
• An interface akin to navigation systems
• Input:
  • The source and destination address for the trip
  • A security parameter k (the real user plus k-1 Sybil users)
• Query interface:
  • Integrated with Yahoo! Local Search
Evaluation Goals
• Privacy
  • How indistinguishable are Sybil queries from real queries?
• Performance
  • Can Sybil queries be generated efficiently?
Evaluation: Privacy
• User study
  • Give the working system to adversarial users, who try to break it by finding the real user paths hidden among the Sybil paths
  • 15 volunteers
• Methodology
  • Pick real paths from the Cabspotter traces
  • Use SybilQuery to generate Sybil paths with different values of k
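When reading the study's results, a useful baseline is an adversary who picks one of the k paths uniformly at random and is right with probability 1/k. The Monte Carlo sketch below only illustrates that chance level; it is not data from the study, and `random_guess_success` is a hypothetical helper.

```python
import random


def random_guess_success(k, trials=100_000, rng=None):
    """Fraction of trials in which a uniformly random guess among k paths
    hits the real one (expected value: 1/k)."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(trials):
        real = rng.randrange(k)   # position of the real path
        guess = rng.randrange(k)  # adversary's uniform guess
        hits += real == guess
    return hits / trials
```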
Results from user study
User approaches to distinguishing queries
• Contrasting rationales for guessing the real users:
  • “Circuitous paths”
  • “Prominent start/end location”
  • “Odd man out”
Evaluation: Performance
• Setup:
  • Server: 2.33 GHz Core 2 Duo, 3 GB RAM, 250 GB SATA disk (7200 RPM)
  • Client: 1.73 GHz Pentium M laptop, 512 MB RAM, Linux 2.6
  • Privacy parameter k = 4 (unless otherwise specified)
• Micro-benchmarks:
  • One-time and once-per-trip costs
  • Query-response latency of SybilQuery
  • Comparison with spatial cloaking for Yahoo! Local Search
One-time and once-per-trip costs
• One-time cost: preprocessing of the traffic database
  • 2 hours 16 minutes (529,533 trips processed)
• Once-per-trip costs: endpoint generation and path generation
* Includes network latency to query the Microsoft Multimap API
Query-response latency of SybilQuery
• Scales linearly with k (the real user plus k-1 Sybil users)
• Sub-second latency for typical values of k
Conclusions and Future Work
• SybilQuery: an efficient, decentralized technique to hide user locations from LBSes
• Experimental results demonstrate that:
  • Sybil queries can be generated efficiently
  • Sybil queries resemble real user queries
• Future work
  • Enhance SybilQuery to achieve stronger privacy guarantees, such as l-diversity, t-closeness, and differential privacy
My research on location in mobile computing
• Privacy: users may not want to reveal their private locations to access location-based services. SybilQuery – Ubicomp 2009.
• Querying mobile phones for real-time location-based state. SocialTelescope – internship at IBM, Summer 2010.
• Incentives for sharing in social networks – WINE 2009.
• Rapid change of client location affects network connectivity and performance. Context-Aware Rate Selection (CARS), a solution that improves network performance by using client location – ICNP 2008.
Thank You!
Pravin Shankar
spravin@cs.rutgers.edu
Related Work
• Synthetic locations for privacy [Krumm ’09, Kido ’05]
• Spatial cloaking [Gruteser and Grunwald ’03, and others]
• Peer-to-peer schemes [Chow ’06, Ghinita ’07]
• Private Information Retrieval (PIR) [Ghinita ’08]
A detailed list is available in the paper.
Spatial Cloaking
• Spatial cloaking: a k-anonymity solution that uses anonymizers
  • Users send their location to the anonymizer
  • The anonymizer computes a cloaked region: a region where at least k users are present
[Figure: client → anonymizer → server]
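A simplified illustration of what the anonymizer computes: grow a square around the requesting user until it contains at least k users, counting the requester. Published cloaking algorithms construct regions differently; `cloaked_region`, `step`, and `max_half` are hypothetical.

```python
def cloaked_region(user, others, k, step=0.001, max_half=1.0):
    """Smallest tested square centred on `user` that holds at least k users.

    `user` and `others` are (x, y) coordinates; the requester counts as one
    of the k users. Returns (min_x, min_y, max_x, max_y).
    """
    x, y = user
    half = step
    while half <= max_half:
        inside = 1 + sum(
            1 for ox, oy in others if abs(ox - x) <= half and abs(oy - y) <= half
        )
        if inside >= k:
            return (x - half, y - half, x + half, y + half)
        half += step
    raise ValueError("fewer than k users within max_half")
```

The slower users are to gather around the requester, the larger this square must grow, which is the cost the deck contrasts with SybilQuery's constant overhead.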
Performance Comparison with Spatial Cloaking
[Figure: response size as users travel]
• Cloaked regions grow as users travel
• SybilQuery overhead stays constant
Prior techniques (1/2)
[Figure: client → anonymizer → server]
• Spatial cloaking
  • Requires an anonymizer (a trusted third party)
  • Single point of failure
  • Scalability and performance bottleneck
Prior techniques (2/2)
• Peer-to-peer schemes
  • Rely on participating peers
• Private Information Retrieval (PIR)
  • Computationally inefficient
Tagging locations with traffic statistics (2/2)
• Locations are represented as a QuadTree
  • Balances precision with scalability
[Figure: QuadTree cells around San Francisco Airport; black blocks have higher densities]
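A minimal region QuadTree, assuming a cell splits into four children once it holds more than `capacity` points (capped at `max_depth`). Dense areas such as the airport end up with small cells while sparse areas stay coarse, which is one reading of “balances precision with scalability”. Dividing each leaf's point count by its area would give a density per cell; the parameters are illustrative.

```python
class QuadTree:
    """Region quadtree over the rectangle [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, max_depth=12, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.max_depth = max_depth
        self.depth = depth
        self.points = []      # points stored here while this node is a leaf
        self.children = None  # four sub-quadrants after a split

    def contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y):
        """Insert a point; return False if it lies outside this node."""
        if not self.contains(x, y):
            return False
        if self.children is None:
            self.points.append((x, y))
            # Split crowded leaves so dense areas get finer cells.
            if len(self.points) > self.capacity and self.depth < self.max_depth:
                self._split()
            return True
        return self._insert_into_child(x, y)

    def _insert_into_child(self, x, y):
        for child in self.children:
            if child.insert(x, y):
                return True
        return False

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        opts = (self.capacity, self.max_depth, self.depth + 1)
        self.children = [
            QuadTree(x0, y0, mx, my, *opts),
            QuadTree(mx, y0, x1, my, *opts),
            QuadTree(x0, my, mx, y1, *opts),
            QuadTree(mx, my, x1, y1, *opts),
        ]
        points, self.points = self.points, []
        for px, py in points:
            self._insert_into_child(px, py)

    def leaves(self):
        """Yield (bounds, point_count) for every leaf cell."""
        if self.children is None:
            yield self.bounds, len(self.points)
        else:
            for child in self.children:
                yield from child.leaves()
```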
Finding suitable endpoints using reverse geocoding
• Real endpoints do not lie in non-driveable terrain, so Sybil endpoints should not either
[Figure: random point in a geographic location → reverse geocoding → street address closest to the random point]
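A sketch of the snapping step, assuming a local table of known street addresses with coordinates in place of a real reverse-geocoding service (the slide does not name one). The address entries and function names are hypothetical.

```python
import math
import random


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(min(1.0, math.sqrt(h)))


def snap_to_address(point, addresses):
    """Return the (address, (lat, lon)) entry closest to `point`."""
    return min(addresses, key=lambda entry: haversine_m(point, entry[1]))


def random_driveable_endpoint(cell_bounds, addresses, rng=random):
    """Sample a random point in a cell, then snap it to a street address."""
    lat0, lon0, lat1, lon1 = cell_bounds
    point = (rng.uniform(lat0, lat1), rng.uniform(lon0, lon1))
    return snap_to_address(point, addresses)
```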
Our goals
• Performance
• Autonomy
• Ease of deployment
Basic design of SybilQuery
Design enhancements
• Endpoint generator
  • Endpoint caching
• Path generator
  • Randomizing path selection
• Query generator
  • Providing path continuity
  • Adding GPS sensor noise
  • Handling active adversaries
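A sketch of “adding GPS sensor noise” to simulated positions, assuming zero-mean Gaussian noise with a standard deviation of a few metres; the slides do not give the noise model, so `sigma_m` is a placeholder. The conversion uses roughly 111 km per degree of latitude, scaled by cos(latitude) for longitude.

```python
import math
import random

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree


def add_gps_noise(lat, lon, sigma_m=5.0, rng=random):
    """Perturb a coordinate with zero-mean Gaussian noise of about sigma_m metres."""
    dlat = rng.gauss(0.0, sigma_m) / METRES_PER_DEGREE
    dlon = rng.gauss(0.0, sigma_m) / (METRES_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```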
Endpoint caching (1/2)
• Attack 1: If a real path P that the user travels frequently (e.g., a commute) is associated with multiple sets of Sybil paths:
  • P can be statistically identified as the real path
• Attack 2: If, shortly after arriving at the first destination, the user travels to a new location:
  • Since the real paths share an endpoint, they could be distinguished from the Sybil paths
Endpoint caching (2/2)
• Solution: SybilQuery employs three types of caching
  • For the most common trips, Sybil endpoints are cached
  • If the user makes multiple trips from one common endpoint (e.g., home/office), the corresponding Sybil endpoints are cached
  • When the user embarks on a multi-destination trip, the start points of the Sybil trips are cached, i.e., the endpoint of one trip is the start point of the following trip
Randomizing path selection
• Real users may not always follow the shortest path to their destination
  • Detours, road closures, user intention
• The path generator computes multiple paths (of varying lengths) to the destination
• It uses a probability distribution (of how often users choose paths other than the shortest one) to choose an appropriate path
Handling active adversaries
• An actively adversarial LBS may return doctored query responses to differentiate Sybil paths from a client’s real path
  • For example, it may falsely report traffic congestion at the query location
• SybilQuery handles active adversaries by sending N-variant queries to multiple LBSes
  • Unless all the LBSes collude, the adversarial LBS can be detected
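A sketch of the detection step, assuming the same query was sent to several LBSes and each response has been normalized to a comparable, hashable value (for example a congestion level). A majority vote flags the providers that disagree; with no strict majority, every provider is treated as suspect. The LBS names and `detect_inconsistent_lbs` are hypothetical.

```python
from collections import Counter


def detect_inconsistent_lbs(responses):
    """Given {lbs_name: normalized_response} for one query, return the
    majority response and the sorted names of LBSes that disagree with it."""
    counts = Counter(responses.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(responses) // 2:
        return None, sorted(responses)  # no strict majority: all suspect
    suspects = sorted(name for name, r in responses.items() if r != majority)
    return majority, suspects
```

This only works while a majority of the queried LBSes answer honestly, which mirrors the slide's caveat about collusion.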
Implementation
• SybilQuery is implemented as a Python client
• Endpoint generator:
  • Uses a PostgreSQL database with PostGIS spatial extensions to process regional traffic information
• Path generator:
  • Queries the Microsoft Multimap API for waypoints
• Query generator:
  • Interfaced with the Yahoo! Local API to simulate movement under the constraints of current traffic