Privacy Preserving Publication of Moving Object Data Francesco Bonchi Yahoo! Research Avinguda Diagonal 177, Barcelona, Spain Joey Lei CS295 CS295 - Privacy and Data Management
Outline • Intro & Background • Clustering and Perturbation Techniques • Spatio-Temporal Cloaking (Generalization) Techniques • Future Research
Location Privacy • Growing prevalence of location-aware devices • Mobile phones and GPS devices • Two Analysis Groups • Online • Real-time monitoring of moving objects and motion patterns • Development of location-based services (LBS) • Example: Google Maps on the iPhone • Offline • Collection of traces left by moving objects • Offline analysis to extract behavioral knowledge • Example: public transportation
Privacy Concerns • Location data allows for intrusive inferences • Reveals habits • Social customs • Religious and sexual preferences • Unauthorized advertisement • User profiling
Offline Analysis • Traffic Management Application • Paths (trajectories) of vehicles with GPS are recorded • Geographic Privacy-aware Knowledge Discovery and Delivery (GeoPKDD) • Traffic data published for the city of Milan (Italy) • Car identifiers were replaced with pseudonyms • Daily Commute Example • Bob’s home and workplace, traceable from the location data, act as quasi-identifiers (QIDs) • Joining the data with a telephone directory re-identifies him
Definitions • Anonymity-Preserving Data Publishing of Moving Object Databases • How to transform published location data while maintaining utility • Moving Object Database (MOD) • A set of individuals, time points, and trajectories
Background: Location Based Services • Ideals • Provide service without learning the user’s exact position • Location data is forgotten once the service is provided • k-anonymity definition • A response to a request for location data is k-anonymous when its spatial and temporal information is indistinguishable from that of at least k – 1 other responses sent from different users
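The k-anonymity definition above can be sketched as a simple check. This is a minimal illustration, not the method of any particular LBS: it assumes each request has already been cloaked to a (region, time interval) pair, and the function name `k_anonymous_requests` and the sample data are hypothetical.

```python
from collections import defaultdict


def k_anonymous_requests(requests, k):
    """Return indices of requests whose cloak is shared by at least k distinct users.

    requests: list of (user_id, cloaked_region, cloaked_interval) tuples.
    Two requests count as indistinguishable here when region and interval are equal
    (an assumption made for this sketch).
    """
    users_per_cloak = defaultdict(set)
    for user, region, interval in requests:
        users_per_cloak[(region, interval)].add(user)
    return [
        index
        for index, (_, region, interval) in enumerate(requests)
        if len(users_per_cloak[(region, interval)]) >= k
    ]


# Hypothetical sample: three users share one cloak, a fourth is alone in another.
requests = [
    ("u1", "cell_A", ("08:00", "08:10")),
    ("u2", "cell_A", ("08:00", "08:10")),
    ("u3", "cell_A", ("08:00", "08:10")),
    ("u4", "cell_B", ("08:00", "08:10")),
]
safe = k_anonymous_requests(requests, k=3)
```

Counting distinct users rather than requests matters: several requests from the same user in one cloak do not hide that user.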
LBS: Location k-Anonymity • Spatial Requirements • Ubiquity: a user visits at least k regions • Congestion: the number of users is at least k • One Way to Achieve This: Mix Zones • An area where LBS providers cannot trace a specific user’s movement • Identity is replaced with pseudonyms • Users entering these zones at the same time are mixed together
LBS: Location Based Quasi-Identifier • A spatio-temporal pattern that can uniquely identify one individual • A set of spatial areas and time intervals plus a recurrence formula • AreaCondominium [7am, 8am], AreaOfficeBldg [8am, 9am], AreaOfficeBldg [4pm, 6pm], AreaCondominium [5pm, 7pm] • Recurrence: 3.Weekdays ∗ 2.Weeks
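A rough sketch of how the quasi-identifier above could be matched against a user's observations. It assumes one reading of the recurrence formula: the pattern must occur, in order, on at least 3 weekdays of the same week, for at least 2 weeks. The function names, the ISO-week grouping, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Pattern elements from the slide: (area, start hour, end hour).
PATTERN = [
    ("AreaCondominium", 7, 8),
    ("AreaOfficeBldg", 8, 9),
    ("AreaOfficeBldg", 16, 18),
    ("AreaCondominium", 17, 19),
]


def day_matches(observations, pattern=PATTERN):
    """True if one day's (timestamp, area) observations contain the pattern in order."""
    ordered = sorted(observations)
    pos = 0
    for area, start, end in pattern:
        # Scan forward for the earliest later observation matching this element.
        while pos < len(ordered):
            ts, seen_area = ordered[pos]
            pos += 1
            hour = ts.hour + ts.minute / 60
            if seen_area == area and start <= hour <= end:
                break
        else:
            return False
    return True


def satisfies_recurrence(observations, days_per_week=3, weeks=2):
    """Check the assumed reading of 3.Weekdays * 2.Weeks over all observations."""
    by_day = defaultdict(list)
    for ts, area in observations:
        by_day[ts.date()].append((ts, area))
    matching_days = defaultdict(int)
    for day, day_obs in by_day.items():
        if day.weekday() < 5 and day_matches(day_obs):
            matching_days[tuple(day.isocalendar())[:2]] += 1  # (ISO year, ISO week)
    return sum(n >= days_per_week for n in matching_days.values()) >= weeks


def commute(day):
    """Hypothetical observations for one home-office-home day."""
    return [
        (day.replace(hour=7, minute=30), "AreaCondominium"),
        (day.replace(hour=8, minute=30), "AreaOfficeBldg"),
        (day.replace(hour=17, minute=0), "AreaOfficeBldg"),
        (day.replace(hour=18, minute=0), "AreaCondominium"),
    ]


monday = datetime(2024, 1, 1)  # a Monday
week_one = [o for d in range(3) for o in commute(monday + timedelta(days=d))]
week_two = [o for d in range(7, 10) for o in commute(monday + timedelta(days=d))]
revealed = satisfies_recurrence(week_one + week_two)
```

A user whose requests satisfy the pattern with this recurrence has revealed the quasi-identifier, so those requests are the ones that need protection.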
LBS: Historical k-Anonymity • In the offline context • A set of requests satisfies historical k-anonymity if there exist k – 1 personal histories of locations (trajectories), belonging to k – 1 different users, that are location-time consistent (indistinguishable) with it
Outline • Intro & Background • Clustering and Perturbation Techniques • Spatio-Temporal Cloaking (Generalization) Techniques • Future Research
Clustering and Perturbation • C&P sidesteps the inherent problem with location QIDs: • Each individual can have their own QIDs, which makes it difficult to define one QID for all individuals • Area(Home, Office, ??) [??am, ??pm] • Recurrence: 7.Weekdays ∗ 52.Weeks • Solution: anonymize trajectories instead • Microaggregation / k-member anonymity
Clustering and Perturbation • A trajectory is treated not as a polyline but as a cylindrical volume with radius δ (the uncertainty radius) • If another trajectory moves within the cylinder of the given trajectory, the two trajectories are indistinguishable from each other and belong to the same (k, δ)-anonymity set
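The cylinder idea can be sketched as a pairwise distance test. This is a simplified illustration that assumes all trajectories are sampled at the same timestamps and that "within the cylinder" means a Euclidean distance of at most δ at every timestamp; the function names are hypothetical.

```python
import math
from itertools import combinations


def colocalized(tr_a, tr_b, delta):
    """True if two equally sampled trajectories stay within delta of each other."""
    if len(tr_a) != len(tr_b):
        raise ValueError("trajectories must share the same timestamps")
    return all(math.dist(p, q) <= delta for p, q in zip(tr_a, tr_b))


def is_anonymity_set(trajectories, k, delta):
    """True if there are at least k trajectories and every pair is colocalized."""
    return len(trajectories) >= k and all(
        colocalized(a, b, delta) for a, b in combinations(trajectories, 2)
    )


# Hypothetical trajectories as lists of (x, y) points, one per timestamp.
tr1 = [(0, 0), (1, 0), (2, 0)]
tr2 = [(0, 1), (1, 1), (2, 1)]
tr3 = [(0, 5), (1, 5), (2, 5)]

close_pair = is_anonymity_set([tr1, tr2], k=2, delta=1.0)
with_outlier = is_anonymity_set([tr1, tr2, tr3], k=3, delta=1.0)
```

Here `tr1` and `tr2` stay one unit apart and pass the test, while `tr3` is too far away to join them without being moved.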
Clustering and Perturbation • Uncertainty trajectory • Anonymity set for two trajectories
Achieving (k, δ)-anonymity • Achieved by space translation: slightly moving some observations in space • Step One: cluster trajectories of similar sizes • NWA (Never Walk Alone) • All equivalence classes have the same time span and special timestamp requirements π (e.g. π = 60: only full hours, such as 1:00PM-2:00PM)
Achieving (k, δ)-anonymity • Step Two: perturb trajectories within uncertainty radius δ (i.e. transform each cluster into an anonymity set) • Grouping and Reconstruction • Finding the nearest matching points to group • Reconstruct a generalization for utility • Multi TGA and Fast TGA Algorithms
Outline • Intro & Background • Clustering and Perturbation Techniques • Spatio-Temporal Cloaking (Generalization) Techniques • Future Research
Trajectory Generalization • Figure: anonymization of three trajectories tr1, tr2 and tr3, based on point matching and removal, and spatio-temporal generalization
Trajectory Reconstruction • Reference: Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining.
Quasi-identifier Methods • QIDs are sequences of locations, with multiple sensitive values (locations) • The values differ from the perspective of each adversary • Yet linkage attacks from all adversaries must be considered
Quasi-identifier Methods • Possible Attack • T5 and t5A match! We know that person visited b1
Space Generalization • Each position is an exact point on a grid • Generalizations become rectangles of nearby points
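A minimal sketch of the generalization step: a group of nearby grid points is replaced by the smallest axis-aligned rectangle that contains them all. How points are grouped is not covered here, and the function names and sample positions are hypothetical.

```python
def bounding_rectangle(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max) covering the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def generalize_group(positions):
    """Replace every object's exact grid position in a group with the shared rectangle.

    positions: dict mapping a moving object ID to its (x, y) grid point.
    """
    rectangle = bounding_rectangle(list(positions.values()))
    return {object_id: rectangle for object_id in positions}


# Hypothetical positions of three moving objects at one time point.
positions = {"O1": (1, 2), "O2": (3, 1), "O3": (2, 4)}
generalized = generalize_group(positions)
```

After generalization the three objects report the same rectangle, so their positions at that time point no longer tell them apart. Larger rectangles hide more but cost more utility.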
Attack Graph • Privacy Breach on prior example • Definitions • I-Nodes (Individuals) • O-Nodes (Moving Object IDs)
Attack Graph • If I1 is mapped to O2, there is no consistent mapping for I2 and I3 • Both I2 and I3 would have to map to O3 • Conclusion • O1 must map to I1
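The reasoning on this slide can be reproduced by brute force: enumerate every one-to-one assignment of individuals to moving objects that only uses edges of the attack graph, and report the pairs that appear in all of them. The edge set below is an assumption chosen to be consistent with the slide's description (the original figure is not available), and the function names are hypothetical.

```python
from itertools import permutations


def consistent_assignments(edges, individuals, objects):
    """Yield every one-to-one individual -> object assignment that uses only graph edges."""
    for perm in permutations(objects, len(individuals)):
        if all((i, o) in edges for i, o in zip(individuals, perm)):
            yield dict(zip(individuals, perm))


def forced_mappings(edges, individuals, objects):
    """Return the individual -> object pairs that hold in every consistent assignment."""
    assignments = list(consistent_assignments(edges, individuals, objects))
    if not assignments:
        return {}
    first = assignments[0]
    return {
        i: o for i, o in first.items() if all(a[i] == o for a in assignments)
    }


# Assumed attack graph: every I-node has degree 2, as basic 2-anonymity would ask.
individuals = ["I1", "I2", "I3"]
objects = ["O1", "O2", "O3"]
edges = {
    ("I1", "O1"), ("I1", "O2"),
    ("I2", "O2"), ("I2", "O3"),
    ("I3", "O2"), ("I3", "O3"),
}
breach = forced_mappings(edges, individuals, objects)
```

Mapping I1 to O2 leaves only O3 for both I2 and I3, so no consistent assignment uses that edge and the pair (I1, O1) is forced even though every I-node has two candidate objects. The enumeration is factorial in the number of nodes, so it only suits small examples.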
Attack Graph • Shortcomings of the basic k-anonymity definition • Standard k-anonymity states there should be at least k paths originating from each I-node (based on grouping) • What if we instead group so that each O-node has at least k paths?
Attack Graph • Privacy Breach • Assume I2 and O5 are a pair • Then I1 would have to map to both O1 and O2, which is impossible • Hence I5 must map to O5
Final k-Anonymity Definition • Every I-node has degree k or more • The attack graph is symmetric • For every edge (Ii, Oj) there is also an edge (Ij, Oi) • 2-anonymous attack graph:
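The two conditions of the final definition can be checked directly. This sketch assumes individuals and moving objects are both numbered 0 to n - 1, with an edge (i, j) meaning that individual Ii may correspond to object Oj; the function name and both example graphs are hypothetical.

```python
def is_k_anonymous_attack_graph(edges, n, k):
    """Check degree >= k for every I-node and symmetry of the attack graph.

    edges: set of (i, j) pairs, meaning an edge between I-node i and O-node j.
    n: number of individuals (and of moving objects).
    """
    degree = [0] * n
    for i, _ in edges:
        degree[i] += 1
    symmetric = all((j, i) in edges for i, j in edges)
    return symmetric and all(d >= k for d in degree)


# Two groups of two: within each group every individual links to both objects.
symmetric_graph = {
    (0, 0), (0, 1), (1, 0), (1, 1),
    (2, 2), (2, 3), (3, 2), (3, 3),
}

# Every I-node has degree 2, but edge (0, 1) has no matching edge (1, 0).
asymmetric_graph = {
    (0, 0), (0, 1),
    (1, 1), (1, 2),
    (2, 1), (2, 2),
}

ok = is_k_anonymous_attack_graph(symmetric_graph, n=4, k=2)
rejected = is_k_anonymous_attack_graph(asymmetric_graph, n=3, k=2)
```

The second graph meets the degree requirement alone, which is why the definition adds the symmetry condition.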
Future Research • Ad hoc anonymization techniques for the intended use of the data • Privacy Preserving Data Mining • Focus on the analysis methods instead of the publishing
Questions?