Privacy Preserving Data Mining on Moving Object Trajectories
Győző Gidófalvi, Geomatic ApS, Center for Geoinformatik
Xuegang Harry Huang and Torben Bach Pedersen, Aalborg University
Outline • Need for location privacy • Quantifying location privacy • Existing methods / systems for location privacy • Undesired loss of privacy • Privacy-preserving, grid-based framework for collecting and mining location data • System architecture • Anonymization / partitioning policies • Mining of anonymized trajectories • Experiments and results • Conclusions MDM, Mannheim 2007
Need for Location Privacy • "New technologies can pinpoint your location at any time and place. They promise safety and convenience but threaten privacy and security" (Cover story, IEEE Spectrum, July 2003)
Location-Based Services • With all their privacy threats, why do users still use location-detection devices? • Widespread adoption of LBSs • Location-based store finders • Location-based traffic reports • Location-based advertisements • LBSs rely on the implicit assumption that users agree to reveal their private locations • Users trade their privacy for the services • LBSs make heavy use of context, including patterns in the location data, which are extracted using data mining methods
Quantifying Location Privacy • User perception of privacy: "One should only be able to infer a region R that I am in" • Things to consider: external spatio-temporal knowledge • Location and movement of objects is limited in space and time • Road networks and other geographical constraints • Locations of businesses • Opening hours • Locations of other moving objects in a given region • Common measures for location privacy: • The area of R is at least A_min • k-anonymity: there are at least (k-1) other moving objects in R
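The two common measures above can be combined into a single check. The sketch below is an illustration, not part of the presented system; the function name and parameters are my own:

```python
def satisfies_privacy(region, others, a_min, k):
    """Check the two common location-privacy measures for a region R:
    area(R) >= A_min, and k-anonymity (at least k-1 other moving
    objects inside R). `region` is (x1, y1, x2, y2); `others` is a
    list of (x, y) positions of the other moving objects."""
    x1, y1, x2, y2 = region
    area = (x2 - x1) * (y2 - y1)
    inside = sum(1 for px, py in others if x1 <= px <= x2 and y1 <= py <= y2)
    return area >= a_min and inside >= k - 1
```

Note that neither measure alone implies the other: a tiny region can contain many objects, and a huge region can contain only one.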
Existing Methods / Systems for Location Privacy • Considering k-anonymity only: • Adaptive-Interval Cloaking Algorithm (MobiSys’03) • Recursively divide the system area into quadrants of equal size as long as the quadrant containing the user still includes k-1 other users • Clique-Cloak Algorithm (ICDCS’05) • A clique graph is constructed to search for a minimum bounding rectangle that includes the user and k-1 other users • Considering both k-anonymity and minimum area: • CASPER: Adaptive Location Anonymizer + Privacy-Aware Query Processor (VLDB’06) • Grid-based pyramid structure that maps the exact locations of mobile users to cloaking rectangles made of grid cells
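The adaptive-interval cloaking idea can be sketched as a quadtree descent: keep halving the area toward the user's position, and stop just before a quadrant would contain fewer than k-1 other users. This is a simplified illustration of the MobiSys'03 approach, with my own function name and point representation:

```python
def adaptive_interval_cloak(user, others, area, k):
    """Return the smallest quadrant (x1, y1, x2, y2) that still
    contains `user` and at least k-1 of `others` (k-anonymity).
    Points are (x, y) tuples; `area` is the whole system area."""
    x1, y1, x2, y2 = area
    while True:
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Pick the quadrant that contains the user's exact location.
        qx1, qx2 = (x1, mx) if user[0] < mx else (mx, x2)
        qy1, qy2 = (y1, my) if user[1] < my else (my, y2)
        inside = [p for p in others if qx1 <= p[0] < qx2 and qy1 <= p[1] < qy2]
        if len(inside) < k - 1:  # subdividing further would break k-anonymity
            return (x1, y1, x2, y2)
        x1, y1, x2, y2 = qx1, qy1, qx2, qy2
```

With a lone user far from everyone else, the algorithm returns a large (possibly the whole) area; in a dense crowd, it drills down to a small quadrant.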
Problems with Existing Systems / Methods • They require trusted middleware • Providing k-anonymity in environments with untrusted components is an open problem and likely computationally prohibitive • The notion of location privacy guaranteed by k-anonymity may not be satisfactory (e.g., a large number of users in a small area where none of them wants to be observed) • Non-deterministically or probabilistically reporting different sized and/or positioned cloaking rectangles for the same location sacrifices location privacy • Traditional mining methods cannot be easily or efficiently extended to the anonymized location data
Undesired Loss of Privacy: Example • By studying the "anonymized" locations of a single user, one can conclude, and quantify the likelihood, that the user is in the intersection of the cloaked rectangles returned on subsequent visits.
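The intersection attack on non-deterministic cloaking is easy to demonstrate: two different rectangles reported for the same visited place pin the user down to their overlap. A minimal sketch (the rectangles and names are illustrative, not from the talk):

```python
def intersect(r1, r2):
    """Intersection of two axis-aligned rectangles (x1, y1, x2, y2),
    or None if they do not overlap."""
    x1, y1 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x2, y2 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

# Two different cloaking rectangles reported on subsequent visits
# to the same place:
visit1 = (0, 0, 4, 4)
visit2 = (2, 2, 6, 6)
overlap = intersect(visit1, visit2)
# The attacker's candidate region shrinks from area 16 to area 4,
# quadrupling the probability of guessing the user's location.
```

Each further visit can only shrink the intersection, which is why the framework below insists on a deterministic mapping from locations to rectangles.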
Privacy-preserving, grid-based framework for collecting and mining location data • Goals: • Avoid privacy loss • Avoid using trusted middleware component • Allow users to specify desired level of privacy • Obtain detailed and accurate patterns from anonymized data • Approach: • Deterministic mapping from exact locations to anonymization rectangles based on a single predefined 2D grid and user privacy settings • Smart clients are responsible for constructing anonymization rectangles by partitioning the space based on the 2D grid • Anonymization rectangles are constructed so that one can only infer with probability <= maxLocProb which grid cell the user is in • Mine probabilistic patterns from probabilistic location data
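A minimal sketch of the deterministic mapping, assuming square n x n blocks of grid cells with side length `cell` (the block shape, function name, and parameters are simplifying assumptions, not the talk's exact construction):

```python
import math

def anonymize(x, y, cell, max_loc_prob):
    """Map an exact location to a deterministic anonymization rectangle:
    the fixed block of n x n grid cells containing (x, y), where n is
    the smallest integer with 1/n**2 <= max_loc_prob, so an observer
    can infer any single cell only with probability <= max_loc_prob."""
    n = math.ceil(math.sqrt(1.0 / max_loc_prob))
    # Snap to the n x n block that (x, y) falls into; the same location
    # always yields the same rectangle, so repeated visits leak nothing.
    bx = int(x // (n * cell))
    by = int(y // (n * cell))
    return (bx * n * cell, by * n * cell,
            (bx + 1) * n * cell, (by + 1) * n * cell)
```

The key property is determinism: every exact location in a block maps to the same rectangle, so the intersection attack from the previous slide yields nothing beyond the rectangle itself.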
System Architecture
Partitioning Policies • CRP: Common Regular Partitioning • All users use a common regular partitioning • Guarantees the same, in-space uniform privacy level for all users • IRP: Individual Regular Partitioning • Every user uses his/her own regular partitioning • Guarantees in-space uniform, individual privacy levels to users • IIP: Individual Irregular Partitioning • Every user specifies a few private locations with corresponding desired levels of privacy, and constructs a partitioning that meets these requirements • Guarantees in-space non-uniform, individual privacy levels to users
Mining of Anonymized Trajectories • Probabilistic data: • The location of an object is reported as an anonymization rectangle (i.e., a set of n grid cells) • The object is inside grid cell c_i with location probability c_i.prob (= 1/n) • Probabilistic Dense Spatio-Temporal Area Query: • During a time period, find all grid cells where • the maximum number of objects is >= min_count and • the average location probability of the cell is >= min_prob • Evaluation of query results: • Set of true patterns D • A false negative (N) is the error of not finding a pattern that does exist in the data: FNR = |N| / |D| • A false positive (P) is the error of finding a "pattern" that does not exist in the data: FPR = |P| / |D|
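The dense-area query over probabilistic data can be sketched as below, interpreting the "maximum number of objects" in a cell as the count of objects whose anonymization rectangle covers the cell; the function name and input format are my own simplifications:

```python
from collections import defaultdict

def dense_cells(reports, min_count, min_prob):
    """Probabilistic dense spatio-temporal area query (sketch).
    `reports` maps each object id to the list of grid cells in its
    anonymization rectangle; each of the n cells carries location
    probability 1/n. Return the cells whose maximum possible object
    count >= min_count and whose average location probability over
    those objects >= min_prob."""
    count = defaultdict(int)       # objects that may be in the cell
    prob_sum = defaultdict(float)  # sum of their location probabilities
    for obj, cells in reports.items():
        p = 1.0 / len(cells)       # uniform probability over the rectangle
        for c in cells:
            count[c] += 1
            prob_sum[c] += p
    return {c for c in count
            if count[c] >= min_count and prob_sum[c] / count[c] >= min_prob}
```

The min_prob threshold filters out cells that are "dense" only because many large, vague rectangles happen to overlap them, which is exactly the tuning knob the experiments vary below.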
Experiment Setup • DATA: 600-3000 trajectories from Brinkhoff’s network-based generator of moving objects • 4 types of partitioning based on the 3 policies: • 2x2 CRP and 4x4 CRP • IRP (at most 4x4 grid cells / partition) • IIP (at most 4x4 grid cells at the start and end of trajectories, otherwise 1x1) • Evaluation of mining accuracy (FPR & FNR) under varying: • Parameter settings for min_count and min_prob • Grid size • Time span • Number of trajectories
Results • Few false negatives • The number of false positives increases with min_count and grid size, and decreases with time span and the number of trajectories • For each policy, min_prob can be tuned to find an optimal trade-off between false positives and false negatives
Conclusions • Non-deterministic generalization can lead to privacy loss • Presented a grid-based framework for anonymized data collection and mining that: • Does not require trusted middleware • Maps exact locations to anonymization rectangles in a deterministic way according to one of three anonymization policies: CRP, IRP, IIP • Avoids privacy loss • Allows users to specify individual desired privacy levels • Allows traditional data mining methods to be extended to discover accurate patterns while meeting the privacy requirements • Is computationally efficient and conceptually simple
Thank you for your attention!