PRIVACY CRITERIA
Roadmap • Privacy in data mining • Mobile privacy • (k-e)-anonymity • (c-k)-safety • Privacy skyline
Privacy in data mining • Random Perturbation (quantitative data) • Given value x, return x + r, where r is a random value drawn from a distribution • Construct a decision-tree classifier on the perturbed data s.t. its accuracy is comparable to that of classifiers built on the original data • Randomized Response (categorical data) • Basic idea: disguise data by probabilistically changing the value of the sensitive attribute to another value • The distribution of the original data can be reconstructed from the disguised data
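To make the two techniques concrete, a minimal Python sketch of both; the Gaussian noise for perturbation and the retention probability p for randomized response are illustrative assumptions, not parameters from the slides:

```python
import random

def perturb(x, sigma=1.0):
    # Random perturbation: return x + r, where r is drawn from a
    # distribution (here: Gaussian with std. dev. sigma). A decision-tree
    # classifier is later trained on the perturbed values.
    return x + random.gauss(0.0, sigma)

def randomized_response(value, domain, p=0.7):
    # Randomized response: keep the true categorical value with
    # probability p, otherwise report a random other value from the
    # domain. Since p is known, the original distribution can be
    # estimated from the disguised answers.
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])
```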
Roadmap • Privacy in data mining • Mobile privacy • (k-e)-anonymity • (c-k)-safety • Privacy skyline
Mobile privacy • Spatial cloaking: cloaked region • Contains location q and at least k-1 other user locations • Circular region around location q • Contains location q and a number of dummy locations generated by the client • Transformation-based matching • Transform locations into Hilbert keys using a Hilbert space-filling curve • Casper: user registers with a (k, A_min) profile • k: user is k-anonymous • A_min: minimum acceptable resolution of the cloaked spatial region
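A heavily simplified sketch of spatial cloaking with a (k, A_min) profile; Casper actually expands a grid pyramid bottom-up, so the nearest-neighbor selection and rectangular cloak below are illustrative assumptions:

```python
import math

def cloak(q, users, k, a_min):
    # Pick the k-1 users nearest to location q (simplification of
    # Casper's bottom-up grid-cell expansion).
    nearest = sorted(users,
                     key=lambda u: (u[0] - q[0])**2 + (u[1] - q[1])**2)[:k - 1]
    pts = nearest + [q]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    # Enforce the minimum acceptable resolution A_min by inflating
    # each side of the bounding box to at least sqrt(A_min).
    side = math.sqrt(a_min)
    if x1 - x0 < side:
        cx = (x0 + x1) / 2
        x0, x1 = cx - side / 2, cx + side / 2
    if y1 - y0 < side:
        cy = (y0 + y1) / 2
        y0, y1 = cy - side / 2, cy + side / 2
    # Cloaked rectangle containing q and at least k-1 other users.
    return (x0, y0, x1, y1)
```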
Roadmap • Privacy in data mining • Mobile privacy • (k-e)-anonymity • (c-k)-safety • Privacy skyline
(k-e)-anonymity • Privacy protection for numerical sensitive attributes • GOAL: group sensitive attribute values s.t. • At least k distinct values in each group • Range of each group larger than threshold e (a check of this condition is sketched after the tables below) • Permutation-based technique to support aggregate queries • Construct a help table Aggregate Query Answering on Anonymized Tables @ ICDE 2007
(k-e)-anonymity [Figure: original table and table after permutation]
(k-e)-anonymity [Figure: table after permutation and help table]
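A minimal sketch of checking the (k, e) condition on a proposed grouping; the function name and example values are illustrative, not the paper's algorithm:

```python
def is_k_e_anonymous(groups, k, e):
    # Each group of sensitive numeric values must contain at least k
    # distinct values, and its range (max - min) must exceed threshold e.
    for g in groups:
        if len(set(g)) < k or (max(g) - min(g)) <= e:
            return False
    return True

# Example: require k=2 distinct salaries per group and range > e=1000.
groups = [[3000, 5000], [4500, 6000, 8000]]
print(is_k_e_anonymous(groups, k=2, e=1000))  # True
```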
Roadmap • Privacy in data mining • Mobile privacy • (k-e)-anonymity • (c-k)-safety • Privacy skyline
(c-k)-safety • Goal: • quantify the attacker's background knowledge as k basic implications • ensure the maximum disclosure w.r.t. this knowledge is less than threshold c • Express background knowledge through a formal language Worst-Case Background Knowledge for Privacy-Preserving Data Publishing @ ICDE 2007
(c-k)-safety • Create buckets, then randomly permute the sensitive attribute values within each bucket [Figure: original table and bucketized table]
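A sketch of the bucketize-and-permute step; the fixed-size bucket assignment is an assumption for illustration (per the slides, the actual bucketization is chosen so the safety condition holds):

```python
import random

def bucketize(rows, sensitive_idx, bucket_size):
    # Partition rows into buckets, then randomly permute the sensitive
    # attribute values within each bucket, breaking the exact linkage
    # between quasi-identifiers and sensitive values.
    out = []
    for i in range(0, len(rows), bucket_size):
        bucket = [list(r) for r in rows[i:i + bucket_size]]
        values = [r[sensitive_idx] for r in bucket]
        random.shuffle(values)
        for r, v in zip(bucket, values):
            r[sensitive_idx] = v
        out.extend(bucket)
    return out
```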
(c-k)-safety • Bound background knowledge, i.e., the attacker knows k basic implications • Atom: t_p[S] = s, where s ∈ S and p ∈ Person • e.g. t_Jack[Disease] = flu • Basic implication: (A_1 ∧ … ∧ A_m) → (B_1 ∨ … ∨ B_n) for some m, n and atoms A_i, B_j • e.g. t_Jack[Disease] = flu → t_Charlie[Disease] = flu • L_k^basic is the language consisting of conjunctions of k basic implications
(c-k)-safety • Find bucketization B of the original table s.t. • B is (c-k)-safe • The maximum disclosure of B w.r.t. L_k^basic is less than threshold c
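For intuition, a sketch of the maximum-disclosure computation in the degenerate case of no background knowledge (k = 0); handling k basic implications requires the paper's full machinery, which this simplification omits:

```python
def max_disclosure_k0(buckets, value):
    # With no background knowledge, the attacker's belief that a person
    # in a bucket has `value` equals the value's frequency in that
    # bucket; the disclosure of the bucketization is the largest such
    # belief over all buckets.
    return max(b.count(value) / len(b) for b in buckets)

# (c-k)-safety with k = 0 then just requires max_disclosure_k0 < c.
buckets = [["flu", "flu", "cancer"], ["flu", "AIDS", "cancer"]]
print(max_disclosure_k0(buckets, "flu"))  # 2/3
```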
Roadmap • Privacy in data mining • Mobile privacy • (k-e)-anonymity • (c-k)-safety • Privacy skyline
Privacy skyline • Original data is transformed into generalized or bucketized data • Quantify external knowledge through a skyline for each sensitive value • External knowledge for each individual • Having a single sensitive value • Having multiple sensitive values Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge @ VLDB 2007
Privacy skyline • Three types of knowledge (l, k, m), e.g. (2, 3, 1) • l: Knowledge about target individual t • flu ∉ t_Tom[S] and cancer ∉ t_Tom[S] (obtained from Tom's friend) • k: Knowledge about individuals (u_1, …, u_k) other than t • flu ∈ t_Bob[S] and flu ∈ t_Cary[S] and cancer ∈ t_Frank[S] (obtained from another hospital) • m: Knowledge about the relationship between t and other individuals (v_1, …, v_m) • AIDS ∈ t_Ann[S] → AIDS ∈ t_Tom[S] (because Ann is Tom's wife)
Privacy skyline • Example: knowledge threshold (1, 5, 2) and confidence c = 50% for sensitive value AIDS • Adversary knows l ≤ 1 sensitive values that t does not have • Adversary knows sensitive values of k ≤ 5 others • Adversary knows m ≤ 2 members of t's same-value family • When the above hold, the adversary cannot predict that individual t has AIDS with confidence ≥ 50%
Privacy skyline • If transformed data D* is safe for (1, 5, 2), it is safe for any (l, k, m) with l ≤ 1, k ≤ 5, m ≤ 2 • i.e., every point dominated by (1, 5, 2) (the shaded region in the original figure)
Privacy skyline • A skyline is a set of mutually incomparable points • e.g. {(1, 1, 5), (1, 3, 4), (1, 5, 2)}
Privacy skyline • Given a skyline {(l_1, k_1, m_1, c_1), …, (l_r, k_r, m_r, c_r)} • release candidate D* is safe for sensitive value s iff, for i = 1 to r • max { Pr( s ∈ t[S] | L_{t,(l_i, k_i, m_i)}, D* ) } < c_i • i.e., the maximum probability that sensitive value s belongs to individual t, w.r.t. the external knowledge and the release candidate, is below confidence threshold c_i
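A sketch of the two checks this implies: dominance coverage of an adversary's (l, k, m) point, and the per-skyline-entry safety test; the `disclosure` callback standing in for the max-probability computation is a placeholder assumption:

```python
def covered(point, skyline):
    # (l, k, m) is covered if some skyline entry (li, ki, mi, ci)
    # componentwise dominates it (l <= li, k <= ki, m <= mi).
    l, k, m = point
    return any(l <= li and k <= ki and m <= mi
               for (li, ki, mi, ci) in skyline)

def is_safe(skyline, disclosure):
    # D* is safe iff for every skyline entry the maximum disclosure
    # max Pr(s in t[S] | L_{t,(li,ki,mi)}, D*) stays below ci;
    # disclosure(li, ki, mi) is a placeholder for that computation.
    return all(disclosure(li, ki, mi) < ci
               for (li, ki, mi, ci) in skyline)

skyline = [(1, 1, 5, 0.5), (1, 3, 4, 0.5), (1, 5, 2, 0.5)]
print(covered((1, 2, 3), skyline))  # True: dominated by (1, 3, 4)
```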
[Figure: original table, generalized table, and bucketized table]