k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure
Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2)
(1) Concordia University, (2) George Mason University
CIISE / CSIS
ICDT 2010, March 23, 2010
Agenda
• Background
• K-Jump Strategy
• Data Utility Comparison
• Conclusion
Agenda
• Background
  • Example
  • Algorithms a_naive and a_safe
• K-Jump Strategy
• Data Utility Comparison
• Conclusion
Example – Data Holder’s View
Goal: release a generalized table that satisfies 2-diversity.
Generalization algorithm: consider the generalization functions g1 and then g2, in that order, and release the first generalization that satisfies 2-diversity.
• Name: identifier.
• DoB: quasi-identifier.
• Condition: sensitive attribute.
[Figure: the data holder generalizes the micro-data table, checks 2-diversity after each generalization, and releases the table that passes.]
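The data holder's procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the five-person table and the groupings `g1` and `g2` are hypothetical (the slide's table did not survive extraction), and 2-diversity is read here as the ratio-based property the deck uses later (no sensitive value may account for more than 1/2 of a group).

```python
from collections import Counter
from fractions import Fraction

def group_is_private(values, threshold=Fraction(1, 2)):
    """True if no sensitive value makes up more than `threshold` of the group."""
    most_common_count = max(Counter(values).values())
    return Fraction(most_common_count, len(values)) <= threshold

def satisfies_privacy(table, groups):
    """Check every group of a candidate generalization."""
    return all(group_is_private([table[i] for i in group]) for group in groups)

def naive_release(table, generalizations):
    """Try g1, g2, ... in order; return (index, released groups) or (None, None)."""
    for index, groups in enumerate(generalizations, start=1):
        if satisfies_privacy(table, groups):
            # Only the multiset of conditions per group is released.
            released = [sorted(table[i] for i in group) for group in groups]
            return index, released
    return None, None

# Hypothetical micro-data: one condition per person, indexed 0..4.
t0 = ("flu", "flu", "cold", "asthma", "ulcer")
g1 = [(0, 1), (2, 3, 4)]  # hypothetical grouping (assumption)
g2 = [(0, 2), (1, 3, 4)]  # second hypothetical grouping (assumption)

index, released = naive_release(t0, [g1, g2])
print(index, released)
```

In this toy input g1 fails because persons 0 and 1 share the same condition, so the algorithm falls through to g2 and releases it.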
Example (cont.) – Adversary’s View
Goal: guess the original micro-data table.
The adversary knows:
• the released generalization
• public knowledge
• the privacy property
What can the adversary infer? The three persons in each group may have the three conditions in any order. The set of all such candidate tables is the permutation set.
Example (cont.)
The tables in the permutation set would be the adversary’s best guesses of the micro-data table if the released generalization were his or her only knowledge. However …
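A permutation set can be enumerated directly for small inputs. The sketch below assumes a table is a tuple of conditions indexed by person and a generalization is a list of groups of person indices; both the representation and the six-person data are hypothetical choices made for illustration.

```python
from itertools import permutations, product

def permutation_set(table, groups):
    """All tables that would produce the same released groups as `table`."""
    # Every distinct ordering of the conditions inside each group.
    per_group = [set(permutations([table[i] for i in group])) for group in groups]
    tables = set()
    for choice in product(*per_group):
        candidate = list(table)
        for group, values in zip(groups, choice):
            for person, value in zip(group, values):
                candidate[person] = value
        tables.add(tuple(candidate))
    return tables

# Hypothetical data: two groups of three persons, three distinct conditions each.
t0 = ("flu", "cold", "asthma", "ulcer", "flu", "cold")
groups = [(0, 1, 2), (3, 4, 5)]

per = permutation_set(t0, groups)
print(len(per))  # 3! * 3! = 36 candidate tables
```

With only the released generalization in hand, the adversary cannot tell these 36 tables apart.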
Example (cont.) – Adversary Simulating the Algorithm
However, the adversary also knows the generalization algorithm and can simulate it to exclude some invalid guesses.
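Example (cont.) – Adversary Simulating the Algorithm
For each table in the permutation set (the adversary’s mental image), the adversary asks: is this a valid guess of the micro-data table? Simulating the algorithm on the guess answers the question. A guess for which an earlier generalization would already satisfy the privacy property is invalid, because the algorithm would have released that earlier generalization instead. A guess for which the earlier generalization violates the privacy property remains valid. The valid guesses form the disclosure set, a subset of the permutation set.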
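The simulation attack can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the five-person table, the groupings `g1` and `g2`, and the reading of the privacy property as "for each person, no condition appears in more than half of the candidate tables".

```python
from collections import Counter
from itertools import permutations, product

def permutation_set(table, groups):
    """All tables that would produce the same released groups as `table`."""
    per_group = [set(permutations([table[i] for i in g])) for g in groups]
    tables = set()
    for choice in product(*per_group):
        candidate = list(table)
        for group, values in zip(groups, choice):
            for person, value in zip(group, values):
                candidate[person] = value
        tables.add(tuple(candidate))
    return tables

def privacy_ok(tables):
    """Highest-ratio property over a set of candidate tables (threshold 1/2)."""
    tables = list(tables)
    if not tables:
        return False
    for person in range(len(tables[0])):
        counts = Counter(t[person] for t in tables)
        if 2 * max(counts.values()) > len(tables):
            return False
    return True

def naive_release_index(table, gens):
    """Index of the first generalization whose permutation set is private."""
    for index, groups in enumerate(gens, start=1):
        if privacy_ok(permutation_set(table, groups)):
            return index
    return None

# Hypothetical data and groupings (assumptions, not from the slides).
t0 = ("flu", "flu", "cold", "asthma", "ulcer")
gens = [[(0, 1), (2, 3, 4)], [(0, 2), (1, 3, 4)]]

released = naive_release_index(t0, gens)
per2 = permutation_set(t0, gens[1])
# The adversary keeps only guesses on which the algorithm behaves as observed.
ds2 = {t for t in per2 if naive_release_index(t, gens) == released}

print(released, len(per2), privacy_ok(per2))  # 2 12 True
print(len(ds2), privacy_ok(ds2))              # 6 False
```

In this toy case the permutation set looks safe, but after simulation person 0 has "cold" in 4 of the 6 remaining tables, which exceeds the 1/2 threshold.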
Decision Process of Safe and Unsafe Algorithms
Most existing generalization algorithms, which do not consider this problem (a_naive): at iteration i, evaluate the privacy property on the permutation set per_i of g_i(t0), that is, on the adversary’s mental image of the micro-data table without knowledge of the algorithm. On Y, release g_i(t0); on N, move on to the next generalization function.
Safe generalization algorithms (Zhang’07ccs, …) (a_safe): evaluate the disclosure set ds_i instead, that is, the adversary’s mental image of the micro-data table after simulating the algorithm.
Legend for the evaluation paths:
• box: the i-th iteration
• diamond: an evaluation of the privacy property
• per: permutation set
• ds: disclosure set
[Figure: evaluation paths of a_naive and a_safe over g1, g2, …, gi, …, gn, starting from t0.]
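The two decision processes can be contrasted with a small sketch. The recursion for a_safe follows one reading of the flowchart (a table belongs to ds_i if a_safe would release nothing for it before iteration i); the table, the three groupings, and the per-person reading of the highest-ratio property are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def permutation_set(table, groups):
    per_group = [set(permutations([table[i] for i in g])) for g in groups]
    tables = set()
    for choice in product(*per_group):
        candidate = list(table)
        for group, values in zip(groups, choice):
            for person, value in zip(group, values):
                candidate[person] = value
        tables.add(tuple(candidate))
    return tables

def privacy_ok(tables):
    """For each person, no condition appears in more than half of the tables."""
    tables = list(tables)
    if not tables:
        return False
    return all(
        2 * max(Counter(t[p] for t in tables).values()) <= len(tables)
        for p in range(len(tables[0]))
    )

def a_naive(table, gens):
    """Release the first g_i whose permutation set satisfies the property."""
    for i, groups in enumerate(gens, start=1):
        if privacy_ok(permutation_set(table, groups)):
            return i
    return None

def a_safe(table, gens, stop_before=None):
    """Release the first g_i whose disclosure set satisfies the property.
    With `stop_before`, return None if nothing is released before that iteration."""
    for i, groups in enumerate(gens, start=1):
        if i == stop_before:
            return None
        # ds_i: guesses for which the algorithm would also get as far as iteration i.
        ds_i = {t for t in permutation_set(table, groups)
                if a_safe(t, gens, stop_before=i) is None}
        if privacy_ok(ds_i):
            return i
    return None

# Hypothetical data and groupings (assumptions, not from the slides).
t0 = ("flu", "flu", "cold", "asthma", "ulcer")
gens = [[(0, 1), (2, 3, 4)], [(0, 2), (1, 3, 4)], [(0, 1, 2, 3, 4)]]

print(a_naive(t0, gens))  # 2
print(a_safe(t0, gens))   # 3
```

On this input a_naive stops at g2, whose disclosure set is unsafe, while a_safe rejects g2 and releases the coarser g3. The nested recursion also hints at why the safe strategy is costly.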
Agenda
• Background
• K-Jump Strategy
  • The Algorithm Family a_jump(k)
  • Properties of a_jump(k)
• Data Utility Comparison
• Conclusion
The Algorithm Family a_jump(k)
• naive strategy: evaluate the privacy property on the permutation set only (efficient but unsafe)
• safe strategy: evaluate the privacy property on the disclosure set directly (safe but costly)
• k-jump strategy: penalize by jumping over the next k-1 iterations
[Figure: evaluation path of a_jump(k) from t0 over g1, g2, …, g_{2+k}, …, gn, with nodes per1, per2, per_{2+k}, pern and ds1, ds2, ds_{2+k}, dsn.]
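One reading of the k-jump evaluation path is sketched below: at each iteration the permutation set is checked first; if it fails, the algorithm moves to the next function; if it passes, the disclosure set is checked, and a failure there jumps k iterations ahead. The function name `walk`, the table, and the groupings are hypothetical, and the disclosure set is taken to be the guesses whose own evaluation path reaches the same iteration.

```python
from collections import Counter
from itertools import permutations, product

def permutation_set(table, groups):
    per_group = [set(permutations([table[i] for i in g])) for g in groups]
    tables = set()
    for choice in product(*per_group):
        candidate = list(table)
        for group, values in zip(groups, choice):
            for person, value in zip(group, values):
                candidate[person] = value
        tables.add(tuple(candidate))
    return tables

def privacy_ok(tables):
    """For each person, no condition appears in more than half of the tables."""
    tables = list(tables)
    if not tables:
        return False
    return all(
        2 * max(Counter(t[p] for t in tables).values()) <= len(tables)
        for p in range(len(tables[0]))
    )

def walk(table, gens, k, target=None):
    """Follow the a_jump(k) evaluation path on `table`.
    Returns ("released", i), ("reached", target) or ("nothing", None)."""
    i = 1
    while i <= len(gens):
        if target is not None:
            if i == target:
                return ("reached", i)
            if i > target:
                break  # the path jumped over the target iteration
        per_i = permutation_set(table, gens[i - 1])
        if not privacy_ok(per_i):
            i += 1  # permutation set already unsafe: try the next function
            continue
        # ds_i: guesses whose own evaluation path also reaches iteration i.
        ds_i = {t for t in per_i if walk(t, gens, k, target=i)[0] == "reached"}
        if privacy_ok(ds_i):
            return ("released", i)
        i += k  # disclosure set unsafe: jump over the next k-1 iterations
    return ("nothing", None)

# Hypothetical data and groupings (assumptions, not from the slides).
t0 = ("flu", "flu", "cold", "asthma", "ulcer")
gens = [[(0, 1), (2, 3, 4)], [(0, 2), (1, 3, 4)], [(0, 1, 2, 3, 4)]]

print(walk(t0, gens, k=1))  # ('released', 3)
print(walk(t0, gens, k=2))  # ('nothing', None)
```

With three functions, k = 2 jumps from g2 past g3 and releases nothing here, whereas k = 1 reaches g3 and releases it. This single input says nothing about which jump distance is better in general.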
Properties of a_jump(k)
• Computation of the disclosure set
  • a_safe: to compute ds(g_i(t0)), one must first compute ds(g_j(t)) for all t in per(g_i(t0)) and j = 1, 2, …, i-1
  • a_jump: to compute ds(g_i(t0)) (2 < i < 2+k), one no longer needs to compute ds(g2(t)) for all t in per(g_i(t0))
• ds(g1(t0)) and ds(g2(t0))
  • ds(g1(t0)) = per(g1(t0))
  • ds(g2(t0)) is independent of the jump distance vector.
• Size of the family
  • There are (n-1)! different jump distance vectors.
[Figure: evaluation path of a_jump(k), as on the previous slide.]
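The (n-1)! count is consistent with, for example, letting the jump distance chosen at iteration i range over 1, …, n-i for i = 1, …, n-1. That range is an assumption made here to reproduce the count; the slide does not state how the vector's components are bounded. A short enumeration under that assumption:

```python
from itertools import product
from math import factorial

def jump_distance_vectors(n):
    """Enumerate vectors (k_1, ..., k_{n-1}) with 1 <= k_i <= n - i (assumed bounds)."""
    ranges = [range(1, n - i + 1) for i in range(1, n)]
    return list(product(*ranges))

for n in range(2, 7):
    vectors = jump_distance_vectors(n)
    print(n, len(vectors), factorial(n - 1))
```

For n = 4 this lists 3 * 2 * 1 = 6 vectors, from (1, 1, 1) to (3, 2, 1).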
Agenda
• Background
• K-Jump Strategy
• Data Utility Comparison
  • Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable
  • Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable
  • Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable
  • Construction for Proposition 2: reusing generalization functions
  • Results on a_safe and a_jump(1)
• Conclusion
Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable
Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
To compute ds_3^k(t0):
1. Exclude any table t for which p(per1(t)) = true. Each remaining table belongs to one of four disjoint sets, S1 to S4.
2. Consider generalizing these tables using g2. S2, S3, and S4 cannot be disclosed under g2.
   a. The subsets of S1 in which both N and O have C7, C8, or C9 cannot be disclosed under g2.
   b. For a_jump(i), all tables in S1\S1’ will be excluded from ds_3^i(t0). Outcome shown on the slide: Satisfied!
   c. For a_jump(1), the disclosure sets of all tables in S1\S1’ under g2 do not satisfy the privacy property. Outcome shown on the slide: Violated! The ratio of I being associated with C6 is 5/9.
[Tables: the micro-data table and the sets S1 to S4 appear as figures on the slides.]
Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable
[Figures: evaluation paths of the two algorithms.]
• The case where i-jump has better utility than j-jump is easier to construct, so only the construction for the other case is shown.
• In this construction, j-jump releases generalization g_{j+2}, while i-jump releases g_{j+i+1} or a later generalization.
Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable
Construction for Proposition 2: Reusing generalization functions
Setting: the jump distance is 1, and the privacy property is that the highest ratio of a sensitive value in a group must be no greater than 1/2.
Without reusing g2, the table leads to disclosing nothing:
1. The table belongs to one of three disjoint sets, which cannot be disclosed under g1(.) or g3(.).
2. Computing ds2: Violated!
Construction for Proposition 2 (cont.): Reusing generalization functions
g2 is reused as g2’. The jump distance is 1, and the privacy property is that the highest ratio of a sensitive value in a group must be no greater than 1/2.
To calculate ds2’, the tables that can be disclosed under g1, g2, or g3 must be excluded from per2’:
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets.
   a. S12 and S13 cannot be disclosed under g3.
   b. The tables in subset S11 can be disclosed under g3. To compute ds3(t0) for t0 in S11:
      A. Exclude any table t for which p(per1(t)) = true.
      B. Each remaining table belongs to one of two disjoint sets (one instance is shown on the slide); these subsets cannot be disclosed under g2.
Result: the ratios of D and E being associated with C3 are both 0.5, which is the highest ratio. Satisfied!
Results on a_safe and a_jump(1)
1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values:
• Lemma 3: p(per(t0)) = false ⇒ p(any of its subsets) = false
• Corollary 1: The algorithm a_safe has the same data utility as a_jump(1).
2. For other privacy properties:
• Lemma 4: The ds3 under a_safe is a subset of that under a_jump(1).
• Theorem 5: The data utilities of a_safe and a_jump(1) are generally incomparable.
Agenda
• Background
• K-Jump Strategy
• Data Utility Comparison
• Conclusion
Conclusion
• We have proposed a novel k-jump strategy for micro-data disclosure.
• The strategy transforms a given generalization algorithm into a large number of safe algorithms.
• We show, by constructing counter-examples, that the data utility of these algorithms is generally incomparable.
• Practical impact: the data holder can make a secret choice among the safe algorithms.
Further Results and Future Work
• Further results in the extended version of this paper:
  • Computational complexity.
  • Making a secret choice among unsafe algorithms does not yield a safe solution.
• Future studies:
  • Study more efficient safe algorithms.
  • Employ statistical methods to compare different k-jump algorithms.
  • Further investigate the opportunities in reusing generalization functions.
Toy Example
The data holder releases a generalized table that satisfies 2-diversity.
The attacker knows:
• the generalization
• external data
• the privacy property
What can the attacker infer? The permutation set.
• Name: identifier.
• DoB: quasi-identifier.
• Condition: sensitive attribute.