Simulatability. "The enemy knows the system," Claude Shannon. CompSci 590.03, Instructor: Ashwin Machanavajjhala. Lecture 6: 590.03, Fall 2012.
Announcements • Please meet with me at least twice before you finalize your project (deadline: Sep 28).
Recap – L-Diversity • The link between identity and attribute value is the sensitive information: "Does Bob have Cancer? Heart disease? Flu?" "Does Umeko have Cancer? Heart disease? Flu?" • The adversary knows ≤ L-2 negation statements, e.g., "Umeko does not have Heart Disease." • The data publisher may not know the exact adversarial knowledge. • Privacy is breached when an identity can be linked to an attribute value with high probability: Pr["Bob has Cancer" | published table, adv. knowledge] > t.
Recap – 3-Diverse Table [table shown as a figure] • L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
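To make the recap concrete, here is a minimal Python sketch (my own, not from the lecture) of a simplified L-diversity check: it requires at least L distinct sensitive values per Q-ID group plus a frequency cap as a rough stand-in for "roughly equal proportions". The table layout and column names are illustrative assumptions.

```python
from collections import Counter

def is_l_diverse(table, qid_cols, sensitive_col, L):
    """Simplified L-diversity check: every Q-ID group must contain at least L
    distinct sensitive values, and no single value may exceed a 1/L fraction
    of the group (a rough proxy for "roughly equal proportions")."""
    groups = {}
    for row in table:
        key = tuple(row[c] for c in qid_cols)
        groups.setdefault(key, []).append(row[sensitive_col])

    for values in groups.values():
        counts = Counter(values)
        if len(counts) < L:
            return False                      # fewer than L distinct sensitive values
        if max(counts.values()) > len(values) / L:
            return False                      # one sensitive value dominates the group
    return True

# Toy usage: one Q-ID group of 3 tuples carrying 3 distinct diseases.
table = [
    {"zip": "130**", "age": "<30", "disease": "Cancer"},
    {"zip": "130**", "age": "<30", "disease": "Flu"},
    {"zip": "130**", "age": "<30", "disease": "Heart disease"},
]
print(is_l_diverse(table, ["zip", "age"], "disease", L=3))  # True
```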
Outline • Simulatable Auditing • Minimality Attack in anonymization • Simulatable algorithms for anonymization
Query Auditing • The database holds numeric values (say, salaries of employees). • The researcher issues MIN, MAX, and SUM queries over subsets of the database. • The database either truthfully answers a query or denies answering. • Question: when should a query be allowed or denied? [Diagram: the Researcher sends a Query; the database checks "Safe to publish?"; Yes leads to an answer, No leads to a denial.]
Why should we deny queries? • Q1: Ben's sensitive value? DENY. • Q2: Max sensitive value of males? ANSWER: 2. • Q3: Max sensitive value of 1st-year PhD students? ANSWER: 3. • But Q2 + Q3 => Xi = 3 (Xi's sensitive value is revealed).
Value-Based Auditing • Let a1, a2, …, ak be the answers to previous queries Q1, Q2, …, Qk, and let ak+1 be the answer to Qk+1. • ai = f(ci1 x1, ci2 x2, …, cin xn), for i = 1, …, k+1, where cim = 1 if Qi depends on xm (and 0 otherwise). • Check whether any xj has a unique solution under these constraints; if so, deny Qk+1.
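As an illustration of the value-based check (my own sketch, not the lecture's), the SUM-query case reduces to linear algebra: xj is uniquely determined by the answered queries exactly when the unit vector ej lies in the row space of the coefficient matrix. The function names and the use of NumPy's rank test are assumptions; the MAX queries of the running example need interval reasoning instead.

```python
import numpy as np

def uniquely_determined(C, j):
    """x_j is pinned down by the (assumed consistent) linear system C x = a
    exactly when e_j lies in the row space of C, i.e. appending e_j as a row
    does not increase the rank."""
    e_j = np.zeros(C.shape[1])
    e_j[j] = 1.0
    return np.linalg.matrix_rank(np.vstack([C, e_j])) == np.linalg.matrix_rank(C)

def value_based_audit_sum(C_prev, q_new):
    """Value-based auditing for SUM queries: answer the new query q_new
    (a 0/1 coefficient row) only if no individual value becomes uniquely
    determined once it is added to the previously answered queries C_prev."""
    C = np.vstack([C_prev, q_new]) if len(C_prev) else np.array([q_new])
    return all(not uniquely_determined(C, j) for j in range(C.shape[1]))

# Toy usage with 3 values: x1+x2+x3 has been answered; also answering x2+x3
# would pin down x1, so the auditor must refuse.
C_prev = np.array([[1.0, 1.0, 1.0]])
print(value_based_audit_sum(C_prev, np.array([0.0, 1.0, 1.0])))  # False -> deny
```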
Value-based Auditing: running example • Data values: {x1, x2, x3, x4, x5}; queries: MAX. • Allow a query only if no value xi can be inferred.
• Query 1: max(x1, x2, x3, x4, x5). Answer: 10. The attacker now knows -∞ ≤ x1, …, x5 ≤ 10.
• Query 2: max(x1, x2, x3, x4). True answer: 8. Answering would reveal -∞ ≤ x1, …, x4 ≤ 8 and hence x5 = 10, so the auditor DENIES.
• But a denial means some value could be compromised! The attacker reasons: what could max(x1, x2, x3, x4) be? From the first answer, max(x1, x2, x3, x4) ≤ 10. If max(x1, x2, x3, x4) = 10, there would have been no privacy breach and the query would have been answered. Hence max(x1, x2, x3, x4) < 10, which implies x5 = 10!
• Denials leak information. The attack occurred because the privacy analysis did not assume that the attacker knows the auditing algorithm.
Simulatable Auditing [Kenthapadi et al. PODS '05] • An auditor is simulatable if the decision to deny a query Qk is made based only on information already available to the attacker. • It can use the queries Q1, Q2, …, Qk and the answers a1, a2, …, ak-1. • It cannot use ak or the actual data to make the decision. • Denials then provably do not leak information, because the attacker could equivalently determine whether the query would be denied: the attacker can mimic, or simulate, the auditor.
Simulatable Auditing Algorithm • Data values: {x1, x2, x3, x4, x5}; queries: MAX. The first query, max(x1, x2, x3, x4, x5), was answered: 10. • For the new query max(x1, x2, x3, x4), the auditor decides before computing the answer, by considering every answer consistent with what the attacker already knows: Ans > 10 => not possible; Ans = 10 => -∞ ≤ x1, …, x4 ≤ 10 (SAFE); Ans < 10 => x5 = 10 (UNSAFE). • Since some consistent answer would be unsafe, DENY, without ever looking at the actual data.
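A brute-force sketch of a simulatable MAX-query auditor over a small assumed value domain (the domain, the helper names, and the brute-force strategy are my illustrations, not the paper's algorithm): the auditor enumerates the answers the new query could take on datasets consistent with the past answers, and denies if any of those hypothetical answers would pin some value down. The true data and the true new answer are never consulted, so the attacker could run the same procedure.

```python
from itertools import product

DOMAIN = (2, 4, 6, 8, 10)      # assumed small value domain
N = 5                          # private values x1..x5 (0-indexed below)

def consistent(xs, answered):
    """Does candidate dataset xs agree with every previously answered MAX query?"""
    return all(max(xs[i] for i in q) == a for q, a in answered)

def compromised(answered):
    """Do the answered queries pin some x_j down to a single value?"""
    for j in range(N):
        feasible = {xs[j] for xs in product(DOMAIN, repeat=N) if consistent(xs, answered)}
        if len(feasible) == 1:
            return True
    return False

def simulatable_audit(answered, new_query):
    """Deny if ANY answer the new query could take on some dataset consistent
    with the past answers would compromise a value; uses only past Q&A."""
    possible_answers = {max(xs[i] for i in new_query)
                        for xs in product(DOMAIN, repeat=N)
                        if consistent(xs, answered)}
    if any(compromised(answered + [(new_query, a)]) for a in possible_answers):
        return "DENY"
    return "ANSWER"

# Usage mirroring the running example: max(x1..x5) = 10 has been answered;
# the auditor must now decide on max(x1..x4) before seeing its answer.
answered = [((0, 1, 2, 3, 4), 10)]
print(simulatable_audit(answered, (0, 1, 2, 3)))   # DENY
```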
Summary of Simulatable Auditing • The decision to deny must be based only on the queries asked and the answers already given, which forces the auditor to deny in some (many!) cases. • Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query.
Outline • Simulatable Auditing • Minimality Attack in anonymization • Simulatable algorithms for anonymization
Minimality attack on Generalization algorithms • Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility. • They find a minimally generalized table in the generalization lattice that satisfies privacy and maximizes utility. • But … the attacker also knows this algorithm!
Example: Minimality attack [Wong et al. VLDB '07] • Dataset with one quasi-identifier attribute taking 2 values, q1 and q2; both generalize to Q. • Sensitive attribute: Cancer (yes/no). • We want to ensure P[Cancer = yes] < ½; it is OK to learn that an individual does not have Cancer. • Published table: every quasi-identifier value generalized to Q [table figure not reproduced].
Which input datasets could have led to the published table?
• Output dataset: {q1, q2} generalized to Q ("2-diverse").
• Candidate input with 3 occurrences of q1: a better (less generalized) table would already satisfy the privacy requirement, so a minimality-seeking algorithm would not have produced the fully generalized output. Ruled out.
• Candidate input with 1 occurrence of q1: again, a better generalization exists. Ruled out.
• Hence there must be exactly two tuples with q1.
• Candidate inputs with 2 occurrences of q1 that already satisfy privacy as-is (e.g., because the attacker would only learn Cancer = NO, which is allowed): the algorithm would not have generalized them. Ruled out.
• The only remaining input yields P[Cancer = yes | q1] = 1. It is the ONLY input that results in the published output, so the attacker learns the sensitive value with certainty.
Outline • Simulatable Auditing • Minimality Attack in anonymization • Transparent Anonymization: simulatable algorithms for anonymization
Transparent Anonymization • Assume that the adversary knows the algorithm that is being used. • I: all possible input tables. • O: the output table. • I(O, A): the input tables that result in O under algorithm A.
Transparent Anonymization • Privacy must be guaranteed with respect to I(O, A): the adversary's probability must be computed assuming I(O, A) is the actual set of all possible input tables. • What is an efficient algorithm for transparent anonymization? For L-diversity?
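The definition of I(O, A) suggests a (wildly inefficient) brute-force check, sketched below with assumed helper names and a dict-per-individual table representation: enumerate candidate inputs, keep those the publicly known algorithm maps to the observed output, and bound the resulting posterior for every individual and sensitive value. With threshold 1/L this mirrors the transparent L-diversity requirement on the slide.

```python
# Brute-force illustration (not the paper's algorithm) of "privacy w.r.t. I(O, A)".
# `candidate_inputs` is a list of dicts mapping individual -> sensitive value,
# and `algorithm` is the (publicly known) anonymization procedure.

def posterior(candidate_inputs, algorithm, output, individual, value):
    consistent = [T for T in candidate_inputs if algorithm(T) == output]   # I(O, A)
    if not consistent:
        return None
    return sum(1 for T in consistent if T[individual] == value) / len(consistent)

def is_transparently_private(candidate_inputs, algorithm, output, threshold):
    individuals = candidate_inputs[0].keys()
    values = {v for T in candidate_inputs for v in T.values()}
    return all((posterior(candidate_inputs, algorithm, output, i, v) or 0) <= threshold
               for i in individuals for v in values)
```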
Ace Algorithm [Xiao et al. TODS '10] • Step 1: Assign. Based only on the sensitive values, construct (in a randomized fashion) an intermediate L-diverse generalization. • Step 2: Split. Based only on the quasi-identifier values (and without looking at sensitive values), deterministically refine the intermediate solution to maximize utility.
Step 1: Assign • Input table [figure not reproduced].
Step 1: Assign • St is the set of all tuples, grouped by sensitive value. • Iteratively: remove α tuples each from the β (β ≥ L) most frequent sensitive values; the removed tuples form one bucket of the intermediate generalization. • Running example: 1st iteration β = 2, α = 2; 2nd iteration β = 2, α = 1; 3rd iteration β = 2, α = 1.
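A simplified Python sketch of the Assign step as I read it from the slides, fixing α = 1 and β = L and assuming the input meets the usual eligibility condition (no sensitive value occurs in more than a 1/L fraction of the tuples); the real algorithm chooses α and β per iteration, as in the example above. Function and variable names are illustrative, not the paper's.

```python
import random
from collections import defaultdict

def assign(tuples, sensitive_of, L):
    """Each round removes one randomly chosen tuple from each of the L currently
    most frequent sensitive values; the removed tuples form one L-diverse bucket
    of the intermediate generalization."""
    groups = defaultdict(list)
    for t in tuples:
        groups[sensitive_of(t)].append(t)

    buckets = []
    while sum(len(g) for g in groups.values()) >= L:
        # the L most frequent sensitive values right now
        top = sorted((s for s in groups if groups[s]),
                     key=lambda s: len(groups[s]), reverse=True)[:L]
        bucket = []
        for s in top:
            i = random.randrange(len(groups[s]))    # randomized choice of tuple
            bucket.append(groups[s].pop(i))
        buckets.append(bucket)
    return buckets

# Toy usage: 6 tuples, 3 diseases, L = 2 -> three buckets, each with 2 distinct diseases.
tuples = [("Ann", "dyspepsia"), ("Bob", "dyspepsia"), ("Ed", "flu"),
          ("Gill", "flu"), ("Hana", "bronchitis"), ("Ivan", "bronchitis")]
print(assign(tuples, sensitive_of=lambda t: t[1], L=2))
```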
Intermediate Generalization [bucketed table shown as a figure].
Step 2: Split • If a bucket B contains α > 1 tuples of each sensitive value, split it into two buckets Ba and Bb such that: • 1 ≤ αa < α tuples of each sensitive value in B go to bucket Ba, and the remaining tuples go to Bb. • The division (Ba, Bb) is chosen to be optimal in terms of utility.
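A sketch of a single Split move following the slide's description; how αa is actually chosen to maximize utility is the paper's optimization, which is only approximated here by sorting on an assumed numeric quasi-identifier. Names and the toy data are illustrative.

```python
from collections import defaultdict

def split_bucket(bucket, sensitive_of, qid_of, alpha_a):
    """If every sensitive value occurs alpha > 1 times in the bucket, move
    alpha_a (1 <= alpha_a < alpha) tuples of each value into a new bucket B_a
    and leave the rest in B_b. Which tuples of each value go to B_a is decided
    only by their quasi-identifiers."""
    by_value = defaultdict(list)
    for t in bucket:
        by_value[sensitive_of(t)].append(t)

    alpha = min(len(ts) for ts in by_value.values())   # slide assumes equal counts
    if alpha <= 1 or not (1 <= alpha_a < alpha):
        return [bucket]                                # cannot (or need not) split

    B_a, B_b = [], []
    for ts in by_value.values():
        ts_sorted = sorted(ts, key=qid_of)             # deterministic, QI-driven
        B_a.extend(ts_sorted[:alpha_a])
        B_b.extend(ts_sorted[alpha_a:])
    return [B_a, B_b]

# Toy usage: a bucket with alpha = 2 copies of each disease, split with alpha_a = 1;
# B_a gets the younger tuple of each disease, B_b the older one.
bucket = [("Ann", 25, "dyspepsia"), ("Bob", 62, "dyspepsia"),
          ("Ed", 30, "flu"), ("Gill", 60, "flu")]
print(split_bucket(bucket, sensitive_of=lambda t: t[2], qid_of=lambda t: t[1], alpha_a=1))
```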
Why does the Ace algorithm satisfy Transparent L-Diversity? • Privacy must be guaranteed with respect to I(O, A): the probability must be computed assuming I(O, A) is the actual set of all possible input tables. • Recall: O is the output table, I is the set of all possible input tables, and I(O, A) is the set of input tables that result in O under algorithm A.
Ace algorithm analysis • Lemma 1: The Assign step satisfies transparent L-diversity. • Proof (sketch): Consider an intermediate output Int, and suppose there is some input table T such that Assign(T) = Int. • Any other table T′ in which the sensitive values of two individuals in the same group of Int are swapped also leads to the same intermediate output Int (Assign looks only at the sensitive values, and the swap leaves their multiset unchanged). [Figure: two such tables producing the same intermediate output.] • Hence the set of input tables I(Int, A) contains all possible assignments of diseases to individuals within each group of Int. • For example, P[Ann has dyspepsia | Int, I(Int, A)] = 1/2: in the running example Ann's group contains two tuples whose sensitive values are dyspepsia and flu, and both assignments are equally likely.
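A tiny counting check (my own illustration, with the group membership assumed from the running example) of why the posterior comes out to 1/2: all assignments of the group's sensitive values to its members are equally likely under I(Int, A), so the posterior is just the multiplicity of dyspepsia in Ann's group divided by the group size.

```python
from itertools import permutations

group_members = ["Ann", "Gill"]              # assumed group from the running example
group_values = ["dyspepsia", "flu"]          # its multiset of sensitive values

assignments = list(permutations(group_values))
hits = sum(1 for a in assignments if a[group_members.index("Ann")] == "dyspepsia")
print(hits / len(assignments))               # 0.5
```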
Ace algorithm analysis • Lemma 2: The Split step also satisfies transparent L-diversity. • Proof (sketch): I(Int, Assign) contains all tables in which individuals are assigned arbitrary sensitive values from within their group in Int. • Suppose some input table T ∈ I(Int, Assign) results in the final output O after Split.
Ace algorithm analysis • Split does not depend on the sensitive values. [Figure: two tables that differ only in how the sensitive values (dyspepsia, flu) are assigned to Ann, Bob, Ed, and Gill within their groups; both result in the same split.]
Ace algorithm analysis • Tables T and T′: if T ∈ I(Int, Assign) and T results in O after Split, then T′ ∈ I(Int, Assign) and T′ also results in O after Split.