Online Auditing - How may Auditors Inadvertently Compromise Your Privacy
Kobbi Nissim (Microsoft)
With Nina Mishra (HP/Stanford)
Work in progress

The Setting
• Dataset: d = {d_1,…,d_n}
• Entries d_i: Real, Integer, Boolean
• Query: q = (f, i_1,…,i_k)
• f: Min, Max, Median, Sum, Average, Count, …
• Bad users will try to breach the privacy of individuals
[Figure: a user sends q = (f, i_1,…,i_k) to the statistical database and receives f(d_{i_1},…,d_{i_k})]
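
The query model above can be sketched in a few lines of Python. This is only an illustration: the names `AGGREGATES` and `answer_query` are hypothetical, and the 1-based indexing simply mirrors the slide's d_1,…,d_n.

```python
from statistics import median

# Hypothetical table of the aggregate functions f named on the slide.
AGGREGATES = {
    "min": min,
    "max": max,
    "median": median,
    "sum": sum,
    "average": lambda xs: sum(xs) / len(xs),
    "count": len,
}

def answer_query(d, f, indices):
    """Evaluate q = (f, i_1, ..., i_k) on dataset d, using 1-based indices."""
    selected = [d[i - 1] for i in indices]
    return AGGREGATES[f](selected)

d = [3, 5, 7]
print(answer_query(d, "sum", (1, 2, 3)))
print(answer_query(d, "max", (1, 2)))
```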

The Data Privacy Game: an Information-Privacy Tradeoff
• Private functions:
  • Want to hide each individual entry d_i (the function that maps d to d_i)
• Information functions:
  • Want to reveal query answers f(d_{i_1},…,d_{i_k})
• Major question: what may be computed over d (and given to users) without breaching privacy?
• Confidentiality control methods:
  • Perturbation methods: give 'noisy' answers
  • Query restriction methods: limit the queries users may pose, usually imposing some structure (e.g. size/overlap restrictions)

Auditing
• [AW89] classify auditing as a query restriction method:
  • "Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued"
• Partial motivation: may allow for more queries to be posed, if no privacy threat occurs
• Early work: Hofmann 1977; Schlorer 1976; Chin, Ozsoyoglu 1981, 1986
• Recent interest: Kleinberg, Papadimitriou, Raghavan 2000; Li, Wang, Wang, Jajodia 2002; Jonsson, Krokhin 2003

Auditing
[Figure: the user submits a new query q_{i+1}. The auditor sits in front of the statistical database, keeps the query log q_1,…,q_i, and replies either "Here's the answer" or "Query denied (as the answer would cause privacy loss)".]

Design choices in Prior Work (1)
1. Privacy definition:
  • Privacy breached (only) when a database entry may be deduced fully, or within some accuracy
  • These privacy guarantees do not generally suffice:
    • Should take into account the adversary's computational power, prior knowledge, access to other databases, …
2. Exact answers given:
  • Auditors viewed as a way to give 'quality' answers???

Design choices in Prior Work (2)
3. Which information is taken into account in the auditor decision procedure:
  • Decision made based on queries q_1,…,q_i,q_{i+1} and their answers a_1,…,a_i,a_{i+1}
  • Denials ignored
4. Offline vs. Online:
  • Offline auditing: queries and answers checked for compromise at the end of the day
    • Only detects breaches
  • Online auditing: answer/deny queries on the fly
    • Prevents breaches just before they happen

Example 1: Sum/Max auditing
• d_i real, sum/max queries, privacy breached if some d_i is learned
User: q_1 = sum(d_1,d_2,d_3)
Auditor: sum(d_1,d_2,d_3) = 15
User: q_2 = max(d_1,d_2,d_3)
Auditor: Denied (the answer would cause privacy loss)
User: Oh well…

Some Prior Work on Auditors

Auditor                      Data           Queries    Breach                 Complexity
Sum/Max [Chin]               real           sum/max    d_i learned            NP-hard
Boolean [KPR00]              0/1            sum        d_i learned            NP-hard*
Max [KPR00]                  real           max        d_i learned            PTIME
Interval based [LWWJ02]      d_i ∈ [a,b]    sum        d_i within accuracy    PTIME
Generalized results [JK03]                                                    NP-hard / PTIME

* Approx version in PTIME

Can we use the offline version for online auditing?

… After Two Minutes …
• d_i real, sum/max queries, privacy breached if some d_i is learned
User: q_1 = sum(d_1,d_2,d_3)
Auditor: sum(d_1,d_2,d_3) = 15
User: q_2 = max(d_1,d_2,d_3)
Auditor: Denied (the answer would cause privacy loss)
User (at first): Oh well…
User (two minutes later): There must be a reason for the denial… q_2 is denied iff d_1 = d_2 = d_3 = 5. I win!
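
The inference in this example can be sketched as follows. This is a minimal illustration, not the auditor from the cited work: it assumes unbounded real entries, one sum query followed by one max query over the same k ≥ 2 entries, and a naive auditor that ignores its own denials. Under those assumptions, the max answer pins down an entry exactly when max · k = sum. The function names are hypothetical.

```python
from fractions import Fraction

def naive_denies_max(values):
    """Naive auditor (assumed): deny max(values) after sum(values) was answered
    iff the two answers together would force every entry to equal sum/k."""
    return max(values) * len(values) == sum(values)

def adversary_infers(sum_answer, k, max_was_denied):
    """What the user learns from the auditor's reaction alone."""
    if max_was_denied:
        # A denial happens only when all entries are equal, so each is sum/k.
        return [Fraction(sum_answer, k)] * k
    return None  # no entry is pinned down by an answered max in this setting

secret = [5, 5, 5]
denied = naive_denies_max(secret)
print(denied, adversary_infers(sum(secret), len(secret), denied))
```

The denial, which was meant to protect the entries, is what discloses them.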
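
Example 2: Interval Based Auditing
• d_i ∈ [0,100], sum queries, privacy breached if some d_i is learned to within accuracy 1 (PTIME)
User: q_1 = sum(d_1,d_2)
Auditor: Sorry, denied
User: q_2 = sum(d_2,d_3)
Auditor: sum(d_2,d_3) = 50
• The denial implies d_1,d_2 ∈ [0,1] or d_1,d_2 ∈ [99,100]
• The answer 50 forces d_2 ≤ 50, so d_1,d_2 ∈ [0,1] and d_3 ∈ [49,50]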
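
A rough sketch of the reasoning in this example, under stated assumptions: entries lie in [0,100], the naive auditor denies a two-entry sum exactly when the answer would confine each entry to an interval of width at most 1, and it ignores its own denials. All function names are hypothetical.

```python
def feasible_interval(total, lo=0.0, hi=100.0):
    """Range each of two entries in [lo, hi] can take when their sum is `total`."""
    return max(lo, total - hi), min(hi, total - lo)

def naive_denies_pair_sum(total, accuracy=1.0):
    """Assumed naive rule: deny iff the answer narrows each entry to width <= accuracy."""
    low, high = feasible_interval(total)
    return high - low <= accuracy

def adversary_after_denial(answer_23, accuracy=1.0, lo=0.0, hi=100.0):
    """Combine 'sum(d1,d2) was denied' with 'sum(d2,d3) = answer_23'.
    Returns (d2 range, d3 range) for each branch that remains possible."""
    # The denial leaves two branches for d1 and d2: near lo or near hi.
    branches = [(lo, lo + accuracy), (hi - accuracy, hi)]
    results = []
    for b_low, b_high in branches:
        # d3 = answer_23 - d2 must also lie in [lo, hi].
        d2_low = max(b_low, answer_23 - hi)
        d2_high = min(b_high, answer_23 - lo)
        if d2_low <= d2_high:
            results.append(((d2_low, d2_high), (answer_23 - d2_high, answer_23 - d2_low)))
    return results

print(naive_denies_pair_sum(0.7), naive_denies_pair_sum(50))
print(adversary_after_denial(50))
```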

Sounds Familiar?
• Colonel Oliver North, on the Iran-Contra Arms Deal: "On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights."
• David Duncan, former auditor for Enron and partner in Andersen: "Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States."

Max Auditing
• d_i real
[Figure: database entries d_1,…,d_n]
User: q_1 = max(d_1,d_2,d_3,d_4)
Auditor: M_1234
User: q_2 = max(d_1,d_2,d_3)
Auditor: M_123 / denied. If denied: d_4 = M_1234
User: q_3 = max(d_1,d_2)
Auditor: M_12 / denied. If denied: d_3 = M_123
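
Adversary's Success
User: q_1 = max(d_1,d_2,d_3,d_4)
User: q_2 = max(d_1,d_2,d_3). Denied with probability 1/4. If denied: d_4 = M_1234
User: q_3 = max(d_1,d_2). Denied with probability 1/3 (given that q_2 was answered). If denied: d_3 = M_123
• Success probability: 1/4 + (1 − 1/4)·1/3 = 1/2
• One entry out of every block of four is learned with probability 1/2, so the adversary is expected to recover 1/8 of the database!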
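
The success probability can be checked by exact enumeration. The sketch below assumes four distinct values in uniformly random order and a naive max auditor that denies a query exactly when its answer would expose an entry, ignoring earlier denials. The function name `attack_learns_entry` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def attack_learns_entry(block):
    """Run the three-query attack on one block (d1, d2, d3, d4)."""
    d1, d2, d3, d4 = block
    m1234 = max(block)          # q1 is answered
    m123 = max(d1, d2, d3)
    if m123 < m1234:
        return True             # q2 denied, so d4 = M_1234
    if max(d1, d2) < m123:
        return True             # q3 denied, so d3 = M_123
    return False                # both answered, nothing learned

# Every ordering of four distinct values is equally likely (assumption).
blocks = list(permutations([1, 2, 3, 4]))
successes = sum(attack_learns_entry(b) for b in blocks)
success_probability = Fraction(successes, len(blocks))
fraction_recovered = success_probability / 4   # one entry per block of four

print(success_probability, fraction_recovered)
```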

Boolean Auditing?
• d_i Boolean
[Figure: database entries d_1,…,d_n]
User: q_1 = sum(d_1,d_2)
Auditor: 1 / denied
User: q_2 = sum(d_2,d_3)
Auditor: 1 / denied
…
• q_i is denied iff d_i = d_{i+1}, so the user learns the database up to complement
• Let d_i, d_j, d_k be not all equal, where q_{i-1}, q_i, q_{j-1}, q_j, q_{k-1}, q_k were all denied
User: sum(d_i,d_j,d_k)
Auditor: 1 / 2
• Recover the entire database!
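
A small sketch of this attack, under assumptions: the naive auditor answers sum(d_i, d_{i+1}) only when the sum is 1 (a sum of 0 or 2 would expose both bits), and the final three-entry sum is answered as on the slide. Indices in the code are 0-based and all names are hypothetical.

```python
def naive_pair_auditor(d, i):
    """Answer sum(d[i], d[i+1]) unless it would expose both bits; None models a denial."""
    s = d[i] + d[i + 1]
    return s if s == 1 else None

def recover_up_to_complement(responses):
    """Rebuild the bit pattern relative to an arbitrary guess for the first entry."""
    guess = [0]
    for r in responses:
        same = r is None            # denied exactly when the neighbours are equal
        guess.append(guess[-1] if same else 1 - guess[-1])
    return guess

def resolve(guess, triple, triple_sum):
    """Use one answered sum over three not-all-equal entries to pick guess or complement."""
    i, j, k = triple
    if guess[i] + guess[j] + guess[k] == triple_sum:
        return guess
    return [1 - b for b in guess]

secret = [1, 1, 0, 0, 0, 1, 1]
responses = [naive_pair_auditor(secret, i) for i in range(len(secret) - 1)]
guess = recover_up_to_complement(responses)
triple = (0, 3, 6)                  # all pair queries touching these entries were denied
recovered = resolve(guess, triple, sum(secret[i] for i in triple))
print(recovered == secret)
```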

Two Problems
• Obvious problem: denied queries ignored
  • Algorithmic problem: not clear how to incorporate denials in the decision
• Subtle problem:
  • Query denials leak (potentially sensitive) information
  • Users cannot decide denials by themselves
[Figure: within the set of possible assignments to {d_1,…,d_n}, the subset of assignments consistent with (q_1,…,q_i, a_1,…,a_i), with the region where q_{i+1} is denied marked inside it]
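
A Spectrum of Auditors
Auditors ordered by the information the decision procedure uses, trading privacy against utility:
• q_1,…,q_i, q_{i+1} only: "Safe" (e.g. size/overlap restriction, algebraic structure)
• q_1,…,q_i, q_{i+1} and a_1,…,a_i: "Safe"
• q_1,…,q_i, q_{i+1} and a_1,…,a_i, a_{i+1}: "Unsafe"*
* Note: can work in the "unsafe" region, but need to prove denials do not leak crucial information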
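
Simulatable Auditing*
• An auditor is simulatable if a simulator exists that reproduces the auditor's deny/answer decision without access to the statistical database
• Simulation ⇒ denials do not leak information
[Figure: the auditor, with the query log q_1,…,q_i and access to the statistical database, and the simulator, given q_1,…,q_i and a_1,…,a_i, each receive q_{i+1} and output Deny/answer]
* 'self auditors' in [DN03]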
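
One way to make the idea concrete is a brute-force sketch over a small finite domain. This is an illustration only, not the construction from the talk: it assumes integer entries in a tiny range (the slides use reals), counts a breach only when an entry is determined exactly, and has the simulatable auditor deny whenever some answer consistent with the past log would cause a breach. All names are hypothetical.

```python
from itertools import product

def consistent(domain, n, log):
    """All datasets over `domain` agreeing with every (f, indices, answer) in the log."""
    return [d for d in product(domain, repeat=n)
            if all(f(d[i] for i in idx) == a for f, idx, a in log)]

def breaches(candidates):
    """True if some entry takes a single value across all candidate datasets."""
    n = len(candidates[0])
    return any(len({d[i] for d in candidates}) == 1 for i in range(n))

def naive_decision(data, domain, log, query):
    """Looks at the true answer a_{i+1}, so the decision depends on the data."""
    f, idx = query
    true_answer = f(data[i] for i in idx)
    cands = consistent(domain, len(data), log + [(f, idx, true_answer)])
    return "deny" if breaches(cands) else "answer"

def simulatable_decision(domain, n, log, query):
    """Uses only q_1..q_{i+1} and a_1..a_i, which the user already knows."""
    f, idx = query
    past = consistent(domain, n, log)
    for a in {f(d[i] for i in idx) for d in past}:
        if breaches([d for d in past if f(d[i] for i in idx) == a]):
            return "deny"   # some consistent answer would breach
    return "answer"

domain = range(11)                      # assumed small integer domain
log = [(sum, (0, 1, 2), 15)]            # q_1 = sum of all three entries, answered 15
query = (max, (0, 1, 2))                # q_2 = max of all three entries
print(naive_decision((5, 5, 5), domain, log, query))
print(naive_decision((4, 5, 6), domain, log, query))
print(simulatable_decision(domain, 3, log, query))
```

In this sketch the naive decision differs between the two datasets, so its denial reveals something, while the simulatable decision is the same for every dataset consistent with the log and the user could have computed it alone.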

Why Simulatable Auditors do not Leak Information?
[Figure: within the set of possible assignments to {d_1,…,d_n}, the subset of assignments consistent with (q_1,…,q_i, a_1,…,a_i); q_{i+1} is denied or allowed for this whole subset at once]

Summary
• Improper usage of auditors may lead to privacy breaches, due to information leakage in the decision procedure
  • Cell suppression / some k-anonymity methods should be checked similarly
  • Should make sure offline auditors do not leak information in their decisions
• Simulatable auditors provably don't leak information
  • Give best utility while still "safe"
  • A launching point for further research on auditors
• Further research:
  • Auditors with more reasonable privacy guarantees