140 likes | 252 Views
Lower Bounds on Noise. Sergey Yekhanin Institute for Advanced Study. Setting. Database of information about individuals E.g. Medical history, Census data, Customer info. Need to guarantee confidentiality of individual entries
E N D
Lower Bounds on Noise Sergey Yekhanin Institute for Advanced Study
Setting • Database of information about individuals • E.g. Medical history, Census data, Customer info. • Need to guarantee confidentiality of individual entries • Want to make deductions about the database; learn large scale trends. • E.g. Learn that a drug V increases likelihood of heart disease • Do not leak info about individual patients
Message • Two approaches to database privacy: • Interactive: Analyst asks questions; curator returns approximate answers Analyst Curator
Message • Two approaches to database privacy: • Interactive: Analyst asks questions; curator returns approximate answers • Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Analyst Summary Curator
Message • Two approaches to database privacy: • Interactive: Analyst asks questions; curator returns approximate answers • Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers • Thesis: The interactive approach is the right way to give good accuracy for a given level of privacy • Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information.
Mathematical model of database and queries Attacks Somewhataccurate answers to all querieslead to privacy leakage. (Fourier analysis) [Y] (extends [DiNi]). Somewhat accurate answers to a fraction of queries lead to privacy leakage. (Linear programming / Polynomial interpolation) [DMT,DY] Study of privacy leads to a variety of mathematical challenges! Plan
[Dinur-Nissim] Simple Model (easily justifiable) Database: n-bit binary vector x Query: vector a True answer: Dot product ax Response is ax + e = True Answer + Noise Privacy Leakage: Attacker learns a certain bit of x. Blatant Non-Privacy: Attacker learns n−o(n) bits of x. Model
Fourier attack Theorem: If a curator adds o(√n) noise to every response; then an attacker can ask n questions, perform O(n log n) computation and recover n-o(n) bits of the database. • Put database records in one-to-one correspondence with elements of a group . • Think of a database as a function D from to {0,1}. • Choose queries to ask for Fourier coefficients of D. • Noisy Fourier coefficients approximately determine the Boolean function D! (Parseval identity).
Theorem: If a curator adds o(√n) noise to 0.773 fraction of responses; then an attacker can ask O(n) questions, perform O(n3) computation and recover n-o(n) bits of the database. Arbitrarily large error on arbitrary and unknown0.239 fraction on answers. Linear programming attack
Ask O(n) random +1/-1 questions Obtain y=Ax+e, where e is the error vector A natural approach to recover x from y: Solve: min |e'|0 such that y=Ax'+e‘, x' in Rn(hard!) Solve a linear program [D, CT, MT]: min |e'|1 such that y=Ax'+e' x' in Rn Linear programming attack Ax' y
Polynomial interpolation attack • Model: Questions have O(c) large coefficients • Theorem: If a curator adds o(c) noise to 0.501 fraction of responses; then an attacker can ask c questions, perform O(c4) computation and reliably recover any particular bit of the database. • Arbitrarily large error on arbitrary and unknown0.499 fraction on answers.
Assume c is prime. Think of the space of queries as a linear space . To obtain a reliable answer to query x = (1,0, … , 0) , draw a degree two curve through x. Ask all queries that correspond to points on the curve. Use polynomial interpolation to carefully combine the answers. Polynomial interpolation attack q1 q5 x q6 q4 q2 q3
Implications • Privacy has a Price • There is no safe way to avoid increasing the noise as the number of queries increases • Applies to Non-Interactive Setting • Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack. • Cannot just output a noisy table.
Helps to know what you need • Non-interactive approach has inherent limitations • Interactive approach works • Can also publish a summary, as long as its clear which stats are accurate, and which ones are not. • Future directions: • Fewer queries • Understand what can and what cannot be done privately