The Price of Privacy and the Limits of LP decoding

Presentation Transcript


  1. The Price of Privacy and the Limits of LP decoding Kunal Talwar, MSR SVC [Dwork, McSherry, Talwar, STOC 2007]

  2. Teaser Compressed Sensing: If x ∈ R^N is k-sparse, take M ≈ C·k·log(N/k) random Gaussian measurements; then L1 minimization recovers x. For what k does this make sense (i.e. M < N)? How small can C be?

  3. Outline • Privacy motivation • Coding setting • Results • Proof Sketch

  4. Setting • Database of information about individuals • E.g. Medical history, Census data, Customer info. • Need to guarantee confidentiality of individual entries • Want to make deductions about the database; learn large scale trends. • E.g. Learn that drug V increases likelihood of heart disease • Do not leak info about individual patients [Figure: analyst querying the curator of the database]

  5. Dinur and Nissim [2003] • Simple Model (easily justifiable) • Database: n-bit binary vector x • Query: vector a • True answer: dot product a·x • Response is a·x + e = true answer + noise • Blatant Non-Privacy: Attacker learns n − o(n) bits of x. • Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private even against a polynomial time adversary asking O(n log² n) random questions.
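A minimal sketch of this style of attack, assuming random 0/1 query vectors, a uniform noise bound E on every response, and scipy's LP solver; the sizes and the noise level below are illustrative, not the theorem's parameters:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, q = 128, 2000                        # database size, number of random queries
x = rng.integers(0, 2, size=n)          # secret database bits
A = rng.integers(0, 2, size=(q, n))     # random subset-sum queries
E = 2.0                                 # per-response noise bound (o(sqrt(n)))
r = A @ x + rng.uniform(-E, E, size=q)  # noisy responses

# LP feasibility: find any y in [0,1]^n with |A y - r| <= E, then round to bits.
A_ub = np.vstack([A, -A]).astype(float)
b_ub = np.concatenate([r + E, E - r])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")
x_hat = np.rint(res.x).astype(int)
print("bits recovered:", int((x_hat == x).sum()), "of", n)
```

The true database x is always feasible here; roughly, the theorem above says that with enough random queries and o(√n) noise, every feasible point, once rounded, agrees with x on all but o(n) coordinates.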

  6. Implications Privacy has a Price • There is no safe way to avoid increasing the noise as the number of queries increases. Applies to Non-Interactive Setting • Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to the DiNi attack. This work: what if most responses have small error, but some can be arbitrarily off?

  7. Error correcting codes: Model • Real vector x ∈ R^n • Matrix A ∈ R^{m×n} with i.i.d. Gaussian entries • Transmit codeword Ax ∈ R^m • Channel corrupts message. Receive y = Ax + e • Decoder must reconstruct x, assuming e has small support • small support: at most ρm entries of e are non-zero. [Figure: encoder → channel → decoder]

  8. The Decoding problem • min support(e′) such that y = Ax′ + e′, x′ ∈ R^n: solving this would give the original message x. • min |e′|₁ such that y = Ax′ + e′, x′ ∈ R^n: this is a linear program; solvable in poly time.
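A minimal sketch of the second program, again with scipy's LP solver: |y − Ax′|₁ is minimized by introducing slack variables t with −t ≤ y − Ax′ ≤ t. The dimensions and the 10% error rate are illustrative, not the paper's constants.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 50, 200                           # message length, codeword length (m = 4n)
A = rng.standard_normal((m, n))          # i.i.d. Gaussian encoding matrix
x = rng.standard_normal(n)               # message

e = np.zeros(m)                          # gross errors on a 0.1 fraction of entries
bad = rng.choice(m, size=m // 10, replace=False)
e[bad] = 10.0 * rng.standard_normal(bad.size)
y = A @ x + e                            # received word

# Variables [x' (n), t (m)]: minimize sum(t) s.t. A x' - t <= y and -A x' - t <= -y.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_hat = res.x[:n]
print("reconstruction error:", np.linalg.norm(x_hat - x))   # typically ~0
```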

  9. LP decoding works • Theorem [Donoho / Candes-Rudelson-Tao-Vershynin]: For an error rate ρ < 1/2000, LP decoding succeeds in recovering x (for m = 4n). • This talk: How large an error rate can LP decoding tolerate?

  10. Results • Let ρ* = 0.2390318914495168038956510438285657… • Theorem 1: For any ρ < ρ*, there exists c such that if A has i.i.d. Gaussian entries, and if • A has m = cn rows • for k = ρm, some support-k vector e_k satisfies |e − e_k| < α, then LP decoding reconstructs x′ where |x′ − x|₂ is O(α ∕ √n). • Theorem 2: For any ρ > ρ*, LP decoding can be made to fail, even if m grows arbitrarily.

  11. Results • In the privacy setting: Suppose, for ρ < ρ*, the curator • answers a (1 − ρ) fraction of questions within error o(√n) • answers a ρ fraction of the questions arbitrarily. Then the curator is blatantly non-private. • Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1. • Attack works in non-interactive setting as well. • Also leads to error correcting codes over finite alphabets.

  12. In Compressed sensing lingo • Theorem 1: For any ρ < ρ*, there exists c such that if B has i.i.d. Gaussian entries, and if • B has M = (1 − c)N rows • then for k = ρN and any vector x ∈ R^N, given Bx, LP decoding reconstructs x′ with |x′ − x|₂ controlled by the ℓ1 distance from x to its nearest support-k vector (the analogue of the bound in Theorem 1).

  13. Rest of Talk • Let ρ* = 0.2390318914495168038956510438285657… • Theorem 1 (α = 0): For any ρ < ρ*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most ρm, then LP decoding accurately reconstructs x. • Proof sketch…

  14. Scale and translation invariance • LP decoding is scale and translation invariant • Thus, without loss of generality, transmit x = 0 • Thus receive y = Ax + e = e • If reconstruct z ≠ 0, then |z|₂ = 1 • Call such a z bad for A.
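In more detail, with the notation above: the decoder solves min over x′ of |y − Ax′|₁ with y = Ax + e, and substituting x′ = x + z turns this into min over z of |e − Az|₁, which no longer involves x; so one may take the transmitted word to be x = 0 and y = e. Decoding then fails exactly when some z ≠ 0 satisfies |Az − e|₁ ≤ |e|₁, and this condition is unchanged under the scaling (e, z) → (λe, λz), so a failing z can be normalized to |z|₂ = 1.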

  15. Proof Outline Proof: • Any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm) • Net argument to extend to R^n: Pr[∃ bad z] ≤ exp(−c′m) Thus, with high probability, A is such that LP decoding never fails.

  16. Suppose z is bad… • z bad: |Az − e|₁ < |A·0 − e|₁ ⇒ |Az − e|₁ < |e|₁ • Let e have support T. • Without loss of generality, e|_T = Az|_T • Thus z bad: |Az|_{T^c} < |Az|_T ⇒ |Az|_T > ½ |Az|₁ [Figure: the received word y = e, supported on T, alongside Az = (a₁z, a₂z, …, a_m z)]

  17. Suppose z is bad… A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian. Let W = Az; its entries W₁, …, W_m are i.i.d. Gaussians. z bad ⇒ Σ_{i∈T} |W_i| > ½ Σ_i |W_i|. Recall: |T| ≤ ρm. Define S_ρ(W) to be the sum of magnitudes of the top ρ fraction of entries of W. Thus z bad ⇒ S_ρ(W) > ½ S₁(W). Few Gaussians with a lot of mass!
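A quick numerical look at S_ρ (a sketch assuming W has i.i.d. standard normal entries; only the ratio S_ρ/S₁ matters, so the variance is irrelevant):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal(200_000)

def S(rho, W):
    # Sum of the magnitudes of the top rho-fraction of entries of W.
    k = int(np.ceil(rho * len(W)))
    return np.sort(np.abs(W))[::-1][:k].sum()

for rho in (0.05, 0.10, 0.20, 0.30):
    print(rho, S(rho, W) / S(1.0, W))   # the ratio crosses 1/2 between 0.2 and 0.3
```

For small ρ the top ρ fraction carries well under half the ℓ1 mass, which is why a bad z is unlikely; the exact crossover is the subject of the next slide.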

  18. Defining* • Let us look at E[S] • Let w*be such that • Let * = Pr[|W| ¸w*] • Then E[S*] = ½ E[S1] • Moreover, for any <*, E[S]·(½ – ) E[S1] w* E[S*] =½ E[S1] E[S]

  19. Concentration of measure • S_ρ depends on many independent Gaussians. • Gaussian isoperimetric inequality implies: with high probability, S_ρ(W) is close to E[S_ρ]. S₁ is similarly concentrated. • Thus Pr[z is bad] ≤ exp(−cm).
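One way to see the exp(−cm) rate: S_ρ(W) is a maximum of linear functions of W (choose the ρm coordinates and their signs), each with ℓ2 gradient norm √(ρm), so S_ρ is √(ρm)-Lipschitz; Gaussian concentration for Lipschitz functions then makes deviations of order m from E[S_ρ] occur with probability exp(−Ω(m)), and by the previous slide the bad event S_ρ(W) > ½ S₁(W) requires such a deviation when ρ < ρ*.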

  20. Beyond * Now E[S] > ( ½ + ) E[S1] Similar measure concentration argument shows that any zis bad with high probability. Thus LP decoding fails w.h.p. beyond * Donoho/CRTV experiments used random error model. E[S*] =½ E[S1] E[S]

  21. Teaser Compressed Sensing: If x ∈ R^N is k-sparse, take M ≈ C·k·log(N/k) random Gaussian measurements; then L1 minimization recovers x. For what k does this make sense (i.e. M < N)? How small can C be? k < ρ*N ≈ 0.239 N; C > (ρ* log 1/ρ*)⁻¹ ≈ 2.02.
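Reading off the constants (a sketch assuming the log in M ≈ C·k·log(N/k) is base 2, which is the convention that reproduces the ≈ 2.02 above): Theorem 2 says recovery fails for k above ρ*N even with nearly N measurements, so no constant C smaller than 1/(ρ*·log₂(1/ρ*)) could make "Ck log(N/k) measurements suffice" true; otherwise that count would drop below N for some k slightly above ρ*N.

```python
import math
from scipy.stats import norm

rho_star = 2 * norm.sf(math.sqrt(2 * math.log(2)))   # ~0.2390, from slide 18
C_min = 1 / (rho_star * math.log2(1 / rho_star))
print(rho_star, C_min)                               # 0.23903..., 2.026 (the slide rounds to 2.02)
```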

  22. Summary • Tight threshold for Gaussian LP decoding • To preserve privacy: lots of error in lots of answers. • Similar results hold for +1/−1 queries. • Inefficient attacks can go much further: • Correct a (½ − ε) fraction of wild errors. • Correct a (1 − ε) fraction of wild errors in the list decoding sense. • Efficient versions of these attacks? • Dwork-Yekhanin: (½ − ε) using AG codes.
