
Foundations of Privacy Lecture 6



Presentation Transcript


  1. Foundations of Privacy, Lecture 6. Lecturer: Moni Naor

  2. Recap of last week’s lecture • Counting Queries • The BLR Algorithm • Efficient Algorithm • Hardness Results

  3. Synthetic DB: Output is a DB. [Figure: the sanitizer sits between the database and the user; the user sends query 1, query 2, … and gets back answer 1, answer 2, answer 3.] Synthetic DB: the output is also a DB (of entries from the same universe X), and the user reconstructs answers by evaluating each query on the output DB. This is both software- and people-compatible, and the answers are consistent.

  4. Counting Queries. Database D of size n; queries with low sensitivity. Counting queries: C is a set of predicates c: U → {0,1}. Query: how many participants of D satisfy c? Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is anyway inherent in statistical analysis. Assume all queries are given in advance (the non-interactive setting).
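
As a concrete aside (not from the slides): a minimal sketch of one standard way to answer a single counting query within small additive error, the Laplace mechanism; the names db, predicate, and eps are illustrative.

```python
import random

def counting_query(db, predicate):
    # Exact answer: how many rows of db satisfy the predicate c: U -> {0,1}.
    return sum(predicate(row) for row in db)

def noisy_count(db, predicate, eps):
    # A counting query has sensitivity 1 (changing one row moves the count
    # by at most 1), so Laplace noise of scale 1/eps gives eps-differential
    # privacy and additive error O(1/eps) w.h.p. The difference of two
    # i.i.d. exponentials is exactly Laplace-distributed.
    noise = random.expovariate(eps) - random.expovariate(eps)
    return counting_query(db, predicate) + noise

# Illustrative usage on a toy database of ages:
db = [23, 45, 31, 67, 52]
print(noisy_count(db, lambda age: age > 40, eps=0.5))
```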

  5. And Now… Bad News. The sanitizer's runtime cannot be subpolynomial in |C| or |U|. This holds both when the output is a synthetic DB (as in the positive result) and for general output. In particular, the Exponential Mechanism cannot be implemented efficiently. We want hardness… got crypto?

  6. The Bad News. For large C and U we can't get efficient sanitizers! This holds both when the output is a synthetic DB (as in the positive result) and for general output, where the Exponential Mechanism cannot be implemented efficiently. We want hardness… got crypto?

  7. Showing (Cryptographic) Hardness. We have to come up with a universe U and a concept class C, together with a distribution on databases and concepts that is hard to sanitize. The distribution may use cryptographic primitives.

  8. Digital Signatures. [Figure: messages m1, m2, …, mn with valid signatures sig(m1), sig(m2), …, sig(mn) under vk, and a forged pair (m′, sig(m′)).] A digital signature scheme has a key pair (sk, vk) and can be built from any one-way function [NaYu, Ro]. It is hard to forge a valid signature on a new message.
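
The lecture only assumes that some signature scheme exists (e.g. from one-way functions); purely for illustration, here is a minimal sketch of the (sk, vk)/sign/verify interface, instantiated with Ed25519 from the pyca/cryptography package.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

sk = Ed25519PrivateKey.generate()   # signing key sk
vk = sk.public_key()                # verification key vk

msg = b"m1"
sig = sk.sign(msg)                  # sig(m1): hard to forge for new messages

def valid(vk, m, s):
    # Returns 1 iff s is a valid signature of m under vk.
    try:
        vk.verify(s, m)
        return 1
    except InvalidSignature:
        return 0

assert valid(vk, msg, sig) == 1
assert valid(vk, b"m'", sig) == 0   # a signature does not transfer to m'
```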

  9. Signatures ⇒ No Synthetic DB. Universe: pairs (m, s) of message and signature. Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk. [Figure: the input DB (m1, sig(m1)), …, (mn, sig(mn)), all valid under vk, goes through the sanitizer to an output DB (m′1, s1), …, (m′k, sk).] For the output to be useful, most of its entries must be valid signatures under the same vk; since forging new signatures is infeasible, some input entry must appear in the output, so there is no privacy!
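
To make the instance concrete, a sketch reusing sk, vk, and valid() from the previous sketch (all names illustrative): the database holds valid (message, signature) pairs, and c_vk is a counting query over it.

```python
# Database of valid message/signature pairs under one verification key vk.
msgs = [f"m{i}".encode() for i in range(1, 6)]
db = [(m, sk.sign(m)) for m in msgs]              # entries (m_i, sig(m_i))

def c_vk(entry):
    # The counting-query predicate: is this entry a valid pair under vk?
    m, s = entry
    return valid(vk, m, s)

# Every input entry satisfies c_vk, so a useful synthetic DB must consist
# mostly of valid pairs; unforgeability means it can only copy the input.
assert sum(c_vk(e) for e in db) == len(db)
```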

  10. Can We Output a Synthetic DB Efficiently? A 2×2 table over the regimes |C| subpolynomial vs. polynomial and |U| subpolynomial vs. polynomial; at this point every cell is still a question mark.

  11. Where is the Hardness Coming From? In the signature example it is hard to satisfy a given query, yet easy to maintain utility for all queries but one. More natural would be the reverse: each individual query is easy to satisfy, but it is hard to maintain utility for most queries.

  12. Hardness on Average. [Figure: input DB of triples (vk, m1, sig(m1)), …, (vk, mn, sig(mn)), all valid under vk; the sanitizer outputs triples (vk′1, m′1, s1), …, (vk′k, m′k, sk). Are these output keys related to vk? Yes! At least one is vk.] Use an error-correcting code ECC. Universe: triples (vk, m, s) of key, message, signature. Queries: c_i(vk, m, s) gives the i-th bit of ECC(vk); c_v(vk, m, s) is 1 iff s is a valid signature under vk.

  13. Hardness on Average. Samples: triples (vk, m, s) of key, message, signature. Queries: c_i(vk, m, s) gives the i-th bit of ECC(vk); c_v(vk, m, s) is 1 iff s is a valid signature under vk. Utility on the c_i queries means that for every i, at least 3/4 of the vk′_j agree with ECC(vk)[i]; by averaging, there exists a vk′_j such that ECC(vk′_j) and ECC(vk) are 3/4-close. Then vk′_j = vk (by the distance of the error-correcting code), and m′_j appears in the input. No privacy!
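
A toy illustration of the decoding step (not from the slides): a 5-fold repetition code stands in for the ECC, and recover_vk carries out the averaging-plus-decoding argument; all names and parameters are illustrative.

```python
def ecc(bits, rep=5):
    # Encode a bit string with a repetition code (a stand-in ECC; the
    # argument only needs a code that decodes from 3/4-agreement).
    return [b for b in bits for _ in range(rep)]

def agreement(u, v):
    # Fraction of coordinates on which two codewords agree.
    return sum(a == b for a, b in zip(u, v)) / len(u)

def recover_vk(output_keys, true_codeword):
    # If for every coordinate i at least 3/4 of the output keys agree with
    # ECC(vk)[i], then the average agreement is >= 3/4, so some single key
    # is >= 3/4-close; the code's distance then forces that key to be vk.
    for vk_prime in output_keys:
        if agreement(ecc(vk_prime), true_codeword) >= 0.75:
            return vk_prime
    return None

vk = [1, 0, 1, 1]
candidates = [[0, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]]  # one of them is vk
assert recover_vk(candidates, ecc(vk)) == vk
```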

  14. Where is the Hardness Coming From? As before: in the signature example it is hard to satisfy a given query but easy to maintain utility for all queries but one, whereas more naturally each individual query is easy to satisfy and it is hard to maintain utility for most queries. Ullman-Vadhan: even marginals on 2 variables are hard.

  15. Can We Output a Synthetic DB Efficiently? The same 2×2 table over |C| and |U| (subpoly vs. poly), now with cells filled in by the hardness results: Signatures, Hard on Avg., Using PRFs; the remaining cell stays a question mark.

  16. Hardness with PRFs. Let F = {f_s | s a seed} be a family of pseudo-random functions with seed length k: a family of efficiently computable functions f_s: [ℓ] → [ℓ] such that a random function from the family is indistinguishable (via black-box access) from a truly random function. Data universe: U = {(a, b) : a, b ∈ [ℓ]} (polynomial size). Concepts: C = {c_s | s a seed}, where c_s((a, b)) = 1 iff f_s(a) = b (polynomial size).

  17. The Hard-to-Sanitize Distribution. The distribution D on samples: generate a seed s ∈ {0, 1}^k; generate n distinct elements a_1, …, a_n ∈ [ℓ]; the i-th entry in the database X is x_i = (a_i, f_s(a_i)). Claim: any differentially private sanitizer A cannot be better than 1/3 correct.
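
A sketch of sampling from this distribution (not from the slides): HMAC-SHA256 truncated into [ℓ] stands in for the PRF family {f_s}, and ℓ, n, and the seed length are illustrative.

```python
import hashlib
import hmac
import random
import secrets

ELL = 2**20                                   # size of the domain/range [l]

def f(s, a):
    # PRF f_s: [l] -> [l], instantiated here (for illustration only)
    # as HMAC-SHA256 with key s, truncated into the range [l].
    digest = hmac.new(s, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % ELL

s = secrets.token_bytes(16)                   # seed s in {0,1}^k
a_vals = random.sample(range(ELL), 100)       # n distinct elements a_1..a_n
X = [(a, f(s, a)) for a in a_vals]            # entries x_i = (a_i, f_s(a_i))

def c_s(entry):
    # The concept c_s: outputs 1 iff the pair is consistent with f_s.
    a, b = entry
    return 1 if f(s, a) == b else 0

assert all(c_s(x) == 1 for x in X)            # the whole input satisfies c_s
```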

  18. Since f_s is a pseudorandom function, with overwhelming probability over the choice of seed s, for any a ∈ [ℓ] that does not appear among a_1, …, a_n, the sanitizer A cannot predict f_s(a) any better than it could predict a truly random function, i.e. with probability noticeably greater than 1/ℓ. So we expect no more than a (1/ℓ + negl())-fraction of the a's in A(X) that are not in X to appear with the correct b. Suppose this event does not occur; then A could be used to break the pseudorandomness of f_s. Since all of the items in the input X satisfy the concept c_s, a sanitizer that maintains utility must therefore take most of its correct pairs from the input itself.

  19. General-Output Sanitizers. Theorem: traitor-tracing schemes exist if and only if sanitizing is hard. There is a tight connection between the sizes |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing. The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.

  20. Traitor Tracing: The Problem. A center transmits a message to a large group. Some users leak their keys to pirates, and the pirates construct a clone: an unauthorized decryption device. Given a pirate box, we want to find who leaked the keys. [Figure: keys K1, K3, K8 are leaked into a pirate box that decrypts E(Content); the traitors' "privacy" is violated!]

  21. Traitor Tracing ⇒ Hard Sanitizing. A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace. Setup generates a key bk for the broadcaster and N subscriber keys k_1, …, k_N. Encrypt, given a bit b, generates a ciphertext using the broadcaster's key bk. Decrypt takes a ciphertext and, using any one of the subscriber keys, retrieves the original bit. The tracing algorithm gets bk and oracle access to a pirate decryption box, and outputs an index i ∈ {1, …, N} of a key k_i used to create the pirate box. We need semantic security!

  22. Simple Example of Tracing Traitors. Let E_K(m) be a good shared-key encryption scheme. Key generation: generate independent keys, bk = k_1, …, k_N. Encrypt: for a bit b, generate independent ciphertexts E_{k_1}(b), E_{k_2}(b), …, E_{k_N}(b). Decrypt using k_i: decrypt the i-th ciphertext. Tracing algorithm: a hybrid argument, sketched below. Properties: ciphertext length N, key length 1.
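
A runnable sketch of this simple scheme and its hybrid-argument tracer (illustrative only; Fernet from the pyca/cryptography package stands in for E_K, and all names are my own):

```python
from cryptography.fernet import Fernet

N = 8
keys = [Fernet.generate_key() for _ in range(N)]       # bk = (k_1, ..., k_N)

def encrypt(bit):
    # Ciphertext: N independent encryptions E_{k_1}(b), ..., E_{k_N}(b).
    return [Fernet(k).encrypt(str(bit).encode()) for k in keys]

def decrypt(i, ct):
    # User i decrypts the i-th component with its own key k_i.
    return int(Fernet(keys[i]).decrypt(ct[i]))

def hybrid(i):
    # Hybrid H_i: the first i components encrypt 1, the rest encrypt 0.
    return [Fernet(keys[j]).encrypt(b"1" if j < i else b"0") for j in range(N)]

def trace(pirate_box):
    # The pirate box decrypts H_0 to 0 and H_N to 1, so its answer flips at
    # some step; semantic security pins the flip on a key it actually holds.
    answers = [pirate_box(hybrid(i)) for i in range(N + 1)]
    for i in range(1, N + 1):
        if answers[i] != answers[i - 1]:
            return i - 1                               # index of a leaked key

# Illustrative pirate box built from a single leaked key:
leaked = 3
pirate = lambda ct: int(Fernet(keys[leaked]).decrypt(ct[leaked]))
assert trace(pirate) == leaked
```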

  23. Equivalence of TT and Hardness of Sanitizing. The correspondence:
      Traitor tracing               ↔  Sanitizing hard for a distribution of DBs
      (collection of) keys          ↔  database entries
      (collection of) ciphertexts   ↔  queries
      TT pirate                     ↔  sanitizer

  24. Traitor Tracing ⇒ Hard Sanitizing. Theorem: if there exists a TT scheme with ciphertext length c(n) and key length k(n), we can construct a query set C of size ≈ 2^{c(n)}, a data universe U of size ≈ 2^{k(n)}, and a distribution D on n-user databases with entries from U such that D is "hard to sanitize": there exists a tracer that can extract a database entry from any sanitizer's output, violating its privacy! The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.

  25. Traitor Tracing ⇒ Hard Sanitizing (definition repeated). As on slide 21: a (private-key) traitor-tracing scheme consists of Setup (generating the broadcaster key bk and subscriber keys k_1, …, k_N), Encrypt, Decrypt, and a tracing algorithm that, given bk and oracle access to a pirate decryption box, outputs an index i ∈ {1, …, N} of a key k_i used to create it. We need semantic security!

  26. Collusion. An important parameter of a traitor-tracing scheme is its collusion resistance: a scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder. When t = N the scheme is said to be fully resilient. Other parameters are the ciphertext and private-key lengths c(n) and k(n). We need a one-time t-resilient TT scheme, where semantic security is only guaranteed against adversaries given a single ciphertext.

  27. The construction. Data universe: all possible keys, U = {0,1}^{k(n)}. Concept class C: a concept for every possible ciphertext, i.e. for every m ∈ {0,1}^{c(n)} the concept c_m, on input a key string K, outputs the decryption of m using the key K. Hard-to-sanitize distribution: run Setup to generate n decryption keys for the users; these form the database X.

  28. We can view any sanitizer that maintains utility as an adversary that outputs an "object" that decrypts encryptions of 0 or 1 correctly. We can therefore run the traitor-tracing algorithm on such a sanitizer's output to trace one of the keys in the sanitizer's input.

  29. From Hard-to-Sanitize to Tracing Traitors. Given hard-to-sanitize distributions, we can create a weak TT scheme. Key generation: generate a database of individuals; each key is a separate subset of it. Ciphertexts correspond to queries: knowing the individuals allows approximating the query on the database. Coordination is needed between the different parts, since the approximations may differ.

  30. The Interactive Model. [Figure: the sanitizer sits between the data and the user; query 1, query 2, … arrive and are answered one at a time.] Multiple queries, chosen adaptively.

  31. Counting Queries: Answering Queries Interactively. Database D of size n. Counting queries: C is a set of predicates c: U → {0,1}. Query: how many participants of D satisfy c? Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is anyway inherent in statistical analysis. Now the queries are given one by one and each must be answered as it arrives (the interactive setting).

  32. Can we answer queries when they are not known in advance? We can always answer with independent noise, but that limits us to a number of queries smaller than the database size. We do not know the future, but we do know the past! We can answer based on past answers.

  33. Idea: Maintain a List of Possible Databases. Start with D_0 = the list of all databases of size m. In each round j: if the list D_{j-1} is representative, answer according to the average database in the list; otherwise, prune the list to maintain consistency, obtaining D_j from D_{j-1}.

  34. Initialize D_0 = {all databases of size m over U}. In each round, D_{j-1} = {x_1, x_2, …}, where each x_i is of size m. For each query c_1, c_2, …, c_k in turn: let A_j ← Average_{x_i ∈ D_{j-1}} min{d(x*, x_i), √n}, a low-sensitivity quantity (d measures the disagreement with the true database x* on the current query). If A_j is small (below a noisy threshold): answer according to the median database in D_{j-1}, and set D_j ← D_{j-1}. If A_j is large: remove all databases that are far away, to get D_j, and give the true answer plus noise.
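
A schematic sketch of this loop (not the lecture's exact mechanism): cap plays the role of √n, noise() is a placeholder for calibrated Laplace noise on both the threshold and the released answers, and the candidate list is passed in explicitly (enumerating all size-m databases is exactly what makes the algorithm inefficient).

```python
import statistics

def count(db, c):
    # Exact counting-query answer on a (candidate or true) database.
    return sum(c(row) for row in db)

def interactive_sanitizer(true_db, candidates, queries, cap, threshold, noise):
    Dj = list(candidates)                       # D_0: candidate databases
    for c in queries:
        true_ans = count(true_db, c)
        # A_j <- average over x_i of min{d(x*, x_i), cap}; capping the
        # per-candidate term keeps A_j low-sensitivity in the true database.
        Aj = statistics.mean(min(abs(count(x, c) - true_ans), cap)
                             for x in Dj)
        if Aj < threshold + noise():
            # Small round: the list is representative; answer with the
            # median candidate's answer and keep D_j = D_{j-1}.
            yield statistics.median(count(x, c) for x in Dj)
        else:
            # Large round: prune candidates that are far away on this
            # query, then release the true answer plus noise.
            Dj = [x for x in Dj if abs(count(x, c) - true_ans) < cap]
            yield true_ans + noise()
```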

  35. Need to show. Accuracy and functionality: the result is accurate; if A_j is large, then many of the x_i ∈ D_{j-1} are removed; and D_j is never empty. Privacy: there are not many large rounds A_j; we can release which rounds are large; and we can release the noisy answers.

  36. Why can we release when large rounds occur? We do not expect more than O(m) large rounds, and we make the threshold noisy. For every pair of neighboring databases D and D′, consider the vector of noisy thresholds: in rounds far away from the threshold, the released bit can be the same in both; in rounds close to the threshold it can be corrected at some (privacy) cost; and such rounds cannot occur too frequently.
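
A schematic sketch of the noisy-threshold idea (in the spirit of the sparse vector technique; the noise scales are illustrative and not calibrated):

```python
import random

def lap(scale):
    # Laplace noise as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def large_round_bits(values, threshold, eps):
    # Release only the per-round bit "did the (noisy) value cross the
    # (noisy) threshold?". Rounds far from the threshold produce the same
    # bit on neighboring databases; only the few near-threshold (large)
    # rounds cost privacy budget.
    t_hat = threshold + lap(2.0 / eps)
    for v in values:
        yield (v + lap(4.0 / eps)) >= t_hat
```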

  37. Why is there a good x_i? Recall the setting: database D of size n; counting queries, where C is a set of predicates c: U → {0,1}; query: how many participants of D satisfy c?; relaxed accuracy: answer within α additive error w.h.p. (such error is anyway inherent in statistical analysis). The reason a good candidate exists: a sample F of size m approximates D on all the given queries c.

  38. m is Õ(n^{2/3} log|C|). There exists F_good of size m = Õ((n/α)^2 · log|C|) such that max_{c_j} dist(F_good, D) ≤ α. For α = Õ(n^{2/3} log|C|), m is Õ(n^{2/3} log|C|).
