Foundations of Privacy Lecture 6

Lecturer: Moni Naor

Recap of last week’s lecture • Counting Queries • The BLR Algorithm • Efficient Algorithm • Hardness Results

Synthetic DB: Output is a DB
[Figure: the user sends query 1, query 2, ... to the Sanitizer, which holds the Database, and receives answer 1, answer 2, answer 3]
Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating the query on the output DB.
• Software and people compatible
• Consistent answers

Counting Queries
Database D of size n
• Queries with low sensitivity
• Counting queries: C is a set of predicates c: U → {0,1}
• Query: how many participants in D satisfy c?
• Relaxed accuracy: answer each query within α additive error w.h.p.
• Not so bad: error is inherent in statistical analysis anyway
• Non-interactive: assume all queries are given in advance
[Figure: query c over universe U]
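
A minimal Python sketch of the definitions on this slide, assuming a toy universe of small integers. The names `counting_query` and `is_alpha_accurate` and the example data are illustrative and do not come from the lecture.

```python
from typing import Callable, Sequence

Predicate = Callable[[int], int]  # c: U -> {0, 1}

def counting_query(db: Sequence[int], c: Predicate) -> int:
    """How many participants in the database satisfy c?"""
    return sum(c(x) for x in db)

def is_alpha_accurate(answer: float, db: Sequence[int], c: Predicate, alpha: float) -> bool:
    """Relaxed accuracy: the answer is within additive error alpha of the true count."""
    return abs(answer - counting_query(db, c)) <= alpha

# Toy data (assumption): universe U = {0, ..., 9}, database of size n = 6.
db = [1, 3, 4, 4, 7, 8]
is_even: Predicate = lambda x: int(x % 2 == 0)

true_count = counting_query(db, is_even)

# Low sensitivity: replacing one participant moves the count by at most 1.
neighbor = [1, 3, 4, 4, 7, 9]
sensitivity_gap = abs(counting_query(neighbor, is_even) - true_count)
```

Here the neighboring database differs from `db` in a single entry, which is the notion of sensitivity the slide relies on.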

And Now… Bad News
Runtime cannot be subpoly in |C| or |U|:
• Output is a synthetic DB (as in the positive result)
• General output
The Exponential Mechanism cannot be implemented efficiently.
Want hardness… Got Crypto?

The Bad News
For large C and U we can’t get efficient sanitizers!
• Output is a synthetic DB (as in the positive result)
• General output
The Exponential Mechanism cannot be implemented efficiently.
Want hardness… Got Crypto?

Showing (Cryptographic) Hardness
• Have to come up with a universe U and a concept class C
• A distribution on databases and concepts that is hard to sanitize
• The distribution may use cryptographic primitives

Digital Signatures
• Key pair (sk, vk)
• Can be built from one-way functions [NaYu, Ro]
• Given valid signatures sig(m1), sig(m2), …, sig(mn) under vk, it is hard to forge a signature sig(m’) on a new message m’

Signatures ⇒ No Synthetic DB
• Universe: (m, s) message, signature pairs
• Queries: cvk(m, s) outputs 1 iff s is a valid signature of m under vk
• Input database: (m1, sig(m1)), (m2, sig(m2)), …, (mn, sig(mn)), all valid signatures under the same vk
• Sanitizer output: (m’1, s1), …, (m’k, sk); most are valid signatures under vk
• Inputs appear in the output: no privacy!

Can We Output a Synthetic DB Efficiently?
[Table: |C| (subpoly, poly) versus |U| (subpoly, poly); the cells shown are all open: ? ? ?]

Where is the Hardness Coming From?
Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one
More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Hardness on Average
• Universe: (vk, m, s) key, message, signature triples
• Queries: ci(vk, m, s) is the i-th bit of ECC(vk), where ECC is an error-correcting code; cv(vk, m, s) is 1 iff s is a valid signature under vk
• Input database: (vk, m1, sig(m1)), (vk, m2, sig(m2)), …, (vk, mn, sig(mn)), valid signatures under vk
• Sanitizer output: (vk’1, m’1, s1), …, (vk’k, m’k, sk)
• Are these keys related to vk? Yes! At least one is vk!

Hardness on Average
• Samples: (vk, m, s) key, message, signature triples
• Queries: ci(vk, m, s) is the i-th bit of ECC(vk); cv(vk, m, s) is 1 iff s is a valid signature under vk
• Sanitizer output: (vk’1, m’1, s1), …, (vk’k, m’k, sk). Are these keys related to vk? Yes! At least one is vk!
• ∀i: 3/4 of the vk’j agree with ECC(vk)[i]
• ∃ vk’j s.t. ECC(vk’j) and ECC(vk) are 3/4-close
• vk’j = vk (error-correcting code)
• m’j appears in the input. No privacy!

Where is the Hardness Coming From?
Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one
More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries
Ullman-Vadhan: even marginals on 2 variables are hard

Can We Output a Synthetic DB Efficiently?
[Table: |C| (subpoly, poly) versus |U| (subpoly, poly); cells shown: ?, ?, ?; annotations: Signatures, Hard on Avg., Using PRFs]

Hardness with PRFs
• Let F = {fs | s seed} be a family of pseudo-random functions, with seed length k
• Pseudo-random functions: a family of efficiently computable functions such that a random function from the family is indistinguishable (via black-box access) from a truly random function. fs: [ℓ] → [ℓ]
• Data universe U = {(a, b) : a, b ∈ [ℓ]} (polynomial size)
• Concepts C = {cs | s seed}, where cs((a, b)) = 1 iff fs(a) = b (polynomial size)

The Hard-to-sanitize Distribution
The distribution D on samples:
• Generate a key s ∈ {0,1}^k
• Generate n distinct elements a1, …, an ∈ [ℓ]
• The i-th entry in the database X is xi = (ai, fs(ai))
Claim: any differentially private sanitizer A cannot be better than 1/3 correct
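
The sampling procedure above can be sketched in Python. This is only an illustration under stated assumptions: HMAC-SHA256 reduced mod ℓ stands in for the PRF fs, [ℓ] is taken to be {0, …, ℓ-1}, and the seeded `random.Random` generator is used for reproducibility rather than security. The names `f`, `concept` and `sample_database` are hypothetical.

```python
import hashlib
import hmac
import random
from typing import List, Tuple

ELL = 1 << 16      # ℓ: f_s maps [ℓ] -> [ℓ], with [ℓ] = {0, ..., ℓ-1} here
SEED_BYTES = 16    # seed length k, measured in bytes in this sketch

def f(seed: bytes, a: int) -> int:
    """Stand-in for the PRF f_s (assumption: HMAC-SHA256 reduced mod ℓ)."""
    digest = hmac.new(seed, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % ELL

def concept(seed: bytes, entry: Tuple[int, int]) -> int:
    """c_s((a, b)) = 1 iff f_s(a) = b."""
    a, b = entry
    return int(f(seed, a) == b)

def sample_database(n: int, rng: random.Random) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Draw a seed s and n distinct a_i, and return X with x_i = (a_i, f_s(a_i))."""
    seed = bytes(rng.getrandbits(8) for _ in range(SEED_BYTES))
    a_values = rng.sample(range(ELL), n)   # n distinct elements of [ℓ]
    return seed, [(a, f(seed, a)) for a in a_values]

rng = random.Random(0)
seed, X = sample_database(100, rng)
count_on_X = sum(concept(seed, x) for x in X)   # every entry satisfies c_s
```

By construction the count of cs on X is n, so a sanitizer that keeps this query accurate must output many pairs (a, b) with fs(a) = b.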

• The function fs is pseudorandom: with overwhelming probability over the choice of the seed s, for any a ∈ [ℓ] that does not appear among a1, …, an, a sanitizer A cannot predict fs(a) any better than it could predict a random function, i.e. not with probability noticeably greater than 1/ℓ.
• Expect: no more than a (1/ℓ + neg(·))-fraction of the a’s in A(X) that are not in X appear most frequently with the correct b.
• Suppose this event does not occur. Since all of the items in the input X satisfy the concept cs

General Output Sanitizers
Theorem: traitor-tracing schemes exist if and only if sanitizing is hard.
• Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing
• Separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme

Traitor Tracing: The Problem
• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: an unauthorized decryption device
• Given a pirate box, want to find who leaked the keys
[Figure: traitors holding keys K1, K3, K8 build a pirate box that maps E(Content) to Content; the traitors’ “privacy” is violated!]

Traitor Tracing ⇒ Hard Sanitizing
A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace.
• Setup: generates a key bk for the broadcaster and N subscriber keys k1, …, kN.
• Encrypt: given a bit b, generates a ciphertext using the broadcaster’s key bk.
• Decrypt: takes a ciphertext and, using any of the subscriber keys, retrieves the original bit.
• Trace: gets bk and oracle access to a pirate decryption box, and outputs the index i ∈ {1, …, N} of a key ki used to create the pirate box.
Need semantic security!

Simple Example of Tracing Traitors
• Let EK(m) be a good shared-key encryption scheme
• Key generation: generate independent keys for E; bk = k1, …, kN
• Encrypt: for bit b, generate independent ciphertexts Ek1(b), Ek2(b), …, EkN(b)
• Decrypt: using ki, decrypt the i-th ciphertext
• Tracing algorithm: using a hybrid argument
Properties: ciphertext length N, key length 1.
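
A hedged Python sketch of this simple scheme. The lecture does not specify E or the tracing procedure in detail, so several choices here are assumptions: E is instantiated as a nonce plus a pad bit derived from HMAC-SHA256, keys are indexed from 0, and the tracer estimates the pirate box's success rate on a sequence of hybrid ciphertexts and reports the slot where the rate drops most. All function names are hypothetical.

```python
import hashlib
import hmac
import random

def keygen(n_users, rng):
    """Broadcaster key bk = (k_1, ..., k_N): independent 128-bit keys."""
    return [rng.getrandbits(128).to_bytes(16, "big") for _ in range(n_users)]

def _pad(key, nonce):
    # One pseudorandom pad bit derived from (key, nonce).
    return hmac.new(key, nonce, hashlib.sha256).digest()[0] & 1

def enc_bit(key, bit, rng):
    """E_k(b): fresh nonce plus the bit masked by the pad."""
    nonce = rng.getrandbits(128).to_bytes(16, "big")
    return nonce, bit ^ _pad(key, nonce)

def dec_bit(key, ct):
    nonce, masked = ct
    return masked ^ _pad(key, nonce)

def encrypt(bk, bit, rng):
    """Ciphertext of length N: one independent encryption per subscriber key."""
    return [enc_bit(k, bit, rng) for k in bk]

def decrypt(i, key, ciphertext):
    """Subscriber i decrypts the i-th component."""
    return dec_bit(key, ciphertext[i])

def hybrid(bk, bit, j, rng):
    """Hybrid j: the first j slots encrypt random bits, the rest encrypt `bit`."""
    return [enc_bit(k, rng.getrandbits(1) if idx < j else bit, rng)
            for idx, k in enumerate(bk)]

def trace(bk, pirate_box, rng, trials=200):
    """Return the slot whose randomization hurts the pirate box the most."""
    def success_rate(j):
        hits = 0
        for _ in range(trials):
            bit = rng.getrandbits(1)
            hits += int(pirate_box(hybrid(bk, bit, j, rng)) == bit)
        return hits / trials
    rates = [success_rate(j) for j in range(len(bk) + 1)]
    drops = [rates[j] - rates[j + 1] for j in range(len(bk))]
    return max(range(len(bk)), key=lambda j: drops[j])

rng = random.Random(1)
bk = keygen(5, rng)
pirate_box = lambda ciphertext: decrypt(2, bk[2], ciphertext)  # built from key 2
traced = trace(bk, pirate_box, rng)
```

A box built from key 2 answers correctly on every hybrid that leaves slot 2 intact and is reduced to guessing once slot 2 is randomized, so the largest drop points at the leaked key.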

Equivalence of TT and Hardness of Sanitizing
Traitor Tracing             ↔  Sanitizing hard for distribution of DBs
(collection of) Key         ↔  Database entry
(collection of) Ciphertext  ↔  Query
TT Pirate                   ↔  Sanitizer

Traitor Tracing ⇒ Hard Sanitizing
Theorem: if there exists a TT scheme with ciphertext length c(n) and key length k(n), one can construct:
• Query set C of size ≈ 2^c(n)
• Data universe U of size ≈ 2^k(n)
• Distribution D on n-user databases with entries from U
D is “hard to sanitize”: there exists a tracer that can extract an entry in D from any sanitizer’s output, violating its privacy!
Separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.


Collusion
An important parameter of a traitor-tracing scheme is its collusion-resistance.
• A scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder.
• When t = N the scheme is said to be fully resilient.
• Other parameters: ciphertext and private-key lengths c(n) and k(n).
Need it
One-time t-resilient TT scheme: semantic security is only guaranteed against adversaries given a single ciphertext.

• Data universe: all possible keys, U = {0,1}^k(n).
• Concept class C: a concept for every possible ciphertext, i.e. for every m ∈ {0,1}^c(n).
• The concept cm, on input a key string K, outputs the decryption of m using the key K.
• Hard-to-sanitize distribution: run Setup to generate n decryption keys for the users; these keys form the database X.

• Can view any sanitizer that maintains utility as an adversary that outputs an “object” that decrypts encryptions of 0 or 1 correctly.
• We can use the traitor-tracing algorithm on such a sanitizer to trace one of the keys in the input of the sanitizer.

From Hard to Sanitize to Tracing Traitors
Given hard-to-sanitize distributions, can create a weak TT scheme:
• Ciphertext: generate database of individuals.
• Each key is a separate subset.
• Ciphertext corresponds to queries: knowing individuals allows approximating the query on the database.
• Need coordination between the different parts, since the approximations may differ.

Interactive Model
[Figure: the user sends query 1, query 2, … to the Sanitizer, which holds the Data]
Multiple queries, chosen adaptively

Counting Queries: Answering Queries Interactively
Database D of size n
• Counting queries: C is a set of predicates c: U → {0,1}
• Query: how many participants in D satisfy c?
• Relaxed accuracy: answer each query within α additive error w.h.p.
• Not so bad: error is inherent in statistical analysis anyway
• Interactive: queries are given one by one and should be answered as they arrive
[Figure: query c over universe U]

Can we answer queries when they are not known in advance?
• Can always answer with independent noise
• Limited to a number of queries that is smaller than the database size
• We do not know the future, but we do know the past!
• Can answer based on past answers
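
A minimal Python sketch of the independent-noise baseline, assuming the standard Laplace mechanism with the privacy budget ε split evenly over k counting queries (basic composition). The function names are illustrative, and the sketch assumes k is fixed up front.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noise_scale(num_queries, epsilon):
    """Each counting query has sensitivity 1 and receives budget epsilon / k."""
    return num_queries / epsilon

def answer_with_independent_noise(db, queries, epsilon, rng):
    """Answer every query with its true count plus fresh Laplace noise."""
    scale = noise_scale(len(queries), epsilon)
    return [sum(c(x) for x in db) + laplace(scale, rng) for c in queries]

rng = random.Random(0)
db = list(range(1000))
queries = [lambda x: int(x % 2 == 0), lambda x: int(x < 100)]
noisy_answers = answer_with_independent_noise(db, queries, epsilon=1.0, rng=rng)
```

The noise magnitude grows linearly with the number of queries, so once k approaches the database size n the noise is comparable to the counts themselves; this is the limitation the slide points to.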

Idea: Maintain a List of Possible Databases
• Start with D0 = list of all databases of size m
• Each round j:
• If the list Dj-1 is representative: answer according to the average database in the list
• Otherwise: prune the list to maintain consistency (Dj-1 → Dj)

Initialize D0 = {all databases of size m over U}.
• Each round, Dj-1 = {x1, x2, …} where each xi is of size m
For each query c1, c2, …, ck in turn:
• Let Aj ← Average over i ∈ Dj-1 of min{d(x*, xi), √n} (low sensitivity!)
• If Aj is small (noisy threshold): answer according to the median db in Dj-1
• Dj ← Dj-1
• If Aj is large: remove all db’s that are far away to get Dj
• Give the true answer plus noise
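
The loop above can be illustrated with a toy Python sketch. It is not the lecture's calibrated mechanism: it assumes d(x*, xi) is the gap between the counts of the current query on x* and on xi (rescaled to size n), picks the threshold √n/2 and the pruning slack arbitrarily, uses an uncalibrated Laplace noise scale, and enumerates D0 by brute force, which is only feasible for tiny U and m. All names are hypothetical.

```python
import itertools
import math
import random
import statistics

def laplace(scale, rng):
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def frac(c, db):
    """Fraction of entries of db that satisfy c."""
    return sum(c(u) for u in db) / len(db)

def run_mechanism(true_db, universe, m, queries, rng, noise_scale=1.0, slack=2.0):
    n = len(true_db)
    cap = math.sqrt(n)
    threshold = cap / 2   # illustrative choice, not from the lecture
    candidates = list(itertools.combinations_with_replacement(universe, m))  # D_0
    answers, large_rounds = [], 0
    for c in queries:
        truth = n * frac(c, true_db)
        # A_j: average capped distance between x* and the candidates on query c.
        a_j = statistics.mean(min(abs(truth - n * frac(c, x)), cap) for x in candidates)
        if a_j + laplace(noise_scale, rng) < threshold:
            # Small round: answer from the median candidate, keep the list.
            answers.append(n * statistics.median(frac(c, x) for x in candidates))
        else:
            # Large round: release a noisy true answer and prune far candidates.
            large_rounds += 1
            noisy = truth + laplace(noise_scale, rng)
            answers.append(noisy)
            kept = [x for x in candidates if abs(noisy - n * frac(c, x)) <= slack]
            candidates = kept or candidates   # toy safeguard: never empty the list
    return answers, large_rounds, candidates

# Toy run (assumption): U = {0, 1, 2, 3}, n = 12, candidate size m = 3.
true_db = [0, 1, 1] * 4
is_one = lambda u: int(u == 1)
at_least_two = lambda u: int(u >= 2)
answers, large_rounds, remaining = run_mechanism(
    true_db, range(4), 3, [is_one, at_least_two, is_one],
    random.Random(0), noise_scale=1e-9)
```

With the negligible noise used in this run, the first two queries trigger large rounds that shrink the list, and the repeated third query is then answered from the list without touching the true database again.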

Need to Show
Accuracy and functionality:
• The result is accurate
• If Aj is large: many of the xi ∈ Dj-1 are removed
• Dj is never empty
Privacy:
• Not many large Aj
• Can release large rounds
• Can release noisy answers

Why can we release when large rounds occur?
• Do not expect more than O(m) large rounds
• Make the threshold noisy
For every pair of neighboring databases D and D’:
• Consider the vector of thresholds
• If far away from the threshold: can be the same in both
• If close to the threshold: can correct at a cost
• Cannot occur too frequently

Why is there a good xi?
Database D of size n
• Queries with low sensitivity
• Counting queries: C is a set of predicates c: U → {0,1}
• Query: how many participants in D satisfy c?
• Relaxed accuracy: answer each query within α additive error w.h.p.
• Not so bad: error is inherent in statistical analysis anyway
• A sample F of size m approximates D on all given c
[Figure: query c over universe U]

There exists x of size m = Õ((n/α)^2 · log|C|) s.t. max over cj of dist(Fgood, D) ≤ α.
For α = Õ(n^(2/3) log|C|), m is Õ(n^(2/3) log|C|).
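
A small Python sketch of why such a sample exists, assuming the standard Hoeffding-plus-union-bound argument with explicit constants (the slide only gives the Õ form). The failure probability `beta`, the threshold queries and the function names are illustrative assumptions.

```python
import math
import random

def sample_size(n, alpha, num_queries, beta=0.05):
    """m such that a uniform sample (with replacement) is alpha-accurate on all
    queries with probability at least 1 - beta, by Hoeffding and a union bound."""
    t = alpha / n   # additive error alpha on counts is error alpha / n on fractions
    return math.ceil(math.log(2 * num_queries / beta) / (2 * t * t))

def max_error(db, sample, queries):
    """Largest gap between a true count and the count rescaled from the sample."""
    n, m = len(db), len(sample)
    return max(abs(sum(map(c, db)) - n * sum(map(c, sample)) / m) for c in queries)

rng = random.Random(0)
n, alpha = 10_000, 1_000
db = [rng.randrange(100) for _ in range(n)]
queries = [lambda x, t=t: int(x < t) for t in (10, 30, 50, 70, 90)]

m = sample_size(n, alpha, len(queries), beta=0.01)
sample = [rng.choice(db) for _ in range(m)]
err = max_error(db, sample, queries)
```

The required m depends on n only through the ratio n/α and grows logarithmically in |C|, matching the (n/α)^2 · log|C| shape of the bound.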
