Private Information Retrieval

Private Information Retrieval, by Stefan Dziembowski. These slides are available at www.dziemowski.net/Slides

AOL search data scandal (2006)
Searches of user #4417749: • clothes for age 60 • 60 single men • best retirement city • jarrett arnold • jack t. arnold • jaylene and jarrett arnold • gwinnett county yellow pages • rescue of older dogs • movies for dogs • sinus infection
The user: Thelma Arnold, a 62-year-old widow from Lilburn, Georgia.

Observation
The owners of databases know a lot about their users! This poses a risk to the users’ privacy. E.g. consider a database with stock prices…
Can we do something about it? Yes, we can:
• trust the owners to protect our privacy (problematic!), or
• use cryptography! (Disclaimer: not yet practical...)

How can crypto help? Note: this problem has nothing to do with secure communication! [Figure: database D and user U]

Our setting: the database D and the user U are connected by a secure link. A new primitive: Private Information Retrieval (PIR)

Plan
• Definition of PIR
• An ideal PIR doesn’t exist
• Construction of a computational PIR
• Open problems
Literature:
• B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan, Private Information Retrieval, Journal of the ACM, 1998
• E. Kushilevitz and R. Ostrovsky, Replication Is NOT Needed: SINGLE Database, Computationally-Private Information Retrieval, FOCS 1997

Question: how to protect the privacy of queries? The user U wants to retrieve some data from the database D, and D shouldn’t learn what U retrieved.

Let’s make things simple! The database is a string B = (B_1,…,B_w) with each B_i ∈ {0,1}. The user holds an index i ∈ {1,…,w} and should learn B_i (he may also learn other B_j’s).

Trivial solution The database simply sends everything to the user!

Non-triviality
The previous solution has a drawback: the communication complexity is huge! Therefore we introduce the following requirement: “Non-triviality”: the number of bits communicated between U and D has to be smaller than w.

Private Information Retrieval (PIR)
The user U and the database D are polynomial-time randomized interactive algorithms. Input of D: the database B. Input of U: an index i ∈ {1,…,w}.
• correctness: at the end the user learns B_i
• secrecy (of the user): the database does not learn i (this property needs to be defined more formally!)
• non-triviality: the total communication is < w
Note: secrecy of the database is not required.

How to define secrecy of the user [1/2]? The user, on input i, sends a query Q(i); the database, on input B, sends a reply A(Q(i),B). Def. T(i,B): the transcript of the conversation.

How to define secrecy of the user [2/2]?
Secrecy of the user: for every i,j ∈ {1,…,w}
• single-round case: it is impossible to distinguish between Q(i) and Q(j)
• multi-round case: it is impossible to distinguish between T(i,B) and T(j,B), even if the adversary is malicious
What does “impossible to distinguish” mean? For now say: the distributions of Q(i) and Q(j) are the same.

PIR doesn’t exist [1/4] We now show that correctness, non-triviality and secrecy cannot be satisfied simultaneously. Def: A transcript T is possible for (i,B) if P(T(i,B) = T) > 0. Take some T’ and look where it is possible. [Figure: grid of databases B versus indices i]

PIR doesn’t exist [2/4] Observation: secrecy → if T’ is possible for some B and i, then it is possible for B and all the other i’s.

PIR doesn’t exist [3/4] non-triviality → length(transcript) < length(database) → # transcripts < # databases → there has to exist a T’ that is possible for two different databases B^0 and B^1 (and, by the previous observation, for every index i).
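The counting step can be spelled out as follows. This is a sketch under an assumption that the slide does not state explicitly: transcripts are binary strings of length at most w-1.

```latex
\#\{\text{transcripts}\}
  \;\le\; \sum_{\ell=0}^{w-1} 2^{\ell}
  \;=\; 2^{w}-1
  \;<\; 2^{w}
  \;=\; \#\{\text{databases } B \in \{0,1\}^{w}\}
```

Every database has at least one possible transcript, so by the pigeonhole principle two different databases must share one.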

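PIR doesn’t exist [4/4] B^0 and B^1 differ on at least one index i’. So, if i’ is the input of the user, the user may see the same transcript T’ whether the database is B^0 or B^1, and therefore cannot output the right bit in both cases. This violates correctness → contradiction.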

So PIR doesn’t exist! • How to bypass the impossibility result? • Two ideas: • limit the computing power of a cheating database • use a larger number of “independent” databases

Computationally-secure PIR
Computational secrecy: for every i,j ∈ {1,…,w} it is impossible to distinguish efficiently between T(i,B) and T(j,B).
Formally: for every polynomial-time probabilistic algorithm A the value |P(A(T(i,B)) = 0) - P(A(T(j,B)) = 0)| should be negligible.

Computational security in cryptography
Most of the constructions in cryptography “imply” that P ≠ NP. So, the best we can hope for is to construct protocols with a conjectured computational security. Two approaches to cryptography: • construct protocols that “look secure” • base security on some well-known hardness assumption. This is sometimes called “provable security”.

Hardness assumptions? A great source of hard problems is number theory. [KO97] construct a PIR based on the Quadratic Residuosity Assumption. We describe it on the next slides.

Algebraic preliminaries: Z_m^*
Fact: Z_m := {0,1,…,m-1} with addition modulo m is a group.
Is it also a group with multiplication modulo m? No: suppose that x ∈ Z_m is not relatively prime to m, and let d := gcd(x,m) > 1. Then for every i ∈ Z_m the value i·x mod m is divisible by d, and hence i·x ≠ 1 mod m. So x does not have an inverse!
But: every x ∈ Z_m relatively prime to m has an inverse, which can be computed using the extended Euclidean algorithm.
Hence: the set Z_m^* := {x ∈ Z_m : x is relatively prime to m} is a multiplicative group!
Fact: for any prime p the group Z_p^* = {1,...,p-1} is cyclic.
Example: Z_12^* = {1,5,7,11}.
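The inverse computation mentioned on this slide can be sketched in Python as follows. The function names (`extended_gcd`, `inverse_mod`, `units`) are illustrative choices, not taken from the slides.

```python
from typing import List, Optional, Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
        old_t, t = t, old_t - quotient * t
    return old_r, old_s, old_t


def inverse_mod(x: int, m: int) -> Optional[int]:
    """Inverse of x modulo m, or None when gcd(x, m) != 1 (no inverse exists)."""
    g, s, _ = extended_gcd(x % m, m)
    if g != 1:
        return None
    return s % m


def units(m: int) -> List[int]:
    """The elements of Z_m^*: those x in Z_m that have an inverse."""
    return [x for x in range(m) if inverse_mod(x, m) is not None]
```

For example, `units(12)` lists the invertible elements of Z_12, and `inverse_mod(4, 12)` returns `None` because gcd(4,12) = 4.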

Cryptographers’ favourite group
p,q: large random primes (|p| = |q| = 1024 bits, say). RSA group: Z_n^*, where n = pq.
How to select a random prime? Just take a random number and test if it is prime!
• Testing primality is easy [Rabin-Miller test]
• By the prime number theorem: P(random x of length t bits is prime) ≈ 1/ln(2^t) = 1/(t·ln 2), so primes are “dense”.
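A minimal sketch of "take a random number and test if it is prime", using the Miller-Rabin test. The names `is_probable_prime` and `random_prime`, the number of rounds and the trial division by a few small primes are assumptions made for illustration. Python's `random` module is used only to keep the sketch short; it is not a cryptographically secure source of randomness.

```python
import random


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin test: never rejects a prime, accepts a composite with tiny probability."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits: int) -> int:
    """Sample odd numbers of exactly `bits` bits until one passes the test."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```

Because roughly one in t·ln 2 numbers of t bits is prime, the loop in `random_prime` is expected to finish after a number of attempts that is linear in the bit length.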

Chinese remainder theorem [1/3]
Z_15, with each i placed in row (i mod 3) and column (i mod 5):
                i mod 5 = 0    1    2    3    4
i mod 3 = 0:              0    6   12    3    9
i mod 3 = 1:             10    1    7   13    4
i mod 3 = 2:              5   11    2    8   14

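It’s not always like this! Consider p = 4 and q = 6: the map sending i ∈ Z_24 to (i mod 4, i mod 6) is not a bijection, e.g. 0 and 12 both go to (0,0). [Figure: Z_24 arranged by i mod 4 and i mod 6]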

Chinese remainder theorem [3/3]
Chinese remainder theorem (CRT): for n = pq (where p and q are distinct primes) the function λ: Z_n → Z_p × Z_q defined as λ(i) := (i mod p, i mod q) is a bijection.
Proof: If λ(i) = λ(j) then
• i mod p = j mod p → p divides i-j, and
• i mod q = j mod q → q divides i-j.
Because p and q are distinct primes, n = pq divides i-j, so i = j mod n. Hence λ is injective, and since |Z_n| = pq = |Z_p × Z_q| it is a bijection.

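λ is an isomorphism
Moreover λ: Z_n → Z_p × Z_q is an isomorphism!
Proof: λ(a + b) = ((a + b) mod p, (a + b) mod q) = ((a mod p) + (b mod p), (a mod q) + (b mod q)) = λ(a) + λ(b), where the + in the first coordinate is the operation in Z_p and the + in the second coordinate is the operation in Z_q.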
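The CRT slides can be checked by brute force on small moduli. The sketch below uses illustrative function names (`crt_map`, `is_bijection`, `is_additive`) and simply enumerates Z_{pq}; it is meant as a sanity check, not as an efficient algorithm.

```python
def crt_map(p: int, q: int) -> dict:
    """lambda(i) = (i mod p, i mod q) for every i in Z_{pq}."""
    return {i: (i % p, i % q) for i in range(p * q)}


def is_bijection(p: int, q: int) -> bool:
    """True if lambda hits every pair of Z_p x Z_q exactly once."""
    return len(set(crt_map(p, q).values())) == p * q


def is_additive(p: int, q: int) -> bool:
    """Check lambda(a + b) = lambda(a) + lambda(b), coordinates reduced mod p and mod q."""
    n, lam = p * q, crt_map(p, q)
    return all(
        lam[(a + b) % n] == ((lam[a][0] + lam[b][0]) % p, (lam[a][1] + lam[b][1]) % q)
        for a in range(n)
        for b in range(n)
    )
```

With p = 5 and q = 3 the map is a bijection, while p = 4 and q = 6 gives collisions such as `crt_map(4, 6)[0] == crt_map(4, 6)[12]`.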

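Z_n vs. Z_n^*
What if we restrict λ to Z_n^*? λ(i) := (i mod p, i mod q)
Observation 1: λ is also an isomorphism Z_n^* → Z_p^* × Z_q^* (with respect to multiplication; the calculation is the same as for +).
Observation 2: |Z_n^*| = (p-1)(q-1)
Example: in the table for Z_15, the set Z_15^* consists of the entries whose row is in Z_3^* = {1,2} and whose column is in Z_5^* = {1,2,3,4}:
                i mod 5 = 0    1    2    3    4
i mod 3 = 0:              0    6   12    3    9
i mod 3 = 1:             10    1    7   13    4
i mod 3 = 2:              5   11    2    8   14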

How does it look for large p and q? [Figure: Z_n and Z_n^* drawn in the grid with axes “mod p” and “mod q”]

Quadratic Residues
Def. x is a quadratic residue modulo m if there exists a ∈ Z_m^* such that x = a^2 mod m.
QR(m) := the set of all quadratic residues modulo m. QNR(m) := Z_m^* \ QR(m)
Example, Z_13^*:
a:            1   2   3   4   5   6   7   8   9  10  11  12
a^2 mod 13:   1   4   9   3  12  10  10  12   3   9   4   1
QR(13) = {1,3,4,9,10,12}
Observation: every quadratic residue modulo 13 has exactly 2 square roots, and hence |QR(13)| = |Z_13^*| / 2.

A Lemma about QRs modulo a prime p
Lemma: For every prime p > 2 we have |QR(p)| = (p-1)/2.
Proof: We show that every quadratic residue has exactly 2 square roots in Z_p^*. Suppose that a^2 = b^2 mod p, where a,b ∈ Z_p^*. Thus p divides a^2 - b^2 = (a - b)(a + b). Hence either
• p divides a - b → a = b, or
• p divides a + b → a = p - b.
The two roots b and p - b are different because p is odd.
Remark: Let g be a generator of Z_p^*. Then QR(p) = {g^0, g^2, g^4, ..., g^(p-3)} and QNR(p) = {g^1, g^3, g^5, ..., g^(p-2)}.
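The lemma and the remark can be checked on the running example p = 13. In the sketch below the helper names are illustrative, and the choice g = 2 is an assumption that is easy to verify by hand (2 has order 12 modulo 13).

```python
from math import gcd


def quadratic_residues(m: int) -> list:
    """QR(m): the squares of the elements of Z_m^*, reduced modulo m."""
    return sorted({a * a % m for a in range(1, m) if gcd(a, m) == 1})


def square_roots(x: int, m: int) -> list:
    """All a in Z_m^* with a^2 = x mod m."""
    return [a for a in range(1, m) if gcd(a, m) == 1 and a * a % m == x]


p, g = 13, 2  # assumption: 2 is a generator of Z_13^*
qr = quadratic_residues(p)
even_powers = sorted(pow(g, k, p) for k in range(0, p - 1, 2))  # g^0, g^2, ..., g^(p-3)
odd_powers = sorted(pow(g, k, p) for k in range(1, p - 1, 2))   # g^1, g^3, ..., g^(p-2)
```

Here `qr` and `even_powers` coincide, `odd_powers` is the complement inside Z_13^*, and `square_roots(x, p)` has two elements for every x in `qr`.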

QRs modulo pq
Example, Z_15^*:
a:            1   2   4   7   8  11  13  14
a^2 mod 15:   1   4   1   4   4   1   4   1
QR(15) = {1,4}
Observation: every quadratic residue modulo 15 has exactly 4 square roots, and hence |QR(15)| = |Z_15^*| / 4.

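A Lemma about QRs modulo pq
Fact: For n = pq (p and q distinct odd primes) we have |QR(n)| = |Z_n^*| / 4.
Proof: x ∈ QR(n) iff x = a^2 mod n for some a ∈ Z_n^* iff (by CRT) x = a^2 mod p and x = a^2 mod q iff x mod p ∈ QR(p) and x mod q ∈ QR(q). Half of Z_p^* lies in QR(p) and half of Z_q^* lies in QR(q), so by CRT a quarter of Z_n^* lies in QR(n).
[Figure: Z_n^* in the (mod p, mod q) grid, with QR(n) as the intersection of the QR(p) columns and the QR(q) rows]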

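QRs modulo pq: an example
QR(5) = {1,4}, because 1^2 = 4^2 = 1 mod 5 and 2^2 = 3^2 = 4 mod 5. QR(3) = {1}, because 1^2 = 2^2 = 1 mod 3.
In the table for Z_15, the set QR(15) consists of the entries in the row (i mod 3) = 1 and the columns (i mod 5) ∈ {1,4}:
                i mod 5 = 0    1    2    3    4
i mod 3 = 0:              0    6   12    3    9
i mod 3 = 1:             10    1    7   13    4
i mod 3 = 2:              5   11    2    8   14
So QR(15) = {1,4}.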
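The CRT characterisation of QR(n) and the count |QR(n)| = |Z_n^*| / 4 can be verified by enumeration for small p and q. The helper names below are illustrative.

```python
from math import gcd


def units(m: int) -> list:
    """The elements of Z_m^* (0 is excluded because gcd(0, m) = m)."""
    return [a for a in range(1, m) if gcd(a, m) == 1]


def quadratic_residues(m: int) -> list:
    """QR(m), computed directly by squaring every element of Z_m^*."""
    return sorted({a * a % m for a in units(m)})


def qr_via_crt(p: int, q: int) -> list:
    """Elements of Z_{pq}^* that are residues modulo p and modulo q."""
    qr_p, qr_q = set(quadratic_residues(p)), set(quadratic_residues(q))
    return [x for x in units(p * q) if x % p in qr_p and x % q in qr_q]
```

For p = 3 and q = 5 both computations give the two elements 1 and 4, out of the eight elements of Z_15^*.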

Homomorphism of QR(pq)
Res(n,a) := 0 if a ∈ QR(n), 1 otherwise (so QR ↔ 0 and QNR ↔ 1; with the opposite encoding the xor identity below would fail).
Homomorphism: for all a,b ∈ Z_n^* that are each either a residue modulo both p and q or a non-residue modulo both: Res(n,ab) = Res(n,a) xor Res(n,b)
Proof: It is enough to show it modulo a prime p. Let g be a generator of Z_p^*. Recall that a ∈ QR(p) iff a = g^v mod p where v is even. Hence ab is a QR iff ab is an even power of g iff
• (a is an even power of g) AND (b is an even power of g), i.e. both a and b are QRs, OR
• (a is an odd power of g) AND (b is an odd power of g), i.e. both a and b are QNRs.
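A brute-force check of the xor homomorphism, as a sketch with illustrative names. It uses the encoding 0 for a quadratic residue and 1 for a non-residue, which is the encoding under which the xor identity holds. Modulo n = pq the check is restricted to the elements that are residues modulo both primes or non-residues modulo both (the set called Z_n^+ on the QRA slide); on all of Z_n^* the identity fails.

```python
from math import gcd


def res(m: int, a: int) -> int:
    """0 if a is a quadratic residue modulo m, 1 otherwise (a assumed coprime to m)."""
    squares = {x * x % m for x in range(1, m) if gcd(x, m) == 1}
    return 0 if a % m in squares else 1


def z_plus(p: int, q: int) -> list:
    """Units modulo pq that are residues mod both primes or non-residues mod both."""
    n = p * q
    return [a for a in range(1, n) if gcd(a, n) == 1 and res(p, a) == res(q, a)]


def xor_homomorphic(elements, m: int) -> bool:
    """Check Res(m, ab) = Res(m, a) xor Res(m, b) for all pairs from `elements`."""
    elements = list(elements)
    return all(
        res(m, a * b % m) == res(m, a) ^ res(m, b) for a in elements for b in elements
    )
```

For instance `xor_homomorphic(z_plus(3, 5), 15)` holds, whereas running the same check over all units modulo 15 fails for the pair 7 and 11.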

Algorithmic questions about QR • Suppose n=pq • Is it easy to test membership in QR(n)? • Fact: if one knows p and q – yes! • What if one doesn’t know p and q?
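The slide does not say how membership is tested when p and q are known. One standard way, assumed here, is Euler's criterion: for an odd prime p and a coprime to p, a is a quadratic residue modulo p iff a^((p-1)/2) = 1 mod p. Combined with the CRT characterisation of QR(n), this gives the sketch below; the function names are illustrative.

```python
from math import gcd


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion, for an odd prime p and a coprime to p."""
    return pow(a, (p - 1) // 2, p) == 1


def is_qr(a: int, p: int, q: int) -> bool:
    """Membership in QR(pq) for a in Z_{pq}^*, given the factors p and q."""
    return is_qr_mod_prime(a, p) and is_qr_mod_prime(a, q)


def is_qr_brute_force(a: int, n: int) -> bool:
    """Reference check that tries every possible square root (exponential in |n|)."""
    return any(x * x % n == a % n for x in range(1, n) if gcd(x, n) == 1)


def agree(p: int, q: int) -> bool:
    """Compare the fast test with brute force on all of Z_{pq}^*."""
    n = p * q
    return all(
        is_qr(a, p, q) == is_qr_brute_force(a, n) for a in range(1, n) if gcd(a, n) == 1
    )
```

The fast test needs two modular exponentiations, while the brute-force reference has to search through Z_n^*.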

Quadratic Residuosity Assumption (QRA)
n = pq, where p and q are large primes.
Z_n^+ := all a ∈ Z_n^* such that a mod p ∈ QR(p) iff a mod q ∈ QR(q). Note: Z_n^+ is a group! It consists of QR(n) (residues modulo both p and q) and of the elements that are non-residues modulo both p and q.
Quadratic Residuosity Assumption (QRA): For a random a ∈ Z_n^+ it is computationally hard to determine if a ∈ QR(n).
Formally: for every polynomial-time probabilistic algorithm G the value |P(G(n,a) = Res(n,a)) - 0.5| (where a ∈ Z_n^+ is random) is negligible.

We are ready to construct PIR! Our PIR will work in the group Z_n^+, where n = pq. What’s so good about this group? • testing membership in QR(n) is hard for random elements of Z_n^+, unless one knows p and q • homomorphism of Res!

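First (wrong) idea
The user (who knows p and q) wants B_i and sends X_1,…,X_w ∈ Z_n^+, where X_i is a QNR and every other X_j is a QR.
For every j = 1,...,w the database sets Y_j := X_j^2 if B_j = 0, and Y_j := X_j otherwise.
It sets M := Y_1 · Y_2 · ... · Y_w and sends M to the user.
Every Y_j with j ≠ i is a QR, and Y_i is a QR iff B_i = 0. Hence M is a QR iff B_i = 0.
The user checks if M is a QR.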

PIR from the previous slide:
• correctness √
• security √ To learn i the database would need to distinguish QNR from QR.
• non-triviality? Doesn’t hold! Communication: user → database: |B|·|Z_n|, database → user: |Z_n| (where |Z_n| is the length of one element of Z_n).
Call it: (|B|, 1)-PIR
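A toy version of this (|B|, 1)-PIR, as a sketch under several assumptions: the primes in the usage note are tiny and hard-coded, indices are 0-based, Python's `random` is not cryptographically secure, and residuosity is tested with Euler's criterion, which the slides do not mention. All function names are illustrative.

```python
import random
from math import gcd


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p and a coprime to p."""
    return pow(a, (p - 1) // 2, p) == 1


def random_unit(n: int) -> int:
    """A random element of Z_n^*."""
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            return r


def make_query(i: int, w: int, p: int, q: int) -> list:
    """User: a QNR (non-residue mod p and mod q) at position i, random QRs elsewhere."""
    n = p * q
    query = []
    for j in range(w):
        x = random_unit(n)
        if j == i:
            while is_qr_mod_prime(x, p) or is_qr_mod_prime(x, q):
                x = random_unit(n)
        else:
            x = x * x % n
        query.append(x)
    return query


def answer(query: list, bits: list, n: int) -> int:
    """Database: multiply Y_j = X_j^2 (if B_j = 0) or X_j (if B_j = 1) over all j."""
    m = 1
    for x, bit in zip(query, bits):
        m = m * (x if bit else x * x) % n
    return m


def decode(m: int, p: int, q: int) -> int:
    """User: B_i = 0 iff the reply is a quadratic residue modulo n."""
    return 0 if is_qr_mod_prime(m, p) and is_qr_mod_prime(m, q) else 1
```

A run with the small primes 1019 and 1031 would look like `decode(answer(make_query(3, 8, 1019, 1031), bits, 1019 * 1031), 1019, 1031)` for an 8-bit list `bits`. The query contains one group element per database bit, which is exactly why non-triviality fails.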

How to fix it? Idea: suppose that |B| = v^2 and present B as a v×v matrix. Consider each row as a separate database.

Idea that works
Execute v (v,1)-PIRs in parallel, one per row.
Looks even worse: communication: user → database: v^2·|Z_n|, database → user: v·|Z_n|.
The method: let j be the column where B_i is. In every row the user asks for the jth element. So, instead of sending v queries the user can send one!
Observe: in this way the user learns all the elements in the jth column!

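Putting things together
The user sends one query (X_1,…,X_v) in which X_j is a QNR, where j is the column of B_i. For every row k the database multiplies the elements of that row, as in the (v,1)-PIR, obtaining M_k, and sends (M_1,…,M_v). The entry of B in row k and column j is 0 iff M_k is a QR; the user looks at the M_k of the row that contains B_i.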

So we are done! PIR from the previous slide:
• correctness √
• non-triviality √ communication complexity = 2·√|B|·|Z_n|
• security? To learn i the database would need to distinguish QNR from QR. Formally: from any adversary that breaks our scheme we can construct an algorithm that breaks QRA.
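The matrix construction can be sketched as follows, again as a toy under stated assumptions: tiny hard-coded primes in the usage note, 0-based row-major indexing of B, Euler's criterion for the residuosity test, and a non-cryptographic random generator. The function names are illustrative.

```python
import random
from math import gcd


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p and a coprime to p."""
    return pow(a, (p - 1) // 2, p) == 1


def random_unit(n: int) -> int:
    """A random element of Z_n^*."""
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            return r


def make_query(col: int, v: int, p: int, q: int) -> list:
    """User: v elements, a QNR in the wanted column and random QRs elsewhere."""
    n = p * q
    query = []
    for c in range(v):
        x = random_unit(n)
        if c == col:
            while is_qr_mod_prime(x, p) or is_qr_mod_prime(x, q):
                x = random_unit(n)
        else:
            x = x * x % n
        query.append(x)
    return query


def answer_matrix(query: list, matrix: list, n: int) -> list:
    """Database: one product M_k per row, using the same query for every row."""
    reply = []
    for row in matrix:
        m = 1
        for x, bit in zip(query, row):
            m = m * (x if bit else x * x) % n
        reply.append(m)
    return reply


def retrieve(i: int, matrix: list, p: int, q: int) -> int:
    """Whole protocol for index i of the v*v database stored row by row."""
    v = len(matrix)
    row, col = divmod(i, v)
    query = make_query(col, v, p, q)             # user -> database: v elements
    reply = answer_matrix(query, matrix, p * q)  # database -> user: v elements
    m = reply[row]
    return 0 if is_qr_mod_prime(m, p) and is_qr_mod_prime(m, q) else 1
```

Both messages consist of v = √|B| group elements, matching the 2·√|B|·|Z_n| communication on the slide. The user could decode every entry of `reply` and so learn the whole column.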

Improvements
The user U sends (X_1,…,X_v) and the database D replies with (M_1,…,M_v), but the user is interested in just one of the M_k’s. Idea: apply PIR recursively!

Complexity of PIRs: overview of the results
Communication:
• “recursive” PIR of [KO97]: O(|B|^c) for every c > 0
• [Cachin, Micali, Stadler, 1999]: poly-logarithmic in |B|
• [Lipmaa, 2005]: O(log^2 |B|)
For a practical analysis see:
• [Sion, Carbunar] On the Computational Practicality of Private Information Retrieval.
Their conclusion: it is the time complexity that matters. In real life it is still more practical to transmit the entire database.

Extensions • Symmetric PIR (also protect privacy of the database). [Gertner, Ishai, Kushilevitz, Malkin. 1998] • Searching by key-words [Chor, Gilboa, Naor, 1997] • Public-key encryption with key-word search [Boneh, Di Crescenzo, Ostrovsky, Persiano]

Open problems: • Improve efficiency. • Construct new extensions. What was the key property that we used? homomorphism of QR Holy grail: fully-homomorphic encryption

Fully-homomorphic encryption
Observe that we constructed a 1-bit probabilistic public-key encryption scheme:
Enc(X) = a random QR if X = 0, a random QNR (in Z_n^+) if X = 1,
which is homomorphic with respect to xor: Enc(X xor Y) = Enc(X) · Enc(Y)
It would be really useful to have an encryption scheme homomorphic with respect to conjunction and negation simultaneously.
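The 1-bit scheme and its xor homomorphism can be sketched as below. This follows the standard Goldwasser-Micali formulation, which is an assumption here: the public key contains, besides n, a fixed non-residue z from Z_n^+, so that anyone can produce random QNRs without knowing p and q. The function names, the tiny primes in the usage note and the use of Python's non-cryptographic `random` are illustrative only.

```python
import random
from math import gcd


def is_qr_mod_prime(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p and a coprime to p."""
    return pow(a, (p - 1) // 2, p) == 1


def keygen(p: int, q: int):
    """Public key (n, z) with z a non-residue mod p and mod q; secret key (p, q)."""
    n = p * q
    while True:
        z = random.randrange(1, n)
        if gcd(z, n) == 1 and not is_qr_mod_prime(z, p) and not is_qr_mod_prime(z, q):
            return (n, z), (p, q)


def encrypt(public_key, bit: int) -> int:
    """Enc(0) is a random QR, Enc(1) is z times a random QR (a QNR in Z_n^+)."""
    n, z = public_key
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return pow(z, bit, n) * r * r % n


def decrypt(secret_key, c: int) -> int:
    """0 iff the ciphertext is a quadratic residue modulo n."""
    p, q = secret_key
    return 0 if is_qr_mod_prime(c, p) and is_qr_mod_prime(c, q) else 1
```

With `pk, sk = keygen(1019, 1031)`, multiplying two ciphertexts modulo n and decrypting the product yields the xor of the two plaintext bits.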

Thank you! Questions?
