Private Information Retrieval

Private Information Retrieval Based on the talk by Yuval Ishai, Eyal Kushilevitz, Tal Malkin

Outline • Introduction • Common approaches • Information-theoretical • Computational • Summary

Private Information Retrieval (PIR): assumptions • Semi-honest servers: the server is trusted to follow the protocol honestly; it knows every bit of the data; it may record the client's requests/queries • Malicious servers: may drop messages, change messages, or collude with other parties

Private Information Retrieval (PIR): intro • Goal: allow a user to query a database while hiding the identity of the data items she is after. • Note: this hides the identity of the data items, not the existence of the interaction with the user. • Motivation: patent databases, stock quotes, web access, and many more. • Paradox(?): imagine buying in a store without the seller knowing what you buy. (Encrypting requests is useful against third parties, not against the owner of the data.)

Modeling • Server: holds an n-bit string x (n should be thought of as very large) • User: wishes to retrieve x_i and to keep i private • Remark: this is the most basic version; it is the building block for more involved retrieval.

Trivial Private Protocol • Server sends the entire database x = x_1, x_2, …, x_n to User, who reads off x_i. • Information-theoretic privacy. • Communication: n bits. • Is this optimal?

Obstacle • Theorem [CGKS]: In any 1-server PIR with information-theoretic privacy, the communication is at least n. • Information-theoretic privacy/security: the ciphertext gives no information about the plaintext (here: the query gives the server no information about i, even with unlimited computing power).

More “solutions” • User asks for additional random indices: pick a few random indices to hide the real one. Drawback: the real index can be estimated, since the server knows it is one of the few requested. • Employ general crypto protocols to compute x_i privately, e.g. 1-out-of-N Oblivious Transfer. Drawback: highly inefficient (polynomial in n). • Anonymity (e.g., via anonymizers). Note: this addresses a different problem; it hides the identity of the user, not the fact that x_i is retrieved.

Two Approaches • Information-Theoretic PIR [CGKS95, Amb97, ...]: replicate the database among k servers; privacy assumes the servers do not collude. • Computational PIR [CG97, KO97, CMS99, ...]: computational privacy, based on cryptographic assumptions (breaking the scheme is assumed to be computationally infeasible).

Known Communication Upper Bounds • Multiple servers, information-theoretic PIR: 2 servers, comm. n^(1/3) [CGKS95]; k servers, comm. n^(1/Ω(k)) [CGKS95, Amb96, …, BIKR02]; log n servers, comm. poly(log n) [BF90, CGKS95] • Single server, computational PIR: comm. poly(log n), where n is the number of items, under appropriate computational assumptions [KO97, CMS99]

Approach I: k-Server PIR • Each of the servers S_1, S_2, …, S_k holds a copy of x ∈ {0,1}^n; the user U holds the index i • Correctness: User obtains x_i • Privacy: No single server gets information about i

Information-Theoretic 2-Server PIR • Best known protocol: comm. n^(1/3) • Let’s look at an example with comm. cost n^(1/2) • Two protocols: Protocol I: n-bit queries, 1-bit answers; Protocol II: n^(1/2)-bit queries, n^(1/2)-bit answers

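Protocol I: 2-server O(n) PIR • [Figure: user U holding index i; servers S1 and S2 each holding the same n-bit database] • U picks Q1 ⊆ {1,…,n}, a random subset, and sets Q2 = Q1 ⊕ {i}; Q1 is sent to S1 and Q2 to S2 • Meaning of Q2 = Q1 ⊕ {i}: view each subset as an n-bit vector and flip position i (1 + 1 = 0, 0 + 0 = 0, 1 + 0 = 1), so i is removed if it was in Q1 and added otherwise • User sent O(n) bits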

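Protocol I: 2-server PIR • [Figure: same setting; U has sent Q1 ⊆ {1,…,n} to S1 and Q2 = Q1 ⊕ {i} to S2] • Each server XORs the database bits indexed by its query and replies with that single bit (a1 from S1, a2 from S2) • Each server replies 1 bit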

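Protocol I: 2-server PIR • [Figure: same setting; answers a1 and a2 returned to U] • U computes a1 ⊕ a2 = x_i. Every index other than i is in both queries or in neither, so its bit cancels; only x_i survives.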
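The three Protocol I slides can be sketched in a few lines of Python. This is a minimal illustration, not the talk's own code: the function names, the 0-based indexing, and the example bit string are assumptions made here, and both "servers" are simulated in one process.

```python
import secrets

def xor_of_subset(x, subset):
    """Server side: XOR of the database bits whose indices are in `subset`."""
    answer = 0
    for j in subset:
        answer ^= x[j]
    return answer

def make_queries(n, i):
    """User side: Q1 is a uniformly random subset of {0,...,n-1}; Q2 = Q1 xor {i}."""
    q1 = {j for j in range(n) if secrets.randbits(1)}
    q2 = q1 ^ {i}  # symmetric difference: toggles membership of i
    return q1, q2

def retrieve_bit(x, i):
    """Run the protocol once; each server sees only its own query."""
    q1, q2 = make_queries(len(x), i)
    a1 = xor_of_subset(x, q1)  # computed by S1
    a2 = xor_of_subset(x, q2)  # computed by S2
    return a1 ^ a2             # all bits except x[i] cancel

# Arbitrary example database (assumption, indices are 0-based here).
x = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
recovered = [retrieve_bit(x, i) for i in range(len(x))]
```

Each query on its own is a uniformly random subset, which is why a single non-colluding server learns nothing about i; the price is that each query is n bits long.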

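Protocol II: PIR with O(n^(1/2)) Communication • Arrange the n-bit vector as an m x m matrix X, with m = n^(1/2); U wants entry x_{j,i} (row j, column i) • U picks a random subset of columns Q1 ⊆ {1,…,m} and sets Q2 = Q1 ⊕ {i}; Q1 is sent to S1 and Q2 to S2 • Each server applies the XOR sum to each row over its queried columns and returns m bits • U computes a_{1,j} ⊕ a_{2,j} = x_{j,i}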
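A small Python sketch of Protocol II follows. The names, the 0-based indices, and the 4 x 4 example matrix are illustrative assumptions; the point is only that the query now ranges over m = n^(1/2) columns and each server returns one bit per row.

```python
import secrets

def row_xor_answers(matrix, cols):
    """Server side: for every row, XOR the entries in the queried columns."""
    answers = []
    for row in matrix:
        bit = 0
        for c in cols:
            bit ^= row[c]
        answers.append(bit)
    return answers

def retrieve_entry(matrix, j, i):
    """User side: recover matrix[j][i] from two m-bit answer vectors."""
    m = len(matrix)
    q1 = {c for c in range(m) if secrets.randbits(1)}  # random column subset
    q2 = q1 ^ {i}                                      # toggle column i
    a1 = row_xor_answers(matrix, q1)  # from S1
    a2 = row_xor_answers(matrix, q2)  # from S2
    return a1[j] ^ a2[j]              # user keeps only row j

# Arbitrary example matrix (assumption), n = 16 and m = 4.
X = [[0, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 1]]
recovered_matrix = [[retrieve_entry(X, j, i) for i in range(4)] for j in range(4)]
```

The user sends two m-bit queries and receives two m-bit answers, so total communication is about 4 * n^(1/2) bits. The user also learns column i of every row, which is more than requested but does not affect the privacy of i.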

Computational PIR with O(n^(1/2)) Comm. • Based on encryption • Quadratic Residue (QR): N = p*q, where p and q are primes. Q(y, N) = 0 if there exists w such that w^2 = y (mod N), and 1 otherwise • Understanding modulo: w^2 = y (mod N) means w^2 = y + k*N for some integer k • Example: 2^2 = 4 (mod 10), 4^2 = 6 (mod 10), so 4 and 6 are quadratic residues mod 10 • Security: if p and q are unknown, determining Q(y, N) is believed to be computationally infeasible. If p and q are known, Q(y, N) can be determined in O(|N|^3) time
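To make the predicate Q(y, N) concrete, here is a hedged Python sketch with two versions: a brute-force search that needs no factorization (exponential in |N|, usable only for toy moduli) and a version that uses the factors via Euler's criterion. The function names and the toy primes 7 and 11 are assumptions for illustration; the second function is restricted to distinct odd primes and to y coprime to N.

```python
from math import gcd

def q_bruteforce(y, N):
    """Q(y, N) from the slide: 0 if some w satisfies w^2 = y (mod N), else 1."""
    return 0 if any((w * w - y) % N == 0 for w in range(N)) else 1

def q_with_factors(y, p, q):
    """Same predicate for y coprime to N = p*q (p, q distinct odd primes).

    Euler's criterion: y is a square mod an odd prime p
    iff y^((p-1)/2) = 1 (mod p). y is a square mod N iff it is
    a square modulo both prime factors.
    """
    is_qr_p = pow(y, (p - 1) // 2, p) == 1
    is_qr_q = pow(y, (q - 1) // 2, q) == 1
    return 0 if (is_qr_p and is_qr_q) else 1

# The slide's examples: 4 and 6 are squares mod 10.
slide_examples = (q_bruteforce(4, 10), q_bruteforce(6, 10))

# Toy modulus (assumption): both methods agree on every unit mod 77.
p, q = 7, 11
N = p * q
methods_agree = all(
    q_bruteforce(y, N) == q_with_factors(y, p, q)
    for y in range(1, N) if gcd(y, N) == 1
)
```

The modular exponentiations in `q_with_factors` are what give the polynomial running time once p and q are known.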

Example • Quadratic residue as encryption: E(0) ∈ QR_N, E(1) ∈ NQR_N • Properties: QR × QR = QR; NQR × QR = NQR; for any y, y^2 is a QR.
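The three properties can be checked exhaustively for a small modulus. The sketch below uses the toy value N = 77 (an assumption chosen only so the sets are small) and restricts attention to numbers coprime to N.

```python
from math import gcd

p, q = 7, 11   # toy primes, assumption for illustration
N = p * q

units = [y for y in range(1, N) if gcd(y, N) == 1]
qr = {(w * w) % N for w in units}           # quadratic residues among the units
nqr = [y for y in units if y not in qr]     # non-residues among the units

# QR x QR = QR
qr_times_qr_is_qr = all((a * b) % N in qr for a in qr for b in qr)
# NQR x QR = NQR
nqr_times_qr_is_nqr = all((a * b) % N not in qr for a in nqr for b in qr)
# For any y, y^2 is a QR
squares_are_qr = all((y * y) % N in qr for y in units)
```

Note that the list says nothing about NQR × NQR: modulo a composite N that product can land in either set, which is why the protocol built on these properties uses exactly one non-residue per query.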

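Computational PIR with O(n^(1/2)) Comm.
• Setting: the database is an n^(1/2) x n^(1/2) bit matrix M. Goal: user wants to know entry M(a,b)
1. User picks N = pq and sends N to the server
2. User picks s = n^(1/2) numbers y_1, …, y_s uniformly at random from the set Z = {x | 1 <= x <= N, gcd(N, x) = 1}, such that y_b is an NQR and y_j (j != b) is a QR, and sends them to the server. (y_b should have Jacobi symbol +1; otherwise the server could single it out without knowing p and q.)
3. For each row r and column j, the server calculates W(r,j) = y_j^2 if M(r,j) = 0, and W(r,j) = y_j if M(r,j) = 1, then sets Z_r = W(r,1) * … * W(r,s) mod N
4. Server sends Z_1, …, Z_s to the user, and the user checks whether Z_a is a QR
5. Z_r is a QR iff M(r,b) = 0
[Figure: example bit matrix M with entries 0 1 1 0 1 1 1 0 1 1 0 0 0 0 0 1, side length n^(1/2), row a and column b marked]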
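The steps above can be sketched end to end in Python. This is an illustrative toy under loud assumptions: the primes 1019 and 1031 are far too small for any security, all function names are made up here, indices are 0-based, and the non-residue is chosen to be a non-residue modulo both primes so that its Jacobi symbol is +1.

```python
import secrets
from math import gcd

P, Q = 1019, 1031      # toy primes known only to the user (assumption)
N = P * Q              # public modulus sent to the server

def is_qr_mod_prime(y, p):
    """Euler's criterion for an odd prime p and y not divisible by p."""
    return pow(y, (p - 1) // 2, p) == 1

def is_qr(y):
    """User-side test, requires the factors P and Q."""
    return is_qr_mod_prime(y, P) and is_qr_mod_prime(y, Q)

def random_unit():
    while True:
        y = secrets.randbelow(N - 1) + 1
        if gcd(y, N) == 1:
            return y

def random_qr():
    return pow(random_unit(), 2, N)

def random_nqr():
    """Non-residue modulo both primes, hence Jacobi symbol +1."""
    while True:
        y = random_unit()
        if not is_qr_mod_prime(y, P) and not is_qr_mod_prime(y, Q):
            return y

def user_query(s, b):
    """Step 2: QRs everywhere except an NQR at column b."""
    return [random_nqr() if j == b else random_qr() for j in range(s)]

def server_answer(M, ys):
    """Steps 3-4: Z_r is the product over j of y_j^2 (bit 0) or y_j (bit 1)."""
    zs = []
    for row in M:
        z = 1
        for bit, y in zip(row, ys):
            z = (z * (y if bit else y * y)) % N
        zs.append(z)
    return zs

def user_decode(zs, a):
    """Step 5: Z_a is a QR iff M(a,b) = 0."""
    return 0 if is_qr(zs[a]) else 1

M = [[0, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
decoded = [[user_decode(server_answer(M, user_query(4, b)), a)
            for b in range(4)] for a in range(4)]
```

The user sends s numbers of |N| bits and receives s numbers back, so communication is O(n^(1/2) * |N|) bits. The server never uses P or Q; it only multiplies modulo N.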

Related work • Can be a building block for high-level security protocols • Has connection with “Locally Decodable Codes” (LDC) • It was proved that the three problems: PIR, LDC, and Circuit-based SMC are equivalent.

Summary • Focus so far: communication complexity • Obstacle: time complexity. All existing protocols require high computation by the servers (linear computation per query). • Are there methods to reduce server cost?
