Private Information Retrieval

Contents • What is Private Information Retrieval (PIR)? • Reduction from Private Information Retrieval (PIR) to Smooth Codes • Constructions (Achieving the Barrier) • Construction (Breaking the Barrier)

Private Information Retrieval (PIR) • Query a public database without revealing the queried record. • Example: A broker needs to query the NASDAQ database about a stock, but doesn’t want anyone to know which stock he is interested in.

PIR • The Single Server Case • Chor et al. showed in their 1995 paper that with a single server, information-theoretic privacy requires sending the entire contents of the database.

PIR • A one-round k-server PIR scheme for a database of length n consists of:

PIR – definition • These functions should satisfy:
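For reference, one common way to formalize a one-round k-server scheme is sketched below. The names Q_j, A_j, R and the random string r are assumptions that follow the usual textbook presentation; they are not notation taken from this slide.

```latex
\begin{align*}
&\text{Query functions: } Q_1,\dots,Q_k, \qquad q_j = Q_j(i, r) \\
&\text{Answer functions: } A_1,\dots,A_k, \qquad a_j = A_j(q_j, x) \\
&\text{Reconstruction function: } R(i, r, a_1,\dots,a_k) \\[4pt]
&\text{Correctness: } \forall x \in \{0,1\}^n,\ \forall i \in [n],\ \forall r: \quad R(i, r, a_1,\dots,a_k) = x_i \\
&\text{Privacy: } \forall j \in [k],\ \forall i, i' \in [n]: \quad Q_j(i,\cdot) \text{ and } Q_j(i',\cdot) \text{ are identically distributed}
\end{align*}
```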

Simple Construction of PIR • 2 servers, one round • Each server holds the bits x1,…,xn. • To request bit i, choose a uniformly random subset A of [n]. • Send A to the first server. • Send the second server A Δ {i} (add i to A if it is not there, remove it if it is there). • Each server returns the xor of the bits at the indices in its request. • Xor the two answers: every index other than i appears in both requests or in neither and cancels, leaving xi.
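The two-server scheme above can be sketched in a few lines of Python. The function names are hypothetical, indices are 0-based rather than 1-based, and both "servers" are simulated in one process, so this only illustrates the arithmetic, not a networked deployment.

```python
import secrets


def make_queries(n, i):
    """Return the two requests: a uniformly random subset A and A with i toggled."""
    a = {j for j in range(n) if secrets.randbelow(2)}
    b = a ^ {i}  # symmetric difference: adds i if absent, removes it if present
    return a, b


def server_answer(x, subset):
    """A server xors the database bits at the requested indices."""
    answer = 0
    for j in subset:
        answer ^= x[j]
    return answer


def retrieve(x, i):
    """User side: query both servers and xor their one-bit answers."""
    a, b = make_queries(len(x), i)
    return server_answer(x, a) ^ server_answer(x, b)
```

Each request on its own is a uniformly random subset of the indices, which is why neither server learns i; the cost is n bits of query per server.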

Smoothly Decodable Code • C: {0,1}^n → {0,1}^m is a (q,c,ε) smoothly decodable code if there exists a probabilistic algorithm A such that: • ∀x ∈ {0,1}^n and ∀i ∈ {1,…,n}: Pr[ A(C(x),i)=xi ] > ½ + ε, where the probability is over the coin tosses of A • A has access to a non-corrupted codeword y=C(x) • A reads at most q indices of y (of its choice) • Queries are not allowed to be adaptive • ∀i ∈ {1,…,n} and ∀j ∈ {1,…,m}: Pr[ A(·,i) reads j ] ≤ c/m

LDC is Smooth • Claim: Every (q,δ,ε) LDC is a (q,q/δ,ε) smooth code. • Intuition – If the code is resilient against a linear number of errors, then no position of the codeword can be queried too often (or else the adversary will choose to corrupt it).

Smooth Code is LDC • A bit can be reconstructed using q uniformly distributed queries, with advantage ε, when there are no errors. • If at most a δ fraction of the codeword is corrupted, then with probability at least (1-qδ) all the queries are to non-corrupted indices. Remember: the adversary does not know the decoding procedure’s random coins.
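The second bullet is a union bound. Spelled out, assuming each of the q queries is uniform over the m positions and the adversary corrupts at most δm of them (so the resulting advantage ε − qδ is a consequence of these assumptions, not a figure from the slide):

```latex
\begin{align*}
\Pr[\text{some query hits a corrupted index}]
  &\le \sum_{t=1}^{q} \Pr[\text{query } t \text{ hits a corrupted index}]
   \le q\delta \\
\Pr[A \text{ outputs } x_i \text{ on the corrupted word}]
  &\ge \Pr[A \text{ outputs } x_i \text{ on } C(x)] - q\delta
   > \tfrac{1}{2} + \varepsilon - q\delta
\end{align*}
```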

Reduction from PIR to SDC [Gol,Ka,Sch,Tr 02] • A codeword is the concatenation of all possible answers from the servers. • A query procedure consists of k queries to the codeword, corresponding to the answers of the k servers on the requested bit (for queries generated as in the PIR). • From the PIR privacy property it follows that the distribution of each query into the codeword is independent of the requested bit.

Reduction from PIR to SDC • Let a be the length of an answer from a server, k the number of servers and q the length of a query • Let l= be the length of a codeword • Let Pj be the probability of querying bit j. Note that • Set . And duplicate bit j Nj times. When querying for bit j choose at random one of the Nj bits

Reduction from PIR to SDC • The probability of accessing each bit is now at most 1/l • The new length of the encoding is less than (k+1)l • We have a (ka,k+1,1/2) smoothly decodable code
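The duplication step can be checked numerically. The sketch below assumes the number of copies is Nj = ⌈l·Pj⌉ and that the Pj sum to k; both are assumptions chosen because they reproduce the two bounds stated above, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def duplicate_counts(probs):
    """Assumed rule: position j is copied N_j = ceil(l * P_j) times, l = len(probs)."""
    l = len(probs)
    return [math.ceil(l * p) for p in probs]


def per_copy_probabilities(probs):
    """Probability of reading one specific copy when a copy of j is picked uniformly."""
    counts = duplicate_counts(probs)
    return [p / n for p, n in zip(probs, counts) if n > 0]


# Toy distribution over l = 8 positions whose probabilities sum to k = 2.
probs = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4),
         Fraction(1, 6), Fraction(0), Fraction(0), Fraction(0)]
k = sum(probs)
l = len(probs)
new_length = sum(duplicate_counts(probs))
max_copy_prob = max(per_copy_probabilities(probs))
```

Positions that are never queried get zero copies in this sketch; exact fractions are used so the comparisons with 1/l and (k+1)l are not blurred by floating-point rounding.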

Achieving the Barrier • Ingredients: • x – the database string • E: [n] → {0,1}^m – an encoding of the indices • Px(Z1,…,Zm) – a polynomial of degree d in m=O(n^(1/d)) variables s.t. Px(E(i))=xi • Y1,…,Yk – random vectors in {0,1}^m s.t. Y1+…+Yk=E(i)

Achieving the Barrier • The user generates Y1,…,Yk and sends server j every Yq with q≠j • We can view Px as a polynomial in the km variables Yj,l, where for each l the Yj,l (over all servers j) sum to Zl • Each server knows the value of (k-1)m variables • Let d=k-1, hence each monomial of Px has at most k-1 different variables

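Achieving the Barrier • Each variable is known to k-1 servers, i.e. it is missed by exactly one server. A monomial has at most k-1 variables, so at most k-1 of the k servers miss one of them; hence there exists a server who knows the values of all the variables in the monomial. • Assign each monomial to one of the servers who know all its variables.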

Achieving the Barrier • Each server calculates the xor of the monomials assigned to it and sends to the user • The user calculates the xor of all the answers.

Achieving the Barrier • Security – each server receives k-1 vectors, which are independent uniformly random strings of length m • Communication Complexity – each server receives k-1 vectors, each of length m=O(n^(1/d))=O(n^(1/(k-1))) by choice of m and d.
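The d=k-1 scheme described on the preceding slides can be sketched as follows. This is a toy simulation under stated assumptions: E(i) is taken to be the i-th weight-d string of length m (so C(m,d) ≥ n), Px is the sum of xi times the product of the Zl on the support of E(i), each expanded monomial is assigned to the lowest-numbered server that knows all its factors, and every function name is hypothetical.

```python
import itertools
import math
import secrets


def build_encoding(n, d):
    """E(i) = support of the i-th weight-d string of length m, with C(m, d) >= n."""
    m = d
    while math.comb(m, d) < n:
        m += 1
    supports = list(itertools.islice(itertools.combinations(range(m), d), n))
    return m, supports


def share_point(support, m, k):
    """Split the 0/1 vector E(i) into k random vectors whose xor is E(i)."""
    last = [1 if l in support else 0 for l in range(m)]
    shares = [[secrets.randbelow(2) for _ in range(m)] for _ in range(k - 1)]
    for y in shares:
        last = [a ^ b for a, b in zip(last, y)]
    return shares + [last]


def server_answer(j, x, supports, known, k):
    """Xor of the expanded monomials assigned to server j.

    `known` maps q -> Y_q for every q != j, which is all that server j receives.
    """
    answer = 0
    for i, support in enumerate(supports):
        if not x[i]:
            continue
        # Expanding prod_l (Y_0[l] + ... + Y_{k-1}[l]) picks one share per variable.
        for choice in itertools.product(range(k), repeat=len(support)):
            # d = k-1 factors use at most k-1 shares, so some server knows them all.
            owner = min(set(range(k)) - set(choice))
            if owner != j:
                continue
            term = 1
            for l, q in zip(support, choice):
                term &= known[q][l]
            answer ^= term
    return answer


def retrieve(x, i, k):
    """User side: share E(i), collect one bit from each server, xor the bits."""
    m, supports = build_encoding(len(x), k - 1)
    shares = share_point(supports[i], m, k)
    result = 0
    for j in range(k):
        known = {q: shares[q] for q in range(k) if q != j}
        result ^= server_answer(j, x, supports, known, k)
    return result
```

The xor of all answers equals Px evaluated at Y1+…+Yk=E(i); because every E(i') has the same weight d, the only monomial that survives is the one for i'=i, which gives xi.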

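Achieving the Barrier • Now take d=2k-1 • Each monomial has at most 2k-1 variables and each variable is missed by exactly one of the k servers, so some server misses at most 1 variable; assign the monomial to that server • After substituting the values it knows, each server is left with a polynomial of degree at most 1 in its m unknown variables (the sum of all monomials assigned to it) and sends its 1-bit coefficients • The user evaluates each returned polynomial on the corresponding variables Y and xors the results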

Achieving the Barrier • The query complexity is the same, O(n^(1/d)) • The answer complexity is (k^2)m=O(n^(1/d)) • Total complexity: O(n^(1/d))=O(n^(1/(2k-1))) by choice of d
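To get a feel for the gain from raising the degree, the sketch below computes the vector length m for both choices of d. It assumes indices are encoded as weight-d strings, so m is the smallest value with C(m,d) ≥ n, and it counts only query bits (k servers, each receiving k-1 vectors of length m); the function names are hypothetical.

```python
import math


def smallest_m(n, d):
    """Smallest m such that there are at least n strings of length m and weight d."""
    m = d
    while math.comb(m, d) < n:
        m += 1
    return m


def query_bits(n, k, d):
    """Total bits sent by the user: each of k servers gets k-1 vectors of length m."""
    return k * (k - 1) * smallest_m(n, d)


n, k = 10**6, 3
m_low_degree = smallest_m(n, k - 1)       # d = k-1
m_high_degree = smallest_m(n, 2 * k - 1)  # d = 2k-1
```

For a database of a million bits and three servers, the vector length drops from 1415 at d=2 to 44 at d=5 under these assumptions.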

Breaking the Barrier • The first idea that comes to mind is to try to increase the degree d even further. • Unfortunately this does not work, because the polynomials the servers return grow too large. • The novelty of the paper is how it gets around this difficulty.

Breaking the Barrier • Assume that each polynomial is known not to one server but to a group of servers. • Now we do not need to receive the polynomials themselves but can use the PIR scheme (on those servers) to evaluate them on the required input.

Breaking the Barrier • Suppose that we could write Px as a sum of Pv, where V ranges over all subsets of the servers. The problem of evaluating Px reduces to evaluating each Pv, which (we hope) is of lower degree. • On the other hand, the number of servers is also smaller, which is a disadvantage. • The paper shows how to find such Pv with good properties.

Breaking the Barrier • Define k’ to be a lower bound on the size of the sets V and λ the maximum number of variables a server misses in Pv. • Altogether V misses at most λ|V| variables in Pv.

Breaking the Barrier • We will choose an encoding E such that the Hamming weight of E(i) (and therefore the number of monomials) will be bounded by d (the number of monomials is bounded by 2^d). • If we had Pv as specified, then we could apply the PIR recursively on all sets of size more than k’ with communication complexity:

Breaking the Barrier • Let E be an encoding into strings of length m and weight d. • We can encode C(m,d) different values, thus m=O(n^(1/d)) is sufficient to encode n values. • Define Px(Z1,…,Zm) as the sum over i of xi times the product of the Zl with E(i)l=1; it holds that Px(E(i))=xi • Define V(M) to be all servers who miss at most λ variables in M

Breaking the Barrier • Lemma: for λ, k’<=k and d<=(λ+1)k-(λ-1)k’+(λ-2), and M a monomial of degree d in the Yj,h, either there is a server who misses at most one variable or |V(M)|>=k’ • Proof: Counting argument
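The counting argument can be sanity-checked by brute force for small parameters. The sketch assumes a monomial is described by how many of its d variables each server misses (the counts sum to d), writes `lam` for the lemma's threshold parameter, and takes V(M) to be the servers that miss at most `lam` variables; the function names are hypothetical.

```python
from itertools import product


def degree_bound(k, lam, k_prime):
    """Largest degree d allowed by the lemma's hypothesis."""
    return (lam + 1) * k - (lam - 1) * k_prime + (lam - 2)


def lemma_holds(k, lam, k_prime, d):
    """Check every way of distributing d missed variables among k servers."""
    for counts in product(range(d + 1), repeat=k):
        if sum(counts) != d:
            continue
        some_server_misses_at_most_one = min(counts) <= 1
        v = [j for j, c in enumerate(counts) if c <= lam]
        if not (some_server_misses_at_most_one or len(v) >= k_prime):
            return False
    return True


# The lemma holds at the bound and fails one degree above it for these parameters.
results = {
    (k, lam, kp): (lemma_holds(k, lam, kp, degree_bound(k, lam, kp)),
                   lemma_holds(k, lam, kp, degree_bound(k, lam, kp) + 1))
    for k in (2, 3) for lam in (1, 2, 3) for kp in range(1, k + 1)
}
```

In this small range the bound is tight: giving k’-1 servers two missed variables each and every other server `lam`+1 of them produces a monomial of degree one above the bound that violates both alternatives.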

Breaking the Barrier • Claim: Let k, λ, k’ be as before. Then there are polynomials Pv, Pj for every V⊆[k] s.t. |V|>=k’ and every j∈[k] s.t. • Pv is of degree λ|V| and can be computed from Px and {Yj}j∉V • Pj is of degree 1 and can be computed from Px and {Yi}i≠j

Breaking the Barrier • Proof: It is sufficient to prove the claim for P consisting of a single monomial; then we can sum over all monomials. • Denote • Define φ(M) to be the number of variables in M for which

Breaking the Barrier • WLOG take • Define a polynomial in mk variables. • Q has k^d monomials each of the form

Breaking the Barrier • 1. Set Q’=Q, and Pv=0 for all V • 2. Find V=V(M) for some monomial M in Q’ s.t. V is of maximal size; if |V|<k’, stop. • 3. While there is M’ in Q’ s.t. V(M’)=V: pick the M’ from Q’ which maximizes φ(M’), then set Pv=Pv+T(M’), Q’=Q’-T(M’) • 4. Go to step 2

Breaking the Barrier • If the algorithm halts then the Pv are of the desired degree and their sum is equal to P-Q’ for Q’ at the end of the execution. • Likewise, for each M in Q’ there exists a server j who misses at most one variable, add M to Pj

Breaking the Barrier • Define M~M’ if V(M)=V(M’) and for all q<=d either or • If M’ is a monomial in T(M) then • 1. V(M’)⊆V(M) • 2. φ(M’)<=φ(M) • 3. Equality in 1 and 2 implies M~M’ • M1~M2 implies that either both are in T(M) or both aren’t

Breaking the Barrier • Each time step 3 is applied, we either add to Q’ monomials M’ with smaller V(M’) or φ(M’), which will be dealt with later, • or M’~M, so it already exists in Q’ and is removed.

Breaking the Barrier • Lemma: For all i>0 and k>(i-1)! there exists a PIR protocol Pi with communication complexity O(n^(2/(ik))) • Corollary: there exists a k-server PIR protocol with communication complexity n^(O(log log k/(k log k)))

Summary • For every PIR scheme we have a related smooth code • The upper bound for k-server PIR is improved to n^(O(log log k/(k log k))) • Likewise the upper bound for smooth codes is improved to

Related Topics • T-collusion PIR: the protocol must remain secure against collusions of up to T servers. General results appear in “Information-Theoretic Private Information Retrieval: A Unified Construction” [Beimel, Ishai] • CPIR – Computational PIR, in which the security definition is relaxed to a computational one. • There exist single-server CPIR protocols with polylogarithmic communication [Cachin, Micali, Stadler]
