Replication is not needed: Single Database, Computationally-Private Information Retrieval Mahmudur Rahman (Ratul)
AOL search data scandal (2006) #4417749: • clothes for age 60 • 60 single men • best retirement city • jarrett arnold • jack t. arnold • jaylene and jarrett arnold • gwinnett county yellow pages • rescue of older dogs • movies for dogs • sinus infection Thelma Arnold 62-year-old widow Lilburn, Georgia
Observation The owners of databases know a lot about the users! This poses a risk to users’ privacy. E.g. consider a database with stock prices… Can we do something about it? Yes, we can: • trust them to protect our privacy (problematic!), or • use cryptography! Disclaimer: Not yet practical...
How can crypto help? Note: this problem has nothing to do with secure communication! database D user U
Our setting: user U and database D are connected by a secure link. A new primitive: Private Information Retrieval (PIR)
Plan • Definition of PIR • Quadratic Residuosity Assumption (QRA) • Construction of a computational PIR • The Basic cPIR scheme • The Recursive cPIR scheme • Extensions & Generalizations • Open problems • Literature: • B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan, Private Information Retrieval, Journal of the ACM, 1998 • [KO97] E. Kushilevitz and R. Ostrovsky, Replication Is NOT Needed: SINGLE Database, Computationally-Private Information Retrieval, FOCS 1997
Question: How to protect the privacy of queries? User U wants to retrieve some data from database D. D shouldn’t learn what U retrieved.
Let’s make things simple! The database holds a string B = (B_1, …, B_w), each B_i ∈ {0,1}. The user holds an index i ∈ {1,…,w} and should learn B_i (he may also learn other B_j’s).
Trivial solution The database simply sends everything to the user!
Non-triviality The previous solution has a drawback: the communication complexity is huge! Therefore we introduce the following requirement: “Non-triviality”: the number of bits communicated between U and D has to be smaller than w.
Private Information Retrieval (PIR) The user and the database are polynomial-time randomized interactive algorithms. Input of the database: B. Input of the user: index i = 1,…,w. • correctness: at the end the user learns B_i • privacy (of the user): the database does not learn i (this property needs to be defined more formally!) • non-triviality: the total communication is < w
Reduce Communication Complexity (CC) • Various PIR schemes with significantly smaller CC than the obvious n-bit solution • Schemes for a constant number k > 1 of servers with CC O(n^(1/k)) • Then comes the new notion of Computational PIR (cPIR) schemes!
Computational PIR (cPIR) • Databases are restricted to polynomial-time computations • The identity of i is only computationally hidden from the databases • The earlier assumption was that the data is replicated in several databases that do not communicate with each other: for every 0 < c < 1 there is a cPIR scheme for k = 2 databases with CC O(n^c), based on the existence of one-way functions
Main Theorem • For every c > 0 there exists a single database cPIR scheme with communication complexity O(n^c), assuming the hardness of deciding quadratic residuosity.
Some properties of the proposed scheme • The data is stored in the database in its plain form; no processing is needed between different users • The scheme is a single-round query-answer protocol (a common communication pattern in the context of databases)
Oblivious RAM model • When data is stored in a single database and is encrypted with a key held by the user, the model becomes the oblivious RAM model. • Differences from the proposed cPIR scheme; in the oblivious RAM model: • Data is stored in a special encrypted form • A pre-processing initialization step is required • The user must keep some state information • Several users must coordinate their actions • The key must still be hidden from the database This technique does not provide a solution to the single database cPIR problem
Hardness assumptions? A great source of hard problems is number theory. [KO97] construct a PIR based on the Quadratic Residuosity Assumption. We describe it on the next slides.
Quadratic Residues Def. x is a quadratic residue modulo m if there exists a ∈ Z_m* such that x = a^2 mod m. QR(m) := the set of all quadratic residues modulo m. QNR(m) := Z_m* \ QR(m). Example, Z_13*: for a = 1,…,12 the squares a^2 mod 13 are 1, 4, 9, 3, 12, 10, 10, 12, 3, 9, 4, 1, so QR(13) = {1, 3, 4, 9, 10, 12}. Observation: every quadratic residue modulo 13 has exactly 2 square roots, and hence |QR(13)| = |Z_13*| / 2.
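QRs modulo pq Example, Z_15*: for a = 1, 2, 4, 7, 8, 11, 13, 14 the squares a^2 mod 15 are 1, 4, 1, 4, 4, 1, 4, 1, so QR(15) = {1, 4}. Observation: every quadratic residue modulo 15 has exactly 4 square roots, and hence |QR(15)| = |Z_15*| / 4.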
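The two observations can be checked by brute force. The following is a minimal sketch; the helper names `units`, `quadratic_residues` and `square_roots` are illustrative and not part of the slides.

```python
from math import gcd

def units(m):
    """Elements of Z_m*: residues in 1..m-1 that are coprime to m."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def quadratic_residues(m):
    """QR(m): all values a^2 mod m for a in Z_m*."""
    return sorted({a * a % m for a in units(m)})

def square_roots(x, m):
    """All a in Z_m* with a^2 = x (mod m)."""
    return [a for a in units(m) if a * a % m == x]

for m in (13, 15):
    qr = quadratic_residues(m)
    # Print the modulus, QR(m), and the number of square roots of each residue.
    print(m, qr, [len(square_roots(x, m)) for x in qr])
```

For a prime modulus each residue has 2 roots; for a product of two distinct odd primes it has 4, which is why |QR(15)| is a quarter of |Z_15*|.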
Algorithmic questions about QR • The problem is considered hardest when N is a product of two distinct primes of equal length k/2 • The hard set: H_k = {N | N = p_1·p_2, where p_1, p_2 are k/2-bit primes} • Suppose n = pq • Is it easy to test membership in QR(n)? • Fact: if one knows p and q, yes! In O(|N|^3) time • What if one doesn’t know p and q?
Quadratic Residuosity Assumption (QRA) Let n = pq, where p and q are large primes of k/2 bits. • Z_n^+ := all a ∈ Z_n* such that a mod p ∈ QR(p) iff a mod q ∈ QR(q) • Note: Z_n^+ is a group! • QR(n) ⊂ Z_n^+: a ∈ QR(n) iff a mod p ∈ QR(p) and a mod q ∈ QR(q) • Question: given a ∈ Z_n^+, is a ∈ QR(n)? Quadratic Residuosity Assumption (QRA): Let p_1 and p_2 be distinct prime numbers such that |p_1| = |p_2| (large enough) and let N = p_1·p_2. If the factorization of N is not known, there is no efficient procedure for solving the Quadratic Residuosity Problem modulo N; i.e., deciding whether a number is a QR or a QNR is computationally hard if p_1 and p_2 are not given.
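When p and q are known, membership in QR(n) is easy to test. One standard way (an assumption here, since the slides do not name the method) is Euler's criterion modulo each prime: a is a QR modulo an odd prime p iff a^((p-1)/2) = 1 (mod p). A minimal sketch with toy primes; the function names are illustrative.

```python
from math import gcd

def is_qr_mod_prime(a, p):
    """Euler's criterion for an odd prime p (returns False if p divides a)."""
    return pow(a, (p - 1) // 2, p) == 1

def is_qr(a, p, q):
    """a is a QR modulo n = p*q iff it is a QR modulo both p and q."""
    return is_qr_mod_prime(a, p) and is_qr_mod_prime(a, q)

def in_z_plus(a, p, q):
    """Membership in Z_n^+: a coprime to n, QR mod p iff QR mod q."""
    return gcd(a, p * q) == 1 and is_qr_mod_prime(a, p) == is_qr_mod_prime(a, q)

p, q = 7, 11                       # toy primes, far too small for security
for a in (4, 6, 2):
    print(a, in_z_plus(a, p, q), is_qr(a, p, q))
```

Here 6 lies in Z_77^+ but is not a QR: it is a non-residue modulo both 7 and 11. Without p and q, telling such elements apart from true residues is exactly what QRA assumes to be hard.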
Useful Properties of the QR problem The Quadratic Residuosity Problem has some useful properties: • QR × QR = QR • QR × QNR = QNR • Picking a random element of QR(N) is easy: pick r ∈ Z_N* and compute r^2 mod N (this does not require knowing the factorization of N). • Under QRA, the Quadratic Residuosity Problem modulo N is hard not only for some special x ∈ Z_N* but also on average. This holds because the problem is random self-reducible.
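The multiplicative properties can be verified exhaustively for a toy modulus. In this sketch, which assumes N = 7·11, "QNR" means an element of Z_N^+ that is a non-residue (the case relevant to the scheme); the names `QR`, `QNR` and `random_qr` are illustrative.

```python
import random
from math import gcd

P, Q = 7, 11                       # toy primes
N = P * Q

def legendre_is_one(a, p):
    """Euler's criterion: True iff a is a QR modulo the odd prime p."""
    return pow(a, (p - 1) // 2, p) == 1

# Z_N^+ : coprime to N, and QR mod P exactly when QR mod Q.
Z_PLUS = [a for a in range(1, N)
          if gcd(a, N) == 1 and legendre_is_one(a, P) == legendre_is_one(a, Q)]
QR = [a for a in Z_PLUS if legendre_is_one(a, P)]
QNR = [a for a in Z_PLUS if not legendre_is_one(a, P)]

def random_qr(rng):
    """Sample from QR(N) without using the factorization: square a random unit."""
    while True:
        r = rng.randrange(1, N)
        if gcd(r, N) == 1:
            return r * r % N

print(all(x * y % N in QR for x in QR for y in QR))     # QR x QR stays in QR
print(all(x * y % N in QNR for x in QR for y in QNR))   # QR x QNR stays in QNR
```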
Single DB Computational PIR • Protocol for two players: U and DB • Both are limited to probabilistic polynomial-time computations • Each player is an interactive Turing machine with the following tapes: input, random, work, output and communication • Communication complexity: CC(n,k), where n is the length of the data string x, 1^k is the security parameter, and |q(i)| + |a(x, q(i))| ≤ CC(n, k) • Must satisfy both correctness and privacy
Single DB Computational PIR(Cont.) The scheme proceeds as follows: • U performs a polynomial-time computation and writes a single message q(i) on its write-only communication tape • DB reads its read-only communication tape, performs a polynomial-time computation and writes a single message a(x, q(i)) on its write-only communication tape • U reads the answer sent by DB and then performs a polynomial-time computation using DB’s answer and its work tape • U outputs a single bit on its output tape (this should be x_i)
Single DB Computational PIR(Cont.) • Communication complexity is CC(n,k), where n is the length of the data string x, 1^k is the security parameter, and |q(i)| + |a(x, q(i))| ≤ CC(n, k) • Must satisfy both correctness and privacy • Correctness: the output of U must be x_i • Privacy: the DB cannot distinguish, with non-negligible probability, between the distributions of any two requests q(i) and q(j)
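First (wrong) idea • The user sends X_1, …, X_w ∈ Z_n^+, where X_i is a QNR and every other X_j is a QR • For every j = 1,…,w the database sets Y_j = X_j^2 if B_j = 0, and Y_j = X_j otherwise • The database sets M = Y_1 · Y_2 · … · Y_w and sends M to the user • Y_i is a QR iff B_i = 0, and every other Y_j is a QR, so M is a QR iff B_i = 0 • The user checks if M is a QR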
PIR from the previous slide: • correctness √ • security? To learn i the database would need to distinguish QNR from QR. √ • non-triviality? Doesn’t hold! Problems! Communication: user → database: |B|·|Z_n|, database → user: |Z_n|. Call it: (|B|, 1)-PIR
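The (|B|, 1)-PIR above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the primes 1009 and 1013 are far too small to be secure, indices are 0-based, and the names `query`, `answer` and `decode` are illustrative.

```python
import random
from math import gcd

P, Q = 1009, 1013          # toy primes, known only to the user
N = P * Q                  # public modulus, sent to the database

def is_qr(a):
    """User-side test (needs P and Q): Euler's criterion modulo each prime."""
    return pow(a, (P - 1) // 2, P) == 1 and pow(a, (Q - 1) // 2, Q) == 1

def sample_qr(rng):
    """Random quadratic residue: square a random unit."""
    while True:
        r = rng.randrange(2, N)
        if gcd(r, N) == 1:
            return r * r % N

def sample_qnr(rng):
    """Random element of Z_N^+ that is a non-residue modulo both primes."""
    while True:
        a = rng.randrange(2, N)
        if pow(a, (P - 1) // 2, P) == P - 1 and pow(a, (Q - 1) // 2, Q) == Q - 1:
            return a

def query(i, w, rng):
    """User: one number per database position, a QNR only at position i."""
    return [sample_qnr(rng) if j == i else sample_qr(rng) for j in range(w)]

def answer(B, X):
    """Database: multiply X_j^2 where B_j = 0 and X_j where B_j = 1."""
    M = 1
    for bit, x in zip(B, X):
        M = M * (x if bit else x * x) % N
    return M

def decode(M):
    """User: M is a QR iff the requested bit is 0."""
    return 0 if is_qr(M) else 1

rng = random.Random(2006)
B = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [decode(answer(B, query(i, len(B), rng))) for i in range(len(B))]
print(recovered == B)
```

The query has one modulus-sized number per database bit, which is why this variant fails non-triviality even though the answer is a single number.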
How to fix it? Suppose that |B| = v^2 and present B as a v×v matrix. Idea: consider each row as a separate database.
Idea that works Execute v (v,1)-PIRs in parallel. Looks even worse: communication: user → database: v^2·|Z_n|, database → user: v·|Z_n|. The method Let j be the column where B_i is. In every “row” the user asks for the jth element. So, instead of sending v queries the user can send one! Observe: in this way the user learns all the elements in the jth column!
Putting things together The database multiplies the elements in each row, obtaining M_1, …, M_v. For the kth row: the bit of B in row k, column j is 0 iff M_k is a QR.
So we are done! PIR from the previous slide: • correctness √ • non-triviality: communication complexity = 2·√|B|·|Z_n| √ • security? To learn i the database would need to distinguish QNR from QR. Formally: from any adversary that breaks our scheme we can construct an algorithm that breaks QRA.
The Basic Scheme • Single DB cPIR scheme whose communication complexity is O(n^(0.5+c)) for any c > 0 • DB views x as an s×t matrix of bits, denoted M • The user is interested in privately retrieving the bit x_i, which is the (a,b) entry of M • Steps: • 1. U picks at random a k-bit number N (the product of two k/2-bit primes) and sends N to DB • 2. U chooses uniformly at random t numbers y_1, …, y_t ∈ Z_N^+ such that y_b is a QNR and every other y_j is a QR, and sends them to DB (t·k bits in total)
The Basic Scheme(Cont.) • 3. DB computes for every row r a number z_r ∈ Z_N* as follows: it first sets w_{r,j} = y_j^2 if M_{r,j} = 0 and w_{r,j} = y_j if M_{r,j} = 1, for j = 1,…,t • And then computes: z_r = w_{r,1} · w_{r,2} · … · w_{r,t} mod N • 4. DB sends z_1, …, z_s to the user (s·k bits in total) • 5. U considers only z_a: this number is a QR iff M_{a,b} = 0. Knowing the factorization of N, U can efficiently check whether z_a is a QR and thereby retrieve the bit M_{a,b}
The Basic Scheme(Cont.) • correctness √ • non-triviality: • In total s+t+1 k-bit numbers are sent • Assumption: s = t = √n and security parameter k = n^c • So communication complexity = (2√n+1)·k = O(n^(0.5+c)) √ • privacy? To learn i the database would need to distinguish QNR from QR. Formally: from any adversary that breaks our scheme we can construct an algorithm that breaks QRA.
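The five steps of the basic scheme can be sketched as follows. This is a toy illustration, not the authors' code: the modulus uses the small primes 1009 and 1013, rows and columns are 0-based, and the names `query`, `answer`, `decode`, `retrieve` and `communication_bits` are illustrative.

```python
import random
from math import gcd

P, Q = 1009, 1013          # toy primes, known only to U (Step 1)
N = P * Q                  # modulus sent to DB

def is_qr(a):
    """U's residuosity test, which uses the factorization of N."""
    return pow(a, (P - 1) // 2, P) == 1 and pow(a, (Q - 1) // 2, Q) == 1

def sample(rng, want_qr):
    """Random element of Z_N^+ that is a QR (want_qr) or a QNR (not want_qr)."""
    while True:
        a = rng.randrange(2, N)
        if gcd(a, N) != 1:
            continue
        if want_qr:
            return a * a % N
        if pow(a, (P - 1) // 2, P) == P - 1 and pow(a, (Q - 1) // 2, Q) == Q - 1:
            return a

def query(b, t, rng):
    """Step 2: t numbers, a QNR in column b and QRs elsewhere."""
    return [sample(rng, want_qr=(j != b)) for j in range(t)]

def answer(M, ys):
    """Steps 3-4: one product z_r per row of the bit matrix M."""
    zs = []
    for row in M:
        z = 1
        for bit, y in zip(row, ys):
            z = z * (y if bit else y * y) % N
        zs.append(z)
    return zs

def decode(zs, a):
    """Step 5: z_a is a QR iff M[a][b] = 0."""
    return 0 if is_qr(zs[a]) else 1

def retrieve(M, a, b, rng):
    return decode(answer(M, query(b, len(M[0]), rng)), a)

def communication_bits(s, t, k):
    """N, t query numbers and s answer numbers, each k bits."""
    return (s + t + 1) * k

M = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
rng = random.Random(1997)
print(all(retrieve(M, a, b, rng) == M[a][b] for a in range(4) for b in range(4)))
```

The database does the same work for every row, so the answer reveals nothing about a; only the position of the QNR encodes b, and that is hidden under QRA.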
Improvements User U sends (X_1,…,X_v) and database D answers with (M_1,…,M_v), but the user is interested in just one M_i. Idea: apply cPIR recursively to retrieve it, which lowers the communication complexity, although the scheme becomes more complicated!
The Recursive Scheme • Replace Step 4 of the basic scheme in the following way: • The basic scheme is S_1 and the recursive scheme is S_L • n_L = number of bits of the database string when executing S_L • t_L = number of columns in the matrix that represents the string (s_L denotes its number of rows) • 4*: U and DB execute the scheme S_{L-1} k times. In each execution, U gets one of the bits of z_a out of a string of length n_{L-1} = k·s_L. After k executions, U has the number z_a and can proceed as in Step 5 of the basic scheme • Scope for further improvement?
The Recursive Scheme(Cont.) • In the first use of S_{L-1} for retrieving z_a, U retrieves the most significant bit of z_a, and so on • So in each of these executions the DB string can be smaller by a factor of k • U and DB execute the scheme S_{L-1} k times • In the d-th execution, U gets the d-th most significant bit of z_a out of a string of length n_{L-1} = s_L held by DB. After k executions, U has the number z_a and can proceed as in Step 5 of the basic scheme
The Recursive Scheme(Cont.) • correctness √ • non-triviality: • Communication complexity: n^(1/(L+1)) · (k^L + L·k) • Assumption: L = O(√(log n / log k)), so CC = 2^O(√(log n · log k)) • If k = n^c then CC = n^O(√c), and if k = log^c n then CC = 2^O(√(log n · log log n)) • privacy? The proof proceeds exactly as for the basic scheme and leads to the following theorem:
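The step from the communication-complexity formula to the stated bounds can be spelled out as below. This is a sketch that takes logarithms base 2, fixes L = √(log n / log k) exactly, and assumes k ≥ 2 and L ≥ 1, so that L·k ≤ k^L.

```latex
\begin{align*}
L = \sqrt{\log n / \log k}
  &\;\Longrightarrow\;
  n^{1/(L+1)} \le 2^{\log n / L} = 2^{\sqrt{\log n \cdot \log k}},
  \qquad
  k^{L} = 2^{L \log k} = 2^{\sqrt{\log n \cdot \log k}}, \\
CC = n^{1/(L+1)} \cdot \left(k^{L} + L \cdot k\right)
  &\le 2 \cdot n^{1/(L+1)} \cdot k^{L}
  \le 2^{\,1 + 2\sqrt{\log n \cdot \log k}}
  = 2^{O\left(\sqrt{\log n \cdot \log k}\right)}, \\
k = n^{c}
  &\;\Longrightarrow\; \log k = c \log n
  \;\Longrightarrow\; CC = 2^{O(\sqrt{c}\,\log n)} = n^{O(\sqrt{c})}, \\
k = \log^{c} n
  &\;\Longrightarrow\; \log k = c \log\log n
  \;\Longrightarrow\; CC = 2^{O\left(\sqrt{\log n \cdot \log\log n}\right)}.
\end{align*}
```

The choice of L balances the two factors: a larger L shrinks n^(1/(L+1)) but grows k^L, and both equal 2^√(log n · log k) at this value.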
Complexity of PIRs: overview of the results Communication: • “recursive” PIR of [KO97]: for every c: O(n^c) • [Cachin, Micali, Stadler, 1999]: poly-logarithmic in n • [Lipmaa, 2005]: O(log^2 n) For practical analysis see: • [Sion, Carbunar] On the Computational Practicality of Private Information Retrieval. Their conclusion: it is the time complexity that matters. In real life it is still more practical to transmit the entire database.
Extensions • Symmetric PIR (also protect privacy of the database). [Gertner, Ishai, Kushilevitz, Malkin. 1998] • Searching by key-words [Chor, Gilboa, Naor, 1997] • Public-key encryption with key-word search [Boneh, Di Crescenzo, Ostrovsky, Persiano]
Additional Security to User • Experimentation Attack: • Replace the j-th patent/entry in its database with “garbage” • After executing the PIR scheme, see if the user “protests” • How to prevent? • Obtain a certificate • DB entries augmented with authentication readability • Communication-efficient certificate for the user from root of commitment tree (DB signs all the messages)
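One common way to realize a commitment tree is a Merkle hash tree, sketched below. This is an assumption for illustration only: the slides do not specify the construction, the DB's signature on the root is omitted, the entry count is assumed to be a power of two, and the names `build_tree`, `prove` and `verify` are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(entries):
    """Levels of the tree, leaves first; len(entries) must be a power of two."""
    level = [h(b"leaf:" + e) for e in entries]
    tree = [level]
    while len(level) > 1:
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree

def prove(tree, index):
    """Sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in tree[:-1]:
        path.append(level[index ^ 1])   # sibling of the current node
        index //= 2
    return path

def verify(root, entry, index, path):
    """Recompute the root from one entry and its logarithmic-size certificate."""
    node = h(b"leaf:" + entry)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"node:" + node + sibling)
        else:
            node = h(b"node:" + sibling + node)
        index //= 2
    return node == root

entries = [b"entry-%d" % j for j in range(8)]
tree = build_tree(entries)
root = tree[-1][0]                      # the value the DB would commit to
print(verify(root, entries[5], 5, prove(tree, 5)))
print(verify(root, b"garbage", 5, prove(tree, 5)))
```

The certificate has one hash per tree level, so it grows logarithmically with the number of entries, and a replaced entry no longer matches the committed root.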
Additional Security for the Database • Issue: the user should not get multiple entries for the price of one. • Solution: • Use a zero-knowledge protocol • It proves that the user’s query is well-formed (correct number of QRs and QNRs) • Communication complexity (for arbitrary ε > 0): proportional to the original query size plus the communication complexity of a zero-knowledge proof
Membership Queries • Problem Statement: • The database inserts n l-bit strings s_1, …, s_n into a data structure that supports search operations on strings. The user conducts an oblivious walk on the data structure until either the word w is found, or U is assured that w is not one of s_1, …, s_n • Solution: Combination of • Perfect Hashing • PIR Schemes
Conclusions • The main result has been established • With the additional security for the database, the PIR model can be formulated as a one-out-of-n Oblivious Transfer protocol, a generalization of the well-known 1-out-of-2 OT primitive, which plays an important role in cryptographic design.
Open problems: • Improve efficiency. • Construct new extensions. • How to replace the QR problem in the protocol by the shortest lattice vector problem? • For a security parameter that is poly-logarithmic in n, one can achieve CC = 2^O(√(log n · log log n)); can one achieve poly-logarithmic communication complexity? • (Solved in 1999 by Cachin, Micali and Stadler)
Advances in Computational PIR • The current paper, published in 1997 by Kushilevitz and Ostrovsky • In 1999, Christian Cachin, Silvio Micali and Markus Stadler achieved poly-logarithmic CC (based on the Phi-hiding assumption) • In 2004, Helger Lipmaa achieved log-squared communication complexity O(l·log n + k·log^2 n), where l is the length of the strings and k is the security parameter • In 2005, Craig Gentry and Zulfikar Ramzan achieved log-squared communication complexity with a scheme that retrieves log-squared (consecutive) bits of the database
Advances in Computational PIR • In 2009, Helger Lipmaa designed a computational PIR protocol with communication complexity O(l·log n + k·log^2 n) and worst-case computation of O(n / log n) public-key operations. • Amortization techniques that retrieve non-consecutive bits have been considered by Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky and Amit Sahai.
Advances in Information theoretic PIR Achieving information theoretic security requires the assumption that • there are multiple non-cooperating servers, each having a copy of the database. Without this assumption, any information-theoretically secure PIR protocol requires an amount of communication that is at least the size of the database n.
Relation to other cryptographic primitives • One-way functions are necessary, but not known to be sufficient, for nontrivial (i.e., with sub-linear communication) single database computationally private information retrieval. • Oblivious transfer, also called symmetric PIR, is PIR with the additional restriction that the user may not learn any item other than the one she requested.
Thank you! Questions?