This study explores using low-degree homomorphism to strengthen privacy in private conjunction queries, enabling secure retrieval of matching records while concealing the queried values and attributes from the servers.
Using low-degree Homomorphism for Private Conjunction Queries Dan Boneh, Craig Gentry, Shai Halevi, Frank Wang, David Wu
Private Conjunction Queries • Client has an SQL query of the type SELECT ⋆ FROM db WHERE a1=v1 AND … AND at=vt • Want to hide the values vi from the server • maybe also the attributes ai themselves • Our protocols return the indexes of the matching records • The client can use PIR or ORAM to fetch the records themselves
The Basic Approach • Encode the database as a polynomial • A set S is encoded as a polynomial P(X) s.t. P(s)=0 for all s ∈ S • Use the Kissner-Song trick • If P1(X), P2(X) represent S1, S2, then a random linear combination A(X) = R1(X)·P1(X) + R2(X)·P2(X) represents the intersection of S1, S2, whp. • If R1, R2 are random polynomials of high-enough degree, then A(X) does not leak any information beyond the intersection
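A minimal plaintext sketch of the set-as-polynomial encoding and a random combination in the spirit of the Kissner-Song trick (encryption omitted; the toy sets and all names are illustrative, and scalar randomizers stand in for the random polynomials used for the no-leakage claim):

```python
# Illustrative sketch: sets as polynomials over GF(P), intersection via a
# random linear combination (no encryption; scalar randomizers only).
import random

P = 2**31 - 1  # a prime modulus for the coefficient field

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_{s in roots} (X - s) mod P."""
    poly = [1]
    for s in roots:
        shifted = [0] + poly                      # X * poly(X)
        poly = [(shifted[i] - s * (poly[i] if i < len(poly) else 0)) % P
                for i in range(len(shifted))]
    return poly

def poly_eval(poly, x):
    acc = 0
    for c in reversed(poly):                      # Horner's rule
        acc = (acc * x + c) % P
    return acc

S1, S2 = {1, 2, 3, 4}, {3, 4, 5}
P1, P2 = poly_from_roots(S1), poly_from_roots(S2)

# A = r1*P1 + r2*P2 for random nonzero scalars: among S1 ∪ S2 its roots are
# exactly S1 ∩ S2 (accidental extra roots occur only with probability ~1/P)
r1, r2 = random.randrange(1, P), random.randrange(1, P)
n = max(len(P1), len(P2))
A = [(r1 * (P1[i] if i < len(P1) else 0) +
      r2 * (P2[i] if i < len(P2) else 0)) % P for i in range(n)]

common = [s for s in S1 | S2 if poly_eval(A, s) == 0]
```

In the protocol the same combination is computed over encrypted coefficients, which is why additive homomorphism suffices for this step.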
Two-Party Settings • Server has the database • Client has the secret key for a SWHE scheme • Server encodes the database as a bivariate polynomial D(x,y) • D(r,a)=v if record r has attribute a = value v • Size of D ~ size of database
Conjunction Queries • “attr1=val1 AND … AND attrt=valt” • Client interpolates Q(y) s.t. Q(attri)=vali • Sends the encrypted Q to the server • For simplicity, send also attr1,…,attrt in the clear • Server computes A(x,y) = D(x,y) − Q(y) (under the encryption) • Additive homomorphism suffices • A(r,attri)=0 iff D(r,attri)=vali • Server defines Ai(X) = A(X,attri) • Roots of Ai(X) are the records that have attri=vali
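A plaintext sketch of the query encoding (the encryption layer is omitted; the toy database and attribute/value numbers are illustrative): the client interpolates Q with Q(attri)=vali, and the server forms A(r,a) = D(r,a) − Q(a), which vanishes exactly when record r satisfies the i-th equality.

```python
# Illustrative sketch: Lagrange interpolation of the query polynomial Q and
# the server's difference polynomial A (all in the clear).
P = 2**31 - 1  # prime field modulus

def interpolate(points):
    """Lagrange interpolation over GF(P); returns a function x -> Q(x)."""
    def Q(x):
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total
    return Q

# Toy database: D[(record, attribute)] = value
D = {(0, 10): 7, (1, 10): 7, (2, 10): 5,
     (0, 20): 3, (1, 20): 4, (2, 20): 3}

Q = interpolate([(10, 7), (20, 3)])      # query: attr10=7 AND attr20=3
def A(r, a):
    return (D[(r, a)] - Q(a)) % P

# Roots of A_i(X) = A(X, attr_i): the records matching the i-th equality
match_10 = [r for r in range(3) if A(r, 10) == 0]
match_20 = [r for r in range(3) if A(r, 20) == 0]
```

Since A is just a subtraction of Q's (encrypted) evaluations from D's entries, additive homomorphism is enough for this step.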
Conjunction Queries (cont.) • Server uses the Kissner-Song trick: sets B(X) = Σi Ri·Ai(X) for random Ri's • Whp the roots of B are the records in the intersection of the Ai's root-sets • Still, additive homomorphism is enough • Need more if the attri's are not sent in the clear • Server sends the encrypted B to the client • Client decrypts, finds the roots of B, uses PIR/ORAM to get the actual records • To hide also the attributes we need higher-degree homomorphism
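A plaintext sketch of this server-side combination step, assuming the combination B = Σi Ri·Ai with random scalar randomizers (hypothetical toy data; the client's interpolated Q is modeled as a lookup table and encryption is omitted):

```python
# Illustrative sketch: combine the per-attribute differences A_i into one
# randomized polynomial B whose roots are (whp) the conjunction matches.
import random

P = 2**31 - 1
D = {(0, 10): 7, (1, 10): 7, (2, 10): 5,
     (0, 20): 3, (1, 20): 4, (2, 20): 3}
Q = {10: 7, 20: 3}                        # query: attr10=7 AND attr20=3
records, attrs = range(3), [10, 20]

# A_i(r) = D(r, attr_i) - Q(attr_i); B(r) = sum_i R_i * A_i(r), R_i random
R = {a: random.randrange(1, P) for a in attrs}
def B(r):
    return sum(R[a] * (D[(r, a)] - Q[a]) for a in attrs) % P

# Whp B(r) = 0 exactly for the records matching every equality
matches = [r for r in records if B(r) == 0]
```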
Three parties: Client-Proxy-Server • Proxy has an encrypted inverted index • For every attr=val pair in the DB, it keeps a pair (t, Enc(P)) • Tag t = Hash(“attr=val”) • P is a polynomial s.t. P(r)=0 if record #r contains this “attr=val” pair • Client sends the tags ti for the attri=vali pairs in its query • Proxy chooses randomizers Ri and sets Q = Σi Ri·Pi (under the encryption) • Whp the roots of Q are the records in the intersection • Server obliviously decrypts Q for the Client • Client factors Q, finds its roots, uses PIR/ORAM to get the actual records
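A plaintext sketch of the proxy's inverted index and combination step (a hash stands in for the tag function, encryption and oblivious decryption are omitted, and the toy records and attr=val strings are illustrative):

```python
# Illustrative sketch: tagged inverted index of polynomials, combined by the
# proxy with random scalar randomizers; the client recovers the roots.
import hashlib, random

P = 2**31 - 1
NUM_RECORDS = 6

def tag(pair):                            # t = Hash("attr=val")
    return hashlib.sha256(pair.encode()).hexdigest()

def poly_from_roots(roots):
    poly = [1]
    for s in roots:
        shifted = [0] + poly
        poly = [(shifted[i] - s * (poly[i] if i < len(poly) else 0)) % P
                for i in range(len(shifted))]
    return poly

def poly_eval(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

# Inverted index: tag -> P with P(r) = 0 if record r has that attr=val pair
index = {tag("name=alice"): poly_from_roots([0, 2, 4]),
         tag("city=paris"): poly_from_roots([2, 3, 4, 5])}

# Client sends tags; proxy sets Q = sum_i R_i * P_i for random R_i != 0
tags = [tag("name=alice"), tag("city=paris")]
Rs = [random.randrange(1, P) for _ in tags]
polys = [index[t] for t in tags]
n = max(map(len, polys))
Q = [sum(R * (p[i] if i < len(p) else 0) for R, p in zip(Rs, polys)) % P
     for i in range(n)]

# Client (after oblivious decryption) finds the roots of Q: whp exactly the
# records in the intersection {2, 4}
found = [r for r in range(NUM_RECORDS) if poly_eval(Q, r) == 0]
```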
Conserving Bandwidth • Q = Σi Ri·Pi is a wasteful representation • Degree ~ 2·max(deg(Pi)) • High degree is needed for Q to not leak information on the Pi's • Reducing to max(deg(Pi))+min(deg(Pi)) is easy: • Say P1 has the smallest degree, then set Q' = Σi>1 si·Pi • The si's are random scalars • Q = R·P1 + R'·Q', deg(R)=deg(Q'), deg(R')=deg(P1) • Can we reduce it further? • We show how to get min(deg(Pi))
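A hedged sketch of one plausible reading of the max+min construction: with P1 the lowest-degree polynomial, set Q' = Σi>1 si·Pi for random scalars, then Q = R·P1 + R'·Q' for random polynomials R, R' of degrees deg(Q') and deg(P1). The toy sets are illustrative; all arithmetic is in the clear.

```python
# Illustrative sketch: degree of Q = R*P1 + R'*Q' is max+min, and the
# intersection of all root-sets still vanishes on Q.
import random

P = 2**31 - 1

def poly_from_roots(roots):
    poly = [1]
    for s in roots:
        shifted = [0] + poly
        poly = [(shifted[i] - s * (poly[i] if i < len(poly) else 0)) % P
                for i in range(len(shifted))]
    return poly

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % P
            for i in range(n)]

def poly_eval(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

P1 = poly_from_roots([2, 4])                  # smallest degree: 2
P2 = poly_from_roots([1, 2, 4, 5])            # degree 4
P3 = poly_from_roots([0, 2, 4, 6])            # degree 4

# Q' = s2*P2 + s3*P3 for random scalars (degree 4 = max)
s2, s3 = random.randrange(1, P), random.randrange(1, P)
Qp = poly_add([s2 * c % P for c in P2], [s3 * c % P for c in P3])

# Q = R*P1 + R'*Q' with deg(R) = deg(Q') = 4, deg(R') = deg(P1) = 2
R  = [random.randrange(1, P) for _ in range(len(Qp))]
Rp = [random.randrange(1, P) for _ in range(len(P1))]
Q  = poly_add(poly_mul(R, P1), poly_mul(Rp, Qp))
# deg(Q) = 4 + 2 = max + min, versus ~2*max for the naive combination
```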
Polynomial GCD • P1, P2 are (monic) polynomials for the sets S1, S2 • The smallest polynomial defining S1 ∩ S2 is G = gcd(P1, P2) • G does not leak information on P1, P2 beyond the intersection • Computing Enc(G) from {Enc(Pb)}b takes high homomorphic capacity
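A plaintext sketch of recovering G = gcd(P1, P2) over GF(P), whose roots are exactly the intersection of the two root-sets (the slide's point is that doing this under encryption is expensive; here everything is in the clear, and the toy sets are illustrative):

```python
# Illustrative sketch: Euclid's algorithm for polynomials over GF(P),
# normalized to a monic gcd.
P = 2**31 - 1

def poly_from_roots(roots):
    poly = [1]
    for s in roots:
        shifted = [0] + poly
        poly = [(shifted[i] - s * (poly[i] if i < len(poly) else 0)) % P
                for i in range(len(shifted))]
    return poly

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_mod(a, b):
    """Remainder of a(X) divided by b(X) over GF(P)."""
    a, b = trim(a[:]), trim(b[:])
    inv_lead = pow(b[-1], P - 2, P)          # inverse of b's leading coeff
    while len(a) >= len(b) and any(a):
        factor = a[-1] * inv_lead % P
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % P
        a = trim(a)                           # top coefficient is now 0
    return a

def poly_gcd(a, b):
    a, b = trim(a[:]), trim(b[:])
    while b != [0]:
        a, b = b, poly_mod(a, b)
    inv = pow(a[-1], P - 2, P)
    return [c * inv % P for c in a]           # normalize to monic

P1 = poly_from_roots([1, 2, 3, 4])            # S1 = {1,2,3,4}
P2 = poly_from_roots([3, 4, 5])               # S2 = {3,4,5}
G  = poly_gcd(P1, P2)                         # roots of G: S1 ∩ S2 = {3,4}
```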
Reducing The Degree • Instead of Q, use Q' = Q mod P1 • It has degree < deg(P1) = mini deg(Pi) • If Q is a random multiple of G, so is Q' • Computing Enc(Q mod P1) is easier • Basic Solution: • Store also Enc(X^i mod P1) for i = deg(P1),…,deg(Q) • Given the encrypted coefficients qi of Q • (Q(X) = Σi qi·X^i) • Compute Enc(Q mod P1) = Σi Enc(qi)·Enc(X^i mod P1) • Only takes quadratic homomorphism
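A plaintext sketch of this basic solution (toy data; in the protocol the table entries and the qi's are encrypted, so each term needs one ciphertext product, i.e. quadratic homomorphism): with a table of X^i mod P1, reducing Q mod P1 is a linear pass over Q's coefficients.

```python
# Illustrative sketch: Q mod P1 via a precomputed table of X^i mod P1,
# checked against direct polynomial division.
import random

P = 2**31 - 1

def poly_from_roots(roots):
    poly = [1]
    for s in roots:
        shifted = [0] + poly
        poly = [(shifted[i] - s * (poly[i] if i < len(poly) else 0)) % P
                for i in range(len(shifted))]
    return poly

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_mod(a, b):
    a, b = trim(a[:]), trim(b[:])
    inv_lead = pow(b[-1], P - 2, P)
    while len(a) >= len(b) and any(a):
        factor = a[-1] * inv_lead % P
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % P
        a = trim(a)
    return a

P1 = poly_from_roots([2, 4, 6])                  # degree d = 3
d = len(P1) - 1
Q = [random.randrange(P) for _ in range(8)]      # some degree-7 polynomial

# Precomputed table: X^i mod P1 for i = d .. deg(Q)
table = {i: poly_mod([0] * i + [1], P1) for i in range(d, len(Q))}

# Q mod P1 = sum_{i<d} q_i X^i + sum_{i>=d} q_i * (X^i mod P1)
red = Q[:d]
for i in range(d, len(Q)):
    t = table[i]
    red = [(red[j] + Q[i] * (t[j] if j < len(t) else 0)) % P
           for j in range(d)]
```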
Reducing The Degree (cont.) • Storage/homomorphism tradeoff • Can store fewer encryptions of X^i mod P1 by using higher homomorphic capacity • E.g., store Enc(X^(d·2^j) mod P1) for j = 0,1,…,log m • When deg(Q)=d+m, it takes log m steps to reduce Q mod P1 • Each step reduces deg < 2^t·d down to deg < 2^(t−1)·d
Speedup Using Batching • Recall: an HE ciphertext encrypts an array of L values • L is at least a few hundred, maybe more • Can use it to get a significant speedup: • Break the database into L small db's • Each record is placed at random in one of the small db's • Run the same query against all the small db's at once • The i'th database sits in the i'th entry of all the ciphertexts • So we get L lists of indexes instead of one • The i'th list has the indexes of the records in the i'th database that match the query • Lists are much shorter, so the polynomials have much smaller degree
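A toy sketch of the batching layout, with plain lists standing in for the L ciphertext slots (the record schema, attribute names, and sizes are all illustrative): records are placed at random into L small databases, and the same query is evaluated against every slot in parallel.

```python
# Illustrative sketch: split a database across L "slots", query each slot,
# and check that the union of the per-slot results matches a direct scan.
import random

L = 8
records = [{"a1": random.randrange(4), "a2": random.randrange(4), "id": i}
           for i in range(200)]

# Each record is placed at random into one of the L small databases
buckets = [[] for _ in range(L)]
for rec in records:
    buckets[random.randrange(L)].append(rec)

def matching_ids(db, query):
    return [r["id"] for r in db if all(r[k] == v for k, v in query.items())]

query = {"a1": 1, "a2": 2}
per_slot = [matching_ids(b, query) for b in buckets]   # L short lists

# Union over the slots equals a direct scan of the whole database
merged = sorted(i for lst in per_slot for i in lst)
```

Each per-slot list is ~200/L records long, so the polynomials built from them have correspondingly smaller degree.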
Implementing the 3-party protocol • Two implementations: • Only the basic scheme, using an additively homomorphic cryptosystem (Paillier) • The full scheme, using the [Bra'12] HE scheme • Only the 2nd implementation scales to large databases • Batching is key • With and without the bandwidth-reduction GCD trick • Without it we need lower homomorphism, hence smaller parameters • All tests run against a 1-million-record database, executing a 5-attribute conjunction • Balanced tests: each tag matches roughly the same # of records • Unbalanced: the rarest tag matches only ~5% as many records as the most common one
Balanced Queries • [Charts: query time in minutes and bandwidth in MB] • Highlighted data point: ~2000 matches per tag takes 8 minutes and 1MB
Unbalanced Queries – Time (min) • [Chart; per-tag match counts for the three test profiles: (2.5K, 2.5K, 5K, 10K, 50K), (10K, 20K, 25K, 50K, 200K), (2.5K, 2.5K, 5K, 5K, 350K)]
Unbalanced Queries – Bandwidth (MB) • [Chart; same three per-tag match-count profiles: (2.5K, 2.5K, 5K, 10K, 50K), (10K, 20K, 25K, 50K, 200K), (2.5K, 2.5K, 5K, 5K, 350K)]