Efficient information retrieval for ranked queries in cost-effective cloud environments using a proxy server to classify queries, construct mask matrices, and filter search results without revealing file details.
IEEE INFOCOM 2012, March 2012, Orlando, USA
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments
Presenter: Qin Liu (a, b)
Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a)
(a) Central South University, China
(b) Temple University, USA
2012-3-26
Cloud Computing Model
• Cloud computing, as a new commercial paradigm, enables users to outsource data to a cloud
• Data is described by a set of keywords
• Users retrieve files with a set of keywords
• Example (figure): the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, …; Bob queries with keywords A, B and retrieves F1 and F2
• The cloud will learn the user's search pattern and access pattern
Private Search (Ostrovsky et al., CRYPTO 2005)
• Given a public dictionary that contains all keywords, e.g., dictionary = <A, B, C, D>
• Files in the cloud: F1: {A, B}, F2: {B, D}, F3: {C, D}, …
• Bob's encrypted query: [1] [1] [0] [0]
• Homomorphic encryption
  • E(x)*E(y) = E(x+y)
  • E(x)^y = E(x*y)
• Key trick: map unmatched files to 0
  • E(0)*E(0) = E(0+0) = E(0)
  • E(0)^F3 = E(0*F3) = E(0)
• Bob receives a buffer that is a compressed version of all files F1, F2, F3
  • Survival: a matched file is alone in its buffer entry, e.g., E(F2)*E(0) = E(F2)
  • Collision: marked NA in the figure
  • Unmatched: the entry holds 0
Problem: Cost Grows Linearly
• Processing each query is expensive. Given n users, the cloud needs to execute n queries
  • Performance bottleneck
• The cloud will return all matched files, even if a user is interested in only a small percentage of them
  • Wasted bandwidth
Our Solution: EIRQ Scheme
Efficient Information retrieval for Ranked Query
• A trusted proxy server (ADL) is introduced between the users and the cloud
  • Aggregates user queries
  • Distributes search results
  • Supports ranked queries
Ranked Queries
• Queries are classified into ranks
• ADL constructs a mask matrix
• Cloud filters a certain percentage of matched files
• Example (figure): files F1: {A, B}, F2: {B, D}, F3: {C, D}, …; a rank-0 query retrieves 100% of matched files, a rank-1 query retrieves 50%
  • Alice: {A, B}, Rank 0; her matched files are F1 and F2
  • Bob: {A, C}, Rank 1; his matched files are F1 and F3, and F3 is filtered with probability 50%
• Challenges: the cloud
  • Cannot know which files are filtered/returned
  • Cannot know each query's rank
Intuition of EIRQ
• Key techniques:
  • Construct a mask matrix to protect query ranks
  • Filter files without knowing which files are filtered
• Message flow between User, ADL, and Cloud (figure):
  • Step 1: QueryGen. User sends keywords and rank to ADL
  • Step 2: Matrix Construct. ADL sends the mask matrix to the cloud
  • Step 3: FileFilter. Cloud returns a buffer to ADL
  • Step 4: File Recovery. ADL returns a certain percentage of files matching the user's keywords
Goal
• Queries are classified into r ranks: 0, 1, …, r-1
• A rank-i query retrieves a (1 - i/r) fraction of matched files
  • Files that match rank-0 queries: will not be filtered
  • Files that match rank-1 queries: filtered with probability 1/r
  • Files that match rank-i queries: filtered with probability i/r
• The cloud
  • Cannot know which files are filtered/returned
  • Cannot know each query's rank
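The goal above reduces to a one-line calculation. A minimal sketch, assuming r ranks numbered 0 to r-1; the function name `returned_fraction` is hypothetical and does not come from the slides:

```python
from fractions import Fraction

def returned_fraction(rank: int, num_ranks: int) -> Fraction:
    """Fraction of matched files a rank-`rank` query retrieves: 1 - i/r."""
    if not 0 <= rank < num_ranks:
        raise ValueError("rank must be in 0..num_ranks-1")
    return 1 - Fraction(rank, num_ranks)

# With r = 4 ranks, compute the target fraction for each rank 0..3.
targets = [returned_fraction(i, 4) for i in range(4)]
```

Exact fractions are used instead of floats so that the targets can be compared without rounding concerns.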
Construct Mask Matrix
• ADL constructs a mask matrix that is encrypted with its public key, and sends it to the cloud
• The matrix has one row per keyword (number of keywords) and one column per rank (number of ranks, r = 2 in the example)
• For a keyword, the number of 1s in its row is determined by the rank i of the query it appears in: r - i
  • The higher rank takes over
• The ratio of 1s to r determines the probability that a file containing the keyword is returned: (r - i)/r
  • The higher ratio takes over
• Example (figure): Alice: {A, B}, Rank 0; Bob: {A, C}, Rank 1
  • A: [1] [1]
  • B: [1] [1]
  • C: [1] [0]
  • D: [0] [0]
  • …: [0] [0]
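The construction rule can be sketched in plaintext, before encryption. This is an illustration under assumptions, not the authors' implementation: the function name `build_mask_matrix` is hypothetical, the 1s are placed in the leftmost columns to match the figure, and the encryption of each entry under the ADL's public key is omitted.

```python
def build_mask_matrix(dictionary, queries, num_ranks):
    """Return {keyword: row}, where a row holds r - i ones followed by zeros.

    `queries` is a list of (keywords, rank) pairs. When a keyword appears in
    several queries, the highest rank (smallest i) takes over.
    """
    best_rank = {}
    for keywords, rank in queries:
        for kw in keywords:
            best_rank[kw] = min(rank, best_rank.get(kw, rank))
    matrix = {}
    for kw in dictionary:
        # Keywords that no query asks for get an all-zero row.
        ones = num_ranks - best_rank[kw] if kw in best_rank else 0
        matrix[kw] = [1] * ones + [0] * (num_ranks - ones)
    return matrix

# Example from the slide: Alice {A, B} at rank 0, Bob {A, C} at rank 1, r = 2.
mask = build_mask_matrix(
    ["A", "B", "C", "D"],
    [({"A", "B"}, 0), ({"A", "C"}, 1)],
    num_ranks=2,
)
```

Keyword A appears in both queries and takes Alice's rank 0, so its row is all 1s, which agrees with the matrix shown on the slide.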
Filter Files
• The cloud chooses a random column for each file
• A file that matches a rank-i query is filtered with probability i/r
• Example (figure): files F1: {A, B}, F2: {B, D}, F3: {C, D}, …; mask matrix A: [1] [1], B: [1] [1], C: [1] [0], D: [0] [0]
  • For F3, each column is chosen with probability 50%
    • First column: E(1)*E(0) = E(1), E(1)^F3 = E(F3)
    • Second column: E(0)*E(0) = E(0), E(0)^F3 = E(0)
  • F1 and F2 will be returned
  • F3 will be filtered with probability 50%
• The results are placed in a buffer that is returned to ADL
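The filtering step can be simulated in plaintext to check the stated probabilities. This is only a sketch: in the scheme the cloud works on encrypted entries and never sees these values, and the names `coefficient`, `return_probability`, and `filter_files` are hypothetical.

```python
from fractions import Fraction
import random

# Plaintext view of the slide's example (r = 2).
MASK = {"A": [1, 1], "B": [1, 1], "C": [1, 0], "D": [0, 0]}
FILES = {"F1": {"A", "B"}, "F2": {"B", "D"}, "F3": {"C", "D"}}
R = 2

def coefficient(file_keywords, column):
    """Sum of mask entries in `column` over the file's keywords.

    Under encryption this is the product of the E(.) entries, since
    E(x)*E(y) = E(x+y).
    """
    return sum(MASK[kw][column] for kw in file_keywords)

def return_probability(file_keywords):
    """Fraction of columns for which the file survives (coefficient > 0)."""
    surviving = sum(1 for col in range(R) if coefficient(file_keywords, col) > 0)
    return Fraction(surviving, R)

def filter_files(rng):
    """Pick one random column per file and keep files with coefficient > 0."""
    kept = []
    for name, keywords in sorted(FILES.items()):
        column = rng.randrange(R)
        if coefficient(keywords, column) > 0:
            kept.append(name)
    return kept

kept = filter_files(random.Random(0))
```

F1 and F2 survive for every column choice, while F3 survives only when the first column is drawn.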
Setup • Our simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E8400 3.0 GHz CPU and 8 GB RAM. We summarize the parameters in Table.
Percentage of Returned Files
• Queries are classified into 4 ranks (0 to 3)
• Target percentages: Rank-0: 100%, Rank-1: 75%, Rank-2: 50%, Rank-3: 25%
• Our results: Rank-0: 100%, Rank-1: 75%, Rank-2: 52%, Rank-3: 29%
Computation Cost
• ADL: 14.8270 s to 14.8788 s
• EIRQ: 14.8664 s to 14.9269 s
Communication Cost
• EIRQ works better when there are only a few users
• 5 users in each rank, 4 common keywords
  • EIRQ: 439 KB buffer
  • ADL: 834 KB buffer
Conclusion
1. An ADL is introduced to avoid the performance bottleneck of the cloud
2. The EIRQ scheme allows queries with a higher rank to retrieve a higher percentage of matched files
3. Our solution protects access pattern, search pattern, and rank privacy from the cloud
Background
• System Model
• Adversary Model
• Ostrovsky Scheme
System Model
• Users in the organization send queries to ADL
• ADL aggregates user queries and queries the cloud with a combined query
• The cloud returns the files matching the combined query to ADL
• ADL distributes results to each user
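The aggregate-and-distribute role of the ADL can be sketched without any cryptography. This assumes that a file matches a query when they share at least one keyword, which is how the examples in these slides behave; the function names and sample data are hypothetical.

```python
def combine_queries(user_queries):
    """Union of all users' keywords: the combined query sent to the cloud."""
    combined = set()
    for keywords in user_queries.values():
        combined |= set(keywords)
    return combined

def distribute(user_queries, returned_files):
    """Give each user the returned files that share a keyword with their query."""
    return {
        user: sorted(
            name
            for name, file_keywords in returned_files.items()
            if set(keywords) & set(file_keywords)
        )
        for user, keywords in user_queries.items()
    }

user_queries = {"Alice": {"A", "B"}, "Bob": {"A", "C"}}
files = {"F1": {"A", "B"}, "F2": {"B"}, "F3": {"C"}}

combined = combine_queries(user_queries)
results = distribute(user_queries, files)
```

The cloud sees one query over {A, B, C} instead of one query per user, which is where the saving over executing n separate queries comes from.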
Adversary Model
• ADL is assumed to be trusted by all users
• The cloud is the only adversary
  • Honest but curious
  • Obeys our schemes, but still wants to learn some additional information
• Our goal is to protect the following from the cloud:
  • Access pattern
  • Search pattern
  • Rank privacy: hiding the rank of each user query
Ostrovsky Scheme (CRYPTO 2005)
• Files in the cloud: F1: {A, B}, F2: {B}, F3: {C}
• Public dictionary: <A, B, C, D, E>
• Alice's keywords: A, B
• Alice's query is a string of 0s and 1s, encrypted using homomorphic encryption: [1], [1], [0], [0], [0]
• Let E() be encryption
  • E(x)*E(y) = E(x+y)
  • E(x)^y = E(x*y)
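The two homomorphic properties can be checked with a toy additively homomorphic cryptosystem. The sketch below uses textbook Paillier with tiny primes; the slides do not name the cryptosystem, so this choice is an assumption, and the parameters are far too small for real use. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random

# Toy parameters (assumption: illustration only, not secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # inverse of LAM modulo N

def encrypt(m, rng=random):
    """Paillier encryption with generator g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) % N2) * pow(r, N, N2) % N2

def decrypt(c):
    """Paillier decryption: m = L(c^LAM mod N^2) * MU mod N, with L(u) = (u-1)//N."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N

# E(x)*E(y) = E(x+y): multiplying ciphertexts adds plaintexts.
sum_plain = decrypt(encrypt(3) * encrypt(4) % N2)
# E(x)^y = E(x*y): raising a ciphertext to y multiplies the plaintext by y.
prod_plain = decrypt(pow(encrypt(5), 6, N2))
# E(0)^F = E(0): an unmatched file is mapped to 0.
zero_plain = decrypt(pow(encrypt(0), 12345, N2))
```

The last line is the property the scheme relies on: raising E(0) to any file value still decrypts to 0.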
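Ostrovsky Scheme (CRYPTO 2005)
• Alice's query [1], [1], [0], [0], [0] is applied to the files F1: {A, B}, F2: {B}, F3: {C}
• Multiplying the query entries of each file's keywords gives [2] for F1, [1] for F2, and [0] for F3
• Each result is raised to the file: [2]^F1, [1]^F2, [0]^F3
• Alice's buffer: [2, 2*F1], [1, 1*F2], [0, 0]
• The magic is that the unmatched file F3 is processed to 0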
Ostrovsky Scheme (CRYPTO 2005)
• Alice's buffer: [2, 2*F1], [1, 1*F2], [0, 0]
• Alice decrypts to obtain F2 directly
• F1 is obtained by dividing 2*F1 by 2
• The buffer size depends only on the number of matched files
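The buffer arithmetic in this example can be traced on decrypted values. A minimal sketch, assuming each file's content is encoded as an integer (111, 222, 333 are made-up values) and ignoring encryption and buffer collisions; the function names are hypothetical.

```python
DICTIONARY = ["A", "B", "C", "D", "E"]
QUERY_BITS = [1, 1, 0, 0, 0]  # Alice's keywords: A, B

# Each file: (keywords, integer-encoded content). Contents are illustrative.
FILES = {
    "F1": ({"A", "B"}, 111),
    "F2": ({"B"}, 222),
    "F3": ({"C"}, 333),
}

def file_pair(keywords, content):
    """Return (c, c * content), where c counts the file's queried keywords."""
    c = sum(bit for kw, bit in zip(DICTIONARY, QUERY_BITS) if kw in keywords)
    return c, c * content

def recover(pair):
    """Divide the scaled content by c; a pair with c == 0 carries no file."""
    c, scaled = pair
    return None if c == 0 else scaled // c

pairs = {name: file_pair(kws, content) for name, (kws, content) in FILES.items()}
recovered = {name: recover(pair) for name, pair in pairs.items()}
```

F1 matches two keywords, so its entry holds 2*F1 and must be divided by 2, while F3 contributes the pair (0, 0) and reveals nothing.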
Cloud Security
• The cloud may leak user privacy
• Searchable encryption
  • Will not reveal what the users are searching for (search pattern)
  • Will reveal whether two users are interested in the same files (access pattern)
• Example (figure): files F1: {A, B}, F2: {B}, F3: {C}; Alice queries {A, B} and retrieves F1 and F2; Bob queries {A, C} and retrieves F3 and F1
Construction of EIRQ
• Step 1. Each user runs the QueryGen algorithm to send keywords and query rank to the ADL
• Example (figure): Dictionary: <A, B, C, D>; File 1: {A, B}, File 2: {B}, File 3: {C}
  • Ranks 0 to 2: Rank 0: 100%, Rank 1: 50%, Rank 2: 0%
  • Alice sends A, B, Rank 1 to ADL
  • Bob sends B, C, Rank 1 to ADL