ENCRYPTED QUERY PROCESSING Umur Türkay 2006103319
GENERAL INFORMATION • A major challenge when using a database is the security of its data, which may be confidential or private. • For example, a bank must hide its customers' account data because it is confidential, whereas a hospital must hide patients' data because it is private. • Firms that use a database as a service (DAS) have to take into account that their service provider may not be trusted. An employee working at the service provider should not be able to see the data owner's data. • The literature proposes several solutions to these security problems, mainly based on access control mechanisms. But access control mechanisms alone are not enough. • One has to consider, for instance, that the hard drive may be stolen. The most important solution to the data security problem in DAS is encrypting the database. But once the database is encrypted, another big problem arises: querying the database. • The service provider can continue to perform administrative tasks without knowing the database key, but to serve as a query engine it needs some hints that let it execute queries.
GENERAL INFORMATION • Research in database encryption started with key management. Later, techniques were developed to search efficiently for keywords in encrypted text strings. • The functionality provided by these two approaches is very limited and insufficient for executing complex SQL queries over encrypted data. • Whether the database is relational or XML (or even a text file), the naive way of processing encrypted queries is to send the entire encrypted database to the data owner. In this case the service provider does not serve as a query engine, and the main responsibility for query processing lies with the data owner. This scheme is acceptable only if the volume of data is small and the data owner is able to decrypt the whole encrypted database. • As expected, this is not a real-world scenario, since the volume of real-world data is very large in most cases. The main problems with this scheme are the high cost of data transfer over limited bandwidth, and the decryption and query processing of the whole database at the client side, which may have limited processing capability. • So there has to be another way to process encrypted XML data securely and efficiently.
A bucketization approach • A novel bucketization and partitioning structure was proposed that went on to influence many papers in the literature. An algebraic framework is described for query rewriting over encrypted attributes. • The main idea is to map plaintext values to ciphertext values by splitting the plaintext domain into partitions and giving each partition a bucket ID. Each relation R (A1, A2, ..., AN) is stored as an encrypted relation RS (encryptedtuple, A1_S, A2_S, ..., AN_S), where the attribute encryptedtuple is the encrypted string that corresponds to a tuple in R and each attribute Ai_S is the index for the attribute Ai. The domain of Ai is partitioned into partitions p1, p2, ..., pn such that no two partitions overlap and the partitions taken together cover the whole domain. Different attributes may be partitioned using different partition functions. Any function satisfying the above two conditions may be used as a partition function. • To give an example, consider the following plaintext table:
The ages of the customers lie in the range [0, 100]. Assume that the whole range is divided into 10 partitions using equi-width partitioning. • The partitions are [0, 10), [10, 20), [20, 30), [30, 40), [40, 50), [50, 60), [60, 70), [70, 80), [80, 90) and finally [90, 100]. • Each bucket is then assigned a unique ID. The IDs may be order preserving or random. Suppose that the IDs are random and are 1, 5, 4, 9, 8, 3, 7, 6, 2 and 10 respectively. • Tables 1 and 2 summarize this situation.
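The equi-width mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names BUCKET_IDS, WIDTH and bucket_id are hypothetical, the ten IDs are taken to be 1, 5, 4, 9, 8, 3, 7, 6, 2, 10 in partition order, and age 100 is folded into the last partition so that the whole domain is covered.

```python
# Hypothetical client-side mapping table: partition index -> random bucket ID.
# Assumes ten equi-width partitions over the age domain [0, 100].
BUCKET_IDS = [1, 5, 4, 9, 8, 3, 7, 6, 2, 10]
WIDTH = 10


def bucket_id(age: int) -> int:
    """Return the bucket ID of the partition that contains `age`."""
    if not 0 <= age <= 100:
        raise ValueError("age outside the domain [0, 100]")
    # Age 100 would index an eleventh partition, so clamp it into the last one.
    index = min(age // WIDTH, len(BUCKET_IDS) - 1)
    return BUCKET_IDS[index]


if __name__ == "__main__":
    for age in (7, 25, 100):
        print(age, "->", bucket_id(age))
```

Only the client holds this table; the server would see the bucket ID next to each encrypted tuple, never the age itself.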
Example Queries • When a query such as • "SELECT * FROM Customers WHERE Customer Age < 30" • is asked, the client looks at the mapping table and rewrites the query. • The query becomes • "SELECT * FROM Customers WHERE Customer Age IN (1, 4, 5)" • In this example, if order preserving encryption were used for the bucket IDs, the translated query would be • "SELECT * FROM Customers WHERE Customer Age < (OPES (Bucket ID of 30))" • Note that the result set returned by the server is not the exact result set. It is a superset of the actual result set. • For example, if the query asks for the customers who are 25 years old, the translated query becomes • "SELECT * FROM Customers WHERE Customer Age IN (4)" • Here the server returns every customer whose age falls in [20, 30). So to get the exact results, the returned tuples have to be decrypted and post-filtered at the client side. The strength of this scheme shows when the mapping function is order preserving: range queries can then be supported successfully.
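The rewrite-then-post-filter flow above can be sketched as follows. This is a hedged illustration, not code from the cited work: the function names are hypothetical, the bucket IDs are assumed to be 1, 5, 4, 9, 8, 3, 7, 6, 2, 10 in partition order, and decrypted rows are modelled as plain dictionaries with an "age" key.

```python
# Assumed mapping table (partition index -> random bucket ID), width-10 partitions.
BUCKET_IDS = [1, 5, 4, 9, 8, 3, 7, 6, 2, 10]
WIDTH = 10


def buckets_for_less_than(bound: int) -> list[int]:
    """Bucket IDs of every partition that may contain an age < bound (bound >= 1)."""
    last = min((bound - 1) // WIDTH, len(BUCKET_IDS) - 1)
    return sorted(BUCKET_IDS[: last + 1])


def rewrite_less_than(bound: int) -> str:
    """Client-side translation of 'Customer Age < bound' into a bucket lookup."""
    ids = ", ".join(str(b) for b in buckets_for_less_than(bound))
    return f"SELECT * FROM Customers WHERE Customer Age IN ({ids})"


def post_filter(decrypted_rows: list[dict], bound: int) -> list[dict]:
    """Drop the false positives that the coarse bucket match let through."""
    return [row for row in decrypted_rows if row["age"] < bound]


if __name__ == "__main__":
    print(rewrite_less_than(30))
    # For bound 25 the server still returns the whole [20, 30) bucket.
    print(post_filter([{"age": 22}, {"age": 28}], 25))
```

The server answers the rewritten query with a superset; the exact answer only exists after post_filter runs on the decrypted rows at the client.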
References for other examples • Another example can be found in B. Hore, S. Mehrotra, G. Tsudik, "Privacy Preserving Index for Range Queries", Proceedings of the 30th VLDB Conference, Canada, 2004. • This paper concentrates on how bucket sizes are determined. Determining bucket sizes is important for processing queries efficiently. If the buckets are too big or too small, unnecessary tuples are returned from the server, which increases the post-filtering time at the client. • In E. Damiani, S. Jajodia, "Balancing Confidentiality and Efficiency in Untrusted Relational Databases", Proceedings of the 30th Very Large Databases Conference, pp. 23-30, October 2003, a hash-based method suitable for selection queries is given. To execute interval-based queries, B+ tree structures are adapted to the DAS model.
W3C Encryption Standard • W3C has proposed standards for XML encryption. According to these standards, the tags and contents that are to be encrypted are replaced with an element called the EncryptedData element. • EncryptedData has four sub-elements. The first is the encryption method, which indicates the encryption algorithm and the parameters of that algorithm. • The second is the key information, which indicates the key name but not its value. • The third is the cipher data, which contains the cipher value as a sub-element and holds the encrypted element together with its content. • The last is the encryption properties, which contain additional information related to the generation of the encrypted data.
Encrypted Query Processing of XML Documents • Since the service provider does not have the decryption key, it has to be given some clues for answering queries. • These clues should be just enough for the service provider to return the encrypted tuples, but not sufficient to recover the structure (schema) or the content (instance) of the XML document. • As with relational databases, these clues are usually given by maintaining crypto-indexes on either the service provider side or the data owner side. • The general architecture of encrypted query processing is summarized in Figure 1. The user creates a query, which is then translated into its encrypted form by the query translator at the client side. The rules of encryption are determined by the client and given to the query translator. • Once the query is secure enough not to reveal the structure of the XML database, the service provider answers it using predefined rules held at the server side. • The result set returned by the service provider is not the exact result set that the user wants. It is a superset of the actual result set. The client decrypts the results and post-filters them to get the actual result set.
NOTE • It should be noted that the client needs some processing capability in order to post-process the results. The main purpose of encrypted XML query processing is to increase the work done by the service provider and decrease the work done by the client.
ATTACK TYPES • Before going into the details of encrypted query processing, some brief information about the types of attack on encrypted documents is needed. • There are two main types of attack. If the attacker can find a match between ciphertext and plaintext values, then it may be possible for the attacker to determine the algorithm and the key used in the encryption. This attack is called a "frequency based attack". It becomes possible when the attacker knows the exact frequency of domain values (e.g. Mehmet White has won 10 prizes and there is only one value in the encrypted database that occurs 10 times, so the attacker can infer that Mehmet corresponds to that encrypted value). • If the attacker can find many matches like this, then it is possible for him to guess the encryption scheme. • The other attack type is the "size based attack". If the length of the plaintext determines the length of the ciphertext, the attacker may eliminate the candidate databases whose lengths do not match. • These attack types are referred to in the next sections when investigating encrypted query processing of XML documents.
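The frequency based attack described above can be illustrated with a small Python sketch. It assumes a deterministic encryption in which equal plaintexts always produce equal ciphertexts, and an attacker who knows the exact plaintext frequencies. The function name and the sample values other than "Mehmet" are hypothetical.

```python
from collections import Counter


def frequency_match(known_plain_freq: dict[str, int],
                    ciphertexts: list[str]) -> dict[str, str]:
    """Guess ciphertext -> plaintext wherever an occurrence count is unique on both sides."""
    cipher_freq = Counter(ciphertexts)
    # How many plaintexts / ciphertexts share each occurrence count.
    plain_counts = Counter(known_plain_freq.values())
    cipher_counts = Counter(cipher_freq.values())
    plain_by_count = {count: plain for plain, count in known_plain_freq.items()}

    guesses = {}
    for cipher, count in cipher_freq.items():
        # Only an unambiguous count on both sides gives a confident match.
        if plain_counts[count] == 1 and cipher_counts[count] == 1:
            guesses[cipher] = plain_by_count[count]
    return guesses


if __name__ == "__main__":
    known = {"Mehmet": 10, "Ali": 3, "Ayse": 3}          # attacker's prior knowledge
    column = ["x9"] * 10 + ["a1"] * 3 + ["b2"] * 3       # observed encrypted column
    print(frequency_match(known, column))
```

Values that share a count ("Ali" and "Ayse" here) stay ambiguous, which is the effect that flattening the frequency distribution aims to produce for every value.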
INDEX STRUCTURES • There are two types of index structure for encrypted XML query processing. • One is the value index and the other is the structural index. • The purpose of the structural index is to determine whether the path in the query matches any of the paths in the XML documents. • The purpose of the value index is to check the constraints in range queries.
Encrypted Query Processing of XML Documents • Pre-order and post-order traversal was first used in [10] to determine the ancestor-descendant relationship between nodes. The proposition was: "for two given nodes x and y of a tree T, x is an ancestor of y if and only if x occurs before y in the pre-order traversal of T and after y in the post-order traversal". • After XML emerged, a modified version of this numbering scheme was used in many papers. The main logic is as follows: every tag is given a sequence number starting from 1, and the sequence numbers increment by 1. The sequence number of the opening tag of a node represents the left bound of the node, and the sequence number of the closing tag represents the right bound of the node. This enumeration yields a general rule: for a parent node p and a child node c, p.leftbound < c.leftbound and p.rightbound > c.rightbound. The disadvantage of this scheme is that the whole tree has to be renumbered in case of insertions. One approach to overcoming this problem is to leave gaps when numbering the nodes [8]. But even in that case, once the gaps are used up, the whole tree has to be numbered again. • Figure 2 shows an example of this enumeration.
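The tag numbering described above can be sketched in Python. This is a minimal illustration: the Node class, the function names and the sample "library" document are hypothetical and not taken from the cited papers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)
    left: int = 0    # sequence number of the opening tag
    right: int = 0   # sequence number of the closing tag


def number_tags(root: Node) -> None:
    """Give every opening and closing tag the next sequence number, starting at 1."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        counter += 1
        node.left = counter
        for child in node.children:
            visit(child)
        counter += 1
        node.right = counter

    visit(root)


def is_ancestor(a: Node, d: Node) -> bool:
    """Interval containment test: a encloses d."""
    return a.left < d.left and a.right > d.right


if __name__ == "__main__":
    title = Node("title")
    book1 = Node("book", [title, Node("author")])
    book2 = Node("book", [Node("title")])
    library = Node("library", [book1, book2])
    number_tags(library)
    print((library.left, library.right), (book1.left, book1.right))
    print(is_ancestor(library, title), is_ancestor(book2, title))
```

Because the numbers are consecutive, inserting a new element shifts every later bound, which is the renumbering cost mentioned above.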
Encrypted Query Processing of XML Documents • In order not to disclose the hierarchical structure of the XML document, the scheme explained above is modified in [9]. The proposed scheme is named the discontinuous structural index (DSI). In DSI, the interval (0, 1) is assigned to the root. The children are assigned sub-intervals of the parent's interval. The intervals of the children are determined by an algorithm at run time. The general rule still holds: for a parent p and a child c, p.leftbound < c.leftbound and p.rightbound > c.rightbound. In this way the structure of the XML document is hidden from the server. • Table represents the modified scheme.
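A possible way to assign such sub-intervals is sketched below in Python. The text does not say how the run-time algorithm chooses the intervals, so the random gap weights used here are purely an assumption for illustration, as are the Node class and the function name assign_dsi.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)
    left: float = 0.0
    right: float = 1.0


def assign_dsi(node: Node, left: float = 0.0, right: float = 1.0, rng=None) -> None:
    """Give `node` the interval (left, right) and its children disjoint sub-intervals."""
    if rng is None:
        rng = random.Random()
    node.left, node.right = left, right
    n = len(node.children)
    if n == 0:
        return
    # Assumed policy: cut the parent interval into 2n+1 random positive pieces
    # laid out as gap, child, gap, child, ..., gap, so bounds are discontinuous.
    weights = [rng.uniform(0.5, 1.5) for _ in range(2 * n + 1)]
    total = sum(weights)
    width = right - left
    cursor = left
    for i, child in enumerate(node.children):
        cursor += width * weights[2 * i] / total       # skip a gap
        child_left = cursor
        cursor += width * weights[2 * i + 1] / total   # the child's own share
        assign_dsi(child, child_left, cursor, rng)


if __name__ == "__main__":
    root = Node("library", [Node("book", [Node("title")]), Node("book")])
    assign_dsi(root, rng=random.Random(0))
    for book in root.children:
        print(book.tag, round(book.left, 3), round(book.right, 3))
```

The containment rule p.leftbound < c.leftbound and p.rightbound > c.rightbound holds by construction, while the gaps keep the bounds from revealing how many tags sit between two nodes.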
Encrypted Query Processing of XML Documents • In [8], two tables are used for the structural index at the server side. • One is the encryption block table and the other is the DSI table. The DSI table holds the tags in one column and the corresponding intervals in another column. Only confidential tags are encrypted. This allows more efficient query processing on nodes that are unencrypted.
Encrypted Query Processing of XML Documents • One more step is applied to the structural index: entries with the same tag that are contiguous in the DSI table are merged. For example, if there are two encrypted "Book" elements, the first with interval (0.23, 0.34) and the second with interval (0.34, 0.47), then these entries are merged into one entry whose tag is the encrypted form of the "Book" element and whose interval is (0.23, 0.47). In this way the structure of the XML index is hidden from the server. It is proven in [8] that there are a number of candidate databases in which there are m encryption blocks containing n_i leaf nodes that are represented by k_i intervals.
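The merging step can be sketched as follows in Python. This is an illustration only: the row layout (encrypted tag, left bound, right bound), the function name merge_adjacent and the placeholder ciphertext strings are assumptions, and contiguity is tested with exact equality of bounds.

```python
def merge_adjacent(entries):
    """Merge DSI rows that share a tag and whose intervals touch.

    `entries` is a list of (encrypted_tag, left, right) sorted by left bound.
    """
    merged = []
    for tag, left, right in entries:
        if merged and merged[-1][0] == tag and merged[-1][2] == left:
            # Same tag and the previous interval ends where this one starts:
            # extend the previous row instead of adding a new one.
            merged[-1] = (tag, merged[-1][1], right)
        else:
            merged.append((tag, left, right))
    return merged


if __name__ == "__main__":
    rows = [
        ("enc(Book)", 0.23, 0.34),
        ("enc(Book)", 0.34, 0.47),
        ("enc(Shelf)", 0.50, 0.60),
    ]
    print(merge_adjacent(rows))
```

After merging, the server can no longer tell from the DSI table whether the interval (0.23, 0.47) covers one "Book" element or several.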
Splitting and Scaling • In [8], the value index is called order preserving encryption with splitting and scaling (OPESS). It is maintained at the server side to support range queries. • Splitting and scaling are used to prevent frequency based attacks. With splitting, each plaintext value is encrypted into one or more ciphertext values. • As a result, one unencrypted word is represented by several different encrypted words. • Scaling is done after splitting. With scaling, the target domain size is multiplied: the number of occurrences of each encrypted word is multiplied by a scale factor. • The main purpose of splitting and scaling is to change the frequency distribution of the encrypted data values in the value index so that it differs from the frequency distribution of the original values.
Splitting • For splitting, suppose that there are k distinct values v1, v2, v3, ..., vk in the input set, which satisfy v1 < v2 < v3 < ... < vk and whose numbers of occurrences are n1, n2, n3, ..., nk. • Each n_i can be expressed in terms of three consecutive integers m - 1, m and m + 1 as n_i = k1_i * (m - 1) + k2_i * m + k3_i * (m + 1), where k1_i, k2_i, k3_i are non-negative integers. • The purpose is to map a data value with n_i occurrences to k1_i + k2_i + k3_i distinct encrypted values, each occurring m - 1, m or m + 1 times, so that the distribution of encrypted values looks nearly flat to the server.
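Finding the three coefficients for one value can be sketched in Python. The decomposition is generally not unique and the text does not say which one is chosen, so the rule used here (prefer as many chunks of size m as possible) is an assumption, and split_counts is a hypothetical name.

```python
def split_counts(n: int, m: int):
    """Return (k1, k2, k3) with n == k1*(m-1) + k2*m + k3*(m+1), or None if impossible.

    Assumed tie-break: try the largest number of size-m chunks first.
    """
    if n < 0 or m < 2:
        raise ValueError("need n >= 0 and m >= 2")
    for k2 in range(n // m, -1, -1):
        rest = n - k2 * m
        for k1 in range(rest // (m - 1) + 1):
            remainder = rest - k1 * (m - 1)
            if remainder % (m + 1) == 0:
                return k1, k2, remainder // (m + 1)
    return None


if __name__ == "__main__":
    k1, k2, k3 = split_counts(34, 7)
    print(k1, k2, k3, "->", k1 + k2 + k3, "distinct ciphertexts")
```

For 34 occurrences and m = 7 this yields one chunk of 6 and four chunks of 7, i.e. five distinct ciphertexts for a single plaintext value.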
Example • For example, the plaintext value "90" with 34 occurrences (34 = 1 * 6 + 4 * 7 + 0 * 8) has to be encrypted into 1 + 4 + 0 = 5 different encrypted values using 5 different keys, with each distinct encrypted value occurring 6 or 7 times. • Splitting alone is not secure. An attacker who knows the exact frequencies of the domain values can group adjacent ciphertext values together until he finds the input distribution. To avoid this, after encryption the values are scaled by a scale factor that is determined randomly at run time. • The drawback of this approach is that the size of the index becomes very large. Security is preserved, but efficiency is lost.
The main contribution of the approach in [8] is that it allows range queries to be executed at the server side by employing order preserving encryption with splitting and scaling. • The proposed value and structural indexes are provably secure. Sensitive structural information and value associations are hidden from attackers who possess exact knowledge of the domain values and their occurrence frequencies. • Splitting and scaling make the encrypted values in the database nearly uniformly distributed, which prevents an attacker from making a statistical analysis. • Since the value and structural indexes are maintained at the server side, the burden of query processing lies mainly on the server. • In the proposed approach, the client needs a query translator and also a simple query engine in order to post-filter the results after decrypting them. One limitation of OPESS is that the security achieved by scaling the encrypted data causes an increase in data size, and a larger data size implies extra time in query processing. • Another limitation of the approach in [8] is that it cannot provide security against prior knowledge of the tag distribution, the query workload distribution and correlations among data values. This approach is also not very efficient for insertions and updates.
Query Processing • The query processing in [8] is as follows: • When a query is submitted, the query translator at the client transforms it into encrypted form. The query translator replaces every tag with the corresponding encrypted tag in the structural index. The DSI intervals of the tags in the query are looked up in the DSI table. Those intervals are used to find the bucket IDs in the encryption block table. The bucket IDs returned are the result of the structural index. In the second phase, the client translates the value-based constraints in the query, and the server finds the bucket IDs that satisfy them in the value index. Finally, the server intersects the bucket IDs returned from the structural index and the value index. The result of the intersection is sent to the client for decryption and post-filtering.
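The final intersection step can be sketched in Python. The table layouts below are assumptions made only for this illustration, since the text does not spell them out: the encryption block table is modelled as block ID -> interval, the value index as a list of (order preserving ciphertext, block ID) pairs, and all function names are hypothetical.

```python
def structural_candidates(block_table, matched_intervals):
    """IDs of blocks whose interval encloses at least one interval matched in the DSI table."""
    return {
        block_id
        for block_id, (b_left, b_right) in block_table.items()
        for (left, right) in matched_intervals
        if b_left <= left and right <= b_right
    }


def value_candidates(value_index, low, high):
    """IDs of blocks holding a ciphertext inside the translated range [low, high]."""
    return {block_id for cipher, block_id in value_index if low <= cipher <= high}


def answer(block_table, matched_intervals, value_index, low, high):
    """Server side: intersect the candidates of the structural and the value index."""
    return (structural_candidates(block_table, matched_intervals)
            & value_candidates(value_index, low, high))


if __name__ == "__main__":
    blocks = {1: (0.10, 0.30), 2: (0.40, 0.60), 3: (0.70, 0.90)}
    matched = [(0.15, 0.20), (0.45, 0.50)]        # intervals found for the query path
    values = [(120, 2), (450, 3), (130, 1), (900, 2)]
    print(sorted(answer(blocks, matched, values, 100, 125)))
```

Only the encrypted blocks whose IDs survive the intersection are shipped to the client, which then decrypts them and removes the remaining false positives.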
Some other approaches • An approach with three phases, named query preparation, query pre-processing and query execution [7] • Query Aware Decryption Approach [4] • Usage of nonces [3], [2]
REFERENCES
[1] O. Ünay, T. I. Gündem, "Parallel Processing of Encrypted XML Documents in Database as a Service Concept", Information Technology and Control, Vol. 39, No. 4, pp. 301-309, 2010.
[2] E. Damiani, S. Jajodia, "Balancing Confidentiality and Efficiency in Untrusted Relational Databases", Proceedings of the 30th Very Large Databases Conference, pp. 23-30, October 2003.
[3] M. Schrefl, K. Grün, "Sem-Crypt - Ensuring Privacy of Electronic Documents through Semantic-Based Encrypted Query Processing", 21st International Conference on Data Engineering, 2003.
[4] B. Hore, S. Mehrotra, G. Tsudik, "Privacy Preserving Index for Range Queries", Proceedings of the 30th VLDB Conference, Canada, 2004.
[5] R. Agrawal, J. Kiernan, R. Srikant, "Order Preserving Encryption for Numeric Data", ACM SIGMOD Record, France, 2004.
[6] C. Li, B. Iyer, "Executing SQL over Encrypted Data in Database Service Provider Model", Proceedings of ACM SIGMOD, USA, 2002.
[7] L. Feng, W. Jonker, "Efficient Processing of Secured XML Data", OTM Workshops, pp. 704-717, 2003.
[8] H. Wang, L. Lakshmanan, "An Efficient Secure Query Evaluation over Encrypted XML Databases", Very Large Databases, 2006.
[9] Y. Yang, W. Ng, H. Lau, "An Efficient Approach to Support Querying Secure Outsourced XML Information", The 19th International Conference on Advanced Information Systems Engineering, pp. 151-171, 2006.
[10] G. Tsudik, "On Using Secure Hardware in Outsourced Databases", Lecture Notes in Computer Science, Vol. 3749 / Part 1, pp. 213-220, Springer, 2005.