Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing
Cong Wang (1), Qian Wang (1), Kui Ren (1) and Wenjing Lou (2)
(1) Illinois Institute of Technology, (2) Worcester Polytechnic Institute
Proceedings of IEEE INFOCOM 2010
Computer Systems Lab Group Meeting
Presented by: Zakhia Abichar, February 25, 2010
Cloud Computing
[Figure: users, their data, the cloud network and an external audit party]
• With cloud computing, users can store their data remotely in the cloud and use high-quality applications on demand
• The cloud provides a shared pool of configurable computing resources
• Data outsourcing: users are relieved of the burden of data storage and maintenance
• Once users place large amounts of data in the cloud, protecting its integrity becomes challenging
• Enabling public auditing of cloud data storage security is therefore important
• Users can ask an external audit party to check the integrity of their outsourced data
Third Party Auditor (TPA)
[Figure: users, their data, the cloud network and an external audit party]
• The external audit party is called the TPA
• The TPA audits the data on the user’s behalf
• To introduce a TPA securely:
• 1) The TPA should audit the data in the cloud, not ask for a copy
• 2) The TPA should not create new vulnerabilities for user data privacy
• This paper presents a privacy-preserving public auditing system for cloud data storage
Outline • Introduction • System and threat model • Proposed scheme • Security analysis & performance evaluation
Introduction
• Cloud computing gives users flexibility
• Users pay only for what they use
• Users don’t need to set up large computers of their own
• However, operation is managed by the Cloud Service Provider (CSP)
• Users hand their data to the CSP, which then controls it
• Users need to make sure their data remains correct in the cloud
• Data integrity faces internal threats (e.g. a CSP employee) and external threats (e.g. hackers)
• The CSP might behave unfaithfully
• For financial reasons, the CSP might delete data that is rarely accessed
• The CSP might hide data loss to protect its reputation
Introduction
• How can the correctness of outsourced data be verified efficiently?
• Having the user simply download all the data is not practical
• A TPA can perform the check and provide an audit report
• The TPA should not read the data content
• Legal regulations: e.g. the US Health Insurance Portability and Accountability Act (HIPAA)
• This paper presents a privacy-preserving third-party auditing protocol
• Presented as the first work in the literature to do this
System and Threat Model
• U: the cloud user, who has a large amount of data files to store in the cloud
• CS: the cloud server, which is managed by the CSP and has significant data storage and computing power (CS and CSP are treated as the same in this paper)
• TPA: the third party auditor, who has expertise and capabilities that U and the CSP don’t have. The TPA is trusted to assess the CSP’s storage security upon request from U
A note on auditing
• What is auditing?
• Reference: http://searchcio.techtarget.com/searchCIO/downloads/AuditTheDataOrElse.pdf
A Public Auditing Scheme
This framework comes from previous related work and is adapted to suit the goals of this paper
• Consists of four algorithms (KeyGen, SigGen, GenProof, VerifyProof)
• KeyGen: key generation algorithm, run by the user to set up the scheme
• SigGen: used by the user to generate verification metadata, which may consist of MACs, signatures or other information used for auditing
• GenProof: run by the cloud server to generate a proof of data storage correctness
• VerifyProof: run by the TPA to audit the proof from the cloud server
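[Figure: protocol flow among user, CSP and TPA]
• Setup: the user runs KeyGen to obtain the public & secret parameters, then runs SigGen on file F to obtain the verification metadata
• Audit: the TPA issues an audit message (a challenge) to the CSP; the CSP runs GenProof on file F and returns a response message; the TPA runs VerifyProof using the verification metadata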
Basic Scheme 1
[Figure: the file is divided into Block 1 … Block n; a keyed MAC produces code 1 … code n, one per block]
• The file is divided into blocks, each protected by a Message Authentication Code (MAC)
• Setup
• The user computes the MAC of every file block
• The user transfers the file blocks & codes to the cloud
• The user shares the key with the TPA
• Audit
• The TPA requests a number of randomly selected blocks and their codes from the CSP
• The TPA uses the key to verify the correctness of the file blocks
• Drawbacks
• The audit requires retrieving the user’s data; this is not privacy-preserving
• Communication and computation complexity are linear in the sample size
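The audit loop of Basic Scheme 1 can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slide: HMAC-SHA256 as the MAC, a 4-byte block size, and the hypothetical helper names `split_blocks`, `mac_block` and `audit`. Note that `audit` receives the raw blocks, which is exactly the privacy drawback listed above.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # bytes per block; deliberately tiny for the demo (assumption)


def split_blocks(data, size=BLOCK_SIZE):
    """Divide the file into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def mac_block(key, index, block):
    """MAC one block; the index is included so blocks cannot be swapped."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()


def audit(key, stored_blocks, stored_macs, sample_size):
    """TPA side: fetch randomly chosen blocks with their MACs and recheck them."""
    rng = secrets.SystemRandom()
    indices = rng.sample(range(len(stored_blocks)), sample_size)
    return all(
        hmac.compare_digest(mac_block(key, i, stored_blocks[i]), stored_macs[i])
        for i in indices
    )


# Setup (user side): MAC every block, upload blocks and MACs, share key with TPA.
key = secrets.token_bytes(32)
blocks = split_blocks(b"outsourced file contents")
macs = [mac_block(key, i, b) for i, b in enumerate(blocks)]
```

Calling `audit(key, blocks, macs, sample_size)` with a larger `sample_size` raises the chance of catching a corrupted block, at a cost that grows linearly with the sample.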
Basic Scheme 2
[Figure: the user holds Key 1 … Key s and computes codes for the blocks under each key; the blocks are stored in the cloud, the keys and codes at the TPA]
• Setup
• The user chooses s keys and computes the MACs for the blocks under each key
• The user shares the keys and MACs with the TPA
• Audit
• The TPA reveals one of the s keys to the CSP and requests the MACs for the blocks
• The TPA compares the response with the MACs it holds
• Improvement over Scheme 1: the TPA doesn’t see the data, which preserves privacy
• Drawback: each key can be used only once
• The TPA has to keep state, remembering which keys have been used
• Schemes 1 & 2 are suitable for static data (data that doesn’t change in the cloud)
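The one-time-key bookkeeping of Basic Scheme 2 can be sketched as follows. This is a simplified illustration, not the paper's construction: it assumes HMAC-SHA256, computes a single MAC over the whole file per key instead of per block, and the names `OneTimeKeyAuditor`, `file_mac` and `honest_cloud` are hypothetical.

```python
import hashlib
import hmac
import secrets


def file_mac(key, data):
    """MAC over the whole file under one key (simplifying assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


class OneTimeKeyAuditor:
    """TPA state: s keys with precomputed MACs, each usable for one audit."""

    def __init__(self, keys, expected_macs):
        self.keys = list(keys)
        self.expected = list(expected_macs)
        self.next_unused = 0  # the state the TPA has to remember

    def remaining(self):
        return len(self.keys) - self.next_unused

    def audit(self, cloud_respond):
        if self.remaining() == 0:
            raise RuntimeError("all keys used; the user must set up new keys")
        index = self.next_unused
        self.next_unused += 1  # a revealed key is never reused
        response = cloud_respond(self.keys[index])
        return hmac.compare_digest(response, self.expected[index])


# Setup (user side): choose s = 3 keys, hand keys and MACs to the TPA.
data = b"outsourced file contents"
keys = [secrets.token_bytes(32) for _ in range(3)]
tpa = OneTimeKeyAuditor(keys, [file_mac(k, data) for k in keys])


def honest_cloud(key):
    """Cloud side: recompute the MAC over the stored file with the revealed key."""
    return file_mac(key, data)
```

The TPA never touches `data` during `audit`, but after s audits `remaining()` reaches zero and the user has to run the setup again.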
Privacy-Preserving Public Auditing Scheme
Proposed scheme
• Uses a homomorphic authenticator
• Also uses a random mask generated by a Pseudo Random Function (PRF)
Homomorphic authenticator
[Figure: the verification metadata of Block 1 … Block k is combined into one aggregate verification metadata]
• A linear combination of data blocks can be verified by looking only at the aggregated authenticator
Privacy-Preserving Public Auditing Scheme
• In addition to the aggregate authenticator, the TPA receives a linear combination of the file blocks, $\sum_i v_i m_i$, where the $v_i$ are random numbers and the $m_i$ are file blocks
• If the TPA sees many linear combinations of the same blocks, it might be able to infer the file blocks
• Thus, the scheme also applies a random mask provided by the Pseudo Random Function (PRF)
Random mask by PRF
• The PRF output masks the data
• The mask has the property of not affecting the verification metadata
[Figure: Block 1 and Block 1 with the PRF mask yield equal verification metadata; r is the mask]
[Figure: protocol walk-through among user, CSP and TPA]
Setup (KeyGen, SigGen)
• 1- The user generates public and secret parameters: public key (pk) & secret key (sk)
• 2- A code is generated for each file block: σ1, σ2, …, σn for Block 1, Block 2, …, Block n
• 3- The file blocks and their codes are transmitted to the cloud
Audit (GenProof, VerifyProof)
• The TPA sends a challenge message to the CSP
• It contains the positions of the blocks that will be checked in this audit
• GenProof: the CSP forms the aggregate authenticator of the blocks selected in the challenge
• The CSP also makes a linear combination of the selected blocks and applies a mask, using a separate PRF key for each audit
• The CSP sends the aggregate authenticator & the masked combination of blocks to the TPA
• VerifyProof: the TPA compares the aggregate authenticator it obtains with the one received from the CSP
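The four steps above can be illustrated with a toy homomorphic authenticator. This sketch deliberately swaps in a simpler technique than the paper's: a privately verifiable linear tag over a prime field (each tag is alpha * m_i + PRF_k(i) mod P), so the verifier holds the secret key, and the random mask is omitted. The modulus, the use of HMAC-SHA256 as the PRF, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo P (assumption)


def prf(key, index):
    """Pseudo random function mapping a block index to a field element."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def key_gen():
    """KeyGen: a nonzero secret scalar alpha and a PRF key."""
    return secrets.randbelow(P - 1) + 1, secrets.token_bytes(32)


def sig_gen(sk, blocks):
    """SigGen: one tag per block, sigma_i = alpha * m_i + PRF_k(i) mod P."""
    alpha, k = sk
    return [(alpha * m + prf(k, i)) % P for i, m in enumerate(blocks)]


def gen_proof(blocks, tags, challenge):
    """GenProof (server): aggregate challenged blocks and tags with weights v_i."""
    mu = sum(v * blocks[i] for i, v in challenge) % P
    sigma = sum(v * tags[i] for i, v in challenge) % P
    return mu, sigma  # two field elements, whatever the file size


def verify_proof(sk, challenge, proof):
    """VerifyProof: recompute the expected aggregate tag from mu alone."""
    alpha, k = sk
    mu, sigma = proof
    expected = (alpha * mu + sum(v * prf(k, i) for i, v in challenge)) % P
    return sigma == expected


sk = key_gen()
blocks = [secrets.randbelow(P) for _ in range(8)]  # file blocks as field elements
tags = sig_gen(sk, blocks)
# Challenge: block positions with nonzero random coefficients v_i.
challenge = [(i, secrets.randbelow(P - 1) + 1) for i in (1, 4, 6)]
```

Because the tags are linear in the blocks, the verifier checks the combination `mu` against the aggregate `sigma` without seeing any individual block. In this toy version `mu` is sent unmasked, which is the leak the PRF mask in the slides is meant to close.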
Properties
• The amount of data sent from the CSP to the TPA is independent of the data size
• It is a linear combination of blocks with a mask
• Previous work has shown that if the server is missing 1% of the data, the TPA needs to check 300 or 460 blocks to detect this with probability larger than 95% or 99%, respectively
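The 300 and 460 figures can be checked with a short calculation. The sketch below assumes each sampled block independently hits the missing 1% (a with-replacement approximation of the sampling), so the detection probability for c sampled blocks is 1 - (1 - 0.01)^c. The function names are illustrative.

```python
import math


def detection_probability(samples, missing_fraction=0.01):
    """Probability that at least one sampled block falls in the missing data."""
    return 1.0 - (1.0 - missing_fraction) ** samples


def min_samples(target, missing_fraction=0.01):
    """Smallest sample count whose detection probability reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - missing_fraction))


p300 = detection_probability(300)
p460 = detection_probability(460)
```

Under this approximation `p300` is just above 0.95 and `p460` just above 0.99, and the sample count does not depend on the total number of blocks in the file.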
More Possible Extensions
• Batch auditing
• K users have K files on the same cloud
• They share the same TPA
• The TPA can then combine their queries and save computation time
• The comparison function for aggregate authenticators has a property that allows checking multiple messages in one equation
• Instead of 2K operations, K+1 suffice
• Data dynamics
• The data in the cloud may change, depending on the application
• This is supported by using the Merkle Hash Tree (MHT) data structure
• With MHT, data changes in a certain way; new data is added in some places
• There is more overhead involved; the user sends the tree root to the TPA
• This scheme is not evaluated in the paper
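As a reminder of what the tree root is, here is a minimal Merkle Hash Tree root computation. The slides do not give the tree layout, so SHA-256, the leaf and node prefixes, and duplicating the last node on odd levels are all assumptions of this sketch; `merkle_root` is a hypothetical helper name.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash the blocks pairwise, level by level, until one root remains."""
    if not blocks:
        raise ValueError("at least one block is required")
    # Leaves and inner nodes get different prefixes so they cannot be confused.
    level = [_h(b"\x00" + block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node (assumption)
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


root = merkle_root([b"block-1", b"block-2", b"block-3"])
```

Changing any block changes the root, so a party holding only the 32-byte root can detect modifications once it is given the hashes along one path.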
Performance
• The scheme in reference [11] does not have the privacy-preserving property
• With it, the TPA can read the information
Batch Auditing
• The number of auditing tasks is increased from 1 to 200 in multiples of 8
• Auditing time per task = total auditing time / number of tasks
Performance with Invalid Responses
• In batch auditing, true means that all of the messages are correct
• False means at least one is wrong
• Divide the batch in half and repeat for the left and right parts
• This is a binary search
[Figure: binary search over responses 1 to 10, narrowing the wrong ones to 1,2,3 and 9,10, then to 3 and 10]
The more errors there are, the longer it takes to find them
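The halving strategy can be sketched as a recursive search. Here `batch_ok` stands in for the real batch verification equation and is assumed to return true only when every response in the batch is valid; the example data assumes, as the slide's example appears to show, that the 3rd and 10th of ten responses are wrong. `find_invalid` is a hypothetical helper name.

```python
def find_invalid(responses, batch_ok):
    """Return (indices of invalid responses, number of batch checks used)."""
    checks = 0

    def search(lo, hi):
        nonlocal checks
        checks += 1
        if batch_ok(responses[lo:hi]):
            return []          # whole sub-batch verifies, nothing to locate
        if hi - lo == 1:
            return [lo]        # a failing batch of one is the culprit
        mid = (lo + hi) // 2
        return search(lo, mid) + search(mid, hi)

    invalid = search(0, len(responses)) if responses else []
    return invalid, checks


# Ten responses; True marks a valid one. Positions 3 and 10 (1-based) are wrong.
responses = [True, True, False, True, True, True, True, True, True, False]
invalid, checks = find_invalid(responses, all)
clean_invalid, clean_checks = find_invalid([True] * 10, all)
```

A fully valid batch costs a single check, while every invalid response forces the search down another branch, which matches the observation that more errors take longer to find.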