An Update Model for Network Coding in Cloud Storage Systems. 2012 50th Annual Allerton Conference on Communication, Control, and Computing. Mohammad Reza Zakerinasab, Mea Wang, Department of Computer Science, University of Calgary.
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Network Coding (1/2) • There are different mechanisms for arranging file copies among storage nodes or devices • standard RAID architectures • erasure codes • network coding • Network coding in cloud storage systems allows storage nodes to collectively host multiple copies of a file.
Network Coding (2/2) • In a network-coding-assisted cloud storage system • a file is divided into n blocks • the blocks are encoded using random coefficients • the encoded blocks are distributed in the Cloud • the file is decoded from n encoded blocks downloaded from any subset of the storage nodes.
Problem Definition • Existing works have focused on mechanisms for preserving the level of redundancy. • However, the most frequent operations performed on files are file updates, and they require keeping the coded information in the system up to date. • Any change in the file impacts all coded blocks in the system. • all traces of the old version of the file must be replaced
Application • Google Docs: an online collaborative office suite that lets users create, edit, and publish documents collaboratively from around the world. • When a file is updated, changing even a single byte can outdate all coded blocks in the system, which requires • re-computations • re-deliveries
Problems • Re-computing coded blocks is very CPU intensive. • Replacing all the coded blocks consumes a large amount of bandwidth.
Proposed Model • Send only the modified parts with the minimum possible overhead. • This paper presents the mathematical model of the Differential Update Mechanism (DUM). • The update algorithms can be performed on all nodes. • The simulation results show that the proposed DUM saves a significant amount of bandwidth in a cloud storage system.
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Related Works (1/2) • Commercial cloud storage systems, such as Microsoft Azure [8] and Google Cloud [9], utilize source erasure codes. • Network coding was originally proposed in information theory in 2000 [1]. • In contrast to source erasure codes, network coding applies coding at intermediate relay nodes throughout the network.
Related Works (2/2) • The benefits of coding at intermediate nodes include • high throughput [1], [3] • efficient routing algorithm design [17] • energy savings in wireless networking [18] • security [19] • The works most closely related to the update problem address the repair problem • they provide mechanisms for handling the failure of one or more nodes [25]. • they preserve the level of redundancy.
References • [1] R. Ahlswede, N. Cai, S. R. Li, and R. W. Yeung, “Network Information Flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, July 2000. • [3] R. Koetter and M. Medard, “An Algebraic Approach to Network Coding,” IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782–795, October 2003. • [8] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. ulHaq, M. I. ulHaq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas, “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency,” in Proc. of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 23-26 2011, pp. 143–157.
References • [9] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in Globally Distributed Storage Systems,” in Proc. of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, BC, October 4-6 2010, pp. 1–14. • [17] D. S. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, and H. Lee, “Achieving Minimum Cost Multicast: A Decentralized Approach Based on Network Coding,” in Proc. of the 24th Conference of the IEEE Communications Society (INFOCOM), Miami, FL, March 13-17 2005, pp. 1607–1617. • [18] H. Rahul, W. Hu, D. Katabi, M. Medard, and J. Crowcroft, “XORs in the Air: Practical Wireless Network Coding,” IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 497–510, June 2008. • [19] C. Gkantsidis and P. Rodriguez, “Cooperative Security for Network Coding File Distribution,” in Proc. of the 25th Conference of the IEEE Communications Society (INFOCOM), Barcelona, Spain, April 23-29 2006, pp. 1–13.
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Modeling the Storage Cloud System • [Figure: the storage cloud and the end hosts]
Modeling the Storage Cloud System • Model simplification assumptions: • A single original copy of each file is hosted among the source nodes in the Cloud. • Each source node owns a disjoint set of files. • Each node can only be a source node, a storage node, or a target node at a time. • Nodes of the same type do not connect to each other. • It is common for a storage system to distribute R ≥ 1 copies of each file to provide data redundancy, where R is the replication factor.
Network Coding in the Storage Cloud System • With randomized network coding, a file is divided into n original blocks B = [b1, b2, …, bn], where each bi has a fixed number of bytes s. • Encoding a new block ci • the source node first independently and randomly chooses a set of coding coefficients εi = [εi,1, εi,2, …, εi,n] in the Galois field GF(2^8). • $c_i = \sum_{j=1}^{n} \varepsilon_{i,j} b_j$ • [Figure: the original blocks B = [b1, b2, …, bn] are encoded into the coded blocks c1, c2, …, cRn]
Network Coding in the Storage Cloud System • Decoding: with high probability, any n of the Rn coded blocks are linearly independent and can be used to recover all original blocks of the corresponding file. • A target node locates and downloads n coded blocks, C = [c1, c2, …, cn], from the storage nodes. • Given the encoding matrix ξ = [ε1, ε2, …, εn], the original blocks B = [b1, b2, …, bn] can be recovered by: • $B = \xi^{-1} C$
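The encoding and decoding steps on these two slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slides do not say which reduction polynomial defines GF(2^8), so the AES polynomial 0x11B is assumed here, and the helper names (`gf_mul`, `gf_inv`, `encode`, `decode`) are hypothetical. Decoding solves the linear system by Gaussian elimination and redraws the coefficients in the rare case that the random matrix is singular.

```python
import random

# GF(2^8) arithmetic. The reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
# is an assumption; the slides do not specify one.
def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse computed as a^254 (valid because a^255 = 1)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def encode(coeffs, blocks):
    """Coded block c_i = sum_j eps_{i,j} * b_j, computed byte by byte."""
    coded = bytearray(len(blocks[0]))
    for coeff, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            coded[k] ^= gf_mul(coeff, byte)  # addition in GF(2^8) is XOR
    return bytes(coded)

def decode(matrix, coded_blocks):
    """Recover B from C = xi * B by Gauss-Jordan elimination on [xi | C]."""
    n = len(matrix)
    rows = [list(matrix[i]) + list(coded_blocks[i]) for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x ^ gf_mul(factor, y)
                           for x, y in zip(rows[r], rows[col])]
    return [bytes(row[n:]) for row in rows]

# Usage: encode three 4-byte blocks with random coefficients, then decode.
rng = random.Random(0)
original = [b"abcd", b"efgh", b"ijkl"]
n = len(original)
while True:
    xi = [[rng.randrange(256) for _ in range(n)] for _ in range(n)]
    coded = [encode(row, original) for row in xi]
    try:
        recovered = decode(xi, coded)
        break
    except ValueError:
        continue  # singular draw: pick new coefficients
assert recovered == original
```

The retry loop reflects the "with high probability" caveat: a random n × n matrix over GF(2^8) is almost always invertible, but not always.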
The Update Problem • For every single update, we must • transmit Rn new coded blocks from the source nodes to the storage nodes. • transmit Kn coded blocks from the storage nodes to the target nodes.
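As a rough worked example of the cost stated above, the sketch below counts the coded blocks moved by one conventional update. The function names are hypothetical, K is used exactly as it appears on the slide (it is not defined there), and the sample values for n, R, K, and s are invented for illustration only.

```python
def conventional_update_blocks(n, R, K):
    """Coded blocks sent for one update without differential updates.

    R*n blocks travel from source nodes to storage nodes and K*n blocks
    travel from storage nodes to target nodes, as stated on the slide.
    """
    to_storage = R * n
    to_targets = K * n
    return to_storage, to_targets, to_storage + to_targets

def conventional_update_bytes(n, R, K, s):
    """Payload bytes for one update with s bytes per block (coefficients ignored)."""
    return conventional_update_blocks(n, R, K)[2] * s

# Hypothetical values: n = 100 blocks, R = 3, K = 5, s = 1024 bytes per block.
print(conventional_update_blocks(100, 3, 5))       # (300, 500, 800)
print(conventional_update_bytes(100, 3, 5, 1024))  # 819200
```

The total grows with n regardless of how small the edit was, which is the overhead the differential model targets.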
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Differential Update Model (DUM) • The authors believe that the update problem is just as essential as the repair problem. • They propose the DUM to update coded blocks by delivering only the blocks that are affected by the updates. • This avoids transmitting the entire file for each update.
Updating Coded Blocks • Assume that the current version number of a file is v; version v + 1 then involves arbitrary updates in n’ ≤ n blocks of the file. • Let B = [b1, b2, …, bn] be the original file of version v. • Let B’ = [b1’, b2’, …, bn’] be the updated file of version v + 1. • Each block bi’ in version v + 1 can be expressed as bi + δi, where δi is the differential vector. • $B' = B + \Delta$ • Δ = [δ1, δ2, δ3, …, δn] is the differential matrix
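A small sketch of the differential vectors may help here. It assumes that the addition bi + δi is addition in GF(2^8), which is a byte-wise XOR, so δi can be computed as bi XOR bi’. The function names and the sample blocks are hypothetical.

```python
def differential_matrix(old_blocks, new_blocks):
    """Return [delta_1, ..., delta_n] with delta_i = b_i XOR b_i'.

    Assumes blocks are equal-length byte strings and that '+' on the
    slide means addition in GF(2^8), which is XOR.
    """
    return [bytes(o ^ u for o, u in zip(old, new))
            for old, new in zip(old_blocks, new_blocks)]

def apply_differential(old_blocks, delta):
    """Rebuild version v+1 from version v: b_i' = b_i + delta_i."""
    return [bytes(o ^ d for o, d in zip(old, dvec))
            for old, dvec in zip(old_blocks, delta)]

# One byte changes in the second block, so only delta_2 is non-zero.
old = [b"aaaa", b"bbbb", b"cccc"]
new = [b"aaaa", b"bXbb", b"cccc"]
delta = differential_matrix(old, new)
n_prime = sum(1 for dvec in delta if any(dvec))  # number of affected blocks
print(n_prime)  # 1
```

Untouched blocks produce all-zero δ-vectors, which is what makes it possible to send only part of Δ.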
Updating Coded Blocks • To encode a new block for version v + 1, the source node again randomly chooses a set of coding coefficients εi’ = [εi,1’, εi,2’, …, εi,n’] in the Galois field GF(2^8). • .
Updating Storage Nodes • A significant amount of bandwidth can be saved since most updates affect only a small portion of a file. • Recover Δ from Δ’ • Δ is reconstructed by inserting the zero δ-vectors into Δ’ according to the update vector u.
Updating Storage Nodes • Send only the non-zero rows of Δ, denoted Δ’ = [δ1, δ2, δ3, …, δn’] • Update vector uv+1 = [uv+1,1, uv+1,2, ..., uv+1,n] • . • Encode the matrix Δ’, • Decode the matrix Δ’,
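The relationship between Δ, Δ’, and the update vector can be sketched as follows. This covers only the uncoded step (dropping zero rows and putting them back); the encoding and decoding of Δ’ is not shown because the corresponding formulas did not survive on the slide. The names `compress` and `expand` are hypothetical, and the update vector is assumed to hold 1 for a changed block and 0 otherwise.

```python
def compress(delta):
    """Split the differential matrix into (update vector u, non-zero rows)."""
    update_vector = [1 if any(row) else 0 for row in delta]
    nonzero_rows = [row for row, flag in zip(delta, update_vector) if flag]
    return update_vector, nonzero_rows

def expand(update_vector, nonzero_rows, block_size):
    """Rebuild the full matrix by inserting zero delta-vectors where u_i = 0."""
    rows = iter(nonzero_rows)
    return [next(rows) if flag else bytes(block_size)
            for flag in update_vector]

# Four blocks of 3 bytes; only blocks 2 and 4 changed.
delta = [bytes(3), b"\x01\x00\x07", bytes(3), b"\x00\x00\x09"]
u, delta_prime = compress(delta)
print(u)                 # [0, 1, 0, 1]
print(len(delta_prime))  # 2
```

Only `u` and the rows in `delta_prime` need to travel; the receiver calls `expand(u, delta_prime, 3)` to obtain the full matrix again.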
Aggregating Updates Across Multiple Versions (1/4) • Storage nodes and target nodes may not always be synchronized to the latest version. • They may miss several updates for various reasons. • Assume that a node missed m updates • the node's current version is v. • the actual version of the file is v + m.
Aggregating Updates Across Multiple Versions (2/4) • A coded block in version v may be expressed in terms of the coded blocks of version 0 and the summation of coded δ-blocks from version 0 to version m.
Aggregating Updates Across Multiple Versions (3/4) • To support such an aggregated update, an update table is maintained that stores • the update vectors • the coded δ-blocks • If a storage node misses one or more updates, it finds the first non-empty entry following the empty entries. • That entry gives the aggregated Δ’ containing the changes across the missing versions.
Aggregating Updates Across Multiple Versions (4/4) • Computational overhead • generation of the aggregated update vector • . • generation of n’ aggregated coded δ-vectors • .
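The two aggregation steps listed above can be illustrated on uncoded δ-vectors. This is a sketch under assumptions, since the slide's formulas are missing: the aggregated update vector is taken to be the element-wise OR of the per-version update vectors, and the aggregated δ-vectors are taken to be the byte-wise XOR (GF(2^8) sum) of the per-version δ-vectors. The function names and sample data are hypothetical.

```python
def aggregate_update_vectors(update_vectors):
    """Entry i is 1 if block i changed in any of the missed versions."""
    return [int(any(flags)) for flags in zip(*update_vectors)]

def aggregate_deltas(deltas_per_version):
    """Sum the delta-vectors of the missed versions block by block (XOR)."""
    n = len(deltas_per_version[0])
    s = len(deltas_per_version[0][0])
    total = [bytearray(s) for _ in range(n)]
    for delta in deltas_per_version:
        for i, row in enumerate(delta):
            for k, byte in enumerate(row):
                total[i][k] ^= byte
    return [bytes(row) for row in total]

# Two missed versions of a file with three 2-byte blocks.
delta_1 = [b"\x00\x00", b"\x01\x00", b"\x00\x00"]
delta_2 = [b"\x00\x02", b"\x03\x00", b"\x00\x00"]
u_agg = aggregate_update_vectors([[0, 1, 0], [1, 1, 0]])
delta_agg = aggregate_deltas([delta_1, delta_2])
print(u_agg)      # [1, 1, 0]
print(delta_agg)  # [b'\x00\x02', b'\x02\x00', b'\x00\x00']
```

Applying `delta_agg` once to version v gives the same blocks as applying `delta_1` and then `delta_2`. The OR is conservative: a block that is changed and later reverted still has its flag set even though its aggregated δ-vector is zero.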
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Numerical Analysis • The bandwidth saving in updating the storage nodes using DUM. • . • The bandwidth saving in updating the target nodes using DUM. • .
Experiment Results (1/7) • The number of blocks n should be no more than 100 to ensure that network coding operates at a rate faster than a typical transmission rate in a network. • We compare the performance of conventional network coding update (NC) and DUM.
Experiment Results (2/7) • Bandwidth usage
Experiment Results (3/7) • Bandwidth usage and Computational cost
Experiment Results (4/7) • Computational cost on storage nodes dominates the overall cost.
Experiment Results (5/7) • Aggregated updates
Experiment Results (6/7) • Update affects
Experiment Results (7/7) • Simulation study • Diff [31], bsDiff [32] [31] J. W. Hunt and M. D. McIlroy, “An Algorithm for Differential File Comparison,” Bell Laboratories 41, Computing Science Technical Report, June 1976. [32] C. Percival, “Matching with Mismatches and Assorted Applications,” Ph.D. dissertation, Wadham College, University of Oxford, 2006.
Outline • Introduction • Related Works • Proposed System • Differential Update Model • Evaluation • Conclusion
Conclusion • DUM saves both communication and computational costs, unless the update affects almost the entire file. • DUM conserves CPU cycles for large files and when the data is more scattered in the Cloud. • This paper only considers the case where n’ is smaller than n. What happens if n’ is larger than n?