Explore erasure coding, redundancy, fault-tolerance, and repairability in distributed storage systems. The talk presents an approach that improves repairability using existing techniques, and discusses repair fan-in, data transmission, storage, and related work.
Redundantly Grouped Cross-object Coding for Repairable Storage
Anwitaman Datta & Frédérique Oggier, NTU Singapore
APSYS 2012, Seoul
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
What is this work about? The story so far …
• Huge volume of data: scale-out distributed storage systems
• Failures are inevitable (c'est la vie): redundancy for fault-tolerance
• Redundancy brings overheads: erasure coding
• Over time: repairing lost redundancy
What is this work about? The story so far …
(n,k) code: n encoded blocks B1, B2, …, Bn, of which one block Bx is lost
• Retrieve some k'' blocks (k'' = 2 … n-1) to recreate the lost block Bx
• Re-insert Bx in (new) storage devices, so that there are (again) n encoded blocks
• Design space
  • Repair fan-in k''
  • Data tx. per node
  • Overall data tx.
  • Storage per node
  • …
Related works
A non-exhaustive list
• Codes on codes
  • e.g. Hierarchical & Pyramid codes
• Network coding
  • e.g. Regenerating codes
• Locally repairable codes
  • e.g. Self-repairing codes
• Array codes
• …
Most of these codes look at design of new codes with inherent repairability properties.
This work: An engineering approach. Can we achieve good repairability using existing (mature) techniques? (Our solution is similar to “codes on codes”.)
• An Overview of Codes Tailor-made for Networked Distributed Data Storage
• Anwitaman Datta, Frederique Oggier
• arXiv:1109.2317
Separation of concerns
• Two distinct design objectives for distributed storage systems
  • Fault-tolerance
  • Repairability
• Related works: Codes with inherent repairability properties
  • Achieve both objectives together
  • There is nothing fundamentally wrong with that
  • E.g., we continue to work on self-repairing codes
• This work: An extremely simple idea
  • Introduce two different kinds of redundancy
    • Any (standard) erasure code for fault-tolerance
    • RAID-4 like parity (across encoded pieces of different objects) for repairability
Redundantly Grouped Cross-object Coding (RGC)
Erasure coding of individual objects (one row per object):
e1,1  e1,2  …  e1,k  e1,k+1  …  e1,n
e2,1  e2,2  …  e2,k  e2,k+1  …  e2,n
…
em,1  em,2  …  em,k  em,k+1  …  em,n
RAID-4 of erasure coded pieces of different objects (one parity piece per column):
p1    p2    …  pk    pk+1    …  pn
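The layout on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the RAID-4 like parity is a bytewise XOR over the j-th encoded pieces of m different objects; the function names (`xor_blocks`, `rgc_parity_row`, `repair_piece`) are hypothetical, and the toy byte strings stand in for the output of a real (n,k) erasure code.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def rgc_parity_row(encoded_objects):
    """encoded_objects: m lists, each holding the n encoded pieces of one object.
    Returns the n parity pieces, where p_j is the XOR of column j (assumption:
    XOR parity, as in RAID-4)."""
    return [xor_blocks(column) for column in zip(*encoded_objects)]

def repair_piece(encoded_objects, parity_row, i, j):
    """Recreate piece (i, j) from the other m-1 pieces of column j plus p_j."""
    survivors = [obj[j] for row, obj in enumerate(encoded_objects) if row != i]
    return xor_blocks(survivors + [parity_row[j]])

# Toy stand-in for m = 3 objects, each already encoded into n = 4 pieces.
objects = [[bytes([i * 16 + j] * 4) for j in range(4)] for i in range(3)]
parity = rgc_parity_row(objects)
repaired = repair_piece(objects, parity, i=1, j=2)
print(repaired == objects[1][2])
```

In this sketch a single lost piece is rebuilt from m blocks (m-1 pieces plus one parity piece), without touching the other pieces of the same object.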
RGC repairability
• Choosing a suitable m < k
  • Reduction in data transfer for repair
  • Repair fan-in disentangled from base code parameter “k”
    • Large “k” may be desirable for faster (parallel) data access
    • Codes typically have trade-offs between repair fan-in, code parameter “k” and code’s storage overhead (n/k)
• However: The gains from reduced fan-in are probabilistic
  • For i.i.d. failures with probability “f”
• Possible to reduce repair time
  • By pipelining data through the live nodes, and computing partial parity
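The pipelining idea can be illustrated as follows. This is a minimal sketch assuming XOR parity: each live node of the parity group XORs its own piece into the partial parity it receives and forwards the result, so every hop carries a single block. The names `forward_partial_parity` and `pipelined_repair` are hypothetical, and the network hops are simulated by a plain loop.

```python
def forward_partial_parity(partial, local_piece):
    """What one live node does: fold its own piece into the incoming partial parity."""
    return bytes(a ^ b for a, b in zip(partial, local_piece))

def pipelined_repair(live_pieces):
    """Simulate passing a partial parity through the live nodes, in order.
    The value leaving the last node is the lost piece."""
    partial = bytes(len(live_pieces[0]))  # all-zero block to start the chain
    for piece in live_pieces:             # one hop per live node
        partial = forward_partial_parity(partial, piece)
    return partial

# Toy parity group: three pieces from different objects plus their XOR parity.
group = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
group_parity = pipelined_repair(group)    # XOR of all pieces in the group
lost_index = 1
live = [p for idx, p in enumerate(group) if idx != lost_index] + [group_parity]
print(pipelined_repair(live) == group[lost_index])
```

In this arrangement the node hosting the repaired piece receives one block instead of m, and the XOR work is spread over the chain.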
Parameter “m” choice
• Smaller m: lower repair cost, larger storage overhead
• Is there an optimal choice of m? If so, how to determine it?
• A rule of thumb: rationalized by r simultaneous (multiple) repairs
  • E.g. for (n=15, k=10) code: m < 5
  • m = 3 or 4 implies
    • Repair bandwidth saving of 40-50% even for f = 0.1
      • Typically, in stable environments, f will be much smaller, and the relative repair gains much larger
    • Relatively low storage overhead of 2x or 1.875x, respectively
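The numbers on this slide can be checked with a small back-of-the-envelope model. The model is an assumption, not something spelled out on the slide: one parity row per m objects multiplies the base overhead n/k by (m+1)/m, and a lost piece is rebuilt from m blocks if all m other members of its parity group are alive (probability (1-f)^m under i.i.d. failures), falling back to k blocks via the erasure code otherwise. Function names are hypothetical.

```python
def storage_overhead(n, k, m):
    """Overhead of an (n,k) code plus one parity row per m objects (assumed model)."""
    return (n / k) * (m + 1) / m

def expected_repair_blocks(k, m, f):
    """Expected number of blocks fetched to repair one piece (assumed model):
    m blocks if the whole parity group is alive, else k blocks."""
    p_group_alive = (1 - f) ** m
    return p_group_alive * m + (1 - p_group_alive) * k

def repair_saving(k, m, f):
    """Relative saving compared to always fetching k blocks."""
    return 1 - expected_repair_blocks(k, m, f) / k

for m in (3, 4):
    print(m, storage_overhead(15, 10, m), round(repair_saving(10, m, 0.1), 3))
```

Under these assumptions, the (n=15, k=10) code gives overheads of 2.0 and 1.875 for m = 3 and m = 4, and savings of roughly 51% and 39% at f = 0.1, which is in line with the figures quoted above.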
Further discussions
• Possibility to localize repair traffic
  • Within a storage rack, by placing a whole parity group in the same rack
  • Without introducing any correlated failures of pieces of the same object
• Many unexplored issues
  • Soft errors (flipped bits)
  • Object update, deletion, …
  • Non i.i.d./correlated failures
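One way to picture the rack-local placement is sketched below. It is an illustrative assumption rather than the authors' scheme: parity group j (the j-th piece of each of the m objects, plus p_j) is mapped to rack j, so a parity-based repair stays inside one rack while the n pieces of any single object land in n different racks. The function name `rack_placement` and the tuple keys are hypothetical.

```python
def rack_placement(m, n):
    """Map every piece to a rack index (assumed policy: column j -> rack j).
    Keys are ("e", i, j) for piece j of object i and ("p", j) for parity piece j."""
    layout = {}
    for j in range(n):
        for i in range(m):
            layout[("e", i, j)] = j
        layout[("p", j)] = j
    return layout

layout = rack_placement(m=3, n=15)

# Racks touched when repairing piece (1, 4) through its parity group.
group_racks = {layout[("e", i, 4)] for i in range(3)} | {layout[("p", 4)]}
# Racks holding the pieces of object 1.
object_racks = {layout[("e", 1, j)] for j in range(15)}
print(len(group_racks), len(object_racks))
```

With this mapping a rack failure removes at most one piece of any given object, so the fault-tolerance of the base erasure code against rack-level failures is not weakened.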
Concluding remarks
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
• RAID-4 parity of erasure encoded pieces of multiple objects
  • Lowers the cost of data transfer for a repair
  • Reduces repair fan-in
  • Possibility to localize repairs (and save precious interconnect BW)
    • w/o introducing correlated failures w.r.t. a single object
  • Pipelining the repair traffic helps realize very fast repairs
    • Since the repairing node’s I/O, bandwidth or compute does not become a bottleneck
    • Also the computations for repair are cheaper than decoding/encoding
  • Retains storage overhead comparable to using only erasure coding, for comparable static resilience (surprisingly so!)
    • At least for the specific code parameter choices we tried
• Opens up many interesting questions that can be investigated experimentally as well as theoretically