
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes

A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes. Yunfeng Zhu (1), Patrick P. C. Lee (2), Liping Xiang (1), Yinlong Xu (1), Lingling Gao (1). (1) University of Science and Technology of China; (2) The Chinese University of Hong Kong. DSN’12.



Presentation Transcript


  1. A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes • Yunfeng Zhu (1), Patrick P. C. Lee (2), Liping Xiang (1), Yinlong Xu (1), Lingling Gao (1) • (1) University of Science and Technology of China, (2) The Chinese University of Hong Kong • DSN’12

  2. Fault Tolerance • Fault tolerance becomes more challenging in modern distributed storage systems • Increase in scale • Usage of inexpensive but less reliable storage nodes • Fault tolerance is ensured by introducing redundancy across storage nodes • Replication • Erasure codes (e.g., Reed-Solomon codes) • [Figure: replication stores A, A, A and B, B, B; an erasure code stores A, B, A+B, A+2B]
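The erasure-code chunks on this slide (A, B, A+B, A+2B) can be checked with a small sketch. It assumes plain integer arithmetic purely for illustration (practical codes work over finite fields or XOR), and the names `COEFFS`, `encode` and `decode` are hypothetical. It shows that any two of the four stored chunks are enough to recover both A and B.

```python
from fractions import Fraction
from itertools import combinations

# Coefficients (c_a, c_b) of the four stored chunks: A, B, A+B, A+2B.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four stored chunks for data values a and b."""
    return [ca * a + cb * b for ca, cb in COEFFS]

def decode(i, j, vi, vj):
    """Recover (a, b) from stored chunks i and j using Cramer's rule."""
    (a1, b1), (a2, b2) = COEFFS[i], COEFFS[j]
    det = a1 * b2 - a2 * b1          # non-zero for every pair of distinct chunks
    return Fraction(vi * b2 - vj * b1, det), Fraction(a1 * vj - a2 * vi, det)

chunks = encode(5, 9)                # A = 5, B = 9
# Try every pair of surviving chunks; each pair should give back (5, 9).
recovered = {decode(i, j, chunks[i], chunks[j])
             for i, j in combinations(range(4), 2)}
print(chunks, recovered)
```

Replication with three copies of each chunk would store six chunks to tolerate two failures, whereas this toy code stores four.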

  3. XOR-Based Erasure Codes • Encoding/decoding involve XOR operations only • Low computational overhead • Different redundancy levels • 2-fault tolerant: RDP, EVENODD, X-Code • 3-fault tolerant: STAR • General-fault tolerant: Cauchy Reed-Solomon (CRS)

  4. Failure Recovery • Recovering node failures is necessary • Preserve the required redundancy level • Avoid data unavailability • Single-node failure recovery • Single-node failure occurs more frequently than a concurrent multi-node failure

  5. Example: Recovery in RDP • An RDP code example with 8 nodes (node 0 to node 7) • [Figure: a 6 × 8 array of symbols di,j, rows i = 0…5 and nodes j = 0…7, with ⊕ marking the parity sets] • Let’s say node 0 fails. How do we recover node 0?

  6. Conventional Recovery • Idea: use only row parity sets. Recover each lost data symbol (i.e., data chunk) independently • [Figure: symbols read from node 1 to node 7 to rebuild node 0] • Read symbols: 36 • Different metrics can be used to measure the efficiency of a recovery scheme • Then how do we recover node 0 efficiently?
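The conventional scheme can be sketched in a few lines. The layout below is an assumption based on the standard RDP construction (nodes 0..p-2 hold data, node p-1 holds row parity, node p holds diagonal parity, and symbol (i, j) lies on diagonal (i + j) mod p); `rdp_encode` and `conventional_recover` are illustrative names, not code from the authors. With p = 7 (8 nodes), rebuilding node 0 from row parity sets alone reads 6 symbols for each of the 6 rows.

```python
import random
from functools import reduce
from operator import xor

def rdp_encode(data, p):
    """Encode a (p-1) x (p-1) block of data symbols into a (p-1) x (p+1) stripe.

    Assumed layout: nodes 0..p-2 data, node p-1 row parity, node p diagonal parity.
    """
    stripe = [row[:] + [reduce(xor, row), 0] for row in data]
    for d in range(p - 1):                  # diagonal d is stored in row d of node p
        acc = 0
        for j in range(p):                  # columns 0..p-1, row parity included
            i = (d - j) % p                 # symbol (i, j) lies on diagonal (i + j) mod p
            if i != p - 1:                  # row p-1 is imaginary (all zeros)
                acc ^= stripe[i][j]
        stripe[d][p] = acc
    return stripe

def conventional_recover(stripe, p, failed=0):
    """Rebuild every symbol of a failed data node from its row parity set only."""
    rebuilt, reads = [], 0
    for row in stripe:
        # Read the surviving symbols of this row; diagonal parity (node p) is unused.
        survivors = [row[j] for j in range(p) if j != failed]
        reads += len(survivors)
        rebuilt.append(reduce(xor, survivors))
    return rebuilt, reads

p = 7
rng = random.Random(0)
data = [[rng.randrange(256) for _ in range(p - 1)] for _ in range(p - 1)]
stripe = rdp_encode(data, p)
rebuilt, reads = conventional_recover(stripe, p)
print(rebuilt == [row[0] for row in data], reads)
```

The read count matches the 36 symbols reported on the slide; each lost symbol is rebuilt independently, so no read is shared between rows.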

  7. Minimize Number of Read Symbols • Idea: use a combination of row and diagonal parity sets to maximize overlapping symbols [Xiang, ToS’11] • [Figure: symbols read from node 1 to node 7 to rebuild node 0] • Read symbols: 27 • Improvement: 25% (27 instead of 36 symbols read)
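The two read counts (36 and 27) can be reproduced by brute force. This is a sketch under the same assumed RDP layout as the standard construction (row parity on node p-1, diagonal parity on node p, node 0 failed); `symbols_read` is a hypothetical helper. Each lost symbol is rebuilt from either its row or its diagonal parity set, and a symbol needed by both kinds of set is read only once.

```python
from itertools import product

def symbols_read(seq, p):
    """(row, node) pairs fetched to rebuild node 0.

    seq[i] = 0 rebuilds row i from its row parity set, 1 from its diagonal parity set.
    """
    needed = set()
    for i in range(p - 1):
        if seq[i] == 0:
            needed.update((i, j) for j in range(1, p))      # row i of nodes 1..p-1
        else:
            for j in range(1, p):                           # diagonal i, nodes 1..p-1
                r = (i - j) % p
                if r != p - 1:                              # skip the imaginary row
                    needed.add((r, j))
            needed.add((i, p))                              # diagonal parity symbol
    return needed

p = 7
conventional = len(symbols_read((0,) * p, p))
# Enumerate all 2^(p-1) row/diagonal choices and keep the smallest read count.
totals = {bits + (0,): len(symbols_read(bits + (0,), p))
          for bits in product((0, 1), repeat=p - 1)}
fewest = min(totals.values())
print(conventional, fewest, 1 - fewest / conventional)
```

In this model, the minimum is reached by several different choices, which is what leaves room for a second criterion such as per-node cost.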

  8. Need A New Metric? • A modern storage system is naturally composed of heterogeneous types of storage nodes • System upgrades • New node addition • A heterogeneous environment • [Figure: a proxy connected to node 0 to node 7 and a new node; the link bandwidths shown are 109, 68, 26, 110, 113, 86, 110 and 10 Mbps] • Need a new efficient failure recovery solution for heterogeneous environments!

  9. Related Work • Hybrid recovery • Minimize number of read symbols for RAID-6 XOR-based erasure codes • e.g., RDP [Xiang, ToS’11], EVENODD [Wang, Globecom’10] • Enumeration recovery [Khan, FAST’12] • Enumerate all recovery possibilities to achieve optimal recovery for general XOR-based erasure codes • Greedy recovery [Zhu, MSST’12] • Efficient search of recovery solutions for general XOR-based erasure codes • Regenerating codes [Dimakis, ToIT’10] • Nodes encode data during recovery • Minimize recovery bandwidth • Heterogeneous case considered in [Li, Infocom’10], but requires node encoding and collaboration

  10. Challenges • How to enable efficient failure recovery for heterogeneous settings? • Minimizing # of read symbols → homogeneous settings • Performance bottlenecked by poorly performing nodes • How to quickly find the recovery strategy? • Minimizing # of read symbols → deterministic metric • Minimizing general cost → non-deterministic metric • Recovery decision typically can’t be pre-determined

  11. Our Contributions • Target two RAID-6 codes: RDP and EVENODD • XOR-based encoding operations • Goals: • Minimize search time • Minimize recovery cost Cost-based single-node failure recovery for heterogeneous distributed storage systems

  12. Our Contributions • Formulate an optimization problem for single-node failure recovery in heterogeneous settings • Propose a cost-based heterogeneous recovery (CHR) algorithm • Narrow down search space • Suitable for online recovery • Implement and experiment on a heterogeneous networked storage testbed

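  13. Model Formulation • Our formulation: nodes v0, v1, …, vk, …, vp-1, vp (Node 0 to Node p), where Node k is the failed node • Weight of node i: wi (w0, w1, …, wp-1, wp) • Download distribution: (y0, y1, …, yp-1, yp), where yi is the number of symbols downloaded from node i • Minimizing total recovery cost: $\min \sum_{i=0,\, i \ne k}^{p} w_i y_i$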

  14. Physical Meanings

  15. Solving the Model • Important: which symbols are fetched from surviving nodes must follow the inherent rules of the specific coding scheme • To solve the model, we introduce the recovery sequence (x0, x1, …, xp-2, 0) • xi = 0: di,k is recovered from its row parity set • xi = 1: di,k is recovered from its diagonal parity set • 1) Each recovery sequence represents a feasible recovery solution • 2) The download distribution can be represented by the recovery sequence • An example with 6 nodes (node 0 to node 5, symbols d0,0 to d3,5): recovery sequence (0, 0, 1, 1, 0) gives download distribution (3, 2, 2, 3, 2)
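The mapping from a recovery sequence to a download distribution can be sketched as follows. The sketch assumes the standard RDP layout (row parity on node p-1, diagonal parity on node p, symbol (i, j) on diagonal (i + j) mod p) and that node 0 is the failed node; `download_distribution` is a hypothetical helper name. Under these assumptions it reproduces the slide's example for 6 nodes (p = 5).

```python
def download_distribution(seq, p):
    """Number of symbols read from each surviving node 1..p when node 0 fails.

    seq[i] = 0 rebuilds d(i,0) from its row parity set, 1 from its diagonal
    parity set; the last entry of seq belongs to the imaginary row and is 0.
    """
    reads = [set() for _ in range(p + 1)]   # reads[j] = rows fetched from node j
    for i in range(p - 1):
        if seq[i] == 0:
            for j in range(1, p):           # row i of nodes 1..p-1
                reads[j].add(i)
        else:
            for j in range(1, p):           # diagonal i on nodes 1..p-1
                r = (i - j) % p
                if r != p - 1:              # row p-1 is imaginary (all zeros)
                    reads[j].add(r)
            reads[p].add(i)                 # diagonal parity symbol on node p
    return tuple(len(reads[j]) for j in range(1, p + 1))

example = download_distribution((0, 0, 1, 1, 0), 5)
print(example)
```

Using a set per node is what captures the overlap: a symbol that serves both a row parity set and a diagonal parity set is counted once.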

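  16. Solving the Model (2) • Step 1: use the recovery sequence to represent downloads • Step 2: narrow down the search space by only considering min-read recovery sequences (i.e., those that read the minimum number of symbols during recovery) • Step 3: reformulate the model as: minimize $\sum_{i \ne k} w_i y_i$ over the min-read recovery sequences $(x_0, x_1, \dots, x_{p-2}, 0)$, where each $y_i$ is determined by the sequence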

  17. Expensive Enumeration • Challenge: too many min-read recovery sequences to enumerate, even after we narrow down the search space • Observation: many min-read recovery sequences return the same download distribution

  18. Optimize Enumeration Process • Two conditions under which different recovery sequences have the same download distribution: • Shift condition: (0, 0, 0, 1, 1, 1, 0) → (0, 0, 1, 1, 1, 0, 0) → (0, 1, 1, 1, 0, 0, 0) → (1, 1, 1, 0, 0, 0, 0) … • Reverse condition: (0, 0, 0, 1, 1, 1, 0) → (0, 1, 1, 1, 0, 0, 0) • Key idea: not all recovery sequences need to be enumerated (details in the paper)
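The two conditions can be checked numerically on the slide's own sequences. The sketch below makes two assumptions that the slide does not spell out: a shift is read as a cyclic rotation that keeps the final entry 0, and the RDP layout is the standard one with node 0 failed. `equivalents` and `download_distribution` are hypothetical helper names; the exact conditions are in the paper.

```python
def download_distribution(seq, p):
    """Symbols read from nodes 1..p when node 0 fails (assumed RDP layout)."""
    reads = [set() for _ in range(p + 1)]
    for i in range(p - 1):
        if seq[i] == 0:
            for j in range(1, p):           # row parity set of row i
                reads[j].add(i)
        else:
            for j in range(1, p):           # diagonal parity set of diagonal i
                r = (i - j) % p
                if r != p - 1:              # skip the imaginary row
                    reads[j].add(r)
            reads[p].add(i)
    return tuple(len(reads[j]) for j in range(1, p + 1))

def equivalents(seq):
    """Rotations of seq and of its reversal that keep the last entry 0."""
    out = set()
    for base in (tuple(seq), tuple(reversed(seq))):
        for s in range(len(base)):
            cand = base[s:] + base[:s]      # cyclic rotation by s positions
            if cand[-1] == 0:               # last entry must stay 0 to be valid
                out.add(cand)
    return out

p = 7
group = sorted(equivalents((0, 0, 0, 1, 1, 1, 0)))
distributions = {download_distribution(s, p) for s in group}
print(group)
print(distributions)
```

All sequences in the group yield a single download distribution, so only one of them needs its cost computed.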

  19. Cost-based Heterogeneous Recovery (CHR) Algorithm: Intuition • Step 1: initialize a bitmap to track all possible min-read recovery sequences R • Step 2: compute recovery cost of R. • Step 3: mark all shifted and reverse sequences of R as being enumerated • Step 4: switch to another R; return the one with minimum cost
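The four steps can be sketched as a small search loop. This is not the authors' implementation: it assumes the standard RDP layout with node 0 failed, reads "shift" as a cyclic rotation that keeps the last entry 0, and takes the min-read sequences to be those with exactly (p-1)/2 diagonal choices (which holds in this sketch's model). The names `chr_search`, `equivalents`, `index` and the weights in the usage lines are hypothetical.

```python
from itertools import combinations

def download_distribution(seq, p):
    """Symbols read from nodes 1..p when node 0 fails (assumed RDP layout)."""
    reads = [set() for _ in range(p + 1)]
    for i in range(p - 1):
        if seq[i] == 0:
            for j in range(1, p):
                reads[j].add(i)
        else:
            for j in range(1, p):
                r = (i - j) % p
                if r != p - 1:              # skip the imaginary row
                    reads[j].add(r)
            reads[p].add(i)
    return tuple(len(reads[j]) for j in range(1, p + 1))

def equivalents(seq):
    """Rotations of seq and of its reversal that keep the last entry 0."""
    out = set()
    for base in (tuple(seq), tuple(reversed(seq))):
        for s in range(len(base)):
            cand = base[s:] + base[:s]
            if cand[-1] == 0:
                out.add(cand)
    return out

def index(seq):
    """Bitmap position of a sequence: x_i is bit i, the trailing 0 is dropped."""
    return sum(bit << i for i, bit in enumerate(seq[:-1]))

def chr_search(p, weights):
    """weights[j-1] = cost of reading one symbol from node j (j = 1..p)."""
    n = p - 1
    seen = [False] * (1 << n)                         # Step 1: bitmap of sequences
    best_seq, best_cost, evaluated = None, float("inf"), 0
    for ones in combinations(range(n), n // 2):       # assumed min-read candidates
        seq = tuple(int(i in ones) for i in range(n)) + (0,)
        if seen[index(seq)]:
            continue                                  # same distribution seen already
        for other in equivalents(seq):                # Step 3: mark shifts/reversals
            seen[index(other)] = True
        evaluated += 1
        cost = sum(w * y for w, y in zip(weights, download_distribution(seq, p)))
        if cost < best_cost:                          # Steps 2 and 4: keep the cheapest
            best_seq, best_cost = seq, cost
    return best_seq, best_cost, evaluated

# Hypothetical weights: node 1 is ten times more expensive to read from.
best_seq, best_cost, evaluated = chr_search(7, [10, 1, 1, 1, 1, 1, 1])
print(best_seq, best_cost, evaluated)
```

With these weights the search settles on a sequence that reads only three symbols from the slow node, while evaluating fewer candidates than the 20 sequences with three diagonal choices.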

  20. Example • [Figure: a proxy connected to node 0 to node 7 and a new node; the link bandwidths shown are 109, 68, 26, 110, 113, 86, 110 and 10 Mbps] • Our proposed CHR algorithm: symbols read from node 1 to node 7 = (3, 5, 4, 4, 5, 3, 3) • Hybrid approach [Xiang, ToS’11]: symbols read from node 1 to node 7 = (5, 4, 3, 3, 4, 5, 3)

  21. Recovery Cost Comparison • CHR approach • Hybrid approach • Conventional approach • CHR reduces the recovery cost by 25.89% relative to the hybrid approach and by 40.91% relative to the conventional approach
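The comparison can be reproduced as a weighted sum of symbols read, with weight = 1/bandwidth. The per-node bandwidth assignment below is an assumption: the transcript lists the bandwidth values but not which node each belongs to, and this assignment was chosen because it reproduces the costs 0.7353α and 0.5449α quoted in the backup example. The download distributions for hybrid and CHR are taken from the slides; the conventional one follows from reading 6 symbols from each of nodes 1 to 6.

```python
# Assumed bandwidth (Mbps) of surviving nodes 1..7; the node-to-bandwidth
# mapping is NOT recoverable from the transcript and is chosen for illustration.
bandwidth = [10, 109, 110, 86, 110, 68, 113]
weights = [1 / b for b in bandwidth]          # weight = 1 / bandwidth

# Symbols read from nodes 1..7 under each approach.
downloads = {
    "conventional": (6, 6, 6, 6, 6, 6, 0),   # row parity only, node 7 unused
    "hybrid": (5, 4, 3, 3, 4, 5, 3),
    "CHR": (3, 5, 4, 4, 5, 3, 3),
}

def recovery_cost(weights, distribution):
    """Total recovery cost: sum of weight times symbols read, in units of α."""
    return sum(w * y for w, y in zip(weights, distribution))

costs = {name: recovery_cost(weights, y) for name, y in downloads.items()}
for name, cost in costs.items():
    print(f"{name}: {cost:.4f}α")

vs_hybrid = 100 * (1 - costs["CHR"] / costs["hybrid"])
vs_conventional = 100 * (1 - costs["CHR"] / costs["conventional"])
print(f"CHR vs hybrid: {vs_hybrid:.2f}%, CHR vs conventional: {vs_conventional:.2f}%")
```

Both approaches read 27 symbols in total; CHR is cheaper only because it shifts reads away from the slowest node.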

  22. Simulation Studies (1): Traverse Efficiency • Evaluate the computational time of CHR CHR significantly reduces the traverse time of the naive approach by over 90% as p increases!

  23. Simulation Studies (2): Robustness Efficiency • Evaluate whether CHR achieves the global optimum among all the feasible recovery sequences • CHR has a very high probability (over 93%) of hitting the globally optimal recovery cost!

  24. Simulation Studies (3): Recovery Efficiency • Evaluate via 100 runs for each p the recovery efficiency of CHR in a heterogeneous storage environment • CHR can reduce recovery cost by up to 50% over the conventional approach • CHR can reduce recovery cost by up to 30% over the hybrid approach

  25. Experiments • Experiments on a networked storage testbed • Conventional vs. Hybrid vs. CHR • Default chunk size = 1MB • Communication via ATA over Ethernet (AoE) • Consider two codes: RDP and EVENODD • Only RDP results shown in this talk • Recovery operation: • Read chunks from surviving nodes • Reconstruct lost chunks • Write reconstructed chunks to a new node • [Figure: storage nodes and the recovery process connected through a Gigabit switch]

  26. Experiments • Physical storage devices are equipped with one of two types of Ethernet interface card • 100Mbps → set weight = 1/(100Mbps) • 1Gbps → set weight = 1/(1Gbps) • [Table: configuration for RDP code]

  27. Different Number of Storage Nodes • Total recovery time for RDP • CHR improves conventional by 21-31% • CHR improves hybrid by 15-20%

  28. Different Chunk Size • Total recovery time for RDP (p = 11) • CHR improves conventional by 18-26% • CHR improves hybrid by 14-19%

  29. Different Failed Nodes • Total recovery time for RDP (p = 11) • CHR still outperforms conventional and hybrid

  30. Conclusions • Address single-node failure recovery in RAID-6 coded heterogeneous storage systems • Formulate a computation-efficient optimization model • Propose a cost-based heterogeneous recovery algorithm • Validate the effectiveness of the CHR algorithm through extensive simulations and testbed experiments • Future work: • Different cost formulations • Extension for general XOR-based erasure codes • Degraded reads • Source code: • http://ansrlab.cse.cuhk.edu.hk/software/chr/

  31. Backup

  32. Cost-based Heterogeneous Recovery (CHR) Algorithm Notation: Algorithm:

  33. Example • [Figure: a proxy connected to node 0 to node 7 and a new node; the link bandwidths shown are 109, 68, 26, 110, 113, 86, 110 and 10 Mbps; symbols read from node 1 to node 7 = (3, 5, 4, 4, 5, 3, 3)] • Step 1: Initialize F[0..63] with 0-bits, R = {1110000}, and the recovery cost C = MAX_VALUE • Step 2: Set F[7] = 1 and mark R’s shifted and reverse recovery sequences: F[56] = F[28] = F[14] = 1; calculate the recovery cost for R, which gives C = 0.7353α; R* and C* are updated with R and C • Step 3: Get the next min-read recovery sequence R and go to Step 2 • Step 4: Finally, we find that R* = {1010100} and C* = 0.5449α

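  34. Recovery Cost Comparison • CHR approach • Hybrid approach • Conventional approach • [Figure: symbols read from node 1 to node 7 under the hybrid approach = (5, 4, 3, 3, 4, 5, 3)] • CHR reduces the recovery cost by 25.89% relative to the hybrid approach and by 40.91% relative to the conventional approach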

  35. Different Number of Storage Nodes • Consider the overall performance of the complete recovery operation for EVENODD

  36. Different Chunk Size • Evaluate the impact of chunk size for EVENODD on the recovery time performance

  37. Different Failed Nodes • Evaluate the recovery time performance for EVENODD when the failed node is in a different column
