Cooperative regenerating codes for distributed storage systems. Kenneth Shum (Joint work with Yuchong Hu ) 22nd July 2011. Multiple node failures. Large-scale storage system Google data center, example from Kannan’s talk. 800000 servers, fail rate = 4% per year Repair in 2 days
Cooperative regenerating codes for distributed storage systems
Kenneth Shum (Joint work with Yuchong Hu)
22nd July 2011
Multiple node failures
• Large-scale storage system
  • Google data center, example from Kannan’s talk.
  • 800000 servers, fail rate = 4% per year
  • Repair in 2 days
  • Mean number of failed servers in 2 days = 175.
• The lazy-repair policy in TotalRecall
  • A repair process is triggered only after the number of failed nodes has reached a certain threshold.
kshum
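The figure of 175 failed servers follows from the numbers on the slide. A minimal sketch of the arithmetic, assuming failures are spread uniformly over a 365-day year (the variable names are illustrative, not from the talk):

```python
# Expected number of server failures during one repair window,
# assuming failures are spread uniformly over a 365-day year.
servers = 800_000
annual_failure_rate = 0.04
repair_window_days = 2

failures_per_day = servers * annual_failure_rate / 365
mean_failures_in_window = failures_per_day * repair_window_days

print(round(mean_failures_in_window))  # prints 175
```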
Jointly repair multiple failures
[Figure: storage nodes, newcomers, and data exchange.]
• Can we further reduce the repair-bandwidth?
• Hu et al. (JSAC, Feb 2010)
Distributed storage (erasure coding)
Wu, Dimakis ISIT09
• Node 1 stores A1, A2
• Node 2 stores B1, B2
• Node 3 stores A1+B1, 2A2+B2
• Node 4 stores 2A1+B1, A2+B2
• The data collector recovers A1, A2, B1, B2.
Naive Repair
• The node storing A1, A2 fails.
• The newcomer downloads B1, B2 and A1+B1, 2A2+B2, then recovers A1, A2.
• 4 packets required.
Repair with “code alignment”
• The node storing A1, A2 fails.
• The newcomer downloads B1+B2, A1+2A2+B1+B2 and 2A1+A2+B1+B2.
• 3 packets required.
• Solve: P1 = A1+2A2, P2 = 2A1+A2
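The alignment step can be checked numerically. The sketch below is an illustration only: it assumes the arithmetic is done in the prime field GF(5) (the slide does not name the field; any field in which 2 and 3 are nonzero would do) and uses arbitrary packet values.

```python
# Assumption: arithmetic over the prime field GF(5); packet values are arbitrary.
P = 5
A1, A2, B1, B2 = 3, 4, 1, 2

# Contents of the three surviving nodes.
node2 = (B1 % P, B2 % P)
node3 = ((A1 + B1) % P, (2 * A2 + B2) % P)
node4 = ((2 * A1 + B1) % P, (A2 + B2) % P)

# Each surviving node sends ONE packet: the sum of its two stored packets.
p2 = (node2[0] + node2[1]) % P   # B1 + B2
p3 = (node3[0] + node3[1]) % P   # A1 + 2*A2 + B1 + B2
p4 = (node4[0] + node4[1]) % P   # 2*A1 + A2 + B1 + B2

# The interference B1 + B2 is aligned, so one subtraction removes it.
P1 = (p3 - p2) % P               # A1 + 2*A2
P2 = (p4 - p2) % P               # 2*A1 + A2

# Solve the 2x2 system [[1, 2], [2, 1]] * (A1, A2) = (P1, P2); determinant is -3.
det_inv = pow(-3 % P, -1, P)
rec_A1 = (P1 - 2 * P2) * det_inv % P
rec_A2 = (P2 - 2 * P1) * det_inv % P

print(rec_A1 == A1, rec_A2 == A2)
```

Three packets are transferred instead of four because the two unknown B packets only ever appear as the single combination B1+B2.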
Multiple failures, separate repair
• 8 packets in total
• 4 packets per newcomer
[Figure: two nodes of the four-node code have failed; each newcomer downloads 2 packets from each surviving node.]
Multiple failures, cooperative repair (I)
• 6 packets in total
• 3 packets per newcomer
[Figure: the four-node code with two failed nodes being repaired cooperatively.]
Multiple failures, cooperative repair (II)
• 6 packets in total
• 3 packets per newcomer
[Figure: the four-node code with two failed nodes being repaired cooperatively; packets on the links include A1, A2, A1+B1, 2A2+B2, B2 and 2A1+B1.]
Outline of the talk
• Is it optimal in terms of repair-bandwidth?
• What is the tradeoff between storage and repair-bandwidth for cooperative repair?
• Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
  • Exact repair
  • Functional repair
Information flow graph
[Figure: information flow graph with source S, storage nodes In1/Out1 to In5/Out5, newcomers In6/Mid6/Out6 and In7/Mid7/Out7, and a data collector; edges carry capacities 2 and 1.]
Is this regenerating code optimal?
• 6 packets in total
• 3 packets per newcomer
[Figure: the cooperative repair scheme for the four-node code.]
First cut
[Figure: a cut in the information flow graph separating the data collector and the two newcomers (In6/Mid6/Out6, In7/Mid7/Out7) from the surviving nodes.]
• $B \le 4\beta_1$
Second cut
[Figure: a second cut in the information flow graph, through vertices In/Mid/Out of nodes 1 to 4, with the data collector.]
• $B \le 2 + \beta_1 + \beta_2$
A linear programming problem
• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
• Subject to $4 \le 4\beta_1$, $4 \le 2 + \beta_1 + \beta_2$, $\beta_1, \beta_2 \ge 0$
[Figure: plot of the feasible region.]
• At least 3 packets
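The bound of 3 packets can be reproduced by solving the linear program directly. The sketch below assumes the constraints read 4 ≤ 4β1 and 4 ≤ 2 + β1 + β2 with β1, β2 ≥ 0, and the objective is 2β1 + β2. It enumerates the vertices of the feasible region, which is enough for a two-variable LP whose optimum is bounded; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (c1, c2, rhs), meaning c1*beta1 + c2*beta2 >= rhs.
constraints = [
    (Fraction(4), Fraction(0), Fraction(4)),  # 4*beta1 >= 4      (first cut)
    (Fraction(1), Fraction(1), Fraction(2)),  # beta1 + beta2 >= 2 (second cut)
    (Fraction(1), Fraction(0), Fraction(0)),  # beta1 >= 0
    (Fraction(0), Fraction(1), Fraction(0)),  # beta2 >= 0
]
objective = (Fraction(2), Fraction(1))        # minimize 2*beta1 + beta2


def feasible_vertices(cons):
    """Intersect every pair of constraint lines and keep the feasible points."""
    for (a1, a2, r1), (b1, b2, r2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines, no unique intersection
        x = (r1 * b2 - a2 * r2) / det
        y = (a1 * r2 - b1 * r1) / det
        if all(c1 * x + c2 * y >= rhs for c1, c2, rhs in cons):
            yield (x, y)


def cost(point):
    return objective[0] * point[0] + objective[1] * point[1]


best = min(feasible_vertices(constraints), key=cost)
best_value = cost(best)
print(best, best_value)
```

The minimum is attained at β1 = β2 = 1, which gives 3 packets per newcomer, matching the cooperative repair scheme shown earlier.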
Non-homogeneous download traffic
[Figure: the first cut again, with the four download links into the newcomers labelled a, b, c, d.]
• $B \le a + b + c + d$
Non-homogeneous traffic
[Figure: the second cut again, with links labelled e, f, g, h, i, j.]
• $B \le 2 + f + j$
• $B \le 2 + h + i$
• $B \le 2 + e + j$
• $B \le 2 + g + i$
The same LP problem
• Minimize $2\beta_1 + \beta_2$
• Subject to $4 \le 4\beta_1$, $4 \le 2 + \beta_1 + \beta_2$, $\beta_1, \beta_2 \ge 0$
• At least 3 packets
Storage vs Repair-bandwidth
(S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011.)
• File size = 420, d = 8, k = 4
[Plot: storage versus repair-bandwidth tradeoff curves for one-by-one repair and for repairing 3 newcomers jointly.]
Fair comparison?
• Repair degree = 8
[Figure: cooperative repair and one-by-one repair, each with newcomers connected to the surviving nodes.]
• Number of connections per newcomer = 8
• Number of connections per newcomer = 8+2
MBCR and MSCR
• Minimum bandwidth cooperative repair (MBCR)
• Minimum storage cooperative repair (MSCR)
[Plot: tradeoff curves for one-by-one repair and cooperative repair, with the MBCR and MSCR points marked.]
How much can we improve?
• File size = 2275, d = 30, k = 5
[Plot: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly.]
• When d is large, joint repair does not have a significant advantage over one-by-one repair.
How much can we improve?
• File size = 616, d = 8, k = 4
[Plot: tradeoff curves for one-by-one repair and for repairing 10 newcomers jointly.]
• Repair-bandwidth reduction is more prominent when d is not so large.
AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR
An explicit construction for MBCR
(S., Hu, ISIT 2011.)
• Require d = k, r = n − d
• B = 8 information packets
• n = 4 nodes
• Each node stores 5 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 2
• No. of helpers for each failed node = d = 2
• Minimum repair-bandwidth
• Storage per node
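The parameters above can be cross-checked against the closed form usually quoted for the minimum-bandwidth cooperative point. The formula is not stated on the slide and is an assumption here: β1 = 2B / (k(2d + r − k)) from each helper, β2 = B / (k(2d + r − k)) from each other newcomer, and storage equal to the repair bandwidth. The function name is illustrative.

```python
from fractions import Fraction


def mbcr_point(B, d, k, r):
    """Return (storage per node, repair bandwidth per newcomer) at the MBCR point.

    Assumed closed form, not taken from the slide:
    beta1 = 2B / (k(2d + r - k)), beta2 = B / (k(2d + r - k)),
    gamma = d*beta1 + (r - 1)*beta2, and alpha = gamma.
    """
    unit = Fraction(B, k * (2 * d + r - k))
    beta1 = 2 * unit          # downloaded from each of the d helpers
    beta2 = unit              # received from each of the other r - 1 newcomers
    gamma = d * beta1 + (r - 1) * beta2
    alpha = gamma             # at minimum bandwidth, storage equals repair traffic
    return alpha, gamma


# Parameters of the construction on this slide: B = 8, d = k = 2, r = 2.
alpha, gamma = mbcr_point(8, 2, 2, 2)
print(alpha, gamma)

# Setting r = 1 gives the one-by-one (non-cooperative) minimum-bandwidth point.
print(mbcr_point(8, 2, 2, 1))
```

Under this assumption the slide's example gives 5 packets of storage and 5 packets of repair traffic per newcomer, in line with "each node stores 5 packets".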
Min-Bandwidth point
[Plot: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively, with the minimum-bandwidth point marked.]
Data Distribution
• 8 data packets: A, B, C, D, E, F, G, H
• Node 1: A, B, C, D, F+G
• Node 2: C, D, E, F, H+A
• Node 3: E, F, G, H, B+C
• Node 4: G, H, A, B, D+E
• 5 packets per node: 4 systematic, 1 parity-check (XOR)
Data collection
• The data collector downloads A, B, C, D from the node storing A, B, C, D, F+G.
• The data collector downloads E, F, G, H from the node storing E, F, G, H, B+C.
• The data collector recovers A, B, C, D, E, F, G, H.
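Data collection
• The data collector downloads A, B, C, F+G from the node storing A, B, C, D, F+G.
• The data collector downloads D, E, F, H+A from the node storing C, D, E, F, H+A.
• The 8 received packets, written in terms of A, B, C, D, E, F, G, H, form a triangular, full-rank system.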
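The data-collection property can be checked for every pair of nodes at once. The sketch below is an illustration: it represents each stored packet as a bit mask over the symbols A to H and computes the rank over GF(2). The node numbering and helper names are assumptions made for the example.

```python
from itertools import combinations

SYMBOLS = "ABCDEFGH"

# Packets stored by the four nodes, as listed on the data distribution slide.
NODES = [
    ["A", "B", "C", "D", "F+G"],
    ["C", "D", "E", "F", "H+A"],
    ["E", "F", "G", "H", "B+C"],
    ["G", "H", "A", "B", "D+E"],
]


def to_mask(packet):
    """Encode a packet such as 'F+G' as a bit mask over the 8 data symbols."""
    mask = 0
    for symbol in packet.split("+"):
        mask ^= 1 << SYMBOLS.index(symbol)
    return mask


def gf2_rank(vectors):
    """Rank over GF(2) by Gaussian elimination on bit masks."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)


pair_ranks = {
    (i + 1, j + 1): gf2_rank([to_mask(p) for p in NODES[i] + NODES[j]])
    for i, j in combinations(range(len(NODES)), 2)
}
print(pair_ranks)
```

Every pair of nodes reaches rank 8, so a data collector connected to any k = 2 nodes can solve for all 8 data packets.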
Exact Repair
How to repair?
[Figure: two failed nodes are regenerated exactly from the two surviving nodes, with an exchange between the newcomers; packets on the links include B+C and F+G.]
• Total repair-bandwidth = 10
Exact Repair
How to repair?
[Figure: a second failure pattern, with two failed nodes regenerated exactly; packets on the links include D+E, E, F and F+G.]
• Total repair-bandwidth = 10
Min-Bandwidth point
[Plot: tradeoff curves for one-by-one repair and for repairing 2 new nodes cooperatively, with the minimum-bandwidth point marked.]
AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR
An explicit construction for MSCR
(S. ICC 2011.)
• Require d = k
• B = 6 information packets
• n nodes
• Each node stores 2 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 3
• No. of helpers for each failed node = d = 3
• Minimum repair-bandwidth
• Storage per node
The min-storage point
• k = 3, d = 3, r = 2, B = 6
• Storage cost per node = 2
• Repair bandwidth per node = 4
[Plot: tradeoff curves for non-cooperative and cooperative repair, with the minimum-storage point marked.]
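The two numbers on this slide agree with the closed form usually quoted for the minimum-storage cooperative point. That formula is not on the slide and is an assumption here: storage B/k, and β1 = β2 = B / (k(d + r − k)). The function name is illustrative.

```python
from fractions import Fraction


def mscr_point(B, d, k, r):
    """Return (storage per node, repair bandwidth per newcomer) at the MSCR point.

    Assumed closed form, not taken from the slide:
    alpha = B/k and beta1 = beta2 = B / (k(d + r - k)),
    so gamma = d*beta1 + (r - 1)*beta2 = (d + r - 1) * beta.
    """
    alpha = Fraction(B, k)
    beta = Fraction(B, k * (d + r - k))
    gamma = (d + r - 1) * beta
    return alpha, gamma


# Parameters on this slide: k = 3, d = 3, r = 2, B = 6.
print(mscr_point(6, 3, 3, 2))   # cooperative repair of 2 newcomers

# Setting r = 1 gives the non-cooperative minimum-storage point.
print(mscr_point(6, 3, 3, 1))
```

With these assumptions the cooperative point is (2, 4), as on the slide, while the non-cooperative point with the same storage needs 6 packets per newcomer.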
Data retrieval
• MDS code with dimension k = 3
[Figure: the source data is encoded into codewords that are spread over the storage nodes; a data collector downloads from storage nodes and decodes.]
Repair: phase 1
[Figure: two storage nodes are lost; each newcomer downloads from the surviving storage nodes and decodes.]
Repair: phase 2
[Figure: each newcomer re-encodes, and the two newcomers exchange packets.]
• Repair bandwidth per node = 8/2 = 4
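The two phases can be simulated end to end. The sketch below is one possible reading of the slides, not the construction from the paper: it assumes n = 5 nodes, a Reed-Solomon code over the prime field GF(11) as the MDS code, and two codewords of dimension k = 3 with each node holding one symbol of each. The field, the evaluation points, the message values and all names are assumptions.

```python
P = 11                     # assumed prime field size
XS = [1, 2, 3, 4, 5]       # assumed evaluation points, one per storage node


def encode(msg, x):
    """Evaluate the message polynomial at x (one Reed-Solomon code symbol)."""
    return sum(c * pow(x, i, P) for i, c in enumerate(msg)) % P


def interpolate_at(points, x):
    """Lagrange interpolation through `points`, evaluated at x, over GF(P)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


# B = 6 information packets, split into two messages of k = 3 symbols each.
m1, m2 = [3, 1, 4], [1, 5, 9]
nodes = [(encode(m1, x), encode(m2, x)) for x in XS]   # 2 packets per node

failed, helpers = [0, 1], [2, 3, 4]
traffic = 0

# Phase 1: newcomer 0 collects codeword-1 symbols, newcomer 1 codeword-2 symbols.
pts1 = [(XS[h], nodes[h][0]) for h in helpers]
pts2 = [(XS[h], nodes[h][1]) for h in helpers]
traffic += len(pts1) + len(pts2)                        # 3 + 3 packets

# Phase 2: each newcomer re-encodes its codeword for both lost positions
# and sends the other newcomer the symbol it is missing.
own0, for_new1 = interpolate_at(pts1, XS[0]), interpolate_at(pts1, XS[1])
own1, for_new0 = interpolate_at(pts2, XS[1]), interpolate_at(pts2, XS[0])
traffic += 2                                            # one packet each way

repaired = [(own0, for_new0), (for_new1, own1)]
print(repaired == [nodes[i] for i in failed], traffic, traffic // 2)
```

In this sketch the lost contents are regenerated exactly with 8 packets in total, which is the 8/2 = 4 packets per newcomer quoted on the slide.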
The construction is optimal
• k = 3, d = 3, r = 2, B = 6
• Storage cost per node = 2
• Repair bandwidth per node = 4
[Plot: tradeoff curves for non-cooperative and cooperative repair, with the minimum-storage point marked.]
EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR
Existence of optimal linear regenerating codes in general
(S., Hu, Netcod 2011.)
• Sustainable storage system
  • Will it work after arbitrarily many repairs?
• Technical difficulty: The information flow graph is unbounded.
• Can we work over a fixed finite field, for an unlimited number of regenerations?
  • Yes, if we can construct an exact regenerating code.
  • The answer is also “yes” for cooperative functional repair in general.
Trellis structure
• m: message vector (row vector)
• T0 is the “transfer matrix” in stage 0; the output of stage 0 is mT0.
• T1 is the “transfer matrix” in stage 1; the output of stage 1 is mT0T1.
• T2 is the “transfer matrix” in stage 2; the output of stage 2 is mT0T1T2.
[Figure: trellis with stages 0, 1, 2, …]
Flow in information flow graph
[Figure: information flow graph from the source S through the In/Mid/Out vertices of nodes 1 to 4 to the data collector DC, with a flow value on each edge.]
• The cut-set bound says that the cut capacity is at least 8.
• Can we construct a flow with value 8?
Cross-sectional flow pattern
[Figure: the same flow in the information flow graph, with the cross-sectional flow values marked.]
A recursive construction of flow
• Identify a set of cross-section flow patterns, say H.
• For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that the pattern (g1, g2, g3, g4) at stage s is also in H.
• Each pattern corresponds to a submatrix of the transfer matrix.
• By the Schwartz-Zippel lemma, we can find the local encoding vectors so that all such determinants are non-zero, if the finite field is sufficiently large.
[Figure: one segment of the graph between stage s and stage s+1, with flow values g1 to g4 entering and h1 to h4 leaving.]
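For reference, the Schwartz-Zippel lemma invoked here can be stated as follows. This is the standard textbook form, with notation chosen for this note rather than taken from the talk.

```latex
% Schwartz-Zippel lemma (standard statement; the notation is an assumption).
Let $f \in \mathbb{F}[x_1, \dots, x_n]$ be a non-zero polynomial of total
degree $D$ over a field $\mathbb{F}$, and let $S \subseteq \mathbb{F}$ be a
finite set. If $r_1, \dots, r_n$ are chosen independently and uniformly at
random from $S$, then
\[
  \Pr\bigl[\, f(r_1, \dots, r_n) = 0 \,\bigr] \;\le\; \frac{D}{|S|}.
\]
% Applied to the product of the finitely many determinants attached to the
% patterns in $H$: if $|\mathbb{F}| > D$, some assignment of the local
% encoding coefficients makes every determinant non-zero simultaneously.
```

Because the set H of cross-section patterns is finite and the same set is reused at every stage, the field-size requirement is expressed in terms of one segment of the graph rather than the whole unbounded graph.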
Summary
• Multiple node failures in medium-scale to large-scale storage systems
• Formulation as a linear program
• Functional repair: a linear regenerating code over a fixed finite field which matches the cut-set bound on repair-bandwidth exists.
• Exact repair: two families of explicit code constructions
  • Minimum-bandwidth point: d = k, r = n − d
  • Minimum-storage point: d = k, r arbitrary