Network Coding and Distributed Storage

Kenneth Shum (joint work with Minghua Chen, Hanxu Hou and Hui Li)

Windows Azure data centers kshum

Inside a data center (image: http://technoblimp.com)

Data distribution
• Encode and distribute a data file to n storage nodes.
[Figure: data file “Butterfly”.]

Data collector
• A data collector can retrieve the whole file by downloading from any k storage nodes.
[Figure: the data collector recovers “Butterfly”.]

Three kinds of disk failures
• Transient error due to noise corruption
  • Repeat the disk access request
• Disk sector error
  • Partial failure
  • Detected and masked by the operating system
• Catastrophic error
  • Total failure, due to the disk controller for instance
  • The whole disk is regarded as erased

Frequency of node failures
[Figure: number of failed nodes over a single month in a 3000-node production cluster of Facebook. From “XORing elephants: novel erasure codes for Big Data” by Sathiamoorthy et al.]

Outline of this talk
• Comparison of three coding schemes
  • Repetition scheme
  • Traditional erasure-correcting codes
    • Reed-Solomon codes
  • Network-coding-based scheme
    • BASIC regenerating codes
• A max-flow-min-cut bound
  • Connection to network coding

Distributed storage system
• Encode a data file and distribute it to n disks.
• (n,k) recovery property
  • The data file can be rebuilt from any k disks.
• Repair
  • If a node fails, we regenerate a new node by connecting to and downloading data from any d surviving disks.
  • Aim at minimizing the repair bandwidth (Dimakis et al., 2007).
• A coding scheme with the above properties is called a regenerating code.

Repetition scheme
• Google File System: replicates data 3 times
• Gmail: replicates data 21 times

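2x repetition scheme
• Divide the data file into 2 parts, A and B, of 1G each.
• Cannot tolerate every double disk failure (for example, losing both copies of A).
[Figure: four nodes storing A, B, A, B; the data collector downloads 1G from a node storing A and 1G from a node storing B to recover A, B.]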

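Repair is easy for a repetition-based system
• Repair bandwidth = 1G
[Figure: nodes storing A, B, A, B; a new node replaces a failed copy of A by downloading 1G from the surviving copy of A.]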

Reed-Solomon code
• Divide the file into 2 parts, A and B.
• It can tolerate double disk failures.
[Figure: four nodes storing A, B, A+B, A+2B; the data collector recovers A, B.]

Repair requires essentially decoding the whole file
• Repair bandwidth = 2G
[Figure: nodes storing A, B, A+B, A+2B; the new node replacing A downloads 1G from each of two surviving nodes.]
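The (4,2) example with nodes A, B, A+B and A+2B can be checked with a short sketch. The slides do not say which field the arithmetic is done in; the prime modulus 257 and the names `encode` and `decode` below are assumptions made only for illustration.

```python
from itertools import combinations

P = 257  # assumed prime modulus, so every byte value 0..255 is a field element

# Each node stores one combination c0*A + c1*B (mod P) of the two file halves.
NODES = {"A": (1, 0), "B": (0, 1), "A+B": (1, 1), "A+2B": (1, 2)}


def encode(a, b):
    """Return the symbol lists stored on the four nodes."""
    return {
        name: [(c0 * x + c1 * y) % P for x, y in zip(a, b)]
        for name, (c0, c1) in NODES.items()
    }


def decode(name1, data1, name2, data2):
    """Recover (A, B) from any two nodes by solving a 2x2 linear system mod P."""
    (p, q), (r, s) = NODES[name1], NODES[name2]
    det = (p * s - q * r) % P
    inv = pow(det, -1, P)  # modular inverse, Python 3.8+
    a = [((s * u - q * v) * inv) % P for u, v in zip(data1, data2)]
    b = [((p * v - r * u) * inv) % P for u, v in zip(data1, data2)]
    return a, b


A = list(b"Butter")
B = list(b"fly!!!")
stored = encode(A, B)

# (n,k) = (4,2) recovery: every pair of nodes rebuilds the file.
recovered = {
    pair: decode(pair[0], stored[pair[0]], pair[1], stored[pair[1]])
    for pair in combinations(NODES, 2)
}
all_pairs_ok = all(result == (A, B) for result in recovered.values())

# Repairing node "A" from "B" and "A+B" downloads two full nodes (2G in the
# slide's units) and decodes the whole file just to rebuild one half.
repaired_a = decode("B", stored["B"], "A+B", stored["A+B"])[0]
```

The last two lines mirror the repair slide: the new node has to fetch as much data as the whole file, which is what regenerating codes try to avoid.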

BASIC regenerating code
• BASIC: Binary Addition and Shift Implementable Convolutional
• Divide the data file into 4 parts of 0.5G each.
• Utilization of bit-wise shift in storage was proposed by Piret and Krol (1983), and Qureshi, Foh and Cai (2012).

Download from any two nodes
[Figures: six slides, one for each pair of nodes (1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4). In each, the data collector downloads 1G from each of the two nodes and recovers the four 0.5G parts.]

Zigzag decoding à la Gollakota and Katabi (2008)
• We want to solve for P1 and P2.
[Figure: two combinations of the packets, P1 ⊕ P2 and P1 ⊕ P2', where P2' is P2 after a bit-wise shift.]
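A minimal sketch of the zigzag idea, assuming the second combination uses P2 delayed by exactly one bit (the slides do not state the shift amount, and the function names are illustrative). The first bit of the shifted combination is a clean bit of P1; each recovered bit then unlocks the next one, alternating between the two combinations.

```python
def xor_bits(x, y):
    """Bit-wise XOR of two equal-length bit lists."""
    return [a ^ b for a, b in zip(x, y)]


def combine(p1, p2):
    """Return (p1 XOR p2, p1 XOR shifted p2), where the shift delays p2 by one bit."""
    y1 = xor_bits(p1, p2)
    y2 = xor_bits(p1 + [0], [0] + p2)  # one bit longer than the packets
    return y1, y2


def zigzag_decode(y1, y2):
    """Recover p1 and p2 bit by bit from the two combinations."""
    p1, p2 = [], []
    previous_p2_bit = 0  # nothing overlaps the first bit of p1 in y2
    for i in range(len(y1)):
        b1 = y2[i] ^ previous_p2_bit  # y2[i] = p1[i] XOR p2[i-1]
        b2 = y1[i] ^ b1               # y1[i] = p1[i] XOR p2[i]
        p1.append(b1)
        p2.append(b2)
        previous_p2_bit = b2
    return p1, p2


P1 = [1, 0, 1, 1, 0, 0, 1, 0]
P2 = [0, 1, 1, 0, 1, 0, 0, 1]
Y1, Y2 = combine(P1, P2)
decoded = zigzag_decode(Y1, Y2)
```

The shifted combination is one bit longer than the packets, which illustrates the small storage overhead that shifting introduces.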

Repair of BASIC regenerating code
[Figure: a new node is repaired from packets that the surviving nodes form by XOR, and by bit-wise shift and XOR.]

Repair of BASIC regenerating code: interference alignment
• Repair bandwidth = 1.5G
• Decode the blue and red packets by zigzag decoding.

Comparison of the three examples

Outline of this talk
• Comparison of three coding schemes
  • Repetition scheme
  • Traditional erasure-correcting codes
    • Reed-Solomon codes
  • Network-coding-based scheme
    • BASIC regenerating codes
• A max-flow-min-cut bound
  • Connection to network coding

Two modes of repair
• Exact repair
  • The content of the new node is exactly the same as the content of the failed node.
• Functional repair
  • Only requires that the (n,k) recovery property is preserved.
[Figure: functional repair versus exact repair.]

Information flow graph
[Figure: a source, four storage nodes each drawn as a pair In1/Out1, …, In4/Out4, and a data collector.]
Dimakis et al., INFOCOM, May 2007

Information flow graph (cont’d)
[Figure: the same graph after a repair, with a new node In5/Out5 added.]

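Is 1.5G repair bandwidth optimal?
[Figure: information flow graph with the new node In5/Out5. Each node stores 1 unit, the file size is 2, and the new node downloads β from each surviving node.]
• Min-cut condition: 1 + 2β ≥ 2, hence β ≥ 0.5.
• With three surviving nodes, the repair bandwidth 3β is at least 1.5G.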

β not necessarily the same
[Figures: information flow graphs with a new node In6/Out6 that downloads β1, β2 and β3 from the three surviving nodes. Each node stores 1 unit and the file size is 2.]
• 1st cut: 1 + β1 + β2 ≥ 2
• 2nd cut: 1 + β1 + β3 ≥ 2
• 3rd cut: 1 + β2 + β3 ≥ 2
• Hence β1 + β2 + β3 ≥ 1.5
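The step from the three cuts to the final bound is a single summation. As a sketch, assume β1, β2 and β3 are the amounts downloaded from the three surviving nodes, each node stores 1 unit, and the file size is 2, so that the three cuts give 1 + β1 + β2 ≥ 2, 1 + β1 + β3 ≥ 2 and 1 + β2 + β3 ≥ 2:

```latex
\begin{align*}
(1+\beta_1+\beta_2) + (1+\beta_1+\beta_3) + (1+\beta_2+\beta_3) &\ge 3 \cdot 2 \\
3 + 2(\beta_1+\beta_2+\beta_3) &\ge 6 \\
\beta_1+\beta_2+\beta_3 &\ge 1.5
\end{align*}
```

So unequal download amounts cannot bring the total repair bandwidth below 1.5G in this example.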

Min-cut bound
• B: total file size
• α: storage per node
• dβ: total repair bandwidth
• k: number of connections from a data collector
For d ≥ k,
$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}$
Dimakis et al., INFOCOM, May 2007

Waterfilling interpretation
• Given B, d, k, β
• Find the minimum storage α*
[Figure: k columns, numbered 1, 2, …, k, of heights dβ, (d-1)β, …, (d-k+1)β, filled with water up to level α*. The filled area equals B.]
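The water-filling picture can be turned into a few lines of code. This is a sketch under the assumption that the bound has the form of Dimakis et al., B ≤ Σ_{i=0}^{k-1} min{α, (d-i)β}; the function names `min_cut` and `min_storage` are illustrative.

```python
def min_cut(alpha, beta, d, k):
    """Water-filled area: sum over i = 0..k-1 of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def min_storage(B, beta, d, k):
    """Smallest water level alpha whose filled area reaches B, or None if infeasible."""
    heights = sorted((d - i) * beta for i in range(k))  # column heights, ascending
    if sum(heights) < B:
        return None  # even completely full columns hold less than B
    filled = 0.0  # area of the columns that are already completely under water
    for j, h in enumerate(heights):
        remaining = k - j  # columns that are at least as tall as h
        if filled + remaining * h >= B:
            return (B - filled) / remaining
        filled += h
    return None


# The 2G example from the earlier slides: k = 2, d = 3, beta = 0.5.
alpha_star = min_storage(2, 0.5, 3, 2)

# A very large beta gives alpha* = B / k.
alpha_large_beta = min_storage(1, 1.0, 16, 4)
```

With B = 2, k = 2, d = 3 and β = 0.5 the water level comes out as 1, matching the 1G stored per node in the earlier example.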

Tradeoff between storage and bandwidth
• B = 1
• n > 16
• k = 4
• d = 16
[Figure: repair with d = 16 and a data collector (DC) with k = 4.]

When repair bandwidth d is very large  d • * = B/k • Any combinations of k nodes will contain B bits. d (d–1) (d-1) (d–k+1) * = B/k Area = B d  k   1 2 … k DC  kshum

Minimum-storage regeneration (MSR)
• B = 1
• n > 16
• k = 4
• d = 16
[Figure: water-filling diagram over columns 1, 2, …, k with area B, next to the plot of α against dβ.]

The corresponding min-cut
[Figure: information flow graph with source S, storage nodes In1/Out1, …, Inn/Outn, and data collector DC.]

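A change of slope
• B = 1
• n > 16
• k = 4
• d = 16
[Figure: water-filling diagram over columns 1, 2, …, k with area B, next to the plot of α against dβ.]

Another change of slope
• B = 1
• n > 16
• k = 4
• d = 16
[Figure: water-filling diagram over columns 1, 2, …, k with area B, next to the plot of α against dβ.]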

Minimum-bandwidth regeneration (MBR)
• B = 1
• n > 16
• k = 4
• d = 16
[Figure: water-filling diagram over columns 1, 2, …, k with area B, next to the plot of α against dβ with a region marked “Infeasible”.]

Storage-bandwidth tradeoff curve
• B = 1
• n > 12
• k = 10
• d = 12
[Figure: tradeoff curve with the two end points marked, minimum-bandwidth regeneration (MBR) and minimum-storage regeneration (MSR).]
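The two end points of the curve have closed forms that follow from the min-cut bound of Dimakis et al. (setting α = B/k for MSR and α = dβ for MBR). The sketch below evaluates them with exact fractions for the parameters on the slides; the function names are illustrative, and B, d, k are assumed to be integers.

```python
from fractions import Fraction


def msr_point(B, d, k):
    """(alpha, d*beta) at minimum storage: alpha = B/k, d*beta = B*d / (k*(d-k+1))."""
    return Fraction(B, k), Fraction(B * d, k * (d - k + 1))


def mbr_point(B, d, k):
    """(alpha, d*beta) at minimum bandwidth: alpha = d*beta = 2*B*d / (k*(2*d-k+1))."""
    gamma = Fraction(2 * B * d, k * (2 * d - k + 1))
    return gamma, gamma


def min_cut(alpha, gamma, d, k):
    """Sum over i = 0..k-1 of min(alpha, (d - i) * beta), with beta = gamma / d."""
    beta = gamma / d
    return sum(min(alpha, (d - i) * beta) for i in range(k))


# Parameters from the slides: (B, k, d) = (1, 4, 16) and (1, 10, 12).
msr_a = msr_point(1, 16, 4)
mbr_a = mbr_point(1, 16, 4)
msr_b = msr_point(1, 12, 10)
mbr_b = mbr_point(1, 12, 10)

# The 2G example from the first part of the talk: k = 2, d = 3.
msr_example = msr_point(2, 3, 2)
```

For the 2G example the MSR point is α = 1 and dβ = 3/2, which agrees with the 1G per node and 1.5G repair bandwidth quoted for the BASIC code.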

Other performance metrics
• Security issues
  • A compromised storage node cannot decode any part of the data file.
  • A Byzantine storage node may launch a pollution attack.
• Locality
  • Number of nodes contacted during repair
• Disk I/O cost
  • Number of disk accesses during the repair process

Summary
• We can reduce repair bandwidth by network coding.
• There is a tradeoff between storage and repair bandwidth.
• BASIC regenerating codes
  • A failed storage node can be repaired by simple bit-wise shift and XOR operations.
  • Small storage overhead due to shifting.
Aug 2013

References
• Piret and Krol, MDS convolutional codes, IEEE Trans. on Information Theory, 1983.
• Dimakis, Godfrey, Wainwright and Ramchandran, Network coding for distributed storage systems, INFOCOM, 2007.
• Gollakota and Katabi, Zigzag decoding: combating hidden terminals in wireless networks, Proc. ACM SIGCOMM, 2008.
• Qureshi, Foh and Cai, Optimal solution for the index coding problem using network coding over GF(2), Proc. IEEE Conf. on Sensor, Mesh and Ad Hoc Comm. and Networks, 2012.
• Sung and Gong, A zigzag decodable code with MDS property for distributed storage systems, Proc. IEEE Symp. on Information Theory, 2013.
• Hou, Shum, Chen and Li, BASIC regenerating code: binary addition and shift for exact repair, Proc. IEEE Symp. on Information Theory, 2013.
