236601 - Coding and Algorithms for Memories • Lecture 11: Array Codes and Distributed Storage
Large Scale Storage Systems • Big Data Players: Facebook, Amazon, Google, Yahoo,… • [Figure: cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)] • Failures are the norm
Node failures at Facebook • [Figure: node failures plotted against date] • Source: "XORing Elephants: Novel Erasure Codes for Big Data," M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013
State-of-the-Art Storing Schemes • 3x Replication: • Easily implemented and maintained • Can tolerate any 2 disk failures • Large storage cost: 3x the data size (200% overhead), a big problem! • More sophisticated schemes: • Reed-Solomon (RS) Codes • The repair problem • [Figure: blocks 1 to 10, each stored in three copies, labeled "Widely used"]
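The storage cost of replication can be compared with that of a parity-based scheme using simple arithmetic. The sketch below is only illustrative: the function names and the (12, 10) code parameters are assumptions chosen for the example, not values from the lecture. Here n is the total number of disks and k is the number of data disks.

```python
def storage_factor(n, k):
    """Total disks used per disk of user data."""
    return n / k

def extra_storage_percent(n, k):
    """Redundant storage as a percentage of the user data."""
    return 100 * (n - k) / k

# 3x replication behaves like n = 3, k = 1: it survives any 2 failures.
print(storage_factor(3, 1), extra_storage_percent(3, 1))

# A hypothetical code with 10 data disks and 2 parity disks also
# survives any 2 failures, but with far less redundancy.
print(storage_factor(12, 10), extra_storage_percent(12, 10))
```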
Problem Setup • Disks are stored together in a group (rack) • Disk failures should be supported • Requirements: • Support as many disk failures as possible • And yet… • Optimal and fast recovery • Low complexity
Problem Setup • Question 1: How many extra disks are required to support a single disk failure? Answer: 1. How? • Question 2: How many extra disks are required to support two disk failures? Answer: 2. How? • Question 3: How many extra disks are required to support 3 disk failures? Answer: 3. How?
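Problem Setup • Question 1: How many extra disks are required to support a single disk failure? • Question 2: How many extra disks are required to support two disk failures? • Question 3: How many extra disks are required to support 3 disk failures? • One failure, disks: $A,\ B,\ C,\ A+B+C$ • Two failures, disks: $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C$ • Three failures, disks: $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C,\ \alpha' A+\beta' B+\gamma' C$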
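The single-failure case (three data disks plus one parity disk holding A+B+C) can be sketched with bytewise XOR. This is a minimal illustration, assuming "+" means XOR over bytes; the disk contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three data disks (hypothetical contents) and one parity disk A+B+C.
A, B, C = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
P = xor_blocks([A, B, C])

# Suppose disk B fails: XOR the surviving disks with the parity to rebuild it.
recovered_B = xor_blocks([A, C, P])
print(recovered_B == B)
```

The same call rebuilds any one of the four disks, because the XOR of all four is zero.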
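Problem Setup • Question 1: How many extra disks are required to support a single disk failure? • Question 2: How many extra disks are required to support two disk failures? • Question 3: How many extra disks are required to support d disk failures? • One failure, disks $A,\ B,\ C,\ A+B+C$: $\{(x_1,x_2,x_3,x_4) : x_1+x_2+x_3+x_4=0\}$ • Two failures, disks $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C$: $\{(x_1,x_2,x_3,x_4,x_5) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0\}$ • Three failures, disks $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C,\ \alpha' A+\beta' B+\gamma' C$: $\{(x_1,x_2,x_3,x_4,x_5,x_6) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0,\ \alpha' x_1+\beta' x_2+\gamma' x_3+x_6=0\}$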
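Problem Setup • Question 1: How many extra disks are required to support a single disk failure? • Question 2: How many extra disks are required to support two disk failures? • Question 3: How many extra disks are required to support d disk failures? • One failure, disks $A,\ B,\ C,\ A+B+C$: $\{(x_1,x_2,x_3,x_4) : x_1+x_2+x_3+x_4=0\} = \{(x_1,x_2,x_3,x_4) : H_1\cdot(x_1,x_2,x_3,x_4)^T=0\}$, with $H_1 = (1,1,1,1)$ • Two failures, disks $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C$: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0\} = \{(x_1,\dots,x_5) : H_2\cdot(x_1,\dots,x_5)^T=0\}$, with $H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ \alpha & \beta & \gamma & 0 & 1 \end{pmatrix}$ • Three failures, disks $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C,\ \alpha' A+\beta' B+\gamma' C$: $\{(x_1,\dots,x_6) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0,\ \alpha' x_1+\beta' x_2+\gamma' x_3+x_6=0\} = \{(x_1,\dots,x_6) : H_3\cdot(x_1,\dots,x_6)^T=0\}$, with $H_3 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ \alpha & \beta & \gamma & 0 & 1 & 0 \\ \alpha' & \beta' & \gamma' & 0 & 0 & 1 \end{pmatrix}$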
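Problem Setup • Question 2: How many extra disks are required to support two disk failures? • Question: What is the requirement on $H_2$? Answer: Every 2x2 sub-matrix (any two columns) has rank two • Question: What is the requirement on $H_3$? Answer: Every 3x3 sub-matrix (any three columns) has rank three • Disks $A,\ B,\ C,\ A+B+C,\ \alpha A+\beta B+\gamma C$: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0\} = \{(x_1,\dots,x_5) : H_2\cdot(x_1,\dots,x_5)^T=0\}$, with $H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ \alpha & \beta & \gamma & 0 & 1 \end{pmatrix}$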
Problem Setup • Question: How many extra disks are required to support d disk failures? Answer: d. How? • $\{(x_1,x_2,\dots,x_{n-1},x_n) : H\cdot(x_1,x_2,\dots,x_{n-1},x_n)^T=0\}$, $n=k+d$ • What is the requirement on H? • Answer: Every sub-matrix of size d x d has rank d • Is it possible to construct such matrices?
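The rank requirement can be checked by brute force on small examples. The sketch below is illustrative only and works over a prime field GF(p) as a stand-in for whatever field the code uses; the helper names, the field size p = 5, and the coefficient values 1, 2, 3 placed in the second row are assumptions for the example.

```python
from itertools import combinations

def det_mod_p(M, p):
    """Determinant of a square integer matrix modulo a prime p."""
    M = [[x % p for x in row] for row in M]
    n, det = len(M), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0                      # no pivot: the matrix is singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                    # a row swap flips the sign
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)         # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            f = M[r][c] * inv % p
            M[r] = [(a - f * b) % p for a, b in zip(M[r], M[c])]
    return det % p

def every_d_columns_full_rank(H, p):
    """True if every d x d sub-matrix of the d x n matrix H is invertible mod p."""
    d, n = len(H), len(H[0])
    return all(
        det_mod_p([[H[i][j] for j in cols] for i in range(d)], p) != 0
        for cols in combinations(range(n), d)
    )

# Two parity rows with distinct nonzero coefficients (assumed values) over GF(5).
H_good = [[1, 1, 1, 1, 0],
          [1, 2, 3, 0, 1]]
# Repeating the plain sum as the second parity makes columns 0, 1, 2 identical.
H_bad = [[1, 1, 1, 1, 0],
         [1, 1, 1, 0, 1]]

print(every_d_columns_full_rank(H_good, 5), every_d_columns_full_rank(H_bad, 5))
```

The second matrix fails because two equal columns cannot be told apart when both corresponding disks are lost.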
Reed Solomon Codes • A code with a parity check matrix of the form [matrix not recovered from the slide] • where $\alpha$ is a primitive element of some extension field and its order satisfies $O(\alpha) > n-1$ • Claim: Every sub-matrix of size d x d has full rank
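The full-rank claim can be illustrated with a Vandermonde-type matrix. This is a sketch under stated assumptions, not the matrix from the slide: it takes entries $H_{i,j} = \alpha^{ij}$ for rows $i = 0,\dots,d-1$, and it uses the prime field GF(11) with $\alpha = 2$ in place of an extension field. Any d columns then form a Vandermonde matrix in the values $\alpha^j$, whose determinant is the product of pairwise differences, so it is nonzero exactly when those values are distinct.

```python
from itertools import combinations

p = 11       # assumed prime field size, standing in for the extension field
alpha = 2    # 2 is a primitive element modulo 11
n, d = 8, 3  # illustrative code length and number of parity disks

def order(a, p):
    """Multiplicative order of a modulo the prime p."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

# Assumed row convention: H[i][j] = alpha^(i*j), i = 0..d-1, j = 0..n-1.
H = [[pow(alpha, i * j, p) for j in range(n)] for i in range(d)]

def vandermonde_det(xs, p):
    """Determinant of the Vandermonde matrix with rows xs^0, xs^1, ... modulo p."""
    det = 1
    for a, b in combinations(range(len(xs)), 2):
        det = det * (xs[b] - xs[a]) % p
    return det

# Row 1 of H holds the evaluation points alpha^j for each column j.
all_full_rank = all(
    vandermonde_det([H[1][j] for j in cols], p) != 0
    for cols in combinations(range(n), d)
)
print(order(alpha, p) > n - 1, all_full_rank)
```

The order condition is what keeps the points $\alpha^0, \dots, \alpha^{n-1}$ distinct; if n exceeded the order of $\alpha$, two columns would coincide and the claim would fail.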
Reed Solomon Codes • Advantages: • Support the maximum number of disk failures • Are very common in practice and have relatively efficient encoding/decoding schemes • Disadvantages: • Require working over large fields • Need to access all the disks in order to recover even a single disk failure, so rebuild is not efficient
EVENODD Codes • Designed by Mario Blaum, Jim Brady, Jehoshua Bruck, and Jai Menon • Goal: Construct array codes correcting 2 disk failures using only binary XOR operations • No need for calculations over extension fields • Code construction: • Every disk is a column • The array size is (m-1)x(m+2), m is prime • The last two columns are used for parity
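EVENODD Codes • Redundancy Calculation: • First parity drive (column m): a simple XOR of the m data disks, $a_{l,m} = \bigoplus_{t=0}^{m-1} a_{l,t}$ for 0 ≤ l ≤ m-2 • Second parity drive (column m+1): XOR along the diagonals, adjusted by the bit S, $a_{l,m+1} = S \oplus \bigoplus_{t=0}^{m-1} a_{\langle l-t\rangle_m,\,t}$ for 0 ≤ l ≤ m-2 • Here $S = \bigoplus_{t=1}^{m-1} a_{\langle m-1-t\rangle_m,\,t}$, $\langle x\rangle_m$ denotes x mod m, and row m-1 is an imaginary all-zero row • In the slide's example, S = 1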
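The two parity columns can be computed with nothing but XOR. The sketch below follows the standard EVENODD definition (row parity, plus diagonal parity adjusted by the bit S, with an imaginary all-zero row m-1); the function name and the tiny m = 3 data array are assumptions for illustration, not the example shown on the slide.

```python
def evenodd_encode(data, m):
    """Encode an (m-1) x m bit array into an (m-1) x (m+2) EVENODD array.

    data[l][t] is the bit in row l of data disk t; m must be prime.
    """
    a = [row[:] for row in data] + [[0] * m]   # append the imaginary zero row m-1
    # S is the XOR of the special diagonal, the one passing through row m-1.
    S = 0
    for t in range(1, m):
        S ^= a[m - 1 - t][t]
    code = []
    for l in range(m - 1):
        row_parity, diag_parity = 0, S
        for t in range(m):
            row_parity ^= a[l][t]              # horizontal XOR
            diag_parity ^= a[(l - t) % m][t]   # XOR along diagonal l
        code.append(data[l] + [row_parity, diag_parity])
    return code

# Tiny illustrative array with m = 3: two rows, three data disks.
data = [[1, 0, 0],
        [0, 1, 0]]
code = evenodd_encode(data, 3)
for row in code:
    print(row)
```

Each output row lists the m data bits followed by the row-parity bit and the diagonal-parity bit.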
Decoding of EVENODD Codes • Observation: the value of S equals the XOR of all the bits in the last two (parity) columns • In the slide's example, S = 1
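The observation about S is the first step in rebuilding two lost data disks. The step-by-step decoding shown on the slides did not survive extraction, so the sketch below follows the standard EVENODD procedure for two erased data columns: recover S from the parity columns, compute row and diagonal syndromes from the surviving data, then zigzag between diagonals and rows. The function name and the m = 3 codeword (which has S = 1) are assumptions for illustration.

```python
def evenodd_decode_two(code, m, i, j):
    """Rebuild erased data columns i < j of an (m-1) x (m+2) EVENODD array.

    Whatever is stored in columns i and j of `code` is ignored.
    """
    a = [row[:] for row in code] + [[0] * (m + 2)]   # imaginary zero row m-1
    # Observation: S is the XOR of all bits in the two parity columns.
    S = 0
    for row in code:
        S ^= row[m] ^ row[m + 1]
    known = [t for t in range(m) if t not in (i, j)]
    row_syn, diag_syn = [0] * m, [0] * m
    for u in range(m):
        row_syn[u] = a[u][m]
        diag_syn[u] = S ^ a[u][m + 1]
        for t in known:
            row_syn[u] ^= a[u][t]
            diag_syn[u] ^= a[(u - t) % m][t]
    # Start on the diagonal whose column-i cell lies in the imaginary zero row.
    s = (-(j - i) - 1) % m
    while s != m - 1:
        a[s][j] = diag_syn[(j + s) % m] ^ a[(s + j - i) % m][i]
        a[s][i] = row_syn[s] ^ a[s][j]
        s = (s - (j - i)) % m
    return a[:m - 1]

# Codeword for m = 3 with data rows [1,0,0] and [0,1,0]; here S = 1.
codeword = [[1, 0, 0, 1, 0],
            [0, 1, 0, 1, 1]]
# Erase data disks 0 and 2 (overwritten with placeholder values).
damaged = [[9, 0, 9, 1, 0],
           [9, 1, 9, 1, 1]]
print(evenodd_decode_two(damaged, 3, 0, 2) == codeword)
```

The zigzag visits every row exactly once because m is prime, which is why the construction requires a prime m.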