CPR E 545: Fault Tolerant Systems

Advances in RAID Architecture • By: Mohammad Fraiwan

Summary of RAID Levels

RAID-6 • Block-level striping with dual distributed parity. • Two sets of parity are calculated. • Better fault tolerance than RAID 5: the array can handle two faulty disks. • Writes are slightly slower than in RAID 5 because of the overhead of the additional parity calculations. • Reads may be faster than in RAID 5 because data and parity are spread across more disks. • If one disk fails, the array is left with the single-failure tolerance of RAID 5.
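The dual-parity idea can be sketched as follows. This is a minimal illustration, not the scheme of any particular controller: it assumes the common P+Q construction, where P is the XOR of the data bytes and Q is a weighted sum in GF(2^8). The reducing polynomial 0x11d, the generator 2, and all function names are assumptions made for this example.

```python
# GF(2^8) arithmetic; the reducing polynomial 0x11d is an assumption for this sketch.
def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def gf_pow(a, e):
    """Raise a to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Multiplicative inverse: a^254 equals a^-1 for any nonzero a in GF(2^8)."""
    return gf_pow(a, 254)

def parity(data):
    """Return (P, Q) for one byte taken from each data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                        # P: plain XOR parity, as in RAID 5
        q ^= gf_mul(gf_pow(2, i), d)  # Q: each disk weighted by 2^i
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild the bytes of failed data disks x and y from the survivors, P and Q."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i not in (x, y):           # entries at x and y are ignored
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    a = p ^ pxy                       # a = Dx ^ Dy
    b = q ^ qxy                       # b = 2^x * Dx ^ 2^y * Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    dy = a ^ dx
    return dx, dy

stripe = [0x12, 0x34, 0x56, 0x78]
p, q = parity(stripe)
damaged = [0x12, 0, 0x56, 0]          # disks 1 and 3 have failed
print(recover_two(damaged, 1, 3, p, q) == (0x34, 0x78))
```

With only P available (the RAID 5 case), a single missing byte could be rebuilt by XOR alone; the second, independent equation from Q is what makes a second failure recoverable.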

Reed-Solomon ECC [3] • A coding scheme that works by: • Constructing a polynomial from the data. • Constructing an over-sampled plot of that polynomial. • The redundant information in the over-sampled data allows the original polynomial to be reconstructed even in the face of errors.

Method Overview • Data is stored (or sent) as an encoded block. • The total number of m-bit symbols in the encoded block is n = 2^m − 1. • Example: a Reed-Solomon code operating on 8-bit symbols has n = 2^8 − 1 = 255 symbols per block. • Of these, k are data symbols and n − k are parity symbols, each m bits wide.

Method Overview cont. • A commonly used code encodes k = 223 8-bit data symbols plus 32 8-bit parity symbols in an n = 255-symbol block. • The codes are denoted as (n,k) codes; in the above case (n,k) = (255,223). • An (n,k) code is capable of correcting up to (n−k)/2 symbol errors per block, which is 16 for the (255,223) code.
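The block-size arithmetic above can be checked with a few lines of code. The function name `rs_parameters` is hypothetical and exists only for this illustration.

```python
def rs_parameters(m, k):
    """Return (n, parity_symbols, correctable_errors) for m-bit symbols and k data symbols."""
    n = 2**m - 1                      # block length in symbols
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    parity_symbols = n - k
    correctable = parity_symbols // 2  # symbol errors at unknown positions
    return n, parity_symbols, correctable

print(rs_parameters(8, 223))  # (255, 32, 16)
```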

Method Overview cont. • The scheme encodes the block as points on a polynomial plotted over a finite field. • A finite field is a field that contains finitely many elements. • The coefficients of the polynomial are the data symbols of the block. • The plot over-determines the coefficients, which can be recovered from subsets of the plotted points. • A Reed-Solomon code can bridge a series of errors in a block of data to recover the coefficients of the polynomial that drew the original curve.
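The over-sampling idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: it works over the prime field GF(257) rather than the GF(2^m) fields used by practical Reed-Solomon codes, it evaluates the polynomial at x = 0, 1, ..., n−1, and it only recovers from lost points whose positions are known (erasures), not from errors at unknown positions. All names are hypothetical.

```python
P = 257  # prime modulus; an assumption chosen for simplicity

def evaluate(coeffs, x):
    """Evaluate the polynomial (coeffs[0] is the constant term) at x, using Horner's rule."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y

def encode(data, n):
    """Plot the polynomial whose coefficients are the data symbols at n points."""
    return [(x, evaluate(data, x)) for x in range(n)]

def mul_linear(poly, root):
    """Multiply a polynomial by (x - root) modulo P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def recover(points, k):
    """Recover the k coefficients from any k surviving points (Lagrange interpolation)."""
    pts = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(pts):
            if i != j:
                basis = mul_linear(basis, xi)
                denom = denom * (xj - xi) % P
        scale = yj * pow(denom, P - 2, P) % P  # division via Fermat inverse
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % P
    return coeffs

data = [72, 101, 108, 108, 111]           # k = 5 data symbols
codeword = encode(data, 9)                # n = 9 plotted points
survivors = [codeword[i] for i in (1, 3, 4, 6, 8)]  # four points lost
print(recover(survivors, 5) == data)
```

Any five of the nine points determine the degree-4 polynomial, which is the sense in which the plot over-determines the coefficients.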

Properties of the Reed-Solomon Code • The measure of redundancy in the block, n−k, determines the error correction capability. • If the locations of the erroneous symbols are not known in advance, then a Reed-Solomon code can correct up to (n−k)/2 erroneous symbols. • Reed-Solomon codes are especially well suited to applications where errors occur in bursts. • This is because it does not matter to the code how many bits in a symbol are in error: if multiple bits in a symbol are corrupted, it only counts as a single symbol error.
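The burst-error property can be made concrete by counting how many symbols a set of bit errors touches. The sketch below assumes 8-bit symbols and the (255,223) code, so up to 16 symbol errors are correctable; the function name is hypothetical.

```python
def symbols_corrupted(bit_positions, m=8):
    """Count the distinct m-bit symbols touched by the given erroneous bit positions."""
    return len({bit // m for bit in bit_positions})

T = (255 - 223) // 2                 # correctable symbol errors for the (255,223) code

burst = range(3, 103)                # 100 consecutive corrupted bits
scattered = range(0, 17 * 8, 8)      # 17 isolated bit errors, one per symbol

print(symbols_corrupted(burst), symbols_corrupted(burst) <= T)          # 13 True
print(symbols_corrupted(scattered), symbols_corrupted(scattered) <= T)  # 17 False
```

A 100-bit burst damages only 13 symbols and stays within the correction limit, while 17 scattered single-bit errors damage 17 symbols and exceed it.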

RS Applications • Storage devices (including tape, Compact Disc, DVD, barcodes, etc.) • Wireless or mobile communications (including cellular telephones, microwave links, satellite, etc.) • Digital television / DVB • High-speed modems such as ADSL, xDSL, etc. • Reed-Solomon is a poor choice in applications where errors are randomly scattered single-bit errors, because each one consumes a full symbol of correction capability.

History of Reed-Solomon codes • Invented in 1960 by Irving S. Reed and Gustave Solomon, members of MIT Lincoln Laboratory. • The article title was "Polynomial Codes over Certain Finite Fields." • At the time, digital technology was not advanced enough to implement the concept. • The key to application of Reed-Solomon codes was the invention of an efficient decoding algorithm by Elwyn Berlekamp, a professor of electrical engineering at the University of California, Berkeley.

RAID-7 • Proprietary RAID design; a trademark of Storage Computer Corporation. • Asynchronous, cached striping with dedicated parity. • Based on concepts used in RAID levels 3 and 4. • Greatly enhanced to address some of the limitations of those levels. • Extensive caching and a specialized real-time processor for managing the array asynchronously.

RAID-7 cont. • Best access concurrency. • Best throughput. • Good fault tolerance. • Expensive solution made and supported by one company.

Nested RAID Levels • Single RAID levels have distinct advantages and disadvantages. • It is possible to get some of the advantages of more than one RAID level by designing arrays that use a combination of multiple levels. • Nested RAID levels typically provide better performance than single levels, but at some cost.

RAID-10: Striped Mirroring • RAID 10 = Striping + mirroring • A striped array of RAID 1 arrays • High performance of RAID 0 and high fault tolerance of RAID 1 (at the cost of doubling the number of disks).
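The "striped array of RAID 1 arrays" layout can be sketched as a block-mapping function. The layout here is an assumption for illustration (disks 2p and 2p+1 form mirror pair p, and logical blocks are striped round-robin across the pairs); real controllers may number disks differently. The function names are hypothetical.

```python
def raid10_locate(block, num_pairs):
    """Map a logical block to (mirror pair, the two disks holding it, stripe row)."""
    pair = block % num_pairs          # RAID 0 striping across mirror pairs
    row = block // num_pairs
    return pair, (2 * pair, 2 * pair + 1), row  # RAID 1: both disks hold a copy

def survives(failed_disks, num_pairs):
    """The array survives as long as no mirror pair has lost both of its disks."""
    failed = set(failed_disks)
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_pairs))

print(raid10_locate(5, 3))        # (2, (4, 5), 1)
print(survives({0, 2, 4}, 3))     # True: one disk lost in each pair
print(survives({0, 1}, 3))        # False: pair 0 lost both copies
```

The second call shows why RAID 10 can tolerate several failures in the best case, while the third shows that two failures in the same mirror pair are fatal.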


Comparing RAID levels [1]

References • [1] Comparing RAID Levels, http://www1.us.dell.com/content/topics/global.aspx/power/en/ps1q02_long?c=us&l=en&s=gen • [2] RAID Levels, http://www.pcguide.com/ref/hdd/perf/raid/levels/index.htm • [3] Reed-Solomon error correction, http://www.4i2i.com/reed_solomon_codes.htm
