300 likes | 486 Views
CS252 Graduate Computer Architecture Lecture 17 ECC (continued), CRC. John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252. Review: Code Vector Space. Code Space.
E N D
CS252Graduate Computer ArchitectureLecture 17ECC (continued), CRC John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
Review: Code Vector Space Code Space • Not every vector in the code space is valid • Hamming Distance (d): • Minimum number of bit flips to turn one code word into another • Number of errors that we can detect: (d-1) • Number of errors that we can fix: ½(d-1) C0=f(v0) Code Distance (Hamming Distance) v0 cs252-S09, Lecture 17
G must be an nk matrix Review: How to Generate code words? • Consider a linear code. Need a Generator Matrix. • Let vi be the data value (k bits), Ci be resulting code (n bits): • Are there 2k unique code values? • Only if the k columns of G are linearly independent! • Of course, need some way of decoding as well. • Is this linear??? Why or why not? • A code is systematic if the data is directly encoded within the code words. • Means Generator has form: • Can always turn non-systematiccode into a systematic one (row ops) cs252-S09, Lecture 17
Error vector Implicitly Defining Codes by Check Matrix • But – what is the distance of the code? Not obvious • Instead, consider a parity-check matrix H (n[n-k]) • Compute the following syndrome Si given code element Ci: • Define valid code words Ci as those that give Si=0 (null space of H) • Size of null space? (n-rank H)=k if (n-k) linearly independent columns in H • Suppose you transmit code word C, and there is an error. Model this as vector E which flips selected bits of C to get R (received): • Consider what happens when we multiply by H: • What is distance of code? • Code has distance d if no sum of d-1 or less columns yields 0 • I.e. No error vectors, E, of weight < d have zero syndromes • Code design: Design H matrix with these properties cs252-S09, Lecture 17
P is (n-k)k, I is (n-k)(n-k) Result: H is (n-k)k P is (n-k)k, I is kk Result: G is nk How to relate G and H (Binary Codes) • Defining H makes it easy to understand distance of code, but hard to generate code (H defines code implicitly!) • However, let H be of following form: • Then, G can be of following form (maximal code size): • Notice: G generates values in null-space of H cs252-S09, Lecture 17
Parity code (8-bits): Note: Complexity of logic depends on number of 1s in row! c8 v7v6v5v4v3v2v1v0 C8 C7C6 C5 C4 C3 C2 C1 C0 + + s0 Simple example (Parity, D=2) cs252-S09, Lecture 17
Repetition code (1-bit): Positives: simple Negatives: Expensive: only 33% of code word is data Not packed in Hamming-bound sense (only D=3). Could get much more efficient coding by encoding multiple bits at a time C0 v0 C1 C2 C0 Error C1 C2 Simple example: Repetition (voting, D=3) cs252-S09, Lecture 17
p1 p2 p3 Note: number bits from left to right. Simple Example: Hamming Code (d=3) • Example: (7,4) code: • Protect 4 data bits with 3 parity bits 1 2 3 4 5 6 7 p1 p2 d1 p3 d2 d3 d4 • Bit position number 001 = 110 011 = 310 101 = 510 111 = 710 010 = 210 011 = 310 110 = 610 111 = 710 100 = 410 101 = 510 110 = 610 111 = 710 cs252-S09, Lecture 17
How to correct errors? • But – what is the distance of the code? Not obvious • Instead, consider a parity-check matrix H (n[n-k]) • Compute the following syndrome Si given code element Ci: • Suppose that two correctable error vectors E1 and E2 produce same syndrome: • But, since both E1 and E2 have (d-1)/2 bits, E1 + E2 d-1 bits set this cannot be true! • So, syndrome is unique indicator of correctable error vectors cs252-S09, Lecture 17
Example, d=4 code (SEC-DED) • Design H with: • All columns non-zero, odd-weight, distinct • Note that odd-weight refers to Hamming Weight, i.e. number of zeros • Why does this generate d=4? • Any single bit error will generate a distinct, non-zero value • Any double error will generate a distinct, non-zero value • Why? Add together two distinct columns, get distinct result • Any triple error will generate a non-zero value • Why? Add together three odd-weight values, get an odd-weight value • So: need four errors before indistinguishable from code word • Because d=4: • Can correct 1 error (Single Error Correction, i.e. SEC) • Can detect 2 errors (Double Error Detection, i.e. DED) • Example: • Note: log size of nullspace will be (columns – rank) = 4, so: • Rank = 4, since rows independent, 4 cols indpt • Clearly, 8 bits in code word • Thus: (8,4) code cs252-S09, Lecture 17
Tweeks: • No reason cannot make code shorter than required • Suppose n-k=8 bits of parity. What is max code size (n) for d=4? • Maximum number of unique, odd-weight columns: 27 = 128 • So, n = 128. But, then k = n – (n – k) = 120. Weird! • Just throw out columns of high weight and make 72, 64 code! • But – shortened codes like this might have d > 4 in some special directions • Example: Kaneda paper, catches failures of groups of 4 bits • Good for catching chip failures when DRAM has groups of 4 bits • What about EVENODD code? • Can be used to handle two erasures • What about two dead DRAMs? Yes, if you can really know they are dead cs252-S09, Lecture 17
Galois Field • Definition: Field: a complete group of elements with: • Addition, subtraction, multiplication, division • Completely closed under these operations • Every element has an additive inverse • Every element except zero has a multiplicative inverse • Examples: • Real numbers • Binary, called GF(2) Galois Field with base 2 • Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1 • Prime field, GF(p) Galois Field with base p • Values 0 … p-1 • Addition/subtraction/multiplication: modulo p • Multiplicative Inverse: every value except 0 has inverse • Example: GF(5): 11 1 mod 5, 23 1mod 5, 44 1 mod 5 • General Galois Field: GF(pm) base p (prime!), dimension m • Values are vectors of elements of GF(p) of dimension m • Add/subtract: vector addition/subtraction • Multiply/divide: more complex • Just like read numbers but finite! • Common for computer algorithms: GF(2m) cs252-S09, Lecture 17
Consider polynomials whose coefficients come from GF(2). Each term of the form xnis either present or absent. Examples:0, 1, x, x2, and x7 + x6 + 1 = 1·x7 + 1· x6 + 0 · x5 + 0 · x4 + 0 · x3 + 0 · x2 + 0 · x1 + 1· x0 With addition and multiplication these form a field: “Add”: XOR each element individually with no carry: x4 + x3 + + x + 1 + x4 + + x2 + x x3 + x2 + 1 “Multiply”: multiplying by x is like shifting to the left. x2 + x + 1 x + 1 x2 + x + 1 x3 + x2 + x x3 + 1 Specific Example: Galois Fields GF(2n) cs252-S09, Lecture 17
x4 + x3 So what about division (mod) x4 + x2 = x3 + x with remainder 0 x x4 + x2 + 1 = x3 + x2 with remainder 1 X + 1 x3 + x2 + 0x + 0 x4 + 0x3 + x2 + 0x + 1 X + 1 x3 + x2 x3 + x2 0x2 + 0x 0x + 1 Remainder 1 cs252-S09, Lecture 17
These polynomials form a Galois (finite) field if we take the results of this multiplication modulo a prime polynomial p(x). A prime polynomial is one that cannot be written as the product of two non-trivial polynomials q(x)r(x) Perform modulo operation by subtracting a (polynomial) multiple of p(x) from the result. If the multiple is 1, this corresponds to XOR-ing the result with p(x). For any degree, there exists at least one prime polynomial. With it we can form GF(2n) Additionally, … Every Galois field has a primitive element, , such that all non-zero elements of the field can be expressed as a power of . By raising to powers (modulo p(x)), all non-zero field elements can be formed. Certain choices of p(x) make the simple polynomial x the primitive element. These polynomials are called primitive, and one exists for every degree. For example, x4 + x + 1 is primitive. So = x is a primitive element and successive powers of will generate all non-zero elements of GF(16). Example on next slide. Producing Galois Fields cs252-S09, Lecture 17
0 = 1 1 = x 2 = x2 3 = x3 4 = x + 1 5 = x2 + x 6 = x3 + x2 7 = x3 + x + 1 8 = x2 + 1 9 = x3 + x 10 = x2 + x + 1 11 = x3 + x2 + x 12 = x3 + x2 + x + 1 13 = x3 + x2 + 1 14 = x3 + 1 15 = 1 Note this pattern of coefficients matches the bits from our 4-bit LFSR example. In general finding primitive polynomials is difficult. Most people just look them up in a table, such as: Galois Fields – Primitives 4 = x4 mod x4 + x + 1 = x4 xor x4 + x + 1 = x + 1 cs252-S09, Lecture 17
x2 + x +1 x3 + x +1 x4 + x +1 x5 + x2 +1 x6 + x +1 x7 + x3 +1 x8 + x4 + x3 + x2 +1 x9 + x4 +1 x10 + x3 +1 x11 + x2 +1 Primitive Polynomials x12 + x6 + x4 + x +1 x13 + x4 + x3 + x +1 x14 + x10 + x6 + x +1 x15 + x +1 x16 + x12 + x3 + x +1 x17 + x3 + 1 x18 + x7 + 1 x19 + x5 + x2 + x+ 1 x20 + x3 + 1 x21 + x2 + 1 x22 + x +1 x23 + x5 +1 x24 + x7 + x2 + x +1 x25 + x3 +1 x26 + x6 + x2 + x +1 x27 + x5 + x2 + x +1 x28 + x3 + 1 x29 + x +1 x30 + x6 + x4 + x +1 x31 + x3 + 1 x32 + x7 + x6 + x2 +1 Galois Field Hardware Multiplication by x shift left Taking the result mod p(x) XOR-ing with the coefficients of p(x) when the most significant coefficient is 1. Obtaining all 2n-1 non-zero Shifting and XOR-ing 2n-1 times. elements by evaluating xk for k = 1, …, 2n-1 cs252-S09, Lecture 17
For k-bit LFSR number the flip-flops with FF1 on the right. The feedback path comes from the Q output of the leftmost FF. Find the primitive polynomial of the form xk + … + 1. The x0 = 1 term corresponds to connecting the feedback directly to the D input of FF 1. Each term of the form xn corresponds to connecting an xor between FF n and n+1. 4-bit example, uses x4 + x + 1 x4 FF4’s Q output x xor between FF1 and FF2 1 FF1’s D input To build an 8-bit LFSR, use the primitive polynomial x8 + x4 + x3 + x2 + 1 and connect xors between FF2 and FF3, FF3 and FF4, and FF4 and FF5. Building an LFSR from a Primitive Poly(Cycle through all non-zero values) cs252-S09, Lecture 17
Reed-Solomon Codes • Galois field codes: code words consist of symbols • Rather than bits • Reed-Solomon codes: • Based on polynomials in GF(2k) (I.e. k-bit symbols) • Data as coefficients, code space as values of polynomial: • P(x)=a0+a1x1+… ak-1xk-1 • Coded: P(0),P(1),P(2)….,P(n-1) • Can recover polynomial as long as get any k of n • Properties: can choose number of check symbols • Reed-Solomon codes are “maximum distance separable” (MDS) • Can add d symbols for distance d+1 code • Often used in “erasure code” mode: as long as no more than n-k coded symbols erased, can recover data • Side note: Multiplication by constant in GF(2k) can be represented by kk matrix: ax • Decompose unknown vector into k bits: x=x0+2x1+…+2k-1xk-1 • Each column is result of multiplying a by 2i cs252-S09, Lecture 17
Reed-Solomon Codes (con’t) • Reed-solomon codes (Non-systematic): • Data as coefficients, code space as values of polynomial: • P(x)=a0+a1x1+… a6x6 • Coded: P(0),P(1),P(2)….,P(6) • Called Vandermonde Matrix: maximum rank • Different representation(This H’ and G not related) • Clear that all combinations oftwo or less columns independent d=3 • Very easy to pick whatever d you happen to want: add more rows • Fast, Systematic version of Reed-Solomon: • Cauchy Reed-Solomon, others cs252-S09, Lecture 17
Another Example: Redundant Check • Send a message M and a “check” word C • Simple function on <M,C> to determine if both received correctly (with high probability) • Example: XOR all the bytes in M and append the “checksum” byte, C, at the end • Receiver XORs <M,C> • What should result be? • What errors are caught? *** bit i is XOR of ith bit of each byte cs252-S09, Lecture 17
Example: TCP Checksum TCP Packet Format Application (HTTP,FTP, DNS) 7 Transport (TCP, UDP) 4 Network (IP) 3 Data Link (Ethernet, 802.11b) 2 TCP Checksum a 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by a sender, and included in a segment transmission. (note end-around carry) Summing all the words, including the checksum word, should yield zero Physical 1 cs252-S09, Lecture 17
Example: Ethernet CRC-32 Application (HTTP,FTP, DNS) 7 Transport (TCP, UDP) 4 Network (IP) 3 Data Link (Ethernet, 802.11b) 2 Physical 1 cs252-S09, Lecture 17
n bits of zero at the end tack on n bits of remainder Instead of the zeros CRC concept • I have a msg polynomial M(x) of degree m • We both have a generator poly G(x) of degree n • Let r(x) = remainder of M(x) xn / G(x) • M(x) xn = G(x)p(x) + r(x) • r(x) is of degree n • What is (M(x) xn – r(x)) / G(x) ? • So I send you M(x) xn – r(x) • m+n degree polynomial • You divide by G(x) to check • M(x) is just the m most signficant coefficients, r(x) the lower n • x-bit Message is viewed as coefficients of x-degree polynomial over binary numbers cs252-S09, Lecture 17
Polynomial division • When MSB is zero, just shift left, bringing in next bit • When MSB is 1, XOR with divisor and shift eft 0 0 0 0 1 0 1 1 0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 1 0 0 cs252-S09, Lecture 17
CRC encoding 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 01 0 1 0 0 0 0 1 01 0 1 0 0 0 0 01 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0 Message sent: 1 0 1 1 0 0 1 1 0 1 0 cs252-S09, Lecture 17
CRC decoding 1 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 0 0 1 1 0 1 0 0 1 01 0 1 1 0 1 0 1 01 0 1 1 0 1 0 01 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 cs252-S09, Lecture 17
Generating Polynomials • CRC-16: G(x) = x16 + x15 + x2 + 1 • detects single and double bit errors • All errors with an odd number of bits • Burst errors of length 16 or less • Most errors for longer bursts • CRC-32: G(x) = x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1 • Used in ethernet • Also 32 bits of 1 added on front of the message • Initialize the LFSR to all 1s cs252-S09, Lecture 17
Conclusion • ECC: add redundancy to correct for errors • (n,k,d) n code bits, k data bits, distance d • Linear codes: code vectors computed by linear transformation • Erasure code: after identifying “erasures”, can correct • Reed-Solomon codes • Based on GF(pn), often GF(2n) • Easy to get distance d+1 code with d extra symbols • Often used in erasure mode • Redundancy useful to gain reliability • Redundant disks+controllers+etc (RAID) • Geographical scale systems (OceanStore) • Disk technology: • Two innovations: GMR, Vertical recording • Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time cs252-S09, Lecture 17