Data Compression: Run-Length Encoding (RLE)
RLE aims to save money and time by reducing the amount of data transmitted.
• Instead of sending <char><char><char><char><char><char><char> (seven copies), send ESC 7 <char>.
• When the data itself includes ESC, send ESC ESC.
• A more elegant alternative sends <char> <char> 5: two copies of the character followed by the number of further repetitions (2 + 5 = 7), so <char> acts as its own escape.
• Text is usually unsuitable for RLE: the only character it commonly repeats is the space.
• Binary files are better candidates: they often contain repeated characters, especially NUL.
• RLE is used for encoding faxes.
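The ESC scheme above can be sketched in Python as follows. This is a minimal illustration, not the fax encoding. The escape byte (0x1B, ASCII ESC), the minimum run worth encoding (4) and the one-byte count are all assumptions. One wrinkle the slide does not mention: a count equal to the ESC value would be misread as an escaped ESC, so the sketch never emits that count.

```python
ESC = 0x1B      # assumed escape byte (ASCII ESC); the slide does not fix a value
MIN_RUN = 4     # assumed: shorter runs are cheaper to send literally


def rle_encode(data: bytes, esc: int = ESC, min_run: int = MIN_RUN) -> bytes:
    """Replace runs with ESC <count> <char>; send a literal ESC as ESC ESC."""
    out = bytearray()
    i = 0
    while i < len(data):
        char = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == char:
            run += 1
        i += run
        # Long runs: one byte for the count, which must never equal esc,
        # or the decoder would read ESC ESC as an escaped ESC.
        while run >= min_run:
            count = min(run, 255)
            if count == esc:
                count -= 1
            out += bytes([esc, count, char])
            run -= count
        # Whatever is left is sent literally, doubling any ESC.
        for _ in range(run):
            out += bytes([esc, esc]) if char == esc else bytes([char])
    return bytes(out)


def rle_decode(data: bytes, esc: int = ESC) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != esc:            # ordinary character
            out.append(data[i])
            i += 1
        elif data[i + 1] == esc:      # ESC ESC: a literal ESC
            out.append(esc)
            i += 2
        else:                         # ESC <count> <char>
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
    return bytes(out)


message = b"A" + b"B" * 7 + b"\x1b" + b"C" * 30
encoded = rle_encode(message)
print(len(message), len(encoded))   # 39 9
```

The run of seven Bs becomes ESC 7 B, the lone ESC becomes ESC ESC, and the thirty Cs shrink to three bytes.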
Data Compression: Huffman Coding (D.A. Huffman)
• ASCII encodes every character with 7 bits.
• But characters occur with unequal frequencies: in English text, $f_e \approx 100 \times f_q$.
• Huffman coding uses fewer bits to encode the most common characters.
• Drawback: variable-length codes make the boundaries between characters hard to find.
Huffman Coding: an example
Consider an alphabet with only 4 letters, and the message AAAABBBCCD.
• 2-bit code (A = 00, B = 01, C = 10, D = 11):
  00 00 00 00 01 01 01 10 10 11 (20 bits)
• Huffman code (A = 1, B = 01, C = 001, D = 000):
  1 1 1 1 01 01 01 001 001 000 (19 bits)
• 20 bits → 19 bits: even better compression with larger alphabets!
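A minimal Python sketch of building such a code. It assumes the usual greedy construction: repeatedly merge the two least frequent subtrees. Tie-breaking and the 0/1 labelling are arbitrary choices here, so the codewords may differ from the slide's. The code lengths (1, 2, 3, 3 bits) and the 19-bit total are the same. The decoder also shows how boundaries can be recovered. No codeword is a prefix of another, so the receiver can cut the bit stream as it reads, but only by decoding from the start.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free code by repeatedly merging the two rarest subtrees."""
    freq = Counter(text)
    if len(freq) == 1:                      # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    order = count()                         # tie-breaker so dicts are never compared
    heap = [(f, next(order), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, rarer = heapq.heappop(heap)
        f2, _, other = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in rarer.items()}
        merged.update({s: "1" + c for s, c in other.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


def encode(text: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in text)


def decode(bits: str, code: dict[str, str]) -> str:
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:              # prefix-free: first match is a whole codeword
            out.append(inverse[current])
            current = ""
    return "".join(out)


code = huffman_code("AAAABBBCCD")
bits = encode("AAAABBBCCD", code)
print(code, len(bits))   # code lengths 1, 2, 3, 3 and 19 bits in total
```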
Huffman Coding: agreeing on the code
• Both ends must agree on the code set.
• It is most efficient to create a code specifically for the data being sent.
• This allows for different letter frequencies in different languages.
• Fax machines use a modified Huffman scheme, with codes for runs of black or white dots:
  • terminating codes for runs of 0, 1, 2, ..., 63 dots
  • make-up codes for 64, 128, 192, ... (i.e. multiples of 64) dots
  So a 67-dot run is sent as the code for 64 followed by the code for 3.
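A minimal Python sketch of the run-length side of the fax scheme. It makes several assumptions. Pixels are given as a string of '0' (white) and '1' (black). Each line starts with a white run, possibly of length 0. A run is split into at most one make-up part plus one terminating part. The actual Huffman codeword tables are omitted, and the upper limit on make-up codes is ignored.

```python
def runs(row: str) -> list[int]:
    """Alternating white/black run lengths; the first run is white (may be 0)."""
    result, colour, length = [], "0", 0
    for pixel in row:
        if pixel == colour:
            length += 1
        else:
            result.append(length)
            colour, length = pixel, 1
    result.append(length)
    return result


def split_run(length: int) -> list[int]:
    """Make-up part (a multiple of 64) if needed, then a terminating part (0 to 63)."""
    parts = []
    if length >= 64:
        parts.append(length - length % 64)
    parts.append(length % 64)
    return parts


row = "0" * 67 + "1" * 3 + "0" * 10
print(runs(row))                          # [67, 3, 10]
print([split_run(n) for n in runs(row)])  # [[64, 3], [3], [10]]
```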
Data Compression: Lempel-Ziv Compression (Jacob Ziv, Abraham Lempel)
• ZIP and the UNIX compress utility use (modified) L-Z compression.
• Codes are fixed-length (usually 12 or 16 bits).
  • A single character is sent as its 7-bit ASCII code padded with 0-bits (nine of them with 16-bit codes).
  • This is inefficient when sending single characters,
  • but after a while, very few single characters get sent.
• Extra codes represent the most common character sequences.
  • The sender creates the extra codes from the character sequences it encounters in the message.
  • The receiver constructs the same extra codes while decompressing the message.
Lempel-Ziv Compression: definitions
• The Remaining string R denotes the unsent part of the message. Initially, R is the complete message.
• A code table relates character sequences to codes, for example:
  a 1, b 2, ..., z 26, ..., th 27, ..., the 79, ..., then 158
  • Initially, the code table contains just the alphabet.
  • Sender and receiver have the same table.
  • New sequences are added when they are encountered; sender and receiver both add the same codes (easy for the sender!).
• L denotes the longest string of characters that starts with the first character of R and occurs in the code table.
• L' denotes L + the next character in R.
Lempel-Ziv Compression: one step (example message: "…they then and there theorised that this was thus")
• The sender identifies L and sends the code for L to the receiver.
• The receiver receives the code, looks it up in the code table, and appends L to the message string.
• The sender identifies L' and makes a new entry in the code table for L'.
• The receiver makes the same entry one step later, once the next code tells it which character follows L.
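The sender's side of these steps can be sketched in Python as below. This is a minimal illustration that assumes the numbering of the slides' worked example: a = 1, ..., z = 26, _ = 27 standing for the space, and new entries from 28. It also assumes messages made only of those characters, and an unbounded table. A real implementation with 12- or 16-bit codes has to stop adding entries once the codes run out.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"   # "_" stands for the space character


def initial_table() -> dict[str, int]:
    """a = 1, ..., z = 26, _ = 27, as in the worked example."""
    return {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def lz_encode(message: str) -> list[int]:
    table = initial_table()
    next_code = len(table) + 1             # first new entry is 28
    codes = []
    i = 0                                  # R is message[i:]
    while i < len(message):
        # L: longest string starting at R[0] that is in the table
        j = i + 1
        while j < len(message) and message[i:j + 1] in table:
            j += 1
        codes.append(table[message[i:j]])  # send the code for L
        if j < len(message):               # L' = L + next character of R
            table[message[i:j + 1]] = next_code
            next_code += 1
        i = j                              # remove L from the front of R
    return codes


# The first 17 codes match the step-by-step trace of the worked example.
print(lz_encode("they_then_and_there_theorised"))
```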
Lempel-Ziv Compression: worked example
Sender and receiver start with identical code tables:
a 1, b 2, c 3, d 4, e 5, f 6, g 7, h 8, i 9, j 10, k 11, l 12, m 13, n 14, o 15, p 16, q 17, r 18, s 19, t 20, u 21, v 22, w 23, x 24, y 25, z 26, _ 27 (_ stands for the space character)
Message: they_then_and_there_theorised_that_this_was_thus
At each step the sender sends the code for L and adds L' to the table; the receiver builds the same entries as it decodes.
• L = t, send 20; L' = th, new code 28
• L = h, send 8; L' = he, new code 29
• L = e, send 5; L' = ey, new code 30
• L = y, send 25; L' = y_, new code 31
• L = _, send 27; L' = _t, new code 32
• L = th, send 28; L' = the, new code 33
• L = e, send 5; L' = en, new code 34
Several further steps ensue:
• L = n, send 14; L' = n_, new code 35
• L = _, send 27; L' = _a, new code 36
• L = a, send 1; L' = an, new code 37
• L = n, send 14; L' = nd, new code 38
• L = d, send 4; L' = d_, new code 39
• L = _t, send 32; L' = _th, new code 40
• L = he, send 29; L' = her, new code 41
• L = r, send 18; L' = re, new code 42
• L = e, send 5; L' = e_, new code 43
• L = _th, send 40; L' = _the, new code 44
R is now eorised_that_this_was_thus.
Lempel-Ziv Compression: a special case (well, nearly)
Pattern: <stringa><stringa><not char1 of stringa>, e.g. "spin-spin-effect" within a longer message, by which point the table already holds sp 73, spi 79, spin 84 and spin- 89.
• R = spin-spin-effect: L = spin- (send 89); L' = spin-s, new code 92.
• R = spin-effect: L = spin- (send 89 again); L' = spin-e, new code 93.
• R = effect: L = e (send 5); L' = ef, new code 94.
The receiver already knows code 89 both times, so nothing unusual happens.
Lempel-Ziv Compression: a special case (yes, really!)
Pattern: <stringa><stringa><char1 of stringa>, e.g. "spin-spin-splitting".
• R = spin-spin-splitting: L = spin- (send 89); L' = spin-s, new code 92.
• R = spin-splitting: L = spin-s (send 92); L' = spin-sp, new code 93.
The sender uses code 92 immediately after creating it, so the receiver gets 92 before it has made that entry. It can still decode it: the new string must be the previous L (spin-) plus its own first character, which is also the first character of the previous L (s), giving spin-s.
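The receiver's side, including this special case, might look like the following minimal Python sketch. It assumes the worked example's numbering: a = 1, ..., z = 26, _ = 27, and new entries from 28. The receiver makes its entry for L' one step late, once the next code reveals the character that follows L. That lag is why a code used straight after its creation arrives before the receiver has it. The hyphen in spin- is not in the assumed alphabet, so the checks use abababa. This is a smaller instance of the same <stringa><stringa><char1 of stringa> pattern, with stringa = ab and the codes 1, 2, 28, 30.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"   # "_" stands for the space character


def lz_decode(codes: list[int]) -> str:
    table = {i + 1: ch for i, ch in enumerate(ALPHABET)}   # 1 -> "a", ..., 27 -> "_"
    next_code = len(table) + 1                             # first new entry is 28
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            current = table[code]
        elif code == next_code:
            # Special case: the sender created this entry in the step it just
            # sent and used it at once, so the current L starts with the
            # previous L; the entry is previous L + its first character.
            current = previous + previous[0]
        else:
            raise ValueError(f"invalid code {code}")
        # The receiver's entry lags one step: previous L + first char of current L.
        table[next_code] = previous + current[0]
        next_code += 1
        out.append(current)
        previous = current
    return "".join(out)


print(lz_decode([1, 2, 28, 30]))   # abababa
```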
Error Detection and Correction: Error Detection
(Reading: Tanenbaum, 3rd edition, pp. 183-190)
Errors are caused by noise:
• Impulse noise (clicks)
• Crosstalk (between lines)
• Thermal noise (cannot be eliminated)
Applications differ in their sensitivity to errors: high for bank transactions, low for video transfer.
• If we can detect errors, we can eliminate them (e.g. by retransmission).
• But undetected errors can never be ruled out altogether, so aim for a detection rate high enough for the application.
Error Detection Methods
• Double sending: used by data preparation operators, but not normally used in data communications.
• Parity:
  • Add a 1 or 0 after each character to make the total number of 1s even (even parity) or odd (odd parity).
  • With even parity: 0111111 → 01111110, and 1111111 → 11111111.
  • On arrival, the number of 1 bits in each character should still be even.
  • A single-bit corruption makes the number of 1 bits odd, so it is detected.
  • Two-bit corruptions go undetected (as does any even number of flipped bits).
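A minimal Python sketch of the parity rules above, working on strings of '0' and '1' characters for readability. The function names are illustrative only.

```python
def add_parity(bits: str, even: bool = True) -> str:
    """Append a parity bit to a string of '0'/'1' characters."""
    ones = bits.count("1")
    parity = ones % 2 if even else 1 - ones % 2
    return bits + str(parity)


def parity_ok(word: str, even: bool = True) -> bool:
    """Check a received word: even parity wants an even count of 1s."""
    return word.count("1") % 2 == (0 if even else 1)


print(add_parity("0111111"), add_parity("1111111"))   # 01111110 11111111
print(parity_ok("01011110"))   # False: one bit flipped, detected
print(parity_ok("00011110"))   # True: two bits flipped, undetected
```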
Block Checksums
• Sender: sends the byte sequence and the XOR sum of the byte sequence.
• Receiver: calculates the XOR sum of the complete byte sequence, checksum included, and detects an error if the result is non-zero.
• Weakness: XOR ignores order, so two characters that are swapped go undetected.
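A minimal Python sketch of the XOR block checksum, with a hypothetical b"HELLO" block. It shows that a flipped bit is caught, but two swapped characters are not.

```python
from functools import reduce
from operator import xor


def xor_sum(block: bytes) -> int:
    return reduce(xor, block, 0)


def make_frame(block: bytes) -> bytes:
    """Sender: the data bytes followed by their XOR sum."""
    return block + bytes([xor_sum(block)])


def error_detected(frame: bytes) -> bool:
    """Receiver: XOR of everything, checksum included, is 0 if nothing changed."""
    return xor_sum(frame) != 0


frame = make_frame(b"HELLO")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # one bit flipped
swapped = b"EHLLO" + frame[-1:]                    # first two characters reversed
print(error_detected(frame), error_detected(corrupted), error_detected(swapped))
# False True False
```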
Block Parity
[Figure: a block of characters, each with its own parity bit, plus a row of parity bits for the whole block]
• Uses both conventional "horizontal" parity (one bit per character) and "longitudinal" parity (one bit per bit position, computed down the block).
• Note: the two sets of checks give positional information. A single-bit error fails exactly one horizontal and one longitudinal check, which pinpoints the bit, so block parity can be used for error correction.
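A minimal Python sketch of block parity used for correction. It assumes 7-bit ASCII characters, even parity in both directions, and a hypothetical block "DATA". Only a single-bit error is corrected; anything else is returned unchanged.

```python
def to_bits(ch: str) -> list[int]:
    """7-bit ASCII code of a character, most significant bit first."""
    return [int(b) for b in format(ord(ch), "07b")]


def parities(block: list[list[int]]) -> tuple[list[int], list[int]]:
    """Even parity per character (horizontal) and per bit position (longitudinal)."""
    horizontal = [sum(row) % 2 for row in block]
    longitudinal = [sum(column) % 2 for column in zip(*block)]
    return horizontal, longitudinal


def correct_single_error(block, horizontal, longitudinal):
    """Flip the one bit whose row check and column check both fail."""
    fixed = [row[:] for row in block]
    rows_now, cols_now = parities(fixed)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows_now, horizontal)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols_now, longitudinal)) if a != b]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


sent = [to_bits(ch) for ch in "DATA"]
horizontal, longitudinal = parities(sent)      # transmitted with the block
received = [row[:] for row in sent]
received[2][4] ^= 1                            # noise flips one bit
print(correct_single_error(received, horizontal, longitudinal) == sent)   # True
```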
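CRC (Cyclic Redundancy Check)
• The divisor represents a polynomial: 11001 represents a polynomial of degree D = 4,
  $1x^4 + 1x^3 + 0x^2 + 0x^1 + 1x^0 = x^4 + x^3 + 1$
• Sender: divides the DATA (with D 0s appended) by the divisor, discards the quotient, and sends DATA + remainder.
• Receiver: divides the received DATA' (remainder included) by the same divisor; a remainder of 0 means no error was detected.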
CRC: modulo-2 division
• Example: data = 11100110, divisor = 11001 (D = 4, since $x^4$ is the highest term).
• Add D 0s to the data: 11100110 0000.
• Divide, using the rules of modulo-2 division:
  • XOR instead of subtraction;
  • in A/B, B "goes into" A if B's high-order bit is in the same position as A's high-order bit.
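CRC: worked division
11001 ) 11100110 0000   (quotient 10110110)
  11100 XOR 11001 → 0101, bring down 1
  01011 XOR 00000 → 1011, bring down 1
  10111 XOR 11001 → 1110, bring down 0
  11100 XOR 11001 → 0101, bring down 0
  01010 XOR 00000 → 1010, bring down 0
  10100 XOR 11001 → 1101, bring down 0
  11010 XOR 11001 → 0011, bring down 0
  00110 XOR 00000 → 0110 = remainder
So the sender transmits 11100110 0110.
• The CRC-16 polynomial (divisor) is $x^{16} + x^{15} + x^2 + x^0$, i.e. 11000000000000101; the CCITT polynomial is $x^{16} + x^{12} + x^5 + x^0$.
• With a 16-bit CRC, CRCs detect:
  • all single-bit errors,
  • most double-bit errors,
  • all error bursts of 16 bits or fewer,
  • most longer error bursts.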
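A minimal Python sketch of the modulo-2 long division above. It works on strings of '0' and '1' for readability; real implementations work on bytes or use lookup tables. It reproduces the remainder 0110 and the receiver's zero-remainder check.

```python
def mod2_remainder(dividend: str, divisor: str) -> str:
    """Long division using XOR instead of subtraction; returns the remainder."""
    bits = [int(b) for b in dividend]
    poly = [int(b) for b in divisor]
    for i in range(len(bits) - len(poly) + 1):
        if bits[i]:                       # divisor "goes into" this window
            for k, p in enumerate(poly):
                bits[i + k] ^= p
    return "".join(str(b) for b in bits[len(bits) - (len(poly) - 1):])


def crc(data: str, divisor: str) -> str:
    degree = len(divisor) - 1             # D
    return mod2_remainder(data + "0" * degree, divisor)


check = crc("11100110", "11001")
print(check)                                        # 0110
print(mod2_remainder("11100110" + check, "11001"))  # 0000: receiver sees no error
```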
Error Correction
• ARQ (Automatic Repeat reQuest):
  • most common in data communications;
  • if the received data contains errors, request retransmission.
• FEC (Forward Error Correction):
  • used where retransmission is undesirable;
  • includes extra information with the message so that it can be reconstructed;
  • used in computer memory and disks, simplex transmission from a data logger, and transmissions from distant spacecraft.
Hamming Codes (Richard Hamming)
• Hamming codes facilitate error detection and correction.
• They use more than 1 bit to encode a bit: 0 in the data becomes codeword 000, and 1 becomes codeword 111.
• The "Hamming distance" (HD), the number of bit positions in which two codewords differ, between 000 and 111 is 3.
• On the cube of 3-bit words, 001, 010 and 100 are closer to 000 than to 111, while 011, 101 and 110 are closer to 111 than to 000.
• With HD = 3, it is possible
  • EITHER to detect 2-bit errors,
  • OR to correct 1-bit errors.
To detect d-bit errors, a code's HD must be d+1.
To correct d-bit errors, a code's HD must be 2d+1.
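A minimal Python sketch of these rules. It computes the Hamming distance and inverts the two inequalities: a code with distance HD detects up to HD - 1 bit errors and corrects up to (HD - 1) // 2. It also decodes the 000/111 code by picking the nearer codeword. The function names are illustrative only.

```python
def hamming_distance(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def capabilities(hd: int) -> tuple[int, int]:
    """(max detectable, max correctable) bit errors for a code with distance hd."""
    return hd - 1, (hd - 1) // 2


def decode_repetition(word: str) -> str:
    """Repetition code 0 -> 000, 1 -> 111: pick the nearer codeword."""
    return "0" if hamming_distance(word, "000") < hamming_distance(word, "111") else "1"


print(hamming_distance("000", "111"), capabilities(3))   # 3 (2, 1)
print([decode_repetition(w) for w in ("001", "010", "100", "011", "101", "110")])
# ['0', '0', '0', '1', '1', '1']
```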
Hamming Codes: a code for 2-bit error correction
To correct 2-bit errors, a code's HD must be 2 × 2 + 1 = 5. Example code:
a 0000011111
b 0000000000
c 1111100000
d 1111111111
Inter-character Hamming distances:
    a   b   c   d
a   -   5   10  5
b   5   -   5   10
c   10  5   -   5
d   5   10  5   -
• The HD for some character pairs is 10, but the minimum inter-character HD is 5, so the HD for the whole code is 5.
• If 1 or 2 bits change, the result is still nearer to the original valid codeword. For example, a = 0000011111 received as 0000101111 is at distance 2 from a, 5 from b, 8 from c and 5 from d.
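A minimal Python sketch that checks the table above. It computes the code's HD as the minimum over all pairs of distinct codewords, then decodes the corrupted word by choosing the nearest codeword.

```python
from itertools import combinations

CODE = {"a": "0000011111", "b": "0000000000", "c": "1111100000", "d": "1111111111"}


def hamming_distance(x: str, y: str) -> int:
    return sum(p != q for p, q in zip(x, y))


def code_distance(code: dict[str, str]) -> int:
    """The HD of a code is the minimum over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code.values(), 2))


def nearest(received: str, code: dict[str, str]) -> str:
    return min(code, key=lambda sym: hamming_distance(received, code[sym]))


print(code_distance(CODE))                                              # 5
print({s: hamming_distance("0000101111", w) for s, w in CODE.items()})  # {'a': 2, 'b': 5, 'c': 8, 'd': 5}
print(nearest("0000101111", CODE))                                      # a
```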
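Hamming Codes: the 15-bit layout
position:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
8s bit:     1  1  1  1  1  1  1  1  0  0  0  0  0  0  0
4s bit:     1  1  1  1  0  0  0  0  1  1  1  1  0  0  0
2s bit:     1  1  0  0  1  1  0  0  1  1  0  0  1  1  0
1s bit:     1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
codeword:   0  0  1  1  1  0  1  0  1  0  0  1  0  0  0
Each row marks the positions whose binary representation contains that power of two. The parity bits sit at positions 1, 2, 4 and 8; each makes the number of 1s even over the positions marked in its row.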
Hamming Codes (continued)
[Figure: Hamming code bit grid, positions 15 to 1, further worked example]
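A minimal Python sketch of the 15-bit layout. It assumes even parity, parity bits at positions 1, 2, 4 and 8, and the 11 data bits filling the remaining positions from 15 downwards, written left to right as on the slide. With even parity, XORing together the positions of all the 1 bits gives 0 for a valid codeword. After a single-bit error, it gives the position of the flipped bit. The sketch reproduces the slide's codeword 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0.

```python
N = 15                      # positions 15 (left) down to 1 (right), as on the slide
PARITY = (1, 2, 4, 8)       # powers of two hold the parity bits
DATA = [p for p in range(N, 0, -1) if p & (p - 1)]   # 15, 14, 13, 12, 11, 10, 9, 7, 6, 5, 3


def syndrome(word: str) -> int:
    """XOR of the positions holding a 1: 0 for a valid codeword (even parity)."""
    s = 0
    for index, bit in enumerate(word):
        if bit == "1":
            s ^= N - index
    return s


def hamming_encode(data: str) -> str:
    """Place 11 data bits, then set parity bits so the syndrome becomes 0."""
    bits = {p: 0 for p in range(1, N + 1)}
    for p, b in zip(DATA, data):
        bits[p] = int(b)
    s = 0
    for p, b in bits.items():
        if b:
            s ^= p
    for p in PARITY:                 # each parity bit cancels one bit of s
        bits[p] = 1 if s & p else 0
    return "".join(str(bits[p]) for p in range(N, 0, -1))


def hamming_correct(word: str) -> str:
    """A single flipped bit makes the syndrome equal to its position."""
    s = syndrome(word)
    if s == 0:
        return word
    i = N - s
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]


codeword = hamming_encode("00111011000")
print(codeword)                                 # 001110101001000
print(syndrome("001110101101000"))              # 6: position 6 was flipped
```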