Increasing Information per Bit
• Information in a source
• Mathematical Models of Sources
• Information Measures
• Compressing information
• Huffman encoding
• Optimal Compression for DMS?
• Lempel-Ziv-Welch Algorithm
• For Stationary Sources?
• Practical Compression
• Quantization of analog data
• Scalar Quantization
• Vector Quantization
• Model Based Coding
• Practical Quantization
• μ-law encoding
• Delta Modulation
• Linear Predictor Coding (LPC)
Huffman encoding
• Variable-length binary code for a DMS
  • finite alphabet, fixed probabilities
• Code satisfies the Prefix Condition
  • Codewords are instantaneously and unambiguously decodable as they arrive
  • e.g., {0, 10, 110, 111} is OK; {0, 01, 011, 111} is not OK: the received string 0111 could be read as 0,111 or 011,1 (a prefix check is sketched below)
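To make the prefix condition concrete, here is a minimal sketch (mine, not the slides') that checks whether any codeword is a prefix of another, applied to the two example code sets above.

```python
def satisfies_prefix_condition(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# The two example code sets from the slide.
print(satisfies_prefix_condition(["0", "10", "110", "111"]))  # True  -> OK
print(satisfies_prefix_condition(["0", "01", "011", "111"]))  # False -> not OK
```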
Huffman encoding
• Use probabilities to order the coding priorities of the letters
• Low-probability letters get codes first (more bits)
• This smooths out the information per bit
Huffman encoding
• Use a code tree to make the code
• Combine the two symbols with the lowest probabilities into a new block symbol
• Assign a 1 to one of the old symbols' code words and a 0 to the other
• Now reorder and combine the two lowest-probability symbols of the new set
• Each time the synthesized block symbol has the lowest probability, the code words get shorter
(Code tree figure for symbols D0–D4.)
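Below is a minimal sketch of the merge-and-relabel procedure just described, assuming an arbitrary set of symbol probabilities; the names D0–D4 and the numbers are illustrative, not the slide's actual example.

```python
import heapq
import itertools

def huffman_code(probabilities):
    """Build a prefix code by repeatedly merging the two least likely symbols.

    probabilities: dict mapping symbol -> probability.
    Returns: dict mapping symbol -> binary codeword (string of '0'/'1').
    """
    counter = itertools.count()  # tie-breaker so the heap never compares symbol lists
    heap = [(p, next(counter), [sym]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    code = {sym: "" for sym in probabilities}

    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)   # lowest-probability block
        p1, _, syms1 = heapq.heappop(heap)   # second lowest
        for sym in syms0:                    # assign 0 to one block ...
            code[sym] = "0" + code[sym]
        for sym in syms1:                    # ... and 1 to the other
            code[sym] = "1" + code[sym]
        heapq.heappush(heap, (p0 + p1, next(counter), syms0 + syms1))

    return code

# Illustrative probabilities (assumed, not the slide's example).
probs = {"D0": 0.35, "D1": 0.25, "D2": 0.20, "D3": 0.12, "D4": 0.08}
print(huffman_code(probs))
```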
Huffman encoding
• Result:
  • Self-information or entropy: H(X) = -\sum_k P(x_k) \log_2 P(x_k) = 2.11 (the best possible average number of bits per letter)
  • Average number of bits per letter: \bar{R} = \sum_k n_k P(x_k), where n_k = number of bits for symbol x_k
  • So the efficiency = H(X) / \bar{R}
Huffman encoding
• Let's compare to a simple 3-bit fixed-length code (an efficiency comparison is sketched below)
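A small sketch of the entropy / average-length / efficiency calculation and the comparison with a fixed 3-bit code. The probabilities and codeword lengths here are assumed for illustration; the slide's own example (which gives H(X) = 2.11) is not reproduced.

```python
import math

# Assumed example: probabilities and Huffman codeword lengths n_k for five symbols.
probs   = [0.35, 0.25, 0.20, 0.12, 0.08]
lengths = [2, 2, 2, 3, 3]    # n_k from a Huffman code for these probabilities

H = -sum(p * math.log2(p) for p in probs)               # entropy H(X)
R_huffman = sum(p * n for p, n in zip(probs, lengths))  # average bits per letter
R_fixed = 3                                             # fixed-length code: ceil(log2(5)) = 3 bits

print(f"H(X)                   = {H:.3f} bits")
print(f"Huffman efficiency     = {H / R_huffman:.3f}")
print(f"Fixed 3-bit efficiency = {H / R_fixed:.3f}")
```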
Huffman encoding
• Another example
(Code tree figure for symbols D0–D3.)
Huffman encoding
• Multi-symbol block codes
• Use new symbols made of blocks of J original symbols
• Can show the new code's average bits per original letter satisfies: H(X) \le \bar{R}_J / J < H(X) + 1/J
• So a large enough block code gets you as close to H(X) as you want
Huffman encoding
• Let's consider a J = 2 block code example (a sketch follows below)
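A sketch of the J = 2 idea for a DMS with assumed letter probabilities: pairs of letters become the new symbols (pair probabilities multiply for a DMS), the pairs are Huffman coded, and the bits per original letter fall between H(X) and H(X) + 1/J.

```python
import heapq, itertools, math

def huffman_lengths(probabilities):
    """Return Huffman codeword lengths for a symbol -> probability dict."""
    counter = itertools.count()
    heap = [(p, next(counter), [sym]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)
        p1, _, s1 = heapq.heappop(heap)
        for sym in s0 + s1:
            lengths[sym] += 1     # every symbol in the merged block gains one bit
        heapq.heappush(heap, (p0 + p1, next(counter), s0 + s1))
    return lengths

# Assumed DMS with three letters.
letter_probs = {"a": 0.5, "b": 0.3, "c": 0.2}
H = -sum(p * math.log2(p) for p in letter_probs.values())

# J = 2 block symbols: pair probabilities are products for a DMS.
pair_probs = {x + y: px * py for x, px in letter_probs.items()
                             for y, py in letter_probs.items()}
pair_lengths = huffman_lengths(pair_probs)
bits_per_letter = sum(pair_probs[s] * pair_lengths[s] for s in pair_probs) / 2

print(f"H(X) = {H:.3f}, J=2 Huffman bits/letter = {bits_per_letter:.3f}")
print(f"Bound: {H:.3f} <= {bits_per_letter:.3f} < {H + 0.5:.3f}")
```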
Encoding Stationary Sources
• Now there are joint probabilities of blocks of symbols that depend on previous symbols (unless the source is a DMS)
• Can show the joint entropy is: H(X_1 X_2 \cdots X_k) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_k | X_1 \cdots X_{k-1}) \le \sum_{i=1}^{k} H(X_i)
• Which means fewer bits than a symbol-by-symbol code can be used
Encoding Stationary Sources
• H(X | Y) is the conditional entropy: H(X | Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log_2 \frac{1}{P(x_i | y_j)}
  • P(x_i, y_j) is the joint (total) probability of x_i and y_j
  • \log_2 \frac{1}{P(x_i | y_j)} is the information in x_i given y_j
• Can show: H(X | Y) \le H(X), with equality when X and Y are independent
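A sketch that evaluates this definition for an assumed 2×2 joint distribution and checks the relations above, plus the chain rule H(X, Y) = H(Y) + H(X | Y) behind the previous slide.

```python
import math

def entropy(probs):
    """Entropy in bits of a probability list (zero terms contribute nothing)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint distribution P(x_i, y_j) for binary X and Y (rows: x, columns: y).
P = [[0.40, 0.10],
     [0.05, 0.45]]

Px = [sum(row) for row in P]                              # marginal of X
Py = [sum(P[i][j] for i in range(2)) for j in range(2)]   # marginal of Y

# H(X|Y) = sum_ij P(x_i,y_j) log2( 1 / P(x_i|y_j) ),  with P(x_i|y_j) = P(x_i,y_j)/P(y_j)
H_X_given_Y = sum(P[i][j] * math.log2(Py[j] / P[i][j])
                  for i in range(2) for j in range(2) if P[i][j] > 0)

H_X  = entropy(Px)
H_Y  = entropy(Py)
H_XY = entropy([P[i][j] for i in range(2) for j in range(2)])

print(f"H(X)   = {H_X:.3f}, H(X|Y) = {H_X_given_Y:.3f}   (H(X|Y) <= H(X))")
print(f"H(X,Y) = {H_XY:.3f} = H(Y) + H(X|Y) = {H_Y + H_X_given_Y:.3f}")
```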
Conditional Entropy
• Plotting this for n = m = 2, we see that when Y depends strongly on X, then H(X | Y) is low
(Plot of H(X | Y) versus P(Y=0 | X=0) and P(Y=1 | X=1).)
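Not the slide's plot, but a small sweep (assuming P(X=0) = 0.5 and a symmetric dependence) that shows the same trend: as P(Y=0|X=0) = P(Y=1|X=1) moves away from 0.5, Y says more about X and H(X|Y) drops.

```python
import math

def h_x_given_y(p00, p11, px0=0.5):
    """H(X|Y) in bits for binary X, Y with P(Y=0|X=0)=p00, P(Y=1|X=1)=p11, P(X=0)=px0."""
    joint = {(0, 0): px0 * p00,             (0, 1): px0 * (1 - p00),
             (1, 0): (1 - px0) * (1 - p11), (1, 1): (1 - px0) * p11}
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    return sum(p * math.log2(py[y] / p) for (x, y), p in joint.items() if p > 0)

for p in (0.5, 0.7, 0.9, 0.99):
    print(f"P(Y=0|X=0) = P(Y=1|X=1) = {p:4.2f}  ->  H(X|Y) = {h_x_given_y(p, p):.3f} bits")
```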
Conditional Entropy
• To see how P(X | Y) and P(Y | X) relate, consider Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y)
• They are very similar when P(X=0) ~ 0.5
(Plot of P(X=0 | Y=0) versus P(Y=0 | X=0).)
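A quick numeric check of this relation, with an assumed symmetric dependence P(Y=0|X=0) = 0.9 and P(Y=0|X=1) = 0.1: P(X=0|Y=0) matches P(Y=0|X=0) when P(X=0) = 0.5 and drifts away from it otherwise.

```python
# Bayes' rule: P(X=0|Y=0) = P(Y=0|X=0) P(X=0) / P(Y=0)
p_y0_given_x0 = 0.9   # assumed dependence of Y on X
p_y0_given_x1 = 0.1   # assumed (symmetric)

for p_x0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    p_y0 = p_y0_given_x0 * p_x0 + p_y0_given_x1 * (1 - p_x0)   # total probability
    p_x0_given_y0 = p_y0_given_x0 * p_x0 / p_y0                # Bayes' rule
    print(f"P(X=0) = {p_x0:.1f}:  P(Y=0|X=0) = {p_y0_given_x0:.2f},  "
          f"P(X=0|Y=0) = {p_x0_given_y0:.3f}")
```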
Optimal Codes for Stationary Sources
• Can show that for large blocks of symbols Huffman encoding is efficient
• Define the entropy per letter of a J-symbol block: H_J(X) = H(X_1 X_2 \cdots X_J) / J
• Then a Huffman code for the blocks gives: H_J(X) \le \bar{R} < H_J(X) + 1/J (bits per original letter)
• Now if J \to \infty, then H_J(X) \to H_\infty(X), the entropy rate of the source
• Get: \bar{R} \to H_\infty(X), i.e., Huffman is optimal
Lempel-Ziv-Welch Code
• Huffman encoding is efficient but needs the joint probabilities of large blocks of symbols
• Finding joint probabilities is hard
• LZW is independent of the source statistics
  • It is a universal source coding algorithm
  • It is not optimal
Lempel-Ziv-Welch
• Build a table from strings not already in the table
• Output the table location (index) for strings that are in the table
• Rebuild the same table at the decoder
Source: http://dogma.net/markn/articles/lzw/lzw.htm
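A minimal LZW compression sketch in the spirit of the cited article: the table starts with the single-byte strings and grows by one entry per output code. The test string is a common LZW demo input, not necessarily the slide's.

```python
def lzw_compress(data: str):
    """LZW: emit the table index of the longest known string, then add string+next char."""
    table = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in table:            # still matches a table entry: keep extending
            current = candidate
        else:
            output.append(table[current])   # emit the code for the longest match
            table[candidate] = next_code    # add the new string to the table
            next_code += 1
            current = ch
    if current:
        output.append(table[current])
    return output

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(codes)
```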
Lempel-Ziv-Welch • Decode
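A matching decoder sketch that rebuilds the same table from the code stream, including the usual special case where a code refers to the entry still being built. The round-trip at the end assumes the lzw_compress sketch above is available.

```python
def lzw_decompress(codes):
    """Rebuild the LZW table while decoding a list of table indices."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:                  # special case: code not in table yet
            entry = previous + previous[0]
        else:
            raise ValueError("bad LZW code")
        result.append(entry)
        table[next_code] = previous + entry[0]   # same rule the encoder used
        next_code += 1
        previous = entry
    return "".join(result)

# Round-trip check (assumes lzw_compress from the sketch above is defined).
print(lzw_decompress(lzw_compress("TOBEORNOTTOBEORTOBEORNOT")))
```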
Lempel-Ziv-Welch
• Typically takes hundreds of table entries before compression occurs
• Some nasty patents made licensing an issue