
15-211 Fundamental Structures of Computer Science

Lempel-Ziv Compression. 15-211 Fundamental Structures of Computer Science. Feb. 24, 2005. Ananda Guna. Recap. Huffman Trees. Huffman Trees can be used to construct an optimal prefix code. What does optimal mean? Greedy algorithm to assemble a Huffman tree.


Presentation Transcript


  1. Lempel-Ziv Compression 15-211 Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna

  2. Recap

  3. Huffman Trees • Huffman Trees can be used to construct an optimal prefix code. • What does optimal mean? • Greedy algorithm to assemble a Huffman tree. • locally optimal steps to global optimization • Requires symbol frequencies. • read the file twice – counting and encoding
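
The greedy assembly mentioned on this slide can be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `huffman_code` and the heap-based representation are assumptions made for the example. It counts symbol frequencies, then repeatedly merges the two lightest trees until one remains.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each symbol of text to its bit string (hypothetical helper)."""
    freq = Counter(text)                       # counting pass over the data
    if not freq:
        return {}
    if len(freq) == 1:                         # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap items are (weight, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)        # greedy step: take the two lightest trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):            # internal node: 0 goes left, 1 goes right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                  # leaf: record the code word
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

The unique tie-breaker in each heap item keeps Python from ever comparing two trees directly when weights are equal.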

  4. Huffman Encoding Process

  5. Adaptive Huffman or Dynamic Huffman • Clearly, having to read the data twice (first for frequency count, then for actual compression) is a bit cumbersome. • Perhaps data is available in blocks (streaming data) • Can build an adaptive Huffman tree that adjusts itself as more frequency data become available.

  6. Adaptive Huffman, ctd. • Mapping from source messages to code words based upon a running estimate of the source message probabilities • Change the tree to remain optimal for the current estimates • Adaptive Huffman codes respond to locality • Requires only a single pass of the data

  7. Beating Huffman How about beating the compression achieved by Huffman? Impossible! It produces an optimal prefix code. Right. But who says we have to use a prefix code?

  8. Dictionary-Based Compression

  9. Dictionary-based methods • Here is a simple idea: • Keep track of “words” that we have seen, and replace them with a code number when we see them again. • We can maintain dictionary entries • (word, code) • and make additions to the dictionary as we read the input file.

  10. Lempel & Ziv (1977/78)

  11. Fred Hacker’s algorithm… • Fred now knows what to do… ( <the-whole-file>, 1 ) Transmit 1, done.

  12. Right? • Fred’s algorithm provides excellent compression, but… • …the receiver does not know what is in the dictionary! • And sending the dictionary is the same as sending the entire uncompressed file • Thus, we can’t decompress the “1”.

  13. Hence… • …we need to build our dictionary in such a way that the receiver can rebuild the dictionary easily.

  14. LZW Compression: The Byte Version

  15. Byte method LZW • We start with a trie that contains a root and n children • one child for each possible character • the children are labeled 0…n-1 • We compress as before, by walking down the trie • but, after emitting a code and growing the trie, we must start from the root’s child labeled c, where c is the character that caused us to grow the trie

  16. LZW: Byte method example • Suppose our entire character set consists only of the four letters: • {a, b, c, d} • Let’s consider the compression of the string • baddad

  17. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 Output: (empty)

  18. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 Output: 1

  19. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 Output: 10

  20. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 Output: 103

  21. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7 Output: 1033

  22. Byte LZW: Compress example Input: baddad Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7 Output: 10335

  23. Byte LZW output • So, the input • baddad • compresses to • 10335 • which again can be given in bit form, just like in the binary method… • …or compressed again using Huffman
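
The byte-method compression walked through above can be sketched with an explicit trie. This is an illustrative sketch only: the names `TrieNode` and `lzw_compress_trie` are made up for the example, and it assumes every input character belongs to the given alphabet.

```python
class TrieNode:
    """One trie node: the code of the word spelled by the path from the root."""
    def __init__(self, code):
        self.code = code
        self.children = {}

def lzw_compress_trie(text, alphabet):
    """Compress text into a list of codes (assumes all characters are in alphabet)."""
    root = TrieNode(None)
    for i, ch in enumerate(alphabet):          # one child per character, codes 0..n-1
        root.children[ch] = TrieNode(i)
    next_code = len(alphabet)
    output = []
    if not text:
        return output
    node = root.children[text[0]]
    for ch in text[1:]:
        child = node.children.get(ch)
        if child is not None:                  # still inside the trie: keep walking down
            node = child
        else:                                  # fell off the trie
            output.append(node.code)           # emit the code of the longest match
            node.children[ch] = TrieNode(next_code)   # grow the trie by one node
            next_code += 1
            node = root.children[ch]           # restart from the root's child labeled ch
    output.append(node.code)                   # emit the code for the final match
    return output

print(lzw_compress_trie("baddad", "abcd"))     # [1, 0, 3, 3, 5]
```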

  24. Byte LZW: Uncompress example • The uncompress step for byte LZW is the most complicated part of the entire process, but is largely similar to the binary method

  25. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 Output: (empty)

  26. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 Output: b

  27. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 Output: ba

  28. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 Output: bad

  29. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 Output: badd

  30. Byte LZW: Uncompress example Input: 10335 Dictionary (trie paths): a:0 b:1 c:2 d:3 ba:4 ad:5 dd:6 da:7 Output: baddad

  31. LZW Byte method: An alternative presentation

  32. Getting off the ground Suppose we want to compress a file containing only letters a, b, c and d. It seems reasonable to start with a dictionary a:0 b:1 c:2 d:3 At least we can then deal with the first letter. And the receiver knows how to start.

  33. Growing pains Now suppose the file starts like so: a b b a b b … We scan the a, look it up and output a 0. After scanning the b, we have seen the word ab. So, we add it to the dictionary a:0 b:1 c:2 d:3 ab:4

  34. Growing pains We output a 1 for the b. Then we get another b. a b b a b b … output 1, and add bb to the dictionary a:0 b:1 c:2 d:3 ab:4 bb:5

  35. So? Right, so far zero compression: the second b also costs a 1, and adds ba to the dictionary. But now we get a followed by b, and ab is in the dictionary a b b a b b … so we output 4, and put abb into the dictionary … d:3 ab:4 bb:5 ba:6 abb:7

  36. And so on Suppose the input continues a b b a b b b b a … We output 5, and put bbb into the dictionary … ab:4 bb:5 ba:6 abb:7 bbb:8

  37. More Hits As our dictionary grows, we are able to replace longer and longer blocks by short code numbers. a b b a b b b b a … 0 1 1 4 5 6 And we increase the dictionary at each step by adding another word.

  38. More importantly • Since we extend our dictionary in such a simple way, it can be easily reconstructed on the other end. • Start with the same initialization, then • Read one code number after the other, look each one up in the dictionary, and extend the dictionary as you go along.

  39. Again: Extending We scan a sequence of symbols a1 a2 a3 … ak where each prefix is in the dictionary. We stop when we fall out of the dictionary: a1 a2 a3 … ak b

  40. Again: Extending We output the code for a1 a2 a3 … ak and put a1 a2 a3 … ak b into the dictionary. Then we set a1 = b and start all over.
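
The extend-and-restart rule above can be sketched with a plain dictionary of words instead of a trie. The function name `lzw_encode` is hypothetical, and the sketch assumes every input symbol appears in the initial alphabet.

```python
def lzw_encode(text, alphabet):
    """Return (codes, dictionary) for text; assumes every symbol is in alphabet."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}   # e.g. a:0 b:1 c:2 d:3
    codes = []
    word = ""                                   # the current match a1 a2 ... ak
    for ch in text:
        if word + ch in dictionary:             # still inside the dictionary: extend
            word += ch
        else:                                   # fell out of the dictionary at symbol ch
            codes.append(dictionary[word])      # output the code for a1 ... ak
            dictionary[word + ch] = len(dictionary)   # add a1 ... ak b
            word = ch                           # set a1 = b and start over
    if word:
        codes.append(dictionary[word])          # flush the last match
    return codes, dictionary

codes, dictionary = lzw_encode("abbabbbba", "abcd")
print(codes)                                    # [0, 1, 1, 4, 5, 6]
```

Because new entries always get the next unused number, `len(dictionary)` serves as the next code.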

  41. Sort of Let's take a closer look at an example. Assume alphabet {a,b,c}. The code for aabbaabb is 0 0 1 1 3 5. The decoding starts with dictionary a:0, b:1, c:2

  42. Moving along The first 4 code words are already in D. 0 0 1 1 3 5 and produce output a a b b. As we go along, we extend D: a:0, b:1, c:2, aa:3, ab:4, bb:5 For the rest we get a a b b a a b b

  43. Done We have also added to D: ba:6, aab:7 But these entries are never used. Everything is easy, since there is already an entry in D for each code number when we encounter it.
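
The easy case described here, where every code is already in D when it arrives, can be sketched as below. The name `lzw_decode_simple` is made up for this illustration, and the sketch deliberately does nothing about codes that are missing from D.

```python
def lzw_decode_simple(codes, alphabet):
    """Decode codes, assuming each code is already in D when it is read."""
    words = list(alphabet)                      # words[code] is the word for that code
    if not codes:
        return "", words
    previous = words[codes[0]]
    output = [previous]
    for code in codes[1:]:
        current = words[code]                   # IndexError if the code is not in D yet
        words.append(previous + current[0])     # the entry the sender added at this step
        output.append(current)
        previous = current
    return "".join(output), words

text, words = lzw_decode_simple([0, 0, 1, 1, 3, 5], "abc")
print(text)                                     # aabbaabb
print(words[3:])                                # ['aa', 'ab', 'bb', 'ba', 'aab']
```

The receiver's dictionary always lags one entry behind the sender's, because the first symbol of the next word is only known once the next code has been decoded.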

  44. Is this it? Unfortunately, no. It may happen that we run into a code word without having an appropriate entry in D. But, it can only happen in very special circumstances, and we can manufacture the missing entry.

  45. A Bad Run Consider input a a b b b a a ==> 0 0 1 5 3 After reading 0 0 1, D looks like this: a:0, b:1, c:2, aa:3, ab:4

  46. Disaster The next code is 5, but it’s not in D. a:0, b:1, c:2, aa:3, ab:4 How could this have happened? Can we recover?

  47. … narrowly averted • This problem only arises when • the input contains a substring … s w s w s … • s w s was just added to the dictionary. • Here s is a single symbol, but w a (possibly empty) word.

  48. … narrowly averted But then the fix is to output x + first(x) where x is the last decompressed word, and first(x) the first symbol of x. And, we also update the dictionary to contain this new entry.

  49. Example • In our example we had • s = b • w = empty The output and new dictionary word is bb.
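
A decoder that manufactures the missing entry as x + first(x) might look like the sketch below. The name `lzw_decode` is hypothetical; the check `code == len(words)` encodes the assumption that the only legitimate missing code is the very next one to be assigned.

```python
def lzw_decode(codes, alphabet):
    """Decode a list of LZW codes, manufacturing a missing entry when needed."""
    words = list(alphabet)                      # words[code] is the word for that code
    if not codes:
        return ""
    previous = words[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code < len(words):                   # normal case: the entry is already in D
            current = words[code]
        elif code == len(words):                # special case: entry not in D yet
            current = previous + previous[0]    # x + first(x)
        else:
            raise ValueError(f"invalid code {code}")
        words.append(previous + current[0])     # in the special case this is current itself
        output.append(current)
        previous = current
    return "".join(output)

print(lzw_decode([0, 0, 1, 5, 3], "abc"))       # aabbbaa
```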

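  50. Another Example aabbbaabbaaabaababb ==> 0 0 1 5 3 6 7 9 5 Decoding (dictionary size: initial 3, final 11). A "+" means the code was already in the dictionary, a "-" means the entry had to be manufactured.

      output   in D?   code   new entry
      a                0
      a        +       0      aa
      b        +       1      ab
      bb       -       5      bb
      aa       +       3      bba
      bba      +       6      aab
      aab      +       7      bbaa
      aaba     -       9      aaba
      bb       +       5      aabab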
