1 / 29

CS 253: Algorithms

CS 253: Algorithms. LZW Data Compression. Data Compression. Reduce the size of data Reduces storage space and hence the cost Reduces time to retrieve and transmit data Compression ratio = original data size / compressed data size. Lossless and Lossy Compression.

shasta
Download Presentation

CS 253: Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 253: Algorithms LZW Data Compression

  2. Data Compression • Reduce the size of data • Reduces storage space and hence the cost • Reduces time to retrieve and transmit data Compression ratio = original data size / compressed data size

  3. Lossless and Lossy Compression • compressedData = compress(originalData) • decompressedData= decompress(compressedData) • Lossless compression  originalData = decompressedData • Lossy compression  originalDatadecompressedData

  4. Lossless and Lossy Compression • Lossy compressors generally obtain much higher compression ratios as compared to lossless compressors e.g. 100 vs. 2 • Lossless compression is essential in applications such as text file compression. • Lossycompression is acceptable in many audio and imaging applications • In video transmission, a slight loss in the transmitted video is not noticableby the human eye.

  5. Text Compression • Lossless compression is essential • Popular text compressors such as zip and Unix’s compress are based on the LZW(Lempel-Ziv-Welch) method. LZW Compression • Character sequences in the original text are replaced by codes that are dynamically determined. • The code table is not encoded into the compressed text, because it may be reconstructed from the compressed text during decompression.

  6. code 0 1 key a b LZW Compression • Assume the letters in the text are limited to {a, b} • In practice, the alphabet may be the 256 character ASCII set. • The characters in the alphabet are assigned code numbers beginning at 0 • The initial code table is:

  7. code 0 1 key a b LZW Compression • Original text = abababbabaabbabbaabba • Compression is done by scanning the original text from left to right. • Find longest prefix p for which there is a code in the code table. • Represent p by its code pCode and assign the next available code number to pc, where c is the next character in the text to be compressed.

  8. 2 ab code 0 1 key a b LZW Compression • Original text =abababbabaabbabbaabba • p = a pCode= 0 Compressed text = 0 • c = b • Enter p | c = abinto the code table.

  9. 3 ba code 0 1 2 key a b ab • Original text = abababbabaabbabbaabba • Compressed text = 0 • p = b pCode= 1 Compressed text = 01 • c = a • Enter p | c = bainto the code table.

  10. 4 aba code 0 1 2 key a b ab 3 ba • Original text = abababbabaabbabbaabba • Compressed text = 01 • p = abpCode= 2 Compressed text = 012 • c = a • Enteraba into the code table

  11. 5 abb code 0 1 2 key a b ab 3 4 ba aba • Original text = abababbabaabbabbaabba • Compressed text = 012 • p = abpCode= 2 Compressed text = 0122 • c = b • Enterabbinto the code table.

  12. 6 bab code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 0122 • p = bapCode= 3 Compressed text = 01223 • c = b • Enterbabinto the code table.

  13. 7 6 baa bab code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 01223 • p = bapCode= 3 Compressed text = 012233 • c = a • Enterbaa into the code table.

  14. 6 7 8 baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 012233 • p = abbpCode= 5 Compressed text = 0122335 • c = a • Enterabbainto the code table.

  15. 7 6 9 8 abbaa baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 0122335 • p = abbapCode= 8 Compressed text = 01223358 • c = a • Enterabbaainto the code table.

  16. 7 6 9 8 abbaa baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 01223358 • p = abbapCode= 8 Compressed text = 012233588 • c = null • No need to enter anything to the table

  17. 8 9 7 6 abbaa abba baa bab code 0 1 2 key a b ab Code Table Representation 3 4 5 • Dictionary. • Pairs are (key, element) = (key, code). • Operations are: get(key) and put(key, code) • Limit number of codes to212 • Use a hash table • Convert variable length keys into fixed length keys. • Each key has the form pc, where the string p is a key that is already in the table. • Replace pc with (pCode)c ba aba abb

  18. 8 7 6 9 9 8 7 6 8a bab baa abba 3a 5a abbaa 3b 3 4 5 ba aba abb 3 4 5 1a 2a 2b code code 0 0 1 1 2 2 key key a a b b ab 0b Code Table Representation

  19. code 0 1 key a b LZW Decompression • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • Convert codes to text from left to right. • pCode=0representsp=aDecompressedText= a

  20. code 0 1 2 key a b ab LZW Decompression • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode=1representsp = bDecompressedText= ab • lastP = a followed by first character of p is entered into the code table (a | b)

  21. code 0 1 2 3 key a b ab ba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 2representsp=abDecompressedText= abab • lastP= b followed by first character of p is entered into the code table (b | a)

  22. code 0 1 2 3 key a b ab ba 4 aba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 2representsp = ab DecompressedText= ababab. • lastP= ab followed by first character of p is entered into the code table ( ab | a )

  23. code 0 1 2 3 key a b ab ba 4 5 aba abb • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 3representsp = baDecompressedText= abababba. • lastP= ab followed by first character of p is entered into the code table

  24. code 0 1 2 3 key a b ab ba 4 5 6 aba abb bab • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 3represents p = baDecompressedText= abababbaba. • lastP= bafollowed by first character of p is entered into the code table.

  25. code 0 1 2 3 key a b ab ba 4 5 6 7 aba abb bab baa • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 5representsp = abbDecompressedText= abababbabaabb. • lastP= bafollowed by first character of p is entered into the code table.

  26. code 0 1 2 3 key a b ab ba 4 5 6 7 8 aba abb bab baa abba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • 8represents ??? • When a code is not in the table (then, it is the last one entered), and its key is lastP followed by first character of lastP • lastP = abb • So 8 represents p = abbaDecompressed text = abababbabaabbabba

  27. code 0 1 2 3 key a b ab ba 4 5 6 7 9 aba abb bab baa abbaa 8 • Original text = abababbabaabbabbaabba • Compressed text = 012233588 abba • pCode= 8representsp=abbaDecompressedText=abababbabaabbabbaabba. • lastP= abbafollowed by first character of p is entered into the code table ( abba | a )

  28. 8 9 7 6 abbaa abba baa bab code 0 1 2 key a b ab Code Table Representation 3 4 5 • Dictionary. • Pairs are (code, what the code represents) = (code, key). • Operations are : get(code, key) andput(key) • Code is an integer 0, 1, 2, … • Use a 1D array codeTable. E.g.codeTable[code] = Key • Each keyhas the form pc, where the string p is a codeKeythat is already in the table; therefore you may replace pc with (pCode)c ba aba abb

  29. Time Complexity • Compression Expected time = O(n) where n is the length of the text that is being compressed. • Decompression O(n), where n is the length of the decompressed text.

More Related