290 likes | 467 Views
CS 253: Algorithms. LZW Data Compression. Data Compression. Reduce the size of data Reduces storage space and hence the cost Reduces time to retrieve and transmit data Compression ratio = original data size / compressed data size. Lossless and Lossy Compression.
E N D
CS 253: Algorithms LZW Data Compression
Data Compression • Reduce the size of data • Reduces storage space and hence the cost • Reduces time to retrieve and transmit data Compression ratio = original data size / compressed data size
Lossless and Lossy Compression • compressedData = compress(originalData) • decompressedData= decompress(compressedData) • Lossless compression originalData = decompressedData • Lossy compression originalDatadecompressedData
Lossless and Lossy Compression • Lossy compressors generally obtain much higher compression ratios as compared to lossless compressors e.g. 100 vs. 2 • Lossless compression is essential in applications such as text file compression. • Lossycompression is acceptable in many audio and imaging applications • In video transmission, a slight loss in the transmitted video is not noticableby the human eye.
Text Compression • Lossless compression is essential • Popular text compressors such as zip and Unix’s compress are based on the LZW(Lempel-Ziv-Welch) method. LZW Compression • Character sequences in the original text are replaced by codes that are dynamically determined. • The code table is not encoded into the compressed text, because it may be reconstructed from the compressed text during decompression.
code 0 1 key a b LZW Compression • Assume the letters in the text are limited to {a, b} • In practice, the alphabet may be the 256 character ASCII set. • The characters in the alphabet are assigned code numbers beginning at 0 • The initial code table is:
code 0 1 key a b LZW Compression • Original text = abababbabaabbabbaabba • Compression is done by scanning the original text from left to right. • Find longest prefix p for which there is a code in the code table. • Represent p by its code pCode and assign the next available code number to pc, where c is the next character in the text to be compressed.
2 ab code 0 1 key a b LZW Compression • Original text =abababbabaabbabbaabba • p = a pCode= 0 Compressed text = 0 • c = b • Enter p | c = abinto the code table.
3 ba code 0 1 2 key a b ab • Original text = abababbabaabbabbaabba • Compressed text = 0 • p = b pCode= 1 Compressed text = 01 • c = a • Enter p | c = bainto the code table.
4 aba code 0 1 2 key a b ab 3 ba • Original text = abababbabaabbabbaabba • Compressed text = 01 • p = abpCode= 2 Compressed text = 012 • c = a • Enteraba into the code table
5 abb code 0 1 2 key a b ab 3 4 ba aba • Original text = abababbabaabbabbaabba • Compressed text = 012 • p = abpCode= 2 Compressed text = 0122 • c = b • Enterabbinto the code table.
6 bab code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 0122 • p = bapCode= 3 Compressed text = 01223 • c = b • Enterbabinto the code table.
7 6 baa bab code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 01223 • p = bapCode= 3 Compressed text = 012233 • c = a • Enterbaa into the code table.
6 7 8 baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 012233 • p = abbpCode= 5 Compressed text = 0122335 • c = a • Enterabbainto the code table.
7 6 9 8 abbaa baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 0122335 • p = abbapCode= 8 Compressed text = 01223358 • c = a • Enterabbaainto the code table.
7 6 9 8 abbaa baa bab abba code 0 1 2 key a b ab 3 4 5 ba aba abb • Original text = abababbabaabbabbaabba • Compressed text = 01223358 • p = abbapCode= 8 Compressed text = 012233588 • c = null • No need to enter anything to the table
8 9 7 6 abbaa abba baa bab code 0 1 2 key a b ab Code Table Representation 3 4 5 • Dictionary. • Pairs are (key, element) = (key, code). • Operations are: get(key) and put(key, code) • Limit number of codes to212 • Use a hash table • Convert variable length keys into fixed length keys. • Each key has the form pc, where the string p is a key that is already in the table. • Replace pc with (pCode)c ba aba abb
8 7 6 9 9 8 7 6 8a bab baa abba 3a 5a abbaa 3b 3 4 5 ba aba abb 3 4 5 1a 2a 2b code code 0 0 1 1 2 2 key key a a b b ab 0b Code Table Representation
code 0 1 key a b LZW Decompression • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • Convert codes to text from left to right. • pCode=0representsp=aDecompressedText= a
code 0 1 2 key a b ab LZW Decompression • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode=1representsp = bDecompressedText= ab • lastP = a followed by first character of p is entered into the code table (a | b)
code 0 1 2 3 key a b ab ba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 2representsp=abDecompressedText= abab • lastP= b followed by first character of p is entered into the code table (b | a)
code 0 1 2 3 key a b ab ba 4 aba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 2representsp = ab DecompressedText= ababab. • lastP= ab followed by first character of p is entered into the code table ( ab | a )
code 0 1 2 3 key a b ab ba 4 5 aba abb • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 3representsp = baDecompressedText= abababba. • lastP= ab followed by first character of p is entered into the code table
code 0 1 2 3 key a b ab ba 4 5 6 aba abb bab • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 3represents p = baDecompressedText= abababbaba. • lastP= bafollowed by first character of p is entered into the code table.
code 0 1 2 3 key a b ab ba 4 5 6 7 aba abb bab baa • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • pCode = 5representsp = abbDecompressedText= abababbabaabb. • lastP= bafollowed by first character of p is entered into the code table.
code 0 1 2 3 key a b ab ba 4 5 6 7 8 aba abb bab baa abba • Original text = abababbabaabbabbaabba • Compressed text = 012233588 • 8represents ??? • When a code is not in the table (then, it is the last one entered), and its key is lastP followed by first character of lastP • lastP = abb • So 8 represents p = abbaDecompressed text = abababbabaabbabba
code 0 1 2 3 key a b ab ba 4 5 6 7 9 aba abb bab baa abbaa 8 • Original text = abababbabaabbabbaabba • Compressed text = 012233588 abba • pCode= 8representsp=abbaDecompressedText=abababbabaabbabbaabba. • lastP= abbafollowed by first character of p is entered into the code table ( abba | a )
8 9 7 6 abbaa abba baa bab code 0 1 2 key a b ab Code Table Representation 3 4 5 • Dictionary. • Pairs are (code, what the code represents) = (code, key). • Operations are : get(code, key) andput(key) • Code is an integer 0, 1, 2, … • Use a 1D array codeTable. E.g.codeTable[code] = Key • Each keyhas the form pc, where the string p is a codeKeythat is already in the table; therefore you may replace pc with (pCode)c ba aba abb
Time Complexity • Compression Expected time = O(n) where n is the length of the text that is being compressed. • Decompression O(n), where n is the length of the decompressed text.