410 likes | 1.09k Views
Lempel-Ziv Compression Techniques. Introduction to Run-Length Encoding Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 Example 1: Encoding using LZ78 Example 2: Decoding using LZ78. Introduction to Run-Length Encoding.
E N D
Lempel-Ziv Compression Techniques • Introduction to Run-Length Encoding • Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 • Example 1: Encoding using LZ78 • Example 2: Decoding using LZ78
Introduction to Run-Length Encoding • A run is defined as a sequence of identical characters. Run-length encoding assumes data has many runs. • Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency. • Data to be transmitted is first scanned to find the coding information before transmission. • For example, the message AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD • is replaced by 4A3BAA5B8CDABCB3A4B3CD • Saving can be dramatic when long runs are present.
Run-Length Encoding: Possible Problems • A problem arises if one of the characters being transmitted is a digit, as in 11111111111544444 which is represented as 1111554 (eleven 1s, one 5, five 4s). • A solution is instead of using a number n, a character can be used whose ASCII value is n. • For example, the run of 43 consecutive letters “c” is represented as +c (“+” has ASSCII code 43), and the run of 49 1s is coded as 11 (“1” has ASCII code 49). • This method is only modestly efficient for text files and widely used for encoding fax images. • The main advantage of this algorithm is simplicity and speed.
Introduction to Lempel-Ziv Encoding • Data compression up until the late 1970's mainly directed towards creating better methodologies for Huffman coding. • An innovative, radically different method was introduced in1977 by Abraham Lempel and Jacob Ziv. • This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78. • Due to patents, LZ77 and LZ78 led to many variants: • The zip and unzip use the LZH techique while UNIX's compress methods belong to the LZW and LZC classes.
LZ78 Compression Algorithm Start with empty dictionary P = empty WHILE (there more characters in the char stream) C = next character in the char stream IF (the string P+C in the dictionary) P = P+C ELSE //(if P is empty,zero is its codeword) output the code word corresponding to P output C add the string P+C to the dictionary P = empty IF (P is not empty) output the code word corresponding to P END
Example 1: LZ78 Compression 1 Example 1: Use the LZ78 algorithm to encode the message ABBCBCABABCAABCAAB Solution: The encoding process is presented below in which: • The column STEP indicates the number of the encoding step. • The column POS indicates the current position in the input data. • The column DICTIONARY shows what string has been added to the dictionary. The index of the string is equal to the step number. • The column OUTPUT presents the output in the form (W,C). W represents the index of prefix in the dictionary. • The output of each step decodes to the string that has been added to the dictionary.
Example 1: LZ78 Compression Step 1 ABBCBCABABCAABCAAB POS = 1 C = empty P = empty
Example 1: LZ78 Compression Step 2 ABBCBCABABCAABCAAB POS = 2 C = empty P = empty
Example 1: LZ78 Compression Step 3 ABBCBCABABCAABCAAB POS = 3 C = empty P = empty
Example 1: LZ78 Compression Step 4 ABBCBCABABCAABCAAB POS = 5 C = empty P = empty
Example 1: LZ78 Compression Step 5 ABBCBCABABCAABCAAB POS = 8 C = empty P = empty
Example 1: LZ78 Compression Step 6 ABBCBCABABCAABCAAB POS = 10 C = A P = BCA
Example 1: LZ78 Compression Step 7 ABBCBCABABCAABCAAB POS = 14 C = empty P = empty
Example 1: Coded Information in Bits • Now, we calculate the number of bits needed to represent the coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> • The number of bits needed to represent each integer with indexn is at most equal to the number of bits needed to represent the (n-1)th index. • For example, the number of bits needed to represent the integer 3 within index 4 is equal to 2, because it takes two bits to express 3 (the (4-1)th index) in binary. • Thus, we see that the number of bits required when the string ABBCBCABABCAABCAAB is compressed is (1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits
LZ78 Decompression Algorithm At the start the dictionary is empty While (more code words in the code stream) W = code word C = character following it output the string of W + C add the string of W + C to the dictionary end
Example 2: LZ78 Decompression Example 2: Use the above algorithm to decompress the compressed output sequence of Example 1, namely, <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
Example 2: LZ78 Decompression Step 1 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 2 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 3 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 4 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 5 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 6 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
Example 2: LZ78 Decompression Step 7 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty
LZ78: Concluding Remarks • LZ77 is based on a sliding window. Its output is very similar to that LZ78. • The compression ratio of LZ77 is similar to that of LZ78. • The biggest advantage of LZ78 over the LZ77 algorithm is the reduced number of string comparisons in each encoding step. • More application areas of LZ77 and LZ78: • LZ77: gzip, Squeeze, LHA, PKZIP, ZOO • LZ78: compress, GIF, CCITT (modems), ARC, PAK
Exercises • How many bits are required to represent the codeword (2,C) generated on page 11? How many bits are required to represent the associated string without compression? • Use LZ78 to trace encoding the string SATATASACITASA. • Write a Java program that encodes a given string using LZ78. • Write a Java program that decodes a given set of encoded code words using LZ78.