Learn about Lempel-Ziv (LZ) encoding, compression, and decompression in multimedia technologies. Understand the dictionary-based encoding method and practice examples for better comprehension.
P.E.S's Modern College of Engineering, Pune Lecture Notes on MULTIMEDIA TECHNOLOGIES By Ajeet Pathak Department of Information Technology
LEMPEL ZIV (LZ) ENCODING Lempel Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based encoding. The idea is to create a dictionary (a table) of strings used during the communication session. If both the sender and the receiver have a copy of the dictionary, then previously-encountered strings can be substituted by their index in the dictionary to reduce the amount of information transmitted. UNIT I INTRODUCTION TO MULTIMEDIA
LEMPEL ZIV (LZ) ALGORITHM - COMPRESSION
Compression involves two concurrent activities: building an indexed dictionary and compressing a string of symbols. The algorithm extracts, from the remaining uncompressed string, the smallest substring that cannot be found in the dictionary. It then stores a copy of this substring in the dictionary as a new entry and assigns it an index value. Compression occurs when the substring, except for its last character, is replaced with the index of that prefix in the dictionary (index 0 denotes the empty prefix). The process then inserts the index and the last character of the substring into the compressed string.
LEMPEL ZIV (LZ) ALGORITHM - COMPRESSION
• It is dictionary-based encoding.
• Basic idea:
  - Create a dictionary (a table) of strings used during communication.
  - If both sender and receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary.
LEMPEL ZIV (LZ) ALGORITHM - COMPRESSION
• Two concurrent activities:
  - Building an indexed dictionary
  - Compressing a string of symbols
• Algorithm:
  - Extract the smallest substring of the remaining uncompressed string that cannot be found in the dictionary.
  - Store that substring in the dictionary as a new entry and assign it an index value.
  - Replace the substring, except for its last character, with the index of that prefix in the dictionary.
  - Insert the index and the last character of the substring into the compressed string.
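The compression steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the notes' own implementation: the function name `lz78_compress` is hypothetical, and the choice of an empty character in the final pair when the input ends inside a known phrase is an assumption that follows the (2, ) and (3, ) pairs used in the later examples.

```python
def lz78_compress(text):
    """Return a list of (index, char) pairs; index 0 means 'empty prefix'."""
    dictionary = {}   # phrase -> index, indices start at 1
    output = []
    phrase = ""       # longest match found so far
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate          # keep extending the match
        else:
            # Smallest substring not in the dictionary: emit (prefix index, last char)
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no character
        output.append((dictionary[phrase], ""))
    return output


print(lz78_compress("ABBCBCABABCAABCAAB"))
```

Keeping the dictionary as a mapping from phrase to index makes the "is it in the dictionary?" test a single lookup.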
LEMPEL ZIV (LZ) ALGORITHM - COMPRESSION
Example 1: Compression (worked figure not reproduced here)
LEMPEL ZIV (LZ) ALGORITHM - DECOMPRESSION
Decompression is the inverse of the compression process. The process extracts the substrings from the compressed string and replaces each index with the corresponding entry in the dictionary, which is empty at first and built up gradually. The key point is that whenever an index is received, the dictionary already contains the entry corresponding to that index.
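A small Python sketch of this decompression process, assuming the compressed string is a list of (index, character) pairs in which index 0 stands for the empty prefix and the last pair may carry an empty character. The name `lz78_decompress` is hypothetical.

```python
def lz78_decompress(pairs):
    """Rebuild the text from (index, char) pairs, growing the dictionary as we go."""
    dictionary = {0: ""}   # index 0 stands for the empty prefix
    pieces = []
    for index, ch in pairs:
        # The referenced entry always exists already, because the
        # compressor created it before it could emit this index.
        phrase = dictionary[index] + ch
        pieces.append(phrase)
        dictionary[len(dictionary)] = phrase   # next free index: 1, 2, 3, ...
    return "".join(pieces)


pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_decompress(pairs))
```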
LEMPEL ZIV (LZ) ALGORITHM - DECOMPRESSION
Example 1: Decompression (worked figure not reproduced here)
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 2
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ algorithm.
The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 2 EXPLANATION
1. A is not in the Dictionary; insert it.
2. B is not in the Dictionary; insert it.
3. B is in the Dictionary. BC is not in the Dictionary; insert it.
4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it.
5. B is in the Dictionary. BA is not in the Dictionary; insert it.
6. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.
7. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it.
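The seven steps of this explanation can be reproduced with a short Python sketch that lists the dictionary entries in the order they are inserted. The helper name `lz78_dictionary` is hypothetical and exists only for illustration.

```python
def lz78_dictionary(text):
    """Return the phrases in the order they receive indices 1, 2, 3, ..."""
    seen = set()
    phrases = []
    phrase = ""
    for ch in text:
        phrase += ch
        if phrase not in seen:
            # First time this substring appears: it becomes a new entry
            seen.add(phrase)
            phrases.append(phrase)
            phrase = ""
    return phrases


for index, phrase in enumerate(lz78_dictionary("ABBCBCABABCAABCAAB"), start=1):
    print(index, phrase)
```

Each printed line corresponds to one numbered step: the entry shown is the substring that was "not in the Dictionary" at that step.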
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 3
Encode (i.e., compress) the string BABAABRRRA using the LZ78 algorithm.
The compressed message is: (0,B)(0,A)(1,A)(2,B)(0,R)(5,R)(2, )
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 3 EXPLANATION
1. B is not in the Dictionary; insert it.
2. A is not in the Dictionary; insert it.
3. B is in the Dictionary. BA is not in the Dictionary; insert it.
4. A is in the Dictionary. AB is not in the Dictionary; insert it.
5. R is not in the Dictionary; insert it.
6. R is in the Dictionary. RR is not in the Dictionary; insert it.
7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, )
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 4
Encode (i.e., compress) the string AAAAAAAAA using the LZ78 algorithm.
The compressed message is: (0,A)(1,A)(2,A)(3, )
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 4 EXPLANATION
1. A is not in the Dictionary; insert it.
2. A is in the Dictionary. AA is not in the Dictionary; insert it.
3. A is in the Dictionary. AA is in the Dictionary. AAA is not in the Dictionary; insert it.
4. A is in the Dictionary. AA is in the Dictionary. AAA is in the Dictionary and it is the last pattern; output a pair containing its index: (3, )
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 5
Decode (i.e., decompress) the sequence (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
The decompressed message is: ABBCBCABABCAABCAAB
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 6
Decode (i.e., decompress) the sequence (0, B) (0, A) (1, A) (2, B) (0, R) (5, R) (2, )
The decompressed message is: BABAABRRRA
LEMPEL ZIV (LZ) ALGORITHM
EXAMPLE 7
Decode (i.e., decompress) the sequence (0, A) (1, A) (2, A) (3, )
The decompressed message is: AAAAAAAAA
LEMPEL ZIV WELCH (LZW) ALGORITHM
• A dictionary that is indexed by "codes" is used.
• The dictionary is assumed to be initialized with 256 entries (indexed 0 through 255) representing the ASCII table.
• The compression algorithm assumes that the input is a file or a buffer and that the output is either a file or a communication channel.
• Conversely, the decompression algorithm assumes that the input is a file or a communication channel and the output is a file or a buffer.
Figure: file/buffer -> Compression -> compressed file / communication channel -> Decompression -> file/buffer
LEMPEL ZIV WELCH (LZW) ALGORITHM
• Universal coding schemes, like LZW, do not require advance knowledge of the data and can build such knowledge on the fly.
• LZW is the foremost technique for general-purpose data compression due to its simplicity and versatility.
• It is the basis of many PC utilities that claim to "double the capacity of your hard drive".
• LZW compression uses a code table, with 4096 as a common choice for the number of table entries.
LEMPEL ZIV WELCH (LZW) ALGORITHM
• Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
• When encoding begins, the code table contains only the first 256 entries, with the remainder of the table being blank.
• Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
• As the encoding continues, LZW identifies repeated sequences in the data and adds them to the code table.
• Decoding is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents.
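The starting state of the code table described above can be sketched in Python as follows. The names `MAX_ENTRIES` and `initial_code_table` are hypothetical, and the 4096-entry limit is simply the common choice mentioned in these notes.

```python
MAX_ENTRIES = 4096  # common table size quoted in the notes (12-bit codes)


def initial_code_table():
    """Map every single byte to its own code, 0 through 255."""
    return {bytes([value]): value for value in range(256)}


table = initial_code_table()
next_code = len(table)                 # first code free for a byte sequence
free_slots = MAX_ENTRIES - next_code   # entries still blank when encoding begins
print(next_code, free_slots)
```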
LEMPEL ZIV WELCH (LZW) ALGORITHM
• Where is the compression?
  - Original string to encode: ^WED^WE^WEE^WEB^WET
  - Encoded string: ^WED<256>E<260><261><257>B<260>T
  - Plain ASCII coding of the string: 19 * 8 bits = 152 bits
  - LZW coding of the string: 12 * 9 bits = 108 bits (7 symbols and 5 codes, each of 9 bits)
• Why 9 bits?
  - A single character (one byte) has a value ranging from 0 to 255.
  - All tokens have a fixed length.
  - There has to be a distinction in representation between a single character and a code (assigned to strings of length 2 or more).
  - Codes can only have values 256 and above.
• 9-bit token layout: leading bit 0 <- ASCII characters (0 to 255); leading bit 1 <- codes (256 to 511)
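The bit counts quoted above can be checked with a few lines of Python. The token list is transcribed from the encoded string on this slide; the variable names are only illustrative.

```python
original = "^WED^WE^WEE^WEB^WET"
# Encoded string from the slide: characters as str, dictionary codes as int
tokens = ["^", "W", "E", "D", 256, "E", 260, 261, 257, "B", 260, "T"]

plain_bits = len(original) * 8   # 8 bits per character
lzw_bits = len(tokens) * 9       # every token is 9 bits wide
symbols = sum(isinstance(t, str) for t in tokens)
codes = len(tokens) - symbols

print(plain_bits, lzw_bits, symbols, codes)  # 152 108 7 5
```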
LEMPEL ZIV WELCH (LZW) ALGORITHM
• With 9 bits we can have a maximum of 256 codes for strings of length 2 or above (the first 256 entries are reserved for ASCII characters).
• Original LZW uses a dictionary with 4K entries, with each symbol/code being 12 bits long.
• 12-bit token layout: leading four bits 0000 <- ASCII characters (entries 0 to 255); at least one of the leading four bits set <- codes (entries 256 to 4095)
• With 12 bits, we can have a maximum of 2^12 - 256 = 3840 codes.
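The relationship between token width and the number of codes left for multi-character strings can be written as a one-line Python function. The name `multi_char_codes` is hypothetical; it assumes, as the notes do, that the first 256 values are reserved for single characters.

```python
def multi_char_codes(width_bits, single_char_entries=256):
    """Codes left for strings of length >= 2 with fixed-width tokens."""
    return 2 ** width_bits - single_char_entries


for width in (9, 10, 11, 12):
    print(width, multi_char_codes(width))
```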
LEMPEL ZIV WELCH (LZW) ALGORITHM
• Approaches for practical implementation:
  - Flush the dictionary periodically
    - no wasted codes
  - Grow the length of the codes as the algorithm proceeds
    - Start with a code length of 9 bits.
    - Once we run out of codes, increase the length to 10 bits; when the 10-bit codes run out, increase to 11 bits, and so on.
    - more efficient
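The growing-code-length approach could be sketched like this in Python. The function name `code_width`, the 12-bit ceiling, and the exact moment of switching (as soon as the table holds more entries than the current width can address) are assumptions made for illustration; real formats differ in when they switch.

```python
def code_width(table_size, min_width=9, max_width=12):
    """Bits needed so that every code 0 .. table_size - 1 fits in one token."""
    width = min_width
    # Widen while the table holds more entries than 'width' bits can address
    while table_size > (1 << width) and width < max_width:
        width += 1
    return width


for size in (256, 512, 513, 1024, 1025, 4096):
    print(size, code_width(size))
```

Both compressor and decompressor would have to apply the same rule at the same point in the stream, since the width itself is not transmitted.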
LEMPEL ZIV WELCH (LZW) ALGORITHM
LZW Compression:
    set w = NIL
    loop
        read a character k
        if wk exists in the dictionary
            w = wk
        else
            output the code for w
            add wk to the dictionary
            w = k
    endloop
    output the code for w
The program reads one character at a time. If the work string extended by the new character is in the dictionary, the character is appended to the work string and the program waits for the next one. This is what happens on the first character, since every single character is already in the dictionary. If the extended string is not in the dictionary (as when the second character arrives), the program adds the extended string to the dictionary and sends over the wire (or writes to a file) the code assigned to the work string without the new character. It then sets the work string to the new character. When the input ends, the code for the remaining work string is output.
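The pseudocode above translates almost line for line into Python. In this sketch the function name `lzw_compress` is hypothetical, the dictionary is allowed to grow without the 4096-entry limit, and the output is a plain list of integer codes rather than packed fixed-length tokens; all three are simplifying assumptions.

```python
def lzw_compress(text):
    """Return the list of LZW codes for text (characters with code < 256)."""
    table = {chr(i): i for i in range(256)}   # codes 0-255: single characters
    w = ""                                    # current work string
    codes = []
    for k in text:
        wk = w + k
        if wk in table:
            w = wk                            # keep extending the work string
        else:
            codes.append(table[w])            # emit code for w (without k)
            table[wk] = len(table)            # new entry gets the next free code
            w = k
    if w:
        codes.append(table[w])                # flush the last work string
    return codes


print(lzw_compress("^WED^WE^WEE^WEB^WET"))
```

Codes below 256 in the result stand for single characters (for example 94 is ^), and codes from 256 upward are dictionary entries created during the run.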
LEMPEL ZIV WELCH (LZW) ALGORITHM
Input String: ^WED^WE^WEE^WEB^WET
    set w = NIL
    loop
        read a character k
        if wk exists in the dictionary
            w = wk
        else
            output the code for w
            add wk to the dictionary
            w = k
    endloop
    output the code for w
LEMPEL ZIV WELCH (LZW) ALGORITHM
• The LZW decompressor creates the same string table during decompression.
• It starts with the first 256 table entries initialized to single characters.
• The string table is updated for each code in the input stream, except the first one.
• Decoding is achieved by reading codes and translating them through the code table being built.
LEMPEL ZIV WELCH (LZW) ALGORITHM
LZW Decompression:
    read a fixed length token k (code or char)
    output k
    w = k
    loop
        read a fixed length token k (code or char)
        entry = dictionary entry for k
        output entry
        add w + first char of entry to the dictionary
        w = entry
    endloop
The nice thing is that the decompressor builds its own dictionary on its side, one that matches the compressor's exactly, so that only the codes need to be sent. One case needs extra care: if k is the code that is about to be added (it is not in the dictionary yet), then entry is w followed by the first character of w.
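A Python sketch of the decompression loop follows, assuming the input is a list of integer codes as produced by a compressor that starts from the same 256 single-character entries. The name `lzw_decompress` is hypothetical. Unlike the bare pseudocode, the sketch also handles a code that has not been entered in the table yet, which can happen when the compressor uses an entry immediately after creating it.

```python
def lzw_decompress(codes):
    """Rebuild the text from LZW codes, recreating the table on the fly."""
    table = {i: chr(i) for i in range(256)}   # codes 0-255: single characters
    if not codes:
        return ""
    w = table[codes[0]]                        # first token is always a character
    pieces = [w]
    for k in codes[1:]:
        if k in table:
            entry = table[k]
        elif k == len(table):
            entry = w + w[0]                   # code being defined in this very step
        else:
            raise ValueError(f"invalid code {k}")
        pieces.append(entry)
        table[len(table)] = w + entry[0]       # same entry the compressor added
        w = entry
    return "".join(pieces)


print(lzw_decompress([94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]))
```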
LEMPEL ZIV WELCH (LZW) ALGORITHM
Input String (to decode): ^WED<256>E<260><261><257>B<260>T
    read a fixed length token k (code or char)
    output k
    w = k
    loop
        read a fixed length token k (code or char)
        entry = dictionary entry for k
        output entry
        add w + first char of entry to the dictionary
        w = entry
    endloop
LEMPEL ZIV WELCH (LZW) ALGORITHM
Example 1: Compression using LZW
Use the LZW algorithm to compress the string BABAABAAA.
LEMPEL ZIV WELCH (LZW) ALGORITHM
Example 1: LZW Compression of BABAABAAA
Step 1: output <66>; add BA to the dictionary as <256>; P=A, C=empty
Step 2: output <65>; add AB as <257>; P=B, C=empty
Step 3: output <256>; add BAA as <258>; P=A, C=empty
Step 4: output <257>; add ABA as <259>; P=A, C=empty
Step 5: output <65>; add AA as <260>; P=A, C=A
Step 6: end of input; output <260>; P=AA, C=empty
LEMPEL ZIV WELCH (LZW) ALGORITHM
Example 2: LZW Decompression
Use LZW to decompress the output sequence of Example 1: <66><65><256><257><65><260>.
LEMPEL ZIV WELCH (LZW) ALGORITHM
Example 2: LZW Decompression of <66><65><256><257><65><260>
The first code <66> is output directly as B. The values below are those at the end of each step.
Step 1: New = 65, S = A, C = A; output A; add BA as <256>; Old = 65
Step 2: New = 256, S = BA, C = B; output BA; add AB as <257>; Old = 256
Step 3: New = 257, S = AB, C = A; output AB; add BAA as <258>; Old = 257
Step 4: New = 65, S = A, C = A; output A; add ABA as <259>; Old = 65
Step 5: New = 260, S = AA, C = A; <260> is not in the dictionary yet, so S is the previous string A followed by C; output AA; add AA as <260>; Old = 260
The decompressed message is: BABAABAAA
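The decompression steps of Example 2 can be replayed with a small Python tracer that records New, S, C and the dictionary entry added in each step. The function name `lzw_decode_trace` and the shape of the step records are hypothetical; the sketch assumes a non-empty code sequence whose first code is a single character.

```python
def lzw_decode_trace(codes):
    """Return (text, steps); each step is (new, s, c, added_entry)."""
    table = {i: chr(i) for i in range(256)}
    old = codes[0]
    text = table[old]          # first code is output directly
    c = text[0]
    steps = []
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            s = table[old] + c  # code not in the table yet: previous string + C
        text += s
        c = s[0]
        added = table[old] + c
        table[len(table)] = added
        steps.append((new, s, c, added))
        old = new
    return text, steps


text, steps = lzw_decode_trace([66, 65, 256, 257, 65, 260])
for number, step in enumerate(steps, start=1):
    print(number, step)
print(text)
```

The last step is the one where the received code 260 is not yet in the table, so S is built from the previous string and C.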
LEMPEL ZIV WELCH (LZW) ALGORITHM
LZW advantages/disadvantages
• Advantages
  - simple, fast, and good compression
  - can do compression in one pass
  - dynamic codeword table built for each file
  - decompression recreates the codeword table, so it does not need to be passed
• Disadvantages
  - not the optimum compression ratio
  - actual compression hard to predict