Dictionary Techniques. Split the input into two classes: frequently and infrequently occurring patterns. Keep a list, or DICTIONARY, of frequently occurring patterns and encode them with a reference to the dictionary. Encode the others less efficiently.
Dictionary Techniques • Split the input into two classes: frequently and infrequently occurring patterns. • Keep a list, or DICTIONARY, of frequently occurring patterns and encode them with a reference to the dictionary. • Encode the other patterns less efficiently.
Dictionary techniques • The size of the dictionary must be much smaller than the number of all possible patterns. • Useful with sources that generate a relatively small number of patterns, such as text sources and computer commands. • Effective for skewed alphabets.
Dictionary techniques • Depending on how much knowledge of the source is available, there are static and adaptive dictionary techniques.
Static Dictionary • The dictionary is permanent (or allows additions, but not deletions). • Application-specific or data-specific. • Example 1: digram coding for text compression, with a dictionary of frequent letter pairs: be, th, ie, ch, sh, ar, or, en, …
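Digram coding can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the function names, the 27-symbol alphabet (lowercase letters plus space), and the greedy "try the pair first" rule are assumptions made for the example. The dictionary holds every single symbol plus the listed digrams, and each emitted index stands for one or two characters.

```python
# Hypothetical helper names; the slides only list the digrams.
def build_table(alphabet, digrams):
    """Map each single symbol and each digram to a dictionary index."""
    entries = list(alphabet) + list(digrams)
    return {entry: index for index, entry in enumerate(entries)}


def digram_encode(text, table):
    """Greedy encoding: use a digram entry when the next two symbols form one."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in table:
            codes.append(table[pair])
            i += 2
        else:
            # Raises KeyError if the symbol is outside the assumed alphabet.
            codes.append(table[text[i]])
            i += 1
    return codes


def digram_decode(codes, table):
    """Invert the table and concatenate the decoded entries."""
    entries = {index: entry for entry, index in table.items()}
    return "".join(entries[c] for c in codes)


table = build_table("abcdefghijklmnopqrstuvwxyz ",
                    ["be", "th", "ie", "ch", "sh", "ar", "or", "en"])
codes = digram_encode("the chef", table)
print(codes)  # 6 indices for 8 characters
```

With this table, "the chef" is covered by six indices because "th" and "ch" are each replaced by a single dictionary reference.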
Example of static dictionary • Let the source alphabet be A = {a, b, …, z, ., ,, !, ?, :, ;}, of size 32. • For 4-character words, there are 32^4 = 2^20 patterns. • Thus, fixed-length coding needs 20 bits/word.
Example of static dictionary • Put the 256 = 2^8 most frequently occurring patterns into a dictionary. • If a pattern is in the dictionary: (1-bit flag) + (8-bit index) = 9 bits • else: (1-bit flag) + (20-bit code) = 21 bits
Example of static dictionary • L = 9p + 21(1 - p) = 21 - 12p bits/word, where p is the probability that a pattern is in the dictionary. • L < 20 if p > 1/12 ≈ 0.0833. • The pattern distribution must be skewed (p large) to get high compression.
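The flag-plus-index scheme above can be sketched in a few lines. This is an illustration under assumptions: the flag polarity (0 for a dictionary hit, 1 for a miss), the symbol order inside `ALPHABET`, the function name `encode_word`, and the two sample dictionary entries are all made up for the example; the slides fix only the bit counts.

```python
# Assumed symbol order; the slides only state that the alphabet has 32 symbols.
ALPHABET = "abcdefghijklmnopqrstuvwxyz.,!?:;"  # 32 symbols, 5 bits each


def encode_word(word, dictionary):
    """Return the bit string for one 4-character word.

    dictionary maps a word to an index in 0..255 (8 bits).
    Flag polarity (0 = hit, 1 = miss) is an assumption of this sketch.
    """
    if word in dictionary:
        return "0" + format(dictionary[word], "08b")          # 1 + 8 = 9 bits
    raw = "".join(format(ALPHABET.index(ch), "05b") for ch in word)
    return "1" + raw                                           # 1 + 20 = 21 bits


# Hypothetical dictionary contents, for illustration only.
sample_dictionary = {"that": 0, "with": 1}
print(encode_word("with", sample_dictionary))  # 9-bit codeword
print(encode_word("zzzz", sample_dictionary))  # 21-bit codeword
```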
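The average length formula and its break-even point can be checked numerically. The function names below are illustrative; the bit counts (9, 21, and 20) are the ones from the example above.

```python
def average_bits_per_word(p, hit_bits=9, miss_bits=21):
    """Expected code length when a word is in the dictionary with probability p."""
    return hit_bits * p + miss_bits * (1 - p)


def break_even_probability(fixed_bits=20, hit_bits=9, miss_bits=21):
    """Smallest p at which the dictionary scheme matches fixed-length coding."""
    return (miss_bits - fixed_bits) / (miss_bits - hit_bits)


print(break_even_probability())       # 1/12, about 0.0833
print(average_bits_per_word(0.5))     # 15.0 bits/word
```

At p = 0.5 the scheme already saves 5 bits per word over the 20-bit fixed-length code, and the saving grows linearly with p.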
Adaptive Dictionary-Based Techniques • Jacob Ziv and Abraham Lempel: LZ techniques • And the contribution of Terry Welch: the LZW algorithm
IDEA • Adapt to the characteristics of the source. • The dictionary is a portion of the previously encoded sequence. • Start with an empty dictionary. • Add entries as they are found in the input stream.
LZ77 • [Figure: search pointer; W; S; previously encoded sequence; next portion of the sequence] • A search pointer is moved back through the search buffer, which contains a portion of the recently encoded sequence, to match a pattern or a symbol in the look-ahead buffer.
LZ77 • The encoder searches the search buffer for the longest matching pattern and sends Code = (Offset, Max_match_length, New_symbol), where: • Offset is the distance from the pointer to the matched pattern. • Max_match_length is the number of symbols in the string found in the search buffer that are identical to those at the beginning of the look-ahead buffer. • New_symbol is the code of the next symbol after the matched pattern.
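The triple-based encoder can be sketched as below. This is a minimal illustration under assumptions, not a reference implementation: the offset is counted back from the current position, ties are resolved in favor of the closest match, a match may run on into the look-ahead buffer, and one symbol is always held back so that every triple ends with a literal. The function names and default buffer sizes are chosen for the example.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Encode a string as (offset, match_length, new_symbol) triples."""
    triples = []
    pos = 0
    while pos < len(data):
        # Hold one symbol back so the triple always carries a new symbol.
        max_len = min(lookahead_size, len(data) - pos) - 1
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_len:  # strict: keeps the closest of equal matches
                best_offset, best_len = offset, length
        triples.append((best_offset, best_len, data[pos + best_len]))
        pos += best_len + 1
    return triples


def lz77_decode(triples):
    """Rebuild the string by copying from already decoded output."""
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # symbol-by-symbol, so overlaps work
        out.append(symbol)
    return "".join(out)


triples = lz77_encode("cabracadabrarrarrad")
assert lz77_decode(triples) == "cabracadabrarrarrad"
```

Copying one symbol at a time in the decoder is what lets a match overlap the text it is producing, as in a run of repeated symbols.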
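Length • If the size of the search buffer is S, the size of the window (search buffer plus look-ahead buffer) is W, and the size of the source alphabet is A, then the number of bits needed to encode the triple using fixed-length codes is log2 S + log2 W + log2 A, with each term rounded up to a whole number of bits.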
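The fixed-length cost of a triple is easy to tabulate. The sketch below assumes each logarithm is rounded up to a whole bit and that W is the combined size of the search and look-ahead buffers; the function names and the sample sizes are illustrative.

```python
def ceil_log2(n):
    """Integer ceiling of log2(n) for n >= 1, without floating point."""
    return (n - 1).bit_length()


def triple_bits(search_size, window_size, alphabet_size):
    """Bits per (offset, length, symbol) triple with fixed-length fields."""
    return (ceil_log2(search_size)
            + ceil_log2(window_size)
            + ceil_log2(alphabet_size))


# Search buffer 7, look-ahead 6 (window 13), 32-symbol alphabet.
print(triple_bits(7, 13, 32))        # 3 + 4 + 5 = 12 bits
# Larger buffers allow longer matches but make every triple more expensive.
print(triple_bits(4096, 4112, 32))   # 12 + 13 + 5 = 30 bits
```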
Example: • Search buffer of size 7, look-ahead buffer of size 6 • No match is found in the search buffer, so the encoder sends <0, 0, c(d)>, where c(d) is the code of the symbol d.
Analysis of LZ77 • LZ77 assumes that patterns in the input stream occur close together. • Any pattern that recurs over a period longer than the search buffer size will not be captured. • A better compression method would save frequently occurring patterns in the dictionary. • The size L of the look-ahead buffer is limited. • The size S of the search buffer is limited.
Analysis of LZ77 • Increasing L (or S) makes longer matches possible, which tends to increase compression efficiency. • But searching for longer matches reduces the speed. • Larger buffers also need more bits for the offset and length fields, so beyond some point compression efficiency drops.
Improvements of LZ77 • Encode the triples using variable-length codes (VLC), e.g. PKZIP, ZIP, LHarc, PNG, ARJ, WinZip. • LZSS: encode two fields instead of three. • Use a flag bit to indicate whether what follows is the codeword for a new symbol. • For example: 0 for single characters, 1 for (offset, length) pairs.
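The LZSS variant can be sketched as a stream of flagged tokens. This is an illustration under assumptions: a token is either `(0, symbol)` for a single character or `(1, offset, length)` for a match, and a match is used only when it is at least `min_match` symbols long. The threshold of 2, the function names, and the buffer sizes are choices made for the example, not values given in the slides.

```python
def lzss_encode(data, search_size=7, lookahead_size=6, min_match=2):
    """Encode a string as flagged tokens: (0, symbol) or (1, offset, length)."""
    tokens = []
    pos = 0
    while pos < len(data):
        max_len = min(lookahead_size, len(data) - pos)
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_offset, best_len = offset, length
        if best_len >= min_match:
            tokens.append((1, best_offset, best_len))  # flag 1: match
            pos += best_len
        else:
            tokens.append((0, data[pos]))              # flag 0: literal
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Replay literals and copy matches from the decoded output."""
    out = []
    for token in tokens:
        if token[0] == 0:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)


tokens = lzss_encode("abababab")
print(tokens)  # [(0, 'a'), (0, 'b'), (1, 2, 6)]
assert lzss_decode(tokens) == "abababab"
```

Dropping the third field means an unmatched symbol costs only a flag plus the symbol, instead of a full triple with a zero offset and zero length.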