
Dictionary Techniques


wynona

Presentation Transcript


  1. Dictionary Techniques • Split the input patterns into two classes: frequently occurring and infrequently occurring. • Keep a list, or DICTIONARY, of frequently occurring patterns and encode them with a reference to the dictionary. • Encode the others less efficiently.

  2. Dictionary techniques • The size of the dictionary must be much smaller than the number of all possible patterns. • Useful with sources that generate a relatively small number of patterns, such as text sources and computer commands. • Effective for skewed alphabets.

  3. Dictionary techniques • Depending on how much prior knowledge of the source is available, there are static and adaptive dictionary techniques.

  4. Static Dictionary • The dictionary is permanent (or allows additions but not deletions). • Application-specific or data-specific. • Example 1: digram coding for text compression, with dictionary entries such as be, th, ie, ch, sh, ar, or, en, …
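
Digram coding as outlined on this slide can be sketched as follows. This is a minimal illustration, not the slide author's implementation: it assumes the dictionary holds every single symbol of a small alphabet followed by the listed digrams, and the names `digram_encode`, `digram_decode`, `alphabet`, and `digrams` are hypothetical.

```python
def digram_encode(text, alphabet, digrams):
    """Encode text as a list of indices into a static digram dictionary."""
    # Assumed layout: all single symbols first, then the frequent digrams.
    entries = list(alphabet) + list(digrams)
    index = {entry: i for i, entry in enumerate(entries)}
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in index:
            codes.append(index[pair])      # digram in dictionary: consume two symbols
            i += 2
        else:
            codes.append(index[text[i]])   # otherwise encode a single symbol
            i += 1
    return codes, entries


def digram_decode(codes, entries):
    """Invert digram_encode by looking every index up in the dictionary."""
    return "".join(entries[c] for c in codes)


alphabet = "abcdefghijklmnopqrstuvwxyz "   # illustrative alphabet (assumption)
digrams = ["be", "th", "ie", "ch", "sh", "ar", "or", "en"]
codes, entries = digram_encode("the chart", alphabet, digrams)
print(codes, digram_decode(codes, entries))
```

Every index has the same fixed width, so each digram hit saves one codeword relative to coding the two symbols separately.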

  5. Example of static dictionary • Let the source alphabet be A = {a, b, …, z, '.', ',', '!', '?', ':', ';'}, of size 32. • For 4-character words, there are $32^4 = 2^{20}$ patterns. • Thus, fixed-length coding needs 20 bits/word.

  6. Example of static dictionary • Put the $256 = 2^8$ most frequently occurring patterns into a dictionary. • If a pattern is in the dictionary: (1-bit flag) + (8-bit index) = 9 bits. • Otherwise: (1-bit flag) + (20-bit code) = 21 bits.
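
The flag-plus-index scheme on this slide can be sketched as below. The flag polarity (0 for a dictionary hit, 1 for a miss), the ordering of `ALPHABET`, and the four-entry `dictionary` are assumptions made for illustration; the slide specifies only the bit counts.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz.,!?:;"   # 32 symbols, so 5 bits per symbol


def encode_word(word, dictionary):
    """Return the bit string for one 4-character word."""
    if word in dictionary:
        # Assumed flag 0, followed by the 8-bit dictionary index: 9 bits.
        return "0" + format(dictionary.index(word), "08b")
    # Assumed flag 1, followed by four 5-bit symbol codes: 21 bits.
    return "1" + "".join(format(ALPHABET.index(ch), "05b") for ch in word)


# Hypothetical dictionary; the scheme allows up to 256 entries.
dictionary = ["that", "with", "have", "this"]

hit = encode_word("this", dictionary)
miss = encode_word("zzzz", dictionary)
print(hit, len(hit))
print(miss, len(miss))
```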

  7. Example of static dictionary • $L = 9p + 21(1-p) = 21 - 12p$ bits/word, where p is the probability that a pattern is in the dictionary. • $L < 20$ if $p > 1/12 \approx 0.0833$. • The pattern probabilities must be skewed (large p) to get high compression.
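
A quick numeric check of the average length formula, as a sketch. The function name `average_bits` and the sample values of p are illustrative choices, not part of the slides.

```python
def average_bits(p, hit_bits=9, miss_bits=21):
    """Average bits per word when a fraction p of words is in the dictionary."""
    return hit_bits * p + miss_bits * (1 - p)


# Value of p at which the scheme ties with 20-bit fixed-length coding.
break_even = (21 - 20) / (21 - 9)

for p in (0.05, break_even, 0.5, 0.9):
    print(f"p = {p:.4f}  L = {average_bits(p):.2f} bits/word")
```

Below the break-even point the flag bit makes the scheme slightly worse than plain 20-bit coding; the gain grows linearly with p after that.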

  8. Adaptive Dictionary-Based Techniques • Jacob Ziv and Abraham Lempel: the LZ techniques • Terry Welch's contribution: the LZW algorithm

  9. IDEA • Adapt to the characteristics of the source. • The dictionary is a portion of the previously encoded sequence. • Start with an empty dictionary. • Add entries as they are found in the input stream.

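  10. LZ77 • [Figure: sliding window over the sequence, with a search pointer and the labels W and S; the search buffer holds the previously encoded sequence and the look-ahead buffer holds the next portion of the sequence.] • A search pointer is moved back through the search buffer, which contains a portion of the recently encoded sequence, to find a match for the pattern or symbol at the start of the look-ahead buffer.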

  11. LZ77 • The encoder searches the search buffer for the longest matching pattern and sends Code = (Offset, Max_match_length, New_symbol), where: • Offset is the distance from the search pointer (at the start of the matched pattern) to the look-ahead buffer. • Max_match_length is the number of consecutive symbols, starting at the pointer, that are identical to the symbols at the beginning of the look-ahead buffer. • New_symbol is the code of the symbol that follows the matched pattern in the look-ahead buffer.

  12. Length • If the size of the source alphabet is A, then the number of bits needed to encode the triple using fixed-length codes is $\log_2 S + \log_2 W + \log_2 A$.
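
The bit count for one triple can be evaluated as below. The slide does not define S and W; this sketch assumes S is the search-buffer size and W is the total window size (search buffer plus look-ahead buffer), and it rounds each term up to a whole number of bits, which is also an assumption. The function name `triple_bits` is hypothetical.

```python
from math import ceil, log2


def triple_bits(S, W, A):
    """Bits for one fixed-length (offset, length, symbol) triple.

    Assumes S = search-buffer size, W = window size, A = alphabet size,
    with every field rounded up to a whole number of bits.
    """
    return ceil(log2(S)) + ceil(log2(W)) + ceil(log2(A))


# Search buffer of 7, look-ahead buffer of 6 (window of 13), 32-symbol alphabet.
print(triple_bits(S=7, W=13, A=32))
```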

  13. Example: • Search buffer of size 7, look-ahead buffer of size 6. • No match for the symbol d is found in the search buffer, so the encoder sends <0, 0, c(d)>, where c(d) is the code of d.

  14. Coding: cabracadabrarrarrad

  15. DECODING cabracadabrarrarrad
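
The coding and decoding slides can be mirrored by the sketch below, which uses a search buffer of 7 and a look-ahead buffer of 6. It is an illustration under stated assumptions rather than the slides' own procedure: it starts from an empty search buffer, it stores the raw symbol instead of its code, and ties between equally long matches go to the smallest offset, so the early triples can differ from a walkthrough that begins mid-sequence. The names `lz77_encode` and `lz77_decode` are hypothetical.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Encode data as a list of (offset, match_length, next_symbol) triples."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Keep one symbol in reserve: every triple ends with an explicit symbol.
        max_length = min(lookahead_size - 1, len(data) - pos - 1)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The match may run past pos into the look-ahead buffer.
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples


def lz77_decode(triples):
    """Rebuild the sequence from the triples."""
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for i in range(length):
            out.append(out[start + i])   # symbol-by-symbol copy handles overlap
        out.append(symbol)
    return "".join(out)


sequence = "cabracadabrarrarrad"
triples = lz77_encode(sequence)
print(triples)
print(lz77_decode(triples))
```

The final triple has a match length (5) larger than its offset (3): the copy overlaps the symbols it is producing, which is why the decoder copies one symbol at a time.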

  16. Analysis of LZ77 • LZ77 assumes that patterns in the input stream occur close together. • Any pattern that recurs over a period longer than the search buffer size will not be captured. • A better compression method would save frequently occurring patterns in the dictionary. • The size L of the look-ahead buffer is limited. • The size S of the search buffer is limited.

  17. Analysis of LZ77 • Increasing L (or S) makes longer matches possible, which improves compression efficiency. • But searching for longer matches reduces the speed. • Larger buffers also need more bits for the offset and length fields of every triple, so beyond some point compression efficiency drops.

  18. Improvements of LZ77 • Encode the triples using variable-length codes (VLC), e.g. PKZIP, ZIP, LHarc, PNG, ARJ, WinZip. • LZSS: encode two fields instead of three. • Use a flag bit to indicate whether what follows is the codeword for a single symbol. • For example, 0 for single characters and 1 for (offset, length) pairs.

  19. LZSS- example
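
A minimal LZSS sketch on the same sequence as the LZ77 slides, since the slide content itself did not survive. Everything specific here is an assumption for illustration: tokens are Python tuples rather than packed bits, flag 0 marks a single symbol and flag 1 an (offset, length) pair, matches shorter than `min_match` = 2 are sent as single symbols, and the buffer sizes are 7 and 6. The names `lzss_encode` and `lzss_decode` are hypothetical.

```python
def lzss_encode(data, search_size=7, lookahead_size=6, min_match=2):
    """Encode data as (0, symbol) literals and (1, offset, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(data) - pos)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))   # flag 1: pair
            pos += best_length
        else:
            tokens.append((0, data[pos]))                  # flag 0: single symbol
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Rebuild the sequence from flagged tokens."""
    out = []
    for token in tokens:
        if token[0] == 0:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


sequence = "cabracadabrarrarrad"
tokens = lzss_encode(sequence)
print(tokens)
print(lzss_decode(tokens))
```

Compared with LZ77 triples, a symbol with no useful match costs only a flag bit plus the symbol code, and a match no longer carries a trailing symbol field.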
