
STATISTIC & INFORMATION THEORY (CSNB134)


malina

Presentation Transcript


  1. STATISTIC & INFORMATION THEORY (CSNB134) MODULE 10 VARIABLE LENGTH CODING

  2. Recaps.. • In Module 9, you were introduced to two types of coding: (1) fixed length coding (FLC), in which every symbol is represented by the same number of bits. ASCII code, which we learned about in Module 8, is a good example of an FLC. (2) variable length coding (VLC), in which the number of bits differs from symbol to symbol.

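  3. Recaps.. • The need for VLC arises in order to increase the coding efficiency of information. • Recap: the coding efficiency formula is $\eta = H / \bar{R}$, where the entropy formula is $H = -\sum_i P(x_i) \log_2 P(x_i)$ and the average number of bits formula is $\bar{R} = \sum_i P(x_i)\, l_i$, with $l_i$ the codeword length (in bits) of symbol $x_i$.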
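The three recap quantities can be sketched in a few lines of Python. This is a minimal illustration rather than part of the module: the function names and the four-symbol source below are assumptions chosen so that the arithmetic comes out exactly.

```python
import math

def entropy(probs):
    """Entropy H in bits per symbol: -sum of P(x) * log2 P(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average number of bits per symbol: sum of P(x) * codeword length."""
    return sum(p * l for p, l in zip(probs, lengths))

def coding_efficiency(probs, lengths):
    """Coding efficiency: entropy divided by average number of bits."""
    return entropy(probs) / average_length(probs, lengths)

# Hypothetical source (not from the slides): four symbols.
probs = [0.5, 0.25, 0.125, 0.125]
flc_lengths = [2, 2, 2, 2]   # fixed length: 2 bits for every symbol
vlc_lengths = [1, 2, 3, 3]   # variable length: fewer bits for likelier symbols

print(entropy(probs))                          # 1.75
print(coding_efficiency(probs, flc_lengths))   # 0.875
print(coding_efficiency(probs, vlc_lengths))   # 1.0
```

For this made-up source the variable-length assignment reaches an efficiency of 1, while the fixed-length one does not.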

  4. Fixed Length Coding (FLC) • Assuming we want to transmit three symbols where each has the following probability of occurrence: • By using ordinary binary FLC we need to use 2 bits to represent 3 symbols, thus we derive the following codewords:

  5. Exercise 1 • Based on the previous example of FLC, what is the actual average number of bits (R̄)? • What is the entropy (H)? • What is the coding efficiency (η)?

  6. Variable Length Coding (VLC) • Variable length coding is made possible by the unequal probabilities of occurrence among symbols. • For example, in the English text of the emails we compose every day, the vowels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ occur more frequently than consonants such as ‘k’ and ‘x’. Note: look at any sentence on this slide; vowels are plentiful, while ‘k’ and ‘x’ are rare. • Thus, we can use VLC to exploit the unequal probabilities of occurrence by: (i) assigning the most bits to the symbols that have the lowest probability of occurrence (ii) assigning the fewest bits to the symbols that have the highest probability of occurrence

  7. Variable Length Coding (cont.) • In this module, we will study two techniques of VLC: (i) Fano Coding (ii) Huffman Coding. IMPORTANT! You need to know EXACTLY how to derive the codewords using both techniques!

  8. Fano Coding The rules to generate Fano Code are: 1. Sort the symbols by decreasing probability 2. Divide in two groups, so that both groups have equal or almost equal sums of probabilities 3. Assign value 0 to the first group and value 1 to the second 4. For each of the two groups, go to step 2 5. Repeat step 4 until all symbols have a codeword assigned
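The five rules above can be sketched recursively in Python. This is an illustrative sketch, not the module's reference implementation: the function name `fano_codes` is hypothetical, and the tie-breaking rule (when two split points are equally balanced, take the earlier one) is an assumption, since the rules leave "almost equal" open.

```python
def fano_codes(symbols):
    """symbols: list of (name, probability) pairs. Returns {name: codeword}."""
    # Step 1: sort by decreasing probability.
    items = sorted(symbols, key=lambda s: s[1], reverse=True)
    codes = {name: "" for name, _ in items}

    def split(group):
        if len(group) < 2:
            return  # a single symbol needs no further bits
        total = sum(p for _, p in group)
        running = 0.0
        best_k, best_diff = 1, float("inf")
        # Step 2: find the split point with the most balanced sums.
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(total - 2 * running)  # |first sum - second sum|
            if diff < best_diff - 1e-12:     # assumption: earliest split wins ties
                best_k, best_diff = k, diff
        first, second = group[:best_k], group[best_k:]
        # Step 3: 0 for the first group, 1 for the second.
        for name, _ in first:
            codes[name] += "0"
        for name, _ in second:
            codes[name] += "1"
        # Steps 4 and 5: repeat inside each group.
        split(first)
        split(second)

    split(items)
    return codes

print(fano_codes([("A", 0.5), ("B", 0.25), ("C", 0.25)]))
# {'A': '0', 'B': '10', 'C': '11'}
```

A different tie-breaking choice can produce different codewords for the same probabilities, which is exactly the ambiguity in step 2.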

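  9. Fano Coding (cont.) • Assuming we want to transmit three symbols where each has the following probability of occurrence: Symbol P(x): A 0.5, B 0.25, C 0.25 • To generate Fano codewords: Step 1 (Sort): A, B, C. Step 2 (Divide): {A} and {B, C}. Step 3 (Assign): 0 to A, 1 to B and C. Step 4 (Repeat Step 2): divide {B, C} into {B} and {C}. Step 5 (Repeat Step 3): 0 to B, 1 to C. Resulting codewords: A = 0, B = 10, C = 11.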

  10. Exercise 2 • Based on the previous example of VLC using Fano Code, what is the actual average number of bits (R̄)? • What is the entropy (H)? • What is the coding efficiency (η)? Note: η = 1 means the code is optimal. This is possible here because the example is simple. In more complicated scenarios, Fano coding rarely reaches the optimum!
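As a worked check of the note that η = 1: for the three-symbol example (A 0.5, B 0.25, C 0.25), the Fano rules give codeword lengths of 1, 2 and 2 bits (for instance A = 0, B = 10, C = 11). Assuming the standard definitions of average length, entropy and efficiency, the arithmetic is:

```latex
\begin{align*}
\bar{R} &= 0.5(1) + 0.25(2) + 0.25(2) = 1.5 \text{ bits/symbol} \\
H &= -\left(0.5\log_2 0.5 + 0.25\log_2 0.25 + 0.25\log_2 0.25\right)
   = 0.5 + 0.5 + 0.5 = 1.5 \text{ bits/symbol} \\
\eta &= \frac{H}{\bar{R}} = \frac{1.5}{1.5} = 1
\end{align*}
```

The efficiency reaches 1 here because every probability is a power of 1/2, so each codeword length equals $-\log_2 P(x)$ exactly.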

  11. Exercise 3 • Derive the Fano codewords for symbols A, B, C, D and E, where the probabilities of occurrence of each symbol are 0.4, 0.2, 0.2, 0.1 and 0.1 respectively. Symbol P(x): A 0.4, B 0.2, C 0.2, D 0.1, E 0.1 [Figure: Fano splitting diagram with 0/1 branch labels]

  12. Exercise 4 • Based on the previous example of VLC using Fano Code, what is the actual average number of bits (R̄)?

  13. Exercise 4 (cont.) • What is the entropy (H)? • What is the coding efficiency (η)?

  14. Exercise 5 • Based on the previous example, where symbols A, B, C, D and E have probabilities of occurrence of 0.4, 0.2, 0.2, 0.1 and 0.1 respectively, derive the coding efficiency (η) using normal binary FLC. • What can you conclude from this? Variable Length Coding (VLC) yields better coding efficiency than Fixed Length Coding (FLC). Note: H is the same as the H in Exercise 4, but R̄ is 3, since a fixed-length code needs 3 bits to represent 5 symbols (2^2 = 4 < 5 ≤ 2^3 = 8)!
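The comparison in Exercise 5 can be checked numerically. The sketch below is illustrative only: the variable names are hypothetical, and the variable-length codeword lengths (2, 2, 2, 3, 3) are an assumption, being one outcome the Fano rules can produce for these probabilities.

```python
import math

probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}

# Entropy is a property of the source, so it is the same for FLC and VLC.
H = -sum(p * math.log2(p) for p in probs.values())

# FLC: every symbol gets ceil(log2(number of symbols)) bits.
flc_bits = math.ceil(math.log2(len(probs)))
flc_eff = H / flc_bits

# VLC: assumed codeword lengths from one possible Fano split.
vlc_lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}
vlc_avg = sum(probs[s] * vlc_lengths[s] for s in probs)
vlc_eff = H / vlc_avg

print(f"H = {H:.4f} bits/symbol")
print(f"FLC: {flc_bits} bits/symbol, efficiency {flc_eff:.4f}")
print(f"VLC: {vlc_avg:.2f} bits/symbol, efficiency {vlc_eff:.4f}")
```

The gap between the two efficiencies comes entirely from the average length, since H is shared.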

  15. Recaps… • Step 2 of Fano coding is: “Divide in two groups, so that both groups have equal or almost equal sums of probabilities” • The term ‘almost’ is ambiguous: it makes the algorithm difficult to define precisely, and the result may not fully realize the coding efficiency that VLC can achieve. • The (usually) better method is Huffman Coding

  16. Huffman Coding The rules to generate Huffman Code are: 1. Sort the symbols by decreasing probability 2. Add the probabilities of the two least probable symbols, combining them into one 3. Assign value 0 to the first of these two symbols and 1 to the second 4. Go to step 1 and repeat until the total sum of probabilities is 1 5. Trace back all branches until you reach the original probabilities and note the branching labels. Done.
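The Huffman rules can be sketched with a priority queue, which keeps the two least probable entries at hand without re-sorting the whole list on every pass. This is a minimal sketch under stated assumptions, not the module's own procedure: the function name `huffman_codes` is hypothetical, and ties between equal probabilities are broken by insertion order. Codewords are built by prefixing a bit at each merge, which has the same effect as tracing back the branches in step 5.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """probs: {symbol: probability}. Returns {symbol: codeword}."""
    if len(probs) == 1:
        return {sym: "0" for sym in probs}  # a lone symbol still needs one bit
    order = count()  # unique tie-breaker so heap entries never compare dicts
    heap = [(p, next(order), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: take the two least probable entries.
        p_low, _, low = heapq.heappop(heap)
        p_next, _, nxt = heapq.heappop(heap)
        # Step 3: 0 for the first (more probable) entry, 1 for the second.
        merged = {s: "0" + c for s, c in nxt.items()}
        merged.update({s: "1" + c for s, c in low.items()})
        # Step 4: put the combined entry back and repeat.
        heapq.heappush(heap, (p_low + p_next, next(order), merged))
    return heap[0][2]

example = {"A": 0.38, "B": 0.33, "C": 0.15, "D": 0.07, "E": 0.04, "F": 0.03}
for sym, code in sorted(huffman_codes(example).items()):
    print(sym, code)
```

The exact bit patterns depend on how ties and 0/1 labels are assigned, but the codeword lengths for this example are fixed by the merge order.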

  17. Huffman Coding (cont.) • Assuming we want to transmit six symbols where each has the following probability of occurrence: Symbol P(x): A 0.38, B 0.33, C 0.15, D 0.07, E 0.04, F 0.03 • To generate Huffman codewords: Step 1 (Sort), Step 2 (Add 2 lowest), Step 3 (Assign 0 and 1), Step 4 (Repeat from Step 1). The successive sums are 0.04 + 0.03 = 0.07, 0.07 + 0.07 = 0.14, 0.15 + 0.14 = 0.29, 0.33 + 0.29 = 0.62 and 0.38 + 0.62 = 1. Last step: trace back the branch labels from the final sum to each symbol.

  18. Exercise 6 • Based on the previous example of VLC using Huffman Code, what is the actual average number of bits (R̄)? • What is the entropy (H)? • What is the coding efficiency (η)?
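A worked sketch of this exercise, assuming the standard definitions of average length, entropy and efficiency. For the six-symbol example (0.38, 0.33, 0.15, 0.07, 0.04, 0.03), the Huffman rules merge the symbols so that A to F receive codewords of 1, 2, 3, 4, 5 and 5 bits respectively; the entropy and efficiency values are rounded.

```latex
\begin{align*}
\bar{R} &= 0.38(1) + 0.33(2) + 0.15(3) + 0.07(4) + 0.04(5) + 0.03(5) \\
        &= 0.38 + 0.66 + 0.45 + 0.28 + 0.20 + 0.15 = 2.12 \text{ bits/symbol} \\
H &= -\sum_i P(x_i)\log_2 P(x_i) \\
  &\approx 0.530 + 0.528 + 0.411 + 0.269 + 0.186 + 0.152 \approx 2.075 \text{ bits/symbol} \\
\eta &= \frac{H}{\bar{R}} \approx \frac{2.075}{2.12} \approx 0.979
\end{align*}
```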

  19. Exercise 7 • Derive the Huffman codewords for symbols A, B, C, D, E and F, where the probabilities of occurrence of each symbol are 0.44, 0.27, 0.15, 0.06, 0.05 and 0.03 respectively. Symbol P(x): A 0.44, B 0.27, C 0.15, D 0.06, E 0.05, F 0.03 [Figure: Huffman merge diagram with 0/1 branch labels; intermediate sums 0.08, 0.14, 0.29, 0.56]

  20. Exercise 8 • Derive the Huffman codewords for symbols A, B, C, D, E and F, where the probabilities of occurrence of each symbol are 0.35, 0.2, 0.15, 0.12, 0.1 and 0.08 respectively. Symbol P(x): A 0.35, B 0.2, C 0.15, D 0.12, E 0.1, F 0.08 [Figure: Huffman merge diagram with 0/1 branch labels; intermediate sums 0.18, 0.27, 0.38, 0.62]

  21. STATISTIC & INFORMATION THEORY (CSNB134) VARIABLE LENGTH CODING --END--
