Chapter 4 Variable-Length Codes: Huffman Codes
Outline • 4.1 Introduction • 4.2 Unique Decoding • 4.3 Instantaneous Codes • 4.4 Construction of Instantaneous Codes • 4.5 The Kraft Inequality • 4.6 Huffman Codes
4.1 Introduction • Consider the problem of efficient coding of messages to be sent over a “noiseless” channel: • maximize the number of messages that can be sent in a given period of time, • transmit each message in the shortest possible time, • i.e., make the codewords as short as possible.
4.2 Unique Decoding • Source symbols (alphabet): { s1, . . . , sq } • Code alphabet: { C1, C2, . . . , Cr } • X is a random variable taking values in { s1, . . . , sq } with probabilities { p1, . . . , pq } • X is observed over and over again, i.e., it generates a sequence of symbols from { s1, . . . , sq } • Each source symbol is encoded as a string of code symbols: si → Ci Cj ··· Ck • Ex: s1 → 000, s2 → 111
The collection of all codewords is called a “code”. • Our objective: minimize the average codeword length. • Unique decodability − the received message must have a single, unique possible interpretation. • Ex: source alphabet { s1, s2, s3, s4 }, code alphabet { 0, 1 }, with s1 → 0, s2 → 01, s3 → 11, s4 → 00. Then 0011 decodes as either s4 s3 or s1 s1 s3, so this code does not satisfy unique decodability.
Ex: s1 → 0, s2 → 010, s3 → 01, s4 → 10. Then 010 decodes as s2, as s1 s4, or as s3 s1, so it also does not satisfy unique decodability. • Ex: s1 → 0, s2 → 01, s3 → 011, s4 → 111 is a uniquely decodable code.
Definition: • The nth extension of a code is simply the set of all possible concatenations of n codewords of the original code. • A code is uniquely decodable iff no two encoded concatenations are the same, even for different extensions, i.e., every finite sequence of code characters corresponds to at most one message ≡ every distinct sequence of source symbols has a corresponding encoded sequence that is unique.
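The nth-extension definition suggests a direct, if brute-force, test: encode every source sequence up to some length and look for a collision. A minimal Python sketch (the bound n and the dictionary code representation are illustrative choices; the complete decision procedure is the Sardinas–Patterson algorithm):

```python
from itertools import product

def is_uniquely_decodable(code, n=6):
    """Bounded brute-force check via code extensions: encode every
    source sequence of length <= n and look for two distinct
    sequences that share an encoding (a sketch, not a full proof)."""
    seen = {}
    for length in range(1, n + 1):
        for seq in product(sorted(code), repeat=length):
            enc = "".join(code[s] for s in seq)
            if enc in seen and seen[enc] != seq:
                return False  # two different messages, same code string
            seen[enc] = seq
    return True

# The ambiguous code from the text: 0011 is s4 s3 and also s1 s1 s3.
print(is_uniquely_decodable({"s1": "0", "s2": "01",
                             "s3": "11", "s4": "00"}))   # False
print(is_uniquely_decodable({"s1": "0", "s2": "01",
                             "s3": "011", "s4": "111"})) # True
```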
4.3 Instantaneous Codes • Decision (decoding) tree for s1 = 0, s2 = 10, s3 = 110, s4 = 111: from the initial state, a 0 leads to the leaf s1 and a 1 leads to a second node; there, a 0 gives s2 and a 1 gives a third node; there, a 0 gives s3 and a 1 gives s4.
Note that each bit of the received stream is examined only once, and the terminal states of this tree are the four source symbols s1, s2, s3 and s4. • Definition: A code is instantaneous if it is decodable without lookahead, i.e., a codeword can be recognized as soon as it is complete. • When a complete codeword is received, the receiver knows this immediately, and does not have to look further before deciding which source symbol it received. • A code is instantaneous iff no codeword is a prefix of another codeword.
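The prefix condition and the one-pass decoding it enables are easy to demonstrate. A small sketch (function names and the string representation of the bit stream are illustrative):

```python
def is_instantaneous(code):
    """A code is instantaneous (prefix-free) iff no codeword
    is a prefix of another codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a)
                   for a in words for b in words)

def decode(stream, code):
    """Instantaneous decoding: scan each bit once, emitting a symbol
    as soon as the bits read so far form a complete codeword."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for bit in stream:
        buf += bit
        if buf in inverse:          # codeword complete -- decide now
            out.append(inverse[buf])
            buf = ""
    return out

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
print(is_instantaneous(code))        # True
print(decode("0101101110", code))    # ['s1', 's2', 's3', 's4', 's1']
```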
The existence of a decoding tree ≡ instantaneous decodability. • Ex: Let n be a positive integer. A comma code is a code with codewords 1, 01, 001, 0001, . . . , 00···01 (n − 1 zeros), 00···0 (n zeros). • “1” acts as a comma marking the end of a codeword. • Because a comma code is prefix-free, it is an instantaneous code.
Ex: s1 → 0, s2 → 01, s3 → 011, s4 → 111 is not an instantaneous code, but it is still uniquely decodable. • On receiving 0111·····111 the decoder must count all the 1s before it can tell where the first codeword ends, so decoding may be delayed arbitrarily long. It is better to use a comma code instead, e.g., s1 → 1, s2 → 01, s3 → 001, s4 → 000. • An I.C. is better than a code that is merely U.D.
4.4 Construction of Instantaneous Codes • Given five symbols si of the source S, both C1 and C2 below are instantaneous codes. Which one is better? • C1: s1 → 0, s2 → 10, s3 → 110, s4 → 1110, s5 → 1111 • C2: s1 → 00, s2 → 01, s3 → 10, s4 → 110, s5 → 111 • Answer: it depends on the frequencies of occurrence of the symbols, as the sketch below illustrates.
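A quick way to see the dependence on frequencies is to compute the average length L = Σ pi·li of each code under two distributions (the probabilities below are invented for illustration):

```python
def avg_length(lengths, probs):
    """Average codeword length L = sum(p_i * l_i)."""
    return sum(p * l for p, l in zip(probs, lengths))

c1 = [1, 2, 3, 4, 4]   # lengths of C1: 0, 10, 110, 1110, 1111
c2 = [2, 2, 2, 3, 3]   # lengths of C2: 00, 01, 10, 110, 111

skewed  = [0.6, 0.2, 0.1, 0.05, 0.05]   # hypothetical, favours C1
uniform = [0.2] * 5                     # hypothetical, favours C2

print(avg_length(c1, skewed),  avg_length(c2, skewed))   # 1.7 vs 2.1
print(avg_length(c1, uniform), avg_length(c2, uniform))  # 2.8 vs 2.4
```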
4.5 The Kraft Inequality • Theorem: A necessary and sufficient condition for the existence of an instantaneous code S of q symbols si (i = 1, .., q) with encoded words of lengths l1 ≤ l2 ≤ ··· ≤ lq is $\sum_{i=1}^{q} r^{-l_i} \le 1$, where r is the radix (number of symbols) of the alphabet of the encoded symbols.
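A direct check of the inequality, using exact rational arithmetic (a sketch; the helper name is ours):

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Kraft sum of r^(-l_i): an instantaneous code with these
    codeword lengths exists iff the sum is <= 1."""
    return sum(Fraction(1, r ** l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))       # 1     -> lengths are feasible
print(kraft_sum([1, 2, 2, 3]))       # 9/8   -> no instantaneous code
print(kraft_sum([1, 2, 3], r=3))     # 13/27 -> feasible in radix 3
```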
Thm: An instantaneous code with word lengths n1, n2, . . ., nM exists iff $\sum_{i=1}^{M} D^{-n_i} \le 1$, where D is the size of the code alphabet. • (⇐) For simplicity, assume D = 2 and argue by induction on the height H of the decoding tree. (1) When H = 1, n1 = 1 and n2 = 1: the tree with s1 → 0 and s2 → 1 is a valid tree of height 1.
(2) Assume the construction works for every tree of height H ≤ h. When H = h + 1, split the codewords into those beginning with 0 and those beginning with 1; stripping the first digit leaves two sets of words that each satisfy the inequality for height ≤ h, so each can be realized as a subtree k′, k″ hanging from the root. By induction, the inequality is sufficient.
Another proof (⇒): Let C = {c1, c2, …, cM} with codeword lengths l1, …, lM, and let L = max{ li }. Any word x = ci y1 y2 ··· y(L−li), where the yj are arbitrary code symbols, cannot be in C, because ci is a prefix of x. For each ci there are $r^{L-l_i}$ such words x of length L not in C, and these sets are disjoint for distinct codewords.
• (Counting argument) If there are $\alpha_1$ words of length 1, then $\alpha_1 \le r$. If there are $\alpha_2$ words of length 2, then $\alpha_2 \le r^2 - \alpha_1 r$, since each length-1 codeword rules out r words of length 2. Similarly $\alpha_3 \le r^3 - \alpha_1 r^2 - \alpha_2 r$, and in general $\alpha_1 r^{n-1} + \alpha_2 r^{n-2} + \cdots + \alpha_n \le r^n$. • If the last inequality is satisfied, then all the earlier ones hold, so the codewords can be chosen greedily level by level; dividing it by $r^n$ gives $\sum_k \alpha_k r^{-k} \le 1$ => Kraft’s inequality is satisfied.
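The greedy, level-by-level choice in this argument can be made concrete. The sketch below uses the canonical construction (sorted lengths; each codeword is the previous value plus one, shifted out to the new length), which is one standard way to realize it and not necessarily the text's exact procedure:

```python
from fractions import Fraction

def code_from_lengths(lengths, r=2):
    """If the lengths satisfy the Kraft inequality, build an
    instantaneous code by the canonical construction; otherwise
    return None.  Assumes r <= 10 so digits can be single chars."""
    if sum(Fraction(1, r ** l) for l in lengths) > 1:
        return None                       # Kraft fails: impossible
    digits = "0123456789"[:r]
    def to_digits(value, width):          # value in radix r, fixed width
        out = ""
        for _ in range(width):
            out = digits[value % r] + out
            value //= r
        return out
    words, value, prev = [], 0, None
    for l in sorted(lengths):
        if prev is not None:
            value = (value + 1) * r ** (l - prev)
        words.append(to_digits(value, l))
        prev = l
    return words

print(code_from_lengths([3, 1, 3, 2]))   # ['0', '10', '110', '111']
print(code_from_lengths([1, 1, 2]))      # None: Kraft sum is 5/4 > 1
```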
Note: A code may obey the Kraft inequality and still not be instantaneous. EX: 0, 01, 011, 111 has Kraft sum 1/2 + 1/4 + 1/8 + 1/8 = 1, but 0 is a prefix of 01 and 011, so it is not an I.C.; the inequality only guarantees that some instantaneous code with these lengths exists (e.g., 0, 10, 110, 111). EX: a binary (n, k) block code (error-correcting code) has $2^k$ codewords, all of length n, so its Kraft sum is $2^k \cdot 2^{-n} \le 1$.
Ex: Comma code over a code alphabet of size D: there is 1 codeword of length 1 (it must be the comma itself), D − 1 of length 2, (D − 1)² of length 3, and in general (D − 1)^(k−1) of length k, giving the Kraft sum $\sum_{k\ge 1}(D-1)^{k-1}D^{-k} = 1$. • The Kraft inequality can be extended to all uniquely decodable codes.
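A numeric check of this geometric series (the truncation at 60 terms is arbitrary):

```python
def comma_code_kraft(D, terms=60):
    """Partial Kraft sum for the D-ary comma code:
    (D-1)^(k-1) codewords of length k, each contributing D^(-k)."""
    return sum((D - 1) ** (k - 1) / D ** k for k in range(1, terms + 1))

for D in (2, 3, 10):
    print(D, comma_code_kraft(D))   # tends to exactly 1 as terms grow
```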
McMillan Inequality. Thm: A uniquely decodable code with word lengths l1, l2, …, lq exists iff $\sum_{i=1}^{q} r^{-l_i} \le 1$ (r is the size of the code alphabet). • (⇐) Trivial: by Kraft an I.C. with these lengths exists, and every I.C. is a U.D. code. • (⇒) Raise the sum to the nth power: $\bigl(\sum_{i=1}^{q} r^{-l_i}\bigr)^n = \sum_{k=n}^{nl} N_k\, r^{-k}$, where l is the length of the longest codeword, i.e., $l = \max_i l_i$, and Nk is the number of sequences of n codewords whose total length (in code symbols of radix r) is k.
• Since the code is uniquely decodable, distinct codeword sequences give distinct code strings, so $N_k \le r^k$ (the number of distinct sequences of length k in radix r). Hence $\bigl(\sum_i r^{-l_i}\bigr)^n \le \sum_{k=n}^{nl} 1 \le nl$ for every n. If the sum K were > 1, we could find an n such that $K^n > nl$ →←
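The contradiction can be exhibited concretely: when the Kraft sum K exceeds 1, the exponential K^n must overtake the linear bound n·l. A sketch (the example lengths are ours):

```python
def mcmillan_witness(lengths, r=2):
    """If the Kraft sum K > 1, exhibit an n with K^n > n*l, violating
    the bound K^n <= n*l that unique decodability forces; so no U.D.
    code can have these lengths."""
    K = sum(r ** -l for l in lengths)
    if K <= 1:
        return None                    # bound can never be violated
    l, n = max(lengths), 1
    while K ** n <= n * l:
        n += 1
    return n, K ** n, n * l

print(mcmillan_witness([1, 1, 2]))     # K = 1.25: (16, 35.5..., 32)
```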
4.6 Huffman Codes • Lemma: If a code C is optimal within the class of instantaneous codes, then C is optimal within the entire class of U.D. codes. • pf: Suppose C ′ is a U.D. code with a smaller average codeword length than C, and let n1′, n2′, . . . , nM′ be the codeword lengths of C ′. By McMillan they satisfy the Kraft inequality, so an instantaneous code with these same lengths exists. Then C is not optimal in I.C. →←
Optimal Codes: Given a binary I.C. C with codeword lengths n1, …, nM associated with probabilities p1, …, pM. For convenience, let p1 ≥ p2 ≥ ··· ≥ pM-1 ≥ pM (and take ni ≤ ni+1 ≤ ··· ≤ ni+r whenever pi = pi+1 = ··· = pi+r). Then if C is optimal within the class of I.C., it must have the following properties:
(a) More probable symbols have shorter codewords, i.e., pj > pk => nj ≤ nk. • (b) The 2 least probable symbols have codewords of equal length, i.e., nM-1 = nM. • (c) Among the codewords of length nM, two agree in all digits except the last one. • Ex: x1 → 0, x2 → 100, x3 → 101, x4 → 1101, x5 → 1110 does not satisfy (c); it has to be x4 → 1101, x5 → 1100.
pf: (a) If ( pj > pk ) and ( nj > nk ), then we can construct a better code C ′ by interchanging codewords j and k. (b) From (a), if pM-1 > pM then nM-1 ≤ nM; by the assumed ordering, if pM-1 = pM then nM-1 ≤ nM as well. If nM-1 < nM, we may shorten the longest codeword to make nM-1 = nM and still have an I.C. better than the original one. (c) If condition (c) is not true, we may drop the last digit of all such codewords to obtain a better code. • Huffman coding ─ construction of optimal (instantaneous) codes
Let x1, …, xM be an array of symbols with probabilities p1, …, pM ( p1 ≥ p2 ≥ ··· ≥ pM). (1) Combine xM-1, xM into a single symbol xM-1,M with probability pM-1 + pM. (2) Assume we can construct an optimal code C2 for x1, x2, …, xM-1,M. (3) Now construct a code C1 for x1, …, xM as follows: • The codewords associated with x1, …, xM-2 in C1 are exactly the corresponding codewords of C2. • Let wM-1,M be the codeword of xM-1,M in C2; the codewords for xM-1 and xM in C1 are wM-1,M 0 → xM-1 and wM-1,M 1 → xM.
Claim: C1 is an optimal code for the set of probabilities p1, …, pM. • Ex (successive reductions, most probable first):
Source: x1 0.3, x2 0.25, x3 0.2, x4 0.1, x5 0.1, x6 0.05
Step 1: x1 0.3, x2 0.25, x3 0.2, x5,6 0.15, x4 0.1
Step 2: x1 0.3, x2 0.25, x4,5,6 0.25, x3 0.2
Step 3: x3,4,5,6 0.45, x1 0.3, x2 0.25
Step 4: x1,2 0.55, x3,4,5,6 0.45
Assigning codewords backwards through the reductions:
Step 4: x1,2 → 0, x3,4,5,6 → 1
Step 3: x1 → 00, x2 → 01; x3 → 10, x4,5,6 → 11
Step 2: x4 → 110, x5,6 → 111
Step 1: x5 → 1110, x6 → 1111
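The reduction procedure above is straightforward to implement with a priority queue. A minimal sketch, assuming the example's probabilities; tie-breaking among equal probabilities may produce different codewords than the table, but the average length (here 2.4) is the same:

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman coding by repeated reduction: merge the two
    least probable entries, then extend the merged codeword with
    0 and 1 on the way back up."""
    tiebreak = count()                      # keeps heap entries comparable
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)      # two least probable entries
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (a, b)))
    code = {}
    def assign(node, word):
        if isinstance(node, tuple):         # internal node: recurse
            assign(node[0], word + "0")
            assign(node[1], word + "1")
        else:
            code[node] = word or "0"        # single-symbol edge case
    assign(heap[0][2], "")
    return code

probs = {"x1": 0.30, "x2": 0.25, "x3": 0.20,
         "x4": 0.10, "x5": 0.10, "x6": 0.05}
code = huffman(probs)
print(code)
print(sum(p * len(code[s]) for s, p in probs.items()))  # 2.4
```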
pf: • Assume that C1 is not optimal. • Let C1’ be an optimal instantaneous code for x1, …, xM, with codewords w1’, w2’, …, wM’ of lengths n1’, n2’, …, nM’. • If there are only two symbols of maximum length in a tree, they must have their last decision node in common, and they must be the two least probable symbols. Before we reduce the tree, these two symbols contribute nM( pM + pM-1) to the average length, and after the reduction they contribute (nM - 1)( pM + pM-1), so the average length is reduced by exactly ( pM + pM-1). • By assumption, average length of C1 > average length of C1’ --- (1). • After reducing both codes (subtracting pM + pM-1 from each side of (1)), average length of C2 > average length of C2’. • But C2 is optimal →←
If there are more than two symbols of the maximum length, we can use the following proposition: • Codewords of the same length may be interchanged without changing the average code length. • So we may interchange codewords so that the two least probable symbols share their last decision node, and argue as before. • Huffman encoding is not unique.
Ex: Code I: p1 = 0.4 → 00, p2 = 0.2 → 10, p3 = 0.2 → 11, p4 = 0.1 → 010, p5 = 0.1 → 011; average length L = 0.4·2 + 0.2·2 + 0.2·2 + 0.1·3 + 0.1·3 = 2.2. • Or Code II: p1 = 0.4 → 1, p2 = 0.2 → 01, p3 = 0.2 → 000, p4 = 0.1 → 0010, p5 = 0.1 → 0011; average length L = 0.4·1 + 0.2·2 + 0.2·3 + 0.1·4 + 0.1·4 = 2.2.
Which encoding is better? • Var( I ) = 0.4(2-2.2)² + 0.2(2-2.2)² + 0.2(2-2.2)² + 0.1(3-2.2)² + 0.1(3-2.2)² = 0.16 (Good!) • Var( II ) = 0.4(1-2.2)² + 0.2(2-2.2)² + 0.2(3-2.2)² + 0.1(4-2.2)² + 0.1(4-2.2)² = 1.36 • Both codes are optimal (same L), but Code I, with the smaller variance of codeword length, is preferable. A quick check appears below.
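A short verification of both codes (lengths taken from the example above; float output is approximate):

```python
def stats(probs, lengths):
    """Average codeword length L = sum(p*l) and the variance
    of the length about L."""
    L = sum(p * l for p, l in zip(probs, lengths))
    V = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    return L, V

p = [0.4, 0.2, 0.2, 0.1, 0.1]
print(stats(p, [2, 2, 2, 3, 3]))   # code I : L = 2.2, Var ≈ 0.16
print(stats(p, [1, 2, 3, 4, 4]))   # code II: L = 2.2, Var ≈ 1.36
```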