Chapter 4 Variable–Length and Huffman Codes
Unique Decodability We must always be able to determine where one code word ends and the next one begins. Counterexample: suppose s1 = 0; s2 = 1; s3 = 11; s4 = 00. Then 0011 decodes as s4s3 or as s1s1s3. Unique decodability means that any two distinct sequences of source symbols (of possibly differing lengths) are encoded as distinct strings. 4.1, 4.2
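To see the ambiguity mechanically, here is a minimal Python sketch (not from the chapter; the function name `parses` is ours) that enumerates every way a string splits into code words:

```python
# Enumerate all ways `text` splits into words of `code` (symbol -> code word).
def parses(text, code, prefix=()):
    if not text:
        yield prefix
    for sym, word in code.items():
        if text.startswith(word):
            yield from parses(text[len(word):], code, prefix + (sym,))

code = {"s1": "0", "s2": "1", "s3": "11", "s4": "00"}
print(list(parses("0011", code)))
# Four distinct parses (among them s4 s3 and s1 s1 s3): not uniquely decodable.
```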
Instantaneous Codes s1 = 0; s2 = 10; s3 = 110; s4 = 111. No code word is the prefix of another. By reading a continuous sequence of code words, one can determine the end of each code word the instant its last digit arrives. [Decoding tree figure: each internal node branches on 0/1; the leaves are s1, s2, s3, s4.] Consider the reverse code: s1 = 0; s2 = 01; s3 = 011; s4 = 111. The string 011……111 is uniquely decodable, but the first symbol cannot be decoded without reading all the way to the end. 4.3
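The prefix condition is easy to test directly; a small Python sketch (our own helper, not from the text):

```python
# True iff no code word is a prefix of another, i.e. the code is instantaneous.
def is_prefix_free(words):
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_free(["0", "10", "110", "111"]))  # True: the code above
print(is_prefix_free(["0", "01", "011", "111"]))  # False: the reversed code
```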
Constructing Instantaneous Codes Comma code: s1 = 0; s2 = 10; s3 = 110; s4 = 1110; s5 = 1111. Modification: s1 = 00; s2 = 01; s3 = 10; s4 = 110; s5 = 111. [Decoding tree figure for the modified code: branch on 0/1 at each level, with the 11 branch splitting once more into s4 = 110 and s5 = 111.] Notice that every code word is located on a leaf. 4.4
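The decoding tree translates into a simple streaming decoder; a sketch (identifiers ours), assuming the input is a complete sequence of code words:

```python
# Greedy decoder for a prefix code: since no code word is a prefix of another,
# the first match at each position is the only possible one (a leaf of the tree).
def decode(bits, code):  # code: symbol -> code word
    rev = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in rev:      # reached a leaf: emit the symbol, return to the root
            out.append(rev[word])
            word = ""
    assert word == "", "input ended in the middle of a code word"
    return out

comma = {"s1": "0", "s2": "10", "s3": "110", "s4": "1110", "s5": "1111"}
print(decode("01101110", comma))  # ['s1', 's3', 's4']
```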
Kraft Inequality Theorem: There exists an instantaneous code for S where each symbol s ∈ S is encoded in radix r with length |s| if and only if K = Σ_{s∈S} r^(−|s|) ≤ 1. Proof (⇒): by induction on the height n (maximal length path) of the decoding tree, n = max{|s| : s ∈ S}. For simplicity, pick r = 2 (the binary case). Basis (n = 1): the tree is a root with leaves s1 (and possibly s2) at depth 1, so K ≤ 2 · ½ = 1. (One could even use n = 0 here: a single symbol coded by the empty word gives K = 1.) Induction (n > 1): the subtrees T0, T1 below the root have height < n, so by the induction hypothesis the leaves of T0, T1 satisfy the Kraft inequality. Prefixing one symbol at the top of the tree increases all the lengths by one, halving each contribution, so K = ½K(T0) + ½K(T1) ≤ 1. 4.5
Same argument for radix r: the root has at most r subtrees T0, …, T_{r−1}, each of height < n. By the induction hypothesis each satisfies K(T_i) ≤ 1, and prefixing one digit divides each contribution by r, so adding at most r of these together gives K = (1/r) Σ_i K(T_i) ≤ r · (1/r) = 1. Strict inequality (K < 1) in the binary case implies that not all internal nodes have degree 2; but if a node has degree 1, then clearly that edge can be removed by contraction, shortening the code. 4.5
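The Kraft sum is easy to evaluate exactly; a small sketch (ours) using rationals:

```python
from fractions import Fraction

# Kraft sum K = sum of r**(-l) over the code word lengths l.
def kraft(lengths, r=2):
    return sum(Fraction(1, r ** l) for l in lengths)

print(kraft([1, 2, 3, 3]))        # 1     -> an instantaneous binary code exists
print(kraft([1, 2, 2, 3]))        # 9/8   -> impossible in radix 2
print(kraft([1, 2, 2, 3], r=3))   # 16/27 -> fine in radix 3
```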
Kraft Inequality (⇐): construct a code via decoding trees. Number the symbols s1, …, sq so that l1 ≤ … ≤ lq and assume K ≤ 1. Greedy method: proceed left to right through the tree, systematically assigning leaves to code words, so that you never pass through or land on a previously assigned one. The only way this method could fail is by running out of nodes (an over-full tree), but that would mean K > 1. Examples (r = 2): lengths 1, 3, 3, 3 give ½ + ⅛ + ⅛ + ⅛ < 1 (one leaf left unused); lengths 1, 2, 3, 3 give ½ + ¼ + ⅛ + ⅛ = 1 (the tree is exactly full); lengths 1, 2, 2, 3 give ½ + ¼ + ¼ + ⅛ > 1 (over-full; no such code exists). 4.5
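The greedy method can be written as the canonical-code construction; a sketch (function name ours) that fails with an assertion exactly when the tree is over-full:

```python
# Build code words from nondecreasing lengths by taking, at each depth,
# the next free leaf to the right of everything already assigned.
def code_from_lengths(lengths, r=2):
    assert all(a <= b for a, b in zip(lengths, lengths[1:])), "sort lengths first"
    words, value, prev = [], 0, 0
    for l in lengths:
        value *= r ** (l - prev)          # descend from depth prev to depth l
        assert value < r ** l, "ran out of nodes: Kraft sum exceeds 1"
        digits, v = [], value
        for _ in range(l):                # write `value` as l base-r digits
            digits.append(str(v % r))
            v //= r
        words.append("".join(reversed(digits)))
        value += 1                        # skip past this leaf and its subtree
        prev = l
    return words

print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(code_from_lengths([1, 3, 3, 3]))  # ['0', '100', '101', '110'], leaf 111 unused
```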
Shortened Block Codes With exactly 2^m symbols, we can form a set of code words each of length m: b1 …… bm, with bi ∈ {0,1}. This is a complete binary decoding tree of depth m. With fewer than 2^m symbols, we can chop off branches to get modified (shortened) block codes. [Figure: two example decoding trees for five symbols s1, …, s5, obtained by pruning the complete depth-3 tree.] 4.6
McMillan Inequality Idea: uniquely decodable codes satisfy the same bound as instantaneous codes. Theorem: Suppose we have a uniquely decodable code in radix r with lengths l1 ≤ … ≤ lq. Then the Kraft sum K = Σ_i r^(−l_i) ≤ 1. Proof sketch: use a multinomial expansion of the nth power, K^n = (Σ_i r^(−l_i))^n = Σ_k N_k r^(−k), where N_k = the number of ways n l's can add up to k, which is the same as the number of different ways n symbols can form a coded message of length k. Because of uniqueness, this must be ≤ r^k, the number of radix-r strings of length k. So K^n ≤ n·l_q, hence K ≤ (n·l_q)^(1/n) → 1 as n → ∞, giving K ≤ 1. Conclusion: WLOG we can use only instantaneous codes. 4.7
Average code length Our goal is to minimize the average coded length L_avg = Σ_i p_i l_i. If p_n > p_m then l_n ≤ l_m. For if p_m < p_n with l_m < l_n, then interchanging the encodings for s_m and s_n gives L_old − L_new = (p_n − p_m)(l_n − l_m) > 0, so the new code is strictly shorter. So we can assume that if p1 ≥ … ≥ pq then l1 ≤ … ≤ lq, because if p_i = p_{i+1} with l_i > l_{i+1}, we can just switch s_i and s_{i+1}. 4.8
Huffman algorithm: Start with S = {s1, …, sq} the source alphabet, and consider B = {0, 1} as our code alphabet (binary). First, observe that we may take l_{q−1} = l_q: since the code is instantaneous, no s_i with i < q is a prefix of s_q, so dropping the last digit from s_q (if l_q > l_{q−1}) won't hurt. So we can combine s_{q−1} and s_q into a "combo-symbol" (s_{q−1} + s_q) with probability (p_{q−1} + p_q) and get a code for the reduced alphabet. For q = 1, assign s1 = ε. For q > 1, recursively code the reduced alphabet, then let s_{q−1} = (s_{q−1} + s_q)0 and s_q = (s_{q−1} + s_q)1. [Worked example figure omitted.] N.B. the case q = 1 does not produce a valid code. 4.8
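A compact Python rendering of the procedure (ours, not the book's); ties are broken arbitrarily, so the code words, though optimal, may differ from a hand construction:

```python
import heapq
from itertools import count

# Binary Huffman: repeatedly merge the two least probable states, prefixing
# 0 to one side's words and 1 to the other's as the merge unwinds.
def huffman(probs):  # probs: symbol -> probability; returns symbol -> code word
    tie = count()    # tie-breaker, so the heap never has to compare dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # the two least probable states
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]   # N.B. q = 1 yields {s1: ""}, which is not a valid code

probs = {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1}
code = huffman(probs)
print(code, sum(probs[s] * len(w) for s, w in code.items()))  # L_avg = 1.5
```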
Huffman L_avg Claim: Huffman is always of shortest average length; compare the Huffman L_avg against the average length L of any alternative instantaneous code. Example: p1 = 0.7; p2 = p3 = p4 = 0.1. Huffman gives lengths 1, 2, 3, 3, so L_avg = 1.5; compare to a straight block code with L = log2 q = 2. Proof by induction on q, assuming p1 ≥ … ≥ pq (so we may take l1 ≤ … ≤ lq). Base case: for q = 2, Huffman assigns s1 = 0 and s2 = 1, and no shorter code exists. Induction step: for q > 2, take any instantaneous code for s1, …, sq with minimal average length L. 4.8
In such a minimal code we may take the two least probable symbols to be siblings at the bottom of the tree (total height l_q), so l_{q−1} = l_q = l_{(q−1)+q} + 1, where l_{(q−1)+q} is the length of the combined symbol s_{q−1} + s_q in the reduced code. Replacing the two leaves by their parent turns the code of length L into a reduced code of length L′ satisfying L′ + (p_{q−1} + p_q) = L. By the induction hypothesis, the reduced Huffman code is optimal for the reduced alphabet: L′_avg ≤ L′. But more importantly, the Huffman construction satisfies the same equation, L′_avg + (p_{q−1} + p_q) = L_avg, hence L_avg ≤ L. 4.8
Code Extensions Take p1 = ⅔ and p2 = ⅓. The Huffman code gives s1 = 0, s2 = 1, so L_avg = 1. Square the symbol alphabet to get S²: s11 = s1s1; s12 = s1s2; s21 = s2s1; s22 = s2s2, with p11 = 4⁄9, p12 = 2⁄9, p21 = 2⁄9, p22 = 1⁄9. Apply Huffman to S²: s11 = 1; s12 = 01; s21 = 000; s22 = 001. This costs L_avg = 4⁄9·1 + 2⁄9·2 + 2⁄9·3 + 1⁄9·3 = 17⁄9 per pair. But we are sending two symbols at a time, so the rate is 17⁄18 ≈ 0.944 bits per source symbol, beating L_avg = 1. 4.10
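Checking the arithmetic for the extension, a small sketch (ours) using exact rationals:

```python
from fractions import Fraction

# Huffman code for S^2 with the pair probabilities (in ninths) from above.
ext = {"s11": (Fraction(4, 9), "1"),   "s12": (Fraction(2, 9), "01"),
       "s21": (Fraction(2, 9), "000"), "s22": (Fraction(1, 9), "001")}
L_pair = sum(p * len(w) for p, w in ext.values())
print(L_pair, float(L_pair / 2))  # 17/9 per pair -> ~0.944 bits per source symbol
```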
Huffman Codes in radix r At each stage down, we merge the last (least probable) r states into one, reducing the number of states by r − 1. Since we end with one state, we must begin with a number of states ≡ 1 mod (r − 1). We pad out the alphabet with states of probability 0 to get this. Example: r = 4, so each merge removes k = 3 states. [Figure with the padded reduction omitted.] 4.11
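The padding count follows from the congruence; a sketch (the function name and the q = 5 example are ours):

```python
# Each radix-r merge removes r - 1 states, so we pad with probability-0
# symbols until the state count is congruent to 1 mod (r - 1).
def pads_needed(q, r):
    return (1 - q) % (r - 1)

print(pads_needed(5, 4))  # 2: 5 + 2 = 7 states -> 4 -> 1 under radix-4 merges
```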