Functional Programming Lecture 15 - Case Study: Huffman Codes
The Problem
Design a coding/decoding scheme and implement it in Haskell. This requires:
- an algorithm to encode a message,
- an algorithm to decode a message,
- an implementation.
Fixed and Variable Length Codes
A fixed-length code assigns the same number of bits to each code word. E.g. ASCII: letter -> 7 bits (up to 128 code words). So to encode the string “at” we need 14 bits.
A variable-length code may assign different numbers of bits to different code words, depending on the frequency of the code word. Frequent words are assigned short codes; infrequent words are assigned long codes.
E.g. “at” is encoded by 011 (a = 0, t = 11), using this tree to encode and decode (0 for go left, 1 for go right):

      .
    0/ \1
    a   .
      0/ \1
      b   t
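The bit counts above can be checked with a minimal sketch. The function names are illustrative only (not from the lecture), and code words are written as strings of '0'/'1' for readability.

```haskell
-- Compare the cost of a fixed-length and a variable-length code.
-- The variable-length code words follow the example tree
-- (a = 0, b = 10, t = 11); all names here are illustrative.

-- Bits needed when every character uses the same number of bits.
fixedLengthBits :: Int -> String -> Int
fixedLengthBits bitsPerChar msg = bitsPerChar * length msg

-- Code word for each character in the example code.
exampleCode :: Char -> String
exampleCode 'a' = "0"
exampleCode 'b' = "10"
exampleCode 't' = "11"
exampleCode c   = error ("exampleCode: no code for " ++ show c)

-- Bits needed under the variable-length example code.
variableLengthBits :: String -> Int
variableLengthBits = length . concatMap exampleCode
```

In GHCi, `fixedLengthBits 7 "at"` gives 14 and `variableLengthBits "at"` gives 3.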
Coding

      .
    0/ \1
    a   .
      0/ \1
      b   t

a is encoded by 1 bit, 0
b is encoded by 2 bits, 10
t is encoded by 2 bits, 11
An important property of a Huffman code is that the codes are prefix codes: no code of a letter (code word) is the prefix of the code of another letter (code word). E.g.
0 is not a prefix of 10 or 11
10 is not a prefix of 0 or 11
11 is not a prefix of 0 or 10
So, “aa” is encoded by 00. “ba” is encoded by 100.
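The prefix property can be tested mechanically. A minimal sketch, assuming code words are given as strings of '0'/'1'; `isPrefixFree` is a hypothetical name, not part of the lecture's code.

```haskell
import Data.List (isPrefixOf)

-- True when no code word in the list is a prefix of a different
-- code word in the list. Entries are compared by position, so a
-- code word that appears twice also counts as a violation.
isPrefixFree :: [String] -> Bool
isPrefixFree codes =
  and [ not (c1 `isPrefixOf` c2)
      | (i, c1) <- indexed, (j, c2) <- indexed, i /= j ]
  where
    indexed = zip [0 :: Int ..] codes
```

`isPrefixFree ["0", "10", "11"]` is True for the code above, while `isPrefixFree ["0", "01", "11"]` is False because 0 is a prefix of 01.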
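Decoding

      .
    0/ \1
    a   .
      0/ \1
      b   t

The encoded message 1001111011 is decoded as:
10 - b
0 - a
11 - t
11 - t
0 - a
11 - t
In view of the frequency of t (three of the six letters), this is probably not a good code for this message. t should be encoded by 1 bit!
P.S. Morse code is also a variable-length code, but it is not a prefix code and hence not a Huffman code: E (.) is a prefix of I (..), so Morse relies on pauses to separate letters.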
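To see the effect of giving t the short code word, the total message length can be compared under two assignments. This is a sketch only: `alternativeCode` (t = 0, a = 10, b = 11) is a hypothetical reassignment, not one given in the lecture.

```haskell
-- Total encoded length of a message, given the length in bits of
-- each character's code word.
encodedLength :: [(Char, Int)] -> String -> Int
encodedLength lengths = sum . map bitsFor
  where
    bitsFor c = case lookup c lengths of
      Just n  -> n
      Nothing -> error ("encodedLength: no code for " ++ show c)

-- Code word lengths for the code above: a = 0, b = 10, t = 11.
lectureCode :: [(Char, Int)]
lectureCode = [('a', 1), ('b', 2), ('t', 2)]

-- Hypothetical alternative: t = 0, a = 10, b = 11.
alternativeCode :: [(Char, Int)]
alternativeCode = [('t', 1), ('a', 2), ('b', 2)]
```

For the message "battat", `encodedLength lectureCode "battat"` is 10 bits and `encodedLength alternativeCode "battat"` is 9 bits.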
A Haskell Implementation
Types

    -- codes --
    data Bit = L | R deriving (Eq, Show)
    type Hcode = [Bit]

    -- Huffman coding tree --
    -- characters at leaf nodes, plus frequencies --
    -- frequencies as well at internal nodes --
    data Tree = Leaf Char Int | Node Int Tree Tree

Assume that codes are kept in a table (rather than read off a tree).

    -- table of codes --
    type Table = [(Char, Hcode)]
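A small sketch of how the frequency at an internal node might relate to its sub-trees. It assumes an internal node stores the sum of its children's frequencies, which is consistent with the lecture's example tree but is not stated on the slide. `frequency` and `merge` are hypothetical helpers, and `deriving (Eq, Show)` is added here only so that trees can be compared and printed.

```haskell
-- Tree type as on the slide, with Eq and Show added for convenience.
data Tree = Leaf Char Int | Node Int Tree Tree
  deriving (Eq, Show)

-- Frequency stored at the root of a tree.
frequency :: Tree -> Int
frequency (Leaf _ f)   = f
frequency (Node f _ _) = f

-- Hypothetical helper: join two trees under a new internal node
-- whose frequency is the sum of the two sub-tree frequencies.
merge :: Tree -> Tree -> Tree
merge t1 t2 = Node (frequency t1 + frequency t2) t1 t2
```

For example, `merge (Leaf 'a' 0) (merge (Leaf 'b' 1) (Leaf 't' 2))` yields `Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))`.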
Encoding

    -- encode a message according to code table --
    -- encode each character and concatenate --
    codeMessage :: Table -> [Char] -> Hcode
    codeMessage tbl = concat . map (lookupTable tbl)

    -- lookup the code for a character in code table --
    lookupTable :: Table -> Char -> Hcode
    lookupTable [] c = error "lookupTable"
    lookupTable ((ch,code):tbl) c
      | ch == c   = code
      | otherwise = lookupTable tbl c
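lookupTable calls error when a character is missing from the table. A possible variant, sketched here under the assumption that the caller would rather get Nothing back, uses the Prelude's `lookup`. `codeMessageSafe` and `exampleTable` are illustrative names, not part of the lecture's code.

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
type Table = [(Char, Hcode)]

-- Table for the example code: a = 0, b = 10, t = 11
-- (L stands for 0, R for 1).
exampleTable :: Table
exampleTable = [('a', [L]), ('b', [R, L]), ('t', [R, R])]

-- Encode each character and concatenate; the result is Nothing
-- if any character has no entry in the table.
codeMessageSafe :: Table -> [Char] -> Maybe Hcode
codeMessageSafe tbl msg = concat <$> mapM (`lookup` tbl) msg
```

`codeMessageSafe exampleTable "battat"` gives `Just [R,L,L,R,R,R,R,L,R,R]`, and `codeMessageSafe exampleTable "cat"` gives `Nothing` because c is not in the table.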
Decoding

    -- decode a message according to code tree --
    -- if at a leaf node, then character is decoded, --
    -- start again at root --
    -- if at an internal node, then follow sub-tree --
    -- according to next code bit --
    -- if no bits are left, then decoding is finished --
    decode :: Tree -> Hcode -> [Char]
    decode tr = decodetree tr
      where
        decodetree (Node f t1 t2) (L:rest) = decodetree t1 rest
        decodetree (Node f t1 t2) (R:rest) = decodetree t2 rest
        decodetree (Leaf ch f) rest = ch:(decodetree tr rest)
        decodetree t [] = []
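A minimal sketch of a decoder that makes the end-of-input cases explicit: it returns Nothing when the bits run out part-way down the tree. `decodeSafe` is a hypothetical name, and the sketch assumes the root of the tree is an internal node (with a single Leaf as the whole tree and a non-empty input it would not terminate).

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
data Tree = Leaf Char Int | Node Int Tree Tree

-- Example tree: a = 0, b = 10, t = 11 (L stands for 0, R for 1).
codetree :: Tree
codetree = Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))

-- Decode by walking the tree from the root for each code word.
-- Assumes the root is a Node.
decodeSafe :: Tree -> Hcode -> Maybe [Char]
decodeSafe tr = start
  where
    -- At the root: an empty input means decoding is complete.
    start [] = Just []
    start bs = walk tr bs
    -- At a leaf: emit the character, then start again at the root.
    walk (Leaf ch _)   rest       = (ch :) <$> start rest
    walk (Node _ t1 _) (L : rest) = walk t1 rest
    walk (Node _ _ t2) (R : rest) = walk t2 rest
    -- Bits ran out inside a code word.
    walk (Node _ _ _)  []         = Nothing
```

`decodeSafe codetree [R,L,L,R,R,R,R,L,R,R]` gives `Just "battat"`, while `decodeSafe codetree [R]` gives `Nothing`.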
Example

    codetree = Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))
    -- assume 'a' is most frequent, denoted by smallest number --
    message = [R,L,L,R,R,R,R,L,R,R]

    decode codetree message
    => decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) (R:[L,L,R,R,R,R,L,R,R])
    => decodetree (Node 3 (Leaf 'b' 1) (Leaf 't' 2)) (L:[L,R,R,R,R,L,R,R])
    => decodetree (Leaf 'b' 1) (L:[R,R,R,R,L,R,R])
    => 'b' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) (L:[R,R,R,R,L,R,R])
    => 'b' : decodetree (Leaf 'a' 0) [R,R,R,R,L,R,R]
    => 'b' : 'a' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) [R,R,R,R,L,R,R]
    => ...
We still have to make:
- the code tree
- the code table
(Next lecture!)
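As a preview only, here is one possible way a code table could be read off an existing code tree. It is a sketch under the types above, not the version from the next lecture; `makeTable` is a hypothetical name, and building the tree itself is not covered here.

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
data Tree = Leaf Char Int | Node Int Tree Tree
type Table = [(Char, Hcode)]

-- Example tree: a = 0, b = 10, t = 11 (L stands for 0, R for 1).
codetree :: Tree
codetree = Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))

-- Record the path from the root to each leaf: L for a left branch,
-- R for a right branch. The path is accumulated in reverse and
-- turned around at the leaf.
makeTable :: Tree -> Table
makeTable = go []
  where
    go path (Leaf ch _)    = [(ch, reverse path)]
    go path (Node _ t1 t2) = go (L : path) t1 ++ go (R : path) t2
```

`makeTable codetree` gives `[('a',[L]),('b',[R,L]),('t',[R,R])]`.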