140 likes | 153 Views
Learn about Huffman's Algorithm for efficient data compression with binary trees. Decode binary strings uniquely using tree leaves. Improve codes by optimizing tree structure. Start with frequency weights, join nodes, and build the optimal code tree.
E N D
Math 221 Huffman Codes
Represent the Code with a Tree 1 0 0 1 1 0 0 0 1 0 0 1 1 a e i t sp nl s
Some Terminology • A tree is a collection of nodes where any path that ends at the same node it started from must intersect itself, i.e. it has no “closed circuits”. • A node with no edges coming out of it is a leaf. • A node connected to an above node is a child of the above node. • We will only consider binary trees, i.e. each node will have at most two children. • Convention: 0 means to to the left, 1 to the right.
Important If a code is represented by the leaves of a binary tree, then a binary string can be uniquely decoded!
Improving the Code • Notice that the newline does not have a sibling. • Thus, we can place it in its parent node and get a shorter code! • This shortens the number of bits needed to represent a newline, from three to two.
Huffman’s Algorithm • Every node is given its frequency as a weight. • Join the two nodes with lowest weight. • Now we have a tree. In this algorithm the weight of a tree is the sum of the weights of its leaves. • Now, at the nth stage, join the two trees with the lowest weight.
10 10 15 15 12 12 3 3 4 4 13 13 1 1 a a e e i i t t sp sp nl nl s s Our example We start with which becomes 4 T1
10 15 12 3 4 13 1 a e i t sp nl s which becomes 8 T2 4 T1
10 15 12 3 4 13 1 a e i t sp nl s which becomes 18 T3 8 T2 4 T1
25 T4 10 15 12 3 4 13 1 a e i t sp nl s which becomes 18 T3 8 T2 4 T1
25 T4 10 15 12 3 4 13 1 a e i t sp nl s which becomes 33 T5 18 T3 8 T2 4 T1
25 T4 10 15 12 3 4 13 1 a e i t sp nl s 58 which becomes T6 33 T5 18 T3 8 T2 4 And we are done! T1