Introduction to Computer Science 2 Lecture 7: Extended binary trees



  1. Introduction to Computer Science 2, Lecture 7: Extended binary trees. Prof. Neeraj Suri, Brahim Ayari

  2. In advance: Search in binary trees
     • Binary trees can be considered as decision trees.
     • Each node represents a decision, the edges the different possibilities.
     • Searching in such a tree means going from the root to a leaf.
     [Figure: decision tree with the root test "A < 2" and the tests "B < 5" and "C > 7" below it; edges are labelled FALSE and TRUE, and the leaves are labelled X, X2, X3, 3X.]

  3. Extended binary trees
     • Replace NULL pointers with special (external) nodes.
     • A binary tree to which external nodes are added is called an extended binary tree.
     • The data can be stored either in the internal or in the external nodes.
     • The length of the path to a node indicates the cost of searching for it.

  4. External and internal path length
     • The cost of a search in an extended binary tree depends on the following parameters:
     • External path length = the sum of the path lengths from the root to all external nodes $S_i$ ($1 \le i \le n+1$): $\mathrm{Ext}_n = \sum_{i=1}^{n+1} \mathrm{depth}(S_i)$
     • Internal path length = the sum of the path lengths from the root to all internal nodes $K_i$ ($1 \le i \le n$): $\mathrm{Int}_n = \sum_{i=1}^{n} \mathrm{depth}(K_i)$
     • $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ (proof by induction)
     • Extended binary trees with a minimal external path length also have a minimal internal path length.
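The induction behind $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ can be sketched as follows. The sketch assumes the usual construction step, which the slide does not spell out: a tree with $n+1$ internal nodes is obtained from one with $n$ internal nodes by replacing an external node at depth $d$ with an internal node that has two external children.

```latex
\begin{align*}
\text{Base } (n = 0):\quad
  & \mathrm{Ext}_0 = 0 = \mathrm{Int}_0 + 2 \cdot 0 \\
\text{Step } (n \to n+1):\quad
  & \mathrm{Int}_{n+1} = \mathrm{Int}_n + d \\
  & \mathrm{Ext}_{n+1} = \mathrm{Ext}_n - d + 2(d+1) = \mathrm{Ext}_n + d + 2 \\
  & \mathrm{Ext}_{n+1} = (\mathrm{Int}_n + 2n) + d + 2 = \mathrm{Int}_{n+1} + 2(n+1)
\end{align*}
```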

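  5. Example
     • External path length: $\mathrm{Ext}_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25$
     • Internal path length: $\mathrm{Int}_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11$
     • Check with $n = 7$: $\mathrm{Ext}_n = \mathrm{Int}_n + 2n = 11 + 14 = 25$
     [Figure: extended binary tree with n = 7 internal nodes; every node is labelled with its depth (0 to 4).]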
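The two path lengths can be computed with one recursive traversal. The following is a minimal Python sketch; the nested-tuple representation and the exact tree shape are assumptions, chosen so that the external depths read 3, 4, 4, 2, 3, 3, 3, 3 from left to right as in the example.

```python
# Extended binary tree as nested tuples: an internal node is (left, right),
# an external node is None. The shape below is an assumption chosen to match
# the depths listed in the example, not a copy of the slide's figure.

def path_lengths(tree, depth=0):
    """Return (internal_count, internal_path_length, external_path_length)."""
    if tree is None:                      # external node
        return 0, 0, depth
    left, right = tree
    n_l, int_l, ext_l = path_lengths(left, depth + 1)
    n_r, int_r, ext_r = path_lengths(right, depth + 1)
    return n_l + n_r + 1, int_l + int_r + depth, ext_l + ext_r


leaf_pair = (None, None)                  # internal node with two external children
left_subtree = ((None, leaf_pair), None)  # external depths 3, 4, 4, 2
right_subtree = (leaf_pair, leaf_pair)    # external depths 3, 3, 3, 3
example_tree = (left_subtree, right_subtree)

n, int_n, ext_n = path_lengths(example_tree)
print(n, int_n, ext_n)                    # 7 11 25
```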

  6. Minimal and maximal length
     • For a given n, a balanced tree has the minimal internal path length.
     • Example: in a complete tree of height h (internal nodes on levels 0 to h-1, so $n = 2^h - 1$), the internal path length is $\mathrm{Int}_n = \sum_{i=1}^{h-1} i \cdot 2^i$.
       Example: h = 4, n = 15, Int = 34, Ext = 16 · 4 = 64
     • The internal path length becomes maximal if the tree degenerates to a linear list: $\mathrm{Int}_n = \sum_{i=1}^{n-1} i = n(n-1)/2$.
       For comparison: a list with n = 15 nodes has Int = 105, Ext = 105 + 30 = 135
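The numbers in the two examples can be reproduced with a few lines of Python. This is only a sketch; the function names are hypothetical, and the complete tree is assumed to have its root at depth 0 and all external nodes at depth h.

```python
def complete_tree_lengths(h):
    """Return (n, Int, Ext) for a complete tree with h levels of internal nodes."""
    n = 2**h - 1
    internal = sum(depth * 2**depth for depth in range(h))  # 2**depth nodes on each level
    external = (n + 1) * h                                   # all n + 1 external nodes at depth h
    return n, internal, external


def linear_list_lengths(n):
    """Return (n, Int, Ext) for a tree degenerated to a list of n internal nodes."""
    internal = n * (n - 1) // 2
    external = internal + 2 * n                              # uses Ext = Int + 2n
    return n, internal, external


print(complete_tree_lengths(4))   # (15, 34, 64)
print(linear_list_lengths(15))    # (15, 105, 135)
```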

  7. Weighted binary trees
     • Often weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$).
     • The weighted external path length is defined as $\mathrm{Ext}_w = \sum_{i=1}^{n+1} \mathrm{depth}(S_i) \cdot q_i$
     • In weighted binary trees the properties of minimal and maximal path lengths no longer apply.
     • Determining the minimal weighted external path length is an important practical problem.
     [Figure: two trees with external weights 25, 15, 8, 3. The tree shaped like a linear list has $\mathrm{Ext}_w = 88$, which is less than the balanced tree with $\mathrm{Ext}_w = 102$.]
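The two values 88 and 102 can be checked with a short Python sketch. The nested-tuple encoding is an assumption, as is the left/right placement of the weights, which does not affect the result.

```python
def weighted_ext(tree, depth=0):
    """Weighted external path length; an external node is a number (its weight)."""
    if not isinstance(tree, tuple):       # external node carrying weight q_i
        return depth * tree
    left, right = tree
    return weighted_ext(left, depth + 1) + weighted_ext(right, depth + 1)


list_like = (25, (15, (8, 3)))            # degenerate tree, heaviest weight nearest the root
balanced = ((25, 15), (8, 3))             # all external nodes at depth 2

print(weighted_ext(list_like))            # 88
print(weighted_ext(balanced))             # 102
```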

  8. Application example: optimal codes
     • To convert a text file efficiently to bit strings, there are two alternatives:
     • Fixed-length coding: each character has the same number of bits (e.g., ASCII)
     • Variable-length coding: some characters are represented using fewer bits than others
     • Example of fixed-length coding: 3-bit code for the alphabet A, B, C, D:
     • A = 001, B = 010, C = 011, D = 100
     • The message ABBAABCDADA is converted to
     • 001010010001001010011100001100001 (length 33 bits)
     • Using a 2-bit code, the same message can be coded with only 22 bits.
     • To decode the message, split it into groups of 3 bits (respectively 2 bits) and look up each group in a table that maps codes to characters.
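Fixed-length encoding and the table-based decoding described above can be sketched in Python as follows. The function names are hypothetical; the code table is the 3-bit table from the slide.

```python
FIXED_CODE = {"A": "001", "B": "010", "C": "011", "D": "100"}   # table from the slide


def encode_fixed(message, code):
    """Concatenate the code word of every character."""
    return "".join(code[ch] for ch in message)


def decode_fixed(bits, code):
    """Split the bit string into equal-sized groups and look each one up."""
    width = len(next(iter(code.values())))            # all code words share one length
    inverse = {word: ch for ch, word in code.items()}
    return "".join(inverse[bits[i:i + width]] for i in range(0, len(bits), width))


bits = encode_fixed("ABBAABCDADA", FIXED_CODE)
print(bits, len(bits))                                 # 33 bits
print(decode_fixed(bits, FIXED_CODE))                  # ABBAABCDADA
```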

  9. Application example: optimal codes (2)
     • Idea: More frequently used characters are coded using fewer bits.
     • Message: ABBAABCDADA
     • Coding: 01010001011111001100
     • Length: 20 bits!
     • Variable-length coding can reduce the memory space needed for storing the file.
     • How can this special coding be found, and why is the decoding unique?

  10. Application example: optimal codes (3)
      • The frequencies and the coding are represented as a weighted binary tree.
      • First, decoding. Given a bit string:
      • Use the successive bits to traverse the tree, starting from the root.
      • When you arrive at an external node, output the character stored there and start again from the root.
      Example: 010100010111...
      • Bit 1 = 0: external node, A
      • Bit 2 = 1: from the root to the right
      • Bit 3 = 0: left, external node, B
      • Bit 4 = 1: from the root to the right
      • Bit 5 = 0: left, external node, B
      • ...
      [Figure: code tree with edges labelled 0 and 1 and the external nodes A (weight 5), B (weight 3), D (weight 2), C (weight 1).]
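The traversal can be sketched in Python as below. The nested-tuple tree is an assumed representation, and the positions of C and D (110 and 111) are inferred from the 20-bit string for ABBAABCDADA rather than read from the figure.

```python
# Code tree as nested tuples (child_for_0, child_for_1); strings are external nodes.
# The positions of D and C are inferred from the encoded message, not from the figure.
CODE_TREE = ("A", ("B", ("D", "C")))


def decode(bits, tree):
    """Walk from the root along each bit; emit a character at every external node."""
    out = []
    node = tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):         # external node reached
            out.append(node)
            node = tree                   # restart at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)


print(decode("01010001011111001100", CODE_TREE))   # ABBAABCDADA
```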

  11. Correctness condition
      • Observation: In a variable-length code, the code of one character must not be a prefix of the code of any other character.
      • If the code is represented as an extended binary tree, uniqueness is guaranteed (only one character per external node).
      • If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal weighted external path length yields an optimal code.
      • How is a tree with minimal external path length generated?
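The prefix condition can be tested directly on a set of code words. The sketch below is one possible check, not part of the lecture: after sorting, a code word that is a prefix of another one is always immediately followed by a word that starts with it, so comparing neighbours is enough. The second code table is a made-up counterexample.

```python
def is_prefix_free(code_words):
    """True if no code word is a prefix of another (checked on sorted neighbours)."""
    words = sorted(code_words)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


print(is_prefix_free(["0", "10", "110", "111"]))   # True
print(is_prefix_free(["0", "01", "11"]))           # False: "0" is a prefix of "01"
```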

  12. Huffman Code
      • Idea: Characters are weighted and sorted according to their frequency.
      • This also works independently of a particular text, e.g., for English (characters with relative weights):
        [Table: English characters with relative weights]
      • A binary tree with minimal external path length is constructed as follows:
      • Each character is represented by a tree of its own with the corresponding weight (only one external node).
      • The two trees with the smallest weights are merged into a new tree.
      • The root of the new tree is marked with the sum of the weights of the original roots.
      • Continue until only one tree remains.

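  13. Example 1: Huffman
      • Alphabet and frequency: [Table: characters and their frequencies; the weights are 4, 5, 9, 10, 29]
      • Step 1: (4, 5, 9, 10, 29); merge 4 and 5, new weight: 9
      • Step 2: (9, 9, 10, 29); merge 9 and 9, new weight: 18
      [Figure: partial trees after each step. First 4 and 5 are merged under a root of weight 9, then the two trees of weight 9 are merged under a root of weight 18; edges are labelled 0 and 1.]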

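  14. Example 1: Huffman (2)
      • Step 3: (18, 10, 29) → (10, 18, 29); merge 10 and 18, new weight: 28
      • Step 4: (28, 29); merge 28 and 29, new weight: 57; finished!
      [Figure: partial trees after each step. The trees of weight 10 and 18 are merged under a root of weight 28, then 28 and 29 are merged under the final root of weight 57; edges are labelled 0 and 1.]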
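The sequence of weight lists in steps 1 to 4 can be reproduced with a small Python sketch. It tracks only the weights, not the trees, and simply re-sorts the list after every merge; the function name is hypothetical.

```python
def huffman_merge_trace(weights):
    """Repeatedly merge the two smallest weights; return the list after each step."""
    current = sorted(weights)
    trace = [tuple(current)]
    while len(current) > 1:
        smallest, second = current[0], current[1]
        current = sorted(current[2:] + [smallest + second])
        trace.append(tuple(current))
    return trace


for step in huffman_merge_trace([4, 5, 9, 10, 29]):
    print(step)
# (4, 5, 9, 10, 29)
# (9, 9, 10, 29)
# (10, 18, 29)
# (28, 29)
# (57,)
```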

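  15. Resulting tree
      • Coding: [Table: code word for each character]
      • $\mathrm{Ext}_w = 112$
      • Using this coding, the code for, e.g.:
      • TENNIS = 00101101101010100
      • SET = 0100100
      • NET = 011100
      • Decoding as described before.
      [Figure: Huffman tree with root 57, whose children are 28 and E; 28 has the children 18 and T; 18 has the children 9 and N; 9 has the children I and S. Edges are labelled 0 and 1.]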
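The three encodings and the value 112 can be checked with the Python sketch below. Two things are assumptions here: the code table is inferred from the three encoded words (E = 1, T = 00, N = 011, I = 0101, S = 0100), and the weights 5 and 4 are assigned to I and S in that order, which the transcript does not settle. Both I and S sit at depth 4, so the choice does not change the result.

```python
# Code table inferred from the three encodings (assumption, see lead-in).
CODE = {"E": "1", "T": "00", "N": "011", "I": "0101", "S": "0100"}
# Weights read off the tree; which of I and S has weight 5 is an assumption.
WEIGHT = {"E": 29, "T": 10, "N": 9, "I": 5, "S": 4}


def encode(word, code):
    """Concatenate the code word of every character."""
    return "".join(code[ch] for ch in word)


# The depth of an external node equals the length of its code word.
ext_w = sum(len(CODE[ch]) * WEIGHT[ch] for ch in CODE)
print(ext_w)                                             # 112
for word in ("TENNIS", "SET", "NET"):
    print(word, encode(word, CODE))
```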

  16. Some remarks
      • The resulting tree is not regular.
      • Regular trees are not always optimal.
      • Example: the best nearly complete tree has $\mathrm{Ext}_w = 123$.
      • For the message ABBAABCDADA, 20 bits is optimal (see previous slides).
      [Figure: nearly complete tree with the external weights 29, 10, 9, 5, 4.]
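The comparison between 123 and 112 can be retraced in Python. The sketch assumes that a nearly complete tree with five external nodes has three of them at depth 2 and two at depth 3, and that the best assignment puts the largest weights at the smallest depths.

```python
weights = [29, 10, 9, 5, 4]                 # sorted from largest to smallest

depths_nearly_complete = [2, 2, 2, 3, 3]    # assumed leaf depths, smallest first
depths_huffman = [1, 2, 3, 4, 4]            # leaf depths in the Huffman tree


def weighted_length(weights, depths):
    """Sum of depth times weight over all external nodes."""
    return sum(q * d for q, d in zip(weights, depths))


print(weighted_length(weights, depths_nearly_complete))   # 123
print(weighted_length(weights, depths_huffman))           # 112
```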

  17. Example 2: Huffman
      • Average number of bits without Huffman: 3 (because $2^3 = 8$)
      • Average number of bits using Huffman code:
      • There are other “valid” solutions! But the average number of bits is the same for all of them (equal to that of the Huffman code).

  18. Analysis
      /* Algorithm Huffman */
      for (int i = 1; i <= n-1; i++) {
          p1 = smallest element in list L
          remove p1 from L
          p2 = smallest element in L
          remove p2 from L
          create node p
          add p1 and p2 as left and right subtrees to p
          weight p = weight p1 + weight p2
          insert p into L
      }
      • The run-time behavior depends in particular on the implementation of the list:
      • Time required to find the node with the smallest weight
      • Time required to insert a new node
      • “Naive” implementations give $O(n^2)$, “smarter” ones result in $O(n \log_2 n)$.
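One way to obtain the $O(n \log n)$ behavior is to keep the list L in a binary heap, so that both removing the smallest element and inserting a new node cost $O(\log n)$. The Python sketch below uses the standard `heapq` module. It is an illustration rather than the lecture's own implementation, and it assumes that symbols are not tuples, because tuples are used to mark internal nodes.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Build a code table {symbol: bit string} from {symbol: weight}."""
    tie = count()                                   # tie-breaker so trees are never compared
    heap = [(w, next(tie), sym) for sym, w in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:                              # single symbol: give it a one-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)             # smallest weight
        w2, _, t2 = heapq.heappop(heap)             # second smallest weight
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):                 # internal node: (left, right)
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                                       # external node: a symbol
            codes[tree] = prefix

    assign(heap[0][2], "")
    return codes


# Weights of I and S (5 and 4) are assumed in this order.
freqs = {"E": 29, "T": 10, "N": 9, "I": 5, "S": 4}
codes = huffman_code(freqs)
print({sym: len(bits) for sym, bits in codes.items()})
```

The 0/1 labels depend on how ties are broken and on which subtree is placed left, so the bit patterns may differ from those on the slides; the code lengths and the weighted external path length are the same.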

  19. Optimality
      • Observation: The weight of a node K in the Huffman tree equals the sum of the weights of the external nodes below K. Adding up the weights of all internal nodes in the subtree rooted at K gives the weighted external path length of that subtree (in Example 1: 9 + 18 + 28 + 57 = 112).
      • Theorem: A Huffman tree is an extended binary tree with minimal weighted external path length $\mathrm{Ext}_w$.
      • Proof outline (by induction over n, the number of characters in the alphabet):
      • The statement to prove is A(n) = “A Huffman tree with n external nodes has minimal external path length $\mathrm{Ext}_w$”.
      • Consider first n = 2: prove A(2) = “A Huffman tree with 2 external nodes has minimal external path length”.

  20. Optimality (2)
      • Proof:
      • n = 2: Two characters with weights $q_1$ and $q_2$ result in a tree with $\mathrm{Ext}_w = q_1 + q_2$. This is minimal, because there are no other trees.
      • Induction hypothesis: For all $i \le k$, A(i) is true.
      • To prove: A(k+1) is true.
      [Figure: tree with root V and subtrees T1 and T2.]

  21. Optimality (3)
      • Proof:
      • Consider a Huffman tree T with k+1 nodes. This tree has a root V and two subtrees $T_1$ and $T_2$, which have the weights $q_1$ and $q_2$ respectively.
      • From the construction method we can deduce that the weights $q_i$ of all internal nodes $n_i$ of $T_1$ and $T_2$ satisfy $q_i \le \min(q_1, q_2)$.
      • Therefore $q_1 + q_2 > q_i$ for these weights. So if V is replaced by any node in $T_1$ or $T_2$, the resulting tree will have a greater weight.
      • Replacing nodes within $T_1$ and $T_2$ does not help, because $T_1$ and $T_2$ are already optimal (both are trees with k nodes or fewer, so the induction hypothesis holds for them).
      • So T is an optimal tree with k+1 nodes.
      [Figure: root V with weight $q_1 + q_2$ above the subtrees $T_1$ (weight $q_1$) and $T_2$ (weight $q_2$).]

  22. Huffman Code: Applications
      • Fax machine

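  23. Huffman: Other applications
      • ZIP coding (at least a similar technique)
      • In principle: most coding techniques with data reduction (lossless compression)
      • NOT Huffman by itself: lossy compression techniques like JPEG, MP3, MPEG, … (the lossy step is a different technique, although JPEG and MP3, for example, apply Huffman coding as a final lossless stage)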

