Data Structures

aletta
Presentation Transcript


  1. Data Structures Week 6: Assignment #2 Problem http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/

  2. Requirement
     • Encode a message using Huffman's algorithm
     • Use a min heap as the priority queue
       • use dynamic allocation
     • The input consists of strings
     • A string consists of alphabetic characters only
     • Upper-case and lower-case letters are treated as different characters
     • The strings are stored in a text file
       • given on separate lines

  3. Requirement – cont’
     • Output
       • should be stored in a text file in the following format:

         Heap Traversal: [character or string]...
         Huffman Tree Traversal: [character or string]...
         character: frequency, code
         . . .
         the code for the message:

     • Due date
       • 2001/5/23 24:00

  4. Encoding
     • Encode the message as a long bit string
       • assign a bit string code to each symbol of the alphabet
       • then concatenate the individual codes of the symbols making up the message to produce an encoding for the message

  5. Example #1

     Symbol  Code
     A       010
     B       100
     C       000
     D       111

     • ABACCDA
     • 010100010000000111010
     • Three bits are used for each symbol
     • 21 bits are needed to encode the message
       • inefficient

  6. Example #2

     Symbol  Code
     A       00
     B       01
     C       10
     D       11

     • ABACCDA
     • 00010010101100
     • Two bits are used for each symbol
     • 14 bits are needed to encode the message

  7. Example #3
     • ABACCDA
     • Each of the letters B and D appears only once in the message
     • The letter A appears three times
     • The letter A is therefore assigned a shorter bit string than the letters B and D

  8. Example #3 - cont’

     Symbol  Code
     A       0
     B       110
     C       10
     D       111

     • ABACCDA
     • 0110010101110
     • Encoding of the message requires only 13 bits
       • more efficient
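The three examples can be checked with a small sketch. The code tables are copied from the slides; the function name `encode` and the table names are hypothetical, chosen only for illustration.

```python
# Code tables copied from Examples #1 to #3 on the slides.
FIXED_3BIT = {"A": "010", "B": "100", "C": "000", "D": "111"}
FIXED_2BIT = {"A": "00", "B": "01", "C": "10", "D": "11"}
VARIABLE = {"A": "0", "B": "110", "C": "10", "D": "111"}


def encode(message, table):
    """Concatenate the code of every symbol in the message."""
    return "".join(table[symbol] for symbol in message)


for name, table in [("3-bit", FIXED_3BIT), ("2-bit", FIXED_2BIT), ("variable", VARIABLE)]:
    bits = encode("ABACCDA", table)
    print(f"{name:8s} {bits} ({len(bits)} bits)")
```

Running it prints the bit strings from the slides along with their lengths, so the three encodings can be compared side by side.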

  9. Variable-Length Code
     • If variable-length codes are used
       • the code for one symbol must not be a prefix of the code for another
     • Example
       • Suppose the code for a symbol x, c(x), is a prefix of the code of another symbol y, c(y)
       • When c(x) is encountered in a left-to-right scan
         • it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y)
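The prefix condition can be illustrated with a short sketch. The helper names `is_prefix_free` and `decode` are hypothetical; the first table is the one from Example #3, and the second is a made-up ambiguous table.

```python
def is_prefix_free(table):
    """Return True if no code is a prefix of another symbol's code."""
    codes = list(table.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codes)
        for j, b in enumerate(codes)
    )


def decode(bits, table):
    """Decode with one left-to-right scan; assumes the table is prefix-free."""
    by_code = {code: symbol for symbol, code in table.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:  # a complete code has been read
            message.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(message)


good = {"A": "0", "B": "110", "C": "10", "D": "111"}  # table from Example #3
bad = {"x": "0", "y": "01"}  # made-up table: c(x) is a prefix of c(y)
print(is_prefix_free(good), is_prefix_free(bad))
print(decode("0110010101110", good))
```

Because no code in the Example #3 table is a prefix of another, the scanner can emit a symbol as soon as the bits read so far match a code.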

  10. Optimal Encoding Scheme (1)

      Symbol  Frequency
      A       3
      B       1
      C       2
      D       1

      • Find the two symbols that appear least frequently
        • These are B and D
      • Combine these two symbols into the single symbol BD
        • The frequency of this new symbol is the sum of the frequencies of its two symbols
        • The frequency of BD is 2

  11. Optimal Encoding Scheme (2)

      Symbol  Frequency
      A       3
      C       2
      BD      2

      • Again choose the two symbols with the smallest frequency
        • These are C and BD
      • Combine these two symbols into the single symbol CBD
        • The frequency of this new symbol is the sum of the frequencies of its two symbols
        • The frequency of CBD is 4

  12. Optimal Encoding Scheme (3)

      Symbol  Frequency
      A       3
      CBD     4

      • There are now only two symbols remaining
      • These are combined into the single symbol ACBD
        • The frequency of ACBD is 7

      Symbol  Frequency
      ACBD    7

  13. Optimal Encoding Scheme (4)
      • ACBD (A and CBD)
        • assigned the codes 0 and 1
      • CBD (C and BD)
        • assigned the codes 10 and 11
      • BD (B and D)
        • assigned the codes 110 and 111
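The table-based merging in slides 10 to 13 can be sketched without building a tree. This is only an illustration under stated assumptions: the name `build_codes` is hypothetical, symbols are single characters, and ties between equal frequencies are resolved by list order (Python's sort is stable), which happens to reproduce the codes on slide 13.

```python
def build_codes(frequencies):
    """Repeatedly merge the two least frequent symbols and grow their codes."""
    # Each entry is (frequency, combined symbol); every code starts out empty.
    entries = [(freq, symbol) for symbol, freq in frequencies.items()]
    codes = {symbol: "" for symbol in frequencies}
    while len(entries) > 1:
        entries.sort(key=lambda entry: entry[0])  # stable: ties keep list order
        (f1, s1), (f2, s2) = entries[0], entries[1]
        for symbol in s1:  # first group gets a leading 0
            codes[symbol] = "0" + codes[symbol]
        for symbol in s2:  # second group gets a leading 1
            codes[symbol] = "1" + codes[symbol]
        entries = entries[2:] + [(f1 + f2, s1 + s2)]
    return codes


print(build_codes({"A": 3, "B": 1, "C": 2, "D": 1}))
```

Each merge prepends one bit to every symbol inside the two merged groups, so symbols merged early (B and D) end up with the longest codes. A different tie-breaking rule may yield different codes of the same total length.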

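  14. The Huffman’s Algorithm (1)
      [Figure; node labels: D1, B1, C2, A3]

  15. The Huffman’s Algorithm (2)
      [Figure; node labels: C2, A3, B1, D1]

  16. The Huffman’s Algorithm (3)
      [Figure; node labels: C2, A3, BD2, BD2, B1, D1]

  17. The Huffman’s Algorithm (4)
      [Figure; node labels: A3, BD2, C2, B1, D1]

  18. The Huffman’s Algorithm (5)
      [Figure; node labels: A3, CBD4, CBD4, BD2, C2, B1, D1]

  19. The Huffman’s Algorithm (6)
      [Figure; node labels: A3, CBD4, BD2, C2, B1, D1]

  20. The Huffman’s Algorithm (7)
      [Figure; node labels: ACBD7, ACBD7, A3, CBD4, BD2, C2, B1, D1]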

  21. The Huffman’s Algorithm (8)
      1. Build a min heap that contains a node for every symbol, with the frequency values as the keys
      2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap
      3. Make the two deleted nodes the two children of the node of the concatenated symbol
         i.e., if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s
      4. Repeat steps 2 and 3 until only one node, the root of the Huffman tree, remains in the heap
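The steps above can be sketched as follows. This sketch uses Python's standard `heapq` module as the min heap rather than a hand-written one, so it does not by itself satisfy the assignment's requirement; the `Node` class, the function name `build_huffman_tree`, and the tie-breaking counter are assumptions made for illustration.

```python
import heapq
from itertools import count


class Node:
    def __init__(self, symbol, freq, left=None, right=None):
        self.symbol = symbol
        self.freq = freq
        self.left = left
        self.right = right


def build_huffman_tree(frequencies):
    """Return the root of a Huffman tree for a non-empty {symbol: frequency} dict."""
    order = count()  # tie-breaker so the heap never has to compare Node objects
    heap = [(freq, next(order), Node(symbol, freq))
            for symbol, freq in frequencies.items()]
    heapq.heapify(heap)                      # step 1: build the min heap
    while len(heap) > 1:                     # step 4: stop when one node is left
        f1, _, n1 = heapq.heappop(heap)      # step 2: delete the two smallest
        f2, _, n2 = heapq.heappop(heap)
        parent = Node(n1.symbol + n2.symbol, f1 + f2, left=n1, right=n2)  # step 3
        heapq.heappush(heap, (parent.freq, next(order), parent))
    return heap[0][2]


root = build_huffman_tree({"A": 3, "B": 1, "C": 2, "D": 1})
print(root.symbol, root.freq)
```

With the frequencies from the slides, the root carries the concatenated symbol ACBD with frequency 7, and its children are A and CBD.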

  22. The Huffman’s Algorithm (9)
      • Once the Huffman tree is constructed
        • the code of any symbol can be constructed by starting at the leaf representing that symbol
        • and climbing up to the root
      • The code is initialized to the empty string
        • each time a left branch is climbed, 0 is prepended to the code
        • each time a right branch is climbed, 1 is prepended to the code
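The climb from leaf to root can be sketched on a hand-built copy of the tree from the slides. The `Node` class with a `parent` pointer and the function name `code_for` are assumptions made for illustration.

```python
class Node:
    def __init__(self, symbol, left=None, right=None):
        self.symbol = symbol
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self  # lets a leaf climb toward the root


def code_for(leaf):
    """Climb from a leaf to the root, prepending one bit per branch."""
    code = ""  # the code starts out empty
    node = leaf
    while node.parent is not None:
        bit = "0" if node.parent.left is node else "1"
        code = bit + code  # prepend, because we read the path bottom-up
        node = node.parent
    return code


# Tree from the slides: ACBD -> (A, CBD), CBD -> (C, BD), BD -> (B, D).
a, b, c, d = (Node(s) for s in "ABCD")
root = Node("ACBD", a, Node("CBD", c, Node("BD", b, d)))
for leaf in (a, b, c, d):
    print(leaf.symbol, code_for(leaf))
```

Prepending is what turns the bottom-up path into a code that reads top-down from the root, which is the order a decoder follows.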

  23. The Huffman’s Algorithm (10)

      VAR
        position[i]  : a pointer to the ith symbol
        n            : the number of symbols /* nonzero frequency */
        frequency[i] : the relative frequency of the ith symbol
        code[i]      : the code assigned to the ith symbol
        p, p1, p2    : pointers to a min heap node or a Huffman tree node

      Main Function {
        initialization;
        count the frequency of each symbol within the message;
        // construct a node for each symbol
        for (i = 0; i < n; i++) {
          <p> = create <frequency[i]> a node;
          position[i] = p; // a pointer to the leaf containing the ith symbol
          insert <p> into Min heap;
        } // end for

  24. The Huffman’s Algorithm (11)

        while (Min heap contains more than one item) {
          <p1> = delete Min heap;
          <p2> = delete Min heap;
          // combine p1 and p2 as branches of a single tree
          <p> = create < info(p1)+info(p2) > a node;
          set <p1> to be left_child of huffman tree p;
          set <p2> to be right_child of huffman tree p;
          insert <p> into Min heap;
        } // end while

  25. The Huffman’s Algorithm (12)

        // the tree is now constructed; use it to find codes
        <root> = delete Min heap;
        for (i = 0; i < n; i++) {
          p = position[i];
          code[i] = NULL;
          while (p != root) { // travel up to the root
            if (is left<p>)
              code[i] = 0 followed by code[i];
            else
              code[i] = 1 followed by code[i];
            <p> = move <p> to father node;
          } // end while
        } // end for
      } // end main
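The pseudocode on slides 23 to 25 can be sketched in runnable form with a hand-written array-based min heap. This is one possible reading, not the reference solution: the class and function names are hypothetical, the message is assumed to be non-empty, and frequency ties are broken by node creation order (an assumption the slides do not state).

```python
from collections import Counter


class Node:
    def __init__(self, symbol, freq, order):
        self.symbol, self.freq, self.order = symbol, freq, order
        self.left = self.right = self.parent = None

    def key(self):
        return (self.freq, self.order)  # creation order breaks frequency ties


class MinHeap:
    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def insert(self, node):
        items = self.items
        items.append(node)
        i = len(items) - 1
        while i > 0:  # sift the new node up toward the root
            parent = (i - 1) // 2
            if items[parent].key() <= items[i].key():
                break
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def delete_min(self):
        items = self.items
        smallest, last = items[0], items.pop()
        if items:
            items[0], i = last, 0
            while 2 * i + 1 < len(items):  # sift the moved node down
                child = 2 * i + 1
                if child + 1 < len(items) and items[child + 1].key() < items[child].key():
                    child += 1
                if items[i].key() <= items[child].key():
                    break
                items[i], items[child] = items[child], items[i]
                i = child
        return smallest


def huffman_codes(message):
    heap, position = MinHeap(), {}
    # Count frequencies and create one leaf per symbol (slide 23).
    for order, (symbol, freq) in enumerate(Counter(message).items()):
        position[symbol] = Node(symbol, freq, order)
        heap.insert(position[symbol])
    next_order = len(position)
    while len(heap) > 1:  # combine the two smallest nodes (slide 24)
        p1, p2 = heap.delete_min(), heap.delete_min()
        p = Node(p1.symbol + p2.symbol, p1.freq + p2.freq, next_order)
        next_order += 1
        p.left, p.right = p1, p2
        p1.parent = p2.parent = p
        heap.insert(p)
    root = heap.delete_min()
    codes = {}
    for symbol, p in position.items():  # climb from each leaf (slide 25)
        code = ""
        while p is not root:
            code = ("0" if p.parent.left is p else "1") + code
            p = p.parent
        codes[symbol] = code
    return codes


codes = huffman_codes("ABACCDA")
print(codes)
print("".join(codes[s] for s in "ABACCDA"))
```

The `position` dictionary plays the role of the `position[i]` array in the pseudocode: it keeps a reference to every leaf so that each code can be read by climbing parent pointers. A message with a single distinct symbol would receive the empty code in this sketch.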
