Data Structures Week 6: Assignment #2 Problem http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/
Requirement
• Encode a message using Huffman's algorithm
• Use a min heap as the priority queue
• Use dynamic allocation
• The input consists of strings
• A string consists of alphabetic characters only
• Upper case and lower case letters are treated as different characters
• The strings are stored in a text file
• The strings are given on separate lines
Requirement – cont’
• Output
  • should be stored in a text file in the following format:
      Heap Traversal: [character or string]...
      Huffman Tree Traversal: [character or string]...
      character: frequency, code
      . . .
      the code for the message:
• Due date
  • 2001/5/23 24:00
Encoding
• Encode the message as a long bit string
• Assign a bit string code to each symbol of the alphabet
• Then concatenate the individual codes of the symbols making up the message to produce an encoding for the message
Example#1
Symbol  Code
A       010
B       100
C       000
D       111
• ABACCDA
• 010100010000000111010
• Three bits are used for each symbol
• 21 bits are needed to encode the message
• inefficient
Example#2
Symbol  Code
A       00
B       01
C       10
D       11
• ABACCDA
• 00010010101100
• Two bits are used for each symbol
• 14 bits are needed to encode the message
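The two fixed-length examples can be reproduced with a short Python sketch. The function name `encode` and the dictionary names are illustrative assumptions, not part of the assignment.

```python
def encode(message, code_table):
    """Concatenate the code of each symbol in the message."""
    return "".join(code_table[symbol] for symbol in message)


# Code tables copied from Example#1 and Example#2.
three_bit = {"A": "010", "B": "100", "C": "000", "D": "111"}
two_bit = {"A": "00", "B": "01", "C": "10", "D": "11"}

message = "ABACCDA"
encoded3 = encode(message, three_bit)
encoded2 = encode(message, two_bit)
print(encoded3, len(encoded3))
print(encoded2, len(encoded2))
```

With a fixed-length code, the encoded length is always the message length times the code width: 7 × 3 = 21 bits and 7 × 2 = 14 bits.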
Example#3
• ABACCDA
• Each of the letters B and D appears only once in the message
• The letter A appears three times
• The letter A is therefore assigned a shorter bit string than the letters B and D
Example#3 - cont’
Symbol  Code
A       0
B       110
C       10
D       111
• ABACCDA
• 0110010101110
• Encoding of the message requires only 13 bits
• more efficient
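The 13-bit total can be checked by weighting each code length by the symbol's frequency. This is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
from collections import Counter

# Variable-length code table copied from Example#3.
code_table = {"A": "0", "B": "110", "C": "10", "D": "111"}
message = "ABACCDA"

# Count how often each symbol occurs in the message.
frequency = Counter(message)

# Total length = sum over symbols of (frequency x code length).
total_bits = sum(frequency[s] * len(code_table[s]) for s in frequency)

encoded = "".join(code_table[s] for s in message)
print(dict(frequency), total_bits, encoded)
```

Here the total is 3×1 + 1×3 + 2×2 + 1×3 = 13 bits, one bit fewer than the two-bit fixed-length code.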
Variable-Length Code
• If variable-length codes are used
  • the code for one symbol must not be a prefix of the code for another
• Example
  • Suppose the code for a symbol x, c(x), is a prefix of the code of another symbol y, c(y)
  • When c(x) is encountered in a left-to-right scan
  • it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y)
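The prefix condition and the left-to-right scan can be sketched in Python as follows. The helper names `is_prefix_free` and `decode`, and the `ambiguous` table, are hypothetical and only illustrate the idea.

```python
def is_prefix_free(code_table):
    """Return True if no code is a prefix of (or equal to) another code."""
    codes = list(code_table.values())
    for i, a in enumerate(codes):
        for j, b in enumerate(codes):
            if i != j and b.startswith(a):
                return False
    return True


def decode(bits, code_table):
    """Scan left to right, emitting a symbol as soon as a code matches.

    This is only unambiguous when the table is prefix-free.
    """
    symbol_of = {code: symbol for symbol, code in code_table.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in symbol_of:
            message.append(symbol_of[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(message)


variable_length = {"A": "0", "B": "110", "C": "10", "D": "111"}
ambiguous = {"A": "0", "B": "01", "C": "10"}  # "0" is a prefix of "01"

print(is_prefix_free(variable_length), is_prefix_free(ambiguous))
print(decode("0110010101110", variable_length))
```

With the `ambiguous` table, the bits `010` could be read as A followed by C or as B followed by A, which is exactly the problem the prefix condition rules out.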
Optimal Encoding Scheme (1)
Symbol  Frequency
A       3
B       1
C       2
D       1
• Find the two symbols that appear least frequently
• These are B and D
• Combine these two symbols into the single symbol BD
• The frequency of this new symbol is the sum of the frequencies of its two symbols
• The frequency of BD is 2
Optimal Encoding Scheme (2)
Symbol  Frequency
A       3
C       2
BD      2
• Again choose the two symbols with the smallest frequency
• These are C and BD
• Combine these two symbols into the single symbol CBD
• The frequency of this new symbol is the sum of the frequencies of its two symbols
• The frequency of CBD is 4
Optimal Encoding Scheme (3)
Symbol  Frequency
A       3
CBD     4
• There are now only two symbols remaining
• These are combined into the single symbol ACBD
• The frequency of ACBD is 7
Symbol  Frequency
ACBD    7
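The three merging steps can be replayed on a plain list, without any heap, to see the frequency table shrink. This is a sketch under the assumption that ties are broken by the current table order (a stable sort); the function name `merge_steps` is hypothetical.

```python
def merge_steps(frequency):
    """Repeatedly merge the two least frequent symbols.

    Returns the list of frequency tables, one per step.
    """
    table = list(frequency.items())
    steps = [list(table)]
    while len(table) > 1:
        # Stable sort: symbols with equal frequency keep their order.
        table.sort(key=lambda entry: entry[1])
        (s1, f1), (s2, f2) = table[0], table[1]
        # Replace the two smallest entries by their combination.
        table = table[2:] + [(s1 + s2, f1 + f2)]
        steps.append(list(table))
    return steps


steps = merge_steps({"A": 3, "B": 1, "C": 2, "D": 1})
for table in steps:
    print(table)
```

Sorting the whole table at every step is wasteful for large alphabets, which is the motivation for keeping the symbols in a min heap instead.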
Optimal Encoding Scheme (4)
• ACBD (A and CBD)
  • A and CBD are assigned the codes 0 and 1
• CBD (C and BD)
  • C and BD are assigned the codes 10 and 11
• BD (B and D)
  • B and D are assigned the codes 110 and 111
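The top-down assignment above amounts to extending the parent's code with 0 for the first part and 1 for the second part of each combined symbol. A minimal recursive sketch follows; representing the combinations as nested tuples is an assumption made for brevity, and `assign_codes` is a hypothetical name.

```python
def assign_codes(tree, prefix=""):
    """Assign codes top-down: 0 for the left part, 1 for the right part.

    A tree is either a single-symbol string (a leaf) or a
    (left, right) tuple describing how two symbols were combined.
    """
    if isinstance(tree, str):
        # A lone symbol with no siblings still needs a one-bit code.
        return {tree: prefix or "0"}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


# ACBD = (A, CBD), CBD = (C, BD), BD = (B, D)
codes = assign_codes(("A", ("C", ("B", "D"))))
print(codes)
```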
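The Huffman’s Algorithm (1) [Figure: nodes D1, B1, C2, A3]
The Huffman’s Algorithm (2) [Figure: nodes C2, A3, B1, D1]
The Huffman’s Algorithm (3) [Figure: nodes C2, A3, BD2; BD2 with children B1, D1]
The Huffman’s Algorithm (4) [Figure: nodes A3, BD2, C2; BD2 with children B1, D1]
The Huffman’s Algorithm (5) [Figure: nodes A3, CBD4; CBD4 with subtree BD2, C2, B1, D1]
The Huffman’s Algorithm (6) [Figure: nodes A3, CBD4, BD2, C2, B1, D1]
The Huffman’s Algorithm (7) [Figure: node ACBD7; ACBD7 with subtree A3, CBD4, BD2, C2, B1, D1]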
The Huffman’s Algorithm (8)
1. Build a min heap that contains the nodes of all symbols, with the frequency values as the keys
2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap
3. Make the two deleted nodes the two children of the node of the concatenated symbol, i.e., if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s
4. Repeat steps 2 and 3 until only one node remains in the priority queue; that node is the root of the Huffman tree
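The steps above can be sketched in Python. Note that this sketch uses the standard library's `heapq` module as a stand-in for the min heap; the assignment itself asks for a hand-written min heap. The `Node` class, `build_huffman_tree`, and the insertion counter used to break ties are all assumptions made for illustration.

```python
import heapq
from itertools import count


class Node:
    def __init__(self, symbol, frequency, left=None, right=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = left
        self.right = right


def build_huffman_tree(frequency):
    """Build a Huffman tree from a non-empty symbol -> frequency mapping."""
    order = count()  # tie-breaker so that Node objects are never compared
    # Step 1: one heap entry per symbol, keyed by frequency.
    heap = [(f, next(order), Node(s, f)) for s, f in frequency.items()]
    heapq.heapify(heap)
    # Step 4: repeat until a single node (the root) remains.
    while len(heap) > 1:
        # Step 2: delete the two nodes with the smallest frequencies.
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        # Step 3: they become the left and right child of the new node.
        parent = Node(n1.symbol + n2.symbol, f1 + f2, n1, n2)
        heapq.heappush(heap, (parent.frequency, next(order), parent))
    return heap[0][2]


root = build_huffman_tree({"A": 3, "B": 1, "C": 2, "D": 1})
print(root.symbol, root.frequency)
```

With this tie-breaking rule, the merges happen in the same order as in the slides: B and D first, then C and BD, then A and CBD. A different tie-breaking rule can produce a different but equally short code.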
The Huffman’s Algorithm (9)
• Once the Huffman tree is constructed
  • the code of any symbol can be constructed by starting at the leaf representing that symbol
  • and climbing up to the root
• The code is initialized to the empty string
• Each time that a left branch is climbed
  • 0 is prepended to the code
• Each time that a right branch is climbed
  • 1 is prepended to the code
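The leaf-to-root climb can be sketched with parent pointers. The tree below is wired by hand to match the slides' example; the names `Node`, `join`, and `code_of` are hypothetical.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.parent = None
        self.left = None
        self.right = None


def join(left, right):
    """Create the parent of two nodes and set the pointers both ways."""
    parent = Node(left.symbol + right.symbol)
    parent.left, parent.right = left, right
    left.parent = right.parent = parent
    return parent


def code_of(leaf):
    """Climb from a leaf to the root, prepending one bit per branch."""
    code = ""
    node = leaf
    while node.parent is not None:
        # 0 when climbing a left branch, 1 when climbing a right branch.
        bit = "0" if node.parent.left is node else "1"
        code = bit + code
        node = node.parent
    return code


# Hand-built tree from the example: ACBD = (A, CBD), CBD = (C, BD), BD = (B, D)
a, b, c, d = (Node(s) for s in "ABCD")
root = join(a, join(c, join(b, d)))
codes = {leaf.symbol: code_of(leaf) for leaf in (a, b, c, d)}
print(codes)
```

The bits are prepended rather than appended because the climb visits the branches in the reverse of the order in which a decoder reads them.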
The Huffman’s Algorithm (10)
VAR
  position[i] : a pointer to the ith symbol
  n : the number of symbols /* nonzero frequency */
  frequency[i] : the relative frequency of the ith symbol
  code[i] : the code assigned to the ith symbol
  p, p1, p2 : pointers to a Min heap node or Huffman tree node
Main Function {
  initialization;
  count the frequency of each symbol within the message;
  // construct a node for each symbol
  for(i=0; i < n; i++){
    <p> = create <frequency[i]> a node;
    position[i] = p; // a pointer to the leaf containing the ith symbol
    insert <p> into Min heap;
  }//end for
The Huffman’s Algorithm (11)
  while(Min heap contains more than one item){
    <p1> = delete Min heap;
    <p2> = delete Min heap;
    // combine p1 and p2 as branches of a single tree
    <p> = create < info(p1)+info(p2) > a node;
    set <p1> to be left_child of huffman tree p;
    set <p2> to be right_child of huffman tree p;
    insert <p> into Min heap;
  }//end while
The Huffman’s Algorithm (12)
  // the tree is now constructed; use it to find codes
  <root> = delete Min heap;
  for(i=0; i<n; i++){
    p = position[i];
    code[i] = NULL;
    while(p!=root){ // travel up to the root
      if(is left<p>) code[i] = 0 followed by code[i];
      else code[i] = 1 followed by code[i];
      <p> = move <p> to father node;
    }//end while
  }//end for
}//end main
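The pseudocode can be turned into a runnable Python sketch. This is one possible translation, not the reference solution: the `MinHeap` class is a small hand-written array-based binary heap, `position` is a dictionary rather than an array, and the `father` and `is_left` fields are assumed names that mirror the pseudocode. It assumes a non-empty message; a message with a single distinct symbol gets an empty code.

```python
class Node:
    def __init__(self, info, frequency):
        self.info = info
        self.frequency = frequency
        self.father = None    # parent in the Huffman tree
        self.is_left = False  # True if this node is a left child


class MinHeap:
    """Array-based binary min heap keyed on node frequency."""

    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def insert(self, node):
        items = self.items
        items.append(node)
        i = len(items) - 1
        # Sift up while the new node is smaller than its parent.
        while i > 0 and items[i].frequency < items[(i - 1) // 2].frequency:
            parent = (i - 1) // 2
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def delete_min(self):
        items = self.items
        smallest = items[0]
        last = items.pop()
        if items:
            items[0] = last
            i = 0
            # Sift down: swap with the smaller child until order holds.
            while 2 * i + 1 < len(items):
                child = 2 * i + 1
                if (child + 1 < len(items)
                        and items[child + 1].frequency < items[child].frequency):
                    child += 1
                if items[i].frequency <= items[child].frequency:
                    break
                items[i], items[child] = items[child], items[i]
                i = child
        return smallest


def huffman_codes(message):
    frequency = {}
    for symbol in message:
        frequency[symbol] = frequency.get(symbol, 0) + 1
    heap, position = MinHeap(), {}
    for symbol, freq in frequency.items():
        position[symbol] = Node(symbol, freq)  # leaf for this symbol
        heap.insert(position[symbol])
    while len(heap) > 1:
        p1, p2 = heap.delete_min(), heap.delete_min()
        p = Node(p1.info + p2.info, p1.frequency + p2.frequency)
        p1.father, p1.is_left = p, True
        p2.father, p2.is_left = p, False
        heap.insert(p)
    root = heap.delete_min()
    code = {}
    for symbol, p in position.items():
        bits = ""
        while p is not root:  # travel up to the root
            bits = ("0" if p.is_left else "1") + bits
            p = p.father
        code[symbol] = bits
    return code


code = huffman_codes("ABACCDA")
encoded = "".join(code[s] for s in "ABACCDA")
print(code, encoded, len(encoded))
```

This heap does not guarantee any particular order among equal frequencies, so the exact bit patterns may differ from the slides for other inputs; the code lengths, and therefore the encoded message length, are what Huffman's algorithm fixes.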