
Huffman Codes


shayla

Presentation Transcript


  1. Huffman Codes: Computing an Optimal Code for a Document

  2. Objectives
     You will be able to:
     • Create an optimal code for an ASCII text file.
     • Encode the text file using the optimal code and output the compressed text as a binary file.
     • Read the compressed binary file and reconstruct the original ASCII text.
     • Output the decoded message to a text file.
     • Encode and decode a large text file: Moby Dick.

  3. Getting Started
     • Download the program from the last class:
       http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_13_Huffman_Codes_with_Binary_IO/
       File Huffman_Codes_with_Binary_IO.zip
     • A bit of cleanup:
       • Improve the prompts as shown on the following slides.
       • Delete commented-out sections in main.cpp.
       • Remove the output of the sorted list in Make_Decode_Tree.

  4. Modifications to Prompts (main.cpp)
     • In do_decode (line 29):
```cpp
//cout << "File name for input? ";
cout << "File name for compressed input file? ";
```
     • In do_encode (line 89):
```cpp
//cout << "File name for output? ";
cout << "File name for compressed output file? ";
```

  5. An Error on Circe
     • Binary_File.h, line 14 should be:
```cpp
static const size_t FIRST_BIT_POSITION = 8*sizeof(size_t);
```
     • int and size_t are the same size on 32-bit Windows systems, but not on Circe, and probably not on other 64-bit systems.
     • Other errors and warnings on Circe have fairly obvious fixes.
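To see why the constant must be derived from size_t, here is a minimal sketch of setting one bit in a size_t-sized buffer word. Binary_File.h itself is not shown in the slides, so the bit numbering (1 for the least significant bit up to FIRST_BIT_POSITION for the most significant) and the set_bit helper are assumptions for illustration. With 8*sizeof(size_t), the top position is 64 on a typical 64-bit Linux system and 32 on 32-bit Windows, so a value based on the size of int would be wrong wherever size_t is 64 bits.

```cpp
#include <cassert>
#include <cstddef>

// Number of bits in a size_t: 32 on 32-bit Windows, 64 on typical 64-bit Linux.
static const std::size_t FIRST_BIT_POSITION = 8 * sizeof(std::size_t);

// Hypothetical helper (not from Binary_File.h): return `word` with bit
// number `pos` set, counting 1 as the least significant bit and
// FIRST_BIT_POSITION as the most significant.
std::size_t set_bit(std::size_t word, std::size_t pos) {
    assert(pos >= 1 && pos <= FIRST_BIT_POSITION);
    // Shift a size_t, not an int: shifting an int by its width
    // (32 bits here) or more is undefined behavior.
    return word | (static_cast<std::size_t>(1) << (pos - 1));
}
```

For example, set_bit(1, 3) returns 5, and set_bit(0, FIRST_BIT_POSITION) returns a word with only the top bit set, whatever the platform's size_t width.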

  6. Program Running

  7. Text Files for Testing
     • Download to a convenient directory:
       • Full text of Moby Dick:
         http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/Moby_Dick.txt
       • Abridged version:
         http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/Moby_Quick.txt

  8. Moby Dick (Abridged)

  9. Get Input from a File
     • Modify the Huffman Code program to get its input for encoding from a text file rather than from the keyboard.

  10. main.cpp
     • Insert above do_encode:
```cpp
void get_text_input_file(string& input_filename, ifstream& infile) {
    string junk;
    while (true) {
        cout << "File name for text input? ";
        cin >> input_filename;
        getline(cin, junk);     // Skip newline char
        infile.open(input_filename.c_str());
        if (infile.good()) {
            break;
        }
        infile.clear();
        cout << "Open failed for file " << input_filename << endl;
        cout << "Please try again\n";
    }
}
```
     http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/get_text_input_file.cpp.txt

  11. do_encode()
     Revised version that gets input from a file rather than from the keyboard:
     http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/do_encode.cpp.txt

  12. do_encode()
```cpp
void do_encode(void) {
    string msg;
    string output_filename;
    Binary_Output_File* outfile;
    string junk;
    string input_filename;
    ifstream infile;

    get_text_input_file(input_filename, infile);

    while (true) {
        cout << "\nFile name for compressed output file? ";
        cin >> output_filename;
        getline(cin, junk);     // Skip newline char
        try {
            outfile = new Binary_Output_File(output_filename);
            break;
        }
        catch (const string& msg) {
            cout << msg << endl;
        }
    }
    // (continued on the next slide)
```

  13. do_encode()
```cpp
    //cout << "\n\nEnter message to encode\n";
    //getline(cin, msg);
    while (infile.good()) {
        char next_char;
        if (!infile.get(next_char)) {
            break;      // get() failed at end of file; next_char was not read
        }
        string code = huffman_tree.Encode_Char(tolower(next_char));
        if (code.size() == 0) {
            cout << endl << "Invalid character in input " << next_char << endl;
            continue;
        }
        outfile->Output(code);
    }
    infile.close();
    cout << endl << endl;
    outfile->Close();
    delete(outfile);
    cout << "File " << output_filename << " written\n";
}
```
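The slides don't show how Encode_Char() looks up a code or how Binary_Output_File::Output() packs the code strings into bytes. Here is a minimal sketch of both steps, assuming a std::map<char, std::string> code table as a stand-in for the course's Huffman_Tree, most-significant-bit-first packing, and zero padding in the last byte. encode_text and pack_bits are hypothetical names, not the course's API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Encode `text` using a table that maps each character to its code, written
// as a string of '0'/'1' characters. As in do_encode(), characters that have
// no code are skipped.
std::string encode_text(const std::string& text,
                        const std::map<char, std::string>& code_table) {
    std::string bits;
    for (char c : text) {
        auto it = code_table.find(c);
        if (it == code_table.end()) {
            continue;  // invalid character: no code for it
        }
        bits += it->second;
    }
    return bits;
}

// Pack a '0'/'1' string into bytes, most significant bit first. The last
// byte is padded with 0 bits, so a decoder must know how many bits are real.
std::vector<unsigned char> pack_bits(const std::string& bits) {
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        if (bits[i] == '1') {
            bytes[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
        }
    }
    return bytes;
}
```

With the table a → 0, b → 10, c → 11, encode_text("abc", table) gives "01011", which pack_bits stores as the single byte 0x58 (binary 01011000, the last three bits being padding).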

  14. Program in Action

  15. Program continuing

  16. Some Issues
     • White space: newline characters are lost.
     • Punctuation
     • Capitalization
     • Let's build a code specifically for this document:
       • Include all characters.
       • Optimize weights for the document.

  17. Developing a Code for the Document
     • New version of build_huffman_tree:
       • Read the input text file and count the occurrences of each character.
         • Also count the total number of characters in the file.
       • For each ASCII value that appears in the input text file:
         • Compute its relative frequency.
         • Add the char and its frequency to the Huffman tree.
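The counting steps above can be sketched as a standalone function that reads any std::istream, which makes it easy to try on a short string before running it on a file. char_frequencies and its string overload are hypothetical helpers; like the slides, the sketch assumes the input is plain ASCII. The read loop tests the result of get() itself, so a failed read at end of file never reaches the loop body.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count each character read from `in` and return the relative frequency of
// every character that occurs. `total` receives the number of characters read.
std::map<char, double> char_frequencies(std::istream& in, int& total) {
    int counts[128] = {0};
    total = 0;
    char next_char;
    while (in.get(next_char)) {
        unsigned char value = static_cast<unsigned char>(next_char);
        assert(value > 0 && value < 128);  // ASCII only, as in the slides
        ++counts[value];
        ++total;
    }
    std::map<char, double> freq;
    for (int i = 0; i < 128; ++i) {
        if (counts[i] > 0) {
            freq[static_cast<char>(i)] = static_cast<double>(counts[i]) / total;
        }
    }
    return freq;
}

// Convenience overload for counting an in-memory string.
std::map<char, double> char_frequencies(const std::string& text, int& total) {
    std::istringstream in(text);
    return char_frequencies(in, total);
}
```

For the text "abca\n", the total is 5 and the frequencies are a: 0.4, b: 0.2, c: 0.2, and newline: 0.2.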

  18. New build_huffman_tree()
     http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/build_huffman_tree.cpp.txt
```cpp
void build_huffman_tree(ifstream& infile) {
    int counts[128] = {0};
    int total = 0;

    // Count characters in the input file.
    while (infile.good()) {
        char next_char;
        if (!infile.get(next_char)) {
            break;      // get() failed at end of file; nothing was read
        }
        assert(next_char > 0);
        assert(next_char <= 127);
        ++counts[static_cast<int>(next_char)];
        ++total;
    }
    infile.close();
    infile.clear();
    // (continued on the next slide)
```

  19. New build_huffman_tree()
```cpp
    for (int i = 0; i < 128; ++i) {
        if (counts[i] > 0) {
            huffman_tree.Add(i, (1.0*counts[i]) / total);
        }
    }
}
```
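The slides hand each character and its frequency to huffman_tree.Add() but don't show how the tree is built from them. As a self-contained sketch of the standard greedy construction (this is not the course's Huffman_Tree class), the function below repeatedly merges the two lightest subtrees using a std::priority_queue, then walks the finished tree to assign codes. HuffNode and build_code_table are hypothetical names, and ties in weight are broken by node index only to make the result deterministic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One tree node, stored in a vector and linked to its children by index.
struct HuffNode {
    double weight;
    char ch;     // meaningful only for leaves
    int left;    // index of the left child, -1 for a leaf
    int right;   // index of the right child, -1 for a leaf
};

// Build a Huffman tree for (char, relative frequency) pairs and return each
// character's code as a string of '0'/'1' characters.
std::map<char, std::string> build_code_table(const std::map<char, double>& freq) {
    std::vector<HuffNode> nodes;
    // Min-heap of (weight, node index); ties are broken by the smaller index.
    typedef std::pair<double, int> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    for (const auto& entry : freq) {
        assert(entry.second > 0.0);
        HuffNode leaf = {entry.second, entry.first, -1, -1};
        nodes.push_back(leaf);
        heap.push(Entry(entry.second, static_cast<int>(nodes.size()) - 1));
    }

    std::map<char, std::string> table;
    if (nodes.empty()) {
        return table;
    }
    if (nodes.size() == 1) {  // a lone character still needs a 1-bit code
        table[nodes[0].ch] = "0";
        return table;
    }

    // Greedy step: merge the two lightest subtrees until one tree remains.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        HuffNode parent = {a.first + b.first, '\0', a.second, b.second};
        nodes.push_back(parent);
        heap.push(Entry(parent.weight, static_cast<int>(nodes.size()) - 1));
    }

    // Walk from the root: a left edge appends '0', a right edge appends '1'.
    std::vector<std::pair<int, std::string> > pending;
    pending.push_back(std::make_pair(heap.top().second, std::string()));
    while (!pending.empty()) {
        std::pair<int, std::string> item = pending.back();
        pending.pop_back();
        const HuffNode& node = nodes[item.first];
        if (node.left < 0) {
            table[node.ch] = item.second;
        } else {
            pending.push_back(std::make_pair(node.left, item.second + "0"));
            pending.push_back(std::make_pair(node.right, item.second + "1"));
        }
    }
    return table;
}
```

For frequencies a: 0.5, b: 0.25, c: 0.25, this produces a → 0, b → 10, c → 11, an average of 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits per character, compared with 8 bits per character for plain ASCII.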

  20. main.cpp
     • Add at top:
```cpp
#include <cassert>
…
string input_filename;
ifstream infile;
```
     • Add to main():
```cpp
int main(void) {
    cout << "This is the Huffman code program \n";
    get_text_input_file(input_filename, infile);
    build_huffman_tree(infile);
```

  21. do_encode()
     • We have to reopen the input file after reading it in build_huffman_tree().
     • do_encode no longer calls get_text_input_file:
       • Comment out the call to get_text_input_file near the top.
     • At line 104:
```cpp
//cout << "\n\nEnter message to encode\n";
//getline(cin, msg);
infile.open(input_filename.c_str());
while (infile.good()) {
```
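The slides reopen the file by name because build_huffman_tree() closes it and clears its state. When a program still holds the open stream, another option is to reset the end-of-file state and seek back to the start. This is a minimal sketch; read_all and rewind_stream are hypothetical helpers, and the test uses an in-memory std::istringstream in place of the ifstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read every remaining character from `in`.
std::string read_all(std::istream& in) {
    std::string text;
    char c;
    while (in.get(c)) {
        text += c;
    }
    return text;
}

// After a read loop reaches end of file, eofbit and failbit are set and every
// later read fails. clear() resets the flags; seekg(0) returns to the start.
void rewind_stream(std::istream& in) {
    in.clear();
    in.seekg(0);
    // seekg() sets failbit if the stream cannot seek (for example, a pipe).
    assert(in.good());
}
```

Without the call to rewind_stream, a second read_all on the same stream returns an empty string, because the failed read at end of file leaves the stream unusable.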

  22. do_encode()
     • At line 112, remove the call to tolower():
```cpp
infile.open(input_filename.c_str());
while (infile.good()) {
    char next_char;
    infile.get(next_char);
    string code = huffman_tree.Encode_Char(tolower(next_char));
```
       so that the last line becomes `string code = huffman_tree.Encode_Char(next_char);`
     • We can now encode all characters.

  23. Program Running

  24. So Far, So Good!
     • The program seems to be working for a short file.
     • Let's try it on the full text.
     • You may not want to wait for the complete output!

  25. Output Decoded Message to a File
     Add above do_decode():
     http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/get_text_output_file.cpp.txt
```cpp
void get_text_output_file(string& output_filename, ofstream& outfile) {
    string junk;
    while (true) {
        cout << "File name for text output? ";
        cin >> output_filename;
        getline(cin, junk);     // Skip newline char
        outfile.open(output_filename.c_str());
        if (outfile.good()) {
            break;
        }
        outfile.clear();
        cout << "Open failed for file " << output_filename << endl;
        cout << "Please try again\n";
    }
}
```

  26. Output Decoded Message to a File
     • At the end of do_decode:
```cpp
    original_message = huffman_tree.Decode_Msg(coded_message);
    //cout << "Original message: " << original_message << endl;
    //cout << endl << endl;

    string output_filename;
    ofstream outfile;
    get_text_output_file(output_filename, outfile);
    outfile << original_message;
    outfile.close();
    cout << "File " << output_filename << " written";
    cout << endl << endl;
}
```
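Decode_Msg() isn't shown in the slides; decoding is normally done by walking the decode tree one bit at a time. As a sketch that needs no tree type, the function below uses the prefix property of a Huffman code directly: because no code is a prefix of another, the first accumulated run of bits that matches a code identifies exactly one character. decode_bits and the std::map code table are assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Decode a '0'/'1' string using a prefix-free code table (char -> code).
std::string decode_bits(const std::string& bits,
                        const std::map<char, std::string>& code_table) {
    // Reverse lookup: code -> char.
    std::map<std::string, char> reverse;
    for (const auto& entry : code_table) {
        reverse[entry.second] = entry.first;
    }
    std::string result;
    std::string prefix;
    for (char b : bits) {
        assert(b == '0' || b == '1');
        prefix += b;
        auto it = reverse.find(prefix);
        if (it != reverse.end()) {  // a complete code: emit its character
            result += it->second;
            prefix.clear();
        }
    }
    // Trailing bits that don't complete a code are dropped.
    return result;
}
```

With the table a → 0, b → 10, c → 11, the bits 110100 decode to "caba". Zero padding in the last byte of a binary file can happen to form a complete code (here, "0" is a), which would add spurious characters; that is one reason a compressed file format needs to record the message length.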

  27. Test on Full Text of Moby Dick

  28. Test on Full Text of Moby Dick

  29. On Circe (After some tweaking)

  30. Embedding the Code
     • For the compressed file to be useful, we have to store the code along with it.
       • Then we can read and decode the file at a later time,
       • even on a different computer (with the same architecture).
     • To decode:
       • First read the code.
       • Reconstitute the decode tree.
       • Then read and decode the message.
     • Project 7
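One possible way to store the code with the message is a short text header written before the packed bits. The layout below is an assumption for illustration, not the Project 7 specification: a count line, then one line per character giving its ASCII value as a number and its code. Writing the ASCII value as a number matters now that spaces and newlines are part of the code, because operator>> would otherwise skip them as whitespace. serialize_code_table and parse_code_table are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Header layout (an assumption):
//   <number of entries>
//   <ASCII value> <code>     one line per character
std::string serialize_code_table(const std::map<char, std::string>& table) {
    std::ostringstream out;
    out << table.size() << '\n';
    for (const auto& entry : table) {
        out << static_cast<int>(static_cast<unsigned char>(entry.first))
            << ' ' << entry.second << '\n';
    }
    return out.str();
}

// Rebuild the code table from a header written by serialize_code_table().
std::map<char, std::string> parse_code_table(const std::string& header) {
    std::istringstream in(header);
    std::map<char, std::string> table;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        int value = 0;
        std::string code;
        in >> value >> code;
        assert(in && value >= 0 && value < 128);  // ASCII only
        table[static_cast<char>(value)] = code;
    }
    return table;
}
```

For the table space → 0, newline → 10, e → 11, the header is "3\n10 10\n32 0\n101 11\n". A decoder would parse this header, rebuild the decode tree from the table, and then read the message bits, which also need a stored length so that padding in the final byte is not decoded.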
