
Homework #5

Homework #5. New York University, Computer Science Department, Data Structures, Fall 2008, Eugene Weinstein. Homework #4 Review: Huffman coding is a variable-length binary encoding for text. We implemented Huffman's algorithm for finding an optimal code (book pp. 389-395).


Presentation Transcript


  1. Homework #5
     New York University, Computer Science Department
     Data Structures, Fall 2008
     Eugene Weinstein

  2. Homework #4 Review
     • Huffman coding is a variable-length binary encoding for text
     • We implemented Huffman's algorithm for finding an optimal code (book pp. 389-395)
       • Builds a tree representing the shortest possible code
     • Input for HW#4: letters and their frequencies
       • A 20 E 24 ...
     • Construct the Huffman tree
     • Navigate the tree to find each letter's code
       • c: 0, a: 10, b: 11
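A minimal sketch of the tree construction and code lookup reviewed on this slide. It uses `java.util.PriorityQueue` in place of the course's `BinaryHeap`, and the names `HuffmanSketch`, `Node`, `buildTree`, and `collectCodes` are hypothetical stand-ins for the HW4 `HuffmanTree` and `HuffmanNode` classes, whose actual interfaces are not shown in these slides.

```java
import java.util.PriorityQueue;

// Hypothetical stand-in for the HW4 classes; not the course's actual code.
class HuffmanSketch {
    static class Node implements Comparable<Node> {
        final char symbol;      // meaningful only for leaves
        final int frequency;
        final Node left;        // null for a leaf
        final Node right;       // null for a leaf

        Node(char symbol, int frequency) {
            this(symbol, frequency, null, null);
        }

        Node(char symbol, int frequency, Node left, Node right) {
            this.symbol = symbol;
            this.frequency = frequency;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(frequency, other.frequency);
        }
    }

    // Repeatedly merge the two least frequent nodes until one tree remains.
    static Node buildTree(char[] symbols, int[] frequencies) {
        PriorityQueue<Node> heap = new PriorityQueue<>();
        for (int i = 0; i < symbols.length; i++) {
            heap.add(new Node(symbols[i], frequencies[i]));
        }
        while (heap.size() > 1) {
            Node a = heap.poll();
            Node b = heap.poll();
            heap.add(new Node('\0', a.frequency + b.frequency, a, b));
        }
        return heap.poll();
    }

    // Walk the tree: a left edge appends "0", a right edge appends "1".
    static void collectCodes(Node node, String prefix, String[] code) {
        if (node == null) {
            return;
        }
        if (node.isLeaf()) {
            // A one-symbol alphabet still needs a non-empty code.
            code[node.symbol] = prefix.isEmpty() ? "0" : prefix;
            return;
        }
        collectCodes(node.left, prefix + "0", code);
        collectCodes(node.right, prefix + "1", code);
    }
}
```

The exact bit patterns depend on how the heap breaks ties and on which child goes left, so only the code lengths are comparable across implementations.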

  3. Homework #5 Overview
     • Given a document:
       • Calculate letter frequencies
       • Construct Huffman code
       • Encode document
       • Calculate memory savings of Huffman binary encoding vs. 8-bit ASCII
       • Correctly decode document
     • We can use the Huffman code-building algorithm from HW#4
       • So we will keep HuffmanTree and HuffmanNode
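One way the memory-savings step could be computed, as a sketch: each character costs 8 bits in ASCII and `code[c].length()` bits under the Huffman code. The class and method names here are hypothetical, and the `count[]` and `code[]` arrays are assumed to be indexed by character value as described for `HuffmanConverter`.

```java
// Hypothetical helper; the assignment does not prescribe these names.
class SavingsSketch {
    // Total bits in the Huffman-encoded text.
    static long huffmanBits(int[] count, String[] code) {
        long bits = 0;
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                bits += (long) count[c] * code[c].length();
            }
        }
        return bits;
    }

    // Total bits in the 8-bit ASCII encoding of the same text.
    static long asciiBits(int[] count) {
        long chars = 0;
        for (int n : count) {
            chars += n;
        }
        return 8 * chars;
    }

    // Fraction of space saved, e.g. 0.4 means the encoding is 40% smaller.
    static double savingsFraction(int[] count, String[] code) {
        return 1.0 - (double) huffmanBits(count, code) / asciiBits(count);
    }
}
```

For example, with counts a=2, b=1, c=4 and codes a="10", b="11", c="0", the Huffman encoding takes 10 bits against 56 bits of ASCII.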

  4. Organization
     • The new code for this assignment should go into HuffmanConverter.java
     • The name of the file to encode is passed as a parameter on the command line
       • So if my file is foo.txt, I should be able to run
         java HuffmanConverter foo.txt
       • Then "foo.txt" shows up in args[0]
       • If you use an IDE, specify command-line options through the menus
     • Test inputs and outputs are linked from the assignment page (2007 version)
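A small sketch of picking the filename out of `args`. The helper name `inputFileName` and the usage message are assumptions for illustration; the slides only say that the filename arrives in `args[0]`.

```java
// Hypothetical helper showing how main(String[] args) could validate its input.
class ArgsSketch {
    // Returns the single expected command-line argument, or fails with a usage hint.
    static String inputFileName(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: java HuffmanConverter <file>");
        }
        return args[0];
    }
}
```

Inside `HuffmanConverter.main`, the call would look like `String fileName = ArgsSketch.inputFileName(args);`.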

  5. HuffmanConverter Instance Vars
     • String contents - stores the file to process
       • Lines are separated by '\n' - the line break character
       • e.g., twoLines = line1 + '\n' + line2;
     • HuffmanTree huffmanTree - output of HW4
     • int count[] - frequencies in the input file
       • Indexed on the ASCII value of characters, e.g., count[(int)'a'] is the frequency of 'a'
     • String code[] - binary string per character
       • Also indexed on ASCII value, e.g., code[(int)'a'] == "10001"

  6. To Implement
     • readContents() - reads in a file and stores it in String contents
     • recordFrequencies() - processes the file stored in contents and stores frequencies in count[]
     • frequenciesToTree() - uses HW4 code to produce the Huffman tree
     • treeToCode() - slight modification of HW4: traverses the Huffman tree and populates code[]
     • encodeMessage() - uses code[] to encode
     • decodeMessage() - uses the inverse of code[]

  7. Implementation Notes
     • readContents() can use Scanner
       • Read a line at a time, and append to contents, inserting '\n' to separate lines
     • recordFrequencies(): iterate over contents one character at a time
     • frequenciesToTree()
       • Very similar to the main() method of HW4
       • Create a BinaryHeap object
       • For every non-zero-count letter, create a HuffmanNode object and insert it into the heap
       • Then run the Huffman algorithm
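The first two steps above can be sketched as follows. This is an illustration rather than the assignment's reference solution: the methods are static and take their inputs as parameters so they can be tried in isolation, the class name `ReadSketch` is hypothetical, and two choices are assumptions not stated in the slides (a '\n' is appended after every line including the last, and characters with values of 256 or more are skipped).

```java
import java.util.Scanner;

// Hypothetical sketch of readContents() and recordFrequencies().
class ReadSketch {
    // Reads all lines from the scanner, joining them with '\n'.
    // Assumption: a '\n' is also appended after the final line.
    static String readContents(Scanner in) {
        StringBuilder contents = new StringBuilder();
        while (in.hasNextLine()) {
            contents.append(in.nextLine()).append('\n');
        }
        return contents.toString();
    }

    // Counts occurrences of each character, indexed by character value.
    // Assumption: characters outside the 0..255 range are ignored.
    static int[] recordFrequencies(String contents) {
        int[] count = new int[256];
        for (int i = 0; i < contents.length(); i++) {
            char c = contents.charAt(i);
            if (c < count.length) {
                count[c]++;
            }
        }
        return count;
    }
}
```

To read from the file named on the command line, the scanner could be created with `new Scanner(new File(args[0]))`, which requires `java.io.File` and handling `FileNotFoundException`.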

  8. Implementation Notes, Cont'd
     • treeToCode()
       • Similar to printCode() of HW4
       • Instead of printing the code, store it in code[]
     • encodeMessage()
       • For each character of contents, look up its binary string in code[] and append it to the output
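A possible shape for `encodeMessage()`, sketched as a static method that takes `contents` and `code[]` as parameters. The class name `EncodeSketch` and the error handling for a missing code are assumptions, not part of the assignment.

```java
// Hypothetical sketch of encodeMessage().
class EncodeSketch {
    // Concatenates the binary string of every character in contents.
    static String encodeMessage(String contents, String[] code) {
        StringBuilder encoded = new StringBuilder();
        for (int i = 0; i < contents.length(); i++) {
            char c = contents.charAt(i);
            if (c >= code.length || code[c] == null) {
                throw new IllegalStateException("no code for character value " + (int) c);
            }
            encoded.append(code[c]);
        }
        return encoded.toString();
    }
}
```

Using a `StringBuilder` avoids the quadratic cost of repeated `String` concatenation on long documents.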

  9. Implementation Notes, Cont'd
     • decodeMessage()
       • Need to implement the inverse mapping of code[]: binary strings to characters
       • Several possible implementations:
         • Traverse the Huffman tree as you read the binary string; output a character when you reach a leaf
         • Build a HashMap mapping strings to the ASCII values of characters
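The tree-traversal option could look roughly like this. The `Node` class is a hypothetical simplification of `HuffmanNode` (left edge is "0", right edge is "1"), and the sketch assumes the tree has at least two leaves, so the root is never itself a leaf.

```java
// Hypothetical sketch of decodeMessage() by walking the Huffman tree.
class TreeDecodeSketch {
    static class Node {
        final char symbol;  // meaningful only for leaves
        final Node left;
        final Node right;

        Node(char symbol) {
            this.symbol = symbol;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = '\0';
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Follows one edge per bit; on reaching a leaf, emits its symbol and restarts at the root.
    static String decodeMessage(String bits, Node root) {
        StringBuilder decoded = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                decoded.append(current.symbol);
                current = root;
            }
        }
        return decoded.toString();
    }
}
```

This approach needs no extra data structure beyond the tree already built for encoding.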

  10. HashMap
     • An array maps integers to Objects
       • e.g., String args[]: args[i] returns the ith String
     • A HashMap maps Objects to Objects
     • Access with put() and get(), e.g.,
         HashMap ids = new HashMap();
         ids.put("Alice", 123456789);
         ids.put("Ben", 321654987);
         int id = (Integer) ids.get("Alice");
         // id gets 123456789
     • For decode, map bit Strings to characters
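A sketch of the HashMap-based decoder suggested in the last bullet. It uses a generic `Map<String, Character>` rather than the raw `HashMap` shown on the slide, so no cast is needed on `get()`. The class and method names are hypothetical. The approach relies on Huffman codes being prefix-free: as soon as the accumulated bits match a key, that match is the only possible one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of decodeMessage() using an inverse lookup table.
class MapDecodeSketch {
    // Builds the inverse of code[]: binary string -> character.
    static Map<String, Character> invert(String[] code) {
        Map<String, Character> inverse = new HashMap<>();
        for (int c = 0; c < code.length; c++) {
            if (code[c] != null) {
                inverse.put(code[c], (char) c);
            }
        }
        return inverse;
    }

    // Accumulates bits until they form a known code, then emits the character.
    static String decodeMessage(String bits, Map<String, Character> inverse) {
        StringBuilder decoded = new StringBuilder();
        StringBuilder pending = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            pending.append(bits.charAt(i));
            Character symbol = inverse.get(pending.toString());
            if (symbol != null) {
                decoded.append(symbol.charValue());
                pending.setLength(0);
            }
        }
        return decoded.toString();
    }
}
```

Compared with walking the tree, this version does one map lookup per bit, which is simpler to write but builds a temporary string for each lookup.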

  11. Homework #5 Tips
     • Keep checking intermediate results
       • Make use of the sample outputs
       • Print out intermediate results!
     • You might need special cases for newline ('\n')
     • Your encoding might differ from the examples
       • Depends on the BinaryHeap implementation
       • Same-frequency items are returned in arbitrary order (e.g., in love_poem_58, 'N', '-', '.', 'W', and 'p' all have frequency one)
     • However, the Huffman encoding length must match!
       • Guaranteed to be the shortest-length encoding
