Learn about priority queues and Huffman trees as efficient data structures for accessing high-priority items and implementing lossless data compression algorithms. Understand their implementation with heaps and binary trees.
8.5 Heaps and Priority Queues 8.6 Huffman Trees Chapter 8 – Binary Search Tree
8.5 Heaps and Priority Queues
• Priority Queues
• The priority_queue Class
• Using a Heap as the Basis of a Priority Queue
8.5, pgs. 484-495
Priority Queues
Heaps / Priority Queues (29)
• The heap is used to implement a special kind of queue called a priority queue.
• A FIFO queue is not always the best way to implement a waiting line.
• A priority queue is a data structure in which only the highest-priority item is accessible (as opposed to the first item entered).
• In a print queue, for example, it is sometimes more appropriate to print a short document before a very long document that arrived earlier.
The priority_queue Class
• C++ provides a priority_queue class that uses the same interface as the queue given in Chapter 6.
• The differences are in the specification of the top and pop functions: these are defined in terms of a largest item in the queue rather than the oldest item.
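A minimal sketch of the standard `std::priority_queue` in use, showing that items come back largest first regardless of insertion order. The function name `drain_in_priority_order` is a hypothetical helper chosen for this illustration.

```cpp
#include <queue>
#include <vector>

// Pushes the given values into a std::priority_queue and pops them all,
// returning them in the order the queue hands them back (largest first).
std::vector<int> drain_in_priority_order(const std::vector<int>& values) {
    std::priority_queue<int> pq;          // max-heap ordering by default
    for (int v : values) pq.push(v);

    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());          // top() refers to a largest item
        pq.pop();                         // pop() removes that item
    }
    return out;
}
```

Note that in the standard library `pop()` returns nothing, so the value is read with `top()` before it is removed.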
Heap Priority Queue
• In a priority queue, just as in a max heap, the largest item is always removed first.
• Because heap insertion and removal are O(log n), a heap can be the basis of a very efficient implementation of a priority queue.
• To remove an item from the priority queue, we take the first item from the vector; this is the largest item in the max heap.
• We then remove the last item from the vector and put it into the first position of the vector, overwriting the value currently there.
• Then, following the algorithm for a max heap, we move this item down until it is no smaller than its children or it has no children.
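The removal steps above can be sketched as follows. This is an illustrative version that assumes a max heap of `int` stored in a `std::vector`; the name `heap_pop_max` is hypothetical, not the textbook's.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes and returns the largest item of a max heap stored in a vector.
// Precondition: heap is non-empty and satisfies the max-heap property.
int heap_pop_max(std::vector<int>& heap) {
    int largest = heap.front();           // first item is the largest
    heap.front() = heap.back();           // last item overwrites the root
    heap.pop_back();

    std::size_t parent = 0;
    const std::size_t n = heap.size();
    while (true) {
        std::size_t left = 2 * parent + 1;
        if (left >= n) break;             // no children: done
        std::size_t right = left + 1;
        std::size_t larger = left;
        if (right < n && heap[right] > heap[left]) larger = right;
        if (heap[parent] >= heap[larger]) break;   // heap property restored
        std::swap(heap[parent], heap[larger]);     // move the item down
        parent = larger;
    }
    return largest;
}
```

Each pass of the loop descends one level, so the work is bounded by the height of the heap, which is where the O(log n) removal cost comes from.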
8.6 Huffman Trees
• Case Study: Building a Custom Huffman Tree
8.6, pgs. 496-505
Huffman Trees
• Huffman coding is a lossless data compression algorithm.
• It assigns variable-length codes to input characters based on the frequencies of the corresponding characters.
• The most frequent character gets the shortest code and the least frequent character gets the longest code.
• A straight binary encoding of an alphabet assigns a unique 8-bit binary number to each symbol in the alphabet.
• With 8 bits per character (256 possible characters), the length of a message is 8 x n bits, where n is the total number of characters.
• The message “go cougars” requires 10 x 8 or 80 bits in 8-bit ASCII.
• The same string encoded using a good Huffman encoding scheme would require only 40 bits.
• A Huffman tree can be implemented using a binary tree and a priority_queue.
Huffman Trees
• In 1973, Donald Knuth published the relative frequencies of the letters in English text.
• The letter e occurs an average of 103 times in every 1000 letters (that is, 10.3% of the letters are e's).
e = 010
A
[Figure: Huffman binary tree with leaves s (5), e (4), l (2), n (1), p (1) and internal nodes 2, 4, 8, 13; branches labeled 0 and 1]
1. What are the Huffman codes for the characters found in the Huffman binary tree?
   Codes consistent with the bit stream in question 2: s = 0, e = 10, l = 110, n = 1110, p = 1111
2. Using the above Huffman codes, what word is represented by the following 27-bit stream?
   011010101111110100011101000
   sleeplessness
3. What is the resulting compression ratio of Huffman encoding versus 8-bit ASCII encoding?
   (13 x 8 = 104 bits) : 27 bits, or about 4 : 1
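Decoding a Huffman bit stream only needs the code table, because no code is a prefix of another. The sketch below is a minimal illustration; the function name `huffman_decode` is hypothetical, and the table in the usage note (s = 0, e = 10, l = 110, n = 1110, p = 1111) is an assumption inferred from the 27-bit stream in this exercise rather than read from the original figure.

```cpp
#include <map>
#include <string>

// Decodes a string of '0'/'1' characters with a prefix-free code table.
// Returns the decoded text; trailing bits that match no code are ignored.
std::string huffman_decode(const std::string& bits,
                           const std::map<std::string, char>& table) {
    std::string out, current;
    for (char bit : bits) {
        current += bit;
        auto it = table.find(current);
        if (it != table.end()) {          // a complete code word was read
            out += it->second;
            current.clear();
        }
    }
    return out;
}
```

With the table above, decoding `011010101111110100011101000` yields `sleeplessness`.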
Steps to build Huffman Tree
• Input: an array of unique characters with their frequencies of occurrence.
1. Create a leaf node for each unique character and build a min-heap of all leaf nodes.
   • The min-heap is used as a priority queue.
   • The value of the frequency field is used to compare two nodes in the min-heap.
   • The least frequent character is at the root of the min-heap.
2. Extract the two nodes with the minimum frequencies from the min-heap.
3. Create a new internal node whose frequency is the sum of the two extracted nodes' frequencies.
   • Make the first extracted node the left child and the second node the right child.
   • Add the new internal node (with its two children) to the min-heap.
4. Repeat steps 2 and 3 until the heap contains only one node.
   • The tree is complete, and the remaining node is the root node of the Huffman binary tree.
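The steps above can be sketched with `std::priority_queue` configured as a min-heap. This is one possible arrangement, not the textbook's implementation: the names `HuffNode` and `huffman_codes` are hypothetical, nodes live in a vector and are referred to by index, and ties between equal frequencies are broken by node index (an assumption; other tie-breaking rules give different but equally short codes).

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct HuffNode {
    long freq;
    char symbol;   // meaningful only for leaves
    int left;      // index of left child in the node pool, -1 for a leaf
    int right;     // index of right child in the node pool, -1 for a leaf
};

// Builds a Huffman tree from symbol frequencies and returns each symbol's code.
std::map<char, std::string> huffman_codes(const std::map<char, long>& freqs) {
    std::vector<HuffNode> pool;
    typedef std::pair<long, int> Entry;   // (frequency, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    // Step 1: one leaf per symbol, all placed in the min-heap.
    for (const auto& kv : freqs) {
        pool.push_back(HuffNode{kv.second, kv.first, -1, -1});
        heap.push(Entry(kv.second, static_cast<int>(pool.size()) - 1));
    }
    // Steps 2-4: merge the two least frequent nodes until one node remains.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();   // first extracted: left child
        Entry b = heap.top(); heap.pop();   // second extracted: right child
        pool.push_back(HuffNode{a.first + b.first, '\0', a.second, b.second});
        heap.push(Entry(a.first + b.first, static_cast<int>(pool.size()) - 1));
    }

    // Walk the tree: 0 for a left branch, 1 for a right branch.
    std::map<char, std::string> codes;
    if (heap.empty()) return codes;
    std::vector<std::pair<int, std::string> > stack;
    stack.emplace_back(heap.top().second, std::string());
    while (!stack.empty()) {
        std::pair<int, std::string> item = stack.back();
        stack.pop_back();
        const HuffNode node = pool[item.first];
        if (node.left < 0) {
            // A lone symbol still needs one bit.
            codes[node.symbol] = item.second.empty() ? "0" : item.second;
        } else {
            stack.emplace_back(node.left, item.second + "0");
            stack.emplace_back(node.right, item.second + "1");
        }
    }
    return codes;
}
```

For the frequencies a = 3, b = 2, c = 1, this sketch produces a = 0, b = 11, c = 10.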
An assassin sins
1. Create a leaf node for each unique character and build a min-heap of all leaf nodes.
   Leaf frequencies (ignoring case): ␣ = 2, i = 2, n = 3, a = 3, s = 6
2. Extract two nodes, create a new node with frequency equal to the sum of the two node frequencies, and add it back to the min-heap. Repeat until 1 node remains.
   Internal node frequencies: 4, 6, 10, 16
3. What is the bit stream for "An assassin sins"?
   000110000111100111110101100111010111
   (36 bits; consistent with a = 00, n = 01, ␣ = 100, i = 101, s = 11)
B
aaabbc
1. Using the above character stream, create a leaf node for each unique character and build a min-heap of all leaf nodes.
   Leaf frequencies: a = 3, b = 2, c = 1
2. Extract two nodes, create a new node with frequency equal to the sum of the two node frequencies, and add it back to the min-heap. Repeat until 1 node remains.
   Internal node frequencies: 3, 6
3. What is the bit stream for "baa"?
   1100
4. What is the resulting compression ratio of Huffman to ASCII?
   4 : 24
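Once the codes are known, encoding is a lookup per character. The sketch below uses a hypothetical helper `huffman_encode`; the code tables in the usage note are assumptions inferred from the bit streams given in these exercises (a = 0, b = 11, c = 10 for "aaabbc"; a = 00, n = 01, space = 100, i = 101, s = 11 for "An assassin sins", ignoring case).

```cpp
#include <map>
#include <string>

// Encodes text as a string of '0'/'1' characters using a code table.
// Throws std::out_of_range if a character has no entry in the table.
std::string huffman_encode(const std::string& text,
                           const std::map<char, std::string>& codes) {
    std::string bits;
    for (char ch : text) bits += codes.at(ch);
    return bits;
}
```

With the first table, `huffman_encode("baa", codes)` gives `1100`: 4 bits against 3 x 8 = 24 bits in 8-bit ASCII.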
Balanced Binary Search Trees
• The performance of a binary search tree is proportional to the height of the tree, that is, the maximum number of nodes along a path from the root to a leaf.
• A full binary tree of height h (assuming an empty tree is of height 0) can hold 2^h - 1 items.
• If a binary search tree is full and contains n items, the expected performance is O(log n).
• However, if a binary tree is not full, the actual performance can be worse than expected, possibly up to O(n).
• Self-balancing trees require the heights of the right and left subtrees to be equal or nearly equal.
• We'll examine these trees later as well as non-binary search trees: the B-tree and its specializations, the 2-3 and 2-3-4 trees, and the B+ tree.
Balanced Binary Search Trees
• A balanced tree is a tree whose height is on the order of log(number of elements in the tree).
• Balancing a tree recursively applies to every subtree. That is, the tree is balanced if and only if:
  • The left and right subtrees' heights differ by at most one, AND
  • The left subtree is balanced, AND
  • The right subtree is balanced.
[Figure: two example trees, one labeled Balanced and one labeled Not Balanced]
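The recursive definition above translates almost directly into code. The sketch below is illustrative only: `TreeNode`, `checked_height`, and `is_balanced` are hypothetical names, and an empty tree is taken to have height 0.

```cpp
#include <cstdlib>
#include <memory>

struct TreeNode {
    int key;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int k) : key(k) {}
};

// Returns the height of the subtree (empty tree = 0),
// or -1 as soon as any subtree is found to be unbalanced.
int checked_height(const TreeNode* node) {
    if (!node) return 0;
    int l = checked_height(node->left.get());
    if (l < 0) return -1;                      // left subtree not balanced
    int r = checked_height(node->right.get());
    if (r < 0) return -1;                      // right subtree not balanced
    if (std::abs(l - r) > 1) return -1;        // heights differ by more than one
    return 1 + (l > r ? l : r);
}

// A tree is balanced if the check never reports -1.
bool is_balanced(const TreeNode* root) {
    return checked_height(root) >= 0;
}
```

Computing the height and the balance check in the same pass visits each node once; calling a separate height function at every node would repeat work.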
Why Balance is Important
• The binary tree in the figure has search performance of O(n), not O(log n).
[Figure: tree with the keys 1 through 10, shown alongside the Balanced and Not Balanced example trees]
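To see where the O(n) behavior comes from, the sketch below inserts keys into a plain, non-balancing binary search tree and measures its height. It assumes the figure shows the keys 1 through 10 inserted in ascending order; `BstNode`, `bst_insert`, and `bst_height` are hypothetical names.

```cpp
#include <memory>

struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Plain (non-balancing) BST insert; duplicate keys are ignored.
void bst_insert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) {
        if (key < (*cur)->key) cur = &(*cur)->left;
        else if (key > (*cur)->key) cur = &(*cur)->right;
        else return;
    }
    cur->reset(new BstNode(key));
}

// Height with the convention that an empty tree has height 0.
int bst_height(const std::unique_ptr<BstNode>& node) {
    if (!node) return 0;
    int l = bst_height(node->left);
    int r = bst_height(node->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 1, 2, ..., 10 in order produces a chain of height 10, so a search may visit every node. Inserting 4, 2, 6, 1, 3, 5, 7 instead produces a full tree of height 3.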
Rotation
• We need an operation on a binary tree that changes the relative heights of the left and right subtrees but preserves the binary search tree property.
• Tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements.
• A tree rotation moves one node up in the tree and one node down.
• Rotation is used to change the shape of the tree, and in particular to decrease its height by moving smaller subtrees down and larger subtrees up, resulting in improved performance of many tree operations.
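A right rotation can be sketched as three pointer moves. This is an illustration under assumed names (`RotNode`, `rotate_right`), not the textbook's code; a left rotation is the mirror image.

```cpp
#include <memory>
#include <utility>

struct RotNode {
    int key;
    std::unique_ptr<RotNode> left, right;
    explicit RotNode(int k) : key(k) {}
};

// Right rotation around `root`: the left child moves up and becomes the new
// subtree root; the old root moves down to become its right child.
// Does nothing if there is no left child.
void rotate_right(std::unique_ptr<RotNode>& root) {
    if (!root || !root->left) return;
    std::unique_ptr<RotNode> pivot = std::move(root->left);
    root->left = std::move(pivot->right);   // pivot's right subtree changes parent
    pivot->right = std::move(root);         // old root moves down
    root = std::move(pivot);                // pivot moves up
}
```

The keys in the pivot's right subtree lie between the pivot and the old root, so attaching them as the old root's left subtree keeps the search order intact. Rotating the chain 3, 2, 1 (each a left child of the previous) yields a tree rooted at 2 with children 1 and 3.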