CSE331 – Lecture 21 – Heaps, Binary Files, and Bit Sets
Contents
• Chapter 14
• Complete Binary Tree Example
• Maximum and Minimum Heaps
• Heap Insertion
• pushHeap()
• popHeap()
• Adjusting popHeap()
• Heap Sort
• Heapifying
• File Structure
• Direct File Access
• bitVector Class
• Lossless Compression
• Lossy Compression
• Example of Building Huffman Tree
• Summary Slides
Complete Binary Tree
Indexing & Other Heap Properties
• Node[ j ] has
  • Node[ 2*j+1 ] as left child
  • Node[ 2*j+2 ] as right child
  • Node[ (j-1)/2 ] as parent (for j > 0)
• MAX Heap: parent ≥ both children
• MIN Heap: parent ≤ both children
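The index arithmetic above can be sketched in a few lines. The helper names (leftChild, rightChild, parent, isHeap) are assumptions made for illustration and do not come from the lecture code; comp(a, b) is taken to mean "a belongs closer to the root than b", which matches how the lecture's heap functions use their comparison object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Index arithmetic for a complete binary tree stored in a vector (root at 0).
inline std::size_t leftChild(std::size_t j)  { return 2 * j + 1; }
inline std::size_t rightChild(std::size_t j) { return 2 * j + 2; }
inline std::size_t parent(std::size_t j)
{
    assert(j > 0);              // the root has no parent
    return (j - 1) / 2;
}

// Hypothetical checker: true when no child outranks its parent.
// comp(a, b) == true means a belongs closer to the root than b.
template <typename T, typename Compare>
bool isHeap(const std::vector<T>& v, Compare comp)
{
    for (std::size_t j = 1; j < v.size(); ++j)
        if (comp(v[j], v[parent(j)]))   // child should be above its parent
            return false;
    return true;
}
```

With this convention, isHeap(v, std::greater<int>()) checks for a maximum heap and isHeap(v, std::less<int>()) checks for a minimum heap.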
Maximum and Minimum Heaps
Heap Before and After Insertion of 50
pushHeap() – new value added at end & bubbles upward

    template <typename T, typename Compare>
    void pushHeap(vector<T>& v, int last, Compare comp)
    {
        // new item is v[last-1]; v[0] to v[last-2] are in heap order
        int currentPos = last-1;            // current node
        int parentPos = (currentPos-1)/2;   // its parent
        T target = v[last-1];               // new value being pushed

        while (currentPos != 0)             // not at root yet
        {
            if (comp(target,v[parentPos]))  // out of order
            {
                // copy parent down & move up a level in tree
                v[currentPos] = v[parentPos];
                currentPos = parentPos;
                parentPos = (currentPos-1)/2;
            }
            else                            // heap condition is ok
                break;
        }
        // copy target to correct location
        v[currentPos] = target;
    }
Adjusting the semi-heap in popHeap()
popHeap()

    template <typename T, typename Compare>
    void popHeap(vector<T>& v, int last, Compare comp)
    {
        T temp;

        // exchange first and last elements in heap
        temp = v[0];
        v[0] = v[last-1];
        v[last-1] = temp;

        // trickle down root over the range [0, last-1)
        adjustHeap(v, 0, last-1, comp);
    }
adjustHeap() – converts semi-heap to heap : range [first,last)

    template <typename T, typename Compare>
    void adjustHeap(vector<T>& v, int first, int last, Compare comp)
    {
        int currentPos = first;             // start at v[first]
        T target = v[first];                // save value temporarily
        int childPos = 2 * currentPos + 1;  // select left child

        while (childPos <= last-1)          // more nodes to check
        {
            if ((childPos+1 <= last-1) &&
                comp(v[childPos+1], v[childPos]))
                childPos = childPos + 1;    // select right child

            if (comp(v[childPos],target))
            {
                v[currentPos] = v[childPos];    // copy selected child up
                currentPos = childPos;          // move down
                childPos = 2 * currentPos + 1;  // left child
            }
            else                            // target belongs at currentPos
                break;
        }
        v[currentPos] = target;             // copy target to currentPos
    }
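For comparison, the C++ standard library provides the same two operations as std::push_heap and std::pop_heap in <algorithm>. The sketch below is not the lecture's code; the wrapper names heapInsert and heapRemoveMax are assumptions chosen for illustration. Note one difference in convention: the standard functions build a maximum heap with the default operator< (that is, less), whereas the lecture's pushHeap() and popHeap() build a maximum heap when passed greater<T>().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert a value into a maximum heap stored in a vector.
void heapInsert(std::vector<int>& heap, int value)
{
    heap.push_back(value);                      // new item goes at the back
    std::push_heap(heap.begin(), heap.end());   // then filters up the tree
}

// Remove and return the largest value of a non-empty maximum heap.
int heapRemoveMax(std::vector<int>& heap)
{
    assert(!heap.empty());
    // moves the root to the back and re-adjusts the remaining range
    std::pop_heap(heap.begin(), heap.end());
    int top = heap.back();
    heap.pop_back();                            // heap now has one less element
    return top;
}
```

Inserting 18, 50, 25, 63, 8 and then calling heapRemoveMax() twice returns 63 and then 50, leaving a three-element heap.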
Heap Sort
• Two steps
  • (1) Build a heap (heapify the vector)
  • (2) For each Index from v.size()-1 down to 1 do
    • Swap v[0] and v[Index]: this places a single item in the sorted portion of the vector (following the heap)
    • Adjust the semi-heap (vector within range [0,Index) )
• To sort in order: use a MAX heap
• To sort in reverse order: use a MIN heap
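The two steps above can be sketched with the standard library's heap algorithms. This is an illustrative version under the assumption that std::make_heap and std::pop_heap stand in for the lecture's makeHeap() and popHeap(); the function name heapSortSketch is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort v in ascending order using a maximum heap.
void heapSortSketch(std::vector<int>& v)
{
    // Step 1: heapify the vector (maximum heap under operator<)
    std::make_heap(v.begin(), v.end());

    // Step 2: shrink the heap one element at a time. Each pop_heap call
    // swaps the root with the last element of [begin, last) and then
    // re-adjusts the semi-heap in [begin, last-1).
    for (auto last = v.end(); last - v.begin() > 1; --last)
        std::pop_heap(v.begin(), last);

    assert(std::is_sorted(v.begin(), v.end()));   // postcondition
}
```

Applied to {50, 20, 75, 35, 25}, the vector ends as {20, 25, 35, 50, 75}.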
Example of Heapifying a Vector (Cont…)
Making a Heap: Heapifying using makeHeap()

    template <typename T, typename Compare>
    void makeHeap(vector<T>& v, Compare comp)
    {
        int heapPos, lastPos;

        // compute the size of the heap and the index
        // of the last parent
        lastPos = v.size();
        heapPos = (lastPos - 2)/2;

        // filter down every parent in order from last parent
        // down to root
        while (heapPos >= 0)
        {
            adjustHeap(v, heapPos, lastPos, comp);
            heapPos--;
        }
    }
Example of heap sort

    int arr[] = {50, 20, 75, 35, 25};
    vector<int> v(arr, arr + 5);    // copy the 5 array elements into v
Example of Implementing heap sort (Cont…)
heapSort() - in "d_sort.h"

    template <typename T, typename Compare>
    void heapSort(vector<T>& v, Compare comp)
    {
        // "heapify" the vector v
        makeHeap(v, comp);

        int i, n = v.size();

        // iteration that determines elements v[n-1] ... v[1]
        for (i = n; i > 1; i--)
        {
            // call popHeap() to move the root of the heap [0,i) to v[i-1]
            popHeap(v, i, comp);
        }
    }
Program using heapSort()

    #include <iostream>
    #include <vector>
    #include <functional>   // for greater<T>() function object
    #include "d_sort.h"     // for heapSort()
    #include "d_util.h"     // for writeVector()
    #include "d_random.h"   // for randomNumber
    using namespace std;

    int main()
    {
        vector<int> v;
        randomNumber rnd;
        int i;

        // create a vector with 15 random integers
        for (i = 0; i < 15; i++)
            v.push_back(rnd.random(100));

        cout << "Sort in ascending order" << endl << " ";
        // greater<int>() builds a max heap, giving ascending order;
        // less<int>() builds a min heap, giving descending order
        heapSort(v, greater<int>());
        writeVector(v);
        cout << endl;

        return 0;
    }
File Structure
• A text file contains ASCII characters with a newline sequence separating lines.
• A binary file consists of data objects that vary from a single character (byte) to more complex structures that include integers, floating point values, programmer-generated class objects, and arrays.
• Each data object in a file is called a record.
Direct File Access
• The functions seekg() and seekp() allow the application to reposition the current file pointers (seekg() for input, seekp() for output).
• The seek functions take an offset argument that measures the number of bytes from the beginning (beg), ending (end), or current position (cur) in the file.
• If a file is used for both input and output, use tellg() to report the current position and seekg() to reposition.
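A minimal sketch of direct access with <fstream> follows. The Record struct, the function names, and the idea of fixed-size records written as raw bytes are assumptions made for illustration; the lecture does not show this code. Raw struct I/O of this kind is only portable between programs built with the same compiler and platform.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-size record
struct Record
{
    int id;
    double value;
};

// Write all records sequentially, replacing any existing file.
void writeRecords(const std::string& path, const std::vector<Record>& recs)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    assert(out.is_open());
    for (const Record& r : recs)
        out.write(reinterpret_cast<const char*>(&r), sizeof(Record));
}

// Read record number index (0-based) by seeking from the beginning.
Record readRecord(const std::string& path, std::size_t index)
{
    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());
    in.seekg(static_cast<std::streamoff>(index * sizeof(Record)), std::ios::beg);
    Record r{};
    in.read(reinterpret_cast<char*>(&r), sizeof(Record));
    assert(in.gcount() == static_cast<std::streamsize>(sizeof(Record)));
    return r;
}

// Count the records: seek to the end, then ask tellg() for the byte offset.
std::size_t recordCount(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());
    in.seekg(0, std::ios::end);
    std::streamoff bytes = in.tellg();
    return static_cast<std::size_t>(bytes) / sizeof(Record);
}
```

Record k starts at byte offset k * sizeof(Record), so a single seekg() reaches it without reading the records before it.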
Implementing the bitVector Class
• bitMask() returns an unsigned character value containing a 1 in the bit position representing i.
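One way such a mask could be computed is sketched below. It assumes bits are packed eight per byte with bit 0 as the left-most bit of the sequence; the free functions setBit, clearBit, and testBit are hypothetical stand-ins for the class's member functions, which the slide does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mask with a single 1 at the position of bit i inside its byte.
// Assumption: bit 0 is the left-most (most significant) bit of byte 0.
inline unsigned char bitMask(std::size_t i)
{
    return static_cast<unsigned char>(1u << (7 - (i & 7)));
}

// Turn bit i on.
inline void setBit(std::vector<unsigned char>& bytes, std::size_t i)
{
    assert(i / 8 < bytes.size());
    bytes[i / 8] |= bitMask(i);
}

// Turn bit i off.
inline void clearBit(std::vector<unsigned char>& bytes, std::size_t i)
{
    assert(i / 8 < bytes.size());
    bytes[i / 8] &= static_cast<unsigned char>(~bitMask(i));
}

// Report whether bit i is on.
inline bool testBit(const std::vector<unsigned char>& bytes, std::size_t i)
{
    assert(i / 8 < bytes.size());
    return (bytes[i / 8] & bitMask(i)) != 0;
}
```

Here i / 8 selects the byte and i & 7 (the remainder after dividing by 8) selects the bit within it.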
Lossless Compression
• Data compression loses no information.
• The original data can be recovered exactly from the compressed data.
• Normally applied to "discrete" data, such as text, word processing files, computer applications, and so forth.
Lossy Compression
• Loses some information during compression, so the data cannot be recovered exactly.
• Can shrink the data further than lossless compression techniques.
• Sound files often use this type of compression.
Huffman Tree – data codes are from edges (root to leaf)
Huffman Codes
• Variable length binary codes
• Shortest codes for most frequently occurring data values
• Longest codes for least frequently occurring data values
• Codes are prefix-free: no code is a prefix of another code
• Codes are determined from a binary tree built from a priority queue of data value/frequency pairs
• Tree is built bottom-up (it is a full binary tree, but not necessarily balanced)
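The bottom-up construction can be sketched with std::priority_queue used as a minimum priority queue. This is an illustrative version, not the lecture's hCompress implementation: the names HuffNode, huffmanCodeLengths, and encodedBits are assumptions, nodes are kept in a vector and linked by index, and only the code lengths (leaf depths) are computed rather than the bit patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct HuffNode
{
    long freq;
    int left;    // index of left child, -1 for a leaf
    int right;   // index of right child, -1 for a leaf
};

// Build the tree bottom-up; return the code length of each symbol.
std::vector<int> huffmanCodeLengths(const std::vector<long>& freq)
{
    assert(freq.size() >= 2);
    using Entry = std::pair<long, int>;   // (frequency, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    std::vector<HuffNode> nodes;

    for (std::size_t i = 0; i < freq.size(); ++i)
    {
        nodes.push_back({freq[i], -1, -1});          // one leaf per symbol
        pq.push({freq[i], static_cast<int>(i)});
    }

    // n leaves are joined by n-1 merges of the two smallest frequencies
    while (pq.size() > 1)
    {
        Entry a = pq.top(); pq.pop();
        Entry b = pq.top(); pq.pop();
        nodes.push_back({a.first + b.first, a.second, b.second});
        pq.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // A parent is always created after its children, so walking the node
    // list backwards from the root (the last node) visits parents first.
    std::vector<int> depth(nodes.size(), 0);
    for (std::size_t n = nodes.size(); n-- > 0; )
        if (nodes[n].left != -1)
            depth[nodes[n].left] = depth[nodes[n].right] = depth[n] + 1;

    depth.resize(freq.size());   // the leaves are the first n nodes
    return depth;
}

// Total encoded size in bits: sum of frequency times code length.
long encodedBits(const std::vector<long>& freq, const std::vector<int>& len)
{
    long bits = 0;
    for (std::size_t i = 0; i < freq.size(); ++i)
        bits += freq[i] * len[i];
    return bits;
}
```

For the frequencies 45, 13, 12, 16, 9, 5 the code lengths come out as 1, 3, 3, 3, 4, 4, for a total of 224 bits; the most frequent symbol gets a one-bit code while the rest sit deeper, which shows the tree is far from balanced.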
Building Huffman Tree
Summary Slide 1
§- Heap
- an array-based tree that has heap order
- maximum heap: if v[i] is a parent, then v[i] ≥ v[2i+1] and v[i] ≥ v[2i+2] (a parent is ≥ its children)
- the root, v[0], is the maximum value in the vector
- minimum heap: the parent is ≤ its children
- v[0] is the minimum value
- Insertion: place the new value at the back of the heap and filter it up the tree.
Summary Slide 2
§- Heap (Cont…)
- Deletion: exchange the root's value with the back of the heap and then filter the new root down the tree, which now has one less element.
- Insert and delete running time: O(log₂ n)
- Heapifying: apply the filter-down operation to the interior nodes, from the last interior node in the tree down to the root
- running time: O(n)
- The O(n log₂ n) heapsort algorithm heapifies a vector and erases repeatedly from the heap, placing each deleted value in its final position.
Summary Slide 3
§- Binary File
- a sequence of 8-bit characters without the requirement that a character be printable and with no concern for a newline sequence that terminates lines
- often organized as a sequence of records: record 0, record 1, record 2, ..., record n-1
- can be used for both input and output; the C++ header <fstream> contains the operations that support these files
- the open() function must use the attribute ios::binary
Summary Slide 4
§- Binary File (Cont…)
- For direct access to a file record, use the function seekg(), which moves the file pointer to a file record.
- seekg() accepts an argument that specifies motion from the beginning of the file (ios::beg), from the current position of the file pointer (ios::cur), or from the end of the file (ios::end)
- use the read() function to input a sequence of bytes from the file into a block of memory and the write() function to output from a block of memory to a binary file
Summary Slide 5
§- Bit Manipulation Operators
- | (OR), & (AND), ^ (XOR), ~ (NOT), << (shift left), and >> (shift right)
- used to perform operations on specific bits within a character or integer value
- The class bitVector uses operator overloading
- it treats a sequence of bits as an array, with bit 0 the left-most bit of the sequence
- bit(), set(), and clear() allow access to specific bits
- the class has I/O operations for binary files and the stream operator << that outputs a bit vector as an ASCII sequence of 0 and 1 values
Summary Slide 6
§- File Compression Algorithm
- encodes a file as a sequence of characters that consumes less disk space than the original file
- Two types of compression algorithms:
1) lossless compression
– restores the original file exactly
– approach: count the frequency of occurrence of each character in the file and assign a prefix bit code to each character
- compressed file size in bits: the sum, over all characters, of the product of the character's bit-code length and its frequency of occurrence
Summary Slide 7
§- File Compression Algorithm (Cont…)
2) lossy compression
– loses some information during compression, so the data cannot be recovered exactly
– normally used with sound and video files
- The Huffman compression algorithm builds optimal prefix codes by constructing a full tree in which the most frequently occurring characters sit near the top of the tree and have shorter bit codes. The less frequently occurring characters sit near the bottom of the tree and have longer bit codes.
Summary Slide 8
§- File Compression Algorithm (Cont…)
- If the file contains n distinct characters, the tree-building loop concludes after n-1 iterations, having built the Huffman tree.
- the implementation requires the use of a heap, bit operations, and binary files
- The use of the bitVector class simplifies the construction of the classes hCompress and hDecompress, which perform Huffman compression and decompression.
- Huffman compression works better with text files; they tend to have fewer unique characters than binary files.