Data Structures: Heaps
Phil Tayco
Slide version 1.0, May 7, 2018
Heaps: Binary trees revisited again
- We’ve seen how binary trees can be used to sort data and store it dynamically
- In a traditional linked list, searches perform at O(n); a fairly balanced binary tree improves this to O(log n), matching binary search on a sorted array
- Binary trees can also be used to handle other types of data and order
- Perhaps this structure can be used to revisit other concepts we’ve discussed
Heaps: Back to priority queues
- Recall the tradeoffs between inserts and removes with priority queues
- O(1) insert meant there was no sort order in the array or linked list, so remove is O(n)
- O(1) remove meant the element to remove was always at the top of the queue, but maintaining that order during insert meant O(n)
- To improve on this performance, the side that works at O(n) will need to go to at least O(log n)
- The challenge is to see if we can keep the other side at O(1)
Heaps: How about those tree structures?
- Tree structures, when balanced, lead to divide and conquer algorithms that suggest O(log n) performance
- With a traditional binary tree, the highest value (the highest priority) is found with a max() function, which is O(log n) and could be used for a priority queue remove
- Inserting records performs at O(log n): while O(n) remove improves to O(log n), O(1) insert degrades to O(log n)
- This only applies if the tree is balanced. If the tree becomes imbalanced, both insert and remove work at O(n)
Heaps: What about O(1) remove?
- 2-3-4 trees address the balance problem, but remove is not a simple function and, even with balance maintained, it costs O(log n) rather than O(1)
- O(1) remove means the highest priority value is in a predictable location. In a tree, that’s the root
- In a standard balanced binary tree, the root holds the middle sorted value, not the highest
- What leads to a divide and conquer solution is the tree’s structure. We can keep the structure but change the rules for how values are arranged
Heaps: The root is the highest
- To target O(1) remove, the root node needs to have the highest value
- Since nodes in a tree have parent and child relationships, this suggests an order where each parent has a higher value than its children
- The challenge is that removing a node means deleting the root and setting up the next highest value as the new root
- In addition, to keep the benefits of the structure, balance must also be maintained
- The stage is set. Given these concepts, we can define a new tree-like structure
Heaps: Define the conditions
- A heap is a complete tree structure where every node has a higher value than its immediate children
- A complete tree means levels are filled left to right and no new level starts until the current level is full
- Node values may appear to have no overall order, but as long as the parent-child value rule is maintained, the structure is a heap
- Note: when studying heaps, first look at the structural theory before thinking about implementation
Heaps: Example 1. Incorrect heap: the level is not complete (nodes 6 and 7 should be children of 5)
[Tree diagram: root 50; remaining nodes 5, 20, 7, 6]

Heaps: Example 2. Incorrect heap: complete, but the values are incorrect (17 and 7 should switch places)
[Tree diagram: root 50; remaining nodes 7, 2, 17, 6]

Heaps: Example 3. Correct heap: complete, with nodes following the value rules
[Tree diagram, level order: 10 | 9, 4 | 2, 1]
Heaps: Insert
- To maintain the level completeness rule of a heap, new nodes are added at the next available spot on a level
- If the level is full, the new node is added on a new level at the far left
Heaps: Insert 15
[Tree diagram, level order: 10 | 9, 4 | 2, 1, 15]
Heaps: Insert example
- Node 15 is added in the correct location, maintaining the completeness rule of the heap
- The value rule, however, is violated because its parent, node 4, is less than 15
- Assuming the heap was correctly formed beforehand, the only possible violation after adding a node is between the new node and its parent
- The new node is either correctly positioned or not. If it is, no further action is required
- If it is not, the solution is for the two nodes to trade places
Heaps: Swap 15 and 4
[Tree diagram, level order: 10 | 9, 15 | 2, 1, 4]
Heaps: Just keep swapping
- If a swap occurs, the next pair of nodes to check is the moved node and its new parent
- In this example, that means checking 15 against its parent, 10
- Here, we need to do another swap
Heaps: Swap 15 and 10
[Tree diagram, level order: 15 | 9, 10 | 2, 1, 4]
Heaps: Eventually we’ll stop
- As swapping occurs, the higher values travel up to where they should be in the heap
- This swapping up the tree is called “trickling up”
- Trickling up stops when either no further swaps with a parent are needed or the last swap is with the root node
Heaps: Insert 12. The new node maintains completeness but requires swapping with 10
[Tree diagram, level order: 15 | 9, 10 | 2, 1, 4, 12]

Heaps: Insert 12. No further swaps are needed since 12, in its new position, is less than its parent 15. The insert is done
[Tree diagram, level order: 15 | 9, 12 | 2, 1, 4, 10]

Heaps: Insert 14. 14 starts a new level under 2
[Tree diagram, level order: 15 | 9, 12 | 2, 1, 4, 10 | 14]

Heaps: Insert 14. 14 swaps with 2 and then with 9 to complete its trickle up
[Tree diagram, level order: 15 | 14, 12 | 9, 1, 4, 10 | 2]
Heaps: Insert efficiency
- Assuming the next location for a new node is known (more on that when we discuss implementation), adding the node is O(1)
- Trickling up in the worst case follows one path from the leaf level to the root
- Because the tree is complete, that path is O(log n) in the worst case
- For a priority queue, this means insert performance goes from O(1) to O(log n). Is this an acceptable price to pay for the improvement on remove?
Heaps: Remove efficiency goals
- Before accepting O(1) insert degrading to O(log n), we need to understand the performance of remove
- The root node is clearly the node to remove since it has the highest value, and finding it is an O(1) operation
- However, the heap must be rebuilt afterward, so an overall O(1) remove will be a challenge
- The target then is O(log n), since O(n) would be no improvement over the earlier priority queue implementations
Heaps: Remove algorithm
- With the root node removed, the vacant spot needs to be filled with the correct node
- The node with the next highest value will be one of the root’s children
- The completeness rule, however, must also be preserved, which means the position that actually disappears is the rightmost node on the lowest level
- From this perspective, we can start by swapping the root node with the node in that position
Heaps: Remove. 15 is the node to be eliminated and 2 is the rightmost node on the lowest level
[Tree diagram, level order: 15 | 14, 12 | 9, 1, 4, 10 | 2]

Heaps: Remove. 15 and 2 swap places. 15 can now be removed and the heap is still complete
[Tree diagram, level order: 2 | 14, 12 | 9, 1, 4, 10 | 15]
Heaps: Fix the heap
- Since the heap is complete in structure, all that remains is to fix the values
- In this example, 2 is out of position
- Its current children were the children of node 15, and because of the heap rules, one of them holds the next highest value in the entire heap
- The process is now the opposite of trickling up: we trickle down
- In trickling down, the higher of the two children, if it is greater than the parent, swaps with the parent
- This process continues until either no swap is needed or the lowest level is reached
Heaps: Remove. 14 is greater than 12, so it swaps with 2
[Tree diagram, level order: 2 | 14, 12 | 9, 1, 4, 10]

Heaps: Remove. After 14 and 2 swap, the next level is checked. Here 9 will swap with 2
[Tree diagram, level order: 14 | 2, 12 | 9, 1, 4, 10]

Heaps: Remove. After 9 and 2 swap, the heap is correct
[Tree diagram, level order: 14 | 9, 12 | 2, 1, 4, 10]

Heaps: Remove. On the next remove, 14 will swap with 10 and be deleted. 10 will then trickle down from the root and swap only with 12
[Tree diagram, level order: 12 | 9, 10 | 2, 1, 4]
Heaps: Remove efficiency
- Again assuming the last spot in the heap is known, the swap with the root is O(1)
- Trickling down from there is similar in efficiency to trickling up and is therefore O(log n)
- Both insert and remove therefore perform at O(log n). With the completeness rule in place, the O(log n) is guaranteed (no risk of imbalance)
Heaps: Heaps as priority queues
- Remove always takes the highest value, so the heap performs well as a priority queue
- Insert still ensures the highest value is at the root, so it works for a priority queue as well
- Both efficiencies are O(log n), whereas the earlier implementations had one operation at O(1) and the other at O(n)
- You get a significant improvement with one and a significant degradation with the other
- The question is how significant the overall change is
Heaps: Run the numbers
- Recall that O() is a measure of how performance changes as the structure grows in size
- At smaller sizes, O(1), O(log n), and O(n) are relatively close
- Larger sizes show the difference, which is what O() demonstrates. With 1000 elements:
  - O(n) means a value relative to 1000
  - O(1) means a constant value as low as 1
  - O(log n) could still be a value as low as 10
- The key observation is the rate of increase: as n doubles in size, O(n) performance also doubles while O(log n) performance only increments
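
To make the doubling observation concrete, here is a small sketch that tabulates rough step counts. The function name `growth_table` is made up for this illustration, and using the ceiling of log2(n) as the "O(log n) step count" is an assumption chosen to match the slide's figure of about 10 for 1000 elements.

```python
import math

def growth_table(sizes):
    """Return rows of (n, approximate O(log n) steps, approximate O(n) steps)."""
    # ceil(log2(n)) is used as a stand-in for the number of levels in a tree of n nodes
    return [(n, math.ceil(math.log2(n)), n) for n in sizes]

for n, log_steps, linear_steps in growth_table([1000, 2000, 4000, 8000]):
    print(f"n={n:>5}  O(log n) ~ {log_steps:>2}  O(n) ~ {linear_steps}")
```

Each doubling of n doubles the linear column but adds only one step to the logarithmic column.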
Heaps: Heaps reign supreme
- With both operations at O(log n), the key point is that the gap between O(1) and O(log n) does not widen as much as the gap between O(log n) and O(n) as n gets larger
- Thus, the hit for going from O(1) to O(log n) is worth it to gain the improvement from O(n) to O(log n)
- The O(log n) performance depends on knowing the exact location of the rightmost node on the lowest level
- This is where the implementation comes into play
- As with stacks and queues, static or dynamic structures can be used
Heaps: Heaps as arrays
- Heaps are generally implemented as arrays because direct access to any element simplifies the code and provides the O(1) lookup of that rightmost, lowest node
- Direct element access also allows for easy swaps between a parent and a child node
- The question now is how to implement a tree structure using an array
- We can start by taking a heap tree and “flattening” it into an array
Heaps: Heap as an array (note color and index numbers)
- The parent and child positions in the heap show a mathematical relationship in the array indexes
[Tree diagram, level order: 14 | 9, 12 | 2, 1, 4, 10]
Index:  0   1   2   3   4   5   6
Value:  14  9   12  2   1   4   10
Heaps: Do the math
- Looking at children and parents at each level:
  - Node 14 = 0, Left child (9) = 1, Right child (12) = 2
  - Node 9 = 1, Left child (2) = 3, Right child (1) = 4
  - Node 12 = 2, Left child (4) = 5, Right child (10) = 6
- If node 2 had children, their indexes would be 7 and 8
- Generalizing, a node at index n has children at the following array locations:
  - Left child = 2n + 1
  - Right child = 2n + 2
- This is useful for the coding and also shows how the insert and remove functions can be handled in an array
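
The index relationships can be sketched in a few lines of Python. The helper names are made up for this illustration. The `parent` formula does not appear on the slides; it is simply the inverse of the two child formulas. The check `is_max_heap` uses `>=` so that duplicate values are tolerated, which is an assumption beyond the "higher than" rule stated earlier.

```python
def left_child(n):
    return 2 * n + 1

def right_child(n):
    return 2 * n + 2

def parent(n):
    # Inverse of the child formulas (an addition, not from the slides);
    # integer division makes it work for both left and right children.
    return (n - 1) // 2

def is_max_heap(values):
    """True if every node is no larger than its parent (index 0 is the root)."""
    return all(values[parent(i)] >= values[i] for i in range(1, len(values)))

heap = [14, 9, 12, 2, 1, 4, 10]          # the flattened tree from the example
print(heap[left_child(1)], heap[right_child(1)])  # children of node 9: 2 1
print(is_max_heap(heap))                 # True
```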
Heaps: Insert 11. 11 goes in as a new node on level 4, which is index 7 in the array
[Tree diagram, level order: 14 | 9, 12 | 2, 1, 4, 10 | 11]
Index:  0   1   2   3   4   5   6   7
Value:  14  9   12  2   1   4   10  11

Heaps: Insert 11. Trickle up swaps 11 with 2 and then with 9. The array follows suit
[Tree diagram, level order: 14 | 11, 12 | 9, 1, 4, 10 | 2]
Index:  0   1   2   3   4   5   6   7
Value:  14  11  12  9   1   4   10  2
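
A minimal sketch of insert with trickle up, reproducing the insert of 11. The function names are made up for this illustration, and a growable Python list stands in for the fixed-size array the slides assume.

```python
def trickle_up(heap, index):
    """Swap the node at `index` with its parent until the value rule holds."""
    while index > 0:
        parent = (index - 1) // 2
        if heap[index] <= heap[parent]:
            break                      # correctly positioned, stop
        heap[index], heap[parent] = heap[parent], heap[index]
        index = parent                 # continue checking from the new position

def heap_insert(heap, value):
    heap.append(value)                 # next available spot keeps the tree complete
    trickle_up(heap, len(heap) - 1)

heap = [14, 9, 12, 2, 1, 4, 10]
heap_insert(heap, 11)
print(heap)  # [14, 11, 12, 9, 1, 4, 10, 2]
```

The loop visits at most one node per level, which is where the O(log n) bound for insert comes from.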
Heaps: Remove example. Given this tree, remove starts by swapping 14 with 10. Again, this is a simple swap of index 0 and the last array element
[Tree diagram, level order: 14 | 9, 12 | 2, 1, 4, 10]
Index:  0   1   2   3   4   5   6
Value:  14  9   12  2   1   4   10

Heaps: Now 10 has to trickle down and swap with 12. 14 can be considered removed as well
[Tree diagram, level order: 12 | 9, 10 | 2, 1, 4, with the removed 14 in the last position]
Index:  0   1   2   3   4   5   6
Value:  12  9   10  2   1   4   14
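
A minimal sketch of remove with trickle down, reproducing the removal of 14. The function names are made up for this illustration, and the removed value is popped off a Python list rather than left in the last slot of a fixed-size array.

```python
def trickle_down(heap, index, size):
    """Swap the node at `index` with its larger child until the value rule holds."""
    while True:
        left, right = 2 * index + 1, 2 * index + 2
        largest = index
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == index:
            return                     # no swap needed, or lowest level reached
        heap[index], heap[largest] = heap[largest], heap[index]
        index = largest

def heap_remove(heap):
    if not heap:
        raise IndexError("remove from empty heap")
    last = len(heap) - 1
    heap[0], heap[last] = heap[last], heap[0]   # swap root with rightmost, lowest node
    top = heap.pop()                            # drop the old root
    trickle_down(heap, 0, len(heap))
    return top

heap = [14, 9, 12, 2, 1, 4, 10]
print(heap_remove(heap))  # 14
print(heap)               # [12, 9, 10, 2, 1, 4]
```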
Heaps: Array considerations
- The array example here suggests the size of the array matches the size of the heap
- In reality, the heap array has to be treated as static, so allocating an appropriate maximum size is still necessary
- The size in use (minus 1) also gives you the index of the rightmost node on the lowest level
- Swapping parent and child nodes is a simple array element swap using the index formulas discussed
- The array heap implementation supports the O(log n) efficiency, but the price is static memory
Heaps: Did you also notice this?
- When an element is “removed”, it is actually placed in the last spot in the array
- While the current active size is reduced with each removal, you can also see this as an opportunity: each removal places the highest remaining value at the end of the active portion of the array, so long as the rest follows the heap rules
- What would repeated “removals” do to the array?
Heaps: Here’s the array after the last remove example. 14 is supposed to be “removed”, so 4 represents the rightmost, lowest level node (values to the right of | are removed)
Index:  0   1   2   3   4   5  |  6
Value:  12  9   10  2   1   4  |  14

Heaps: The next remove puts 12 where 4 is, and 4 trickles down to swap with 10. Node 1 is now the rightmost, lowest level node
Index:  0   1   2   3   4  |  5   6
Value:  10  9   4   2   1  |  12  14

Heaps: The remove process continues with 10, with 1 swapping with 9 and then with 2
Index:  0   1   2   3  |  4   5   6
Value:  9   2   4   1  |  10  12  14

Heaps: And again with 9 moving to the end and 1 swapping with 4
Index:  0   1   2  |  3   4   5   6
Value:  4   2   1  |  9   10  12  14
Heaps: Heapsort
- With the heap implemented as an array, running the remove operation for all elements results in sorting the array!
- Remove performs at O(log n) and is done n times, which leads to a consistent O(n log n) sorting performance
- Quicksort is O(n log n) but can degrade to O(n^2) for arrays with nearly sorted data
- Mergesort is O(n log n) and is consistent, but requires double the memory space
- Heapsort works well, but its drawback is that the array elements must first form a heap
- How do you turn any array into a heap?
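
The repeated removals traced in the array examples can be sketched as follows. The function names are made up for this illustration, and the sketch assumes the input array already satisfies the heap rules; turning an arbitrary array into a heap is a separate step.

```python
def trickle_down(values, index, size):
    """Restore the value rule below `index`, looking only at the first `size` slots."""
    while True:
        left, right = 2 * index + 1, 2 * index + 2
        largest = index
        if left < size and values[left] > values[largest]:
            largest = left
        if right < size and values[right] > values[largest]:
            largest = right
        if largest == index:
            return
        values[index], values[largest] = values[largest], values[index]
        index = largest

def sort_heap_in_place(values):
    """Repeatedly 'remove' the root by swapping it into the last active slot."""
    for last in range(len(values) - 1, 0, -1):
        values[0], values[last] = values[last], values[0]
        trickle_down(values, 0, last)   # active heap size is now `last`

data = [14, 9, 12, 2, 1, 4, 10]         # already a heap
sort_heap_in_place(data)
print(data)  # [1, 2, 4, 9, 10, 12, 14]
```

The loop variable `last` plays the role of "size in use minus 1": everything to its right is already in sorted position.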
Heaps: Twice the fun
- The simplest way to turn an array into a heap is to walk through each element and perform the insert function
- The insert is handled within the array using swaps, so there is no need for a second array of equal size as with mergesort
- Once all the inserts are done, the removes can be done to sort the array
- Since insert and remove each perform at O(log n) and the process goes through the array twice, overall performance is O(2(n log n))
- This is still in the O(n log n) category, but in the long run it is less effective than quicksort (as long as the data is randomly distributed)
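
Putting the two passes together, here is a hedged sketch of the full sort on an arbitrary array: one pass of in-place inserts (trickle up), then one pass of removes (trickle down). The function names are made up for this illustration, and this follows the insert-based approach described above rather than any other heap-building method.

```python
def trickle_up(values, index):
    while index > 0:
        parent = (index - 1) // 2
        if values[index] <= values[parent]:
            break
        values[index], values[parent] = values[parent], values[index]
        index = parent

def trickle_down(values, index, size):
    while True:
        left, right = 2 * index + 1, 2 * index + 2
        largest = index
        if left < size and values[left] > values[largest]:
            largest = left
        if right < size and values[right] > values[largest]:
            largest = right
        if largest == index:
            return
        values[index], values[largest] = values[largest], values[index]
        index = largest

def heapsort(values):
    # Pass 1: treat values[0:i] as the heap and "insert" values[i] into it
    for i in range(1, len(values)):
        trickle_up(values, i)
    # Pass 2: repeatedly swap the root into the last active slot and fix the heap
    for last in range(len(values) - 1, 0, -1):
        values[0], values[last] = values[last], values[0]
        trickle_down(values, 0, last)

data = [4, 10, 2, 14, 1, 12, 9]
heapsort(data)
print(data)  # [1, 2, 4, 9, 10, 12, 14]
```

No second array is allocated; both passes work by swapping elements within `values`.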