Sorting Overview / Heapsort

Presentation Transcript


  1. Sorting Overview / Heapsort

  2. Sort Routine Features / Definitions:
     • Big O analysis is based on the number of comparisons.
     • A record is a group of related information, such as information about a particular student in a student database. Think of a record as a row in a table. In C++, the struct data type can be used to represent a record, and an array of structs can be used to represent a table.
     • In database terminology, each entry in a record is called a field. Think of a field as a column in a table. When sorting a list of records, the field on which the sort order is based is called the key.
     [Figure: a sample table with one row labeled "record" and one column labeled "field"]
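
A minimal C++ sketch of these terms, assuming a hypothetical `StudentRecord` type whose fields are invented for illustration: each struct is one record, each member is a field, a `std::vector` plays the role of the table, and the comparator looks only at the key field.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: one row of a student table.
struct StudentRecord {
    std::string lastName;   // field
    std::string firstName;  // field
    int id;                 // field (used as the key below)
};

// Sort the table (a vector of records) on the id key.
void sortById(std::vector<StudentRecord>& table) {
    std::sort(table.begin(), table.end(),
              [](const StudentRecord& a, const StudentRecord& b) {
                  return a.id < b.id;  // only the key is compared
              });
}
```

Calling `sortById(table)` reorders the rows so their `id` keys ascend; the other fields simply travel with their record.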

  3. More Sort Routine Features / Definitions:
     • Adaptive means that the sort routine performs better when the keys to be sorted are already nearly sorted.
     • Stable means that the sort routine maintains the original relative order of records with equal keys.

  4. Sorting on more than one key:
     • The key on which the overall sort order is based is called the primary key.
     • For example, names are usually sorted by last name, and records with the same last name are typically sorted by first name. In this case, the last name is the primary key and the first name is the secondary key.
     [Figure: a sample table with columns labeled "primary key" and "secondary key"]

  5. Sorting on more than one key (continued):
     • To sort records by last name, and then by first name within last name, two complete sorts must be performed on the records.
     • Although admittedly counterintuitive, the first sort must be based on the secondary key, first name. Any sorting algorithm will do for this first sort; pick one based on the characteristics of the data (how many records there are, how many fields each has, whether they are already close to sorted, etc.).
     • After sorting on the secondary key, the table looks like this: [table not reproduced in this transcript]

  6. Sorting on more than one key (continued):
     • After sorting on the secondary key, the next sort must be based on the primary key. To keep the order produced by the sort on first names, this sort must be stable.
     • In the table below [not reproduced in this transcript], the first names have already been sorted, so we don't want the sort on the last names to ruin what the first sort accomplished. An unstable sort could rearrange the first names within a group of equal last names, while a stable sort is guaranteed not to.

  7. Sorting on more than one key (continued):
     • After sorting by last name using a stable sort, the table now looks like this: [table not reproduced in this transcript]
     • Remember, when sorting on multiple keys:
       - One sort must be performed for each key.
       - The sorts must be performed in reverse order of the "importance" of the keys.
       - Each sort after the first must be stable.
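
The three rules above can be sketched with the C++ standard library, assuming a hypothetical `Person` record with `last` and `first` fields. The first pass uses `std::sort` (not guaranteed stable) on the secondary key, since any algorithm will do there; the second pass uses `std::stable_sort` on the primary key.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: last name is the primary key, first name the secondary key.
struct Person {
    std::string last;
    std::string first;
};

// Sort by last name, then by first name within last name, using two passes.
void sortByLastThenFirst(std::vector<Person>& people) {
    // Pass 1: secondary key. Stability is not needed, so std::sort is fine.
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.first < b.first; });
    // Pass 2: primary key. Must be stable so records with equal last names
    // keep the first-name order produced by pass 1.
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) { return a.last < b.last; });
}
```

A single `std::sort` whose comparator checks the last name and then breaks ties on the first name gives the same order in one pass; the two-pass version is shown only because it mirrors the slides.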

  8. Sorting Algorithms We've Discussed

  9. Bubble Sort / Straight Exchange Sort
     • O(n²)
     • compares adjacent keys, and each pass "bubbles up" the next smallest key to its proper position; the basic version makes n(n - 1)/2 comparisons in all, as many as comparing each key with every other key
     • requires n swaps / exchanges to "bubble" a key up n positions
     • stable: maintains the original order of records with equal keys
     • simple modifications are adaptive: behavior approaches O(n) when the original data are almost sorted
       - the Short Bubble algorithm in the text "short circuits" the outer loop as soon as all values are in order
       - Bi-directional Bubble alternately bubbles a small value up and then a large value down, and also "short circuits" the outer loop
     • generally the worst-performing algorithm of all, but the simplest to code!
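
A sketch of the Short Bubble idea, assuming an array of `int` keys; the name `shortBubbleSort` is hypothetical and this is not the textbook's code. Each pass walks from the end of the array toward the front, bubbling the smallest remaining key up, and a flag stops the outer loop once a pass makes no swaps.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Short Bubble sketch: after pass `current`, a[0..current] holds the
// smallest keys in order. A pass with no swaps means the data are sorted.
void shortBubbleSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    bool sorted = false;
    for (std::size_t current = 0; current + 1 < n && !sorted; ++current) {
        sorted = true;  // assume sorted until a swap proves otherwise
        for (std::size_t i = n - 1; i > current; --i) {
            // Strict < never swaps equal keys, which keeps the sort stable.
            if (a[i] < a[i - 1]) {
                std::swap(a[i], a[i - 1]);
                sorted = false;
            }
        }
    }
}
```

On data that are already sorted, the first pass makes n - 1 comparisons and no swaps, so the loop exits after a single pass: the adaptive O(n) behavior described above.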

  10. Straight Insertion Sort
     • O(n²)
     • places the next key from the unsorted portion of the array into its proper position among the previously sorted keys
     • doesn't compare each key with every other key, but repeatedly "shifts down" groups of previously sorted keys to make room
     • stable
     • adaptive: behavior approaches O(n) when the original data are almost sorted
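
A minimal sketch of straight insertion sort on `int` keys (the function name is hypothetical). The inner loop is the "shift down" step: larger sorted keys move one slot toward the end until the gap for the new key is found.

```cpp
#include <cstddef>
#include <vector>

// a[0..i-1] is already sorted; insert a[i] into its place among those keys.
// On nearly sorted data the inner loop rarely runs, so work approaches O(n).
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        const int key = a[i];
        std::size_t j = i;
        // Strict > means equal keys are never moved past each other (stable).
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];  // shift a sorted key down one position
            --j;
        }
        a[j] = key;  // drop the key into the gap
    }
}
```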

  11. Straight Selection Sort
     • guaranteed O(n²)
     • locates the next smallest key in the unsorted portion of the array and places it in its proper position
     • only O(n) swaps: unlike Bubble Sort, only one swap takes place when a key is moved, no matter how far it moves, and sorted keys are never "shifted down" as they are by Insertion Sort
     • not stable
     • not adaptive, but since comparing is less labor intensive for the computer than data movement (swapping / shifting), its O(n) swaps make it a reasonable choice among the O(n²) sorts for small lists of large records
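
A sketch of straight selection sort on `int` keys, with a hypothetical `selectionSort` that returns how many swaps it made so the "at most n - 1 swaps" claim can be checked directly.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// For each position i, find the smallest key in a[i..n-1] and swap it into a[i].
// Comparisons are always n(n-1)/2 (not adaptive); swaps are at most n - 1.
std::size_t selectionSort(std::vector<int>& a) {
    std::size_t swaps = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[minIndex]) minIndex = j;
        }
        if (minIndex != i) {
            // One long-distance move; it can jump over equal keys (not stable).
            std::swap(a[i], a[minIndex]);
            ++swaps;
        }
    }
    return swaps;
}
```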

  12. Mergesort
     • guaranteed O(n log₂ n)
     • recursively divides the array to be sorted in half, sorts each half, and merges the two halves together
     • the only algorithm discussed here that is not coded to work "in place": it requires an additional array, so it is not a good choice for large records if space is a problem (it can be coded in place, but typically is not)
     • stable
     • not adaptive
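
A top-down mergesort sketch on `int` keys (all names hypothetical). The one auxiliary array is allocated once and reused by every merge; that array is the extra space the slide warns about.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted runs a[lo..mid) and a[mid..hi) through tmp, then copy back.
static void mergeRuns(std::vector<int>& a, std::vector<int>& tmp,
                      std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        // Take from the left run unless the right key is strictly smaller;
        // preferring the left run on ties is what makes mergesort stable.
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
}

static void mergeSortRange(std::vector<int>& a, std::vector<int>& tmp,
                           std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;  // 0 or 1 key: already sorted
    const std::size_t mid = lo + (hi - lo) / 2;
    mergeSortRange(a, tmp, lo, mid);   // sort left half
    mergeSortRange(a, tmp, mid, hi);   // sort right half
    mergeRuns(a, tmp, lo, mid, hi);    // merge the halves
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> tmp(a.size());  // the additional array: O(n) extra space
    mergeSortRange(a, tmp, 0, a.size());
}
```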

  13. Quicksort / Partition Exchange Sort
     • O(n log₂ n) on average
     • advanced exchange sort algorithm: selects one key (the pivot), places it in its proper final position, and partitions the remaining keys so that all keys to the "left" of the pivot are less than or equal to the pivot and all keys to the "right" of the pivot are greater than the pivot; then recursively sorts each partition
     • based on the idea that it is better to move keys large distances than to move them one position at a time (the same idea behind Selection Sort)
     • not stable
     • not adaptive; in fact, behavior approaches O(n²) with a poor choice of split value, or pivot (for nearly sorted data, choosing the "leftmost" value as the pivot is a poor choice; choosing a random pivot is very likely to give better results)
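
A quicksort sketch with a random pivot, assuming `int` keys. It uses a Lomuto-style partition, which may differ from the textbook's split routine; the function names and the fixed seed are illustrative assumptions. After partitioning, keys left of the pivot are less than or equal to it and keys right of it are greater, as described above. One caveat of this simple scheme: many equal keys still push it toward O(n²).

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Partition a[lo..hi] (inclusive) around a randomly chosen pivot and return
// the pivot's final index p: a[lo..p-1] <= pivot < a[p+1..hi].
static std::size_t partitionRandom(std::vector<int>& a, std::size_t lo,
                                   std::size_t hi, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(lo, hi);
    std::swap(a[pick(rng)], a[hi]);  // park the random pivot at the end
    const int pivot = a[hi];
    std::size_t store = lo;  // next slot for a key <= pivot
    for (std::size_t i = lo; i < hi; ++i) {
        if (a[i] <= pivot) std::swap(a[i], a[store++]);
    }
    std::swap(a[store], a[hi]);  // pivot lands in its final position
    return store;
}

static void quickSortRange(std::vector<int>& a, std::size_t lo, std::size_t hi,
                           std::mt19937& rng) {
    if (lo >= hi) return;
    const std::size_t p = partitionRandom(a, lo, hi, rng);
    if (p > lo) quickSortRange(a, lo, p - 1, rng);  // guard against size_t underflow
    quickSortRange(a, p + 1, hi, rng);
}

void quickSort(std::vector<int>& a) {
    if (a.size() < 2) return;
    std::mt19937 rng(12345);  // fixed seed only so runs are reproducible
    quickSortRange(a, 0, a.size() - 1, rng);
}
```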

  14. Heapsort
     • guaranteed O(n log₂ n)
     • works by first forming a heap out of the existing array, then repeatedly swapping the top of the heap with the bottom of the heap and using ReHeapDown to re-form the heap from the top to (bottom - 1), until the heap has been emptied
     • high overhead because of its two O(n log₂ n) phases, but good for large values of n
     • not stable
     • not adaptive

  15. To understand Heapsort:
     • Must understand how to view and process an array as a binary tree
     • Must understand what a heap is

  16. This array:
     3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10
     is equivalent to this binary tree: [tree figure not reproduced in this transcript]
     • Note that for a "node" at array index N (indexes start at 0):
       - the left child is at index 2N + 1
       - the right child is at index 2N + 2
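
The index rules above as a few C++ helpers (the names are hypothetical). The parent rule is not on the slide; it follows by inverting the two child rules. With the slide's array, node 14 at index 3 has children 15 (index 7) and 6 (index 8).

```cpp
#include <cstddef>

// 0-based index arithmetic for viewing an array as a complete binary tree.
constexpr std::size_t leftChild(std::size_t n)  { return 2 * n + 1; }
constexpr std::size_t rightChild(std::size_t n) { return 2 * n + 2; }
// Inverse of both rules above; only meaningful for n > 0 (the root has no parent).
constexpr std::size_t parent(std::size_t n)     { return (n - 1) / 2; }
```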

  17. Heaps
     • A heap is a binary tree that meets a shape property:
       - A full tree is one in which all leaves are on the same level and every nonleaf node has two children.
       - A complete tree is either full, or full down to the next-to-last level with the leaves on the last level as far to the left as possible. A heap must be a complete tree.
       - If an "outline" is drawn around a full tree, it looks like: [figure not reproduced in this transcript]
       - If an "outline" is drawn around a complete tree, it looks like: [figure not reproduced in this transcript]
     • A heap also has an order property:
       - Each node contains a value greater than or equal to each of its children.
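
A sketch of a check for the order property on an array-stored tree (the name `isMaxHeap` is hypothetical). The shape property needs no check here: any array, read with the 2N + 1 / 2N + 2 rule, fills the tree level by level from the left, so it is always a complete tree.

```cpp
#include <cstddef>
#include <vector>

// True if every node is >= each of its children (the heap order property).
bool isMaxHeap(const std::vector<int>& a) {
    const std::size_t size = a.size();
    for (std::size_t n = 0; n < size; ++n) {
        const std::size_t left = 2 * n + 1, right = 2 * n + 2;
        if (left < size && a[n] < a[left]) return false;
        if (right < size && a[n] < a[right]) return false;
    }
    return true;
}
```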

  18. This binary tree has the shape property, but not the order property:
     3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10
     • Note that each leaf satisfies the heap order property, so below the red dashed line we have a heap.
     • Because of the shape property, the first nonleaf (counting from the bottom of the heap / end of the array) is the parent of the last element, located at index position (bottom - 1) / 2 (integer division), where bottom is the index position of the last element in the heap. Here bottom = 15, so the first nonleaf is at index 7 (the 15).
     • An operation called ReHeapDown in the text repairs the heap from a given node down. Building the heap starts by calling it on this first nonleaf node, then moves back one node at a time, repeating the repair operation until the top of the heap is reached.

  19. A look at ReHeapDown:
     3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10
     Considering the 15 to be the root of a heap, ReHeapDown makes sure that the value 15 is larger than the values of its children. If necessary, the root value is swapped with the value of the larger child. In this case, no swap is necessary, since 15 > 10 and there is no right child.
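
An iterative sketch of ReHeapDown on `int` keys; the name `reHeapDown` and its `(root, bottom)` parameters are assumptions modeled on the slides, and the textbook's version may be recursive. It assumes both subtrees of `root` are already heaps within `a[0..bottom]` and keeps swapping the root value with its larger child until the order property holds.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Repair the heap order property from `root` down, never looking past `bottom`.
void reHeapDown(std::vector<int>& a, std::size_t root, std::size_t bottom) {
    while (true) {
        const std::size_t left = 2 * root + 1;
        if (left > bottom) return;           // root is a leaf: nothing to repair
        std::size_t maxChild = left;
        const std::size_t right = left + 1;
        if (right <= bottom && a[right] > a[left]) maxChild = right;
        if (a[root] >= a[maxChild]) return;  // order property already holds
        std::swap(a[root], a[maxChild]);     // larger child moves up
        root = maxChild;                     // keep checking further down
    }
}
```

For example, `reHeapDown(a, 5, 15)` on the slide's starting array swaps the 9 with the 16, and calling it at the root with `bottom = 14` reproduces the stop-at-the-new-bottom repair from the second phase.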

  20. We now know we have a heap below the red dashed line . . .
     3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10
     . . . so ReHeapDown is now called with the 12 as the root of a heap. Since 12 > 11 and 12 > 8, no swap is necessary.

  21. We now know we have a heap below the red dashed line . . .
     3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10
     . . . so ReHeapDown is now called with the 9 as the root of a heap. Since 9 > 4 but 16 > 9, the 9 and 16 must be swapped to repair the order property.

  22. We now know we have a heap below the red dashed line . . .
     3 7 5 14 1 16 12 15 6 13 2 4 9 8 11 10
     . . . so ReHeapDown is now called with the 1 as the root of a heap. Since 1 < 13 and 1 < 2, the 1 must be swapped with the 13 (the maximum child) to repair the order property.

  23. We now know we have a heap below the red dashed line . . .
     3 7 5 14 13 16 12 15 6 1 2 4 9 8 11 10
     . . . so ReHeapDown is now called with the 14 as the root of a heap. Since 14 > 6 but 14 < 15, the 14 and 15 must be swapped to repair the order property.

  24. But this time we need to keep going, to make sure the swap didn't ruin the order property further down the heap.
     3 7 5 15 13 16 12 14 6 1 2 4 9 8 11 10
     Since 14 > 10, the order property is still okay.

  25. We now know we have a heap below the red dashed line . . .
     3 7 5 15 13 16 12 14 6 1 2 4 9 8 11 10
     . . . so ReHeapDown is now called with the 5 as the root of a heap. The 5 must be swapped with its maximum child, 16.

  26. Then we need to compare 5 with its two children . . .
     3 7 16 15 13 5 12 14 6 1 2 4 9 8 11 10
     . . . and we see that 5 must be swapped with 9.

  27. We now know we have a heap below the red dashed line . . .
     3 7 16 15 13 9 12 14 6 1 2 4 5 8 11 10
     . . . so ReHeapDown is now called with the 7 as the root of a heap. The 7 must be swapped with its maximum child, 15.

  28. Then we need to compare 7 with its two children . . .
     3 15 16 7 13 9 12 14 6 1 2 4 5 8 11 10
     . . . and we see that 7 must be swapped with 14.

  29. Then we need to swap 7 with its left (and only) child, 10:
     3 15 16 14 13 9 12 7 6 1 2 4 5 8 11 10

  30. We now know we have a heap below the red dashed line . . .
     3 15 16 14 13 9 12 10 6 1 2 4 5 8 11 7
     . . . so ReHeapDown is now called with the 3 as the root of a heap. The 3 must be swapped with its maximum child, 16.

  31. Then we need to compare 3 with its two children . . .
     16 15 3 14 13 9 12 10 6 1 2 4 5 8 11 7
     . . . and we see that 3 must be swapped with its maximum child, 12.

  32. Then we need to compare 3 with its two children . . .
     16 15 12 14 13 9 3 10 6 1 2 4 5 8 11 7
     . . . and we see that 3 must be swapped with its maximum child, 11.

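  33. We now finally have a complete heap!
     16 15 12 14 13 9 11 10 6 1 2 4 5 8 3 7
     AND we have just completed the first stage of Heapsort: building the heap! Note that we considered n/2 nodes and swapped each node at most log₂ n times, for O(n log₂ n) behavior. (This bound is safe but not tight: most nodes sit near the bottom of the tree and can move down only a level or two, and a more careful count shows that building the heap is actually O(n).)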
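
The first stage can be sketched as a loop that starts at the first nonleaf, index (bottom - 1) / 2, and repairs each node back to the root. The names `siftDown` and `buildHeap` are hypothetical; `siftDown` is the same idea as the text's ReHeapDown, repeated here so the block stands alone.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sift a[root] down within a[0..bottom] (same idea as ReHeapDown).
static void siftDown(std::vector<int>& a, std::size_t root, std::size_t bottom) {
    while (2 * root + 1 <= bottom) {
        std::size_t child = 2 * root + 1;                             // left child
        if (child + 1 <= bottom && a[child + 1] > a[child]) ++child;  // larger child
        if (a[root] >= a[child]) return;                              // order holds
        std::swap(a[root], a[child]);
        root = child;
    }
}

// Phase 1 of Heapsort: repair each nonleaf from the last one back to the root.
void buildHeap(std::vector<int>& a) {
    if (a.size() < 2) return;  // 0 or 1 key is already a heap
    const std::size_t bottom = a.size() - 1;
    // i takes the values (bottom - 1) / 2, ..., 1, 0.
    for (std::size_t i = (bottom - 1) / 2 + 1; i-- > 0; ) {
        siftDown(a, i, bottom);
    }
}
```

Run on the starting array 3 7 5 14 1 9 12 15 6 13 2 4 16 8 11 10, this performs the same sequence of swaps traced in slides 19 to 32 and ends with the heap shown above.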

  34. NOW for the second stage of Heapsort!
     16 15 12 14 13 9 11 10 6 1 2 4 5 8 3 7
     (root = the 16 at index 0; bottom = the 7 at index 15)
     Note that by swapping the root and the bottom of the heap, we will place the largest value in the array in its proper position.

  35. Now note that below the red line the array is sorted, and the remaining heap portion of the array is above the red line:
     7 15 12 14 13 9 11 10 6 1 2 4 5 8 3 16
     (new bottom = the 3 at index 14)
     Also notice that the heap order property has been compromised at the root. So, what can we do??? ReHeapDown from the root, but be sure to stop at the new bottom!

  36. The array after ReHeapDown from the 7 at the root (new bottom = index 14):
     15 14 12 10 13 9 11 7 6 1 2 4 5 8 3 16

  37. One more pass of the second phase:
     15 14 12 10 13 9 11 7 6 1 2 4 5 8 3 16
     Swap the root (the 15 at index 0) and the new bottom (the 3 at index 14).

  38. Below the red line the array is sorted; above the red line is a heap in need of repair:
     3 14 12 10 13 9 11 7 6 1 2 4 5 8 15 16
     (root = index 0; new bottom = index 13)
     So ReHeapDown from the root and, again, be sure to stop at the new bottom!

  39. The array after ReHeapDown from the 3 at the root (new bottom = index 13):
     14 13 12 10 3 9 11 7 6 1 2 4 5 8 15 16

  40. After n - 1 such root / bottom swaps, each followed by a ReHeapDown (each with at most log₂ n swaps) . . .
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
     . . . we have completed the second phase of Heapsort, which is O(n log₂ n), and we have a sorted array. So the first phase of Heapsort, O(n log₂ n), plus the second phase, O(n log₂ n), gives us 2 * O(n log₂ n), which is still O(n log₂ n).
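
Putting both phases together, a minimal Heapsort sketch on `int` keys; the names are hypothetical and `siftDown` stands in for the text's ReHeapDown. Phase 2 makes n - 1 root/bottom swaps, and each repair stops at the new bottom.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sift a[root] down within a[0..bottom] (same idea as ReHeapDown).
static void siftDown(std::vector<int>& a, std::size_t root, std::size_t bottom) {
    while (2 * root + 1 <= bottom) {
        std::size_t child = 2 * root + 1;                             // left child
        if (child + 1 <= bottom && a[child + 1] > a[child]) ++child;  // larger child
        if (a[root] >= a[child]) return;                              // order holds
        std::swap(a[root], a[child]);
        root = child;
    }
}

void heapSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n < 2) return;
    // Phase 1: build the heap, from the last nonleaf (n/2 - 1) back to the root.
    for (std::size_t i = n / 2; i-- > 0; ) siftDown(a, i, n - 1);
    // Phase 2: n - 1 root/bottom swaps, each followed by a repair that
    // stops at the new bottom.
    for (std::size_t bottom = n - 1; bottom > 0; --bottom) {
        std::swap(a[0], a[bottom]);  // largest remaining key to its final slot
        siftDown(a, 0, bottom - 1);  // re-form the heap in a[0..bottom-1]
    }
}
```

Because the swap in phase 2 can carry a key past others equal to it, this sketch is not stable, matching the slide's summary.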

  41. Sorting Homework
     Due Tuesday, December 2
     Chapter 10, pp. 669 – 673, problems 1 – 11, 23 – 27
     Note: In problems 1 & 2, when the question asks you to show the array after the 4th iteration of an algorithm, show it after EACH of the first 4 iterations.
     Note: Assume ALL questions ask WHY? This means ALL answers require an explanation!!!

More Related