1 / 66

Class 5: Sorting

Class 5: Sorting. Reading (note, depends on the book edition) Heap sort: Section 7 Bubble and Insert sort: Section 1 Mergesort: Section 1 Quicksort: Section 8. Sorting. Input: an array A of data records a key value in each data record (e.g., integers)

dpeel
Download Presentation

Class 5: Sorting

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Class 5: Sorting Reading (note, depends on the book edition) Heap sort: Section 7 Bubble and Insert sort: Section 1 Mergesort: Section 1 Quicksort: Section 8

  2. Sorting • Input: • an array A of data records • a key value in each data record (e.g., integers) • a comparison function which imposes a consistent ordering on the keys. • Output: • reorganize the elements of A such that for any i and j, if i < j then A[i]  A[j]

  3. Why Sort? • Sorting algorithms are among the most frequently used algorithms in computer science • Allows binary search of an N-element array in O(log N) time • Allows O(1) time access to kth largest element in the array for any k • Allows easy detection of any duplicates

  4. Evaluation • Time • Space • Stability

  5. Time Complexity • How fast is the algorithm? • The definition of a sorted array A says that for any i<j, A[i] < A[j] • This means that you need to at least check on each element at the very minimum, I.e., at least O(N) • And you could end up checking each element against every other element, which is O(N2) • The big question is: How close to O(N) can you get?

  6. Space • How much space does the sorting algorithm require in order to sort the collection of items? • Is copying needed? O(n) additional space • In-place sorting – no copying – O(1) additional space • Somewhere in between for “temporary” or recursion stack, e.g. O(logn) space • External memory sorting – data so large that does not fit in memory

  7. Stability • Does it rearrange the order of input data records which have the same key value (duplicates)? • E.g. Phone book sorted by name. Now sort by county – is the list still sorted by name within each county? • Extremely important property for databases • A stable sorting algorithm is one which does not rearrange the order of duplicate keys

  8. Example 5a 8 3a 5b 4 3b 2 3c 5a 8 3a 5b 4 3b 2 3c 2 3a 3b 3c 4 5a 5b 8 2 3c 3b 3a 4 5a 5b 8 Unstable Sort Stable Sort

  9. BubbleSort • “Bubble” elements go to their proper place in the array by comparing elements i and i+1, and swapping if A[i] > A[i+1] • Bubble every element towards its correct position • last position has the largest element • then bubble every element except the last one towards its correct position • then repeat until done.

  10. Putthelargestelementinitsplace larger value? 2 3 8 8 1 2 3 8 7 9 10 12 23 18 15 16 17 14 swap 1 2 3 7 8 9 10 12 23 18 15 16 17 14 9 10 12 23 23 1 2 3 7 8 9 10 12 23 18 15 16 17 14 swap 1 2 3 7 8 9 10 12 18 23 15 16 17 14 swap 1 2 3 7 8 9 10 12 18 15 23 16 17 14 swap 1 2 3 7 8 9 10 12 18 15 16 23 17 14 swap 1 2 3 7 8 9 10 12 18 15 16 17 23 14 swap 1 2 3 7 8 9 10 12 18 15 16 17 14 23

  11. Put2ndlargestelementinitsplace larger value? 9 10 12 18 18 2 3 7 8 1 2 3 7 8 9 10 12 18 15 16 17 14 23 swap 1 2 3 7 8 9 10 12 15 18 16 17 14 23 swap 1 2 3 7 8 9 10 12 15 16 18 17 14 23 swap 1 2 3 7 8 9 10 12 15 16 17 18 14 23 swap 1 2 3 7 8 9 10 12 15 16 17 14 18 23 Two elements done, only n-2 more to go ...

  12. Bubblesort bubble(A, n): for i = 1 to n-1 do for j = 2 to n–i+1 do if A[j-1] > A[j] then t:= A[j-1]; A[j-1]:=A [j]; A [j]:=t; i=1: Largest element is placed at last position i=k: kth Largest element is placed at kthto last position

  13. Bubblesort(recursive) bubble(A, n) swapped=0; if (n=1) then return; for j= 2 to n if (A[j-1] > A[j]) then t:= A[j-1] A[j-1]:=A [j] A [j]:=t; swapped =1; if (swapped) then bubble(A,n-1);

  14. BubbleSort? • “Bubble” elements to their proper place in the array by comparing elements i and i+1, and swapping if A[i] > A[i+1] • We bubblize for i=1 to n (i.e, n times) • Each bubblization is a loop that makes n-i comparisons • This is O(n2)(why?) JustSayNo

  15. Insertion Sort • What if first k elements of array are already sorted? • 4, 7, 12,5, 19, 16 • We can insert next element into its proper position and get k+1 sorted elements • 4, 5, 7, 12, 19, 16 • Repeat till the first n are sorted.

  16. Insertion Sort InsertionSort(A[1..N]: integer array, N: integer) { i, j, temp: integer ; for i = 2 to N { temp := A[i]; j := i; while j > 1 and A[j-1] > temp { A[j] := A[j-1]; j := j–1; A[j] = temp; } } } • Is Insertion sort in place? Stable? Running time = ?

  17. Example 1 2 3 8 7 9 10 12 23 18 15 16 17 14 1 2 3 7 8 9 10 12 23 18 15 16 17 14 1 2 3 7 8 9 10 12 18 23 15 16 17 14 1 2 3 7 8 9 10 12 18 15 23 16 17 14 1 2 3 7 8 9 10 12 15 18 23 16 17 14 1 2 3 7 8 9 10 12 15 18 16 23 17 14 1 2 3 7 8 9 10 12 15 16 18 23 17 14

  18. Example 1 2 3 7 8 9 10 12 15 16 18 17 23 14 1 2 3 7 8 9 10 12 15 16 17 18 23 14 1 2 3 7 8 9 10 12 15 16 17 18 14 23 1 2 3 7 8 9 10 12 15 16 17 14 18 23 1 2 3 7 8 9 10 12 15 16 14 17 18 23 1 2 3 7 8 9 10 12 15 14 16 17 18 23 1 2 3 7 8 9 10 12 14 15 16 17 18 23

  19. Insertion Sort Characteristics • In place and Stable • Running time • Worst case is O(N2) • reverse order input • must copy every element every time • Good sorting algorithm for almost sorted data • Each item is close to where it belongs in sorted order.

  20. Inversions • Aninversion is a pair of elements in wrong order • i < j but A[i] > A[j] • By definition, a sorted array has no inversions • So you can think of sorting as the process of removing inversions in the order of the elements

  21. Inversions • A single value out of place can cause several inversions 1 2 3 8 7 9 10 12 23 14 15 16 17 18 value index 8 9 10 11 12 13 0 1 2 3 4 5 6 7

  22. Reverseorder • All values out of place (reverse order) causes numerous inversions 1 2 3 8 7 9 10 12 23 18 17 16 15 14 value index 8 9 10 11 12 13 0 1 2 3 4 5 6 7

  23. Inversions • Insertion sort and bubble sort swap adjacent elements and remove just one inversion at a time • Their running time is proportional to number of inversions in array • Given N distinct keys, the maximum possible number of inversions is

  24. InversionsandAdjacentSwapSorts • "Average" list will contain half the max number of inversions = • So the average running time of Insertion sort is (N2) • Any sorting algorithm that only swaps adjacent elements requires (N2) time because each swap removes only one inversion (lower bound).

  25. HeapSort • We use a Max-Heap • Root node = A[1] • Children of A[i] = A[2i], A[2i+1] • Keep track of current size N (number of nodes) 7 7 5 6 2 4 5 6 value index 1 2 3 4 5 6 7 8 2 4 N = 5

  26. UsingBinaryHeapsforSorting • Build a max-heap • Do NDeleteMax operations • Data comes out in largest to smallest order • Where can we put the elements as they are removed from the heap? 7 Build Max-heap 5 6 2 4 DeleteMax 6 5 4 2 7

  27. 1 Removal = 1 Addition • Every time we do a DeleteMax, the heap gets smaller by one node, and we have one more node to store • Store the data at the end of the heap array • Not "in the heap" but it is in the heap array 6 6 5 4 2 7 value 5 4 index 1 2 3 4 5 6 7 8 2 7 N = 4

  28. Repeated DeleteMax 5 5 2 4 6 7 2 4 1 2 3 4 5 6 7 8 6 7 N = 3 4 4 2 5 6 7 2 5 1 2 3 4 5 6 7 8 6 7 N = 2

  29. HeapSortisIn-place • After all the DeleteMaxs, the heap is gone but the array is full and is in sorted order 2 2 4 5 6 7 value 4 5 index 1 2 3 4 5 6 7 8 6 7 N = 0

  30. Heapsort: Analysis • Running time • time to build max-heap is O(N) • time for N DeleteMax operations is NO(log N) • total time is O(N log N) • Can also show that running time is (N log N) for some inputs, • so worst case is(N log N) • Average case running time is also O(N log N) • Heapsort is in-place but not stable (why?)

  31. Divide and Conquer A strategy for solving a problem by dividing it to sub-problems and combine the results to the original problem.

  32. Divide and Conquer • Divide the problem into a number of sub-problems (similar to the original problem but smaller); • Conquer the sub-problems by solving them recursively (if a sub-problem is small enough, just solve it in a straightforward manner. • Combine the solutions to the sub-problems into the solution for the original problem

  33. Merge Sort • Divide the n-element sequence into two subsequences of n/2-element each • Conquer: Sort the two subsequences recursively using merge sort • Combine: merge the two sorted subsequences to produce the sorted answer • Recursion base case: if the subsequence has only one element, then do nothing.

  34. Mergesort • Divide it in two at the midpoint • Conquer each side in turn (by recursively sorting) • Merge two halves together 8 2 9 4 5 3 1 6

  35. 2 8 4 2 8 9 9 4 5 1 3 3 1 5 6 6 MergesortExample 8 2 9 4 5 3 1 6 Divide 8 2 9 4 5 3 1 6 1 element 8 2 9 4 5 3 1 6 2 8 4 9 3 5 1 6 Merge 1 2 3 4 5 6 8 9

  36. 2 4 8 9 1 3 5 6 AuxiliaryArray • The merging requires auxiliary arrays. 2 4 8 9 1 3 5 6 copy copy Auxiliary arrays

  37. 2 4 8 9 1 3 5 6 Merging 1 2 3 4 5 6 8 9   Auxiliary arrays

  38. RecursiveMergesort Mergesort(A, p, r) if p < r then qp+r)/2 Mergesort(A,p,q) Mergesort(A,q+1,r) Merge(A,p,q,r) Time complexity ? To sort an array callMergesort(A, 1, length(A)).

  39. Merge(A,p,q,r) n1 q-p+1, n2 r-q L[1,..,n1 ] A [p,...,q] %copy A from p to q % into a new array L R[1,..,n2 ] A [q+1,...,r]%copy A from q+1 to r % into a new array R L(n1+1) , R(n2+1)  i1, j1 for k  p to r do if L(i)≤R(j) then A[k]  L[i] ii+1 else A[k]  R[j] jj+1 Time complexity ? O(n)

  40. Analyzing Recursive Programs • Express the running time T(n) as a recursive equation • Solve the recursive equation • For an upper-bound analysis, you can optionally simplify the equation to something larger • For a lower-bound analysis, you can optionally simplify the equation to something smaller

  41. MergesortAnalysis • Let T(N) be the running time for an array of N elements • Mergesort divides array in half and calls itself on the two halves. • Each recursive call takes T(N/2) • After returning, it merges both halves. • The merging takes O(N) • The base case: O(1)

  42. MergesortRecurrenceRelation • The recurrence relation for T(N) is: • T(1) = c (base case: 1 element array  constant time) • T(N) = 2T(N/2) + dN • Sorting N elements takes • the time to sort the left half • plus the time to sort the right half • plus an O(N) time to merge the two halves • T(N)= ?

  43. MergesortAnalysisUpperBound Merge takes o(n) Assuming n is a power of 2 n = 2k, k = log2 n

  44. Solving Recursive Equations by Induction • Repeated substitution and telescoping construct the solution • If you know the closed form solution, you can validate it by ordinary induction • For the induction, may want to increase n by a multiple (2n) rather than by n+1

  45. Iterative Mergesort Merge by 1 Merge by 2 Merge by 4 Merge by 8 Merge by 16 Need of a last copy

  46. PropertiesofMergesort • Not in-place • Requires an auxiliary array (O(n) extra space) • Stable • Make sure that left is sent to target on equal values. • Iterative Mergesort reduces copying.

  47. Quicksort • Uses a divide and conquer strategy • Does not require the O(N) extra space (as MergeSort)

  48. Quicksort: General idea • Partition array into left and right sub-arrays • Choose an element of the array, called pivot • the elements in left sub-array are all less than pivot • elements in right sub-array are all greater than pivot • Recursively sort left and right sub-arrays • Concatenate left and right sub-arrays in O(1) time

  49. Thesteps of QuickSort S select pivot value 81 31 57 43 13 75 92 0 26 65 S1 S2 partition S 0 31 75 43 65 13 81 92 26 57 QuickSort(S1) and QuickSort(S2) S1 S2 0 13 26 31 43 57 75 81 92 65 S S is sorted 0 13 26 31 43 57 65 75 81 92 [Weiss]

  50. Details, details • Implementing the actual partitioning • Picking the pivot • want a value that will cause |S1| and |S2| to be non-zero, and close to equal in size if possible • Dealing with cases where an element equals the pivot

More Related