A look at Sorting
Why Sorting? • It comes up all the time • It can bottleneck an app in terms of either time or space • There are a lot of sorting algorithms – rich & interesting as a problem • We can prove a lower bound!
Analyzing sorts • Running time • Space – is it in-place? • Stable: do equal elements keep their original relative order?
Insertion Sort • Main idea: Insert each element into its proper place in sorted order. • A[0] is sorted by itself • Then consider A[1]. Swap with the first element if necessary. • Then consider A[2]. Put it into place among the first 2. • Etc… • After the ith pass, the first i elements are sorted. They are the same first i elements from the unsorted input.
Insertion Sort • O(n^2) sort • Stable • Is especially bad if elements are in reverse sorted order • What about already sorted order?
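The insertion idea above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code; the function name `insertion_sort` and the choice to return the number of element shifts are assumptions made here so that the reverse-sorted and already-sorted cases can be compared directly.

```python
def insertion_sort(a):
    """Sort the list a in place and return how many element shifts were needed."""
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        # The strict > leaves equal elements in their original order (stable).
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts
```

On reverse-sorted input every element is shifted past all earlier ones, giving n(n-1)/2 shifts; on already-sorted input the inner loop never runs, so the work is linear.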
Selection Sort • Main idea: find the smallest, then the next smallest, etc. • When you find the smallest, swap it with A[0] • After the ith pass, A[0] through A[i-1] hold the i smallest elements in sorted order.
Selection Sort • O(n^2) • Not stable in this swap-based form: the swap can move an element past an equal one • Does it perform better/worse for already sorted input? Reverse-sorted input?
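A small Python sketch of the selection idea, under the assumption that we want to count comparisons (the name `selection_sort` and the returned counter are illustrative choices, not from the slides):

```python
def selection_sort(a):
    """Sort the list a in place and return the number of comparisons made."""
    comparisons = 0
    for i in range(len(a) - 1):
        smallest = i
        # Scan the unsorted tail for the smallest remaining element.
        for j in range(i + 1, len(a)):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons
```

Because the inner scan always runs to the end, the comparison count is n(n-1)/2 whatever the input order, which speaks to the sorted versus reverse-sorted question.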
Bubble Sort • Main idea: Go through each element, and if it’s out of order with its neighbor, swap them. • If you can go through the entire array with no swaps, you’re done • After the ith pass, at least the last i positions are sorted and in final correct order.
Bubble Sort • O(n^2) • Worst case time has a big constant • What if already sorted? • “In short, the bubble sort seems to have nothing to recommend it, except a catchy name and the fact that it leads to some interesting theoretical problems.” Don Knuth, The Art of Computer Programming: Vol. 3, Sorting and Searching
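A hedged Python sketch of bubble sort with the "no swaps means done" early exit. The name `bubble_sort` and the returned pass count are assumptions added here to make the already-sorted case visible.

```python
def bubble_sort(a):
    """Sort the list a in place and return the number of passes made."""
    passes = 0
    end = len(a) - 1  # index of the last position that may still be wrong
    while True:
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        passes += 1
        end -= 1  # the largest remaining element has bubbled into place
        if not swapped:
            return passes
```

With this early exit, already-sorted input needs a single pass, while reverse-sorted input needs one pass per element.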
Merge Sort • Divide-and-conquer • If there is only 0 or 1 element, it’s done • Otherwise, recursively sort each ½ of the set • Then merge the two sorted halves together (O(n))
Merge Sort • O(n lg n) • No real best or worst case • Can be made to be in-place, though the usual merge uses O(n) extra space
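A minimal Python sketch of the divide, recurse, and merge steps. It assumes the simple out-of-place formulation that returns a new list (so it uses O(n) extra space and is not an in-place variant); the names `merge` and `merge_sort` are illustrative.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right))."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Return a new sorted list containing the elements of a."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```

The same splitting and merging happens regardless of the input order, which is why there is no real best or worst case.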
Heap Sort • Main idea: Create a heap out of the elements. • Inserting each element takes time O(lg n) • Delete the biggest and re-heapify (also O(lg n)), repeating until the heap is empty
Heap Sort • O(n lg n) • In-place • Quite a bit of shuffling in memory
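One way to sketch an in-place heap sort in Python. Note one swapped-in technique: instead of inserting elements one at a time as described above, this sketch builds the max-heap bottom-up with a sift-down helper, which is a common alternative. The names `sift_down` and `heap_sort` are assumptions.

```python
def sift_down(a, root, end):
    """Restore the max-heap property below index root, looking only at a[:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Pick the larger of the two children.
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort the list a in place and return it."""
    n = len(a)
    # Build a max-heap bottom-up (assumed variant, not one-by-one insertion).
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Repeatedly move the biggest element to the end and re-heapify the rest.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

Each removal swaps the root with a far-away slot and then sifts down, which is the memory shuffling the slide refers to.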
Quicksort • Main idea: • Pick a pivot element • Split the array into elements less than the pivot, equal to the pivot, and greater than the pivot (this is called partitioning) • Recursively sort the pieces
Divide and Conquer 1. Pick a pivot element 2. Put everything < pivot on the left and everything > pivot on the right. 3. Sort the left and right. [Figure: array split into a left part L, the pivot P, and a right part R] (David Luebke, 2/28/12)
Quicksort Code
Quicksort(A, p, r)
    if (p < r)
        q = Partition(A, p, r);   // q is the final index of the pivot
        Quicksort(A, p, q-1);     // the pivot at q is already in place
        Quicksort(A, q+1, r);
Partition • Clearly, all the action takes place in the partition() function • Rearranges the subarray in place • End result: • Two subarrays • All values in first subarray ≤ all values in second • Returns the index of the “pivot” element separating the two subarrays • How do you suppose we implement this?
In-Place Partitioning • Perform the partition using two indices, l and r, to split S into L, E and G. • Repeat until l and r cross: • Scan l to the right until finding an element > p. • Scan r to the left until finding an element < p. • Swap the elements at indices l and r • Example (pivot p = 6, the last element): 3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 9 6, with l starting at the left end and r just before the pivot
Partitioning (pivot = 6) • Start: 3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 9 6 • l scans right and stops at the first 7; r scans left and stops at the last 2 • Swap them: 3 2 5 1 0 2 3 5 9 7 7 9 8 9 7 9 6 • l scans right and stops at the first 9; r scans left and stops at the 5 just before it. Stop when they cross! • Put the pivot in place (swap it with the element at l): 3 2 5 1 0 2 3 5 6 7 7 9 8 9 7 9 9
Partition • Partition(A, l, r): • pivot = A[r] • i = l-1; j = r • repeat • repeat i = i+1 until A[i] >= pivot • repeat j = j-1 until j <= l or A[j] <= pivot • swap(A[i], A[j]) • until i >= j • swap(A[i], A[j]) to undo the last one when i >= j • swap(A[i], A[r]) to put the pivot in place • Return i
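The two-index partition can be sketched in Python as below. This is one possible reading of the slides, assuming the last element is the pivot (as in the pivot = 6 example) and that the returned index is the pivot's final position; the names `partition` and `quicksort` and the `lo`/`hi` parameters are illustrative.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i, j = lo - 1, hi
    while True:
        # Scan i right to the next element >= pivot (a[hi] itself stops the scan).
        i += 1
        while a[i] < pivot:
            i += 1
        # Scan j left to the next element <= pivot, without running past lo.
        j -= 1
        while j > lo and a[j] > pivot:
            j -= 1
        if i >= j:  # the indices crossed
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]  # put the pivot in place
    return i


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        q = partition(a, lo, hi)
        quicksort(a, lo, q - 1)
        quicksort(a, q + 1, hi)
    return a
```

Running `partition` on the 17-element example array with pivot 6 leaves the pivot at index 8, with smaller values to its left and larger values to its right.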
Analyzing Partition • For an array of n items, what is the largest number of items that partition looks at? • Easier to think about the algorithm than to look at code • Partition: linear, O(n)
Analyzing Quicksort • Partition • 2 recursive calls – how many elements are in the recursive calls? • Depends on how the pivot works out! • Best case: pivot is in the perfect middle • Worst case: pivot is at an extreme
Worst-case Running Time • The worst case for quick-sort occurs when the pivot is the unique minimum or maximum element • One partition has size n - 1 and the other has size 0 • T(n) = T(0) + T(n-1) + O(n) …
Analyzing Quicksort • In the worst case: T(1) = Θ(1), T(n) = T(n - 1) + Θ(n) => T(n) = Θ(n^2) • In the best case: T(1) = Θ(1), T(n) = 2T(n/2) + Θ(n) => T(n) = O(n lg n)
Analyzing Quicksort • What will be the worst case for the algorithm? • Partition is always unbalanced • What will be the best case for the algorithm? • Partition is perfectly balanced • Which is more likely? • The latter, by far, except... • Will any particular input elicit the worst case? • Yes: Already-sorted input
Analyzing Quicksort: Average Case • Assuming random input, average-case running time is much closer to O(n lg n) than O(n^2) • First, a more intuitive explanation/example: • Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced! • The recurrence is thus: T(n) = T(9n/10) + T(n/10) + n • Still comes out to O(n lg n)
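The 9-to-1 claim can be checked numerically. The sketch below evaluates the recurrence T(n) = T(9n/10) + T(n/10) + n directly, under two assumptions made here: sizes are rounded to integers (floor for the large side, the remainder for the small side) and T(1) costs 1. The function name `t` is illustrative.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def t(n):
    """Cost of the 9-to-1 recurrence with integer sizes and an assumed base cost of 1."""
    if n <= 1:
        return 1
    big = (9 * n) // 10      # the 9/10 side, rounded down
    small = n - big          # the remaining 1/10 side
    return t(big) + t(small) + n


for n in (10**3, 10**4, 10**5):
    # The ratio to n lg n stays bounded by a small constant as n grows.
    print(n, t(n) / (n * log2(n)))
```

The ratio settles near a constant rather than growing with n, which is what O(n lg n) predicts; a quadratic cost would make the ratio grow roughly like n / lg n.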
Analyzing Quicksort: Average Case • Intuitively, a real-life run of quicksort will produce a mix of “bad” and “good” splits • Randomly distributed among the recursion tree • Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1) • What happens if we bad-split root node, then good-split the resulting size (n-1) node? • We end up with three subarrays, size 1, (n-1)/2, (n-1)/2 • Combined cost of splits = n + n -1 = 2n -1 = O(n) • No worse than if we had good-split the root node!
Analyzing Quicksort: Average Case • Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split • Thus running time of alternating bad and good splits is still O(n lg n), with slightly higher constants • How can we be more rigorous?
Analyzing Quicksort: Average Case • For simplicity, assume: • All inputs distinct (no repeats) • Slightly different partition() procedure • partition around a random element, which is not included in subarrays • all splits (0:n-1, 1:n-2, 2:n-3, … , n-1:0) equally likely • What is the probability of a particular split happening? • Answer: 1/n
Analyzing Quicksort: Average Case • So partition generates splits (0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0) each with probability 1/n • If T(n) is the expected running time, T(n) = (1/n) Σ_{k=0}^{n-1} [T(k) + T(n-1-k)] + Θ(n) • What is each term under the summation for? • What is the Θ(n) term for?
Analyzing Quicksort: Average Case • So… T(n) = (2/n) Σ_{k=0}^{n-1} T(k) + Θ(n), since each T(k) appears twice when summing over all splits
Analyzing Quicksort: Average Case • We can solve this recurrence using the dreaded substitution method • Guess the answer: T(n) = O(n lg n) • Assume that the inductive hypothesis holds • Substitute it in for some value < n • Prove that it follows for n • Randomized
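Before proving the guess by substitution, it can be sanity-checked numerically. The sketch below assumes the average-case recurrence has the usual form T(n) = (2/n) Σ_{k=0}^{n-1} T(k) + n, with the Θ(n) term taken to be exactly n and T(0) = 0; both simplifications, and the name `expected_cost`, are assumptions made here.

```python
from math import log2


def expected_cost(n_max):
    """Return a list t where t[n] solves T(n) = (2/n) * sum(T(0..n-1)) + n, T(0) = 0."""
    t = [0.0] * (n_max + 1)
    running = 0.0  # sum of t[0] .. t[n-1]
    for n in range(1, n_max + 1):
        t[n] = 2.0 * running / n + n
        running += t[n]
    return t


t = expected_cost(10_000)
for n in (10, 100, 1_000, 10_000):
    # The ratio to n lg n stays below a constant, consistent with O(n lg n).
    print(n, t[n] / (n * log2(n)))
```

Under these assumptions the values stay below 2 n lg n for every n checked, which supports the O(n lg n) guess; it is evidence for the induction, not a replacement for it.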
Improving Quicksort • The real liability of quicksort is that it runs in O(n^2) on already-sorted input • Book discusses two solutions: • Randomize the input array, OR • Pick a random pivot element • How will these solve the problem? • By ensuring that no particular input can be chosen to make quicksort run in O(n^2) time
Quicksort • O(n lg n) on average • Worst case O(n^2) • Fine-tune: • Random pivot point means no particular input triggers the worst case • If array portion is nearly sorted, call insertion sort • When array portion becomes small, call insertion sort
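The fine-tuning ideas can be combined in a short Python sketch. Everything specific here is an assumption for illustration: the name `hybrid_quicksort`, the `CUTOFF` value of 16, and the use of a three-way (less / equal / greater) in-place partition around a randomly chosen pivot value instead of the two-index scheme shown earlier.

```python
import random

CUTOFF = 16  # hypothetical threshold; a real value would be tuned by measurement


def hybrid_quicksort(a, lo=0, hi=None):
    """Sort the list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        # Small portion: fall back to insertion sort on a[lo..hi].
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a
    pivot = a[random.randint(lo, hi)]  # random pivot value
    # Three-way partition: a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    hybrid_quicksort(a, lo, lt - 1)
    hybrid_quicksort(a, gt + 1, hi)
    return a
```

With a random pivot, already-sorted input is no longer special, and grouping the elements equal to the pivot keeps inputs with many repeated keys from degrading.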
How Fast Can We Sort? • First, an observation: all of the sorting algorithms so far are comparison sorts • The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements • Comparison sorts must do Ω(n) comparisons (why?) • What do you think is the best comparison sort running time?
Decision Trees • Abstraction of any comparison sort. • Represents the comparisons made by a specific sorting algorithm on inputs of a given size. • Abstracts away everything else: control and data movement. • We’re counting only comparisons. • Each internal node is a pair of elements being compared • Each edge is the result of the comparison (< or >=) • Each leaf is a sorted order of the array (one permutation of the input)
Insertion Sort on 4 Elements as a Decision Tree • [Figure: decision tree whose root node is “Compare A[1] and A[2]” and whose next level is “Compare A[2] and A[3]”]
The Number of Leaves in a Decision Tree for Sorting • Lemma: A decision tree for sorting n elements must have at least n! leaves, because each of the n! possible orderings of the input must lead to its own leaf.
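The lemma can be observed on a small case. The sketch below runs an instrumented insertion sort on every permutation of 4 distinct elements and records the sequence of comparison outcomes, which identifies the root-to-leaf path taken in the decision tree. The helper name `comparison_path` and the adjacent-swap form of insertion sort are assumptions made for this illustration.

```python
from itertools import permutations
from math import ceil, factorial, log2


def comparison_path(items):
    """Insertion-sort a copy of items; return the tuple of comparison outcomes."""
    a = list(items)
    path = []
    for i in range(1, len(a)):
        j = i
        while j > 0:
            out_of_order = a[j - 1] > a[j]
            path.append(out_of_order)  # one edge of the decision tree
            if not out_of_order:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return tuple(path)


n = 4
paths = {comparison_path(p) for p in permutations(range(n))}
leaves = len(paths)                   # distinct root-to-leaf paths
height = max(len(p) for p in paths)   # most comparisons on any input
print(leaves, factorial(n), height, ceil(log2(factorial(n))))
```

Every permutation follows its own path, so the tree has 4! = 24 leaves, and its height (6 comparisons for reverse-sorted input) is at least lg(4!) rounded up, which is 5.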
Lower Bound For Comparison Sorting • Thm: Any decision tree that sorts n elements has height Ω(n lg n) • If we know this, then we know that comparison sorts are always Ω(n lg n) • Consider a decision tree on n elements • We must have at least n! leaves • The max # of leaves of a tree of height h is 2^h
Lower Bound For Comparison Sorting • So we have… n! ≤ 2^h • Taking logarithms: lg(n!) ≤ h • Stirling’s approximation tells us: n! > (n/e)^n • Thus: h ≥ lg (n/e)^n
Lower Bound For Comparison Sorting • So we have h ≥ lg (n/e)^n = n lg n - n lg e = Ω(n lg n) • Thus the minimum height of a decision tree is Ω(n lg n)
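The bound can be checked numerically for a few values of n. This sketch compares lg(n!), the minimum decision-tree height, with the Stirling-based lower estimate n lg n - n lg e and with n lg n itself; the function names are illustrative.

```python
from math import e, factorial, log2


def lg_factorial(n):
    """lg(n!): the minimum possible height of a decision tree sorting n elements."""
    return log2(factorial(n))


def stirling_lower(n):
    """lg((n/e)^n) = n lg n - n lg e, the lower estimate from Stirling's approximation."""
    return n * log2(n) - n * log2(e)


for n in (10, 100, 1000):
    print(n, stirling_lower(n), lg_factorial(n), n * log2(n))
```

For each n the values are ordered n lg n - n lg e < lg(n!) ≤ n lg n, so lg(n!) is squeezed between two quantities that both grow like n lg n.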
Lower Bound For Comparison Sorts • Thus the time to comparison sort n elements is Ω(n lg n) • Corollary: Heapsort and Mergesort are asymptotically optimal comparison sorts; Quicksort matches the bound on average but not in its O(n^2) worst case • How can we do better than Ω(n lg n)?