1.44k likes | 1.63k Views
CSE 3101. Prof. Andy Mirzaian. Sorting & Selection. STUDY MATERIAL:. [CLRS] chapters 6, 7, 8, 9 Lecture Notes 5, 6. TOPICS. The Sorting Problem Some general facts QuickSort HeapSort, Heaps, Priority Queues Sorting Lower Bound Special Purpose Sorting Algorithms
E N D
CSE 3101 Prof. Andy Mirzaian Sorting & Selection
STUDY MATERIAL: • [CLRS] chapters 6, 7, 8, 9 • Lecture Notes5, 6
TOPICS • The Sorting Problem • Some general facts • QuickSort • HeapSort, Heaps, Priority Queues • Sorting Lower Bound • Special Purpose Sorting Algorithms • The Selection Problem • Lower Bound Techniques • Prune-&-Search
The Sorting Problem INPUT: A sequence A= a1 , a2 ,··· , an of n arbitrary numbers. OUTPUT: A permutation (reordering) ap(1) , ap(2) ,··· , ap(n) of the input sequence, such that ap(1) ap(2) ··· ap(n) . • Two elementary operations: • Comparison between a pair of items aiandaj with =, <, >, or any logical combination thereof. • Exchange: swapping the positions of a pair of items aiandaj. Definition:An inversion is a pair of numbers ai , aj in the input, such that i < j but ai > aj (i.e., the pair is out of order).I(A)= the total # of inversions in sequence A.In general: 0 I(A) n(n-1)/2. Example: A = 4, 9, 4, 3, 6, 8, 2, 5.I(A) = 14. a2, a7 = 9, 2 is one of the inversions in A.
Some Basic Facts • Swapping an adjacent pair of items will change the inversion count by +1, 0, or –1. • Any sorting algorithm that (effectively) exchanges only adjacent pairs of items is doomed to take at least W(n2) steps in the worst case. BubbleSort, InsertionSort, SelectionSort are in this category.MergeSort, QuickSort, HeapSort are not. • InsertionSort takes Q(n + I) time on every input, where n = # input items, and I = # inversions in the input. Why? This makes InsertionSort a suitable choice when the input is almost sorted (low I).
QUICKSORT [C.A.R. Hoare, 1962] History:Hoare lived in Moscow for a period of time; first as part of the U.K. Royal Navy studying modern Russian military; then as a visiting student at Moscow State University; and later on, worked for the National Physical Lab stationed in Moscow. He collaborated with the Automatic Machine Translation of Russian to English Project Group. Dictionaries were on a long magnetic tape in alphabetical order. So they would first sort the words in a sentence, then in one pass would compare it with the magnetic tape. … For the sorting part, he first thought of BubbleSort, but soon realized it was too slow.QuickSort was the 2nd algorithm he says he thought of.
Sorting Student Homework Papers The way I sort student homework papers by name: • I first partition them into a small number of piles by initials, e.g., Pile 1: A – F Pile 2: G – L Pile 3: M – S Pile 4: T – Z • Then I sort each pile separately, possibly first partitioning them further into more refined groups, e.g., there are many names with the same initial. • Then I reassemble the sorted piles in order.
AlgorithmQuickSort(S) Pre-Cond: input S is a finite sequence of arbitrary numbers Post-Cond: output is a permutation of S in sorted order if|S| < 2 then return p a random element in S § pivot item, why random?3-Partition S into: § we already discussed this S< { x S : x < p } S= { x S : x = p } § O( |S| ) time S> { x S : x > p } S’< QuickSort(S<) § Exercise Question: S’> QuickSort(S>) § which recursive call first?! return S’< , S= , S’> end S< : x < p S= : x = p S> : x > p Randomized QuickSort T(|S|) = T(| S<|) + T(| S>|) + Q( |S| ), T(n) = Q(1), for n=0,1.
n S< : x < p p S> : x > p i n – i –1 QuickSort Running Time WLOG Assume: | S=| = 1. If it’s larger, it can only help! T(n) = T(i) + T(n-i-1) +Q(n), T(n) =Q(1), for n=0,1. Worst-Case: T(n) = maxi { T(i) + T(n-i-1) +Q(n) : i = 0 .. n –1 } = T(n – 1) + T(0) + Q(n) = T(n – 1) + Q(n) = Q(n2) This occurs if at all recursion levels the selected pivot is (near) the extreme; the largest or the smallest!
n S< : x < p p S> : x > p i n – i –1 QuickSort Running Time WLOG Assume: | S=| = 1. If it’s larger, it can only help! T(n) = T(i) + T(n-i-1) +Q(n), T(n) =Q(1), for n=0,1. Best-Case: T(n) = mini { T(i) + T(n-i-1) +Q(n) : i = 0 .. n –1 } = T(n/2) + T(n/2) + Q(n) = 2T(n/2) + Q(n) = Q(n log n) This occurs if at all recursion levels the selected pivot is (near) the median element!
n S< : x < p p S> : x > p i n – i –1 QuickSort Running Time WLOG Assume: | S=| = 1. If it’s larger, it can only help! T(n) = T(i) + T(n-i-1) +Q(n), T(n) =Q(1), for n=0,1. Expected-Case: T(n) = avei { T(i) + T(n-i-1) +Q(n) : i = 0 .. n –1 } (full history recurrence already discussed) = Q(n log n)
1. Multiply across by n (so that we can subsequently cancel out the summation): 2. Substitute n-1 for n: 3. Subtract (2) from (1): 4. Rearrange: 5. Divide by n(n+1) to make LHS & RHS look alike: 6. Rename: 7. Simplified recurrence: 8. Iteration: 9. Finally: Full History Recurrence:QuickSort nth Harmonic number
n S< : x < p p S> : x > p i n – i –1 Why Randomize? Worst-Case: T(n) = Q(n2) Expected-Case: T(n) = Q(n log n) Best-Case: T(n) = Q(n log n) • Randomization nullifies adverse effect of badly arranged input permutation. • Algorithm sees the input as a random permutation. • Probability that almost all random choices are the worst is nearly 1/n! (extremely low). • FACT: Randomized QuickSort will take O(n log n) time with probability very close to 1 on every input.
HEAPSORT,Heaps &Priority Queues [J.W.J. Williams, 1964]
91 1 84 74 2 3 73 81 66 54 4 5 6 7 8 48 9 62 77 34 53 61 41 29 10 13 11 12 14 15 36 23 51 27 44 69 20 27 47 33 59 46 16 17 18 19 20 21 22 23 24 25 26 27 Binary Heap • A = a binary tree with one key per node (duplicate keys allowed). • MaxHeap Order: A satisfies the following partial order:for every node x root[A] : key[x] key[parent(x)]. • Full tree node allocation scheme: nodes of A are allocated in increasing order of level, and left-to-right within the same level. • This allows array implementation, where array indices simulate tree pointers.
size[A] 1 A = currently unused max size 1 A[ t/2 ] parent node A[t] h log n A[2t] A[2t+1] left child right child n = size[A] Array as Binary Heap
A[1] L R 1 94 2 3 82 x 92 5 4 6 7 74 76 68 y 88 68 48 74 18 56 8 9 10 11 12 Some MAX Heap Properties • Root A[1] contains the maximum item. • Every root-to-leaf path appears in non-increasing order. • Every subtree is also max heap ordered. [Recursive structure] • The key at any node is the largest among all its descendents (inclusive). • (x,y) AncestorOf : A[x] A[y], where AncestorOf = { (x,y) : node x is ancestor of node y }.
1 t UpHeap • A[1..n] = a max-heap. • Suddenly, item A[t] increases in value. • Now A is a “t upward corrupted heap”:(x,y) AncestorOf : y t A[x] A[y]. • Question: how would you rearrange A to make it a max-heap again? • Answer: percolate A[t] up its ancestral path. procedureUpHeap(A, t) §O(log n) time Pre-Cond: A is a t upward corrupted heap Post-Cond: A is rearranged into a max-heap p t/2 §parent of t ifp = 0 or A[p] A[t] then return A[t] A[p] UpHeap(A,p) end
1 1 94 94 2 2 3 3 82 82 92 92 5 5 4 4 6 7 6 7 74 86 74 76 68 68 88 88 68 48 76 18 56 68 48 74 18 56 8 9 10 11 12 8 9 10 11 12 1 94 2 3 86 92 5 4 6 7 74 82 68 88 68 48 76 18 56 8 9 10 11 12 UpHeap Example 86 stop
t 2t 2t+1 DownHeap (or Heapify) • A[1..n] = a max-heap. • Suddenly, item A[t] decreases in value. • Now A is a “t downward corrupted heap”:(x,y) AncestorOf : x t A[x] A[y]. • Question: how would you rearrange A to make it a max-heap again? • Answer: demote A[t] down along largest-child path. procedureDownHeap(A, t) §O(log n) time Pre-Cond: A is a t downward corrupted heap Post-Cond: A is rearranged into a max-heap c 2t §left child of t ifc > size[A] then return §c not part of heapifc < size[A] and A[c] < A[c+1] then c c+1 §now c is the largest child of t ifA[t] < A[c] then A[t] A[c] DownHeap(A, c) end
1 1 94 94 2 2 3 3 76 82 92 92 5 5 4 4 6 7 6 7 74 26 74 76 68 68 88 88 68 48 74 18 56 68 48 74 18 56 8 9 10 11 12 8 9 10 11 12 1 94 2 3 76 92 5 4 6 7 74 74 68 88 68 48 26 18 56 8 9 10 11 12 DownHeap Example 26 stop
1 i h = log n 2i t n Construct Heap • One application of heaps is sorting. But how do we start a heap first? • Problem: Given array A[1..n], rearrange its items to form a heap. • Solution 1:Sort A[1..n] in descending order. Yes, but that’s circular! • Solution 2:Build Incrementally:That is, make A[1..t] a max heap while incrementing t 1 .. n. That is, for t 1 .. n doUpHeap(A, t) end Most nodes are concentrated near the bottom with larger depths but smaller heights. Idea: DownHeap is better!
T(n) = T(|L|) + T(|R|) + O(log n) T(n) = O(n) A[1] L R 1 h = log n 2i t h-i n Heap Construction Algorithm Solution 3:Build Backwards on t by DownHeap(A,t). Assume items in nodes 1..t-1 are tentatively + so that DownHeap’s Pre-Cond is met. procedureConstructHeap(A[1..n]) §O(n) time Pre-Cond: input is array A[1..n] of arbitrary numbers Post-Cond: A is rearranged into a max-heap size[A] n §establish last node barrierLastNonleafNode n/2 fort LastNonleafNodedownto 1 doDownHeap(A, t) end
1 14 2 3 23 42 5 4 6 7 62 51 12 92 26 88 56 83 94 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=6 2 3 23 42 5 4 6 7 62 51 12 92 26 88 56 83 94 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=5 2 3 23 42 5 4 6 7 62 51 56 92 26 88 12 83 94 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=4 2 3 23 42 5 4 6 7 88 51 56 92 26 62 12 83 94 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=3 2 3 23 42 5 4 6 7 88 94 56 92 26 62 12 83 51 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=2 2 3 23 92 5 4 6 7 88 94 56 42 26 62 12 83 51 10 11 12 8 9 Construct Heap Example
1 14 DownHeap(A,t)t=1 2 3 94 92 5 4 6 7 88 83 56 42 26 62 12 23 51 10 11 12 8 9 Construct Heap Example
1 94 2 3 88 92 5 4 6 7 62 83 56 42 26 14 12 23 51 10 11 12 8 9 Construct Heap Example MAXHEAP
Heap as a Priority Queue A Priority Queue (usually implemented with some “heap” structure)is an abstract Data Structure that maintains a set S of items and supports the following operations on it: MakeEmptyHeap(S): Make an empty priory queue and call it S. ConstructHeap(S): Construct a priority queue containing the set S of items. Insert(x, S): Insert new item x into S (duplicate values allowed) DeleteMax(S): Remove and return the maximum item from S. Note: Min-Heap is used if we intend to do DeleteMin instead of DeleteMax.
1 size[A] Priority Queue Operations Array A as a binary heap is a suitable implementation.For a heap of size n, it has the following time complexities: O(1) MakeEmptyHeap(A)size[A] 0 O(n) ConstructHeap(A[1..n])discussed already O(log n) Insert(x,A) and DeleteMax(A)see below procedureInsert(x, A) size[A] size[A] + 1A[ size[A] ] x UpHeap(A, size[A] ) end procedureDeleteMax(A)if size[A] = 0 then return error MaxItem A[1] A[1] A[size[A]]size[A] size[A] – 1 DownHeap(A, 1) returnMaxItem end
HeapSort AlgorithmHeapSort(A[1..n]) §O(n log n) time Pre-Cond: input is array A[1..n] of arbitrary numbers Post-Cond: A is rearranged into sorted order ConstructMaxHeap(A[1..n]) fort n downto 2 do A[t] DeleteMax(A) end
already constructed heap 1 2 3 4 5 6 7 8 9 10 11 12 94 | 82 | 92 | 74 | 76 | 68 | 88 | 68 | 48 | 74 | 18 | 56 swap size[A]= 12 1 94 2 3 82 92 5 4 6 7 74 76 68 88 68 48 74 18 56 8 9 10 11 12 HeapSort Example MaxItem
1 56 2 3 82 92 5 4 6 7 74 76 68 88 68 48 74 18 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 56 | 82 | 92 | 74 | 76 | 68 | 88 | 68 | 48 | 74 | 18 | 94 size[A]= 12
1 DownHeap(A,1) 56 2 3 82 92 5 4 6 7 74 76 68 88 68 48 74 18 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 56 | 82 | 92 | 74 | 76 | 68 | 88 | 68 | 48 | 74 | 18 | 94 size[A]= 11
swap HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 92 | 82 | 88 | 74 | 76 | 68 | 56 | 68 | 48 | 74 | 18 | 94 MaxItem size[A]= 11 1 92 2 3 82 88 5 4 6 7 74 76 68 56 68 48 74 18 94 8 9 10 11 12
1 18 2 3 82 88 5 4 6 7 74 76 68 56 68 48 74 92 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 18 | 82 | 88 | 74 | 76 | 68 | 56 | 68 | 48 | 74 | 92 | 94 size[A]= 11
1 DownHeap(A,1) 18 2 3 82 88 5 4 6 7 74 76 68 56 68 48 74 92 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 18 | 82 | 88 | 74 | 76 | 68 | 56 | 68 | 48 | 74 | 92 | 94 size[A]= 10
swap HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 88 | 82 | 68 | 74 | 76 | 18 | 56 | 68 | 48 | 74 | 92 | 94 MaxItem size[A]= 10 1 88 2 3 82 68 5 4 6 7 74 76 18 56 68 48 74 92 94 8 9 10 11 12
HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 74 | 82 | 68 | 74 | 76 | 18 | 56 | 68 | 48 | 88 | 92 | 94 size[A]= 10 1 74 2 3 82 68 5 4 6 7 74 76 18 56 68 48 88 92 94 8 9 10 11 12
1 DownHeap(A,1) 74 2 3 82 68 5 4 6 7 74 76 18 56 68 48 88 92 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 74 | 82 | 68 | 74 | 76 | 18 | 56 | 68 | 48 | 88 | 92 | 94 size[A]= 9
swap HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 82 | 76 | 68 | 74 | 74 | 18 | 56 | 68 | 48 | 88 | 92 | 94 MaxItem size[A]= 9 1 82 2 3 76 68 5 4 6 7 74 74 18 56 68 48 88 92 94 8 9 10 11 12
HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 48 | 76 | 68 | 74 | 74 | 18 | 56 | 68 | 82 | 88 | 92 | 94 size[A]= 9 1 48 2 3 76 68 5 4 6 7 74 74 18 56 68 82 88 92 94 8 9 10 11 12
1 DownHeap(A,1) 48 2 3 76 68 5 4 6 7 74 74 18 56 68 82 88 92 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 48 | 76 | 68 | 74 | 74 | 18 | 56 | 68 | 82 | 88 | 92 | 94 size[A]= 8
swap HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 76 | 74 | 68 | 68 | 74 | 18 | 56 | 48 | 82 | 88 | 92 | 94 MaxItem size[A]= 8 1 76 2 3 74 68 5 4 6 7 68 74 18 56 48 82 88 92 94 8 9 10 11 12
HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 48 | 74 | 68 | 68 | 74 | 18 | 56 | 76 | 82 | 88 | 92 | 94 size[A]= 8 1 48 2 3 74 68 5 4 6 7 68 74 18 56 76 82 88 92 94 8 9 10 11 12
1 DownHeap(A,1) 48 2 3 74 68 5 4 6 7 68 74 18 56 76 82 88 92 94 8 9 10 11 12 HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 48 | 74 | 68 | 68 | 74 | 18 | 56 | 76 | 82 | 88 | 92 | 94 size[A]= 7
swap HeapSort Example 1 2 3 4 5 6 7 8 9 10 11 12 74 | 74 | 68 | 68 | 48 | 18 | 56 | 76 | 82 | 88 | 92 | 94 MaxItem size[A]= 7 1 74 2 3 74 68 5 4 6 7 68 48 18 56 76 82 88 92 94 8 9 10 11 12