Sorting • We have already seen two efficient ways to sort:
A kind of “insertion” sort • Insert the elements into a red-black tree one by one • Do an in-order traversal of the tree and collect the keys • Takes O(n log n) time
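As a concrete illustration of this tree-based sort, here is a minimal Python sketch. It uses a plain, unbalanced binary search tree as a stand-in for the red-black tree (the class and function names are invented for this example), so the O(n log n) guarantee only holds for the balanced version described above.

class Node:
    """A node of a plain binary search tree (stand-in for a red-black tree)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    # Insert key into the tree rooted at root; return the (possibly new) root.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def tree_sort(keys):
    # Insert the elements one by one, then collect them with an in-order walk.
    root = None
    for k in keys:
        root = bst_insert(root, k)
    out = []
    def inorder(node):
        if node is not None:
            inorder(node.left)
            out.append(node.key)
            inorder(node.right)
    inorder(root)
    return out

print(tree_sort([2, 8, 7, 1, 3, 5, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]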
Heapsort (Williams, Floyd, 1964) • Put the elements in an array • Make the array into a heap • Repeatedly do a deletemin and put the deleted element at the last position of the array
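A short Python sketch of this idea: the standard library's heapq module maintains a binary min-heap on a list, so heapify plays the role of "make the array into a heap" and heappop is deletemin. Unlike the in-place variant described above, this sketch collects the popped minima into a new list rather than reusing the tail of the array.

import heapq

def heap_sort(a):
    """Sort a list using a binary min-heap: heapify, then repeated deletemin."""
    heap = list(a)          # copy so the input is left untouched
    heapq.heapify(heap)     # O(n): make the array into a heap
    out = []
    while heap:
        out.append(heapq.heappop(heap))  # deletemin, O(log n) each
    return out

print(heap_sort([2, 8, 7, 1, 3, 5, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]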
Quicksort • Input: an array A[p..r]

Quicksort(A, p, r)
  if p < r then
    q ← Partition(A, p, r)   // q is the position of the pivot element
    Quicksort(A, p, q-1)
    Quicksort(A, q+1, r)
[Figure: step-by-step trace of Partition on A = 2 8 7 1 3 5 6 4 with pivot x = 4. The indices i and j sweep the array; elements ≤ 4 are exchanged into the prefix, giving 2 1 3 8 7 5 6 4, and the final exchange with the pivot yields 2 1 3 4 7 5 6 8.]
Partition(A, p, r)
  x ← A[r]
  i ← p-1
  for j ← p to r-1 do
    if A[j] ≤ x then
      i ← i+1
      exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1
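The pseudocode above translates almost line for line into Python. The sketch below is only an illustration of the same Lomuto-style partition on 0-based lists; nothing beyond what the pseudocode states is implied.

def partition(a, p, r):
    # Lomuto partition: the pivot is a[r]; return its final position.
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)   # q is the position of the pivot element
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

a = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(a)
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8]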
Analysis • Running time is proportional to the number of comparisons • Each pair of elements is compared at most once, so the running time is O(n²) • In fact, for each n there is an input of size n on which quicksort takes Ω(n²) time: on an already-sorted array every call to Partition puts all remaining elements on one side of the pivot, giving (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons
But • Assume that the split is even in every recursive call
T(n) = 2T(n/2) + n • How do we solve recurrences like this? (read Chapter 4)
Recurrence tree • The root costs n, its two children cost n/2 each, the four grandchildren cost n/4 each, and so on • At every level we do at most n comparisons in total • There are about log n levels, so the total number of comparisons is O(n log n)
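For completeness, the recurrence-tree argument can be written out as a short calculation (a sketch, assuming n is a power of 2 so the halving is exact):

\begin{align*}
T(n) &= 2\,T(n/2) + n \\
     &= 4\,T(n/4) + n + n \\
     &= \cdots \\
     &= 2^{k}\,T\!\left(n/2^{k}\right) + k\,n && \text{after } k \text{ levels of expansion} \\
     &= n\,T(1) + n\log_2 n && \text{at } k = \log_2 n\text{, when the subproblems have size } 1 \\
     &= O(n \log n).
\end{align*}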
Observations • We can’t guarantee good splits • But intuitively on random inputs we will get good splits
Randomized quicksort • Use Randomized-Partition rather than Partition

Randomized-Partition(A, p, r)
  i ← random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
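A Python sketch of the randomized variant. The partition function is repeated here so the snippet stands alone, and random.randint plays the role of random(p, r), with both endpoints inclusive.

import random

def partition(a, p, r):
    # Lomuto partition, as in the deterministic sketch above.
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def randomized_partition(a, p, r):
    # Exchange a uniformly chosen element with A[r], then partition as before.
    i = random.randint(p, r)          # random(p, r): both endpoints inclusive
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)

def randomized_quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)

a = [2, 8, 7, 1, 3, 5, 6, 4]
randomized_quicksort(a)
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8]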
On the same input we will get a different running time in each run! • So for one particular input we look at the average of all these running times over the algorithm's random choices
Expected # of comparisons • Let X be the # of comparisons made by the algorithm • X is a random variable • We want to compute E(X)
Expected # of comparisons • Let z1, z2, ..., zn be the elements in sorted order • Let Xij = 1 if zi is compared to zj and 0 otherwise • So X = Σ_{i<j} Xij, and by linearity of expectation E(X) = Σ_{i<j} Pr{zi is compared to zj}
Consider Zij ≡ {zi, zi+1, ..., zj} • Claim: zi and zj are compared if and only if either zi or zj is the first pivot chosen from Zij • Proof, by cases on the first pivot chosen from Zij:
• zi is chosen first: zi is compared to zj during this partition, and never again
• zj is chosen first: the same
• some zk with i < k < j is chosen first: zi and zj are not compared in this partition, and the partition separates them, so no future partition contains both of them
Pr{zi is compared to zj}
= Pr{zi or zj is the first pivot chosen from Zij}      (just explained)
= Pr{zi is first pivot chosen from Zij} + Pr{zj is first pivot chosen from Zij}      (mutually exclusive possibilities)
= 1/(j-i+1) + 1/(j-i+1)      (each of the j-i+1 elements of Zij is equally likely to be the first one chosen)
= 2/(j-i+1)
Therefore E(X) = Σ_{i=1..n-1} Σ_{j=i+1..n} 2/(j-i+1).
Simplify with a change of variable, k = j-i+1: the inner sum becomes Σ_{k=2..n-i+1} 2/k.
Simplify and overestimate, by adding terms: Σ_{k=2..n-i+1} 2/k ≤ Σ_{k=2..n} 2/k = O(log n),
so E(X) ≤ Σ_{i=1..n-1} O(log n) = O(n log n).
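As a sanity check on this formula, here is a small Monte Carlo sketch; the function names and the trial counts are my own choices. It counts the comparisons "A[j] ≤ x" made by randomized quicksort and compares their average against Σ_{i<j} 2/(j-i+1).

import random

def quicksort_count(a):
    """Randomized quicksort; return the number of key comparisons made."""
    comparisons = 0

    def partition(p, r):
        nonlocal comparisons
        k = random.randint(p, r)          # random pivot choice
        a[r], a[k] = a[k], a[r]
        x = a[r]
        i = p - 1
        for j in range(p, r):
            comparisons += 1              # the comparison A[j] <= x
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        return i + 1

    def sort(p, r):
        if p < r:
            q = partition(p, r)
            sort(p, q - 1)
            sort(q + 1, r)

    sort(0, len(a) - 1)
    return comparisons

n, trials = 100, 2000
observed = sum(quicksort_count(list(range(n))) for _ in range(trials)) / trials
predicted = sum(2 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
print(observed, predicted)   # the two numbers should be close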
A lower bound • Comparison model: we assume that the only operations from which we deduce order among keys are comparisons • Then we prove that we need Ω(n log n) comparisons in the worst case
Insertion sort • [Figure: the decision tree for insertion sort on three elements x, y, z. The root is the comparison 1:2, the internal nodes are the comparisons 2:3 and 1:2, and each of the 3! = 6 leaves is one ordering of x, y, z.]
Quicksort • [Figure: the decision tree for quicksort on three elements x, y, z. The root is the comparison 1:3, the internal nodes are the comparisons 2:3 and 1:2, and each leaf is an ordering of x, y, z.]
Important observations • Every comparison-based sorting algorithm can be represented as a (binary) decision tree like this • For every node v there is an input on which the algorithm reaches v • The # of leaves is at least n!
Important observations • Each path corresponds to a run on some input • The worst case # of comparisons corresponds to the longest path
The lower bound • Let d be the length of the longest path • n! ≤ #leaves ≤ 2^d • So log2(n!) ≤ d
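The remaining step, that log2(n!) is itself Ω(n log n), follows from a standard estimate (a sketch; one could also invoke Stirling's approximation):

\begin{align*}
\log_2(n!) &= \sum_{i=1}^{n} \log_2 i
  \;\ge\; \sum_{i=\lceil n/2\rceil}^{n} \log_2 i
  \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
  \;=\; \Omega(n \log n),\\
\text{so } d &\ge \log_2(n!) = \Omega(n\log n).
\end{align*}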
Lower bound for sorting • Any sorting algorithm based on comparisons between elements requires Ω(n log n) comparisons.
Beating the lower bound • We can beat the lower bound if we can deduce order relations between keys by means other than comparisons • Examples: • Count sort • Radix sort
Count sort • Assume that keys are integers between 0 and k • A: 2 3 0 5 3 5 0 2 5
Count sort • Allocate a temporary array C of size k+1: cell x counts the # of keys = x • A: 2 3 0 5 3 5 0 2 5 • C: 0 0 0 0 0 0
Count sort (counting pass) • A: 2 3 0 5 3 5 0 2 5
After the first element (2):  C: 0 0 1 0 0 0
After the second element (3): C: 0 0 1 1 0 0
After the third element (0):  C: 1 0 1 1 0 0
After all nine elements:      C: 2 0 2 2 0 3
Count sort • Compute prefix sums of C: cell x now holds the # of keys ≤ x (rather than = x)
A: 2 3 0 5 3 5 0 2 5
Before: C: 2 0 2 2 0 3
After:  C: 2 2 4 6 6 9
Count sort • Move items to the output array B, scanning A from right to left: place A[j] at position C[A[j]] of B, then decrement C[A[j]]
A: 2 3 0 5 3 5 0 2 5
C: 2 2 4 6 6 9
B: / / / / / / / / /
Count sort (placement pass, right to left over A: 2 3 0 5 3 5 0 2 5)
A[9] = 5 → B: / / / / / / / / 5   C: 2 2 4 6 6 8
A[8] = 2 → B: / / / 2 / / / / 5   C: 2 2 3 6 6 8
A[7] = 0 → B: / 0 / 2 / / / / 5   C: 1 2 3 6 6 8
A[6] = 5 → B: / 0 / 2 / / / 5 5   C: 1 2 3 6 6 7
A[5] = 3 → B: / 0 / 2 / 3 / 5 5   C: 1 2 3 5 6 7
Count sort • After all elements have been placed: A: 2 3 0 5 3 5 0 2 5   C: 0 2 2 4 6 6   B: 0 0 2 2 3 3 5 5 5
Count sort • Complexity: O(n+k) • The sort is stable: equal keys keep their original relative order • Note that count sort does not perform any comparisons between keys
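A compact Python sketch of the whole procedure. The function name and the use of 0-based lists are my own choices; the three passes (count, prefix sums, and the right-to-left placement that makes the sort stable) mirror the slides.

def count_sort(a, k):
    """Sort a list of integers in the range 0..k in O(n + k) time, stably."""
    n = len(a)
    c = [0] * (k + 1)

    # Counting pass: c[x] = number of keys equal to x.
    for key in a:
        c[key] += 1

    # Prefix sums: c[x] = number of keys <= x.
    for x in range(1, k + 1):
        c[x] += c[x - 1]

    # Placement pass, right to left, so that equal keys keep their order.
    b = [None] * n
    for j in range(n - 1, -1, -1):
        c[a[j]] -= 1          # 0-based lists: decrement first, then place
        b[c[a[j]]] = a[j]
    return b

print(count_sort([2, 3, 0, 5, 3, 5, 0, 2, 5], 5))  # [0, 0, 2, 2, 3, 3, 5, 5, 5]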