Summary of claims • Sorting algorithms that compare adjacent elements have average-case time Ω(n²) • Sorting algorithms that compare pairs of elements have worst-case time Ω(n log n) • BST sort and quicksort each have average-case time complexity Θ(n log n) • In each case we assume (unrealistically?): • that each permutation of the sorted sequence is equally likely to appear as input
Average-case analysis • Average-case analysis of algorithms requires a precise notion of average-case behavior. • Finding such a notion can be hard. • Often the simplest notion is to assume that each possible input is equally likely.
Average-case analysis for sorting • For sorting, the input may be considered to be a permutation of the sorted sequence. • This is why we assume that each input permutation is equally likely • and thus has probability 1/n!. • This may be a bad assumption in practice • in particular, sorted input often occurs with probability greater than 1/n!
The equal-likelihood assumption • If it’s important, we can force the equal-likelihood assumption to be true • The sorting algorithm can first apply a pseudorandom permutation to its input • this can be done in O(n) time • so this preprocessing step won’t affect the time complexity of the sorting algorithm
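A minimal sketch of that preprocessing step in Python – the Fisher-Yates shuffle, which produces a uniformly random permutation in O(n) time (the function name shuffle_input is mine):

```python
import random

def shuffle_input(a):
    """Apply a uniformly random permutation in place (Fisher-Yates), in O(n) time."""
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)   # j is uniform over 0..i inclusive
        a[i], a[j] = a[j], a[i]
```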
Inefficiency of swapping adjacent elements • Some sorting algorithms work by swapping adjacent elements, e.g., insertion sort and bubble sort • We need a way to prove our claim: that these algorithms must have average-case time complexity Ω(n²) • under the equal-likelihood assumption
Reverse permutations • One slick idea: for any input permutation p, there's a reverse permutation pᴿ – p read backwards. • By the equal-likelihood assumption, p and pᴿ have the same probability. • It's enough to show that the average amount of work over {p, pᴿ} is Ω(n²) • The key concept is that of an inversion – a pair of elements that is out of order.
Inversions • Any pair of distinct elements is inverted in exactly one of {p, pᴿ} • So p and pᴿ together contain exactly n(n−1)/2 inversions – one for each pair of elements • so the average number over {p, pᴿ} is n(n−1)/4 • Since p is arbitrary, the average number of inversions overall is n(n−1)/4, which is Θ(n²) • Swapping adjacent elements fixes just one inversion, so Ω(n²) swaps are required
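A brute-force check of this pairing argument, as a sketch (the helper inversions and the sample permutation are mine):

```python
from itertools import combinations

def inversions(p):
    """Count pairs (i, j) with i < j but p[i] > p[j], by brute force in O(n^2)."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])

p = [3, 1, 4, 2, 5]          # a sample permutation
r = p[::-1]                  # its reverse
n = len(p)
assert inversions(p) + inversions(r) == n * (n - 1) // 2   # one inversion per pair
```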
BST sort in the average case • Recall our claim – that BST sort takes time Θ(n log n) in the average case • if we make the equal-likelihood assumption • It's enough to show that the total sum D(n) of the distances to the nodes is Θ(n log n) • since this sum measures the total time for all insertions • and traversal takes time Θ(n) • this sum is often called the internal path length
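For concreteness, here is a minimal BST sort sketch in Python (names mine); the cost of each insertion is the depth of the new node, so the total insertion cost is exactly the internal path length:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard BST insertion; its cost is the depth of the new node."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def bst_sort(a):
    root = None
    for x in a:              # n insertions: total cost = internal path length
        root = insert(root, x)
    out = []
    def inorder(t):          # Theta(n) in-order traversal emits sorted order
        if t:
            inorder(t.left)
            out.append(t.key)
            inorder(t.right)
    inorder(root)
    return out
```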
Internal path lengths • Consider BSTs of size n whose left subtree has size i (so the right subtree has size n−i−1) • The average internal path length of such BSTs is D(i) + D(n−i−1) + n−1 • since each of the n−1 nonroot nodes is 1 farther from the root than from the root of its subtree • So the average internal path length of BSTs of size n is the average of these values • as i ranges from 0 through n−1 • all values of i are equally likely under the equal-likelihood assumption
Bounding the average internal path length • But the average value of D(i) + D(n−i−1) + n−1 is (2/n)[Σ D(i)] + n−1 • since the sums of D(i) and of D(n−i−1) range over the same values • To show: this is ≤ cn log n for some c • We can use the corresponding inequality for i < n as an induction hypothesis • So it's enough to show that (2/n)[Σ c·i log i] + n−1 ≤ cn log n
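A quick numeric check of the recurrence (the tabulating helper D is mine); the computed values stay well below 2·n·log₂ n, matching the bound proved next:

```python
import math

def D(n):
    """Tabulate D(m) = (2/m) * sum(D(i) for i < m) + m - 1 bottom-up."""
    d, s = [0.0], 0.0        # s accumulates d[0] + ... + d[m-1]
    for m in range(1, n + 1):
        s += d[-1]
        d.append((2 / m) * s + m - 1)
    return d[n]

for n in (10, 100, 1000):
    print(n, round(D(n), 1), round(2 * n * math.log2(n), 1))  # D(n) < 2 n log2 n
```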
Bounding (2/n)[Σ c·i log i] + n−1 by cn log n • The integral test bounds Σ i log i above by the integral from 1 to n of x log x dx • The indefinite integral is (x²/2)log x − x²/4 • using integration by parts • So (2/n)[Σ c·i log i] + n−1 ≤ (2c/n)[(n² log n)/2 − n²/4 + 1/4] + n−1 = cn log n − cn/2 + c/(2n) + n − 1, which is at most cn log n for c ≥ 2 and n > c/2
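Spelled out, the integration-by-parts step (natural logarithms; a different base changes only constant factors):

```latex
\int_1^n x\log x\,dx
  = \Bigl[\tfrac{x^2}{2}\log x - \tfrac{x^2}{4}\Bigr]_1^n
  = \frac{n^2\log n}{2} - \frac{n^2}{4} + \frac{1}{4}
\quad\Longrightarrow\quad
\sum_{i=1}^{n-1} i\log i \;\le\; \frac{n^2\log n}{2} - \frac{n^2}{4} + \frac{1}{4}
```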
Quicksort in the average case • For the average-case analysis of quicksort: • under the equal-likelihood assumption • Let T(n) be the average-case time for quicksort on input of size n • Then T(n) ≤ (1/n)[Σ T(k) + Σ T(n−1−k)] + cn • T(0) = 0, so each sum effectively runs from k = 1 to k = n−1 • since a randomly chosen pivot element is equally likely to be anywhere in the output
To show: T(n) ≤ dn log n for some d • We have T(n) ≤ (2/n) Σ T(k) + cn • by combining like terms • By induction, T(n) ≤ (2/n)·d·Σ (k log k) + cn • for some d that we may choose • The sum is at most n²[(log n)/2 − 1/4] + 1/4 • by the same integral test as for BSTs • So T(n) ≤ dn log n − (d/2)n + d/(2n) + cn • And so T(n) ≤ dn log n, QED • for d ≫ c and large n (e.g., for d ≥ 3c and n² ≥ 3)
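A sketch of quicksort with a uniformly random pivot, matching the recurrence above (a simple out-of-place version rather than the usual in-place partition):

```python
import random

def quicksort(a):
    """Quicksort with a uniformly random pivot, as in the analysis above."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                     # equally likely to have any rank
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```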
Sorting by comparing pairs of elements • Finally, consider an arbitrary sorting algorithm that works by comparing pairs of elements • In k comparisons, such an algorithm can distinguish at most 2ᵏ input permutations • But there are n! input permutations • So Ω(log n!) comparisons are required
A lower bound for sorting by comparing pairs of elements • But log n! is just Σ log k • as k ranges from 1 to n • And Σ log k is bounded below by ∫ log x dx • as x ranges from 1 to n, by the integral test • The indefinite integral is x log x − x (for natural logarithms; the base changes only constant factors) • So log n! ≥ n log n − n + 1 • And the number of comparisons required by comparison-based sorts is Ω(n log n)
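A quick numeric check of the bound, using natural logarithms via Python's log-gamma function (ln n! = lgamma(n+1)):

```python
import math

for n in (10, 100, 1000):
    log_fact = math.lgamma(n + 1)          # ln(n!)
    bound = n * math.log(n) - n + 1        # the lower bound n ln n - n + 1
    assert log_fact >= bound
    print(n, round(log_fact, 1), round(bound, 1))
```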
Guessing and verifying a solution for the mergesort recurrence • Suppose we guess that the mergesort recurrence has the O(n log n) solution suggested by merge trees • We can verify our guess by induction • We'd need to show 2T(n/2) + cn ≤ dn log n • by induction we may assume T(n/2) ≤ d(n/2)log(n/2), so • 2T(n/2) + cn ≤ 2(d(n/2)log(n/2)) + cn • but on the right we have dn(log n − 1) + cn, taking logs base 2 • = dn log n − dn + cn • ≤ dn log n if d ≥ c
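And a minimal mergesort sketch whose structure matches the recurrence 2T(n/2) + cn (names mine):

```python
def mergesort(a):
    """Two recursive calls on halves plus an O(n) merge: T(n) = 2T(n/2) + cn."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):    # the cn merge step
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```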