Summary of claims • Sorting algorithms that compare adjacent elements have average-case time Ω(n²) • Sorting algorithms that compare pairs of elements have worst-case time Ω(n log n) • BST sort and quicksort each have average-case time complexity Θ(n log n) • In each case we assume that each permutation of the sorted sequence is equally likely to appear as input • This is often an unrealistic assumption.
Average-case analysis • Average-case analysis of algorithms requires a precise notion of average-case behavior. • Finding such a notion can be hard. • Often the simplest notion is to assume that each possible input is equally likely.
Average-case analysis for sorting • For sorting, the input may be considered to be a permutation of the sorted sequence. • So the simplest assumption here is that each input permutation is equally likely • and thus has probability 1/n!. • This may be a bad assumption in practice • In particular, sorted input often occurs with probability greater than 1/n!
The equal-likelihood assumption • Yet it’s common to assume equal likelihood in average-case analysis of sorting. • This assumption can be made true if necessary. • the sorting algorithm can first apply a pseudorandom permutation to its input • this can be done in O(n) time
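The O(n) pseudorandom permutation can be realized with the standard Fisher–Yates shuffle; a minimal sketch (the slides don't specify the method, and the function name is my own):

```python
import random

def shuffle_input(a):
    """Apply a uniformly random permutation in O(n) time (Fisher-Yates)."""
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)   # pick a position in a[0..i] uniformly
        a[i], a[j] = a[j], a[i]    # move that element into place i
    return a
```

After this preprocessing step, every input permutation reaches the sorting algorithm with probability exactly 1/n!, making the equal-likelihood assumption true.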
Inefficiency of swapping adjacent elements • Some sorting algorithms work by swapping adjacent elements, e.g., insertion sort and bubble sort • Recall our claim: these algorithms must have average-case time complexity Ω(n²) • under the equal-likelihood assumption
Reverse permutations • One slick idea: for any input permutation p, there's a reverse permutation pᴿ, obtained by reading p backwards. • By the equal-likelihood assumption, p and pᴿ have the same probability. • It's enough to show that the average-case time over p and pᴿ is Ω(n²) • The key concept is that of an inversion – a pair of elements that is out of order.
Inversions • Any pair of elements must be inverted in exactly one of {p, pᴿ} • So p and pᴿ must together contain as many inversions as there are pairs of elements • this number is n(n−1)/2 • The average number of inversions over p and pᴿ (and thus in general) is thus n(n−1)/4 • which is Θ(n²) • Swapping adjacent elements fixes just 1 inversion, so Ω(n²) swaps are required.
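As a concrete check of the counting argument, here is a small Python sketch (function name is my own):

```python
def count_inversions(p):
    """Count pairs (i, j) with i < j but p[i] > p[j]; each adjacent swap
    in insertion sort or bubble sort fixes exactly one such pair."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

# Every pair is inverted in exactly one of p and its reverse, so the
# two counts always sum to the number of pairs, n(n-1)/2.
p = [2, 4, 1, 3, 5]
n = len(p)
assert count_inversions(p) + count_inversions(p[::-1]) == n * (n - 1) // 2
```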
BST sort in the average case • Recall our claim – that BST sort takes time Θ(n log n) in the average case • if we make the equal-likelihood assumption • It's enough to show that the total sum of the distances from the root to the nodes is Θ(n log n) • since this sum measures the total time for all insertions • and traversal takes time Θ(n) • this sum is often called the internal path length
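BST sort itself isn't shown in the slides; a minimal sketch, assuming plain unbalanced insertion followed by an in-order traversal (class and function names are my own):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into an (unbalanced) BST; the cost of this insertion
    is the depth of the new node, i.e., its distance from the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def bst_sort(xs):
    root = None
    for x in xs:                 # total insertion cost = internal path length
        root = insert(root, x)
    out = []
    def inorder(node):           # Theta(n) traversal emits keys in order
        if node:
            inorder(node.left)
            out.append(node.key)
            inorder(node.right)
    inorder(root)
    return out
```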
Internal path lengths • Let D(n) denote the average internal path length of a BST of size n • For BSTs of size n whose left subtree has size i, the average internal path length is D(i) + D(n−i−1) + n−1 • So D(n) is the average of these values • as i ranges from 0 through n−1 • using our equal-likelihood assumption to ensure that all values of i are equally likely
Bounding the average internal path length • But the average value of D(i) + D(n−i−1) + n−1 is (2/n)[Σ D(i)] + n−1 • since the two arguments to D each range over the same values 0 through n−1 • To show that this is at most cn log n for some c, we can use the corresponding inequality D(i) ≤ ci log i for i < n as an induction hypothesis • So it's enough to show that (2/n)[Σ ci log i] + n−1 ≤ cn log n
Bounding (2/n)[Σ ci log i] + n−1 by cn log n • The integral test bounds Σ i log i above by the integral from 1 to n of x log x dx • The indefinite integral is (x²/2)log x − x²/4 • using integration by parts • So (2/n)[Σ ci log i] + n−1 ≤ (2c/n)[(n² log n)/2 − n²/4 + 1/4] + n−1 = cn log n − cn/2 + c/(2n) + n − 1, which is at most cn log n for c ≥ 2 and n > c/2
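The bound can be sanity-checked numerically by iterating the recurrence D(n) = (2/n)·[Σ D(i)] + n − 1 directly (a quick check, not part of the proof; here the bound is checked with log base 2, which only weakens it):

```python
import math

def avg_ipl(n_max):
    """Average internal path length D(n) of a random BST, computed from
    the recurrence D(n) = (2/n) * sum_{i<n} D(i) + n - 1, with D(0) = 0."""
    D = [0.0]
    prefix = 0.0                     # running sum of D(0..n-1)
    for n in range(1, n_max + 1):
        D.append((2.0 / n) * prefix + n - 1)
        prefix += D[n]
    return D

D = avg_ipl(1000)
# The induction gives D(n) <= c * n log n with c = 2:
assert all(D[n] <= 2 * n * math.log2(n) for n in range(2, 1001))
```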
Quicksort in the average case • For the average-case analysis of quicksort • under the equal-likelihood assumption • Let T(n) be the average-case time for quicksort on input of size n • Then T(n) ≤ (1/n)[Σ T(k) + Σ T(n−1−k)] + cn • where k runs from 0 to n−1 over the possible sizes of the left subproblem; T(0) = 0, so the size-0 terms vanish • since a randomly chosen pivot element is equally likely to land anywhere in the sorted output
To show: T(n) ≤ dn log n for some d • We have T(n) ≤ (2/n) Σ T(k) + cn • since the two sums range over the same values • By induction, T(n) ≤ (2d/n) Σ (k log k) + cn • for some d that we may choose • The sum is at most n²[(log n)/2 − (1/4)] + 1/4 • by the same integral test as for BSTs • So T(n) ≤ dn log n − (d/2)n + d/(2n) + cn • And so T(n) ≤ dn log n, QED • for d >> c (e.g., for d ≥ 3c and n² > 3)
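The recurrence corresponds to quicksort with a uniformly random pivot; a minimal sketch (not the slides' implementation; it builds new lists rather than partitioning in place):

```python
import random

def quicksort(xs):
    """Quicksort with a uniformly random pivot; the Theta(n) partitioning
    pass is the 'cn' term in the recurrence."""
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)                       # uniform over the input
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

Because the pivot is chosen at random by the algorithm itself, this version achieves the Θ(n log n) average even without the equal-likelihood assumption on inputs.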
Sorting by comparing pairs of elements • Finally, consider an arbitrary sorting algorithm that works by comparing pairs of elements • In k comparisons, such an algorithm can distinguish at most 2ᵏ input permutations • But there are n! input permutations, each of which requires different behavior from the algorithm • So Ω(log n!) comparisons are required
A lower bound for sorting by comparing pairs of elements • But log n! is just Σ log k • as k ranges from 1 to n • And Σ log k is bounded below by ∫ log x dx • as x ranges from 1 to n, by the integral test • The indefinite integral is x log x − x • So log n! ≥ n log n − n + 1 • And the number of comparisons required by comparison-based sorts is Ω(n log n)
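A quick numeric check of the final inequality, taking log to be the natural logarithm as in the integral (the function name is my own):

```python
import math

def lower_bound(n):
    """The bound n ln n - n + 1 <= ln(n!) from the integral test."""
    return n * math.log(n) - n + 1

# math.lgamma(n + 1) computes ln(n!); the bound holds for every n >= 1.
assert all(math.lgamma(n + 1) >= lower_bound(n) for n in range(1, 500))
```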