Sorting. Suppose you wanted to write a computer game like Doom 4: The Caverns of Calvin… How do you render those nice (lurid) pictures of Calvin College torture chambers, with hidden surfaces removed? Given a collection of polygons (points, tests, values), how do you sort them?
Sorting • Suppose you wanted to write a computer game like Doom 4: The Caverns of Calvin… • How do you render those nice (lurid) pictures of Calvin College torture chambers, with hidden surfaces removed? • Given a collection of polygons (points, tests, values), how do you sort them? • My favorite sort: • What are your favorite sorts? Read 6.1-6.5, omit rest of chapter 6.
Simple (and slow) algorithms • Bubble sort: • Selection Sort: • Insertion Sort: • Which is best? important factors: comparisons, data movement
Sorting out Sorting • A collection or file of items with keys • Sorting may be on items or pointers • Sorting may be internal or external • Sorting may or may not be stable • Simple algorithms: • easy to implement • slow (on big sets of data) • show the basic approaches, concepts • May be used to improve fancier algorithms
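To make "stable" concrete, here is a minimal sketch using the C++ standard library rather than the sorts from these slides. The `Record` type, `key_less`, and `sort_by_key_stable` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical item type for illustration: a sort key plus a payload.
struct Record {
    int key;
    std::string name;
};

// Compare on the key only; the payload plays no part in the ordering.
inline bool key_less(const Record &a, const Record &b) { return a.key < b.key; }

// Sort by key. std::stable_sort keeps items with equal keys in their
// original relative order; std::sort gives no such guarantee.
inline void sort_by_key_stable(std::vector<Record> &v) {
    std::stable_sort(v.begin(), v.end(), key_less);
    assert(std::is_sorted(v.begin(), v.end(), key_less));
}
```

Sorting the records (2,"b1"), (1,"a1"), (2,"b2"), (1,"a2") this way yields a1, a2, b1, b2: within each key, the original order survives.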
Sorting Utilities We’d like our sorting algorithms to work with all data types… template <class Item> void exch(Item &A, Item &B) {Item t=A; A=B; B=t; } template <class Item> void compexch(Item &A, Item &B) {if (B<A) exch(A, B); }
Bubble Sort • The first sort most students learn • And the worst… template <class Item> void bubble(Item a[], int l, int r) { for (int i=l; i<r; i++) for (int j=r; j>i; j--) compexch(a[j-1], a[j]); } comparisons? something like n²/2 data movements? something like n²/2
Selection Sort • Find smallest element • Exchange with first • Recursively sort rest template <class Item> void selection(Item a[], int l, int r) { for (int i=l; i<r; i++) { int min=i; for (int j=i+1; j<=r; j++) if (a[j]<a[min]) min=j; exch(a[i], a[min]); } } comparisons? n²/2 swaps? n
Insertion Sort • Like sorting cards • Put next one in place template <class Item> void insertion(Item a[], int l, int r) { int i; for (i=r; i>l; i--) compexch(a[i-1],a[i]); for (i=l+2; i<=r; i++) { int j=i; Item v=a[i]; while (v<a[j-1]) { a[j] = a[j-1]; j--; } a[j] = v; } } comparisons? about n²/4 on average data moves? about n²/4 on average
Which one to use? • Selection: few data movements • Insertion: few comparisons • Bubble: blows • But all of these are Θ(n²) in the average and worst case, which, as you know, is TERRIBLE for large n • Can we do better than Θ(n²)?
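One way to check the "few data movements" and "few comparisons" claims is to instrument the sorts. The sketch below is an assumption-laden illustration, not the slide code: it works on `std::vector<int>`, uses a bounds check in place of the sentinel pass, and the `Tally` type and `count_*` functions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tally type, introduced only for this illustration.
struct Tally {
    long comparisons = 0;
    long moves = 0;  // each element assignment counts as one move
};

// Selection sort on a copy of the input, counting comparisons and moves.
inline Tally count_selection(std::vector<int> a) {
    Tally t;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; i++) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < n; j++) {
            t.comparisons++;
            if (a[j] < a[min]) min = j;
        }
        std::swap(a[i], a[min]);
        t.moves += 3;  // a swap is counted as three assignments
    }
    assert(std::is_sorted(a.begin(), a.end()));
    return t;
}

// Insertion sort on a copy of the input, counting comparisons and moves.
inline Tally count_insertion(std::vector<int> a) {
    Tally t;
    for (std::size_t i = 1; i < a.size(); i++) {
        int v = a[i];
        t.moves++;
        std::size_t j = i;
        while (j > 0) {
            t.comparisons++;
            if (!(v < a[j - 1])) break;  // found the insertion point
            a[j] = a[j - 1];
            t.moves++;
            j--;
        }
        a[j] = v;
        t.moves++;
    }
    assert(std::is_sorted(a.begin(), a.end()));
    return t;
}
```

On 8 items, selection sort always makes 28 comparisons and 7 swaps (21 assignments by this count). Insertion sort makes 7 comparisons on sorted input and 28 on reversed input, so its cost depends heavily on the initial order.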
Merge Sort • The quintessential divide-and-conquer algorithm • Divide the list in half • Sort each half recursively • Merge the results. • Base case: left as an exercise to the reader
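The three steps above can be sketched as follows. This is one possible top-down version, assuming the same `(a, l, r)` inclusive-bounds convention as the other sorts in these slides; the names `merge_runs` and `merge_sort` and the temporary `std::vector` buffer are choices made for this sketch, not something the slides specify.

```cpp
#include <cassert>
#include <vector>

// Merge the sorted runs a[l..m] and a[m+1..r] through a temporary buffer.
template <class Item>
void merge_runs(Item a[], int l, int m, int r) {
    assert(l <= m && m < r);
    std::vector<Item> tmp;
    tmp.reserve(r - l + 1);
    int i = l, j = m + 1;
    while (i <= m && j <= r) {
        // Take from the right run only when strictly smaller,
        // so equal items keep their original order (stable).
        if (a[j] < a[i]) tmp.push_back(a[j++]);
        else             tmp.push_back(a[i++]);
    }
    while (i <= m) tmp.push_back(a[i++]);
    while (j <= r) tmp.push_back(a[j++]);
    for (int k = 0; k < (int)tmp.size(); k++) a[l + k] = tmp[k];
}

template <class Item>
void merge_sort(Item a[], int l, int r) {
    if (r <= l) return;        // base case: zero or one item is already sorted
    int m = l + (r - l) / 2;   // divide the list in half
    merge_sort(a, l, m);       // sort each half recursively
    merge_sort(a, m + 1, r);
    merge_runs(a, l, m, r);    // merge the results
}
```

For an array `a` of n items the call is `merge_sort(a, 0, n - 1)`. The base case here is simply "a list of at most one item is sorted".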
Merge Sort Analysis • Recall runtime recurrence: T(1)=0; T(n) = 2T(n/2) + cn • Θ(n log n) runtime in the worst case • Much better than the simple sorts on big data files – and easy to implement! • Can implement in-place and bottom-up to avoid some data movement and recursion overhead • Still, empirically, it’s slower than Quicksort, which we’ll study next.
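For reference, the recurrence can be unrolled as follows, assuming n is a power of 2 so that the halving comes out evenly:

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn \lg n \qquad (k = \lg n) \\
     &= cn \lg n = \Theta(n \log n).
\end{align*}
```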
Quicksort • Pick a pivot; pivot list; sort halves recursively. • The most widely used algorithm • A heavily studied algorithm with many variations and improvements (“it seems to invite tinkering”) • A carefully tuned quicksort is usually fastest (e.g. unix’s qsort standard library function) • but not stable, and in some situations slooow…
Quicksort template <class Item> void qsort(Item a[], int l, int r) { if (r<=l) return; int i=partition(a, l, r); qsort(a, l, i-1); qsort(a, i+1, r); } partition: pick an item as pivot, p (last item?) rearrange list into items smaller, equal, and greater than p
Partitioning template <class Item> int partition(Item a[], int l, int r) { int i=l-1, j=r; Item v=a[r]; for (;;) { while (a[++i] < v) ; while (v<a[--j]) if (j==l) break; if (i >= j) break; exch(a[i], a[j]); } exch(a[i], a[r]); return i; }
Quicksort Analysis • What is the runtime for Quicksort? • Recurrence relation? • Worst case: Θ(n²) • Best, Average case: Θ(n log n) • When does the worst case arise? when the list is (nearly) sorted! oops… • Recursive algorithms also have lots of overhead. How to reduce the recursion overhead?
Quick Hacks: Cutoff • How to reduce the recursion overhead? • Don’t sort lists of size <= 10 (e.g.) • At the end, run a pass of insertion sort. • In practice, this speeds up the algorithm
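A minimal sketch of the cutoff idea, reusing `exch` and `partition` as given on the earlier slides. The names `CUTOFF`, `quicksort_coarse`, `insertion_pass`, and `hybrid_sort` are hypothetical, and the threshold of 10 is only the slide's example value; the insertion pass here uses a bounds check instead of a sentinel.

```cpp
#include <cassert>

template <class Item>
void exch(Item &A, Item &B) { Item t = A; A = B; B = t; }

// Partition as on the Partitioning slide: pivot is a[r]; returns its final index.
template <class Item>
int partition(Item a[], int l, int r) {
    int i = l - 1, j = r;
    Item v = a[r];
    for (;;) {
        while (a[++i] < v) ;
        while (v < a[--j]) if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
    }
    exch(a[i], a[r]);
    return i;
}

const int CUTOFF = 10;  // assumed threshold; tune empirically

// Quicksort that leaves subfiles of at most CUTOFF items unsorted.
template <class Item>
void quicksort_coarse(Item a[], int l, int r) {
    if (r - l + 1 <= CUTOFF) return;
    int i = partition(a, l, r);
    quicksort_coarse(a, l, i - 1);
    quicksort_coarse(a, i + 1, r);
}

// Plain insertion sort over a[l..r].
template <class Item>
void insertion_pass(Item a[], int l, int r) {
    for (int i = l + 1; i <= r; i++) {
        Item v = a[i];
        int j = i;
        while (j > l && v < a[j - 1]) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}

template <class Item>
void hybrid_sort(Item a[], int l, int r) {
    assert(l <= r + 1);
    quicksort_coarse(a, l, r);  // every item ends up within a small subfile
    insertion_pass(a, l, r);    // so this single pass moves each item only a short way
}
```

The final pass is cheap because, after the coarse quicksort, no item is more than about `CUTOFF` positions from where it belongs.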
Quick Hacks: Picking a Pivot • How to prevent that nasty worst-case behavior? • Be smarter about picking a pivot • E.g. pick three random elements and take their median • Again, this yields an improvement in empirical performance: the worst case is much rarer (what would have to happen to get the worst case?)
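The median-of-three-random-elements idea might look like the sketch below. It reuses `exch` and `partition` from the earlier slides; `median3` and `quicksort_m3` are hypothetical names, and passing a `std::mt19937` generator by reference is an assumption of this sketch rather than anything the slides prescribe.

```cpp
#include <cassert>
#include <random>

template <class Item>
void exch(Item &A, Item &B) { Item t = A; A = B; B = t; }

// Partition as on the Partitioning slide: pivot is a[r]; returns its final index.
template <class Item>
int partition(Item a[], int l, int r) {
    int i = l - 1, j = r;
    Item v = a[r];
    for (;;) {
        while (a[++i] < v) ;
        while (v < a[--j]) if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
    }
    exch(a[i], a[r]);
    return i;
}

// Index (x, y, or z) holding the median of a[x], a[y], a[z].
template <class Item>
int median3(const Item a[], int x, int y, int z) {
    if (a[x] < a[y]) {
        if (a[y] < a[z]) return y;      // a[x] < a[y] < a[z]
        return (a[x] < a[z]) ? z : x;   // a[y] is the largest
    }
    if (a[x] < a[z]) return x;          // a[y] <= a[x] < a[z]
    return (a[y] < a[z]) ? z : y;       // a[x] is the largest
}

template <class Item>
void quicksort_m3(Item a[], int l, int r, std::mt19937 &rng) {
    if (r <= l) return;
    std::uniform_int_distribution<int> pick(l, r);  // inclusive range [l, r]
    int m = median3(a, pick(rng), pick(rng), pick(rng));
    assert(l <= m && m <= r);
    exch(a[m], a[r]);  // move the chosen pivot to where partition expects it
    int i = partition(a, l, r);
    quicksort_m3(a, l, i - 1, rng);
    quicksort_m3(a, i + 1, r, rng);
}
```

With a randomly chosen pivot, already-sorted input is no longer a special bad case: hitting the quadratic behavior would require unlucky random draws at many levels of the recursion.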
Median, Order Statistics • Quicksort improvement idea: use the median as pivot • Give me an algorithm to: • find the smallest element of a list • find the 4th smallest element • find the kth smallest element • Algorithm idea: sort, then pick the middle element. • Θ(n log n) worst, average case. • This won’t help for quicksort! • Can we do better?
Quicksort-based selection • Pick a pivot; partition list. Let i be location of pivot. • If i>k search left part; if i<k search right part; if i=k the pivot is the answer template<class Item> Item select(Item a[], int l, int r, int k) { if (r <= l) return a[r]; int i = partition(a, l, r); if (i > k) return select(a, l, i-1, k); if (i < k) return select(a, i+1, r, k); return a[i]; } Worst-case runtime? O(n²) Expected runtime? O(n)
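For comparison, the C++ standard library offers `std::nth_element`, which solves the same kth-smallest problem with linear average complexity. The sketch below wraps it; `kth_smallest` is a hypothetical helper name, k is 0-based, and nothing here implies the library uses exactly the partition-based method described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the k-th smallest item (k = 0 is the minimum). The vector is taken
// by value because std::nth_element rearranges its input.
template <class Item>
Item kth_smallest(std::vector<Item> v, std::size_t k) {
    assert(k < v.size());
    // After this call v[k] holds the item that would be at index k in sorted
    // order, nothing before it is greater, and nothing after it is smaller.
    std::nth_element(v.begin(), v.begin() + k, v.end());
    return v[k];
}
```

For the list 7, 3, 9, 1, 5 this gives 1 for k = 0, 5 for k = 2 (the median), and 9 for k = 4.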
Lower Bound on Sorting • Do you think that there will always be improvements in sorting algorithms? • better than Θ(n)? • better than Θ(n log n)? • how to prove that no comparison sort is better than Θ(n log n) in the worst case? • consider all algorithms!? • Few non-trivial lower bounds are known. Hard! • But, we can say that the runtime for any comparison sort is Ω(n log n).
Comparison sort lower bound • How many comparisons are needed to sort? • decision tree: each leaf a permutation; each node a comparison: a < b? • A sort of a particular list: a path from root to leaf. • How many leaves? • n! • Shortest possible decision tree? • height Ω(log n!) • Stirling’s formula (p. 43): lg n! is about n lg n − n lg e + lg √(2πn) • Ω(n log n)! • No comparison sort can beat Ω(n log n) comparisons in the worst case • (but are there other approaches to sorting?)
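The decision-tree argument can be written out without Stirling's formula, writing h for the height of the tree (the worst-case number of comparisons):

```latex
\begin{align*}
2^{h} &\ge n! && \text{(a binary tree of height $h$ has at most $2^{h}$ leaves)} \\
h &\ge \lg n! \\
\lg n! &= \sum_{i=1}^{n} \lg i
        \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \lg i
        \;\ge\; \frac{n}{2} \lg \frac{n}{2}
        \;=\; \Omega(n \log n).
\end{align*}
```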