Sorting 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 17, 2004
Announcements • Homework 5 is out • Reading: • Chapter 8 in MAW • Quiz 1 available on Thursday
Boring … Sorting is admittedly not very sexy, everybody knows some algorithms already, … But: Good sorting algorithms are needed absolutely everywhere. Sorting is fairly well understood theoretically. Provides a good way to introduce some important ideas.
The Problem We are given a sequence of items $a_1, a_2, a_3, \ldots, a_{n-1}, a_n$. We want to rearrange them so that they are in non-decreasing order. More precisely, we need a permutation $f$ such that $a_{f(1)} \le a_{f(2)} \le a_{f(3)} \le \cdots \le a_{f(n-1)} \le a_{f(n)}$.
A Constraint: Comparison Based Sorting While we are rearranging the items, we will only use queries of the form $a_i \le a_j$ or variants thereof (<, > and so forth).
Say What? The important point here is that the algorithm can only make comparisons such as if( a[i] < a[j] ) … We are not allowed to look at pieces of the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare the most significant digits.
An Easy Upper Bound Here is a simple idea to sort an array: a flip is a position i in the array where two adjacent elements are out of order, that is, a[i] > a[i+1]. Let’s look for a flip and correct it by swapping the two elements.
A Prototype Algorithm
    // FlipSort
    while( there is a flip )
        pick one, fix it
Is this algorithm guaranteed to terminate? If so, what can we say about its running time? Is it correct, i.e., is the array sorted?
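The prototype can be sketched in Python as follows. This is only an illustration under assumptions: the names `find_flip` and `flip_sort` are made up here, and "pick one" is resolved by always taking the leftmost flip, which the slide does not prescribe.

```python
def find_flip(a):
    """Return the leftmost index i with a[i] > a[i+1], or None if there is no flip."""
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:
            return i
    return None


def flip_sort(a):
    """Sort the list a in place by repeatedly fixing a flip.

    Returns the number of swaps performed.
    """
    swaps = 0
    while True:
        i = find_flip(a)
        if i is None:          # no flip left: the array is sorted
            return swaps
        a[i], a[i + 1] = a[i + 1], a[i]
        swaps += 1
```

Rescanning from the left after every swap is deliberately naive; it makes the questions about termination and running time concrete rather than answering them efficiently.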
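Termination while( there is a flip ) pick one, fix it It’s tempting to do induction on the number of flips, but beware: fixing the flip in 10 15 5 10 gives 10 5 15 10, which has two flips instead of one. We need to talk about inversions instead: pairs i < j with a[i] > a[j]. Fixing a flip removes exactly one inversion.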
Flips and Inversions [Figure: the array 24 47 13 99 105 222 with a flip and an inversion marked.]
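To make the distinction concrete, here is a small Python sketch that counts both quantities by brute force. The function names are illustrative, and the quadratic inversion count is chosen for clarity, not speed.

```python
def count_flips(a):
    """Number of positions i with a[i] > a[i+1] (adjacent pairs out of order)."""
    return sum(1 for i in range(len(a) - 1) if a[i] > a[i + 1])


def count_inversions(a):
    """Number of pairs i < j with a[i] > a[j] (any pairs out of order)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


before = [10, 15, 5, 10]
after = [10, 5, 15, 10]    # the flip 15 > 5 has been swapped

# Flips go up from 1 to 2, while inversions go down from 3 to 2.
print(count_flips(before), count_flips(after))
print(count_inversions(before), count_inversions(after))
```

Every flip is an inversion, but not the other way round, which is why the inversion count is the right measure of progress.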
Running Time The total number of inversions is at most n(n-1)/2, so clearly quadratic at most. So we can sort in quadratic time if we can manage to find and fix a flip in constant time. We need to organize the search somehow. Probably should try to avoid recomputation.
Naïve sorting algorithms • Bubble Sort • Selection Sort • Insertion Sort (this one is actually important) All are quadratic in the worst case and on average.
Bubble Sort Scan through the array, fix flips as you go along. Repeat until array is sorted.
    for( i = 2; i <= n; i++ )
        for( j = n; j >= i; j-- )
            if( A[j-1] > A[j] )
                swap A[j-1] and A[j];
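A runnable version of this loop, translated to Python's 0-based indexing, might look like the sketch below. The name `bubble_sort` is an assumption; the loop structure follows the slide.

```python
def bubble_sort(a):
    """Sort the list a in place and return it.

    Pass i scans from the right end down to position i, swapping
    adjacent out-of-order pairs, so the smallest remaining element
    ends up at index i - 1.
    """
    n = len(a)
    for i in range(1, n):
        for j in range(n - 1, i - 1, -1):   # j = n-1, ..., i
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a
```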
Selection Sort For k = n, n-1, … find the smallest element in the last k elements of the array and swap it to the front.
    for( i = 1; i <= n-1; i++ )
        find A[j] minimal in A[i..n]
        swap with A[i]
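The same idea as a short Python sketch with 0-based indices. The function name is illustrative; the minimum is found with a linear scan, as the slide suggests.

```python
def selection_sort(a):
    """Sort the list a in place and return it.

    For each position i, find the index of a minimal element in a[i:]
    and swap that element to position i.
    """
    n = len(a)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):       # linear scan for the minimum
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a
```

Note that the number of comparisons is always quadratic, regardless of the input order, although only n - 1 swaps are performed.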
Insertion Sort Place the ith element in its proper place in the already sorted list of the first i-1 elements.
    for i = 2 to n do
        order-insert a[i] in a[1:i-1]
Can be implemented nicely.
Insertion Sort Using a sentinel in A[0].
    for( i = 2; i <= n; i++ ) {
        x = A[i];
        A[0] = x;    // sentinel: stops the inner loop at j = 1
        for( j = i; x < A[j-1]; j-- )
            A[j] = A[j-1];
        A[j] = x;
    }
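Here is a Python sketch of the sentinel trick. As an assumption for illustration, the input is copied into a list with an extra slot at index 0, so that the data occupy positions 1..n as on the slide; the name `insertion_sort_sentinel` is made up.

```python
def insertion_sort_sentinel(items):
    """Return a sorted copy of items using insertion sort with a sentinel."""
    a = [None] + list(items)    # a[0] is the sentinel slot, data in a[1..n]
    n = len(items)
    for i in range(2, n + 1):
        x = a[i]
        a[0] = x                # x < a[0] is false, so the loop stops at j = 1
        j = i
        while x < a[j - 1]:     # no explicit bounds check needed
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a[1:]
```

The sentinel saves one index comparison per inner-loop iteration, at the cost of reserving one extra array cell.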
Insertion sort [Figure: successive states of a six-element array (13, 30, 47, 99, 105, 222) during insertion sort, with the sorted sublist marked as it grows.]
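How fast is insertion sort? It takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with. Example: 3 2 1 6 5 4 9 8 7 … has only n inversions (three per block of three), so insertion sort handles it in linear time.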
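The link between running time and inversions can be checked with a small Python sketch that counts how many elements are shifted. The name `insertion_sort_shifts` is illustrative, and this version uses a bounds check instead of a sentinel.

```python
def insertion_sort_shifts(items):
    """Return (sorted copy of items, number of element shifts performed).

    Each shift moves one element one position to the right and removes
    exactly one inversion, so the shift count equals the inversion count.
    """
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = x
    return a, shifts


result, shifts = insertion_sort_shifts([3, 2, 1, 6, 5, 4, 9, 8, 7])
print(result, shifts)   # 9 elements, 9 shifts
```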
How long does it take to sort? • Can we do better than O(n^2)? • In the worst case? • In the average case?
Sorting in O(n log n) O(n log n) turns out to be a Magic Wall: it is hard to reach, and exceedingly hard to break through. In fact, it’s impossible in a sense to do better than O(n log n). We already know that Heapsort will give us this bound: - build the heap in linear time, - destroy it in O(n log n).
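The two phases can be sketched with Python's standard `heapq` module. This is an assumption-laden illustration: `heapq` provides a min-heap and the sketch returns a new list, whereas a textbook heapsort typically works in place with a max-heap.

```python
import heapq


def heap_sort(items):
    """Return a sorted copy of items using a binary heap."""
    heap = list(items)
    heapq.heapify(heap)                 # phase 1: build the heap in O(n)
    # Phase 2: destroy the heap with n deletions of O(log n) each.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```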
Heapsort in practice • The average-case analysis for heapsort is somewhat complex. • In practice, heapsort consistently tends to use nearly n log n comparisons. • So, while the worst case is better than n^2, other algorithms sometimes work better.
Shellsort Shellsort, like insertion sort, is based on swapping inverted pairs. With a suitable increment sequence it achieves O(n^{4/3}) running time. [See your book for details.]
Shellsort • Example with increment sequence 3, 1. [Figure: successive states of a six-element array (13, 30, 47, 99, 105, 222) under Shellsort.] Several inverted pairs are fixed in one exchange.
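A minimal Python sketch of Shellsort with an explicit increment sequence follows. The function name and the default `gaps=(3, 1)` are illustrative assumptions; the only requirement is that the sequence ends with 1.

```python
def shell_sort(a, gaps=(3, 1)):
    """Sort the list a in place and return it.

    For each gap, run an insertion sort on the elements that are `gap`
    positions apart. The final gap must be 1, which is a plain
    insertion sort on an array that is already nearly sorted.
    """
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]       # one move can fix several inversions
                j -= gap
            a[j] = x
    return a
```

Moving an element across a gap of 3 jumps it over two intermediate elements, which is how a single exchange can remove more than one inversion.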
Recursive sorting • Intuitively, divide the problem into pieces and then recombine the results. • If array is length 1, then done. • If array is length N>1, then split in half and sort each half. • Then combine the results. • An example of a divide-and-conquer algorithm.
Why divide-and-conquer works • Suppose the amount of work required to divide and recombine is linear, that is, O(n). • Suppose also that the amount of work needed to solve a problem of size n directly grows faster than linearly. • Then each dividing step reduces the remaining work by more than a linear amount, while requiring only linear work to do so.
Divide-and-conquer is big • We will see several examples of divide-and-conquer in this course.
Recursive Sorting • If array is length 1, then done. • Otherwise, split into two smaller pieces. • Sort each piece. • Combine the sorted pieces.
Two Major Approaches • 1. Make the split trivial, but perform some work when the pieces are combined: Merge Sort. • 2. Work during the split, but then do nothing in the combination step: Quick Sort. • In either case, the overhead should be linear, with small constants.
Analysis The analysis is relatively easy if the two pieces have (approximately) the same size. This is the case for Merge Sort, but not for Quick Sort. Let’s ignore the second case for the time being.
Recurrence Equations We need to deal with equations of the form T(1) = 1, T(n) = 2 T(n/2) + f(n). Here f(n) is the non-recursive overhead. There are two recursive calls, each to a sub-instance of the same size n/2. Of course, there are other cases to consider.
Recurrence Equations A slight generalization is T(1) = 1, T(n) = a T(n/b) + f(n). Here f(n) is again the non-recursive overhead. There are a recursive calls, each to a sub-instance of size n/b.
Recurrence Equations Of course, we’re cheating: T(1) = 1, T(n) = a T(n/b) + f(n) makes no sense unless b divides n (and, for the recursion to bottom out at 1, unless n is a power of b). Let’s just ignore this. In reality there are ceilings and floors and continuity arguments everywhere.
The Algorithm Merging the two sorted parts here is responsible for the overhead. Here a A denotes the list with first element a and remainder A.
    merge( nil, B ) = B;
    merge( A, nil ) = A;
    merge( a A, b B ) =
        if( a <= b ) prepend( merge( A, b B ), a )
        else prepend( merge( a A, B ), b )
The Algorithm The main function.
    List MergeSort( List L ) {
        if( length(L) <= 1 ) return L;
        A = first half of L;
        B = second half of L;
        return merge( MergeSort(A), MergeSort(B) );
    }
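The list-based algorithm translates almost directly into Python. In this sketch, `merge` is written as a loop rather than recursively, purely to avoid Python's recursion limit on long lists; the names `merge` and `merge_sort` follow the slides, everything else is an assumption.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list (stable)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:            # ties go to the left list: stability
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])               # at most one of these is non-empty
    out.extend(b[j:])
    return out


def merge_sort(lst):
    """Return a sorted copy of lst."""
    if len(lst) <= 1:
        return list(lst)
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))
```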
Harsh Reality In reality, the items are always given in an array. The first and second half can be found by index arithmetic. [Figure: an array with its left (L) and right (R) parts marked.]
But Note … We cannot easily perform the merge operation in place. Rather, we need to have another array as scratch space. The total space requirement for Merge Sort is 2n + O(log n), assuming the recursive implementation: n for the data, n for the scratch array, and O(log n) for the recursion stack.
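An array-based version with a single scratch array could be sketched as follows. The name `merge_sort_array` and the half-open index convention `a[lo:hi]` are assumptions made for this illustration.

```python
def merge_sort_array(a):
    """Sort the list a in place (using one scratch array) and return it."""
    scratch = [None] * len(a)           # the extra n cells

    def sort(lo, hi):
        """Sort the slice a[lo:hi]; the halves are found by index arithmetic."""
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        # Merge a[lo:mid] and a[mid:hi] into scratch[lo:hi].
        i, j, k = lo, mid, lo
        while i < mid and j < hi:
            if a[i] <= a[j]:
                scratch[k] = a[i]
                i += 1
            else:
                scratch[k] = a[j]
                j += 1
            k += 1
        while i < mid:
            scratch[k] = a[i]
            i += 1
            k += 1
        while j < hi:
            scratch[k] = a[j]
            j += 1
            k += 1
        for t in range(lo, hi):         # copy the merged run back
            a[t] = scratch[t]

    sort(0, len(a))
    return a
```

The recursion depth is about log2(n), which accounts for the O(log n) term in the space bound.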
Running Time Solving the recurrence equation for Merge Sort, one can see that the running time is O(n log n). Since Merge Sort reads the data strictly sequentially, it is sometimes useful when data reside on slow external media. But overall it is no match for Quick Sort.
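For reference, here is one way to solve the recurrence by unrolling it. The derivation assumes that the merge overhead is exactly f(n) = n and that n = 2^k is a power of two; both are simplifications in the spirit of the "cheating" mentioned earlier.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + n \\
     &= 4\,T(n/4) + 2n \\
     &= \cdots \\
     &= 2^k\,T(n/2^k) + k\,n \\
     &= n\,T(1) + n \log_2 n \qquad (n = 2^k) \\
     &= n \log_2 n + n \;=\; O(n \log n).
\end{align*}
```

Each of the log_2 n levels of the recursion contributes a total overhead of n, and the n base cases contribute one unit each.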
Quicksort • Quicksort was invented in 1960 by Tony Hoare. • Although it has O(N^2) worst-case performance, on average it is O(N log N). • More importantly, it is the fastest known comparison-based sorting algorithm in practice.
Quicksort idea • Choose a pivot. • Rearrange so that pivot is in the “right” spot. • Recurse on each half and conquer!
Quicksort algorithm • If array A has 1 (or 0) elements, then done. • Choose a pivot element x from A. • Divide A-{x} into two arrays: • B = {y ∈ A-{x} | y ≤ x} • C = {y ∈ A-{x} | y > x} • Quicksort arrays B and C. • Result is B+{x}+C.
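The steps above can be sketched almost literally in Python. Two assumptions are made for illustration: the pivot is simply the first element, and elements equal to the pivot go into B. The slide leaves the pivot choice open.

```python
def quicksort(a):
    """Return a sorted copy of the list a."""
    if len(a) <= 1:
        return list(a)
    x, rest = a[0], a[1:]               # pivot x and A - {x}
    b = [y for y in rest if y <= x]     # elements no larger than the pivot
    c = [y for y in rest if y > x]      # elements larger than the pivot
    return quicksort(b) + [x] + quicksort(c)
```

This version allocates new lists at every level, so it shows the structure of the algorithm rather than the in-place partitioning that makes Quicksort fast in practice.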
Quicksort algorithm [Figure: quicksort on the array 105 47 13 17 30 222 5 19. Partitioning around the pivot 19 gives 5 17 13 and 47 30 222 105; these are partitioned around 13 and 47 in turn, and finally 222 105 becomes 105 222.] In practice, insertion sort is used once the arrays get “small enough”.
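A hybrid of this kind might be sketched as follows. Everything specific here is an assumption for illustration: the threshold `CUTOFF = 8`, the choice of the last element as pivot, and the Lomuto-style partition; the slides do not specify any of them.

```python
CUTOFF = 8   # hypothetical threshold for "small enough"; tune by measurement


def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place by insertion sort."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x


def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    k = lo
    for i in range(lo, hi):
        if a[i] <= pivot:
            a[i], a[k] = a[k], a[i]
            k += 1
    a[k], a[hi] = a[hi], a[k]
    return k


def hybrid_quicksort(a, lo=0, hi=None):
    """Sort the list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:           # small subarray: switch algorithms
        insertion_sort_range(a, lo, hi)
        return a
    p = partition(a, lo, hi)
    hybrid_quicksort(a, lo, p - 1)
    hybrid_quicksort(a, p + 1, hi)
    return a
```

With a fixed last-element pivot, already sorted input triggers the quadratic worst case; a production version would pick the pivot more carefully.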