Big-O and Sorting February 6, 2006
Administrative Stuff • Readings for today: Ch 7.3-7.5 • Readings for tomorrow: Ch 8
Sorting! • Very common to need data in order • Viewing, printing • Faster to search, find min/max, compute median/mode, etc. • Lots of different sorting algorithms • From the simple to very complex • Some optimized for certain situations (lots of duplicates, almost sorted, etc.) • Typically sort arrays, but algorithms usually can be adapted for other data structures (e.g. linked lists)
Selection sort • Sort by "selecting" smallest and putting in front • Search entire array for minimum value • Min is placed in first slot • Could move elements over to make space, but faster to just swap with current first • Repeat for second smallest, third, and so on
Selection sort code
void SelectionSort(int arr[], int n) {
    for (int i = 0; i < n-1; i++) {
        int minIndex = i;
        for (int j = i+1; j < n; j++) {
            if (arr[j] < arr[minIndex]) minIndex = j;
        }
        Swap(arr[i], arr[minIndex]);
    }
}
Analyzing selection sort
for (int i = 0; i < n-1; i++) {
    int minIndex = i;
    for (int j = i+1; j < n; j++) {
        if (arr[j] < arr[minIndex]) minIndex = j;
    }
    Swap(arr[i], arr[minIndex]);
}
• Count statements
• First time inner loop N-1 comparisons
• N-2 second time, then N-3, …
• Last iteration 1 comparison
Analyzing selection sort
• Total comparisons: (N-1) + (N-2) + (N-3) + … + 3 + 2 + 1
• "Gaussian sum"
• Add sum to self, written in reverse order, so each column adds up to N:
$$
\begin{aligned}
S  &= (N-1) + (N-2) + (N-3) + \cdots + 2 + 1 \\
S  &= 1 + 2 + 3 + \cdots + (N-2) + (N-1) \\
2S &= \underbrace{N + N + N + \cdots + N + N}_{N-1 \text{ terms}} = (N-1)N \\
S  &= \tfrac{1}{2}(N-1)N = O(N^2)
\end{aligned}
$$
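The count can be checked empirically. The following is a minimal sketch, not code from the slides: `CountingSelectionSort` and `GaussianSum` are hypothetical names, and `Swap` is assumed to simply exchange two ints, since the slides call it without showing it.

```cpp
// Assumed helper: the slides call Swap but never define it.
void Swap(int &a, int &b) { int t = a; a = b; b = t; }

// Same loops as SelectionSort, but also returns the number of
// element comparisons performed (hypothetical instrumented variant).
long CountingSelectionSort(int arr[], int n) {
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;
        for (int j = i + 1; j < n; j++) {
            comparisons++;  // one comparison per inner-loop pass
            if (arr[j] < arr[minIndex]) minIndex = j;
        }
        Swap(arr[i], arr[minIndex]);
    }
    return comparisons;
}

// Closed form of (N-1) + (N-2) + ... + 2 + 1.
long GaussianSum(long n) { return (n - 1) * n / 2; }
```

For a 10-element array both functions give 45, regardless of the initial order of the elements: selection sort always scans the whole unsorted remainder.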
Quadratic growth
• In clock time

  Input size   Time
  10,000       3 sec
  20,000       13 sec
  50,000       77 sec
  100,000      5 min

• Double input -> 4X time
• Feasible for small inputs, quickly unmanageable
• Halve input -> 1/4 time
• Hmm…
• If two sorted half-size arrays, how to produce sorted full array?
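The "double input, 4X time" rule can be turned into a rough predictor. This is a sketch under the assumption that running time is purely quadratic and scaled from one measured point; `PredictQuadraticSeconds` is a hypothetical name, not something from the slides.

```cpp
// Predicted running time of an O(N^2) algorithm at input size n,
// scaled from one measurement: baseN items took baseSeconds.
double PredictQuadraticSeconds(double baseN, double baseSeconds, double n) {
    double ratio = n / baseN;             // how many times larger the input is
    return baseSeconds * ratio * ratio;   // time grows with the square of that
}
```

Starting from the measured 3 sec at 10,000 items, this predicts 12 sec at 20,000 (measured: 13), 75 sec at 50,000 (measured: 77), and 300 sec at 100,000 (measured: 5 min), so the timings track the quadratic model closely.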
Mergesort • "Divide and conquer" algorithm • Divide array in half • Recursively sort each half • Merge two halves together • "Easy-split hard-join" • No complex decision about which goes where, just divide in middle • Merge step preserves ordering from each half
6 2 8 5 1 9 3 7 4 10
void MergeSort(int array[], int n) {
    if (n > 1) {
        int n1 = n/2;
        int n2 = n - n1;
        int *arr1 = CopySubArray(array, 0, n1);
        int *arr2 = CopySubArray(array, n1, n2);
        MergeSort(arr1, n1);
        MergeSort(arr2, n2);
        Merge(array, arr1, n1, arr2, n2);
        delete[] arr1;
        delete[] arr2;
    }
}
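CopySubArray
// Create a new array in memory holding the n elements of arr
// that begin at index start. The caller must delete[] the result.
int *CopySubArray(int arr[], int start, int n) {
    int *copy = new int[n];
    for (int i = 0; i < n; i++) {
        copy[i] = arr[start + i];
    }
    return copy;
}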
Merge code
void Merge(int array[], int arr1[], int n1, int arr2[], int n2) {
    int p, p1, p2;
    p = p1 = p2 = 0;
    while (p1 < n1 && p2 < n2) {        // Merge until hit end of one array
        if (arr1[p1] < arr2[p2]) {
            array[p++] = arr1[p1++];
        } else {
            array[p++] = arr2[p2++];
        }
    }
    while (p1 < n1) {                   // Merge rest of remaining array
        array[p++] = arr1[p1++];
    }
    while (p2 < n2) {
        array[p++] = arr2[p2++];
    }
}
Example: arr1 = 4 7 8 12 (n1 = 4), arr2 = 5 9 16 18 (n2 = 4), merged into array starting at p = 0
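To see the merge step on concrete data, here is a small sketch that runs it on the two sorted halves shown in the slide's diagram. `MergeSlideExample` is a hypothetical wrapper added for illustration; `Merge` is repeated so the block stands alone.

```cpp
// Merge two sorted arrays into `array`, which must have room for n1 + n2 ints.
void Merge(int array[], int arr1[], int n1, int arr2[], int n2) {
    int p = 0, p1 = 0, p2 = 0;
    while (p1 < n1 && p2 < n2) {
        if (arr1[p1] < arr2[p2]) {
            array[p++] = arr1[p1++];
        } else {
            array[p++] = arr2[p2++];
        }
    }
    while (p1 < n1) array[p++] = arr1[p1++];  // leftovers from arr1
    while (p2 < n2) array[p++] = arr2[p2++];  // leftovers from arr2
}

// Hypothetical demo: merges the halves from the slide diagram into out.
void MergeSlideExample(int out[8]) {
    int arr1[] = {4, 7, 8, 12};
    int arr2[] = {5, 9, 16, 18};
    Merge(out, arr1, 4, arr2, 4);
}
```

After the call, `out` holds 4 5 7 8 9 12 16 18. One design note: with `<`, equal elements are taken from `arr2` first. That makes no difference for plain ints, but when sorting records by key, changing the test to `<=` would keep equal keys in their original order.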
Merge sort analysis
void MergeSort(int array[], int n) {
    if (n > 1) {
        int n1 = n/2;
        int n2 = n - n1;
        int *arr1 = CopySubArray(array, 0, n1);
        int *arr2 = CopySubArray(array, n1, n2);
        MergeSort(arr1, n1);
        MergeSort(arr2, n2);
        Merge(array, arr1, n1, arr2, n2);
        delete[] arr1;
        delete[] arr2;
    }
}
Merge sort analysis
• MS(N) does N work itself, plus MS(N/2) + MS(N/2)

  Top level:     N                            = N
  Next level:    N/2 + N/2                    = N
  Next level:    N/4 + N/4 + N/4 + N/4        = 4*N/4 = N
  Next level:    N/8 + N/8 + … (8 pieces)     = 8*N/8 = N
  ...

• Each level contributes N
Merge sort analysis
• Recursion tree: MS(N), then MS(N/2) MS(N/2), then four pieces of N/4, eight pieces of N/8, …, down to pieces of size N/2^K
• K levels
• N/2^K = 1
• N = 2^K
• lg N = K
• lg N levels * N per level = O(N lg N)
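The "lg N levels times N per level" claim can be checked by counting the work the merge steps do. This is a sketch only: `CountingMergeSort` is a hypothetical instrumented variant, and it uses `std::vector` rather than the slides' raw arrays to keep the example short.

```cpp
#include <vector>

// Hypothetical instrumented merge sort: sorts v and returns the total
// number of element writes performed by all merge steps.
long CountingMergeSort(std::vector<int> &v) {
    int n = static_cast<int>(v.size());
    if (n <= 1) return 0;  // nothing to merge at the bottom level

    // Copy each half, then sort the copies recursively.
    std::vector<int> left(v.begin(), v.begin() + n / 2);
    std::vector<int> right(v.begin() + n / 2, v.end());
    long writes = CountingMergeSort(left) + CountingMergeSort(right);

    // Merge the two sorted halves back into v, counting each write.
    std::size_t p = 0, p1 = 0, p2 = 0;
    while (p1 < left.size() && p2 < right.size()) {
        if (left[p1] < right[p2]) {
            v[p++] = left[p1++];
        } else {
            v[p++] = right[p2++];
        }
        writes++;
    }
    while (p1 < left.size()) { v[p++] = left[p1++]; writes++; }
    while (p2 < right.size()) { v[p++] = right[p2++]; writes++; }
    return writes;
}
```

For N = 8 the function returns 24, which is 8 writes on each of lg 8 = 3 levels. When N is not a power of two the count is no longer exactly N lg N, but it stays within the same O(N lg N) bound.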
In clock time
• Compare SelectionSort to MergeSort

  Input size   SelectionSort   MergeSort
  10,000       3 sec           .05 sec
  20,000       13 sec          .15 sec
  50,000       78 sec          .38 sec
  100,000      5 min           .81 sec
  200,000      20 min          1.7 sec
  1,000,000    8 hrs (est)     9 sec

• O(NlgN) is looking pretty good! But can we do even better?
Can we do even better than MergeSort? • O(N log N) is the best possible running time for a comparison-based sort in the general case • So, theoretically, answer is “no” • But, we can come up with a different sort, O(N log N) on average, that is practically faster • Want to avoid overhead of creating new arrays (as is done in MergeSort) • Bring on the QuickSort!
5 3 7 4 8 6 2 1 Quicksort
Recursive Insight
• Start: 5 3 7 4 8 6 2 1
• Select “pivot” (here the first element, 5)
• Partition array so:
  • everything smaller than pivot is on left
  • everything greater than or equal to pivot is on right
  • pivot is in-between
• After partition: 2 3 1 4 5 6 8 7
• Now recursive sort “red” sub-array (left of the pivot): 1 2 3 4 5 6 8 7
• Then, recursive sort “blue” sub-array (right of the pivot): 1 2 3 4 5 6 7 8
• Everything is sorted!
void Quicksort(int arr[], int n) {
    if (n < 2) return;
    int boundary = Partition(arr, n);
    // Sort subarray up to pivot
    Quicksort(arr, boundary);
    // Sort subarray after pivot to end
    Quicksort(arr + boundary + 1, n - boundary - 1);
}
• “boundary” is the index of the pivot
• This is equal to the number of elements before pivot
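The second recursive call avoids copying by passing a pointer into the middle of the same array. A minimal sketch of that idea, using hypothetical helpers `Fill` and `FillAfterBoundary` that are not part of the slides:

```cpp
// Hypothetical helper: sets every element of an n-element array to value.
void Fill(int arr[], int n, int value) {
    for (int i = 0; i < n; i++) arr[i] = value;
}

// Treats the elements after index `boundary` as an array of their own,
// the same way Quicksort(arr + boundary + 1, n - boundary - 1) does.
void FillAfterBoundary(int arr[], int n, int boundary, int value) {
    Fill(arr + boundary + 1, n - boundary - 1, value);
}
```

With the 8-element array 2 3 1 4 5 6 8 7 and boundary 4, only the last three slots (indices 5 to 7) are touched. The callee sees them as indices 0 to 2 of a 3-element array, and no new memory is allocated.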
int Partition(int arr[], int n) {
    int lh = 1, rh = n - 1;
    int pivot = arr[0];
    while (true) {
        while (lh < rh && arr[rh] >= pivot) rh--;
        while (lh < rh && arr[lh] < pivot) lh++;
        if (lh == rh) break;
        Swap(arr[lh], arr[rh]);
    }
    if (arr[lh] >= pivot) return 0;
    Swap(arr[0], arr[lh]);
    return lh;
}
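Putting the pieces together gives something that can be compiled and tried. This is a sketch that repeats the slide code unchanged and adds one assumption: `Swap` exchanges two ints by reference (the slides use it without defining it).

```cpp
// Assumed helper: the slides call Swap but never define it.
void Swap(int &a, int &b) { int t = a; a = b; b = t; }

// Partition around arr[0]; returns the pivot's final index.
int Partition(int arr[], int n) {
    int lh = 1, rh = n - 1;
    int pivot = arr[0];
    while (true) {
        while (lh < rh && arr[rh] >= pivot) rh--;  // skip large values on the right
        while (lh < rh && arr[lh] < pivot) lh++;   // skip small values on the left
        if (lh == rh) break;
        Swap(arr[lh], arr[rh]);                    // both are on the wrong side
    }
    if (arr[lh] >= pivot) return 0;  // nothing is smaller than the pivot
    Swap(arr[0], arr[lh]);           // drop the pivot between the two groups
    return lh;
}

void Quicksort(int arr[], int n) {
    if (n < 2) return;
    int boundary = Partition(arr, n);
    Quicksort(arr, boundary);
    Quicksort(arr + boundary + 1, n - boundary - 1);
}
```

On the example array 5 3 7 4 8 6 2 1, `Partition` returns 4 and leaves 2 3 1 4 5 6 8 7, the same partitioned state shown on the Recursive Insight slides. A subsequent `Quicksort` call finishes with 1 2 3 4 5 6 7 8.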
Partition walkthrough (same Partition code, animated on the example array)
• Start: 5 3 7 4 8 6 2 1, pivot = arr[0] = 5, lh = 1, rh = 7
• rh stays at 7 (1 is smaller than pivot); lh advances to 2 (7 is not smaller than pivot)
• Swap arr[lh] and arr[rh]: 5 3 1 4 8 6 2 7
• rh moves left to 6 (2 is smaller than pivot); lh advances to 4 (8 is not smaller than pivot)
• Swap arr[lh] and arr[rh]: 5 3 1 4 2 6 8 7