Sorting algorithms explained with examples of selection sort and insertion sort, highlighting their efficiency in organizing data for quick retrieval through computer programs. Learn the concepts and implementation details to optimize data sorting processes.
Motivation • Example: Phone Book Searching • If the phone book were in random order, we would probably never use the phone! • Potentially, we would need to look at every entry in the book • Let’s say ½ second per entry • There are 72,000 households in Ilam • 36,000 seconds = 10 hrs to find a phone number! • The next call could potentially take another 10+ hours • We might get lucky and the number we are looking for is the 1st number – ½ second total • On average, though, the search would cost far more time than the phone call itself (average time is about 5 hrs)
Motivation • Because we know the phone book is sorted: • Can use heuristics to facilitate our search • Jump directly to the letter of the alphabet we are interested in using • Scan quickly to find the first two letters that are really close to the name we are interested in • Flip whole pages at a time if not close enough • Every time we pick up the phone book, know it will take a short time to find what we are looking for
The Big Idea • Take a set of N randomly ordered pieces of data a_j and rearrange them so that the relational operator R holds between every adjacent pair (a_j R a_(j+1) for all j with 0 <= j < N-1): a_0 R a_1 R a_2 R … R a_(N-2) R a_(N-1) • If R is <=, we are doing an ascending sort – each consecutive item in the list is at least as large as the previous one • If R is >=, we are doing a descending sort – items get smaller (or stay equal) as we move down the list
Sorting Algorithms • What does sorting really require? • Really a repeated two step process • Compare pieces of data at different positions • Swap the data at those positions until order is correct • For large groups of data, • Implementation by a computer program would be very useful • Computers can do the data management (compare and swap) faster than people.
Example • Unsorted: 20 3 18 9 5 • Sorted: 3 5 9 18 20
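Selection Sort
#include <utility>  // std::swap

// Returns the index of the smallest element in a[first .. last-1].
int minimumIndex(int* a, int first, int last) {
    int minIndex = first;
    for (int j = first + 1; j < last; j++) {
        if (a[j] < a[minIndex]) minIndex = j;
    }
    return minIndex;
}

// Sorts a[0 .. size-1] in ascending order.
void selectionSort(int* a, int size) {
    for (int k = 0; k < size - 1; k++) {
        int index = minimumIndex(a, k, size);  // smallest remaining element
        std::swap(a[k], a[index]);             // move it to the front
    }
}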
Selection Sort • What is selection sort doing? • Repeatedly • Finding smallest element by searching through list • Inserting at front of list • Moving “front of list” forward by 1
Selection Sort Step Through
Start: 20 3 18 9 5
k = 0: minimumIndex(a, 0, 5) = 1, swap(a[0], a[1]) => 3 20 18 9 5
k = 1: minimumIndex(a, 1, 5) = 4, swap(a[1], a[4]) => 3 5 18 9 20
k = 2: minimumIndex(a, 2, 5) = 3, swap(a[2], a[3]) => 3 5 9 18 20
k = 3: minimumIndex(a, 3, 5) = 3, swap(a[3], a[3]) => 3 5 9 18 20
k = 4 = size-1: Done!
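The passes in the step-through can be reproduced with a short program. This is a minimal sketch, assuming `std::vector<int>` in place of the raw `int*` used on the slides; `selectionSortTrace` is a hypothetical helper introduced only to record the array after each pass.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Index of the smallest element in a[first .. last-1].
std::size_t minimumIndex(const std::vector<int>& a, std::size_t first, std::size_t last) {
    assert(first < last && last <= a.size());
    std::size_t minIndex = first;
    for (std::size_t j = first + 1; j < last; ++j) {
        if (a[j] < a[minIndex]) minIndex = j;
    }
    return minIndex;
}

// Selection sort that also records the array after every outer-loop pass.
// (Hypothetical tracing helper, not part of the original slides.)
std::vector<std::vector<int>> selectionSortTrace(std::vector<int>& a) {
    std::vector<std::vector<int>> passes;
    for (std::size_t k = 0; k + 1 < a.size(); ++k) {
        std::swap(a[k], a[minimumIndex(a, k, a.size())]);
        passes.push_back(a);  // snapshot after pass k
    }
    return passes;
}
```

Calling `selectionSortTrace` on `{20, 3, 18, 9, 5}` should yield four snapshots, one per value of k from 0 to 3, ending with the sorted array.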
Cost of Selection Sort
// Returns the index of the smallest element in a[first .. last-1].
int minimumIndex(int* a, int first, int last) {
    int minIndex = first;
    for (int j = first + 1; j < last; j++) {
        if (a[j] < a[minIndex]) minIndex = j;
    }
    return minIndex;
}

void selectionSort(int* a, int size) {
    for (int k = 0; k < size - 1; k++) {
        int index = minimumIndex(a, k, size);
        std::swap(a[k], a[index]);  // needs #include <utility>
    }
}
Cost of Selection Sort • How many times through the outer loop? • Iteration is for k = 0 to < (N-1) => N-1 times • How many comparisons in minimumIndex? • Depends on the outer loop – consider 5 elements: • k = 0: j = 1, 2, 3, 4 • k = 1: j = 2, 3, 4 • k = 2: j = 3, 4 • k = 3: j = 4 • Total comparisons is equal to 4 + 3 + 2 + 1, which is (N-1) + (N-2) + (N-3) + … + 1 • What is that sum?
Cost of Selection Sort
(N-1) + (N-2) + (N-3) + … + 3 + 2 + 1
Pair the largest remaining term with the smallest: [(N-1) + 1] + [(N-2) + 2] + [(N-3) + 3] + … = N + N + N + … => repeated addition of N
How many repeated additions? There were N-1 terms to add and we grouped them in pairs, so there are (N-1)/2 additions of N
=> N(N-1)/2, approximately N * N/2 = O(N^2) comparisons
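The closed form for the sum can be checked empirically. The sketch below assumes `std::vector<int>` and a hypothetical function name, `selectionSortComparisons`; it inlines the minimum search so that every element comparison can be counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort instrumented to count element comparisons.
// (selectionSortComparisons is a hypothetical name used only for this sketch.)
std::size_t selectionSortComparisons(std::vector<int>& a) {
    std::size_t comparisons = 0;
    for (std::size_t k = 0; k + 1 < a.size(); ++k) {
        std::size_t minIndex = k;
        for (std::size_t j = k + 1; j < a.size(); ++j) {
            ++comparisons;  // exactly one comparison per inner-loop iteration
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[k], a[minIndex]);
    }
    assert(std::is_sorted(a.begin(), a.end()));
    return comparisons;
}
```

For 5 elements the count should be 4 + 3 + 2 + 1 = 10, and for 100 elements 100 * 99 / 2 = 4950, regardless of the initial order, since the inner loop never exits early.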
Insertion Sort
void insertionSort(int* a, int size) {
    for (int k = 1; k < size; k++) {
        int temp = a[k];   // new item to place
        int position = k;
        while (position > 0 && a[position-1] > temp) {
            a[position] = a[position-1];  // shift larger item right by 1
            position--;
        }
        a[position] = temp;  // insert the new item
    }
}
Insertion Sort • List of size 1 (first element) is already sorted • Repeatedly • Chooses new item to place in list (a[k]) • Starting at back of the list, if new item is less than item at current position, shift current data right by 1. • Repeat shifting until new item is not less than thing in front of it. • Insert the new item
Insertion Sort • Insertion sort is the classic card-playing sort. • Pick up cards one at a time. • A single card is already sorted. • When you select a new card, slide it in at the appropriate point by shifting everything after it to the right. • Easy for humans because shifts are “free”, whereas a computer has to move the data.
Insertion Sort Step Through
Positions: A[0] A[1] A[2] A[3] A[4] (sorted part is left of the bar)
Start: 20 | 3 18 9 5 (single card list already sorted)
Move 3 left until it hits something smaller => 3 20 | 18 9 5 (now two sorted)
Move 18 left until it hits something smaller => 3 18 20 | 9 5 (now three sorted)
Move 9 left until it hits something smaller => 3 9 18 20 | 5 (now four sorted)
Move 5 left until it hits something smaller => 3 5 9 18 20 (now all five sorted, done)
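The same walk-through can be generated by code. This is a sketch under the assumption that `std::vector<int>` stands in for the slides' `int*`; `insertionSortTrace` is a hypothetical helper that snapshots the array after each insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort that records the array after each item is inserted.
// (Hypothetical tracing helper, not part of the original slides.)
std::vector<std::vector<int>> insertionSortTrace(std::vector<int>& a) {
    std::vector<std::vector<int>> passes;
    for (std::size_t k = 1; k < a.size(); ++k) {
        int temp = a[k];
        std::size_t position = k;
        while (position > 0 && a[position - 1] > temp) {
            a[position] = a[position - 1];  // shift larger item right by 1
            --position;
        }
        a[position] = temp;
        passes.push_back(a);  // snapshot: first k+1 items are now sorted
    }
    assert(a.empty() || passes.size() + 1 == a.size());
    return passes;
}
```

For the input `{20, 3, 18, 9, 5}` the four snapshots should match the "two sorted", "three sorted", "four sorted", and "all five sorted" states of the step-through.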
Cost of Insertion Sort
void insertionSort(int* a, int size) {
    for (int k = 1; k < size; k++) {          // outer loop: N-1 iterations
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position-1] > temp) {  // inner loop: up to k steps
            a[position] = a[position-1];
            position--;
        }
        a[position] = temp;
    }
}
Cost of Insertion Sort • Outer loop • k = 1 to < size (1, 2, 3, 4) => N-1 iterations • Inner loop • Worst case: compare against all items in the sorted part of the list • Happens when inserting a new smallest item • k = 1, 1 step (position = k = 1, while position > 0) • k = 2, 2 steps [position = 2, 1] • k = 3, 3 steps [position = 3, 2, 1] • k = 4, 4 steps [position = 4, 3, 2, 1] • Again, worst case total comparisons is equal to the sum of i from 1 to N-1, which is N(N-1)/2 = O(N^2)
Cost of Swaps
Selection Sort:
void selectionSort(int* a, int size) {
    for (int k = 0; k < size - 1; k++) {
        int index = minimumIndex(a, k, size);
        std::swap(a[k], a[index]);  // one swap per pass
    }
}
• One swap each time through the outer loop, for O(N) swaps
Cost of Swaps
Insertion Sort:
void insertionSort(int* a, int size) {
    for (int k = 1; k < size; k++) {
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position-1] > temp) {
            a[position] = a[position-1];  // one shift per successful compare
            position--;
        }
        a[position] = temp;
    }
}
• Do a shift almost every time we do a compare, so O(n^2) shifts • Shifts are faster than swaps (1 step vs 3 steps) • Are we doing few enough of them to make up the difference?
Another Issue - Memory • Space requirements for each sort? • All of these sorts require the space to hold the array - O(N) • Require temp variable for swaps • Require a handful of counters • Can all be done “in place”, so equivalent in terms of memory costs • Not all sorts can be done in place though!
Which O(n^2) Sort to Use? • Insertion sort is the winner: • Its worst case requires all comparisons • Most cases don’t (the while loop exits early) • Selection sort uses for loops, which go all the way through every time
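The early exit is easy to observe by counting comparisons. The following is a sketch with a hypothetical function name, `insertionSortComparisons`, assuming `std::vector<int>`; the while loop is unrolled slightly so that each element comparison is counted exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort instrumented to count element comparisons.
// (insertionSortComparisons is a hypothetical name used only for this sketch.)
std::size_t insertionSortComparisons(std::vector<int>& a) {
    std::size_t comparisons = 0;
    for (std::size_t k = 1; k < a.size(); ++k) {
        int temp = a[k];
        std::size_t position = k;
        while (position > 0) {
            ++comparisons;                       // compare temp with its left neighbour
            if (a[position - 1] <= temp) break;  // early exit: correct spot found
            a[position] = a[position - 1];
            --position;
        }
        a[position] = temp;
    }
    assert(std::is_sorted(a.begin(), a.end()));
    return comparisons;
}
```

On already sorted input of 5 items this should report only 4 comparisons (one per outer iteration), while reverse-sorted input of 5 items should report the full 10. Selection sort performs 10 comparisons on 5 items in every case.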
Tradeoffs • Given random data, when is it more efficient to: • Just search versus • Insertion sort and then search • Assume Z searches • Linear search on random data: Z * O(n) • Sort and binary search: O(n^2) + Z * log2(n)
Tradeoffs
Just searching is no more expensive than sorting first when:
Z * n <= n^2 + (Z * log2(n))
Z * n - Z * log2(n) <= n^2
Z * (n - log2(n)) <= n^2
Z <= n^2 / (n - log2(n))
For large n, log2(n) is dwarfed by n in (n - log2(n)):
Z <= n^2 / n
Z <= n (approximately)
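To get a feel for the break-even point, the bound can be evaluated numerically. This is only a sketch: `breakEvenSearches` is a hypothetical helper, and it inherits the slide's simplification of ignoring all constant factors hidden by the O-notation.

```cpp
#include <cassert>
#include <cmath>

// Largest number of searches Z for which plain linear search is still
// no more expensive than "insertion sort, then binary search", using the
// slide's cost model: Z * n <= n^2 + Z * log2(n).
// (Hypothetical helper; constant factors are ignored, as on the slide.)
double breakEvenSearches(double n) {
    assert(n > 1.0);  // n - log2(n) is positive for every n > 1
    return (n * n) / (n - std::log2(n));
}
```

For n = 1024 the bound is 1024^2 / (1024 - 10), which is roughly 1034, so it sits just above n, consistent with the approximation Z <= n.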
Improving Sorts • Better sorting algorithms rely on divide and conquer (recursion) • Find an efficient technique for splitting data • Sort the splits separately • Find an efficient technique for merging the data • We’ll see two examples • One does most of its work splitting • One does most of its work merging
Quicksort General Quicksort Algorithm: • Select an element from the array to be the pivot • Rearrange the elements of the array into a left and right subarray • All values in the left subarray are <= pivot • All values in the right subarray are > pivot • Independently sort the subarrays • No merging required, as left and right are independent problems [ Parallelism?!? ]
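Quicksort
int partition(int* arrayOfInts, int first, int last);  // defined on the next slide

// Sorts arrayOfInts[first .. last] (both bounds inclusive).
void quicksort(int* arrayOfInts, int first, int last) {
    int pivot;  // final index of the pivot element
    if (first < last) {
        pivot = partition(arrayOfInts, first, last);
        quicksort(arrayOfInts, first, pivot-1);
        quicksort(arrayOfInts, pivot+1, last);
    }
}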
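Quicksort
#include <utility>  // std::swap

// Partitions arrayOfInts[first .. last] around the pivot value
// arrayOfInts[first] and returns the pivot's final index.
int partition(int* arrayOfInts, int first, int last) {
    int p = first;                                   // set pivot = first index
    for (int k = first+1; k <= last; k++) {          // for every other index
        if (arrayOfInts[k] <= arrayOfInts[first]) {  // if data is not larger than the pivot
            p = p + 1;                               // update final pivot location
            std::swap(arrayOfInts[k], arrayOfInts[p]);
        }
    }
    std::swap(arrayOfInts[p], arrayOfInts[first]);   // move pivot into place
    return p;
}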
Partition Step Through
partition(cards, 0, 4) with cards = 9 18 5 3 20 (pivot value cards[0] = 9)
P = 0, K = 1: cards[1] <= cards[0]? (18 <= 9) No => 9 18 5 3 20
P = 0, K = 2: cards[2] <= cards[0]? (5 <= 9) Yes => P = 1, swap(cards[2], cards[1]) => 9 5 18 3 20
P = 1, K = 3: cards[3] <= cards[0]? (3 <= 9) Yes => P = 2, swap(cards[3], cards[2]) => 9 5 3 18 20
P = 2, K = 4: cards[4] <= cards[0]? (20 <= 9) No => 9 5 3 18 20
Final: swap(cards[2], cards[first]) => 3 5 9 18 20, return p = 2
(Each swap is three steps: temp = x, x = y, y = temp)
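The partition trace can be verified by running the algorithm on the same five cards. This sketch assumes `std::vector<int>` instead of `int*`, and renames the helper to `partitionFirstPivot` (a hypothetical name) to make the pivot choice explicit; the logic otherwise follows the slides.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partitions a[first .. last] around the pivot value a[first] and
// returns the pivot's final index. (partitionFirstPivot is a hypothetical name.)
int partitionFirstPivot(std::vector<int>& a, int first, int last) {
    assert(0 <= first && first <= last && last < static_cast<int>(a.size()));
    int p = first;
    for (int k = first + 1; k <= last; ++k) {
        if (a[k] <= a[first]) {  // belongs in the left subarray
            ++p;
            std::swap(a[k], a[p]);
        }
    }
    std::swap(a[p], a[first]);   // move pivot between the two subarrays
    return p;
}

// Sorts a[first .. last] (both bounds inclusive).
void quicksort(std::vector<int>& a, int first, int last) {
    if (first < last) {
        int pivotIndex = partitionFirstPivot(a, first, last);
        quicksort(a, first, pivotIndex - 1);
        quicksort(a, pivotIndex + 1, last);
    }
}
```

Partitioning `{9, 18, 5, 3, 20}` over indices 0 to 4 should return 2 and leave the array as `3 5 9 18 20`. That this single partition already sorts the array is a coincidence of the example; in general the two subarrays still need their own recursive calls.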
Complexity of Quicksort • Worst case is O(n^2) • What does worst case correspond to? • Already sorted or nearly sorted input (when the first element is the pivot) • Partitioning leaves heavily unbalanced subarrays • On average it is O(n log2 n), and most inputs behave like the average case.
Complexity of Quicksort Recurrence Relation: [Average Case] • 2 subproblems of ½ size (if good pivot) • Partition is O(n) • T(n) = 2T(n/2) + O(n), so a = 2, b = 2, k = 1 and a = b^k (2 = 2^1) • Master Theorem: O(n log2 n)
Complexity of Quicksort Recurrence Relation: [Worst Case] • Partition separates into (n-1) and (1) • Can’t use master theorem: b (subproblem size ratio) changes at every level: (n-1)/n, (n-2)/(n-1), (n-3)/(n-2), … • Note the sum of partition work: n + (n-1) + (n-2) + (n-3) + … + 1 = Sum(1, n) = n(n+1)/2 = O(n^2)
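The quadratic worst case shows up directly when comparisons are counted on sorted input. This is a sketch, assuming `std::vector<int>` and the first-element pivot from the slides; `quicksortComparisons` is a hypothetical name. It counts element comparisons only, so the total is n(n-1)/2 rather than the n(n+1)/2 obtained by charging n units of work per partition; both are O(n^2).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Quicksort (first element as pivot) that returns the number of element
// comparisons performed. (Hypothetical instrumented variant.)
long long quicksortComparisons(std::vector<int>& a, int first, int last) {
    if (first >= last) return 0;
    assert(0 <= first && last < static_cast<int>(a.size()));
    long long comparisons = 0;
    int p = first;
    for (int k = first + 1; k <= last; ++k) {
        ++comparisons;  // one comparison against the pivot per element
        if (a[k] <= a[first]) {
            ++p;
            std::swap(a[k], a[p]);
        }
    }
    std::swap(a[p], a[first]);
    return comparisons
         + quicksortComparisons(a, first, p - 1)
         + quicksortComparisons(a, p + 1, last);
}
```

On sorted input of 100 distinct values every partition peels off just the pivot, so the count should be 99 + 98 + … + 1 = 4950. An input whose pivots split more evenly needs fewer comparisons.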
Complexity of Quicksort • Requires stack space to implement recursion • Worst case: O(n) stack space • If pivot breaks into 1 element and n-1 element subarrays • Average case: O(log n) • Pivot splits evenly
MergeSort • General Mergesort Algorithm: • Recursively split subarrays in half • Merge sorted subarrays • Splitting is first in recursive call, so continues until have one item subarrays • One item subarrays are by definition sorted • Merge recombines subarrays so result is sorted • 1+1 item subarrays => 2 item subarrays • 2+2 item subarrays => 4 item subarrays • Use fact that subarrays are sorted to simplify merge algorithm
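MergeSort
void merge(int* array, int* tempArray, int low, int middle, int high, int size);  // defined on the next slide

// Sorts array[low .. high] (both bounds inclusive), using tempArray as scratch space.
void mergesort(int* array, int* tempArray, int low, int high, int size) {
    if (low < high) {
        int middle = (low + high) / 2;
        mergesort(array, tempArray, low, middle, size);
        mergesort(array, tempArray, middle+1, high, size);
        merge(array, tempArray, low, middle, high, size);
    }
}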
MergeSort
void merge(int* array, int* tempArray, int low, int middle, int high, int size) {
    int i, j, k;
    for (i = low; i <= high; i++) {           // copy into temp array
        tempArray[i] = array[i];
    }
    i = low; j = middle+1; k = low;
    while ((i <= middle) && (j <= high)) {    // merge
        if (tempArray[i] <= tempArray[j])     // if lhs item is smaller
            array[k++] = tempArray[i++];      // put in final array; increment final array position, lhs index
        else
            array[k++] = tempArray[j++];      // else put rhs item in final array; increment final array position, rhs index
    }
    while (i <= middle)                       // one of the two will run out
        array[k++] = tempArray[i++];          // copy the rest of the data
    // only need to copy if items remain in lhs array; rhs array already in right place
}
MergeSort Example
Start: 20 3 18 9 5
Recursively split: [20 3 18] [9 5]
Recursively split: [20 3] [18] [9] [5]
Recursively split: [20] [3] [18] [9] [5]
Merge:
• 2 cards (20, 3): not very interesting, think of it as a swap => 3 20
• Merge [3 20] with [18]: temp array = 3 20 18 (low = 0, middle = 1, high = 2)
  • i = 0, j = 2: Temp[i] <= Temp[j]? (3 <= 18) Yes => array = 3; update i, k by 1
  • i = 1, j = 2: Temp[i] <= Temp[j]? (20 <= 18) No => array = 3 18; update j, k by 1
  • Hit the limit of the first while loop, as j > high; now copy until i > middle => array = 3 18 20
• 2 card swap (9, 5) => 5 9
• Final merge of [3 18 20] with [5 9] (low = 0, middle = 2, high = 4):
  • i = 0, j = 3: 3 <= 5, take 3 => 3
  • i = 1, j = 3: 18 > 5, take 5 => 3 5
  • i = 1, j = 4: 18 > 9, take 9 => 3 5 9
  • i = 1, j = 5: j > high, copy 18 => 3 5 9 18
  • i = 2, j = 5: copy 20 => 3 5 9 18 20
  • i = 3, j = 5: i > middle, done
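The example can be run end to end with a compact version of the two routines. This sketch assumes `std::vector<int>`, drops the unused `size` parameter, and computes the midpoint as `low + (high - low) / 2` (same result for these indices, but safe against overflow); `mergeHalves` and `mergeSorted` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Merges the sorted runs a[low .. middle] and a[middle+1 .. high].
// (mergeHalves is a hypothetical name for the slides' merge step.)
void mergeHalves(std::vector<int>& a, std::vector<int>& temp,
                 int low, int middle, int high) {
    for (int i = low; i <= high; ++i) temp[i] = a[i];  // copy into temp array
    int i = low, j = middle + 1, k = low;
    while (i <= middle && j <= high) {
        if (temp[i] <= temp[j]) a[k++] = temp[i++];    // take from lhs
        else                    a[k++] = temp[j++];    // take from rhs
    }
    while (i <= middle) a[k++] = temp[i++];            // leftover lhs items
    // leftover rhs items are already in their final positions
}

// Sorts a[low .. high] (both bounds inclusive).
void mergesort(std::vector<int>& a, std::vector<int>& temp, int low, int high) {
    assert(temp.size() >= a.size());
    if (low < high) {
        int middle = low + (high - low) / 2;
        mergesort(a, temp, low, middle);
        mergesort(a, temp, middle + 1, high);
        mergeHalves(a, temp, low, middle, high);
    }
}

// Convenience wrapper (hypothetical) that allocates the O(n) scratch array.
std::vector<int> mergeSorted(std::vector<int> a) {
    std::vector<int> temp(a.size());
    mergesort(a, temp, 0, static_cast<int>(a.size()) - 1);
    return a;
}
```

Calling `mergeSorted({20, 3, 18, 9, 5})` should return `3 5 9 18 20`, with the splits and merges occurring in the order traced in the example.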
Complexity of MergeSort • Recurrence relation: 2 subproblems of ½ size • Merging is O(n) for any subproblem (always moving forwards in the array) • a = 2, b = 2, k = 1, so a = b^k (2 = 2^1) • Master Theorem: O(n log2 n) • Always O(n log2 n) in both average and worst case • Doesn’t rely on quality of pivot choice
Space Complexity of Mergesort • Need an additional O(n) temporary array • Depth of recursive calls (stack space): • Always O(log2 n)
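The stack usage can be illustrated by walking the same recursion tree that mergesort follows, without sorting anything. This is a sketch with hypothetical names (`RecursionStats`, `walkMergesortCalls`, `mergesortRecursionStats`); it reports both the total number of calls and the deepest nesting level, since only the latter determines stack space.

```cpp
#include <algorithm>
#include <cassert>

// Totals gathered while walking the mergesort call tree. (Hypothetical type.)
struct RecursionStats {
    int calls;     // total number of mergesort calls
    int maxDepth;  // deepest level of nesting reached
};

// Mirrors mergesort's recursion on indices low..high without touching data.
void walkMergesortCalls(int low, int high, int depth, RecursionStats& stats) {
    ++stats.calls;
    stats.maxDepth = std::max(stats.maxDepth, depth);
    if (low < high) {
        int middle = low + (high - low) / 2;
        walkMergesortCalls(low, middle, depth + 1, stats);
        walkMergesortCalls(middle + 1, high, depth + 1, stats);
    }
}

RecursionStats mergesortRecursionStats(int n) {
    assert(n > 0);
    RecursionStats stats{0, 0};
    walkMergesortCalls(0, n - 1, 1, stats);
    return stats;
}
```

For n = 1024 this should report 2047 calls in total but a maximum depth of only 11, that is, log2(1024) + 1 levels. The stack therefore grows logarithmically even though the total number of calls grows linearly.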