Explore different sorting algorithms, including insertion sort, selection sort, bubble sort, and efficient sorting algorithms like shell sort, heap sort, quick sort, merge sort, and radix sort.
Data Structures and Algorithms • Dr. Ken Cosh • Sorting
Review • Recursion • Tail Recursion • Non-tail Recursion • Indirect Recursion • Nested Recursion • Excessive Recursion
This week • Sorting Algorithms • The efficiency of many algorithms depends on the way data is stored. • Most data structures can be improved by ordering the data structure in some way. • This week we will investigate different approaches to sorting data.
Sorting Criteria • The first step is to decide what to sort by; • numerical • alphabetical • ASCII codes
Algorithm Measures • Best Case • Often when data already sorted • Worst Case • Data completely disorganised, or in reverse order. • Average Case • Random order • Some sorting algorithms are the same for all three cases, others can be tailored to suit certain cases.
Sorting Algorithms • Sorting algorithms are generally composed of ‘comparisons’ and ‘movements’. • If the data being sorted is complicated (strings for example), then the comparison part can be more expensive. • If the data items are large, then the movement part can be more expensive.
Elementary Sorting Algorithms • First, let's investigate 3 elementary sorting algorithms; • Insertion Sort • Selection Sort • Bubble Sort
Insertion Sort • Insertion sort starts with the first two elements, and compares them; • a[0] < a[1] ? • If necessary it will swap them. • Next the 3rd element is compared and positioned, and then the 4th etc.
Insertion Sort

template<class T>
void insertionsort(T data[], int n) {
    for (int i = 1, j; i < n; i++) {
        T tmp = data[i];
        // shift larger values right until tmp's position is found
        for (j = i; j > 0 && tmp < data[j-1]; j--)
            data[j] = data[j-1];
        data[j] = tmp;
    }
}
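A minimal driver for the template sorts in these slides might look like this (my illustration, not from the original deck; it assumes the insertionsort above is in scope):

#include <iostream>

int main() {
    int a[] = {10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11};
    int n = sizeof(a) / sizeof(a[0]);
    insertionsort(a, n);               // sorts the array in place
    for (int i = 0; i < n; i++)
        std::cout << a[i] << ' ';      // prints 0 2 4 5 8 9 10 11 12 15 19
    std::cout << '\n';
    return 0;
}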
Insertion Sort • The insertion method only sorts the part of the array that needs to be sorted; if the array is already sorted, less processing is required. • Best Case is when the data is already in order: 1 comparison per position, (n-1) in total, or O(n). • Worst Case is when the data is in reverse order, in which case comparisons are required in both the outer and the inner loop, O(n²). • How about the Average Case? The result should be somewhere between O(n) and O(n²), but which is it closer to?
Insertion Sort – Average Case • The number of iterations of the inner loop depends on how far each value is from where it should be; • it could loop 0 times if the value is already in place, or i times if it needs to be inserted at the start. • Therefore on average it loops (0+1+2+…+i)/(i+1) ≈ i/2 times. • Multiplying this per-position average by the number of positions (n) leaves O(n²), sadly closer to the worst case than the best.
Selection Sort • Selection Sort takes each value and moves it to its final place. • i.e. it first searches for the smallest value and moves it to the start, then the next smallest and moves that to position 2.
Selection Sort

template<class T>
void selectionsort(T data[], int n) {
    for (int i = 0, j, least; i < n-1; i++) {
        // find the smallest value in the unsorted part
        for (j = i+1, least = i; j < n; j++)
            if (data[j] < data[least])
                least = j;
        swap(data[least], data[i]);  // move it into its final place
    }
}

• Once again this algorithm relies on a double loop, so O(n²)
BubbleSort • Imagine a glass of beer… • The bubbles slowly rise to the surface. • Now imagine those bubbles are the data to be sorted, with the lowest values rising to the surface. • That’s how Bubblesort works!
BubbleSort • Starting with the other end of the dataset, a[n-1], each value is compared with its predecessor, a[n-2], and if it is less it is swapped. • Next a[n-2] is compared with a[n-3], and if necessary swapped, and so on. • Therefore at the end of the first loop through the array, the smallest value is in position a[0], but many of the other values have been bubbled towards their final location. • The next loop through will move values closer still.
BubbleSort

template<class T>
void bubblesort(T data[], int n) {
    for (int i = 0; i < n-1; i++)
        // bubble the smallest remaining value down to position i
        for (int j = n-1; j > i; --j)
            if (data[j] < data[j-1])
                swap(data[j], data[j-1]);
}
BubbleSort • With Bubble Sort there are the same number of comparisons in every case (best, worst and average) • Sadly that number of comparisons is n(n-1)/2, or O(n²)
Cocktail Sort • A variation of the BubbleSort is called Cocktail Sort. • Here the bubbles move in both directions alternately, as though a cocktail were being shaken. First lower values are bubbled towards the start, then higher values are bubbled towards the end.
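A sketch of what Cocktail Sort might look like in the style of the slide code (my own illustration, not from the original deck; it assumes the same swap helper the other slides use):

template<class T>
void cocktailsort(T data[], int n) {
    int lo = 0, hi = n - 1;
    bool swapped = true;
    while (swapped && lo < hi) {
        swapped = false;
        for (int j = hi; j > lo; --j)      // bubble the smallest value towards the start
            if (data[j] < data[j-1]) {
                swap(data[j], data[j-1]);
                swapped = true;
            }
        lo++;
        for (int j = lo; j < hi; ++j)      // bubble the largest value towards the end
            if (data[j+1] < data[j]) {
                swap(data[j], data[j+1]);
                swapped = true;
            }
        hi--;
    }
}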
Elementary Sorting Algorithms • Sorry… but O(n²) just isn’t good enough anymore!
Efficient Sorting Algorithms • Shell Sort • Heap Sort • Quick Sort • Merge Sort • Radix Sort
ShellSort • One problem with the basic sorting algorithms is that their runtime grows faster than the size of the array. • Therefore one way to reduce the cost is to perform the sort on smaller, limited parts of the array (subarrays). • ShellSort (named after Donald Shell) breaks the array into subarrays, sorts them, and then sorts the whole array.
ShellSort • Shellsort is actually slightly more complex than that, because decisions need to be made about the size of each subarray. • Normally the sort is performed on a decreasing number of subarrays, with the final sort being performed on the whole array (the subarrays growing in size with each pass). • Note that the actual sorting step can use any of the elementary sorting algorithms already discussed.
Shell Sort Sample
• Original data set: 10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11
• 5-subarray sort (gap 5):
  10, 4, 11 → 4, 10, 11
  15, 19 → 15, 19
  2, 5 → 2, 5
  9, 8 → 8, 9
  12, 0 → 0, 12
• After the 5-subarray sort: 4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11
Shell Sort Sample cont.
4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11
• 3-subarray sort (gap 3):
  4, 8, 19, 12 → 4, 8, 12, 19
  15, 0, 5, 11 → 0, 5, 11, 15
  2, 10, 9 → 2, 9, 10
• After the 3-subarray sort: 4, 0, 2, 8, 5, 9, 12, 11, 10, 19, 15
• Final sort (gap 1): 0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19
Shell Sort • In the sample, I arbitrarily chose to sort 5, then 3, then 1 subarrays (gaps of 5, 3 and 1). • In reality that increment sequence isn't ideal, but then it isn't clear what sequence is. • From empirical studies, a suggested increment sequence follows this pattern; • h1 = 1 • hi+1 = 3hi + 1 • until ht+2 ≥ n • What sequence will this give? 1, 4, 13, 40, 121, …
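The slides give no code for ShellSort, but a sketch using the 3h+1 increments above (a gap-h insertion sort at each step) might look like this:

template<class T>
void shellsort(T data[], int n) {
    int h = 1;
    while (h*3 + 1 < n)                 // build the sequence 1, 4, 13, 40, ...
        h = h*3 + 1;
    for ( ; h >= 1; h /= 3) {           // shrink the gap, ending with an ordinary insertion sort
        for (int i = h; i < n; i++) {   // gap-h insertion sort on all h subarrays at once
            T tmp = data[i];
            int j;
            for (j = i; j >= h && tmp < data[j-h]; j -= h)
                data[j] = data[j-h];
            data[j] = tmp;
        }
    }
}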
Shell Sort • Shell Sort is shown to be better than O(n²), but how much better depends on the increments. • With Shell Sort, different elementary sorting algorithms can affect the efficiency of the ShellSort, but it's not agreed how. • A favourite is surely the “Cocktail Shaker Sort”, which uses iterations of the bubble sort, finishing off with an insertion sort – as the cherry is inserted into the cocktail!
Heap Sort • To investigate how Heap Sort works, we need to investigate a heap further. • A heap is a binary tree with 2 additional properties; • First, the value of each node is not less than the values in its children (i.e. the highest value is at the root). • Second, the tree is perfectly balanced: every level is full except possibly the last, whose leaves are filled from the left.
Heap Sort • Heap Sort makes use of a heap, and is a variation of Selection Sort. • The weakness of Selection Sort was the number of comparisons required; because of the improved searchability of a tree, Heap Sort reduces the comparisons made during selection. • First a heap is created, then the root is moved to the end of the dataset, before a new heap is created from what remains. • Therefore the complexity of the algorithm mirrors the complexity of creating a heap.
Heap Sample • Remove the root node, and make a new heap from the 2 remaining heaps. • (Diagram: a heap holding 15, 12, 6, 11, 10, 2, 3, 1, 8 in level order; root 15 with children 12 and 6, 12 above 11 and 10, 6 above 2 and 3, and 11 above the leaves 1 and 8.)
Heap Sort • Rather than moving the root to a new array, heapsort uses the same memory space and swaps the root value with the last leaf node (8). • 8 is then moved down the tree, swapping with the larger child until it reaches its new base (i.e. swap with 12 and then 11). • The new root can then be swapped with the next last leaf node (1). • And the process is repeated, which in the worst case is O(n lg n).
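Putting that together, an in-place heapsort sketch in the slide code's style (my illustration, not from the original deck; siftDown is the "move the value down to its new base" step just described):

template<class T>
void siftDown(T data[], int i, int n) {
    T tmp = data[i];                     // value being moved down to its new base
    int child = 2*i + 1;                 // left child in the array layout
    while (child < n) {
        if (child + 1 < n && data[child] < data[child+1])
            child++;                     // prefer the larger child
        if (!(tmp < data[child]))
            break;                       // heap property restored
        data[i] = data[child];           // pull the larger child up
        i = child;
        child = 2*i + 1;
    }
    data[i] = tmp;
}

template<class T>
void heapsort(T data[], int n) {
    for (int i = n/2 - 1; i >= 0; i--)   // build the initial heap
        siftDown(data, i, n);
    for (int i = n - 1; i > 0; i--) {
        swap(data[0], data[i]);          // root (largest) goes to the end
        siftDown(data, 0, i);            // re-heapify the remaining i elements
    }
}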
QuickSort • Quicksort uses a recursive approach, similar to the approach we used when guessing a number between 1 and 10. • Quicksort shares with Shellsort the idea of breaking the problem down into subproblems that are easier to solve. • Quicksort divides the values into 2 subarrays: those lower than value ‘x’ and those higher than value ‘x’. • The 2 subarrays can then be sorted – guess what… using QuickSort.
QuickSort • The problem remains: which value to choose as the pivot or bound value? • Obviously if we choose a value too near to either end of the range we will have an unbalanced split. • We could arbitrarily choose the first value, but some arrays are already partly sorted, so we could instead arbitrarily choose the middle value.
Quick Sort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{2, 0} 4 {10, 15, 9, 12, 19, 5, 8, 11}
{0, 2} 4 {10, 9, 5, 8, 11} 12 {15, 19}
0, 2, 4, {} 5 {10, 9, 8, 11} 12 {15, 19}
0, 2, 4, 5, {8} 9 {10, 11} 12, 15, 19
0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19
QuickSort Pseudocode

quicksort(array[])
    if length(array) > 1
        choose bound;
        while there are elements in array
            include element in either subarray1 or subarray2;
        quicksort(subarray1);
        quicksort(subarray2);
QuickSort • Best Case: when the array divides into even-sized parts; n + 2(n/2) + 4(n/4) + 8(n/8) + … + n(n/n) = O(n lg n) • Worst Case: when one subarray is always of size 1; O(n²) • The Average Case is in between. Ideally we want to avoid the worst case, but to do that we need to pick a good pivot value; • A better approach is to choose the median of the first, last and middle values.
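One way the pseudocode might be realised in C++, using the middle value as the bound (an illustrative sketch, not the textbook's exact code; a median-of-three version would only change the line that picks bound):

template<class T>
void quicksort(T data[], int first, int last) {
    if (first >= last) return;            // 0 or 1 elements: already sorted
    T bound = data[(first + last) / 2];   // middle value as the pivot
    int i = first, j = last;
    while (i <= j) {                      // partition into lower and upper parts
        while (data[i] < bound) i++;
        while (bound < data[j]) j--;
        if (i <= j)
            swap(data[i++], data[j--]);
    }
    quicksort(data, first, j);            // quicksort the lower part...
    quicksort(data, i, last);             // ...and the upper part
}

Called on the whole array as quicksort(data, 0, n-1).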
MergeSort • The problem with QuickSort is still picking a good pivot value. • MergeSort combats this by concentrating on merging two already sorted subarrays. • The merge process can be quite straightforward, assuming the two subarrays are already sorted. • We can sort the subarrays recursively using MergeSort!
Merge Sort

mergesort(data)
    if data has at least 2 elements
        mergesort(left half);
        mergesort(right half);
        merge(both halves);
merge

merge(array1, array2, array3)
    int i1 = 0, i2 = 0, i3 = 0;
    while both array2 and array3 have elements
        if array2[i2] < array3[i3]
            array1[i1++] = array2[i2++];
        else
            array1[i1++] = array3[i3++];
    add the remains of either array2 or array3 into array1;
MergeSort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{10, 15, 2, 9, 12, 4} {19, 5, 8, 0, 11}
{10, 15, 2} {9, 12, 4} {19, 5, 8} {0, 11}
{10, 15} {2} {9, 12} {4} {19, 5} {8} {0, 11}
{2, 10, 15} {4, 9, 12} {5, 8, 19} {0, 11}
{2, 4, 9, 10, 12, 15} {0, 5, 8, 11, 19}
{0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19}
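A recursive C++ sketch of the above (my illustration, not from the original deck; the temporary buffer it allocates for each merge is exactly the space cost discussed next):

#include <vector>

template<class T>
void mergesort(T data[], int first, int last) {
    if (first >= last) return;                   // 0 or 1 elements: done
    int mid = (first + last) / 2;
    mergesort(data, first, mid);                 // sort the left half
    mergesort(data, mid + 1, last);              // sort the right half
    std::vector<T> tmp;                          // temporary array for the merge
    int i = first, j = mid + 1;
    while (i <= mid && j <= last)                // repeatedly take the smaller front element
        tmp.push_back(data[i] < data[j] ? data[i++] : data[j++]);
    while (i <= mid)  tmp.push_back(data[i++]);  // add the remains of the left half...
    while (j <= last) tmp.push_back(data[j++]);  // ...or of the right half
    for (int k = 0; k < (int)tmp.size(); k++)
        data[first + k] = tmp[k];                // copy the merged run back
}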
MergeSort • The MergeSort algorithm is O(n lg n), but it has a major drawback: the space required for the temporary arrays used while merging (plus the recursive calls on the run-time stack). • An iterative version, while visually more complex, is a little more efficient space-wise. • Also, using a linked list rather than an array can counter the space issue.
Radix Sorting • Radix Sorting mirrors a real-life approach to sorting. • Given a selection of books, we might first make a pile of all the ‘A’ author books, all the ‘B’ author books etc. Then we would sort each pile. • Or, if given a series of 10-digit numbers, we might first sort according to the first digit, and then according to the second digit etc.
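The slides describe the idea only; a least-significant-digit sketch for non-negative integers (my illustration of the pile-by-pile approach, not from the original deck) could look like this:

#include <vector>

void radixsort(std::vector<int>& data) {
    if (data.empty()) return;
    int maxval = data[0];                        // largest value decides how many digit passes
    for (size_t i = 1; i < data.size(); i++)
        if (data[i] > maxval) maxval = data[i];
    for (long place = 1; maxval / place > 0; place *= 10) {
        std::vector<std::vector<int>> piles(10); // one pile per digit 0..9
        for (size_t i = 0; i < data.size(); i++)
            piles[(data[i] / place) % 10].push_back(data[i]);
        size_t k = 0;                            // gather the piles back in digit order
        for (int d = 0; d < 10; d++)
            for (size_t i = 0; i < piles[d].size(); i++)
                data[k++] = piles[d][i];
    }
}

Because each pass is stable, earlier (less significant) digit orderings survive later passes, which is what makes the digit-at-a-time idea work.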
Comparing approaches • Take a look at the table on page 511 to see how the different sorting algorithms compare.