Arrays 2: Sorting and Searching
0. Admin. • 1) No class Thursday. • 2) Will cover Strings next Tuesday. • 3) Take in report. • 4) Hand out program assignment 6, due 11/20, based on material covered today.
1. Review of array basics. • What is an array? • A contiguous series of homogeneous elements. • What does contiguous mean? • Pros and cons? • What does homogeneous mean? • Pros and cons?
The bounds of an array. • C# defaults to 0-based arrays. If we code float [] Amounts = new float [100]; • we get 100 consecutive floats from Amounts[0] to Amounts[99]. It is important to remember this so we avoid a “bounds error”. • Suppose we code Amounts[I] = 83000; • What if I = -2 or 100?
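A minimal sketch of the bounds error, assuming the Amounts array from the slide. The class and method names (BoundsDemo, TryStore) are made up for illustration. In C#, indexing outside 0..Length-1 throws an IndexOutOfRangeException at run time.

```csharp
using System;

public static class BoundsDemo
{
    // Hypothetical helper: tries to store value at index i and
    // reports whether the index was inside the array bounds.
    public static bool TryStore(float[] amounts, int i, float value)
    {
        try
        {
            amounts[i] = value;   // throws if i < 0 or i >= amounts.Length
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            return false;         // the "bounds error" from the slide
        }
    }
}
```

With float[] Amounts = new float[100], storing at index 0 or 99 succeeds, while I = -2 or I = 100 is rejected.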
Review of the list concept. • The list is the occupied portion of the array. • The array is the physical container. It may be empty, full, or partially full. • E.g. Amounts, which can contain 100 elements, may only be occupied from Amounts[0] to Amounts[56]. • The list consists of only the valid data values. This does not include “garbage” that happens to be in memory beyond the list.
Why does the list / array distinction matter? • We should only process the valid data (list). • Otherwise we may e.g. include garbage values in the Total of all entries, update both valid and garbage data, sort garbage into valid data and search in the garbage! • This is computer science, not the FBI!
List processing. • While it is proper to talk of loading an array, thereafter it is helpful to speak of processing the list: • We should ensure that our algorithm only processes the valid data values, e.g. from Amounts[0] to Amounts[56].
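One way to keep processing inside the list is to pass the list size along with the array and loop only up to it. This is a sketch under that assumption. The names ListDemo, Total and listSize are illustrative and not taken from the course code.

```csharp
public static class ListDemo
{
    // Sums only the occupied portion of the array:
    // indexes 0 .. listSize-1. Anything beyond that is ignored.
    public static float Total(float[] amounts, int listSize)
    {
        float total = 0f;
        for (int i = 0; i < listSize; i++)
        {
            total += amounts[i];
        }
        return total;
    }
}
```

If Amounts is occupied from [0] to [56], the call would be ListDemo.Total(Amounts, 57). Looping to Amounts.Length instead would also add whatever values sit beyond the list.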
2. Sorting lists. • To sort information means to arrange it in some order. • It may be a single item, arranged in ascending or descending order e.g. opening weekend profits for movies from best to worst. • Usually it is a collection of data (a record) sorted in order of a key field.
Why sort lists? • 1) Shows highest / lowest values at top / bottom. • 2) May discover important groupings e.g. a lot of people called Patel, Smith or Slartibartfast. • 3) Easier to search. • Why is a dictionary easier to search than a random listing of words and their definitions?
The Bubble Sort. • Idea: • Repeatedly pass through a list, comparing adjacent elements. • If the elements are out of order, swap. • Large values “bubble”, one swap at a time, to the end of the list.
Example. • Pass 1: E B F C D A → B E C F D A → B E C D F A → B E C D A F • Pass 2: B C E D A F → B C D E A F → B C D A E F
Example. • Pass 3: B C A D E F • Pass 4: B A C D E F • Pass 5: A B C D E F
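The passes in the example can be reproduced with a plain, unoptimized bubble sort that records the list after every pass. This is only a sketch. The class and method names (BubbleTrace, SortWithTrace) are invented for illustration.

```csharp
using System.Collections.Generic;

public static class BubbleTrace
{
    // Plain bubble sort: N-1 passes, each comparing every adjacent pair.
    // Returns a snapshot of the list after each pass.
    public static List<string> SortWithTrace(char[] list, int listSize)
    {
        var snapshots = new List<string>();
        for (int pass = 1; pass < listSize; pass++)
        {
            for (int i = 0; i < listSize - 1; i++)
            {
                if (list[i] > list[i + 1])
                {
                    // Adjacent elements out of order: swap them.
                    char temp = list[i];
                    list[i] = list[i + 1];
                    list[i + 1] = temp;
                }
            }
            snapshots.Add(new string(list, 0, listSize));
        }
        return snapshots;
    }
}
```

Starting from E B F C D A, the five snapshots are BECDAF, BCDAEF, BCADEF, BACDEF and ABCDEF, matching the last state shown for each pass.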
Why is it called the “bubble” sort? • After the first pass, it is guaranteed that the largest element (F) is in its final position. • After the second pass, it is guaranteed that the second largest element (E) is in its final position. • So, if we aren’t lucky, the maximum number of passes for N elements is….
Why is it called the “bubble” sort? • N-1. • Why? • If all the elements larger than the smallest are correctly placed, the smallest element must already be correctly placed. • What about the maximum number of comparisons within one pass, comparing each adjacent pair?
Why is it called the “bubble” sort? • Also N-1. Why? • So in the worst case, sorting N elements takes up to (N-1) × (N-1) comparisons (and no more swaps than that), which is O(N^2). • However, the algorithm can be optimized. • (1) If the list is sorted in fewer than N-1 passes, how could we know?
Optimization. • Use a boolean Swap flag: set to false before each pass; only set to true if a swap occurs. • So, if it stays false, there were no swaps, so the list must be sorted already. • (2) Do we always need to compare every adjacent pair? • No, because some parts are already sorted.
Optimization. • Can continue to do comparisons only up to the last pair that was swapped, e.g. • Before: B C A E D F • After: B C A D E F • The last change was the swap of E and D, so the next pass need not compare beyond that point.
See BUBSORT.CS • This code shows an optimized Bubble Sort (which is a bit like saying a Turbo tortoise). • Note how arrays are passed as parameters. • Note how the Load function keeps track of the size of the list. • Look at the Bubble Sort logic.
Logic of Bubble Sort. • Note the use of a nested loop. • The outer loop controls the passes, running until the first pass with no swaps. • The inner loop controls the comparisons within each pass, running until the last change made on the previous pass. • Note the Swap logic. Why are Temporary variables needed?
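The logic described above might look roughly like the following. This is a sketch written from the description, not the contents of BUBSORT.CS. The names BubbleSortSketch, Sort, lastToCompare and lastSwap are assumptions. The return value (the number of passes made) is added only to make the effect of the optimizations visible.

```csharp
public static class BubbleSortSketch
{
    // Sorts list[0 .. listSize-1] in ascending order.
    // Returns the number of passes actually made.
    public static int Sort(int[] list, int listSize)
    {
        int passes = 0;
        int lastToCompare = listSize - 1;  // inner loop compares pairs (i, i+1) for i < lastToCompare
        bool swapped = true;

        // Outer loop: one iteration per pass, stops after a pass with no swaps.
        while (swapped && lastToCompare > 0)
        {
            swapped = false;
            int lastSwap = 0;

            // Inner loop: comparisons within one pass.
            for (int i = 0; i < lastToCompare; i++)
            {
                if (list[i] > list[i + 1])
                {
                    int temp = list[i];
                    list[i] = list[i + 1];
                    list[i + 1] = temp;
                    swapped = true;
                    lastSwap = i;          // remember where the last change happened
                }
            }

            // Everything after the last swap is already in its final place.
            lastToCompare = lastSwap;
            passes++;
        }
        return passes;
    }
}
```

An already sorted list finishes after a single pass with no swaps, while a list like 5 2 6 3 4 1 (E B F C D A as numbers) still needs five passes, each one shorter than the last.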
Swapping. • Suppose we want to swap E and B. • We can’t say move E to B, then B to E, because the first move destroys B. • So? • 1) Move E to Temp. • 2) Move B to E. • 3) Move Temp to B.
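The three moves can be written as a small helper. This is a sketch only. SwapDemo and Swap are illustrative names, and the element type char is chosen to match the letters in the example.

```csharp
public static class SwapDemo
{
    // Exchanges list[i] and list[j] using a temporary variable.
    public static void Swap(char[] list, int i, int j)
    {
        char temp = list[i];   // 1) save the first value
        list[i] = list[j];     // 2) overwrite it with the second
        list[j] = temp;        // 3) put the saved value in the second slot
    }
}
```

Without temp, step 2 would overwrite the first value before it had been copied anywhere.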
Limitation of the Bubble Sort. • The Bubble Sort is slow. • But why? • Can only swap adjacent elements. • Consider the worst case scenario: • Index Value • [0] 999 • . . • [999] 0
What is the problem? • It will take 999 comparisons and swaps to place the Value 999 at the correct location [999]. Shuffling is very slow. • What would help solve this problem?
The Shell Sort. • Donald Shell suggested that we compare elements separated by a larger gap. • Start with a gap half the size of the list. Keep passing through until there is a pass with no swaps, then halve the gap. • Why does this help? • See diagram and Shell sort code.
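The description above can be sketched as follows. Note that this follows the slide's wording (bubble-style passes at each gap until a pass makes no swaps, then halve the gap). Many textbook versions of the Shell Sort use an insertion step instead. The names ShellSortSketch and Sort are assumptions, and this is not the course's own Shell sort code.

```csharp
public static class ShellSortSketch
{
    // Sorts list[0 .. listSize-1] in ascending order.
    public static void Sort(int[] list, int listSize)
    {
        // Start with a gap half the size of the list, then keep halving.
        for (int gap = listSize / 2; gap >= 1; gap /= 2)
        {
            bool swapped;
            do
            {
                swapped = false;
                // One pass: compare elements that are 'gap' apart.
                for (int i = 0; i + gap < listSize; i++)
                {
                    if (list[i] > list[i + gap])
                    {
                        int temp = list[i];
                        list[i] = list[i + gap];
                        list[i + gap] = temp;
                        swapped = true;
                    }
                }
            } while (swapped);   // repeat until a pass makes no swaps
        }
    }
}
```

With a large gap, a value that is far from its final position moves gap places in a single swap rather than one place at a time. The last round, with gap 1, behaves like a bubble sort on a list that is already nearly in order.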
Efficiency. • Studies find no significant difference between the Shell Sort and Bubble Sort for small lists e.g. <= 10 elements. • But for lists of 100 / 1000 / 10000 elements, the advantage of the Shell Sort keeps increasing. • The Shell Sort is estimated to be an O(N^1.25) algorithm.
Searching Lists. • If a list is unsorted, we are forced to do a serial or linear search: • I = 0; • Found = false; • while (!Found && I < ListSize) { • if (SearchElement != List[I]) • I++; • else • Found = true; • } // end while
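The loop above can be wrapped in a method that reports where the element was found. This is a sketch. The names LinearSearchSketch and Find, and the convention of returning -1 when the element is missing, are assumptions.

```csharp
public static class LinearSearchSketch
{
    // Returns the index of searchElement in list[0 .. listSize-1],
    // or -1 if it is not in the list.
    public static int Find(char[] list, int listSize, char searchElement)
    {
        int i = 0;
        bool found = false;
        while (!found && i < listSize)
        {
            if (searchElement != list[i])
                i++;            // not here, try the next element
            else
                found = true;   // stop with i at the match
        }
        return found ? i : -1;
    }
}
```

Because the loop stops at listSize rather than at the end of the array, it never searches beyond the list.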
But this can be tedious… • A serial search is O (N), like loading an array. • Only Forrest Gump would search a dictionary this way! • Why is this silly? • How would we search a dictionary?
The Binary Search (for sorted lists). • Use 3 indexes Start, Stop and Middle. • Compare SearchElement to List[Middle]: • if (SearchElement < List[Middle]) • Stop = Middle - 1; • else • if (SearchElement > List[Middle]) • Start = Middle + 1; • else • Found = true;
More technically: • Start = 0; • Stop = ListLength - 1; • Found = false; • while ((Start <= Stop) && (!Found)) { • Middle = (Start + Stop) / 2; • if (SearchElement < List[Middle]) • Stop = Middle - 1; • else if (SearchElement > List[Middle]) • Start = Middle + 1; • else { • Found = true; • Location = Middle; } • } // end while
Example: • A B C D E F G • [0] [1] [2] [3] [4] [5] [6] • Start Middle Stop • E.g. Find ‘B’ • E.g. Find ‘E’.
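To follow the two searches step by step, the binary search loop can be written so that it records every Middle index it examines. This is a sketch. The names BinarySearchSketch and Find, the probes parameter, and the -1 return value for "not found" are assumptions added for illustration.

```csharp
using System.Collections.Generic;

public static class BinarySearchSketch
{
    // Searches the sorted list[0 .. listLength-1] for searchElement.
    // Returns its index, or -1 if absent. Each Middle index examined
    // is appended to probes so the search can be traced.
    public static int Find(char[] list, int listLength, char searchElement, List<int> probes)
    {
        int start = 0;
        int stop = listLength - 1;
        while (start <= stop)
        {
            int middle = (start + stop) / 2;
            probes.Add(middle);
            if (searchElement < list[middle])
                stop = middle - 1;      // discard the upper half
            else if (searchElement > list[middle])
                start = middle + 1;     // discard the lower half
            else
                return middle;          // found
        }
        return -1;                      // start passed stop: not in the list
    }
}
```

For the list A to G: finding 'B' examines [3] (D), then [1] (B), and stops. Finding 'E' examines [3] (D), then [5] (F), then [4] (E). Either way at most three elements are examined, where a serial search could need up to seven.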