Stuff • No stuff… CISC121 - Prof. McLeod
Last Time • Introduced The Analysis of Complexity
Today • Continue: The Analysis of Complexity
“Big O” Notation - Cont. • Common big O notations: • Constant O(1) • Logarithmic O(log(n)) • Linear O(n) • N log n O(n·log(n)) • Quadratic O(n^2) • Cubic O(n^3) • Polynomial O(n^x) • Exponential O(2^n)
“Big O” Notation - Cont. • On the next slide: how these functions grow with n. Assume a rate of 1 instruction per µs (microsecond):
“Big O” Notation - Cont. • (Table of running times for each growth rate did not survive transcription.)
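The lost table can be approximated with a short sketch. The class below (a hypothetical helper, not part of the course code) computes how many microseconds f(n) instructions take at the slide's assumed rate of 1 instruction per µs:

```java
// Sketch reconstructing the lost table: time to execute f(n) instructions
// at 1 instruction per microsecond, for the growth rates on the slide.
public class GrowthRates {

    // microseconds needed to execute f(n) instructions at 1 per microsecond
    public static double micros(String f, int n) {
        switch (f) {
            case "1":       return 1;
            case "log n":   return Math.log(n) / Math.log(2);
            case "n":       return n;
            case "n log n": return n * Math.log(n) / Math.log(2);
            case "n^2":     return (double) n * n;
            case "n^3":     return (double) n * n * n;
            case "2^n":     return Math.pow(2, n);
            default: throw new IllegalArgumentException(f);
        }
    }

    public static void main(String[] args) {
        String[] fs = {"1", "log n", "n", "n log n", "n^2", "n^3", "2^n"};
        for (String f : fs)
            System.out.printf("%-8s n=50: %.0f microseconds%n", f, micros(f, 50));
    }
}
```

Even at n = 50, the O(2^n) row is already about 1.1 × 10^15 µs, i.e. decades, which motivates the next slide.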
“Big O” Notation - Cont. • So, algorithms of O(n^3) or O(2^n) are not practical for anything more than small values of n. • Quadratic, O(n^2) (common for nested “for” loops!) gets pretty ugly for n > 1000 (Bubble sort, for example…). • To figure out the big O notation for more complicated algorithms, it is necessary to understand some mathematical concepts.
Some Math… • Logarithms and exponents - a quick review: • By definition: logb(a) = c, when a = b^c
More Math… • Some rules when using logarithms and exponents - for a, b, c positive real numbers: • logb(a·c) = logb(a) + logb(c) • logb(a/c) = logb(a) - logb(c) • logb(a^c) = c·logb(a) • logb(a) = logc(a)/logc(b) • b^(logc(a)) = a^(logc(b)) • (b^a)^c = b^(ac) • b^a·b^c = b^(a+c) • b^a/b^c = b^(a-c)
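These rules can be spot-checked numerically. A minimal sketch (the class name and sample values are arbitrary choices for illustration):

```java
// Numerical spot-check of the logarithm rules above.
public class LogRules {
    static final double EPS = 1e-9;

    public static boolean close(double x, double y) { return Math.abs(x - y) < EPS; }

    // log base b of a, via the change-of-base rule on the slide
    public static double logb(double b, double a) { return Math.log(a) / Math.log(b); }

    public static void main(String[] args) {
        double a = 7, b = 2, c = 3;
        // logb(a*c) = logb(a) + logb(c)
        assert close(logb(b, a * c), logb(b, a) + logb(b, c));
        // logb(a/c) = logb(a) - logb(c)
        assert close(logb(b, a / c), logb(b, a) - logb(b, c));
        // logb(a^c) = c * logb(a)
        assert close(logb(b, Math.pow(a, c)), c * logb(b, a));
        // b^(logc(a)) = a^(logc(b))
        assert close(Math.pow(b, logb(c, a)), Math.pow(a, logb(c, b)));
        System.out.println("all identities hold");
    }
}
```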
More Math… • Summations: • If a does not depend on i: Σ(i=1 to n) a = a·n
More Math… • Also: Σ(i=1 to n) i = n(n+1)/2
Still More Math… • The Geometric Sum, for any real number a > 0 and a ≠ 1: Σ(i=0 to n) a^i = (a^(n+1) - 1)/(a - 1) • Also, for 0 < a < 1: Σ(i=0 to ∞) a^i = 1/(1 - a)
Still More Math… • Finally, for a > 0, a ≠ 1, and n ≥ 2: Σ(i=1 to n) i·a^i = (a - (n+1)·a^(n+1) + n·a^(n+2))/(1 - a)^2 • Lots of other summation formulae exist (mathematicians just love to collect them!), but these are the ones commonly found in proofs of big O notations.
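The closed forms above can be verified against brute-force loops. A small sketch, with arbitrary sample values:

```java
// Brute-force verification of the summation formulae on the preceding slides.
public class Summations {

    // sum of i for i = 1..n, by loop
    public static long sumI(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }

    // geometric sum of a^i for i = 0..n, by loop
    public static double geom(double a, int n) {
        double s = 0;
        for (int i = 0; i <= n; i++) s += Math.pow(a, i);
        return s;
    }

    public static void main(String[] args) {
        int n = 100;
        // closed form n(n+1)/2
        assert sumI(n) == (long) n * (n + 1) / 2;
        double a = 3.0;
        // closed form (a^(n+1) - 1)/(a - 1)
        assert Math.abs(geom(a, 10) - (Math.pow(a, 11) - 1) / (a - 1)) < 1e-6;
        System.out.println("summation formulae check out");
    }
}
```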
Properties of Big O Notation • If d(n) is O(f(n)), then a·d(n) is still O(f(n)), for any constant a > 0. • If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n) + e(n) is O(f(n) + g(n)). • If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n)·e(n) is O(f(n)·g(n)). • If d(n) is O(f(n)) and f(n) is O(g(n)), then d(n) is O(g(n)).
Properties of Big O Notation – Cont. • If f(n) is a polynomial of degree d (i.e. f(n) = a0 + a1·n + a2·n^2 + a3·n^3 + … + ad·n^d), then f(n) is O(n^d). • loga(n) is O(logb(n)) for any a, b > 1. That is to say, loga(n) is O(log2(n)) for any a > 1.
“Big O” Notation - Cont. • While correct, it is considered “bad form” to include constants and lower-order terms when presenting a Big O notation, unless the other terms cannot be simplified and are close in magnitude. • For most comparisons between algorithms, the simplest representation of the notation is sufficient. • When simplifying a complex function, always look for the obviously dominant term (for large n) and toss all the others.
Application of Big O to Loops • For example, a single “for” loop:

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];

• 2 operations before the loop, 6 operations inside the loop. Total is 2 + 6n. This gives the loop O(n).
Application of Big O to Loops - Cont. • Nested “for” loops, where both loop until n:

for (i = 0; i < n; i++) {
    sum[i] = 0;
    for (j = 0; j < n; j++)
        sum[i] += a[i][j];
}

• Outer loop: 1 + 5n + n·(inner loop). Inner loop = 1 + 7n. Total = 1 + 6n + 7n^2, so the algorithm is O(n^2).
Application of Big O to Loops - Cont. • Nested “for” loops where the inner loop goes fewer than n times (used in simple sorts):

for (i = 0; i < n; i++) {
    sum[i] = 0;
    for (j = 0; j < i; j++)
        sum[i] += a[i][j];
}

• Outer loop: 1 + 6n + (7i, for i = 1, 2, 3, …, n-1).
Application of Big O to Loops - Cont. • Or: 1 + 6n + 7·Σ(i=1 to n-1) i = 1 + 6n + 7n(n-1)/2 • Which is O(n) + O(n^2), which is finally just O(n^2).
Application of Big O to Loops - Cont. • Nested for loops are usually O(n^2), but not always!

for (i = 3; i < n; i++) {
    sum[i] = 0;
    for (j = i-3; j <= i; j++)
        sum[i] += a[i][j];
}

• The inner loop runs exactly 4 times for every value of i. So, this is actually O(n), linear, not quadratic.
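The linear growth claimed above can be demonstrated by counting iterations of the inner loop body. A sketch (the counter and class name are added here for illustration):

```java
// Counting experiment: the inner loop runs exactly 4 times per outer pass,
// so total work grows linearly with n, not quadratically.
public class BoundedLoop {

    // total number of inner-loop body executions for a given n
    public static long innerIterations(int n) {
        long count = 0;
        for (int i = 3; i < n; i++)
            for (int j = i - 3; j <= i; j++)
                count++;                    // body runs exactly 4 times per i
        return count;
    }

    public static void main(String[] args) {
        // 4 iterations for each of the (n - 3) outer passes: linear in n
        System.out.println(innerIterations(100));   // 4 * 97 = 388
        System.out.println(innerIterations(1000));  // 4 * 997 = 3988
    }
}
```

Doubling n roughly doubles the count, the signature of O(n); a true O(n^2) loop would quadruple it.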
Big O and Other Algorithms • For the rest of this course, as we develop algorithms for: • Searching • Sorting • Recursion • Etc. we will analyze the algorithm in terms of its “Big O” notation. • We’ll find out what’s good and what’s bad!
Sorting – Overview • Sorting algorithms can be compared on the basis of: • The number of comparisons for a dataset of size n, • The number of data movements (“swaps”) necessary, • How these measures depend on the state of the data (random, mostly in order, reverse order), and • How these measures change with n (“Big O” notation).
Simple Sorting Algorithms – Insertion Sort • Probably the most “instinctive” kind of sort: find the location for an element, move all others up one, and insert the element. • Pseudocode: • Loop through the array from i=1 to array.length-1, saving the element at position i in temp. • Locate the position for temp (position j, where j <= i), and move all elements above j up one location. • Put temp at position j.
Simple Sorting Algorithms – Insertion Sort – Cont.

public static void insertionSort (int[] A) {
    int temp;
    int i, j;
    for (i = 1; i < A.length; i++) {
        temp = A[i];
        for (j = i; j > 0 && temp < A[j-1]; j--)
            A[j] = A[j-1];
        A[j] = temp;
    } // end for
} // end insertionSort
Complexity of Insertion Sort • “A.length” is the same as “n”. • The outer loop is going to go n times, regardless of the state of the data. • How about the inner loop? • Best case (data in order): the inner loop exits immediately, giving O(n). • Worst and average case is O(n^2).
Simple Sorting Algorithms – Selection Sort • This one works by selecting the smallest element and then putting it in its proper location. • Pseudocode: • Loop through the array from i=0 to one element short of the end of the array. • Select the smallest element in the array range from i to the end of the array. • Swap this value with the value at position i.
Simple Sorting Algorithms – Selection Sort • First, a “swap” method that will be used by this and other sorts:

public static void swap(int[] A, int pos1, int pos2) {
    int temp = A[pos1];
    A[pos1] = A[pos2];
    A[pos2] = temp;
} // end swap
Simple Sorting Algorithms – Selection Sort

public static void selectionSort(int[] A) {
    int i, j, least;
    for (i = 0; i < A.length-1; i++) {
        least = i;
        for (j = i+1; j < A.length; j++)
            if (A[j] < A[least])
                least = j;
        if (i != least)
            swap(A, least, i);
    } // end for
} // end selectionSort
Complexity of Selection Sort • The outer loop goes n times, just like Insertion sort. • Does the number of executions of the inner loop depend on the state of the data? • No - using the summation formulae above, the complexity for comparisons is O(n^2), regardless of the state of the data. • The complexity of swaps is O(n), since the swap only takes place in the outer loop.
Simple Sorting Algorithms – Bubble Sort • Best envisioned as a vertical column of numbers, each a bubble: the larger bubbles gradually work their way to the top of the column, with the smaller ones pushed down to the bottom. • Pseudocode: • Loop through the array from i=0 to the length of the array. • Loop down from the last element in the array to i. • Swap adjacent elements if they are in the wrong order.
Simple Sorting Algorithms – Bubble Sort – Cont.

public static void bubbleSort(int[] A) {
    int i, j;
    for (i = 0; i < A.length; i++)
        for (j = A.length-1; j > i; j--)
            if (A[j] < A[j-1])
                swap(A, j, j-1);
} // end bubbleSort
Complexity of Bubble Sort • The outer loop goes n times, as usual. • Both the comparison and the swap are in the inner loop (yuk!) • Best case (data already in order): comparisons are still O(n^2), but swaps will be O(n). • Bubble sort can be improved (slightly) by adding a flag to the inner loop: if you loop through the array without swapping, then the data is in order, and the method can be halted.
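The flag improvement described above can be sketched as follows. This is one possible version, not necessarily how the course implements it; the swap helper is repeated so the class stands alone:

```java
// Bubble sort with an early-exit flag: if a full inner pass makes no swaps,
// the array is already sorted and the method halts.
public class FlaggedBubbleSort {

    public static void swap(int[] A, int pos1, int pos2) {
        int temp = A[pos1];
        A[pos1] = A[pos2];
        A[pos2] = temp;
    }

    public static void bubbleSort(int[] A) {
        for (int i = 0; i < A.length; i++) {
            boolean swapped = false;        // the flag
            for (int j = A.length - 1; j > i; j--)
                if (A[j] < A[j - 1]) {
                    swap(A, j, j - 1);
                    swapped = true;
                }
            if (!swapped) return;           // no swaps: data is in order, halt
        }
    }

    public static void main(String[] args) {
        int[] A = {5, 1, 4, 2, 8};
        bubbleSort(A);
        System.out.println(java.util.Arrays.toString(A)); // [1, 2, 4, 5, 8]
    }
}
```

On already-sorted data this version makes one O(n) pass of comparisons and stops, instead of the plain version's O(n^2) comparisons.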
Measurements of Execution Speed • Refer to the Excel spreadsheet “SortintTimes.xls”: • Sorted between 1000 and 10,000 random int values. • Used System.currentTimeMillis() to measure execution time. • Measured times for a completely random array (average case), an almost sorted array (best case) and a reverse-order array (worst case).
Choosing Between the Simple Sorts • (Assuming you have fewer than about 10,000 data items…) • First: what is the state of the data? • Second: which is slower, comparisons or swaps? • Third: never use bubble sort, anyways! • Since both selection and insertion are O(n^2), you will need to look at the coefficient (c), calculated using sample data, to choose between them.
Sorting Animations • For a collection of animation links see: http://www.hig.no/~algmet/animate.html • Here are a couple that I liked: http://www.cs.pitt.edu/~kirk/cs1501/animations/Sort3.html http://cs.smith.edu/~thiebaut/java/sort/demo.html
Sequential Search • Sequential search pseudocode: • Loop through the array starting at the first element until the value of target matches one of the array elements. • If a match is not found, return -1.
Sequential Search, Cont. • As code:

public int seqSearch(int[] A, int target) {
    for (int i = 0; i < A.length; i++)
        if (A[i] == target)
            return i;
    return -1;
}
Complexity of Sequential Search • Best case - first element is the match: O(1). • Worst case - element not found: O(n). • Average case - element in middle: O(n).
Binary Search • Binary search pseudocode. Only works on ordered sets: a) Locate the midpoint of the array to search. b) Determine if the target is in the lower half or the upper half of the array. c) If in the lower half, make this half the array to search. d) If in the upper half, make this half the array to search. e) Loop back to step a), unless the size of the array to search is one and this element does not match, in which case return -1.
Binary Search – Cont.

public int binSearch (int[] A, int key) {
    int lo = 0;
    int hi = A.length - 1;
    int mid = (lo + hi) / 2;
    while (lo <= hi) {
        if (key < A[mid])
            hi = mid - 1;
        else if (A[mid] < key)
            lo = mid + 1;
        else
            return mid;
        mid = (lo + hi) / 2;
    }
    return -1;
}
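A quick usage sketch of the method above. The method is made static here so it can run standalone, and the sample array is an arbitrary choice; remember the array must already be sorted:

```java
// Standalone demo of the binary search shown above (made static for the demo).
public class BinSearchDemo {

    public static int binSearch(int[] A, int key) {
        int lo = 0;
        int hi = A.length - 1;
        int mid = (lo + hi) / 2;
        while (lo <= hi) {
            if (key < A[mid])
                hi = mid - 1;           // key is in the lower half
            else if (A[mid] < key)
                lo = mid + 1;           // key is in the upper half
            else
                return mid;             // found it
            mid = (lo + hi) / 2;
        }
        return -1;                      // not found
    }

    public static void main(String[] args) {
        int[] A = {2, 5, 7, 11, 13, 17};    // must already be sorted!
        System.out.println(binSearch(A, 11)); // 3
        System.out.println(binSearch(A, 4));  // -1 (not found)
    }
}
```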
Complexity of Binary Search • For the best case, the element matches right at the middle of the array, and the loop only executes once. • For the worst case, key will not be found and the maximum number of loops will occur. • Note that the loop will execute until there is only one element left that does not match. • Each time through the loop the number of elements left is halved, giving the progression below for the number of elements:
Complexity of Binary Search, Cont. • Number of elements to be searched - progression: n, n/2, n/4, n/8, …, n/2^m • The last comparison is for n/2^m, when the number of elements is one (worst case). • So, n/2^m = 1, or n = 2^m. • Or, m = log2(n). • So, the algorithm is O(log(n)) for the worst case.
Binary Search - Cont. • Binary search at O(log(n)) is much better than sequential at O(n). • Major reason to sort datasets!
A Caution on the Use of “Big O” • Big O notation usually ignores the constant c. • The constant’s magnitude depends on external factors (hardware, OS, etc.) • For example, algorithm A has the number of operations = 10^8·n, and algorithm B has the form 10·n^2. • A is then O(n), and B is O(n^2). • Based on this alone, A would be chosen over B. • But, if c is considered, B would be faster for n up to 10^7 - which is a very large dataset! • So, in this case, B would be preferred over A, in spite of the Big O notation.
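The crossover point between A and B can be checked directly. A minimal sketch with the slide's two cost functions (the class and method names are illustrative):

```java
// Comparing the slide's two cost functions: A = 10^8 * n ops, B = 10 * n^2 ops.
public class Crossover {

    public static double costA(double n) { return 1e8 * n; }
    public static double costB(double n) { return 10 * n * n; }

    public static void main(String[] args) {
        // B is cheaper below the crossover, A above it
        System.out.println(costB(1e6) < costA(1e6));  // true: B wins at n = 10^6
        System.out.println(costA(1e8) < costB(1e8));  // true: A wins at n = 10^8
        // crossover where 10^8 * n = 10 * n^2, i.e. n = 10^7
        System.out.println(costA(1e7) == costB(1e7)); // true
    }
}
```

Setting 10^8·n = 10·n^2 and solving gives n = 10^7, the crossover quoted on the slide.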
A Caution on the Use of “Big O” – Cont. • For “mission critical” algorithm comparison, nothing beats experiments where actual times are measured. • When measuring times, keep everything else constant, just change the algorithm (same hardware, OS, type of data, etc.).