Explore analysis methods in algorithm design, discussing trade-offs, performance vocabulary, and mathematical tools like Big-Oh notation. Compare and reason about algorithm efficiency for different implementations.
Tradeoffs, intuition, analysis: understanding big-Oh, aka O-notation Owen Astrachan ola@cs.duke.edu http://www.cs.duke.edu/~ola
Analysis • Vocabulary to discuss performance and to reason about alternative algorithms and implementations • It’s faster! It’s more elegant! It’s safer! It’s cooler! • Use mathematics to analyze the algorithm, • Implementation is another matter • cache, compiler optimizations, OS, memory,…
What do we need? • Need empirical tests and mathematical tools • Compare by running • 30 seconds vs. 3 seconds, • 5 hours vs. 2 minutes • Two weeks to implement code • We need a vocabulary to discuss tradeoffs
Analyzing Algorithms • Three solutions to online problem sort1 • Sort strings, scan looking for runs • Insert into Set, count each unique string • Use map of (String,Integer) to process • We want to discuss trade-offs of these solutions • Ease to develop, debug, verify • Runtime efficiency • Vocabulary for discussion
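The three approaches listed above can be sketched side by side. This is a minimal sketch that assumes the task is to count unique strings in an array; the slides do not show the original problem statement or solutions, so the class name `UniqueCounts` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class; an illustration of the three ideas, not the original solutions.
class UniqueCounts {
    // Approach 1: sort a copy, then scan and count runs of equal strings.
    static int bySorting(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        int runs = 0;
        for (int k = 0; k < copy.length; k++) {
            // A new run starts at index 0 or wherever the string changes.
            if (k == 0 || !copy[k].equals(copy[k - 1])) {
                runs++;
            }
        }
        return runs;
    }

    // Approach 2: insert into a set; its size is the number of unique strings.
    static int bySet(String[] words) {
        Set<String> seen = new HashSet<>(Arrays.asList(words));
        return seen.size();
    }

    // Approach 3: map each string to its number of occurrences.
    static Map<String, Integer> byMap(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }
}
```

The sorting version costs O(n log n) comparisons; the set and map versions are expected O(n) with hashing, and only the map version keeps the per-string counts.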
What is big-Oh about? • Intuition: avoid details when they don’t matter, and they don’t matter when input size (N) is big enough • For polynomials, use only leading term, ignore coefficients • Linear: y = 3x, y = 6x-2, y = 15x + 44 • Quadratic: y = x^2, y = x^2-6x+9, y = 3x^2+4x
O-notation, family of functions • The linear family is O(n), the quadratic family is O(n^2) • Intuition: family of curves, same shape • More formally: O(f(n)) is an upper bound; for some constant c, when n is large enough, cf(n) is larger • Intuition: linear function: double input, double time; quadratic function: double input, quadruple the time
Reasoning about algorithms • We have an O(n) algorithm • For 5,000 elements takes 3.2 seconds • For 10,000 elements takes 6.4 seconds • For 15,000 elements takes …? • We have an O(n^2) algorithm • For 5,000 elements takes 2.4 seconds • For 10,000 elements takes 9.6 seconds • For 15,000 elements takes …?
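One way to answer the two questions is to scale a measured time by the growth rate. The sketch below assumes the running time is exactly proportional to n (or to n^2), which real measurements only approximate; `ScalingPredictor` and `predict` are hypothetical names.

```java
// Hypothetical helper: extrapolates a measured time, assuming time is proportional to n^power.
class ScalingPredictor {
    // t0 is the time measured at input size n0; the result is the predicted time at size n.
    static double predict(double t0, double n0, double n, int power) {
        return t0 * Math.pow(n / n0, power);
    }
}
```

Tripling the input from 5,000 to 15,000 triples the linear time, `predict(3.2, 5000, 15000, 1)`, to about 9.6 seconds, and multiplies the quadratic time by nine, `predict(2.4, 5000, 15000, 2)`, to about 21.6 seconds.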
More on O-notation, big-Oh • Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm • Compare algorithms in the limit • 20N hours v. N^2 microseconds: • which is better?
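A short worked calculation for the 20N hours versus N^2 microseconds question, converting both costs to microseconds (one hour is 3600 seconds, one second is 10^6 microseconds):

```latex
\begin{align*}
20N \text{ hours} &= 20N \cdot 3600 \cdot 10^{6}\ \mu\text{s} = 7.2 \times 10^{10}\, N\ \mu\text{s} \\
N^{2}\ \mu\text{s} < 7.2 \times 10^{10}\, N\ \mu\text{s} &\iff N < 7.2 \times 10^{10}
\end{align*}
```

So the quadratic algorithm is faster for every N below roughly 72 billion, and the linear one wins only in the limit. That is the sense in which big-Oh can obscure what an empirical comparison would show.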
More formal definition • O-notation is an upper bound; this means that N is O(N), but it is also O(N^2); we try to provide tight bounds. Formally: • A function g(N) is O(f(N)) if there exist constants c and n such that g(N) < cf(N) for all N > n
(Figure: curves cf(N) and g(N), with cf(N) above g(N) to the right of x = n)
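As a worked instance of the definition, take one of the linear functions from the earlier slide, 15x + 44. The constants below are one valid choice among many, picked only for illustration:

```latex
\begin{align*}
g(N) &= 15N + 44, \qquad f(N) = N \\
15N + 44 &< 16N \iff N > 44 \\
\text{so } g(N) &< c\,f(N) \text{ for all } N > n \text{ with } c = 16,\; n = 44
\end{align*}
```

Hence 15N + 44 is O(N). Any larger c also works, with a smaller threshold n.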
Big-Oh calculations from code • Search for element in array: • What is complexity (using O-notation)? • If array doubles, what happens to time?
    for (int k = 0; k < a.length; k++) {
        if (a[k].equals(target)) return true;
    }
    return false;
• Best case? Average case? Worst case?
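To make the best and worst cases concrete, the loop can be instrumented to count calls to `equals`. This is a sketch; `LinearSearchCost` and `comparisons` are hypothetical names, and the array type `String[]` is an assumption since the slide does not declare `a`.

```java
// Hypothetical instrumented version of the slide's search loop.
class LinearSearchCost {
    // Returns how many equals() calls the loop makes before it stops.
    static int comparisons(String[] a, String target) {
        int count = 0;
        for (int k = 0; k < a.length; k++) {
            count++;
            if (a[k].equals(target)) {
                return count; // found: stop early
            }
        }
        return count; // not found: every element was examined
    }
}
```

The best case is one comparison (target in the first slot). The worst case is `a.length` comparisons (target last or absent), so doubling the array doubles the worst-case work, which is O(n).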
Measures of complexity • Worst case • Good upper-bound on behavior • Never get worse than this • Average case • What does average mean? • Averaged over all inputs? Assuming uniformly distributed random data?
Some helpful mathematics • 1 + 2 + 3 + 4 + … + N • N(N+1)/2 = N^2/2 + N/2 is O(N^2) • N + N + N + … + N (total of N times) • N*N = N^2 which is O(N^2) • 1 + 2 + 4 + … + 2^N • 2^(N+1) – 1 = 2 x 2^N – 1 which is O(2^N)
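The closed forms can be checked against brute-force loops. A small sketch with hypothetical names (`SumFormulas`, `sumTo`, `sumPowersOfTwo`), valid while the results fit in a `long`:

```java
// Hypothetical helper: brute-force versions of two of the sums above.
class SumFormulas {
    // 1 + 2 + ... + n, computed term by term.
    static long sumTo(int n) {
        long sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += k;
        }
        return sum;
    }

    // 1 + 2 + 4 + ... + 2^n, computed term by term (assumes n is at most 61).
    static long sumPowersOfTwo(int n) {
        long sum = 0;
        long power = 1;
        for (int k = 0; k <= n; k++) {
            sum += power;
            power *= 2;
        }
        return sum;
    }
}
```

For example, `sumTo(100)` matches 100 * 101 / 2 = 5050, and `sumPowersOfTwo(10)` matches 2^11 - 1 = 2047.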
Multiplying and adding big-Oh • Suppose we do a linear search then we do another one • What is the complexity? • If we do 100 linear searches? • If we do n searches on an array of size n?
Multiplying and adding • Binary search followed by linear search? • What are big-Oh complexities? Sum? • 50 binary searches? N searches? • What is the number of elements in the list (1,2,2,3,3,3)? • What about (1,2,2, …, n,n,…,n)? • How can we reason about this?
Reasoning about big-O • Given an n-list: (1,2,2,3,3,3, …n,n,…n) • If we remove all n’s, how many left? • If we remove all larger than n/2?
Reasoning about big-O • Given an n-list: (1,2,2,3,3,3, …n,n,…n) • If we remove every other element? • If we remove all larger than n/1000? • Remove all larger than square root n?
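The n-list questions reduce to the sum 1 + 2 + … + m for different cutoffs m, because value v appears v times. A sketch, with `NList` and `sizeUpTo` as hypothetical names:

```java
// Hypothetical helper for the n-list (1,2,2,3,3,3,...,n,...,n).
class NList {
    // Number of elements whose value is at most m: 1 + 2 + ... + m = m(m+1)/2.
    static long sizeUpTo(long m) {
        return m * (m + 1) / 2;
    }
}
```

With n = 10,000 the full list has `sizeUpTo(10000)` = 50,005,000 elements. Keeping only values up to n/2 leaves 12,502,500, about a quarter of the list and still O(n^2). Keeping values up to n/1000 is also quadratic in n with a tiny constant (here only 55 elements). Keeping values up to the square root of n leaves 5,050, about n/2, which is O(n).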
Money matters
    while (n != 0) {
        count++;
        n = n / 2;
    }
• Penny on a statement each time executed • What’s the cost?
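Counting how often the loop body runs gives the cost directly. A sketch with hypothetical names (`HalvingCost`, `iterations`), assuming n is non-negative:

```java
// Hypothetical wrapper around the slide's halving loop.
class HalvingCost {
    // Number of times the loop body executes for a starting value n >= 0.
    static int iterations(int n) {
        int count = 0;
        while (n != 0) {
            count++;
            n = n / 2; // integer division: n shrinks by half each pass
        }
        return count;
    }
}
```

For n = 1000 the values visited are 1000, 500, 250, 125, 62, 31, 15, 7, 3, 1, so the body runs 10 times. In general it runs floor(log2 n) + 1 times for n ≥ 1, so the cost is O(log n) pennies.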
Money matters
    void stuff(int n) {
        for (int k = 0; k < n; k++)
            System.out.println(k);
    }

    while (n != 0) {
        stuff(n);
        n = n / 2;
    }
• Is a penny always the right cost/statement? • What’s the cost?
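Here the call `stuff(n)` is not a one-penny statement: it costs n prints. The sketch below totals the prints without actually printing; `NestedHalvingCost` and `printlnCalls` are hypothetical names, and n is assumed non-negative.

```java
// Hypothetical cost counter for the loop that calls stuff(n) and then halves n.
class NestedHalvingCost {
    // Total number of println calls made across all calls to stuff.
    static long printlnCalls(int n) {
        long total = 0;
        while (n != 0) {
            total += n; // stuff(n) prints n lines
            n = n / 2;
        }
        return total;
    }
}
```

The total is n + n/2 + n/4 + … + 1, which is less than 2n. For n = 1000 it is 1994. So the loop runs O(log n) times but the overall cost is O(n), not O(n log n).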
Find k-th largest • Given an array of values, find k-th largest • 0th largest is the smallest • How can I do this? • Do it the easy way… • Do it the fast way …
Easy way, complexity?
    public int find(int[] list, int index) {
        Arrays.sort(list);
        return list[index];
    }
Fast way, complexity?
    public int find(int[] list, int index) {
        return findHelper(list, index, 0, list.length - 1);
    }

    // swap(list, i, j) exchanges list[i] and list[j]; the helper is not shown on the slides.
    private int findHelper(int[] list, int index, int first, int last) {
        // Partition list[first..last] around the pivot list[first].
        int lastIndex = first;
        int pivot = list[first];
        for (int k = first + 1; k <= last; k++) {
            if (list[k] <= pivot) {
                lastIndex++;
                swap(list, lastIndex, k);
            }
        }
        swap(list, lastIndex, first);   // pivot is now in its final sorted position

        // Recurse only into the side that contains index.
        if (lastIndex == index) return list[lastIndex];
        else if (index < lastIndex) return findHelper(list, index, first, lastIndex - 1);
        else return findHelper(list, index, lastIndex + 1, last);
    }
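The easy and fast ways can be put in one self-contained class and compared. This is a sketch: the class name `KthSelect`, the `swap` body, and the choice to work on a copy of the input are assumptions added for illustration.

```java
import java.util.Arrays;

// Hypothetical class combining both approaches; each works on a copy of the input.
class KthSelect {
    // Easy way: sort, then index. O(n log n).
    static int findBySorting(int[] list, int index) {
        int[] copy = list.clone();
        Arrays.sort(copy);
        return copy[index];
    }

    // Fast way: partition and recurse into one side only.
    static int findByPartition(int[] list, int index) {
        int[] copy = list.clone();
        return findHelper(copy, index, 0, copy.length - 1);
    }

    private static int findHelper(int[] list, int index, int first, int last) {
        int lastIndex = first;
        int pivot = list[first];
        for (int k = first + 1; k <= last; k++) {
            if (list[k] <= pivot) {
                lastIndex++;
                swap(list, lastIndex, k);
            }
        }
        swap(list, lastIndex, first); // pivot lands in its sorted position
        if (lastIndex == index) {
            return list[lastIndex];
        } else if (index < lastIndex) {
            return findHelper(list, index, first, lastIndex - 1);
        } else {
            return findHelper(list, index, lastIndex + 1, last);
        }
    }

    // Assumed helper: exchange two array slots.
    private static void swap(int[] list, int i, int j) {
        int tmp = list[i];
        list[i] = list[j];
        list[j] = tmp;
    }
}
```

When the pivot splits the range roughly in half, the partition version does n + n/2 + n/4 + … work, which is O(n). With the first element as pivot, an already sorted array gives the worst case, O(n^2).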
Recurrences
    int length(ListNode list) {
        if (list == null) return 0;
        else return 1 + length(list.getNext());
    }
• What is complexity? justification? • T(n) = time of length for an n-node list
    T(n) = T(n-1) + 1
    T(0) = O(1)
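Unrolling the recurrence gives the justification. Each step peels one node off the list:

```latex
\begin{align*}
T(n) &= T(n-1) + 1 \\
     &= T(n-2) + 2 \\
     &\;\;\vdots \\
     &= T(0) + n = O(1) + n = O(n)
\end{align*}
```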
Recognizing Recurrences • T must be explicitly identified • n is measure of size of input/parameter • T(n) time to run on an array of size n
    T(n) = T(n/2) + O(1)     binary search       O(log n)
    T(n) = T(n-1) + O(1)     sequential search   O(n)
Recurrences
    T(n) = 2T(n/2) + O(1)    tree traversal      O(n)
    T(n) = 2T(n/2) + O(n)    quicksort           O(n log n)
    T(n) = T(n-1) + O(n)     selection sort      O(n^2)
Compute b^n, b is BigInteger • What is complexity using BigInteger.pow? • What method does it use? • How do we analyze it? • Can we do exponentiation ourselves? • Is there a reason to do so? • What techniques will we use?
Correctness and Complexity?
    public BigInteger pow(BigInteger b, int expo) {
        if (expo == 0) {
            return BigInteger.ONE;
        }
        BigInteger half = pow(b, expo / 2);
        half = half.multiply(half);
        if (expo % 2 == 0) return half;
        else return half.multiply(b);
    }
Correctness and Complexity?
    public BigInteger pow(BigInteger b, int expo) {
        BigInteger result = BigInteger.ONE;
        while (expo != 0) {
            if (expo % 2 != 0) {
                result = result.multiply(b);
            }
            expo = expo / 2;
            b = b.multiply(b);
        }
        return result;
    }
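One way to build confidence in both versions is to compare them with the library's `BigInteger.pow` and to count multiplications. The sketch below assumes `expo` is non-negative; the class `FastPow` and the method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical test harness for the recursive and iterative versions.
class FastPow {
    // Recursive version: computes b^(expo/2), squares it, fixes up odd exponents.
    static BigInteger recursive(BigInteger b, int expo) {
        if (expo == 0) {
            return BigInteger.ONE;
        }
        BigInteger half = recursive(b, expo / 2);
        half = half.multiply(half);
        return (expo % 2 == 0) ? half : half.multiply(b);
    }

    // Iterative version: walks the bits of expo from lowest to highest.
    static BigInteger iterative(BigInteger b, int expo) {
        BigInteger result = BigInteger.ONE;
        while (expo != 0) {
            if (expo % 2 != 0) {
                result = result.multiply(b);
            }
            expo = expo / 2;
            b = b.multiply(b);
        }
        return result;
    }

    // Number of multiply calls the iterative version makes for a given exponent.
    static int multiplications(int expo) {
        int count = 0;
        while (expo != 0) {
            if (expo % 2 != 0) {
                count++; // result.multiply(b)
            }
            expo = expo / 2;
            count++;     // b.multiply(b)
        }
        return count;
    }
}
```

For `expo` = 1000 the loop runs 10 times and 1000 has 6 one-bits, so there are 16 multiplications instead of about 1000. The count is O(log expo), but each `BigInteger` multiplication gets more expensive as the numbers grow, so a penny per statement undercounts the true cost.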
Solve and Analyze • Given N words (e.g., from a file) • What are 20 most frequently occurring? • What are k most frequently occurring? • Proposals? • Tradeoffs in efficiency • Tradeoffs in implementation
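One possible proposal, sketched under assumptions: count words with a map, then sort the unique words by descending count. The class `TopWords`, the method `mostFrequent`, and the alphabetical tie-break are hypothetical choices, not part of the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count with a map, then sort unique words by frequency.
class TopWords {
    static List<String> mostFrequent(List<String> words, int k) {
        // Counting pass: expected O(N) with hashing.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Sorting pass: O(U log U) for U unique words.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue()); // larger counts first
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey()); // ties: alphabetical
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

This version is simple to write and verify. An alternative that keeps a heap of size k would replace the O(U log U) sort with O(U log k) work at the price of more code, which is the kind of efficiency versus implementation tradeoff the slide asks about.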