Algorithm analysis: Problem Definition
• What is the task to be accomplished?
  • Calculate the average grade for a given student
  • Find the nth Fibonacci number
• What are the time / space / performance requirements?
Algorithm Analysis
• Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
• Algorithm Analysis:
  • Space complexity: how much space is required?
  • Time complexity: how much time does it take to run the algorithm?
• Often, we deal with estimates!
Space Complexity
• Space complexity = the amount of memory required by an algorithm to run to completion
• Core dumps: the most frequently encountered cause is a memory leak – the amount of memory required grows larger than the memory available on the system
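To make the leak scenario concrete, here is a small added C++ sketch (not from the original slides): memory allocated with new is never released, so repeated calls keep consuming memory until none is available.
    void leaky(int n)
    {
        int* buffer = new int[n];    // heap allocation
        // ... use buffer ...
        // missing: delete[] buffer; -- the memory is never freed
    }                                // every call leaks n * sizeof(int) bytes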
Space Complexity (cont’d)
• Fixed part: the space required to store certain data/variables, independent of the size of the problem
  - e.g. the name of the data collection
  - same size whether classifying 2 GB or 1 MB of text
• Variable part: space needed by variables whose size depends on the size of the problem
  - e.g. the actual text: loading 2 GB of text vs. loading 1 MB of text
  - or the amount of information held while running through a recursive computation (e.g., the Fibonacci sequence)
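A small added sketch contrasting the two parts, using the Fibonacci example from the slide: the iterative version needs only a fixed handful of variables, while the recursive version keeps one stack frame per pending call, so its space grows with the size of the problem.
    // iterative: fixed space (a, b, t, i only)
    long fibIter(int n)
    {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) { long t = a + b; a = b; b = t; }
        return a;
    }

    // recursive: variable space -- the number of pending calls grows with n
    long fibRec(int n)
    {
        if (n < 2) return n;
        return fibRec(n - 1) + fibRec(n - 2);
    }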
Space Complexity (cont’d)
• S(P) = c + S(instance characteristics)
• c = constant
• Example:
    float sum(float* a, int n)
    {
        float s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }
• Space? One for n, one for a (only the pointer is stored, not the array!), one for s, one for i: constant space! (4)
Time Complexity
• Often more important than space complexity
  • more and more space is available
  • time is still a problem
• Researchers estimate that computing various transformations for a single DNA chain, for a single protein, on a 1 THz computer would take about 1 year to run to completion
• An algorithm’s running time is an important issue
Calculate averages
• Problem: prefix averages
  • Given an array orig, compute the array avg such that avg[i] is the average of elements orig[0] … orig[i], for i = 0..n-1
• Sol 1
  • At each step i, compute the element avg[i] by traversing the array orig and determining the sum of its first i+1 elements, then the average
• Sol 2
  • At each step i, update a running sum of the elements of orig; compute the element avg[i] as sum/(i+1)
• Which is better?
Running time
• We can have worst case, average case, or best case.
• Suppose the program includes an if-then statement that may execute or not: variable running time
• Typically, algorithms are measured by their worst case (big-Oh)
• We also look at the average case.
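For example (an added sketch), linear search in an unsorted array has a variable running time: in the best case the key sits in the first slot (constant time), in the worst case every one of the n elements is examined.
    int linearSearch(const int* a, int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key) return i;   // best case: key at index 0
        return -1;                       // worst case: key absent -> n comparisons
    }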
Calculate running time?
• Write the program, run it with data sets of varying size, and look at the actual running times.
• Problems:
  • We must implement the algorithm.
  • We can only test a limited set of inputs – they may not be indicative of the running time for all inputs.
  • The same hardware and software should be used in order to compare two algorithms – a condition that is very hard to achieve!
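A minimal sketch of this experimental approach (added; assumes a standard C++ compiler with <chrono>): time one run for each input size and record the results.
    #include <chrono>
    #include <iostream>

    void runExperiment(int n)
    {
        auto start = std::chrono::steady_clock::now();
        // ... run the algorithm on an input of size n ...
        auto stop = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::cout << n << " " << us << " us\n";   // one (size, time) data point
    }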
Instead, Use a Theoretical Approach
• Evaluate the algorithms independently of hardware and software
• Determine the running time of an algorithm in general terms, as a function of the input size
Algorithm Analysis
• Analyze in terms of Primitive Operations, e.g.:
  • an addition = 1 operation
  • an assignment = 1 operation
  • calling a method or returning from a method = 1 operation
  • indexing into an array = 1 operation
  • a comparison = 1 operation
• Analysis: count the number of primitive operations executed by the algorithm
Find the maximum element of an array.
    1. int findMax(int *A, int n) {
    2.     int currentMax = A[0];
    3.     for (int i = 1; i < n; i++)
    4.         if (currentMax < A[i])
    5.             currentMax = A[i];
    6.     return currentMax;
    7. }
How many operations?
• Declaration: no time
• Line 2: 2 counts
• Line 6: 1 count
• Lines 4 and 5: 4 counts * the number of times the loop is iterated (n – 1 times)
• Line 3: 1 + n + (n – 1) (one initialization, n comparisons, n – 1 increments)
• Total: 2 + 1 + n + (n – 1) + 4*(n – 1) + 1 = 6n – 1
Big Oh
• We want a formal way of saying 3n^2 ≈ n^2
• “Big-Oh” Notation:
  • given functions f(n) and g(n), we say that f(n) is O(g(n)) iff there are positive constants c and n_0 such that f(n) ≤ c·g(n) for all n ≥ n_0
• This is a fancy way of saying that for our function f(n) there is a function g(n) (and a constant c) such that c·g(n) is at least as large as f(n) once n is big enough.
• We want to pick the simplest such g(n).
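Worked instance of the definition (added for illustration): f(n) = 3n^2 + 5 is O(n^2). Take c = 4 and n_0 = 3: for n ≥ 3 we have n^2 ≥ 9 ≥ 5, so 3n^2 + 5 ≤ 3n^2 + n^2 = 4n^2 = c·g(n).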
Graphic Illustration
• E.g., f(n) = 2n + 6
• Need to find a function g(n) and a constant c such that f(n) ≤ c·g(n)
• g(n) = n and c = 4 work: 2n + 6 ≤ 4n for all n ≥ 3
• f(n) is O(n)
• The order of f(n) is n
• (Graph: the line c·g(n) = 4n stays above f(n) = 2n + 6 from n = 3 on)
More examples
• f(n) = 4n^2: is it O(n)?
  • Can we find a c such that 4n^2 ≤ c·n for all n > n_0? No!
• 50n^3 + 20n + 4 is O(n^3)
  • It would be correct to say it is O(n^3 + n) – not useful, as n^3 exceeds n by far for large values
  • It would be correct to say it is O(n^5) – OK, but g(n) should be as close as possible to f(n)
• 3 log n = O( ? )
Algorithm Analysis
• We simplify the analysis by getting rid of unneeded information
  • “rounding”: 39,999 ≈ 40,000
• We drop constants when expressing big-Oh.
  • E.g., if a program runs in 3n + 2 time, we say that it runs in O(n).
• We drop lower-order terms when expressing big-Oh.
  • E.g., if a function runs in polynomial time 4n^4 + 300n^3 + 7n + 2, we say that it runs in O(n^4).
  • Why? Because after a certain point the lower-order terms are subsumed by the higher-order one.
  • E.g., for n = 500 in the above example, n^4 = 500n^3 > 300n^3, so 4n^4 + 300n^3 < 4n^4 + n^4 = 5n^4. So 5·g(n) is greater than 4n^4 + 300n^3 if g(n) = n^4.
  • Hence we get O(n^4) for this polynomial.
General Rules
• For loops: the running time is at most the running time of the statements inside the for loop times the number of iterations
• Example:
    for (i = 0; i < n; i++)
        k++;
• Running time?
• Nested loops: the running time of the statements inside the innermost loop multiplied by the product of the sizes of all the loops
• Example:
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            k++;
• Running time?
General Rules
• Consecutive statements: add the running times
• Example:
    for (x = 0; x < n; x++)
        a[x] = 0;
    for (y = 0; y < n; y++)
        for (z = 0; z < n; z++)
            a[y] += a[z] + y + z;
• Running time?
• If/Else statements: the running time of an if/else statement is never more than the running time of the test plus the larger of the running times of the two branches
• Example:
    if (y > 0)
        for (i = 0; i < n; i++)
            cout << "array[i] is " << a[i] << endl;
    else {
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                a[i] = i * j;
    }
• Running time?
Analysis - terminology
• Special classes of algorithms (in order of growth rate):
• logarithmic: O(log n)
  • A logarithm is an exponent: the exponent to which the base must be raised to produce a given number.
  • For example, since 2^3 = 8, 3 is called the logarithm of 8 with base 2: 3 = log_2 8.
  • 3 is the exponent to which 2 must be raised to produce 8. We write the base as a subscript.
  • (See the small sketch after this slide.)
• linear: O(n)
• quadratic: O(n^2)
• polynomial: O(n^k), k ≥ 1 (e.g., 3n^4 + 4n^3 + 8n^2 + 6n + 1)
• exponential: O(a^n), a > 1
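A small added sketch of why repeated halving is logarithmic: the loop below halves n on every pass, and n can only be halved about log_2 n times before it reaches 1.
    int halvings(int n)
    {
        int count = 0;
        while (n > 1) {   // each pass halves n
            n = n / 2;
            count++;
        }
        return count;     // roughly log_2 n
    }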
Example
Remember the algorithms for computing prefix averages?
- compute an array avg starting with an array orig
- avg[i] is the average of all elements orig[j] with j <= i

Solution 1:
    int avg[n];
    for (i = 0; i < n; i++)
    {
        int a = 0;
        for (int j = 0; j <= i; j++)
            a = a + orig[j];
        avg[i] = a / (i + 1);
    }
    return avg;

Solution 2:
    int avg[n];
    int s = 0;
    for (i = 0; i < n; i++)
    {
        s = s + orig[i];
        avg[i] = s / (i + 1);
    }
    return avg;

Which is better? Solution 1 re-traverses the array at every step (nested loops), so it is O(n^2); Solution 2 makes a single pass, so it is O(n).
Maximum Subsequence Sum problem
• Given an array of integers (including negative integers), find the contiguous subsequence within the array that has the maximum sum.
• For instance, if we have array[10] = {-7, 2, -8, 4, 9, -4, 2, 3, -6, 3}, the maximum sum is 14 (the subsequence from a[3] to a[7]).
Algorithm 1:
    int maxSubSum1(const vector<int>& a)
    {
        int maxSum = 0;
        for (int i = 0; i < a.size(); i++)
            for (int j = i; j < a.size(); j++)
            {
                int thisSum = 0;
                for (int k = i; k <= j; k++)
                    thisSum += a[k];
                if (thisSum > maxSum)
                    maxSum = thisSum;
            }
        return maxSum;
    }
Algorithm 2:
    int maxSubSum2(const vector<int>& a)
    {
        int maxSum = 0;
        for (int i = 0; i < a.size(); i++)
        {
            int thisSum = 0;
            for (int j = i; j < a.size(); j++)
            {
                thisSum += a[j];
                if (thisSum > maxSum)
                    maxSum = thisSum;
            }
        }
        return maxSum;
    }
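Algorithm 1 has three nested loops, so it runs in O(n^3); Algorithm 2 drops the innermost loop and runs in O(n^2). Not in the original slides, but for comparison, a linear-time O(n) sketch for the same problem (the running-sum idea usually attributed to Kadane); like the two algorithms above, it returns 0 if every element is negative.
    int maxSubSum3(const vector<int>& a)
    {
        int maxSum = 0, thisSum = 0;
        for (int j = 0; j < a.size(); j++)
        {
            thisSum += a[j];
            if (thisSum > maxSum)
                maxSum = thisSum;
            else if (thisSum < 0)
                thisSum = 0;   // a negative running sum can never start a best subsequence
        }
        return maxSum;
    }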
Search
• Given an integer X and integers A_0, A_1, …, A_(n-1), which are presorted and already in memory, find i such that A_i = X, or return i = -1 if X is not in the input.
• Write the algorithm that does this. What is the big-Oh of this algorithm?
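One possible solution sketch (added): since the elements are presorted, binary search halves the remaining range at every step, giving O(log n).
    int binarySearch(const int* A, int n, int X)
    {
        int low = 0, high = n - 1;
        while (low <= high)
        {
            int mid = (low + high) / 2;
            if (A[mid] < X)      low = mid + 1;
            else if (A[mid] > X) high = mid - 1;
            else                 return mid;   // A[mid] == X
        }
        return -1;                             // X is not in the input
    }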
Another Example:
Raising an integer to a power: x^n
• Could write:
    int num = 1;
    for (int i = 0; i < n; i++)
        num = x * num;
  What time does this run in?
• Or:
    long pow(long x, int n)
    {
        if (n == 0)
            return 1;
        if (n == 1)
            return x;
        if (isEven(n))                  // isEven(n): true when n is even
            return pow(x * x, n / 2);
        else
            return pow(x * x, n / 2) * x;
    }
“Relatives” of Big-Oh
• Ω(f(n)): Big-Omega – asymptotic lower bound
• Θ(f(n)): Big-Theta – asymptotic tight bound
• Big-Omega – think of it as the inverse of Big-Oh:
  • g(n) is Ω(f(n)) if f(n) is O(g(n))
• Big-Theta – combines Big-Oh and Big-Omega:
  • f(n) is Θ(g(n)) if f(n) is O(g(n)) and f(n) is Ω(g(n)) (equivalently, g(n) is O(f(n)))
• Little-oh – f(n) is o(g(n)) if for any c > 0 there is an n_0 such that f(n) < c·g(n) for n > n_0.
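Worked example (added): f(n) = 3n^2 + n is Θ(n^2). It is O(n^2), since 3n^2 + n ≤ 4n^2 for all n ≥ 1 (c = 4, n_0 = 1), and it is Ω(n^2), since 3n^2 + n ≥ 3n^2 for all n ≥ 1 (c = 3, n_0 = 1).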