580 likes | 755 Views
Topic Number 8 Algorithm Analysis. " bit twiddling: 1. (pejorative) An exercise in tuning (see tune ) in which incredible amounts of time and effort go to produce little noticeable improvement, often with the result that the code becomes incomprehensible."
E N D
Topic Number 8Algorithm Analysis "bit twiddling: 1. (pejorative) An exercise in tuning (see tune) in which incredible amounts of time and effort go to produce little noticeable improvement, often with the result that the code becomes incomprehensible." - The Hackers Dictionary, version 4.4.7 Algorithm Analysis
Is This Algorithm Fast? • Problem: given a problem, how fast does this code solve that problem? • Could try to measure the time it takes, but that is subject to lots of errors • multitasking operating system • speed of computer • language solution is written in Algorithm Analysis
Attendance Question 1 • "My program finds all the primes between 2 and 1,000,000,000 in 1.37 seconds." • how good is this solution? • Good • Bad • It depends Algorithm Analysis
Grading Algorithms • What we need is some way to grade algorithms and their representation via computer programs for efficiency • both time and space efficiency are concerns • are examples simply deal with time, not space • The grades used to characterize the algorithm and code should be independent of platform, language, and compiler • We will look at Java examples as opposed to pseudocode algorithms Algorithm Analysis
Big O • The most common method and notation for discussing the execution time of algorithms is "Big O" • Big O is the asymptotic execution time of the algorithm • Big O is an upper bounds • It is a mathematical tool • Hide a lot of unimportant details by assigning a simple grade (function) to algorithms Algorithm Analysis
Typical Big O Functions – "Grades" Algorithm Analysis
Big O Functions • N is the size of the data set. • The functions do not include less dominant terms and do not include any coefficients. • 4N2 + 10N – 100 is not a valid F(N). • It would simply be O(N2) • It is possible to have two independent variables in the Big O function. • example O(M + log N) • M and N are sizes of two different, but interacting data sets Algorithm Analysis
Actual vs. Big O Simplified Time foralgorithm to complete Actual Amount of data Algorithm Analysis
Formal Definition of Big O • T(N) is O( F(N) ) if there are positive constants c and N0 such that T(N) < cF(N) when N > N0 • N is the size of the data set the algorithm works on • T(N) is a function that characterizes the actual running time of the algorithm • F(N) is a function that characterizes an upper bounds on T(N). It is a limit on the running time of the algorithm. (The typical Big functions table) • c and N0 are constants Algorithm Analysis
What it Means • T(N) is the actual growth rate of the algorithm • can be equated to the number of executable statements in a program or chunk of code • F(N) is the function that bounds the growth rate • may be upper or lower bound • T(N) may not necessarily equal F(N) • constants and lesser terms ignored because it is a bounding function Algorithm Analysis
Yuck • How do you apply the definition? • Hard to measure time without running programs and that is full of inaccuracies • Amount of time to complete should be directly proportional to the number of statements executed for a given amount of data • Count up statements in a program or method or algorithm as a function of the amount of data • This is one technique • Traditionally the amount of data is signified by the variable N Algorithm Analysis
Counting Statements in Code • So what constitutes a statement? • Can’t I rewrite code and get a different answer, that is a different number of statements? • Yes, but the beauty of Big O is, in the end you get the same answer • remember, it is a simplification Algorithm Analysis
Assumptions in For Counting Statements • Once found accessing the value of a primitive is constant time. This is one statement: x = y; //one statement • mathematical operations and comparisons in boolean expressions are all constant time. x = y * 5 + z % 3; // one statement • if statement constant time if test and maximum time for each alternative are constants if( iMySuit == DIAMONDS || iMySuit == HEARTS ) return RED; else return BLACK; // 2 statements (boolean expression + 1 return) Algorithm Analysis
Counting Statements in LoopsAttendenance Question 2 • Counting statements in loops often requires a bit of informal mathematical induction • What is output by the following code?int total = 0;for(int i = 0; i < 2; i++) total += 5;System.out.println( total ); A. 2 B. 5 C. 10 D. 15 E. 20 Algorithm Analysis
Attendances Question 3 • What is output by the following code?int total = 0;// assume limit is an int >= 0for(inti = 0; i < limit; i++) total += 5;System.out.println( total ); • 0 • limit • limit * 5 • limit * limit • limit5 Algorithm Analysis
Counting Statements in Nested LoopsAttendance Question 4 • What is output by the following code?int total = 0;for(inti = 0; i < 2; i++) for(int j = 0; j < 2; j++) total += 5;System.out.println( total ); • 0 • 10 • 20 • 30 • 40 Algorithm Analysis
Attendance Question 5 • What is output by the following code?int total = 0;// assume limit is an int >= 0for(inti = 0; i < limit; i++) for(int j = 0; j < limit; j++) total += 5;System.out.println( total ); • 5 • limit * limit • limit * limit * 5 • 0 • limit5 Algorithm Analysis
Loops That Work on a Data Set • The number of executions of the loop depends on the length of the array, values. • How many many statements are executed by the above method • N = values.length. What is T(N)? F(N)? public int total(int[] values){ int result = 0; for(int i = 0; i < values.length; i++) result += values[i]; return result; } Algorithm Analysis
Counting Up Statements • int result = 0; 1 time • int i = 0; 1 time • i < values.length; N + 1 times • i++ N times • result += values[i]; N times • return total; 1 time • T(N) = 3N + 4 • F(N) = N • Big O = O(N) Algorithm Analysis
Showing O(N) is Correct • Recall the formal definition of Big O • T(N) is O( F(N) ) if there are positive constants c and N0 such that T(N) < cF(N) when N > N0 • In our case given T(N) = 3N + 4, prove the method is O(N). • F(N) is N • We need to choose constants c and N0 • how about c = 4, N0 = 5 ? Algorithm Analysis
vertical axis: time for algorithm to complete. (approximate with number of executable statements) c * F(N), in this case, c = 4, c * F(N) = 4N T(N), actual function of time. In this case 3N + 4 F(N), approximate function of time. In this case N No = 5 horizontal axis: N, number of elements in data set Algorithm Analysis
Attendance Question 6 • Which of the following is true? • Method total is O(N) • Method total is O(N2) • Method total is O(N!) • Method total is O(NN) • All of the above are true Algorithm Analysis
Just Count Loops, Right? // assume mat is a 2d array of booleans // assume mat is square with N rows, // and N columns int numThings = 0; for(int r = row - 1; r <= row + 1; r++) for(int c = col - 1; c <= col + 1; c++) if( mat[r][c] ) numThings++; What is the order of the above code? O(1) B. O(N) C. O(N2) D. O(N3) E. O(N1/2) Algorithm Analysis
It is Not Just Counting Loops // Second example from previous slide could be // rewritten as follows: int numThings = 0; if( mat[r-1][c-1] ) numThings++; if( mat[r-1][c] ) numThings++; if( mat[r-1][c+1] ) numThings++; if( mat[r][c-1] ) numThings++; if( mat[r][c] ) numThings++; if( mat[r][c+1] ) numThings++; if( mat[r+1][c-1] ) numThings++; if( mat[r+1][c] ) numThings++; if( mat[r+1][c+1] ) numThings++; Algorithm Analysis
Sidetrack, the logarithm • Thanks to Dr. Math • 32 = 9 • likewise log3 9 = 2 • "The log to the base 3 of 9 is 2." • The way to think about log is: • "the log to the base x of y is the number you can raise x to to get y." • Say to yourself "The log is the exponent." (and say it over and over until you believe it.) • In CS we work with base 2 logs, a lot • log2 32 = ? log2 8 = ? log2 1024 = ? log10 1000 = ? Algorithm Analysis
When Do Logarithms Occur • Algorithms have a logarithmic term when they use a divide and conquer technique • the data set keeps getting divided by 2 public intfoo(int n){ // pre n > 0int total = 0; while( n > 0 ) { n = n / 2; total++; } return total;} • What is the order of the above code? • O(1) B. O(logN) C. O(N) D. O(Nlog N) E. O(N2) Algorithm Analysis
Dealing with other methods • What do I do about method calls? double sum = 0.0; for(int i = 0; i < n; i++) sum += Math.sqrt(i); • Long way • go to that method or constructor and count statements • Short way • substitute the simplified Big O function for that method. • if Math.sqrt is constant time, O(1), simply count sum += Math.sqrt(i); as one statement. Algorithm Analysis
Dealing With Other Methods public intfoo(int[] list){ int total = 0; for(inti = 0; i < list.length; i++){ total += countDups(list[i], list); } return total; } // method countDups is O(N) where N is the // length of the array it is passed What is the Big O of foo? • O(1) B. O(N) C. O(NlogN) D. O(N2) E. O(N!) Algorithm Analysis
Quantifiers on Big O • It is often useful to discuss different cases for an algorithm • Best Case: what is the best we can hope for? • least interesting • Average Case (a.k.a. expected running time): what usually happens with the algorithm? • Worst Case: what is the worst we can expect of the algorithm? • very interesting to compare this to the average case Algorithm Analysis
Best, Average, Worst Case • To Determine the best, average, and worst case Big O we must make assumptions about the data set • Best case -> what are the properties of the data set that will lead to the fewest number of executable statements (steps in the algorithm) • Worst case -> what are the properties of the data set that will lead to the largest number of executable statements • Average case -> Usually this means assuming the data is randomly distributed • or if I ran the algorithm a large number of times with different sets of data what would the average amount of work be for those runs? Algorithm Analysis
Another Example • T(N)? F(N)? Big O? Best case? Worst Case? Average Case? • If no other information, assume asking average case public double minimum(double[] values){ int n = values.length; double minValue = values[0]; for(int i = 1; i < n; i++) if(values[i] < minValue) minValue = values[i]; return minValue;} Algorithm Analysis
Independent Loops // from the Matrix class public void scale(int factor){ for(int r = 0; r < numRows(); r++) for(int c = 0; c < numCols(); c++) iCells[r][c] *= factor; } Assume an numRows() = N and numCols() = N. In other words, a square Matrix. What is the T(N)? What is the Big O? O(1) B. O(N) C. O(NlogN) D. O(N2) E. O(N!) Algorithm Analysis
Significant Improvement – Algorithm with Smaller Big O function • Problem: Given an array of ints replace any element equal to 0 with the maximum value to the right of that element. Given: [0, 9, 0, 8, 0, 0, 7, 1, -1, 0, 1, 0] Becomes: [9, 9, 8, 8, 7, 7, 7, 1, -1, 1, 1, 0] Algorithm Analysis
Replace Zeros – Typical Solution public void replace0s(int[] data){ int max; for(int i = 0; i < data.length -1; i++){ if( data[i] == 0 ){ max = 0; for(int j = i+1; j<data.length;j++) max = Math.max(max, data[j]); data[i] = max; } } } Assume most values are zeros. Example of a dependent loops. Algorithm Analysis
Replace Zeros – Alternate Solution public void replace0s(int[] data){ int max = Math.max(0, data[data.length – 1]); int start = data.length – 2; for(int i = start; i >= 0; i--){ if( data[i] == 0 ) data[i] = max; else max = Math.max(max, data[i]); } } Big O of this approach? O(1) B. O(N) C. O(NlogN) D. O(N2) E. O(N!) Algorithm Analysis
A Caveat • What is the Big O of this statement in Java? int[] list = new int[n]; • O(1) B. O(N) C. O(NlogN) D. O(N2) E. O(N!) • Why? Algorithm Analysis
Summing Executable Statements • If an algorithms execution time is N2 + N the it is said to have O(N2) execution time not O(N2 + N) • When adding algorithmic complexities the larger value dominates • formally a function f(N) dominates a function g(N) if there exists a constant value n0 such that for all values N > N0 it is the case that g(N) < f(N) Algorithm Analysis
Example of Dominance • Look at an extreme example. Assume the actual number as a function of the amount of data is: N2/10000 + 2Nlog10 N+ 100000 • Is it plausible to say the N2 term dominates even though it is divided by 10000 and that the algorithm is O(N2)? • What if we separate the equation into (N2/10000) and (2N log10 N + 100000) and graph the results. Algorithm Analysis
Summing Execution Times • For large values of N the N2 term dominates so the algorithm is O(N2) • When does it make sense to use a computer? red line is 2Nlog10 N + 100000 blue line is N2/10000 Algorithm Analysis
Comparing Grades • Assume we have a problem • Algorithm A solves the problem correctly and is O(N2) • Algorithm B solves the same problem correctly and is O(N log2N ) • Which algorithm is faster? • One of the assumptions of Big O is that the data set is large. • The "grades" should be accurate tools if this is true Algorithm Analysis
Running Times • Assume N = 100,000 and processor speed is 1,000,000,000 operations per second Algorithm Analysis
Theory to Practice ORDykstra says: "Pictures are for the Weak." Times in Seconds. Red indicates predicated value. Algorithm Analysis
Change between Data Points Value obtained by Timex / Timex-1 Algorithm Analysis
Okay, Pictures Algorithm Analysis
Put a Cap on Time Algorithm Analysis
No O(N^2) Data Algorithm Analysis
Just O(N) and O(NlogN) Algorithm Analysis
Just O(N) Algorithm Analysis
Reasoning about algorithms • We have an O(N) algorithm, • For 5,000 elements takes 3.2 seconds • For 10,000 elements takes 6.4 seconds • For 15,000 elements takes ….? • For 20,000 elements takes ….? • We have an O(N2) algorithm • For 5,000 elements takes 2.4 seconds • For 10,000 elements takes 9.6 seconds • For 15,000 elements takes …? • For 20,000 elements takes …? Algorithm Analysis
A Useful Proportion • Since F(N) is characterizes the running time of an algorithm the following proportion should hold true: F(N0) / F(N1) ~= time0 / time1 • An algorithm that is O(N2) takes 3 seconds to run given 10,000 pieces of data. • How long do you expect it to take when there are 30,000 pieces of data? • common mistake • logarithms? Algorithm Analysis