Algorithm Efficiency
• There are often many approaches (algorithms) to solve a problem. How do we choose between them?
• At the heart of computer program design are two (sometimes conflicting) goals:
  1. To design an algorithm that is easy to understand, code, and debug.
  2. To design an algorithm that makes efficient use of the computer's resources.
• Goal 1 is the concern of software engineering.
• Goal 2 is the concern of data structures and algorithm analysis.
• When goal 2 is important, how do we measure an algorithm's cost?
How to Measure Efficiency?
• Empirical comparison (run the programs and time them; see the timing sketch below). The result is:
  • only valid for that machine,
  • only valid for that compiler,
  • only valid for that coding of the algorithm.
• Asymptotic algorithm analysis:
  • Must identify the critical resources:
    • time (where we will concentrate)
    • space
  • Identify the factors affecting that resource.
  • For most algorithms, running time depends on the "size" of the input.
  • Running time is expressed as T(n) for some function T on input size n.
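As a concrete illustration of empirical comparison, here is a minimal timing sketch. It assumes C++ and the standard steady clock; the two algorithm functions are hypothetical stand-ins, not part of the original slides. Whatever numbers it prints hold only for one machine, one compiler, and one coding of the algorithm.

    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-ins for two algorithms that solve the same problem.
    long long algorithmA(int n) { long long s = 0; for (int i = 0; i < n; i++) s += i; return s; }
    long long algorithmB(int n) { return (long long)n * (n - 1) / 2; }

    int main() {
        const int n = 1000000;
        auto t0 = std::chrono::steady_clock::now();
        volatile long long a = algorithmA(n);   // volatile keeps the call from being optimized away
        auto t1 = std::chrono::steady_clock::now();
        volatile long long b = algorithmB(n);
        auto t2 = std::chrono::steady_clock::now();
        std::printf("A: %lld ns\n", (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
        std::printf("B: %lld ns\n", (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count());
        return 0;
    }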
Examples of Growth Rate
• Example 1 (find the largest value in an array; the running time grows in proportion to n):

    int largest(int* array, int n) {
        int currlarge = array[0];          // largest value seen so far
        for (int i = 1; i < n; i++)
            if (array[i] > currlarge)
                currlarge = array[i];
        return currlarge;
    }

• Example 2 (the innermost statement executes n·n times, so the running time grows as n²):

    sum = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            sum++;
Best, Worst and Average Cases
• Not all inputs of a given size take the same time.
• Sequential search for K in an array of n integers (a sketch follows below):
  • Begin at the first element of the array and look at each element in turn until K is found.
  • Best case: 1 comparison (K is the first element).
  • Worst case: n comparisons (K is the last element, or is not present).
  • Average case: about n/2 comparisons (K is equally likely to be at any position).
• While average time seems to be the fairest measure, it may be difficult to determine.
• When is the worst-case time important? For time-critical events (real-time processing).
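A minimal sketch of the sequential search just described, assuming an array of int values; it returns the index of K, or -1 if K is not present:

    // Returns the index of K in array[0..n-1], or -1 if K is not present.
    int sequential(int K, int* array, int n) {
        for (int i = 0; i < n; i++)
            if (array[i] == K)
                return i;      // best case: found at i == 0 after one comparison
        return -1;             // worst case: all n elements examined
    }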
Faster Computer or Faster Algorithm?
What happens when we buy a computer 10 times faster?

    f(n)       n       n'      change                 n'/n
    10n        1,000   10,000  n' = 10n               10
    20n        500     5,000   n' = 10n               10
    5n log n   250     1,842   sqrt(10)·n < n' < 10n  7.37
    2n²        70      223     n' = sqrt(10)·n        3.16
    2^n        13      16      n' = n + 3             >1

n: size of input that can be processed in one hour (10,000 basic steps).
n': size of input that can be processed in one hour on the new machine (100,000 basic steps).
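To see where the last row comes from: the old machine does 10,000 basic steps in the hour and the new one 100,000, so for f(n) = 2^n the new size n' satisfies 2^n' = 10 · 2^n, giving n' = n + log₂10 ≈ n + 3.3. A 10x faster machine buys the exponential algorithm only an additive constant, while for f(n) = 10n the same budget gives n' = 10n, a full factor of 10.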
Asymptotic Analysis: Big-oh
• Definition: T(n) is in the set O(f(n)) if there exist two positive constants c and n₀ such that |T(n)| <= c|f(n)| for all n > n₀.
• Usage: the algorithm is in O(n²) in the [best, average, worst] case.
• Meaning: for all data sets big enough (i.e., n > n₀), the algorithm always executes in fewer than c|f(n)| steps in the [best, average, worst] case.
• Big-oh is an UPPER bound.
• Example: if T(n) = 3n², then T(n) is in O(n²).
• We want the tightest upper bound: T(n) = 3n² is also in O(n³), but we prefer O(n²).
Big-oh Examples
• Example 1: finding the value X in an array (average case). T(n) = c_s·n/2, where c_s is the cost of one comparison.
  • For all values of n > 1, |c_s·n/2| <= c_s|n|. Therefore, by the definition, T(n) is in O(n) for n₀ = 1 and c = c_s.
• Example 2: T(n) = c₁n² + c₂n in the average case.
  • |c₁n² + c₂n| <= |c₁n² + c₂n²| <= (c₁ + c₂)|n²| for all n > 1.
  • Therefore, T(n) is in O(n²).
• Example 3: T(n) = c, a constant. This is in O(1).
Big-Omega
• Definition: T(n) is in the set Ω(g(n)) if there exist two positive constants c and n₀ such that |T(n)| >= c|g(n)| for all n > n₀.
• Meaning: for all data sets big enough (i.e., n > n₀), the algorithm always executes in more than c|g(n)| steps.
• It is a LOWER bound.
• Example: T(n) = c₁n² + c₂n.
  • |c₁n² + c₂n| >= |c₁n²| for all n > 1, so |T(n)| >= c|n²| for c = c₁ and n₀ = 1.
  • Therefore, T(n) is in Ω(n²) by the definition.
• We want the greatest lower bound.
Theta Notation
• When big-oh and Ω coincide for an algorithm, we indicate this with Θ (big-Theta) notation.
• Definition: an algorithm is said to be Θ(h(n)) if it is in O(h(n)) and in Ω(h(n)).
• Simplifying rules:
  • If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)).
  • If f(n) is in O(k·g(n)) for any constant k > 0, then f(n) is in O(g(n)).
  • If f₁(n) is in O(g₁(n)) and f₂(n) is in O(g₂(n)), then (f₁ + f₂)(n) is in O(max(g₁(n), g₂(n))).
  • If f₁(n) is in O(g₁(n)) and f₂(n) is in O(g₂(n)), then f₁(n)·f₂(n) is in O(g₁(n)·g₂(n)).
Big-O Rules
• If T₁(n) = O(f(n)) and T₂(n) = O(g(n)), then
  • T₁(n) + T₂(n) = max(O(f(n)), O(g(n)))
  • T₁(n) · T₂(n) = O(f(n) · g(n))
• If T(n) is a polynomial of degree k, then T(n) = Θ(n^k).
• log^k n = O(n) for any constant k: logarithms (and any fixed power of them) grow very slowly. A worked example follows below.
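For example, suppose T₁(n) = O(n²) is the cost of a nested double loop and T₂(n) = O(n) the cost of a single loop that runs after it. The sum rule gives T₁(n) + T₂(n) = max(O(n²), O(n)) = O(n²). If instead the single loop ran once per iteration of the double loop, the product rule would give O(n²) · O(n) = O(n³).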
General Algorithm Analysis Rules
• The running time of a for loop is at most the running time of the statements inside the loop times the number of iterations.
• Analyze nested loops inside out, then apply the previous rule.
• Consecutive statements just add (so apply the max rule).
• The running time of an if/else statement is never more than the running time of the test plus the larger of the running times of the true and false branches.
Running Time of a Program
• Example 1:

    a = b;

  This assignment statement takes constant time, so it is Θ(1).
• Example 2 (the loop body executes n times, so Θ(n)):

    sum = 0;
    for (i = 1; i <= n; i++)
        sum += n;

• Example 3 (the nested loops execute 1 + 2 + … + n = n(n+1)/2 times and dominate the final Θ(n) loop, so Θ(n²)):

    sum = 0;
    for (j = 1; j <= n; j++)
        for (i = 1; i <= j; i++)
            sum++;
    for (k = 1; k <= n; k++)
        a[k] = k - 1;
More Examples
• Example 4 (both fragments are Θ(n²): the first executes its body n·n times, the second 1 + 2 + … + n = n(n+1)/2 times):

    sum1 = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            sum1++;

    sum2 = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= i; j++)
            sum2++;

• Example 5 (the first fragment is Θ(n log n): the outer loop runs about log n times, the inner loop n times. The second is Θ(n): the inner loop runs k times for k = 1, 2, 4, …, n, and 1 + 2 + 4 + … + n < 2n):

    sum1 = 0;
    for (k = 1; k < n; k *= 2)
        for (j = 1; j <= n; j++)
            sum1++;

    sum2 = 0;
    for (k = 1; k <= n; k *= 2)
        for (j = 1; j <= k; j++)
            sum2++;
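A quick way to check these growth rates empirically is to count iterations directly. A minimal sketch for Example 5 (the choice of sizes and the doubling of n are illustrative assumptions):

    #include <cstdio>

    int main() {
        // Count inner-loop executions of the two fragments in Example 5.
        for (int n = 1024; n <= 8192; n *= 2) {
            long long sum1 = 0, sum2 = 0;
            for (int k = 1; k < n; k *= 2)          // ~log n outer iterations
                for (int j = 1; j <= n; j++)        // n inner iterations each
                    sum1++;
            for (int k = 1; k <= n; k *= 2)         // k = 1, 2, 4, ..., n
                for (int j = 1; j <= k; j++)        // k inner iterations
                    sum2++;
            // sum1 grows like n log n; sum2 stays below 2n.
            std::printf("n=%5d  sum1=%9lld  sum2=%6lld\n", n, sum1, sum2);
        }
        return 0;
    }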
Binary Search

    // Search a sorted array of the given size for value; return its
    // position, or -1 if value is not in the array.
    int binary(int value, int* array, int size) {
        int left = -1;                       // one position below the search range
        int right = size;                    // one position above the search range
        while (left + 1 != right) {          // stop when the range is empty
            int mid = (left + right) / 2;    // examine the middle element
            if (value < array[mid])
                right = mid;                 // value, if present, lies to the left
            else if (value > array[mid])
                left = mid;                  // value, if present, lies to the right
            else
                return mid;                  // found it
        }
        return -1;                           // value is not in the array
    }
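A small driver for the function above, using the 16-element array from the next slide (the driver itself is a hypothetical addition):

    int main() {
        int keys[16] = {11, 13, 21, 26, 29, 36, 40, 41,
                        45, 51, 54, 56, 65, 72, 77, 83};
        int pos = binary(45, keys, 16);   // examines 41, 56, 51, then finds 45
        return pos;                       // pos == 8
    }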
Binary Search Example

    Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    Key:      11  13  21  26  29  36  40  41  45  51  54  56  65  72  77  83

Now let's search for the value 45.
Unsuccessful Search

    Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    Key:      11  13  21  26  29  36  40  41  45  51  54  56  65  72  77  83

Now let's search for the value 24. How many elements are examined in the worst case?
Case Study: Maximum Subsequence Sum
• Given a sequence of integers a₁, a₂, …, aₙ, find the contiguous subsequence that gives the largest sum.
• Since there is no size limit on the subsequence, the problem is trivial when all entries have the same sign: if all are positive, take the whole sequence; if all are negative, take the single largest element. The interesting case is a mix of signs.
Simple Solution
• Look at all possible combinations of start and stop positions of the subsequence:

    maxsum = 0;
    for (i = 0; i < n; i++)
        for (j = i; j < n; j++) {
            thissum = 0;                     // sum of a[i..j]
            for (k = i; k <= j; k++)
                thissum = thissum + a[k];
            if (thissum > maxsum)
                maxsum = thissum;
        }
Analysis of Simple Solution
• The inner loop is executed j - i + 1 times.
• The middle loop runs j from i to n-1, so for a fixed i the inner-loop work is 1 + 2 + … + (n-i).
• This sum is (n-i+1)(n-i)/2 = (n² - 2ni + i² + n - i)/2.
More Analysis
• The outer loop runs i from 0 to n-1, so we sum (n² - 2ni + i² + n - i)/2 over those values of i:
  • Σ n² over the n values of i is n³.
  • Σ 2ni = 2n · n(n-1)/2 = n³ - n².
  • Σ i² = (n-1)n(2n-1)/6 = (2n³ - 3n² + n)/6.
  • Σ n = n².
  • Σ i = n(n-1)/2 = (n² - n)/2.
• Total: (n³ - (n³ - n²) + (2n³ - 3n² + n)/6 + n² - (n² - n)/2) / 2 = (n³ + 3n² + 2n)/6 = n(n+1)(n+2)/6.
• This is O(n³).
An Improved Algorithm
• Start at position i and compute the sums of all subsequences that start at position i, reusing each partial sum; then repeat for all starting positions:

    maxsum = 0;
    for (i = 0; i < n; i++) {
        thissum = 0;
        for (j = i; j < n; j++) {
            thissum = thissum + a[j];    // sum of a[i..j], built from a[i..j-1]
            if (thissum > maxsum)
                maxsum = thissum;
        }
    }
Analysis of the Improved Algorithm
• The inner loop runs j from i to n-1:
  • when i is 0, it runs n times;
  • when i is 1, it runs n-1 times;
  • …
  • when i is n-1, it runs once.
• Summing this up backwards: 1 + 2 + … + n = n(n+1)/2 = (n² + n)/2, which is O(n²).
Final Great Algorithm
• A single pass suffices. Keep a running sum and reset it to zero whenever it goes negative, since a negative running sum can never be a prefix of the best subsequence:

    maxsum = 0;
    thissum = 0;
    for (j = 0; j < n; j++) {
        thissum = thissum + a[j];
        if (thissum > maxsum)
            maxsum = thissum;          // best sum seen so far
        else if (thissum < 0)
            thissum = 0;               // a negative prefix can only hurt; start over
    }

This is O(n).
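A quick check of the linear algorithm on a small example (the input values are illustrative, not from the slides):

    #include <cstdio>

    int main() {
        int a[] = {4, -3, 5, -2, -1, 2, 6, -2};   // best subsequence is a[0..6], sum 11
        int n = 8, maxsum = 0, thissum = 0;
        for (int j = 0; j < n; j++) {
            thissum += a[j];
            if (thissum > maxsum) maxsum = thissum;
            else if (thissum < 0) thissum = 0;
        }
        std::printf("%d\n", maxsum);   // prints 11
        return 0;
    }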
Analyzing Problems
• Upper bound: the upper bound of the best known algorithm for the problem.
• Lower bound: a lower bound that holds for every possible algorithm for the problem, even algorithms not yet discovered.
• Example: sorting.
  • Cost of I/O: Ω(n).
  • Bubble sort or insertion sort: O(n²).
  • A better sort (Quicksort, Mergesort, Heapsort): O(n log n).
  • We prove in Chapter 8 that sorting is Ω(n log n).
Multiple Parameters
• Compute the rank ordering for all C pixel values in a picture of P pixels.
• Monitors have a fixed number of colors C (256, 16M, 64M).
• We need to count the occurrences of each color and determine the most used and least used colors:

    for (i = 0; i < C; i++)      // initialize the count array
        count[i] = 0;
    for (i = 0; i < P; i++)      // tally each pixel's color value
        count[value[i]]++;
    sort(count);                 // sort the C counts

• If we use P as the measure, the time is O(P log P).
• Which is bigger, C or P? 600 × 400 = 240,000 pixels; 1024 × 1024 = 1M pixels. Either parameter can dominate.
• A more accurate bound is O(P + C log C).
Space Bounds
• Space bounds can also be analyzed with asymptotic complexity analysis.
  • Time: algorithm.
  • Space: data structure.
• Space/Time Tradeoff Principle: one can often achieve a reduction in time by sacrificing space, or vice versa.
  • Encoding or packing information: a Boolean flag needs only one bit, but a byte is the smallest unit of storage, so pack 8 Booleans into 1 byte (see the sketch below). This takes more time but less space.
  • Table lookup: compute factorials once, store them, and use them many times.
• Disk-Based Space/Time Tradeoff Principle: the smaller you can make your disk storage requirements, the faster your program will run, because disk is slow.
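A minimal sketch of the Boolean-packing idea (the helper names are hypothetical):

    #include <cstdint>

    // Pack 8 Booleans per byte: flag i lives in bit (i % 8) of byte i / 8.
    inline void setFlag(uint8_t* bits, int i, bool v) {
        if (v) bits[i / 8] |= (uint8_t)(1u << (i % 8));
        else   bits[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
    inline bool getFlag(const uint8_t* bits, int i) {
        return (bits[i / 8] >> (i % 8)) & 1u;
    }
    // A byte array of length (n + 7) / 8 stores n flags in one eighth of the
    // space of a bool[n], at the cost of a shift and a mask on every access.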