Algorithm Design and Analysis (ADA) 242-535, Semester 1, 2013-2014
2. Running Time of Programs
• Objective: to introduce the Big-Oh notation for estimating the worst-case running time of programs
Overview • Running Time: T() • Big-Oh and Approximate Running Time • Calculating Big-Oh Directly • Analyzing Function Calls • Analyzing Recursive Functions • Towers of Hanoi
1. Running Time: T()
• What is the running time of this program?

    void main()
    {
        int i, n;
        scanf("%d", &n);
        for (i = 0; i < n; i++)
            printf("%d\n", i);
    }

continued
Counting Instructions • Assume 1 instruction takes 1 ms
There is no single answer!
• the running time depends on the value of n that is entered
• Instead of an answer in seconds, we want an answer that is expressed in terms of the size of the input.
continued
For example:
• running time T(n) = constant * n
• this means that as n gets bigger, so does the program's running time
• running time is linearly related to the input size
[graph: running time = constant * n, plotted against the size of n]
Running Time Theory • A program/algorithm has a running time T(n) • n is some measure of the input size • T(n) is the largest amount of time the program takes on any input of size n • T(n) is the worst running time • not always accurate, but easy to calculate • Time units are left unspecified.
1.1. Kinds of Running Time
Worst-case: (we usually use this)
• T(n) = maximum time of the algorithm on any input of size n
  - one possible value
Average-case: (we sometimes use this)
• T(n) = expected time of the algorithm over all inputs of size n
  - this approach needs information about the statistical distribution (probability) of the inputs
  - e.g. a uniform spread of data (i.e. all data is equally likely)
Best-case: (don't use this; it's misleading)
• e.g. write a slow algorithm that works fast on specially selected input.
The three cases are made concrete by the linear-search sketch below.
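Here is a minimal sketch of the three cases (our own illustration; the function linearSearch and the array contents are not from the slides). Linear search does 1 comparison in the best case (the key sits in A[0]), n comparisons in the worst case (the key is absent), and about n/2 on average if the key is equally likely to be anywhere.

    #include <stdio.h>

    /* Linear search: returns the index of key in A[], or -1 if absent.
       Best case: key == A[0], 1 comparison.
       Worst case: key is not in A[], n comparisons, so T(n) is linear.
       Average case: about n/2 comparisons if all positions are equally likely. */
    int linearSearch(int A[], int n, int key) {
        int i;
        for (i = 0; i < n; i++)
            if (A[i] == key)
                return i;
        return -1;
    }

    int main() {
        int A[] = {7, 3, 9, 1, 5};
        printf("%d\n", linearSearch(A, 5, 7));    /* best case: finds index 0 */
        printf("%d\n", linearSearch(A, 5, 42));   /* worst case: -1 after 5 tests */
        return 0;
    }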
1.2. T(n) Example
• Loop fragment for finding the product of all the positive numbers in the A[] array of size n:

    (2) int prod = 1;
    (3) for (j = 0; j < n; j++)
    (4)     if (A[j] > 0)
    (5)         prod = prod * A[j];

• Count each assignment and test as 1 "time unit".
Convert 'for' to 'while'
• The while-loop is easier to count (and equivalent to the for-loop):

    int prod = 1;               // 1
    int j = 0;                  // 1
    while (j < n) {             // 1 for each test
        if (A[j] > 0)           // 1
            prod = prod * A[j]; // 2 (multiply + assign)
        j++;                    // 1
    }

What about counting the loop? We assume that 1 instruction takes 1 "time unit".
Calculation
• The loop executes n times
• each iteration carries out (in the worst case) 5 ops
  • test of j < n, if test, multiply, assign, j increment
  • total loop time = 5n
• plus 3 ops at the start and end
  • assignment to prod (line 2), init of j (line 3), final failing j < n test
• Total time T(n) = 5n + 3
• running time is linear in the size of the array (a run-time check of this count is sketched below)
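As a sanity check, the hand count can be confirmed at run time. This is our own instrumented sketch (the ops counter and the choice n = 10 are illustrative additions, not part of the original fragment); for all-positive data it prints 5n + 3 on both sides.

    #include <stdio.h>

    int main() {
        int n = 10, A[10], j, prod, ops = 0;
        for (j = 0; j < n; j++) A[j] = j + 1;   /* all positive: the worst case */

        prod = 1; ops++;                  /* line (2): assignment, 1 unit */
        j = 0;    ops++;                  /* init of j, 1 unit */
        while (ops++, j < n) {            /* 1 unit per test, incl. the final failing one */
            ops++;                        /* if test, 1 unit */
            if (A[j] > 0) {
                ops += 2;                 /* multiply + assign, 2 units */
                prod = prod * A[j];
            }
            ops++;                        /* j increment, 1 unit */
            j++;
        }
        printf("ops = %d, 5n+3 = %d\n", ops, 5*n + 3);   /* both are 53 */
        return 0;
    }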
1.3. Comparing Different T()'s
• TA(n) = 100n; TB(n) = 2n²
• If the input size is < 50, program B takes less time (2n² < 100n when n < 50).
• But for large n's, which are more common in real code, program B gets worse and worse (slower); the sketch below tabulates both functions around the crossover.
[graph: T(n) against input size n, with the line TA(n) = 100n crossing the curve TB(n) = 2n² at n = 50]
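Solving 100n = 2n² gives the crossover n = 50. This throwaway sketch (ours, not from the slides) prints both cost functions around that point:

    #include <stdio.h>

    /* Tabulate TA(n) = 100n and TB(n) = 2n^2 around the crossover at n = 50. */
    int main() {
        long n;
        for (n = 10; n <= 100; n += 10) {
            long ta = 100 * n, tb = 2 * n * n;
            printf("n=%3ld  TA=%6ld  TB=%6ld  faster: %s\n",
                   n, ta, tb, tb < ta ? "B" : (tb == ta ? "tie" : "A"));
        }
        return 0;
    }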
1.4. Common Growth Formulae & Names
Formula (n = input size)   Name
n                          linear
n²                         quadratic
n³                         cubic
n^m                        polynomial, e.g. n^10
m^n (m >= 2)               exponential, e.g. 5^n
n!                         factorial
1                          constant
log n                      logarithmic
n log n
log log n
1.5. Execution Times
Assume 1 instruction takes 1 microsec (10^-6 secs) to execute. How long will n instructions take?

growth        n (no. of instructions)
formula T()   3      9      50       100           1000          10^6
n             3      9      50       100           1ms           1sec
n²            9      81     2.5ms    10ms          1sec          12 days
n³            27     729    125ms    1sec          16.7 min      31,710 yr
2^n           8      512    36 yr    4*10^16 yr    3*10^287 yr   3*10^301016 yr
log n         2      3      6        7             10            20

(entries without units are in microseconds)
If n is 50, a 2^n algorithm makes you wait 36 years for an answer! (A rough check is sketched below.)
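The 36-year entry can be verified with a few lines of arithmetic (our own sketch, not from the slides; compile with -lm): 2^50 instructions at 1 microsecond each come to roughly 35.7 years.

    #include <stdio.h>
    #include <math.h>

    /* Check the 2^n row of the table at n = 50, 1 microsec per instruction. */
    int main() {
        double usec = pow(2.0, 50);                    /* ~1.126e15 microseconds */
        double years = usec / 1e6 / 3600 / 24 / 365;   /* microsecs -> secs -> years */
        printf("2^50 instructions take about %.1f years\n", years);  /* ~35.7 */
        return 0;
    }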
Notes • Logarithmic running times are best. • Polynomial running times are acceptable, if the power isn’t too big • e.g. n2 is ok, n100 is terrible • Exponential times mean sloooooooow code. • some size problems may take longer to finish than the lifetime of the universe!
1.6. Why use T(n)? • T() can guide our choice of which algorithm to implement, or program to use • e.g. selection sort or merge sort? • T() helps us look for better algorithms in our own code, without expensive implementation, testing, and measurement.
(Wrong) Arguments against T(n)
• Algorithms often perform much better on average than in the worst case used by T()
  • quicksort is n log n on a "random" array, but n² in the worst case
  • but for most algorithms, the worst case is a good predictor of the running time
  • average-case analyses can be done, but they are harder mathematically
continued
Some people say: • “Who cares about running time? In a few years, machines will be so fast that even bad algorithms will be fast.” • History shows this argument to be wrong. As machines get faster, problem sizes get bigger. • Most interesting problems (e.g. computer vision, natural language processing) always require more resources • fast algorithms will always be needed continued
Some people say: • "Benchmarking (running programs on a standard set of test cases) is easier." • This is sometimes true, but the benchmarks only give numbers for that particular machine, OS, compiler, computer language.
T() is too Machine Dependent
• We want T() to be the same for an algorithm, independent of the machine where it is running.
• Unfortunately it is not, since different machines (and OSes) execute instructions at different speeds.
• Consider the loop example (slide 11):
  • on machine A, every instruction takes 1 "time unit"
  • the result is TA(n) = 5n + 3
On machine B, every instruction takes 1 "time unit" except for multiplication, which takes 5 "time units".
• The loop executes n times
• each iteration carries out (in the worst case) 5 ops
  • test of j < n, if test, multiply, assign, j increment
  • that is 4 ops at 1 unit each, plus the multiply at 5 units = 9 units per iteration
  • total loop time = 9n
• plus 3 ops at the start and end
  • assignment to prod (line 2), init of j (line 3), final failing j < n test
• Total time TB(n) = 9n + 3
• running time is linear in the size of the array
TA(n) = 5n + 3 and TB(n) = 9n + 3
• These are both linear equations (which is good), but the constants are different (which is bad).
• We want a T() notation that is independent of machines.
2. Big-Oh and Running Time • Big-Oh notation for T(n) ignores constant factors which depend on compiler/machine behaviour • that's good • Big-Oh simplifies the process of estimating the running time of programs • we don't need to count every code line • that's also good
The Big-Oh value specifies running time independent of: • machine architecture • e.g. don’t consider the running speed of individual machine operations • machine load (usage) • e.g. time delays due to other users • compiler design effects • e.g. gcc versus Visual C
Example
• When we counted instructions for the loop example, we found:
  • TA(n) = 5n + 3
  • TB(n) = 9n + 3
• The Big-Oh equation, O(), is based on the T(n) equation but ignores constants (which vary from machine to machine). This means that for both machine A and machine B:
  T(n) is O(n)
  we say "T(n) is order n"
More Examples
T(n) value           Big-Oh: O()
10n² + 50n + 100     O(n²)
(n+1)²               O(n²)
n^10                 O(2^n)
5n³ + 1              O(n³)
• These simplifications have a mathematical reason (hard to understand at first), which is explained in section 2.2.
2.1. Is Big-Oh Useful?
• O() ignores constant factors, which makes it a more general measure of running time for algorithms across different platforms/compilers.
• It can be compared with Big-Oh values for other algorithms.
  • i.e. linear is better than polynomial and exponential, but worse than logarithmic
  • i.e. O(log n) < O(n) < O(n²) < O(2^n)
2.2. Definition of Big-Oh
• The connection between T() and O() is:
  • when T(n) = f(n), we can write T(n) is O(g(n))
  • this means that g(n) is the most important thing in T()'s f(n) function when n is large
• Example 1:
  • T(n) = 5n + 3          // the f() function is 5n + 3
  • write as: T(n) is O(n) // the g() function is n
• Example 2:
  • T(n) = 9n + 3          // the f() function is 9n + 3
  • write as: T(n) is O(n) // the g() function is n
continued
More Formally
We write T(n) = f(n) as T(n) is O(g(n)) if there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0.
• n0 and c are called witnesses to the relationship between T(n) = f(n) and T(n) is O(g(n)).
• In some textbooks this is written as:
  • f(n) is O(g(n))   // leaving out the T(n)
O-notation as a Graph
• O-notation gives an upper bound for a function to within a constant factor. We write T(n) = f(n) as T(n) is O(g(n)) if there are positive constants n0 and c such that, at and to the right of n0, the value of f(n) always lies on or below c*g(n).
[graph: the c*g(n) curve lies above the f(n) curve for all n ≥ n0]
Asymptotic Analysis
• The fancy name for calculating big-Oh (and the other bounds mentioned later) is asymptotic analysis.
• Asymptotic means "a curve whose distance to another curve tends to zero as the curves travel off to infinity"
  • this is seen in the limit of the f(n)/(c*g(n)) curve as n → ∞:
  • 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0
  • so 0 ≤ f(n)/(c*g(n)) ≤ 1, and f(n)/(c*g(n)) → a constant
Set Definition of O-notation
O(g(n)) = { f(n) : there exist constants c > 0, n0 > 0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0 }
• f(n) is in a set of functions that are less than or equal to c*g(n).
• This means that the c*g(n) curve can be an upper bound for many different f(n) curves.
A numeric checker for witness pairs is sketched below.
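The witness pairs (c, n0) used in the examples that follow can also be tested numerically. This checker is our own sketch (the names checkWitness, f1, g1 and the test limit 100000 are illustrative assumptions); it verifies 0 <= f(n) <= c*g(n) over a finite range, which supports, but of course does not prove, the inequality for all n >= n0.

    #include <stdio.h>

    typedef double (*Func)(double);

    /* f and g for Example 1 below: f(n) = 10n^2 + 50n + 100, g(n) = n^2 */
    double f1(double n) { return 10*n*n + 50*n + 100; }
    double g1(double n) { return n*n; }

    /* Test a witness pair (c, n0): check 0 <= f(n) <= c*g(n) for n0 <= n <= limit. */
    int checkWitness(Func f, Func g, double c, double n0, double limit) {
        double n;
        for (n = n0; n <= limit; n++)
            if (f(n) < 0 || f(n) > c * g(n))
                return 0;    /* the witness fails at this n */
        return 1;
    }

    int main() {
        printf("%s\n", checkWitness(f1, g1, 160, 1, 100000) ? "holds" : "fails");
        return 0;
    }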
Example 1
• T(n) = 10n² + 50n + 100
• can be written as: T(n) is O(n²)
• Why?
  • f(n) = 10n² + 50n + 100; g(n) = n²
  • Witnesses: n0 = 1, c = 160
  • then f(n) <= c*g(n) for n >= 1,
    so 10n² + 50n + 100 <= 160n²,
    since 10n² + 50n + 100 <= 10n² + 50n² + 100n² <= 160n²
• Informally, the n² part is the most important thing in the T() function.
T() and O() Graphed
[graph, plotted with http://dlippman.imathas.com/graphcalc/graphcalc.html: c*g(n) == 160n² lies above f(n) == 10n² + 50n + 100 for all n to the right of n0 == 1]
f(n)/(c*g(n)) → constant as n → ∞
[graph: the curve (10n² + 50n + 100)/(160n²) approaches the line y = 1/16 = 0.0625, i.e. the distance between the curve and the line approaches 0 as n grows]
Example 2
• T(n) = (n+1)²
• can be written as: T(n) is O(n²)
• Why?
  • f(n) = (n+1)²; g(n) = n²
  • Witnesses: n0 = 1, c = 4
  • then f(n) <= c*g(n) for n >= 1,
    so (n+1)² <= 4n²,
    since n² + 2n + 1 <= n² + 2n² + n² <= 4n²
T() and O() Graphed
[graph: c*g(n) == 4n² lies above f(n) == (n+1)² for all n to the right of n0 == 1]
Example 3
• T(n) = n^10
• can be written as: T(n) is O(2^n)
• Why?
  • f(n) = n^10; g(n) = 2^n
  • Witnesses: n0 = 64, c = 1
  • then f(n) <= c*g(n) for n >= 64,
    so n^10 <= 2^n,
    since 10*log2 n <= n (by taking logs of both sides),
    which is true when n >= 64 (10*log2 64 == 10*6 == 60, and 60 <= 64)
n^10 and 2^n Graphed
[graph: c*g(n) == 2^n eventually lies above f(n) == n^10; the two curves cross near (58.770, 4.915E17)]
2.4. Some Observations about O()
• When choosing an O() approximation to T(), remember that:
  • constant factors do not matter
    • e.g. T(n) = (n+1)² is O(n²)
  • low-order terms do not matter
    • e.g. T(n) = 10n² + 50n + 100 is O(n²)
  • there are many possible witnesses, because there are usually many O() graphs that lie above the T() equation
2.5. Simplifying O() Expressions
• Inside an O() expression, always drop constant factors and low-order terms.
• For example:
  • T(n) = 3n⁵ + 10n⁴ + n
  • T(n) is O(3n⁵)
  • but T(n) is O(n⁵) is simpler and tighter (see the worked check below)
    • this means that the O() is closer to the T() curve
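A worked check of why dropping terms is safe here (ours, using the witness definition from section 2.2): for n ≥ 1 we have 10n⁴ ≤ 10n⁵ and n ≤ n⁵, so 3n⁵ + 10n⁴ + n ≤ 3n⁵ + 10n⁵ + n⁵ = 14n⁵. The witnesses c = 14, n0 = 1 therefore show directly that T(n) is O(n⁵), with no constant factors or low-order terms left inside the O().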
3. Calculating Big-Oh Directly
• Up to now, I have calculated T(n) = f(n) by counting instructions (e.g. see the loop example), and then simplified T(n) to become T(n) is O(g(n)).
• We can calculate the big-oh function, g(), directly, without counting instructions
  • easier and faster
3. Big-Oh for Programs
• First decide on a size measure for the data in the program. This will become the n.

Data Type    Possible Size Measure
integer      its value
string       its length
array        its length
3.1. Building a Big-Oh Result
• The Big-Oh value for a program is built up in stages:
  1) Calculate the Big-Oh's for all the simple statements in the program
     • e.g. assignment, arithmetic
  2) Then use those values to obtain the Big-Oh's for the complex statements
     • e.g. blocks, for loops, if-statements
Simple Statements (in C)
• We assume that simple statements always take a constant amount of time to execute
  • written as O(1)
  • this is not a time unit (not 1 ms, not 1 microsec)
  • O(1) means a running time independent of the input size n
• Kinds of simple statements:
  • assignment, break, continue, return, all library functions (e.g. putchar(), scanf()), arithmetic, boolean tests, array indexing
Complex Statements • The Big-Oh value for a complex statement is a combination of the Big-Oh values of its component simple statements. • Kinds of complex statements: • blocks { ... } • conditionals: if-then-else, switch • loops: for, while, do-while continued
3.2. Structure Trees • The easiest way to see how complex statement timings are based on simple statements (and other complex statements) is by drawing a structure tree for the program.
Example: binary conversion

    void main()
    {   int i;
(1)     scanf("%d", &i);
(2)     while (i > 0) {
(3)         putchar('0' + i%2);
(4)         i = i/2;
        }
(5)     putchar('\n');
    }
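A note on the output (ours, not from the slide): the digits appear least-significant bit first. For example, input 6 prints 011 followed by a newline, because the loop emits 6 % 2 = 0, then 3 % 2 = 1, then 1 % 2 = 1, whereas the usual binary notation for 6 is 110.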