CSE 326: Data Structures Program Analysis

Lecture 3: Friday, Jan 8, 2003

Outline
• Empirical analysis of algorithms
• Formal analysis of algorithms
• Reading assignment: sec. 2.4.3 (maximum subsequence)

Determining the Complexity of an Algorithm
• Empirical measurements:
  • pro: discover if constant factors are significant
  • con: may be running on “wrong” inputs
• Formal analysis (proofs):
  • pro: no interference from implementation/hardware details
  • con: hides constants; may be hard
In theory, theory is the same as practice, but in practice it is not.

Measuring Empirical Complexity: Linear vs. Binary Search
• Find an item in a sorted array of length N
• Binary search algorithm (and recursive linear search, for comparison):

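```c
/* Binary search of the open interval (left, right) of sorted a[]; returns index or -1. */
int bfind(int x, int a[], int left, int right) {
    if (left + 1 == right) return -1;   /* interval is empty */
    int m = (left + right) / 2;
    if (x == a[m]) return m;
    if (x < a[m]) return bfind(x, a, left, m);
    else return bfind(x, a, m, right);
}

/* Linear search of a[0..n-1], scanning from the end; returns index or -1. */
int lfind(int x, int a[], int n) {
    if (n == 0) return -1;
    if (x == a[n-1]) return n-1;
    return lfind(x, a, n-1);
}

/* Experiment: a[i] = i, then search for every element */
for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++) lfind(i, a, n);   /* or: bfind(i, a, -1, n); */
```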
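A minimal sketch of the experiment, assuming we measure work by counting probes (comparisons of x against an array element) instead of wall-clock time, so the result is deterministic. The counter `probes` and the helper `count_probes` are hypothetical additions, not part of the lecture code.

```c
#include <stdlib.h>

static long probes = 0;  /* hypothetical counter: comparisons against a[] */

/* Binary search of the open interval (left, right) of sorted a[]. */
int bfind(int x, int a[], int left, int right) {
    if (left + 1 == right) return -1;      /* empty interval: not found */
    int m = (left + right) / 2;
    probes++;                               /* one probe of a[m] */
    if (x == a[m]) return m;
    if (x < a[m]) return bfind(x, a, left, m);
    return bfind(x, a, m, right);
}

/* Linear search of a[0..n-1], scanning from the end. */
int lfind(int x, int a[], int n) {
    if (n == 0) return -1;
    probes++;                               /* one probe of a[n-1] */
    if (x == a[n - 1]) return n - 1;
    return lfind(x, a, n - 1);
}

/* Fill a[i] = i, search for every key once, and return the total probes.
   use_binary selects bfind (nonzero) or lfind (zero); -1 if malloc fails. */
long count_probes(int n, int use_binary) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    for (int i = 0; i < n; i++) a[i] = i;
    probes = 0;
    for (int i = 0; i < n; i++) {
        if (use_binary) bfind(i, a, -1, n);
        else lfind(i, a, n);
    }
    free(a);
    return probes;
}
```

Searching for key i with lfind costs n − i probes, so the linear total is exactly n(n+1)/2; each binary search costs at most about log₂ n probes. Calling `count_probes` for a few doubling values of n and plotting the counts on log/log axes should give slopes near 2 and near 1.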

Graphical Analysis

Log/log plot of running time: linear search, slope ≈ 2; binary search, slope ≈ 1.
Recall: we search n times, so Linear = $O(n^2)$ and Binary = $O(n \log n)$.

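Property of Log/Log Plots
• On a linear plot, a linear function is a straight line
• On a log/log plot, any power function $y = c x^k$ is a straight line (and, asymptotically, so is any polynomial of degree $k$)! Slope $= \Delta \log y / \Delta \log x$ = exponent $k$
Proof: suppose $y = c x^k$. Then
$$\log y = \log(c x^k) = \log c + \log(x^k) = \log c + k \log x$$
so with $\log x$ on the horizontal axis and $\log y$ on the vertical axis, the graph is a line with slope $k$ and intercept $\log c$.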

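Why does $O(n \log n)$ look like a straight line with slope ≈ 1?
On a log/log plot its local slope is $\frac{d \ln(n \ln n)}{d \ln n} = 1 + \frac{1}{\ln n}$, which approaches 1 slowly (about 1.07 at $n = 10^6$), so over a typical range of n it is hard to tell from a line of slope 1.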

Empirical Complexity
• Large data sets may be required to gain an accurate empirical picture
• When running time is expected to be polynomial, use log/log plots: slope = exponent
• When running time is expected to be exponential, use log on the y axis
• When running time is expected to be logarithmic, use log on the x axis
• Best: try all three, and see which one gives a straight line

Analyzing Code
• primitive operations
• consecutive statements
• function calls
• conditionals
• loops
• recursive functions

Conditionals
• Conditional: if C then S1 else S2
• Suppose you are doing an O( ) (upper-bound) analysis:
  Time(C) + Max(Time(S1), Time(S2)), or, more loosely, Time(C) + Time(S1) + Time(S2)
• Suppose you are doing an Ω( ) (lower-bound) analysis:
  Time(C) + Min(Time(S1), Time(S2)), or, more loosely, Time(C)

Nested Loops
for i = 1 to n do
  for j = 1 to n do
    sum = sum + 1
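The loop body runs once for every pair (i, j) with both indices in 1..n, so it executes $n \cdot n = n^2$ times. A minimal C rendering of the pseudocode, using the hypothetical function name `nested_count`, that returns the final value of sum:

```c
/* Returns how many times the inner statement runs: n * n. */
long nested_count(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            sum = sum + 1;
    return sum;
}
```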

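Nested Dependent Loops
for i = 1 to n do
  for j = i to n do
    sum = sum + 1
Compute it the hard way:
$$\sum_{i=1}^{n} \sum_{j=i}^{n} 1 = \sum_{i=1}^{n} (n - i + 1) = n(n+1) - \sum_{i=1}^{n} i = n(n+1) - \frac{n(n+1)}{2} = \frac{n(n+1)}{2}$$
Compute it the smart way: substitute $j$ for $n - i + 1$. As $i$ runs from 1 to $n$, $j$ runs from $n$ down to 1, so
$$\sum_{i=1}^{n} (n - i + 1) = \sum_{j=1}^{n} j = \frac{n(n+1)}{2}$$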
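A direct C rendering of the dependent loops (the function name `dependent_count` is hypothetical) lets us check the closed form $n(n+1)/2$ numerically:

```c
/* Returns how many times the inner statement runs when j starts at i. */
long dependent_count(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j++)
            sum = sum + 1;
    return sum;
}
```

Compared with the previous nested loops, starting j at i roughly halves the work, but the count is still $\Theta(n^2)$.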

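Other Important Series
• Sum of squares: $\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$
• Sum of exponents: $\sum_{i=1}^{n} i^k \approx \frac{n^{k+1}}{k+1}$ for fixed $k \ge 0$
• Geometric series: $\sum_{i=0}^{n} A^i = \frac{A^{n+1} - 1}{A - 1}$ for $A \ne 1$
• Novel series: reduce to known series, or prove inductively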
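When a closed form is in doubt, a quick numeric check can catch mistakes before writing an inductive proof. A sketch with hypothetical helper names `sum_of_squares` and `geometric_sum`, which compute the sums term by term so they can be compared with $n(n+1)(2n+1)/6$ and $(A^{n+1}-1)/(A-1)$:

```c
/* 1^2 + 2^2 + ... + n^2, computed term by term. */
long long sum_of_squares(int n) {
    long long s = 0;
    for (long long i = 1; i <= n; i++) s += i * i;
    return s;
}

/* A^0 + A^1 + ... + A^n, computed term by term (no overflow checks). */
long long geometric_sum(long long A, int n) {
    long long s = 0, term = 1;
    for (int i = 0; i <= n; i++) {
        s += term;
        term *= A;
    }
    return s;
}
```

A check like this is evidence, not proof; the induction is still what establishes the formula for every n.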

Linear Search Analysis
```c
int lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++)
        if (a[i] == x) return i;
    return -1;
}
```
• Best case, tight analysis: Θ(1), when x == a[0]
• Worst case, tight analysis: Θ(n), when x == a[n-1] or x is not in the array

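Iterated Linear Search Analysis
```c
for (i = 0; i < n; i++) a[i] = i;
for (i = 0; i < n; i++) lfind(i, a, n);
```
• Easy worst-case upper bound: n calls, each O(n), so $O(n^2)$
• Worst-case tight analysis: with the loop version of lfind, the call lfind(i, a, n) makes i + 1 comparisons, so the total is $\sum_{i=0}^{n-1} (i+1) = \frac{n(n+1)}{2} = \Theta(n^2)$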
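A sketch that instruments the loop version of lfind with a hypothetical comparison counter (`comparisons`) and a hypothetical driver (`iterated_comparisons`), to check both the best/worst cases of a single search and the $n(n+1)/2$ total for the iterated search:

```c
#include <stdlib.h>

static long comparisons = 0;   /* hypothetical counter: a[i] == x tests */

/* Loop version of linear search; returns the index of x or -1. */
int lfind(int x, int a[], int n) {
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (a[i] == x) return i;
    }
    return -1;
}

/* Fill a[i] = i, search for each key 0..n-1, and return the total
   number of comparisons (or -1 if malloc fails). */
long iterated_comparisons(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    for (int i = 0; i < n; i++) a[i] = i;
    comparisons = 0;
    for (int i = 0; i < n; i++) lfind(i, a, n);
    free(a);
    return comparisons;
}
```

Searching for a[0] costs one comparison (best case), and searching for a missing key costs n (worst case).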

Analyzing Recursive Programs
• Express the running time T(n) as a recursive equation
• Solve the recursive equation
• For an upper-bound analysis, you can optionally simplify the equation to something larger
• For a lower-bound analysis, you can optionally simplify the equation to something smaller

Binary Search
```c
int bfind(int x, int a[], int left, int right) {
    if (left + 1 == right) return -1;
    int m = (left + right) / 2;
    if (x == a[m]) return m;
    if (x < a[m]) return bfind(x, a, left, m);
    else return bfind(x, a, m, right);
}
```
What is the worst-case upper bound?

Binary Search
Introduce some constants…
• b = time needed for base case
• c = time needed to get ready to do a recursive call
Size is n = right - left, so the base case is n = 1.
Running time is thus: $T(1) \le b$ and $T(n) \le T(n/2) + c$ for $n > 1$.

Binary Search Analysis
• One sub-problem, half as large
• Equation:
$$T(1) \le b, \qquad T(n) \le T(n/2) + c \quad \text{for } n > 1$$
• Solution:
$$\begin{aligned}
T(n) &\le T(n/2) + c && \text{write equation}\\
&\le T(n/4) + c + c && \text{expand}\\
&\le T(n/8) + c + c + c\\
&\le T(n/2^k) + kc && \text{inductive leap}\\
&= T(1) + c\log n && \text{select } k = \log n\\
&\le b + c\log n = O(\log n) && \text{simplify}
\end{aligned}$$
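With n = right − left a power of 2, the derivation predicts one base-case call plus log n recursive steps, matching $b + c\log n$. A sketch that checks this count, using a hypothetical call counter `calls` and helper `worst_case_calls`; it runs an unsuccessful search (the key is larger than every element), which for n a power of 2 reaches the maximum depth:

```c
#include <stdlib.h>

static int calls = 0;   /* hypothetical counter: invocations of bfind */

int bfind(int x, int a[], int left, int right) {
    calls++;
    if (left + 1 == right) return -1;      /* base case: cost b */
    int m = (left + right) / 2;            /* recursive step: cost c */
    if (x == a[m]) return m;
    if (x < a[m]) return bfind(x, a, left, m);
    return bfind(x, a, m, right);
}

/* Unsuccessful search for N in a[0..N-1] = 0..N-1, so the size is n = N + 1.
   Returns the number of bfind calls, or -1 if malloc fails. */
int worst_case_calls(int N) {
    int *a = malloc((size_t)N * sizeof *a);
    if (a == NULL) return -1;
    for (int i = 0; i < N; i++) a[i] = i;
    calls = 0;
    bfind(N, a, -1, N);                     /* N exceeds every element */
    free(a);
    return calls;
}
```

For N = 2^k − 1 elements (n = 2^k) this gives k + 1 calls: k recursive steps and one base case.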

Solving Recursive Equations by Telescoping • Create a set of equations, take their sum
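For the binary search recurrence, assuming $n = 2^k$ and treating the bounds as equalities with $T(1) = b$, the set of equations is:

```latex
\begin{aligned}
T(n)   &= T(n/2) + c \\
T(n/2) &= T(n/4) + c \\
       &\;\;\vdots \\
T(2)   &= T(1) + c
\end{aligned}
```

Adding all $\log n$ equations, each of $T(n/2), T(n/4), \ldots, T(2)$ appears once on each side and cancels, leaving $T(n) = T(1) + c\log n = b + c\log n$.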

Inductive Proof
If you know the closed-form solution, you can validate it by ordinary induction.
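For instance, assuming $n$ is a power of 2 and logs are base 2, induction on $n$ validates the binary search bound $T(n) \le b + c\log n$ from the recurrence $T(1) \le b$, $T(n) \le T(n/2) + c$:

```latex
\textbf{Base case } (n = 1):\quad T(1) \le b = b + c\log 1.

\textbf{Inductive step:}\ \text{assume } T(n) \le b + c\log n. \text{ Then}
\quad T(2n) \le T(n) + c \le b + c\log n + c = b + c\log(2n).
```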

Amortized Analysis: Stack
• Stack operations:
  • push
  • pop
  • is_empty
• Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped
• What is the biggest problem with an array implementation?

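Stretchy Stack Implementation
```java
class StretchyStack {
    int[] data = new int[1];
    int maxsize = 1;   // current capacity of data
    int top = 0;       // number of elements; also the next free slot

    void push(int e) {
        if (top == maxsize) {              // array full: double its size
            int[] temp = new int[2 * maxsize];
            for (int i = 0; i < maxsize; i++) temp[i] = data[i];
            data = temp;
            maxsize = 2 * maxsize;
        }
        data[top++] = e;                   // store, then advance top
    }

    int pop() { return data[--top]; }      // caller must ensure non-empty
}
```
• Best case Push = O(1)
• Worst case Push = O(n), when the array is full and all n elements are copied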

Stretchy Stack Amortized Analysis
• Consider a sequence of n push/pop operations, for example
  push(e1) push(e2) pop() push(e3) push(e4) pop() . . . push(ek)
  where the first operation takes time $T_1$, ..., the n-th takes time $T_n$
• Amortized time $= (T_1 + T_2 + \cdots + T_n)/n$
• We compute this next

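Stretchy Stack Amortized Analysis
• The length of the array increases like this: $1, 2, 4, 8, \ldots, 2^k, \ldots, n$
• For each $T_i$ we have one of the following:
  • $T_i = O(1)$ for pop(), and for a push(ei) that finds room in the array
  • $T_i = O(2^k)$ for a push(ei) that doubles the array from $2^k$ to $2^{k+1}$ (copying $2^k$ elements)
• Hence, for some constant c,
$$T_1 + \cdots + T_n \le c\,n + c \sum_{k=0}^{\log n} 2^k$$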

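Stretchy Stack Amortized Analysis
Let's compute this sum (taking n to be a power of 2):
$$\sum_{k=0}^{\log n} 2^k = 2^{\log n + 1} - 1 = 2n - 1$$
And therefore, if each cheap operation costs at most c and a doubling from $2^k$ costs at most $c\,2^k$:
$$\frac{T_1 + \cdots + T_n}{n} \le \frac{c\,n + c\,(2n - 1)}{n} < 3c = O(1)$$
In an asymptotic sense, there is no overhead in using stretchy arrays rather than regular arrays!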
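A C sketch of the doubling stack that counts element copies made during resizes. This is a hypothetical translation of the Java-style code; `Stack`, `stack_init`, `stack_push`, `stack_pop`, and `copies_for_pushes` are illustrative names. Starting from capacity 1, n pushes trigger copies of $1 + 2 + 4 + \cdots$ elements, which stays below $2n$.

```c
#include <stdlib.h>

/* Doubling ("stretchy") stack; copies counts element copies made by resizes. */
typedef struct {
    int *data;
    int maxsize;   /* capacity of data */
    int top;       /* number of elements; also the next free slot */
    long copies;
} Stack;

void stack_init(Stack *s) {
    s->data = malloc(sizeof(int));
    if (s->data == NULL) abort();
    s->maxsize = 1;
    s->top = 0;
    s->copies = 0;
}

void stack_push(Stack *s, int e) {
    if (s->top == s->maxsize) {             /* full: double the capacity */
        int *temp = malloc(2 * (size_t)s->maxsize * sizeof(int));
        if (temp == NULL) abort();
        for (int i = 0; i < s->maxsize; i++) temp[i] = s->data[i];
        s->copies += s->maxsize;
        free(s->data);
        s->data = temp;
        s->maxsize *= 2;
    }
    s->data[s->top++] = e;
}

int stack_pop(Stack *s) {                   /* caller must ensure top > 0 */
    return s->data[--s->top];
}

/* Total element copies caused by n pushes onto a new stack. */
long copies_for_pushes(int n) {
    Stack s;
    stack_init(&s);
    for (int i = 0; i < n; i++) stack_push(&s, i);
    long c = s.copies;
    free(s.data);
    return c;
}
```

For n = 1000 pushes the array doubles at sizes 1, 2, ..., 512, for 1023 copies in total, about one extra copy per push.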

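Geometric Series
$\sum_{i=0}^{k} A^i = \frac{A^{k+1} - 1}{A - 1}$ for $A \ne 1$; in particular, $\sum_{i=0}^{k} 2^i = 2^{k+1} - 1$.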

Stretchy Stack Amortized Analysis
• Careful! We must be clever to get good amortized performance!
• Consider “smart pop”:
```java
int pop() {
    int e = data[--top];
    if (maxsize > 1 && top <= maxsize / 2) {  // at most half full: halve the array
        maxsize = maxsize / 2;                // (never shrink below one slot)
        int[] temp = new int[maxsize];
        for (int i = 0; i < maxsize; i++) temp[i] = data[i];
        data = temp;
    }
    return e;
}
```

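Stretchy Stack Amortized Analysis
• Take the following sequence of 3n push/pop operations:
  push(e1) push(e2) ... push(en)   (n operations)
  pop() push(en) pop() push(en) ... pop() push(en)   (2n operations)
• Suppose $n = 2^k + 1$. After the first n pushes the array has size $2^{k+1}$; every later pop() shrinks it to $2^k$, every later push(en) doubles it again, and each resize copies $2^k = n - 1$ elements
• Hence the amortized time is
$$T = \frac{\Theta(1) + \cdots + \Theta(1) + \Theta(n) + \cdots + \Theta(n)}{3n} = \frac{n\,\Theta(1) + 2n\,\Theta(n)}{3n} = \tfrac{2}{3}\,\Theta(n)$$
• Hence $T = \Theta(n)$!!!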
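A sketch that replays this sequence on a C version of the stack with the smart pop and counts element copies. The `Stack` struct and the names `reallocate`, `push`, `smart_pop`, and `thrash_copies` are hypothetical, and the shrink is skipped when the array has a single slot so that it never becomes zero-length. For $n = 2^k + 1$, every operation after the initial pushes copies $n - 1$ elements, so the 2n alternating operations copy $2n(n-1)$ elements in total.

```c
#include <stdlib.h>

typedef struct {
    int *data;
    int maxsize;   /* capacity of data */
    int top;       /* number of elements */
    long copies;   /* element copies made by resizes */
} Stack;

/* Switch to a new array of newsize slots, copying the first ncopy of them. */
static void reallocate(Stack *s, int newsize, int ncopy) {
    int *temp = malloc((size_t)newsize * sizeof(int));
    if (temp == NULL) abort();
    for (int i = 0; i < ncopy; i++) temp[i] = s->data[i];
    s->copies += ncopy;
    free(s->data);
    s->data = temp;
    s->maxsize = newsize;
}

static void push(Stack *s, int e) {
    if (s->top == s->maxsize) reallocate(s, 2 * s->maxsize, s->maxsize);  /* grow */
    s->data[s->top++] = e;
}

/* "Smart pop": halve the array whenever it becomes at most half full. */
static int smart_pop(Stack *s) {
    int e = s->data[--s->top];
    if (s->maxsize > 1 && s->top <= s->maxsize / 2)
        reallocate(s, s->maxsize / 2, s->maxsize / 2);                    /* shrink */
    return e;
}

/* n pushes, then n rounds of pop() push(); returns copies made in the rounds. */
long thrash_copies(int n) {
    Stack s = { malloc(sizeof(int)), 1, 0, 0 };
    if (s.data == NULL) abort();
    for (int i = 0; i < n; i++) push(&s, i);
    s.copies = 0;                          /* count only the alternating phase */
    for (int i = 0; i < n; i++) {
        smart_pop(&s);
        push(&s, i);
    }
    long c = s.copies;
    free(s.data);
    return c;
}
```

With n = 17, the 34 alternating operations copy 544 elements, i.e. n − 1 = 16 per operation: the cost grows linearly with n, exactly the thrashing the analysis predicts.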
