CSC 430: Basics of Algorithm Runtime Analysis
What is Runtime Analysis?
• Fundamentally, it is the process of counting the number of “steps” it takes for an algorithm to terminate (a toy step-counting sketch follows this slide)
• “Step” = primitive computer operation
  • Arithmetic
  • Read/write memory
  • Make a comparison
  • …
• The number of steps needed depends on:
  • The fundamental algorithmic structure
  • The data structures you use
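To make step-counting concrete, here is a minimal sketch (ours, not from the original slides) that instruments a running sum, charging one step per memory read and one per addition:

    def sum_with_steps(L):
        # hypothetical instrumented sum: tallies primitive operations
        steps = 0
        total = 0
        for x in L:      # one memory read per element
            steps += 1
            total += x   # one arithmetic operation per element
            steps += 1
        return total, steps

For example, sum_with_steps([3, 1, 4]) returns (8, 6): roughly 2n steps for n items, so the step count grows linearly with the input size.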
Other Types of Algorithm Analysis
• Algorithm space analysis
• Approximate-solution analysis
• Parallel algorithm analysis
• A PROBLEM’S run-time analysis
• Complexity class analysis
Runtime Motivates Design
• The easiest algorithm to implement for any problem: brute force (see the sketch after this list)
  • Check every possible solution
  • Typically takes 2^N time or worse for inputs of size N
  • Unacceptable in practice
• Usually, through insight, we can do much better
• What do we mean by “better”?
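As an illustration (ours, not from the original slides), here is a brute-force sketch for the subset-sum problem: it tries all 2^N subsets of the input, so even modest inputs are out of reach.

    from itertools import combinations

    def subset_sum_brute_force(nums, target):
        # check every one of the 2^N subsets; hopeless for large N
        for r in range(len(nums) + 1):
            for subset in combinations(nums, r):
                if sum(subset) == target:
                    return subset
        return None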
Worst-Case Analysis
• Best-case running time:
  • Rarely used; often superfluous
• Average-case running time:
  • Obtain bound on running time of algorithm on random input, as a function of input size N
  • Hard (or impossible) to accurately model real instances by random distributions
  • Acceptable in certain, easily-justified scenarios
• Worst-case running time:
  • Obtain bound on largest possible running time
  • Generally captures efficiency in practice
  • Pessimistic view, but hard to find an effective alternative
• (Linear search, sketched below, illustrates all three notions)
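Linear search makes the three notions concrete (an illustrative example, ours): the best case is O(1), the worst case is O(n), and the average case is about n/2 comparisons when the target is equally likely to be anywhere in the list.

    def linear_search(L, target):
        # best case: target at index 0 -> 1 comparison
        # worst case: target absent -> len(L) comparisons
        # average case (target uniformly placed): ~len(L)/2 comparisons
        for i, x in enumerate(L):
            if x == target:
                return i
        return -1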
Defining Efficiency: Worst-Case Polynomial-Time
• Def. An algorithm is efficient if its worst-case running time is polynomial or better.
• Not always true, but works in practice:
  • 6.02 × 10^23 · N^20 is technically poly-time
  • Breaking through the exponential barrier of brute force typically exposes some crucial structure of the problem
  • Most poly-time algorithms that people develop have low constants and low exponents (e.g., N^1, N^2, N^3)
• Exceptions:
  • Sometimes the problem size demands greater efficiency than poly-time
  • Some poly-time algorithms do have high constants and/or exponents, and are useless in practice
  • Some exponential-time (or worse) algorithms are widely used because the worst-case instances seem to be rare (e.g., the simplex method, Unix grep)
What is Asymptotic Analysis?
• Instead of giving a precise runtime, we’ll show that algorithms are bounded within a constant factor of common functions
• Three types of bounds:
  • Upper bounds (never greater)
  • Lower bounds (never less)
  • Tight bounds (upper and lower are the same)
Growth Notation
• Upper bounds (Big-Oh):
  • T(n) ∈ O(f(n)) if there exist constants c > 0 and n0 ≥ 0 such that for all n ≥ n0 we have T(n) ≤ c · f(n).
• Lower bounds (Omega):
  • T(n) ∈ Ω(f(n)) if there exist constants c > 0 and n0 ≥ 0 such that for all n ≥ n0 we have T(n) ≥ c · f(n).
• Tight bounds (Theta):
  • T(n) ∈ Θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n)).
• Frequently abused notation: T(n) = O(f(n)).
  • Asymmetric:
    • f(n) = 5n^3; g(n) = 3n^2
    • f(n) = O(n^3) = g(n)
    • but f(n) ≠ g(n).
  • Better notation: T(n) ∈ O(f(n)).
• (A numeric spot-check of the Big-Oh definition follows this slide)
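The definitions can be spot-checked numerically. A minimal sketch (ours, not from the original slides): the witnesses c = 4 and n0 = 5 show T(n) = 3n^2 + 5n ∈ O(n^2), since 3n^2 + 5n ≤ 4n^2 exactly when 5n ≤ n^2, i.e., when n ≥ 5.

    def witnesses_hold(T, f, c, n0, n_max=10_000):
        # spot-check the Big-Oh definition: T(n) <= c*f(n) for n0 <= n < n_max
        return all(T(n) <= c * f(n) for n in range(n0, n_max))

    print(witnesses_hold(lambda n: 3*n**2 + 5*n, lambda n: n**2, c=4, n0=5))  # True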
Use-Cases
• Big-Oh:
  • “The algorithm’s runtime is O(N lg N), a potential improvement over prior approaches, which were O(N^2).”
• Omega:
  • “Their algorithm’s runtime is Ω(N^2), which is worse than our algorithm’s proven O(N^1.5) runtime.”
• Theta:
  • “Our original algorithm was Ω(N lg N) and O(N^2). With modifications, we were able to improve the runtime bounds to Θ(N^1.5).”
Properties
• Transitivity:
  • If f ∈ O(g) and g ∈ O(h), then f ∈ O(h).
  • If f ∈ Ω(g) and g ∈ Ω(h), then f ∈ Ω(h).
  • If f ∈ Θ(g) and g ∈ Θ(h), then f ∈ Θ(h).
• Additivity:
  • If f ∈ O(h) and g ∈ O(h), then f + g ∈ O(h).
  • If f ∈ Ω(h) and g ∈ Ω(h), then f + g ∈ Ω(h).
  • If f ∈ Θ(h) and g ∈ O(h), then f + g ∈ Θ(h).
  • These generalize to any number of terms.
• “Symmetry”:
  • If f ∈ Θ(h), then h ∈ Θ(f).
  • If f ∈ O(h), then h ∈ Ω(f).
  • If f ∈ Ω(h), then h ∈ O(f).
Asymptotic Bounds for Some Common Functions
• Polynomials. a0 + a1·n + … + ad·n^d is Θ(n^d) if ad > 0.
• Polynomial time. Running time is O(n^d) for some constant d independent of the input size n.
• Logarithms. O(log_a n) = O(log_b n) for any constants a, b > 1, so we can avoid specifying the base (checked numerically below).
• Logarithms. For every x > 0, log n = O(n^x): log grows slower than every polynomial.
• Exponentials. For every r > 1 and every d > 0, n^d = O(r^n): every exponential grows faster than every polynomial.
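Base-independence follows from the change-of-base identity log_a n = log_b n / log_b a: any two bases differ only by the constant factor 1 / log_b a. A quick numeric check (ours, illustrative):

    import math

    # log_2 n / log_10 n is the constant log_2 10 ≈ 3.3219 for every n > 1
    for n in (10, 1_000, 10**9):
        print(math.log(n, 2) / math.log(n, 10))  # prints ~3.3219 each time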
Proving Bounds
• Use transitivity:
  • If an algorithm is at least as fast as another, then it inherits the other’s upper bound
• Use additivity:
  • f(n) = n^3 + n^2 + n + 3
  • f(n) ∈ Θ(n^3), since the n^3 term dominates the rest
• Be direct:
  • Start with the definition and provide the constants it requires
• Take the logarithm of both functions
• Use limits (see the sketch below):
  • Keep in mind that sometimes functions can’t be bounded. (But I won’t assign you any like that!)
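A sketch of the limit technique (ours; assumes the sympy package is installed): if f(n)/g(n) tends to a finite nonzero constant, then f ∈ Θ(g); if it tends to 0, then f ∈ O(g) but f ∉ Ω(g).

    import sympy

    n = sympy.symbols('n', positive=True)
    f = n**3 + n**2 + n + 3
    g = n**3

    # limit of f/g as n -> infinity; finite and nonzero implies f ∈ Θ(g)
    print(sympy.limit(f / g, n, sympy.oo))  # prints 1, so f ∈ Θ(n^3)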
Linear Time: O(n)
• Linear time. Running time is at most a constant factor times the size of the input.
• Computing the maximum. Compute the maximum of n numbers a1, …, an.

    def find_max(L):
        # one pass, remembering the largest element seen so far
        m = L[0]
        for x in L:
            if x > m:
                m = x
        return m
Linear Time: O(n)

    # the merge algorithm combines two sorted
    # lists into one larger sorted list
    def merge(A, B):
        i = j = 0
        M = []
        while i < len(A) and j < len(B):
            if A[i] <= B[j]:
                M.append(A[i])
                i += 1
            else:
                M.append(B[j])
                j += 1
        if i == len(A):
            return M + B[j:]  # A exhausted; append the rest of B
        return M + A[i:]      # B exhausted; append the rest of A

• Claim. Merging two lists of size n takes O(n) time.
• Pf. After each comparison, the length of the output list increases by 1.
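For example, merge([1, 4, 7], [2, 3, 9]) returns [1, 2, 3, 4, 7, 9]. Since each comparison grows the output by one element and the output has 2n elements, at most 2n comparisons occur, which is why the bound is O(n).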
O(n log n) Time
• O(n log n) time. Arises in divide-and-conquer algorithms.
• Sorting. Mergesort and heapsort are sorting algorithms that perform O(n log n) comparisons.
• Largest empty interval. Given n time-stamps x1, …, xn on which copies of a file arrive at a server, what is the largest interval of time when no copies of the file arrive?
• O(n log n) solution. Sort the time-stamps. Scan the sorted list in order, identifying the maximum gap between successive time-stamps. (A sketch follows below.)
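A minimal sketch of that solution (ours; the name largest_empty_interval is illustrative). Sorting dominates at O(n log n); the scan is O(n).

    def largest_empty_interval(timestamps):
        # assumes at least two time-stamps
        xs = sorted(timestamps)                        # O(n log n)
        return max(b - a for a, b in zip(xs, xs[1:]))  # O(n) scan of adjacent gaps

    # largest_empty_interval([4.0, 1.0, 9.5, 5.5]) returns 4.0 (the gap from 5.5 to 9.5)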
Quadratic Time: O(n^2)
• Quadratic time. Enumerate all pairs of elements.
• Closest pair of points. Given a list of n points in the plane (x1, y1), …, (xn, yn), find the pair that is closest.
• Remark. Ω(n^2) seems inevitable, but this is just an illusion (see Chapter 5).

    def closest_pair(L):
        x1, x2 = L[0], L[1]
        # no need to compute the sqrt for the distance
        best = (x1[0] - x2[0])**2 + (x1[1] - x2[1])**2
        for i in range(len(L)):
            for j in range(i + 1, len(L)):
                x1, x2 = L[i], L[j]
                d = (x1[0] - x2[0])**2 + (x1[1] - x2[1])**2
                if d < best:
                    best = d
        return best
Cubic Time: O(n^3)
• Cubic time. Enumerate all triples of elements.
• Set disjointness. Given n sets S1, …, Sn, each of which is a subset of {1, 2, …, n}, is there some pair of them that is disjoint?

    def find_disjoint_sets(S):
        for i in range(len(S)):
            for j in range(i + 1, len(S)):
                for p in S[i]:
                    if p in S[j]:
                        break
                else:  # no p in S[i] belongs to S[j]
                    return S[i], S[j]  # return the pair if found
        return None  # return nothing if every pair of sets overlaps
Polynomial Time: O(n^k) (k is a constant)
• Independent set of size k. Given a graph, are there k nodes such that no two are joined by an edge?
• O(n^k) solution. Enumerate all subsets of k nodes. (Poly-time even for k = 17, but not practical.)
  • Checking whether S is an independent set takes O(k^2) time.
  • Number of k-element subsets = C(n, k) = n(n−1)…(n−k+1) / k! ≤ n^k / k!
  • O(k^2 · n^k / k!) = O(n^k).

    def independent_k(V, E, k):
        # subset_k and independent omitted for space (see the sketch below)
        for sub in subset_k(V, k):
            if independent(sub, E):
                return sub
        return None
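The omitted helpers might look like the following (a sketch of ours, not from the original slides), using itertools.combinations for subset_k and assuming E is a set of frozenset edges:

    from itertools import combinations

    def subset_k(V, k):
        # all k-element subsets of the node list V
        return combinations(V, k)

    def independent(sub, E):
        # O(k^2) pair checks: no edge may join two chosen nodes
        return all(frozenset((u, v)) not in E for u, v in combinations(sub, 2))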
Exponential Time
• Independent set. Given a graph, what is the maximum size of an independent set?
• O(2^n) solution. Enumerate all subsets.

    def max_independent(V, E):
        # try successively larger k until no independent set of size k exists
        for k in range(1, len(V) + 1):
            if not independent_k(V, E, k):
                return k - 1
        return len(V)  # the entire vertex set is independent