CSE 326: Data Structures
Introduction & Part One: Complexity
Henry Kautz, Autumn Quarter 2002
Overview of the Quarter
• Part One: Complexity
  • inductive proofs of program correctness
  • empirical and asymptotic complexity
  • order of magnitude notation; logs & series
  • analyzing recursive programs
• Part Two: List-like data structures
• Part Three: Sorting
• Part Four: Search Trees
• Part Five: Hash Tables
• Part Six: Heaps and Union/Find
• Part Seven: Graph Algorithms
• Part Eight: Advanced Topics
Material for Part One
• Weiss, Chapters 1 and 2
• Additional material
  • Graphical analysis
  • Amortized analysis
  • Stretchy arrays
• Any questions on course organization?
Program Analysis
• Correctness
  • Testing
  • Proofs of correctness
• Efficiency
  • How to define?
  • Asymptotic complexity: how running time scales as a function of the size of the input
Proving Programs Correct
• Often takes the form of an inductive proof
• Example: summing an array

    int sum(int v[], int n)
    {
        if (n == 0) return 0;
        else return v[n-1] + sum(v, n-1);
    }

What are the parts of an inductive proof?
Inductive Proof of Correctness

    int sum(int v[], int n)
    {
        if (n == 0) return 0;
        else return v[n-1] + sum(v, n-1);
    }

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n ≥ 0.
Basis Step: The program is correct for n = 0: it returns 0, the sum of zero elements.
Inductive Hypothesis (n = k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n = k+1): sum(v,k+1) returns v[k] + sum(v,k), which by the hypothesis is the sum of the first k+1 elements of v.
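As a complement to the proof, a minimal test harness (not from the original slides) can exercise the basis step and a few inductive cases:

    #include <stdio.h>
    #include <assert.h>

    int sum(int v[], int n)
    {
        if (n == 0) return 0;              /* basis: empty prefix sums to 0 */
        else return v[n-1] + sum(v, n-1);  /* inductive step */
    }

    int main(void)
    {
        int v[] = {3, 1, 4, 1, 5};
        assert(sum(v, 0) == 0);    /* basis step: n = 0 */
        assert(sum(v, 1) == 3);    /* n = 1 */
        assert(sum(v, 5) == 14);   /* n = 5: 3+1+4+1+5 */
        printf("all cases pass\n");
        return 0;
    }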
Proof by Contradiction
• Assume the negation of the goal and show this leads to a contradiction
• Example: there is no program that solves the "halting problem"
  • i.e., no program HALT that determines, for any other program, whether it runs forever or not (Alan Turing, 1937)
    Program NonConformist (Program P)
        If ( HALT(P) = "never halts" ) Then
            Halt
        Else
            Do While (1 > 0)
                Print "Hello!"
            End While
        End If
    End Program

• Does NonConformist(NonConformist) halt?
• Yes? That means HALT(NonConformist) = "never halts", so it should have run forever.
• No? That means HALT(NonConformist) = "halts", so it should have halted. Contradiction!
Defining Efficiency
• Asymptotic complexity: how running time scales as a function of the size of the input
• Why is this a reasonable definition?
  • Many kinds of small problems can be solved in practice by almost any approach, e.g., exhaustive enumeration of possible solutions
  • We want to focus efficiency concerns on larger problems
  • The definition is independent of any possible advances in computer technology
Technology-Dependent Efficiency
• Drum computers: popular technology of the early 1960s
• Transistors were too costly to use for RAM, so memory was kept on a revolving magnetic drum
• An efficient program scattered its instructions around the drum so that the next instruction to execute was under the read head just when it was needed
• This minimized the number of full revolutions of the drum during execution
The Apocalyptic Laptop
• Energy: convert the laptop's entire mass into energy for computation
  • E = mc^2 ≈ 25 million megawatt-hours
• Speed: quantum mechanics bounds the switching time from below
  • minimum switching time = πh̄ / (2E), where h̄ is Planck's constant divided by 2π
  • at most ≈ 5.4 × 10^50 operations per second
(Seth Lloyd, "Ultimate Physical Limits to Computation," Nature, 31 Aug 2000)
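To see where the slide's numbers come from (a back-of-the-envelope check, assuming Lloyd's 1 kg "ultimate laptop"): E = mc^2 = 1 kg × (3 × 10^8 m/s)^2 ≈ 9 × 10^16 J ≈ 25 million megawatt-hours, and the quantum bound then allows at most 2E/(πh̄) ≈ 2 × (9 × 10^16) / (π × 1.05 × 10^-34) ≈ 5.4 × 10^50 operations per second.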
[Bar chart comparing total operations performed: the Big Bang; the Ultimate Laptop running for 1 year; the Ultimate Laptop running for 1 second; a 1000 MIPS machine running since the Big Bang; a 1000 MIPS machine running for 1 day.]
Defining Efficiency
• Asymptotic complexity: how running time scales as a function of the size of the input
• What is "size"?
  • Often: the length (in characters) of the input
  • Sometimes: the value of the input (if the input is a number)
• Which inputs?
  • Worst case (advantages / disadvantages?)
  • Best case (why?)
Average Case Analysis
• A more realistic analysis, first attempt:
  • Assume inputs are randomly distributed according to some "realistic" distribution
  • Compute the expected running time
• Drawbacks
  • It is often hard to define a realistic random distribution
  • The math is usually hard to carry out
Amortized Analysis
• Instead of a single input, consider a sequence of inputs (see the sketch below)
• Choose the worst possible sequence
• Determine the average running time over this sequence
• Advantages
  • Often less pessimistic than simple worst-case analysis
  • Guaranteed results: no assumed distribution
  • Usually mathematically easier than average case analysis
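The classic example is the "stretchy array" mentioned under Material for Part One: an array that doubles its capacity whenever it fills up. A single append may cost O(n) for the copy, but any sequence of n appends costs O(n) total, so the amortized cost per append is O(1). A minimal sketch in C (StretchyArray and sa_push are illustrative names, not from the course materials):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int *data;
        int size;       /* number of elements stored */
        int capacity;   /* number of allocated slots */
    } StretchyArray;

    void sa_push(StretchyArray *a, int x)
    {
        if (a->size == a->capacity) {
            /* Double the capacity. This copy costs O(size), but doublings
               are so rare that n pushes cost O(n) total: amortized O(1). */
            a->capacity = (a->capacity == 0) ? 1 : 2 * a->capacity;
            a->data = realloc(a->data, a->capacity * sizeof(int));
        }
        a->data[a->size++] = x;
    }

    int main(void)
    {
        StretchyArray a = {NULL, 0, 0};
        for (int i = 0; i < 1000000; i++)
            sa_push(&a, i);               /* 10^6 pushes, only ~20 doublings */
        printf("size %d, capacity %d\n", a.size, a.capacity);
        free(a.data);
        return 0;
    }

The copies triggered along the way have sizes 1, 2, 4, ..., n/2, which sum to less than n, giving the O(1) amortized bound per push even though the worst single push is O(n).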
Comparing Runtimes
• Program A is asymptotically less efficient than program B iff the runtime of A dominates the runtime of B as the size of the input goes to infinity
• Note: "runtime" here can be worst case, best case, average case, or amortized
Which Function Dominates?

    n^3 + 2n^2             100n^2 + 1000
    n^0.1                  log n
    n + 100n^0.1           2n + 10 log n
    5n^5                   n!
    n^-15 · 2^(n/100)      1000n^15
    8^(2 log n)            3n^7 + 7n
Race I: n^3 + 2n^2 vs. 100n^2 + 1000 (the cubic term eventually dominates, despite the large constants)
Race II: n^0.1 vs. log n (any positive power of n dominates log n)
Race III: n + 100n^0.1 vs. 2n + 10 log n (both are Θ(n): the same order of magnitude, differing only by a constant factor)
Race IV: 5n^5 vs. n! (the factorial dominates any polynomial)
Race V: n^-15 · 2^(n/100) vs. 1000n^15 (the exponential eventually dominates any polynomial, though here not until n is in the tens of thousands; see the numeric check below)
Race VI: 8^(2 log n) vs. 3n^7 + 7n (with log base 2, 8^(2 log n) = n^6, so the n^7 term dominates)
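One way to build intuition for these races is to print the two functions side by side for growing n. A quick sketch for Race V, the least obvious one (comparing logarithms to avoid overflow; the loop bounds are arbitrary choices):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Race V: n^-15 * 2^(n/100)  vs.  1000 * n^15 */
        for (double n = 100; n <= 100000; n *= 10) {
            double f = -15 * log(n) + (n / 100) * log(2);  /* log of n^-15 * 2^(n/100) */
            double g = log(1000) + 15 * log(n);            /* log of 1000 * n^15      */
            printf("n = %8.0f   log f = %8.1f   log g = %6.1f\n", n, f, g);
        }
        return 0;
    }

The exponential side does not overtake until somewhere between n = 10^4 and n = 10^5, which is exactly why "eventually dominates" matters.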
Order of Magnitude Notation (Big-O)
• Asymptotic complexity: how running time scales as a function of the size of the input
• We usually only care about the order of magnitude of the scaling. Why?
  • As we saw, some functions overwhelm other functions, so if the running time is a sum of terms, we can drop the dominated terms
  • "True" constant factors depend on details of the compiler and hardware, so we might as well make the constant factor 1
• Eliminate low-order terms
• Eliminate constant coefficients
• Example: 3n^2 + 5n + 100 → 3n^2 → n^2, i.e., O(n^2)
Common Names (slowest growth to fastest growth)
• constant: O(1)
• logarithmic: O(log n)
• linear: O(n)
• log-linear: O(n log n)
• quadratic: O(n^2)
• exponential: O(c^n) (c is a constant > 1)
Also: superlinear means O(n^c) for a constant c > 1; polynomial means O(n^c) for a constant c > 0.
Summary
• Proofs by induction and contradiction
• Asymptotic complexity
• Worst case, best case, average case, and amortized asymptotic complexity
• Dominance of functions
• Order of magnitude notation
• Next: Part One: Complexity, continued. Read Chapters 1 and 2.
Part One: Complexity, continued
Friday, October 4th, 2002
Determining the Complexity of an Algorithm
• Empirical measurement
• Formal analysis (i.e., proofs)
• Question: what are the likely advantages and drawbacks of each approach?
  • Empirical
    • pro: discovers whether constant factors are significant
    • con: may be run on the "wrong" inputs
  • Formal
    • pro: no interference from implementation/hardware details
    • con: it is possible to make a mistake in a proof!
"In theory, theory is the same as practice, but not in practice."
Measuring Empirical Complexity: Linear vs. Binary Search
• Find an item in a sorted array of length N
• Binary search algorithm:
My C Code

    void bfind(int x, int a[], int n)
    {
        if (n <= 0) return;                /* empty range: not found */
        int m = n / 2;
        if (x == a[m]) return;             /* found */
        if (x < a[m]) bfind(x, a, m);      /* search left half */
        else bfind(x, &a[m+1], n-m-1);     /* search right half */
    }

    void lfind(int x, int a[], int n)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == x) return;
    }

    /* driver: search for every element of a sorted array */
    for (int i = 0; i < n; i++) a[i] = i;
    for (int i = 0; i < n; i++) lfind(i, a, n);   /* or bfind(i, a, n) */
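To produce the measurements plotted below, something like the following harness can be wrapped around the two searches. This is a sketch using the standard clock() timer; the slides don't show the measurement code, and it assumes bfind and lfind as defined above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        for (int n = 1000; n <= 100000; n *= 10) {
            int *a = malloc(n * sizeof(int));
            for (int i = 0; i < n; i++) a[i] = i;   /* sorted input */

            clock_t start = clock();
            for (int i = 0; i < n; i++)
                lfind(i, a, n);                     /* swap in bfind(i, a, n) to compare */
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

            printf("n = %8d   time = %8.3f s\n", n, secs);
            free(a);
        }
        return 0;
    }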
[Log/log plot of total running time vs. n: the lfind measurements fall on a straight line of slope 2; the bfind measurements fall on a line of slope roughly 1.]
Property of Log/Log Plots
• On a linear plot, a linear function is a straight line
• On a log/log plot, any polynomial function is a straight line!
• The slope Δ(log y) / Δ(log x) of that line equals the exponent: if y = c·x^k, then log y = log c + k·log x, a straight line with slope k (horizontal axis: log x; vertical axis: log y)
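The same fact gives a quick numerical way to estimate the exponent without plotting: measure the running time at two sizes and take the slope of the logs. A small sketch (the times T1 and T2 here are hypothetical, not measurements from the slides):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* hypothetical measured times at two input sizes */
        double n1 = 10000,  T1 = 0.5;    /* seconds */
        double n2 = 100000, T2 = 50.0;

        /* slope of the log/log line = estimated exponent k in T(n) ~ c * n^k */
        double k = (log(T2) - log(T1)) / (log(n2) - log(n1));
        printf("estimated exponent: %.2f\n", k);   /* prints 2.00 for this data */
        return 0;
    }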
Summary
• Empirical and formal analyses of runtime scaling are both important techniques in algorithm development
• Large data sets may be required to gain an accurate empirical picture
• Log/log plots provide a fast and simple visual tool for estimating the exponent of a polynomial function
Formal Asymptotic Analysis
• In order to prove complexity results, we must make the notion of "order of magnitude" more precise
• Asymptotic bounds on runtime: upper bound and lower bound
Definition of Order Notation
• Upper bound: T(n) = O(f(n))   ("Big-O")
  There exist constants c and n' such that T(n) ≤ c·f(n) for all n ≥ n'
• Lower bound: T(n) = Ω(g(n))   ("Omega")
  There exist constants c and n' such that T(n) ≥ c·g(n) for all n ≥ n'
• Tight bound: T(n) = Θ(f(n))   ("Theta")
  Holds when both T(n) = O(f(n)) and T(n) = Ω(f(n))
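A worked instance of the upper-bound definition, using functions from Race I: to show 100n^2 + 1000 = O(n^2), choose c = 101 and n' = 32. Then for all n ≥ 32 we have n^2 ≥ 1024 ≥ 1000, so 100n^2 + 1000 ≤ 100n^2 + n^2 = 101n^2 = c·n^2, as the definition requires.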
Upper/Lower vs. Worst/Best
• Worst case upper bound is f(n): guarantee that the run time is no more than c·f(n)
• Best case upper bound is f(n): if you are lucky, the run time is no more than c·f(n)
• Worst case lower bound is g(n): if you are unlucky, the run time is at least c·g(n)
• Best case lower bound is g(n): guarantee that the run time is at least c·g(n)
Analyzing Code
• primitive operations
• consecutive statements
• function calls
• conditionals
• loops (see the example below)
• recursive functions
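As a preview of that style of analysis, consider counting primitive operations in a nested loop. This is an illustrative example, not code from the slides:

    /* Each inner-loop iteration does O(1) work; the inner loop runs n times
       for each of the n outer iterations, so the total work is O(n^2). */
    int pairs(int a[], int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++)         /* runs n times           */
            for (int j = 0; j < n; j++)     /* runs n times per i     */
                if (a[i] + a[j] == 0)       /* O(1) per iteration     */
                    count++;
        return count;                       /* total: O(n^2)          */
    }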