230 likes | 350 Views
Mergesort, Analysis of Algorithms. Jon von Neumann and ENIAC (1945). Why Does It Matter?. Run time (nanoseconds). 1.3 N 3. 10 N 2. 47 N log 2 N. 48 N. Time to solve a problem of size. 1000. 1.3 seconds. 10 msec. 0.4 msec. 0.048 msec. 10,000. 22 minutes. 1 second. 6 msec.
E N D
Mergesort, Analysis of Algorithms Jon von Neumann and ENIAC (1945)
Why Does It Matter? Run time(nanoseconds) 1.3 N3 10 N2 47 N log2N 48 N Time tosolve aproblemof size 1000 1.3 seconds 10 msec 0.4 msec 0.048 msec 10,000 22 minutes 1 second 6 msec 0.48 msec 100,000 15 days 1.7 minutes 78 msec 4.8 msec million 41 years 2.8 hours 0.94 seconds 48 msec 10 million 41 millennia 1.7 weeks 11 seconds 0.48 seconds Max sizeproblemsolvedin one second 920 10,000 1 million 21 million minute 3,600 77,000 49 million 1.3 billion hour 14,000 600,000 2.4 trillion 76 trillion day 41,000 2.9 million 50 trillion 1,800 trillion N multiplied by 10,time multiplied by 1,000 100 10+ 10
Orders of Magnitude Seconds Equivalent Meters PerSecond ImperialUnits Example 1 1 second 10-10 1.2 in / decade Continental drift 10 10 seconds 10-8 1 ft / year Hair growing 102 1.7 minutes 10-6 3.4 in / day Glacier 103 17 minutes 10-4 1.2 ft / hour Gastro-intestinal tract 104 2.8 hours 10-2 2 ft / minute Ant 105 1.1 days 1 2.2 mi / hour Human walk 106 1.6 weeks 102 220 mi / hour Propeller airplane 107 3.8 months 104 370 mi / min Space shuttle 108 3.1 years 106 620 mi / sec Earth in galactic orbit 109 3.1 decades 108 62,000 mi / sec 1/3 speed of light 1010 3.1 centuries . . . forever Powersof 2 210 thousand 1021 age ofuniverse 220 million 230 billion
Impact of Better Algorithms • Example 1: N-body-simulation. • Simulate gravitational interactions among N bodies. • physicists want N = # atoms in universe • Brute force method: N2 steps. • Appel (1981). N log N steps, enables new research. • Example 2: Discrete Fourier Transform (DFT). • Breaks down waveforms (sound) into periodic components. • foundation of signal processing • CD players, JPEG, analyzing astronomical data, etc. • Grade school method: N2 steps. • Runge-König (1924), Cooley-Tukey (1965).FFT algorithm: N log N steps, enables new technology.
Mergesort A L G O R I T H M S divide • Mergesort (divide-and-conquer) • Divide array into two halves. A L G O R I T H M S
Mergesort • Mergesort (divide-and-conquer) • Divide array into two halves. • Recursively sort each half. A L G O R I T H M S A L G O R I T H M S divide A G L O R H I M S T sort
Mergesort • Mergesort (divide-and-conquer) • Divide array into two halves. • Recursively sort each half. • Merge two halves to make sorted whole. A L G O R I T H M S A L G O R I T H M S divide A G L O R H I M S T sort A G H I L M O R S T merge
Mergesort Analysis • How long does mergesort take? • Bottleneck = merging (and copying). • merging two files of size N/2 requires N comparisons • T(N) = comparisons to mergesort N elements. • to make analysis cleaner, assume N is a power of 2 • Claim. T(N) = N log2 N. • Note: same number of comparisons for ANY file. • even already sorted • We'll prove several different ways to illustrate standard techniques.
Proof by Picture of Recursion Tree T(N) N 2(N/2) T(N/2) T(N/2) T(N/4) T(N/4) T(N/4) 4(N/4) T(N/4) log2N . . . 2k (N / 2k) T(N / 2k) . . . T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) N/2(2) Nlog2N
Proof by Telescoping • Claim. T(N) = N log2 N (when N is a power of 2). • Proof. For N > 1:
Mathematical Induction • Mathematical induction. • Powerful and general proof technique in discrete mathematics. • To prove a theorem true for all integers k 0: • Base case: prove it to be true for N = 0. • Induction hypothesis: assuming it is true for arbitrary N • Induction step: show it is true for N + 1 • Claim: 0 + 1 + 2 + 3 + . . . + N = N(N+1) / 2 for all N 0. • Proof: (by mathematical induction) • Base case (N = 0). • 0 = 0(0+1) / 2. • Induction hypothesis: assume 0 + 1 + 2 + . . . + N = N(N+1) / 2 • Induction step: 0 + 1 + . . . + N + N + 1 = (0 + 1 + . . . + N) + N+1 = N (N+1) /2 + N+1 = (N+2)(N+1) / 2
Proof by Induction • Claim. T(N) = N log2 N (when N is a power of 2). • Proof. (by induction on N) • Base case: N = 1. • Inductive hypothesis: T(N) = N log2 N. • Goal: show that T(2N) = 2N log2 (2N).
Proof by Induction • What if N is not a power of 2? • T(N) satisfies following recurrence. • Claim. T(N) Nlog2 N. • Proof. See supplemental slides.
Computational Complexity • Framework to study efficiency of algorithms. Example = sorting. • MACHINE MODEL = count fundamental operations. • count number of comparisons • UPPER BOUND = algorithm to solve the problem (worst-case). • N log2 N from mergesort • LOWER BOUND = proof that no algorithm can do better. • N log2 N - N log2 e • OPTIMAL ALGORITHM: lower bound ~ upper bound. • mergesort
Decision Tree a1 < a2 YES NO a2 < a3 a1 < a3 YES NO YES NO a1 < a3 a2 < a3 YES NO YES NO printa1, a2, a3 printa2, a1, a3 printa1, a3, a2 printa3, a1, a2 printa2, a3, a1 printa3, a2, a1
Comparison Based Sorting Lower Bound • Theorem. Any comparison based sorting algorithm must use(N log2N) comparisons. • Proof. Worst case dictated by tree height h. • N! different orderings. • One (or more) leaves corresponding to each ordering. • Binary tree with N! leaves must have height • Food for thought. What if we don't use comparisons? • Stay tuned for radix sort. Stirling's formula
Proof by Induction • Claim. T(N) Nlog2 N. • Proof. (by induction on N) • Base case: N = 1. • Define n1 = N / 2 , n2 = N / 2. • Induction step: assume true for 1, 2, . . . , N – 1.
Implementing Mergesort Item aux[MAXN]; void mergesort(Item a[], int left, int right) { int mid = (right + left) / 2; if (right <= left) return; mergesort(a, left, mid); mergesort(a, mid + 1, right); merge(a, left, mid, right); } mergesort (see Sedgewick Program 8.3) uses scratch array
Implementing Mergesort void merge(Item a[], int left, int mid, int right) { int i, j, k; for (i = mid+1; i > left; i--) aux[i-1] = a[i-1]; for (j = mid; j < right; j++) aux[right+mid-j] = a[j+1]; for (k = left; k <= right; k++) if (ITEMless(aux[i], aux[j])) a[k] = aux[i++]; else a[k] = aux[j--]; } merge (see Sedgewick Program 8.2) copy to temporary array merge two sorted sequences
Profiling Mergesort Empirically void merge(Item a[], int left, int mid, int right) <999>{ int i, j, k; for (<999>i = mid+1; <6043>i > left; <5044>i--) <5044>aux[i-1] = a[i-1]; for (<999>j = mid; <5931>j < right; <4932>j++) <4932>aux[right+mid-j] = a[j+1]; for (<999>k = left; <10975>k <= right; <9976>k++) if (<9976>ITEMless(aux[i], aux[j])) <4543>a[k] = aux[i++]; else <5433>a[k] = aux[j--]; <999>} void mergesort(Item a[], int left, int right) <1999>{ int mid = <1999>(right + left) / 2; if (<1999>right <= left) return<1000>; <999>mergesort(a, aux, left, mid); <999>mergesort(a, aux, mid+1, right); <999>merge(a, aux, left, mid, right); <1999>} Mergesort prof.out Striking feature:All numbers SMALL! # comparisonsTheory ~ N log2 N = 9,966Actual = 9,976
Sorting Analysis Summary • Running time estimates: • Home pc executes 108 comparisons/second. • Supercomputer executes 1012 comparisons/second. • Lesson 1: good algorithms are better than supercomputers. • Lesson 2: great algorithms are better than good ones. Insertion Sort (N2) Mergesort (N log N) computer thousand million billion thousand million billion home instant 2.8 hours 317 years instant 1 sec 18 min super instant 1 second 1.6 weeks instant instant instant Quicksort (N log N) thousand million billion instant 0.3 sec 6 min instant instant instant