Explore essential data structures and algorithms for media applications. Understand complexities, sorting techniques, and correctness proofs for effective problem-solving.
Lecturer • Dr. Minming Li • Department of Computer Science • Room: Y6426 • Phone: 27889538 • Email: minmli@cs.cityu.edu.hk, mli000@cityu.edu.hk • Mail box: P16 (outside CS General Office, Yellow zone, 6/F)
Course Web Page • What you can find at the course webpage (http://www.cs.cityu.edu.hk/~minmli/course.htm): • Lecture slides: ready at least 24 hours before the lecture • Tutorial exercises: selected questions are to be done during tutorials or as homework • Assignments and solutions • Announcements from the teacher: important ones will also be distributed through email • Tutorial classroom changed to MMW-2478 • Q&A
Reference Books • Weiss, M. Data Structures & Algorithm Analysis in C++, 3rd Ed. Addison Wesley (1999) • Samet, H. The Design and Analysis of Spatial Data Structures. Addison Wesley (1989) • Deitel, H. M., Deitel, P. J. Visual C++ .NET: How to Program. Prentice Hall (2003)
Assessment Pattern • Tutorial exercises (0%) • 2 Tests (7% + 8%) • 2 Assignments (6% + 9%) • Exam (70%) • Compulsory: exam mark >= 30
How to learn this course? • Better not read the textbook ahead of time • Try to keep up with the lecture • Read the materials carefully after the lecture • Do the assignments on your own • You may discuss with each other • You may study materials available on the internet • You may refer to any book • But the details should be entirely your own work • Respond promptly if you have any suggestions for, or problems with, the course
Overview • Why Data Structures? • What to do when we have a large amount of data to deal with? • Organize it in ways easy to understand • Space efficiency • Time efficiency • Easy to display and transform
Overview • Linked List • Tree • Stack • Queue • Hashing [Figure: data items, each with a label]
Overview PART I • Program Complexities • Abstract Data Types • Linked lists • Trees • Stacks • Queues • Heaps • Hash tables
Overview PART II • Vectors and Bitmaps • Quadtrees and Octrees • The handling of 2D and 3D data • Geometric Structures • Spatial Layout • Shape and Attributes • Connectivity of Components
Algorithms • What is an algorithm? A sequence of elementary computational steps that transforms the input into the output • What for? A tool for solving well-specified computational problems, e.g., Sorting, Matrix Multiplication • What do we need to do with an algorithm? • Correctness Proof: for every input instance, it halts with the correct output • Performance Analysis: How does the algorithm behave, in both running time and storage requirement, as the problem size gets large?
A Sorting Problem
Input: <a_0, a_1, …, a_{n-1}>
Output: A permutation (re-ordering) <a'_0, a'_1, …, a'_{n-1}> of the input sequence such that a'_0 ≤ a'_1 ≤ … ≤ a'_{n-1}
Example: <22, 51, 34, 44, 67, 11> => <11, 22, 34, 44, 51, 67>
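The output condition of the sorting problem can be written as a small checker. This is only a sketch: the function name isSortedPermutation is made up for illustration, and it assumes the sequences hold int values.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the slides): returns true if `output` is a
// valid answer to the sorting problem for `input`, i.e. it is a permutation
// of `input` and its elements are in non-decreasing order.
bool isSortedPermutation(const std::vector<int>& input,
                         const std::vector<int>& output) {
    return input.size() == output.size() &&
           std::is_permutation(input.begin(), input.end(), output.begin()) &&
           std::is_sorted(output.begin(), output.end());
}
```

For the example on the slide, isSortedPermutation({22, 51, 34, 44, 67, 11}, {11, 22, 34, 44, 51, 67}) holds, while an output that is sorted but drops or duplicates an element is rejected.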
Insertion Sort
Note that when we are dealing with the kth number, the first k-1 numbers are already sorted.
5, 3, 1, 2, 6, 4
3, 5, 1, 2, 6, 4
1, 3, 5, 2, 6, 4
1, 2, 3, 5, 6, 4
1, 2, 3, 5, 6, 4
1, 2, 3, 4, 5, 6
Insertion Sort
• To sort A[0,1,…,n-1] in place
• Steps:
  • Pick element A[j]
  • Move A[j-1,…,0] to the right until the proper position for A[j] is found
Example:
1 3 5 2 6 4
1 3 5 5 6 4
1 3 3 5 6 4
1 2 3 5 6 4
[Figure: array positions 0 … j (currently sorted part) and j+1 … (currently unsorted part)]
Insertion Sort
Insertion-Sort (A)
1. for j=1 to n-1
2.     key=A[j]
3.     i=j-1
4.     while i>=0 and A[i]>key
5.         A[i+1]=A[i]
6.         i=i-1
7.     A[i+1]=key
Trace (A[0] A[1] A[2] A[3] A[4] A[5] at the start of each iteration):
j=1:  5 3 1 2 6 4
j=2:  3 5 1 2 6 4
j=3:  1 3 5 2 6 4
j=4:  1 2 3 5 6 4
j=5:  1 2 3 5 6 4
end:  1 2 3 4 5 6
Inner loop for j=3:
1 3 5 2 6 4
1 3 5 5 6 4
1 3 3 5 6 4
1 2 3 5 6 4
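The Insertion-Sort pseudocode translates almost line by line into C++. The sketch below assumes the array is a std::vector<int>; the function name insertionSort is chosen here for illustration.

```cpp
#include <vector>

// Sorts `a` in place into non-decreasing order. Each statement mirrors one
// line of the Insertion-Sort pseudocode.
void insertionSort(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int j = 1; j < n; ++j) {          // line 1
        int key = a[j];                    // line 2
        int i = j - 1;                     // line 3
        while (i >= 0 && a[i] > key) {     // line 4
            a[i + 1] = a[i];               // line 5: shift one position right
            --i;                           // line 6
        }
        a[i + 1] = key;                    // line 7: insert at its position
    }
}
```

Calling insertionSort on a vector holding 5, 3, 1, 2, 6, 4 leaves it as 1, 2, 3, 4, 5, 6, matching the trace.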
Correctness of Algorithm • We only consider algorithms with loops • Find a property as loop invariant • How to show something is loop invariant? • Initialization: It is true prior to the first iteration of the loop • Maintenance: If it is true before an iteration, it remains true before the next iteration • Termination: When the loop terminates, the invariant gives a useful property that helps to show the algorithm is correct
Correctness of Insertion Sort • loop invariant • At start of each iteration of for loop, A[0..j-1] consists of the elements originally in A[0..j-1] but in sorted order • Initialization • Before the first iteration, j=1. => A[0 .. j-1] contains only A[0]. => Loop invariant holds prior to the first iteration. • Maintenance • In each iteration, the algorithm moves A[j-1],A[j-2],A[j-3] .. to the right until the proper position for A[j] is found. Then A[j] is inserted. => if the loop invariant is true before an iteration, it remains true before next iteration. • Termination • The outer loop ends with j=n. Substituting n for j in the loop invariant, we get “A[______] consists of the n sorted elements.”
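One way to make the loop invariant concrete is to check it at run time. The sketch below is an illustration, not part of the proof: it keeps a copy of the original array and asserts, at the start of every iteration of the for loop, that A[0..j-1] is sorted and holds exactly the elements originally in A[0..j-1]. The name insertionSortChecked is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insertion sort that asserts the loop invariant before each iteration.
// The checks are for illustration only and make every iteration slower.
void insertionSortChecked(std::vector<int>& a) {
    const std::vector<int> original = a;   // snapshot of the input
    const int n = static_cast<int>(a.size());
    for (int j = 1; j < n; ++j) {
        // Invariant, part 1: a[0..j-1] is in sorted order.
        assert(std::is_sorted(a.begin(), a.begin() + j));
        // Invariant, part 2: a[0..j-1] is a re-ordering of original[0..j-1].
        assert(std::is_permutation(a.begin(), a.begin() + j, original.begin()));

        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {
            a[i + 1] = a[i];
            --i;
        }
        a[i + 1] = key;
    }
}
```

Passing the assertions on a few inputs does not prove correctness; the initialization, maintenance and termination argument is still what establishes it for every input.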
Running time of Insertion Sort
Insertion-Sort(A)                      Cost   Times
1 for j = 1 to n-1                     c1     n
2     key = A[j]                       c2     n-1
3     i = j-1                          c3     n-1
4     while i >= 0 and A[i] > key      c4     Σ_{j=1..n-1} (t_j + 1)
5         A[i+1] = A[i]                c5     Σ_{j=1..n-1} t_j
6         i = i - 1                    c6     Σ_{j=1..n-1} t_j
7     A[i+1] = key                     c7     n-1
c1, c2, ... = running time for executing line 1, line 2, etc.
t_j = number of times that lines 5 and 6 are executed, for each j.
The running time T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*Σ_{j=1..n-1} (t_j + 1) + c5*Σ_{j=1..n-1} t_j + c6*Σ_{j=1..n-1} t_j + c7*(n-1)
Analyzing Insertion Sort
T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*Σ_{j=1..n-1} (t_j + 1) + c5*Σ_{j=1..n-1} t_j + c6*Σ_{j=1..n-1} t_j + c7*(n-1)
Worst case: the input is reverse sorted, so the inner loop body is executed for all previous elements: t_j = j.
T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*Σ_{j=1..n-1} (j + 1) + c5*Σ_{j=1..n-1} j + c6*Σ_{j=1..n-1} j + c7*(n-1)
T(n) = An^2 + Bn + C
Note: Σ_{j=1..n-1} j = n(n-1)/2 and Σ_{j=1..n-1} (j + 1) = (n+2)(n-1)/2
Analyzing Insertion Sort
T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*Σ_{j=1..n-1} (t_j + 1) + c5*Σ_{j=1..n-1} t_j + c6*Σ_{j=1..n-1} t_j + c7*(n-1)
Worst case: Reverse sorted, so the inner loop body is executed for all previous elements. So, t_j = j. T(n) is quadratic (square): T(n) = An^2 + Bn + C
Average case: Half of the elements in A[0..j-1] are less than A[j]. So, t_j = j/2. T(n) is also quadratic: T(n) = An^2 + Bn + C
Best case: Already sorted, so the inner loop body is never executed. So, t_j = 0. T(n) is linear: T(n) = An + B
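The three cases can be observed by counting how often line 5 of the pseudocode runs, i.e. the sum of t_j over all j. The following sketch assumes a std::vector<int> input; the function name countShifts is made up for illustration.

```cpp
#include <vector>

// Runs insertion sort on a copy of the input and returns the total number
// of times the inner loop body (line 5 of the pseudocode) is executed,
// i.e. the sum of t_j for j = 1..n-1.
long long countShifts(std::vector<int> a) {
    long long shifts = 0;
    const int n = static_cast<int>(a.size());
    for (int j = 1; j < n; ++j) {
        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {
            a[i + 1] = a[i];
            --i;
            ++shifts;   // one execution of lines 5 and 6
        }
        a[i + 1] = key;
    }
    return shifts;
}
```

For n = 6, the reverse-sorted input 6, 5, 4, 3, 2, 1 gives 15 = n(n-1)/2 shifts, an already sorted input gives 0, and the running example 5, 3, 1, 2, 6, 4 gives 7.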
Kinds of Analysis
(Usually) Worst case analysis: T(n) = max time on any input of size n. Knowing it gives us a guarantee about the upper bound. In some cases, the worst case occurs fairly often.
(Sometimes) Average case analysis: T(n) = average time over all inputs of size n. The average case is often as bad as the worst case. There really exists “good” example.
(Rarely) Best case analysis: One can cheat with a slow algorithm that works fast on some input. Good only for showing a bad lower bound.
(New) Smoothed analysis: Average in the local region instead of over all inputs.
Kinds of Analysis • Worst Case: maximum value • Average Case: average value • Best Case: minimum value [Figure: worst case, average case and best case running time plotted against input size n]
Order of Growth
Examples: running time of algorithm in microseconds (in terms of data size n)
              n = 20            n = 40              n = 60
Algorithm A   4.32 * 10^-6 sec  5.32 * 10^-6 sec    5.91 * 10^-6 sec
Algorithm B   4.47 * 10^-6 sec  6.32 * 10^-6 sec    7.75 * 10^-6 sec
Algorithm C   20 * 10^-6 sec    40 * 10^-6 sec      60 * 10^-6 sec
Algorithm D   86 * 10^-6 sec    213 * 10^-6 sec     354 * 10^-6 sec
Algorithm E   400 * 10^-6 sec   1600 * 10^-6 sec    3600 * 10^-6 sec
Algorithm F   0.16 sec          2.56 sec            …
Algorithm G   1.05 sec          12.73 days          36571 years
Algorithm H   77147 years       2.56 * 10^34 years  2.64 * 10^68 years
Order of Growth
Assume: an algorithm can solve a problem of size n in f(n) microseconds (10^-6 seconds).
[Figure: f(n) in microseconds (0 to 90000) plotted against n (1 to 19) for log2 n, sqrt n, n, n log2 n, n^2, n^4, 2^n and n!]
Note: for example, for all f(n) in Θ(n^4), the shapes of their curves are nearly the same as f(n) = n^4.
Asymptotic Notation
• Asymptotic Tight Bound Θ: f(n) = Θ(g(n)). Intuitively like “=” • f(n) grows as fast as g(n)
• Asymptotic Upper Bound O: f(n) = O(g(n)). Intuitively like “≤” • f(n) grows not faster than g(n)
• Asymptotic Lower Bound Ω: f(n) = Ω(g(n)). Intuitively like “≥” • f(n) grows not slower than g(n)
Formal formulations: (We’ll not go into details of these equations.)
Θ(g(n)) = { f(n): there exist positive constants c1, c2, n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }
[Figure: f(n) is O(g(n)): f(n) stays below c g(n) from n0 onward]
[Figure: f(n) is Θ(g(n)): f(n) stays between c1 g(n) and c2 g(n) from n0 onward]
Asymptotic Notation
Note that: “Running time of Insertion Sort is Θ(n^2)” is incorrect. Why?
Worst case running time of Insertion Sort is Θ(n^2) -- correct / incorrect*
Best case running time of Insertion Sort is Θ(n) -- correct / incorrect*
Running time of Insertion Sort is O(n^2) -- correct / incorrect*
Asymptotic Tight Bound Θ: Intuitively like “=”
Asymptotic Upper Bound O: Intuitively like “≤”
Asymptotic Lower Bound Ω: Intuitively like “≥”
What about “<”? o versus O: o is the strict version of O, e.g. n log n = o(n^2) means n log n grows strictly slower than n^2
Asymptotic Notation • Relationship between typical functions • log n = o(n) • n = o(n log n) • n^c = o(2^n) where n^c may be n^2, n^4, etc. • If f(n) = n + log n, we call log n a lower order term (You are not required to analyze, but remember these relations) log n < sqrt(n) < n < n log n < n^2 < n^4 < 2^n < n! • Rule of combination (for positive functions) • If f(n) = O(g(n)) and h(n) = O(k(n)), then • f(n)h(n) = O(g(n)k(n)) • f(n)+h(n) = O(g(n)+k(n))
Asymptotic Notation • When calculating asymptotic running time • Drop lower order terms • Ignore leading constants • Example 1: T(n) = An^2 + Bn + C • An^2 • T(n) = O(n^2) • Example 2: T(n) = A n log n + Bn^2 + Cn + D • Bn^2 • T(n) = O(n^2) Remember: We can write T(n) = O(n^2) or T(n) = Θ(n^2), but not T(n) ≤ O(n^2).
Asymptotic Performance
Insertion-Sort(A)
1 for j = 1 to n-1
2     key = A[j]
3     i = j-1
4     while i >= 0 and A[i] > key
5         A[i+1] = A[i]
6         i = i - 1
7     A[i+1] = key
Running time: O(n^2)
Very often the algorithm complexity can be observed directly from simple algorithms.
There are 4 very useful rules for such Big-Oh analysis ...
Asymptotic Performance
General rules for Big-Oh Analysis:
Rule 1. FOR LOOPS
The running time of a for loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations.
    for (i=0;i<N;i++)
        a++;
    O(N)
Rule 2. NESTED FOR LOOPS
The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.
    for (i=0;i<N;i++)
        for (j=0;j<N;j++)
            k++;
    O(N^2)
Rule 3. CONSECUTIVE STATEMENTS
Count the maximum one.
    for (i=0;i<N;i++)
        a++;
    for (i=0;i<N;i++)
        for (j=0;j<N;j++)
            k++;
    O(N^2)
Rule 4. IF / ELSE
For the fragment: if (condition) S1 else S2, take the test + the maximum for S1 and S2.
Asymptotic Performance
Example of Big-Oh Analysis:
    void function1(int n)
    {
        int i, j;
        int x=0;
        for (i=0;i<n;i++)
            x++;
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                x++;
    }
This function is O(__)
    void function2(int n)
    {
        int i;
        int x=0;
        for (i=0;i<n/2;i++)
            x++;
    }
This function is O(__)
Asymptotic Performance
Example of Big-Oh Analysis:
    void function3(int n)
    {
        int i, j;   /* j is used by the inner loop below */
        int x=0;
        if (n>10)
            for (i=0;i<n/2;i++)
                x++;
        else
        {
            for (i=0;i<n;i++)
                for (j=0;j<n/2;j++)
                    x--;
        }
    }
This function is O(__)
    void function4(int n)
    {
        int i, j;   /* j is used by the inner loop below */
        int x=0;
        for (i=0;i<10;i++)
            for (j=0;j<n/2;j++)
                x--;
    }
This function is O(__)
Asymptotic Performance
Example of Big-Oh Analysis:
    void function5(int n)
    {
        int i;
        for (i=0;i<n;i++)
            if (IsSignificantData(i))
                SpecialTreatment(i);
    }
Suppose IsSignificantData is O(n) and SpecialTreatment is O(n log n).
This function is O(____)
Asymptotic Performance
• Recursion
    int Power(int base,int pow)
    {
        if (pow==0) return 1;
        else return base*Power(base,pow-1);
    }
• Example: 3^2 = 9
    Power(3,2)=3*Power(3,1)
    Power(3,1)=3*Power(3,0)
    Power(3,0)=1
T(n): the number of multiplications needed to compute Power(3,n)
T(n)=T(n-1)+1; T(0)=0
T(n)=n
Running time of function Power(3,n) is O(n)
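The recurrence T(n)=T(n-1)+1 with T(0)=0 can be checked by instrumenting the recursive function with a multiplication counter. The sketch below renames the function to powerCounted and adds a counter parameter purely for illustration; the iterative version is included as an assumption about what a loop-based equivalent would look like, since the slides do not show one.

```cpp
// Recursive power as on the slide, with an extra counter (an addition for
// illustration) that records how many multiplications are performed.
int powerCounted(int base, int exponent, int& multiplications) {
    if (exponent == 0) return 1;          // T(0) = 0: no multiplication
    ++multiplications;                    // the "+1" in T(n) = T(n-1) + 1
    return base * powerCounted(base, exponent - 1, multiplications);
}

// A loop-based equivalent: also one multiplication per unit of exponent,
// but without the chain of pending recursive calls.
int powerIterative(int base, int exponent) {
    int result = 1;
    for (int k = 0; k < exponent; ++k) {
        result *= base;
    }
    return result;
}
```

Computing 3^2 with powerCounted returns 9 and leaves the counter at 2, in line with T(n)=n. Both versions assume a non-negative exponent and a result that fits in an int.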
Asymptotic Performance • Why recursion? • Can’t we just use iteration (loop)? • The reason for recursion • Easy to program in some situations • Disadvantage • More time and space required • Example: • Tower of Hanoi Problem
Tower of Hanoi
Given some rods for stacking disks.
Rules: (1) The disks must be stacked in order of size. (2) Each time move 1 disk.
The problem: Use the fewest steps to move all disks from the source rod to the target rod without violating the rules at any point in the process (given one intermediate rod for buffering).
[Figure: a source rod, an intermediate rod, a target rod]
Tower of Hanoi • Suppose you can manage the n-1 disks • How do you solve the n disks case? • A recursive solution: • Step 1: Move the top n-1 disks from source rod to intermediate rod via target rod • Step 2: Move the largest disk from source rod to target rod • Step 3: Move the n-1 disks from intermediate rod to target rod via source rod
Tower of Hanoi
    void Towers (int n, int Source, int Target, int Interm)
    {
        if (n==1)
            Console::Write(S"\nFrom {0} to {1}", Source.ToString(), Target.ToString());
        else
        {
            Towers(n-1, Source, Interm, Target);
            Towers(1, Source, Target, Interm);
            Towers(n-1, Interm, Target, Source);
        }
    }
How many “Console::Write” are executed? T(n)=2T(n-1)+1
    Towers (3,'A','C','B')
        Towers (2,'A','B','C')
        Towers (1,'A','C','B')
        Towers (2,'B','C','A')
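The Towers function above uses the managed Console::Write call. As a sketch in standard C++, the same recursion can record each move in a vector instead of printing it, which makes the count of moves easy to inspect. The names towers and Move, the char rod labels and the output vector are assumptions made for this illustration.

```cpp
#include <utility>
#include <vector>

// A move is a pair (from rod, to rod).
using Move = std::pair<char, char>;

// Same recursive structure as Towers on the slide, for n >= 1. Instead of
// printing, each single-disk move is appended to `moves`.
void towers(int n, char source, char target, char interm,
            std::vector<Move>& moves) {
    if (n == 1) {
        moves.push_back({source, target});
    } else {
        towers(n - 1, source, interm, target, moves);  // step 1
        towers(1, source, target, interm, moves);      // step 2
        towers(n - 1, interm, target, source, moves);  // step 3
    }
}
```

With n = 3 and rods 'A' (source), 'C' (target), 'B' (intermediate), the recorded sequence has 7 moves, starting and ending with a move from A to C.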
Recursive Relation
• T(n)=T(n-1)+A; T(1)=1 • T(n)=O(n)
• T(n)=T(n-1)+n; T(1)=1 • T(n)=O(n^2)
• T(n)=2T(n/2)+n; T(1)=1 • T(n)=O(n log n)
• T(n)=2T(n-1)+1 • T(n)=O(2^n)
• More general form: T(n)=aT(n/b)+cn • Master Theorem (You are not required to know)
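Two of these bounds can be seen by unrolling the recurrence. The derivation below is a sketch; for the last recurrence it assumes T(1)=1 (one Console::Write for a single disk), which the slide does not state explicitly.

```latex
% T(n) = T(n-1) + n with T(1) = 1
\begin{align*}
T(n) &= T(n-1) + n \\
     &= T(n-2) + (n-1) + n \\
     &= \dots \\
     &= T(1) + 2 + 3 + \dots + n \\
     &= \frac{n(n+1)}{2} = O(n^2)
\end{align*}

% T(n) = 2T(n-1) + 1, assuming T(1) = 1
\begin{align*}
T(n) &= 2\,T(n-1) + 1 \\
     &= 4\,T(n-2) + 2 + 1 \\
     &= \dots \\
     &= 2^{n-1}\,T(1) + 2^{n-2} + \dots + 2 + 1 \\
     &= 2^{n-1} + \left(2^{n-1} - 1\right) = 2^{n} - 1 = O(2^n)
\end{align*}
```

The second result matches the Tower of Hanoi count: 3 disks need 2^3 - 1 = 7 moves.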
Summary • Introduction / Insertion sort • Correctness of algorithm • Worst /Average case analysis • Order of growth • Asymptotic Performance • 4 rules for asymptotic analysis • Recursive programs