CS 235102 Data Structures ( 資料結構 )

CS 235102 Data Structures (資料結構) Chapter 1: Basic Concepts Spring 2012

What is an algorithm? • An “algorithm” is a set of instructions that solves a well-defined computational problem. • Algorithms must satisfy the following criteria: • Input.Zero/more quantities are externally supplied. • Output. At least one quantity is produced. • Definiteness.Each instruction is clear and unambiguous. • Finiteness. Terminate after a finite no. of steps. • Effectiveness. Each instruction is basic and feasible to be computed easily.

Describing an algorithm • Many ways to describe an algorithm: • Use English sentences • Graphic representations (called Flowcharts) • Work well only if algorithms are small and simple • Use C language (mixing with English sentences) • Our practice in our course

Binary Search • Assume we have n ≥ 1 distinct integers that are sorted in array A[0 … n-1]. Determine if an integer x in the array. If x=A[j], return index j; otherwise return -1. A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] • Eg. For x=9, return index 4; For x=10, return -1. 1 3 5 8 9 17 32 50 A

Binary Search • Step 1. Find element A[mid] at middle position. • Step 2. If x = A[mid], then return mid; If x < A[mid], then search left-half; If x > A[mid], then search right-half;

Algorithm of Binary Search int binsearch(int A[], int x) //search x in A[] { int left=0, right=n-1, mid; while (left <= right) { //more integers to check // let A[mid] be the middle element mid = (left+right)/2; if (x == A[mid]) return mid; if (x < A[mid]) right = mid-1; if (x > A[mid]) left = mid+1; } return -1; }

Recursive Algorithm • Direct recursion • Procedure calls itself directly Eg.procA proc A • Indirect recursion • Procedure calls other procedures that invoke the calling procedure Eg. procA procB  procA • Some problem itself is defined recursively.

Binomial Coefficients • Binomial coefficient C(n, m) = can be computed by the recursive formula: C(n, m) = C(n-1, m) + C(n-1, m-1) where C(0, 0) = C(n, n) = 1. n! m! (n-m)!

Computing Binomial Coefficients // Compute binomial coefficient C(n,m) recursively. int bin_coeff(int n, m) { // termination conditions if (m==n) then return 1; else if (m==0) then return 1; // recursive step else return bin_coeff(n-1,m) + bin_coeff(n-1,m-1); }

Hints for Recursive Algorithms • To write a recursive algorithm, someone must make sure to have • Termination conditions; • Parameter values decrease so that each call brings us one step closer to a solution.

Recursive Binary Search int binsearch(int A[], x, left, right) { int mid; if (left <= right) { //more integers to check // let A[mid] be the middle element mid = (left+right)/2; if (x == A[mid]) return mid; if (x < A[mid]) return binsearch(A, x, left, mid-1); if (x > A[mid]) return binsearch(A, x, mid+1, right); } return -1; }

A Running Example • Search for x=9 in array A[0…7] : A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] • 1st call: binsearch(A, 9, 0, 7) 2nd call: binsearch(A, 9, 4, 7) 3rd call: binsearch(A, 9, 4, 4) return index 4. 1st 3rd 2nd 1 3 5 8 9 17 32 50 A

Criteria for Good Programs • Many criteria to judge a program: • Meeting the original task specification? • Does it work correctly? • Documentation for using the program? • Modularity • Code readability • Although the above criteria are vitally important, it is difficult to achieve them. • In order to achieve them, many real experience and practice are needed.

Performance Analysis • Two criteria for performance analysis/evaluation: • How much memory space is needed? • How much running time is needed? • Performance analysisvs.performance measurement • Performance analysis: • machine independent • a prior estimate • heart of “complexity theory” • Performance measurement: • machine dependent • a posterior testing

Performance Analysis • Performance Analysis contains two parts: Space Complexity and Time Complexity. • Space Complexity includes also two parts: • 1st part -- A fixed part: • independent of the no. and the size of input and outputs. • includes instruction space, space for simple variables, fixed-size structured variables, constants. • 2nd part -- A variable part: (see next slide …)

Performance Analysis • Space Complexity includes also two parts: • 2nd part -- A variable part: • depends on the particular problem instance I being solved. • includes recursion stack space, structured variable whose size depends on the particular instance I. • Thus Space Complexity S(P) = 1st part + 2nd part = C + SP(I) constant Concentrate on evaluate this term

Instance Characteristic (I) • Commonly used characteristics (I) include the number, size and values of the inputs and output. also two parts: • Eg. sorting(A[], n) Then I= number of integers = n. • Eg. Summing 1 to n, i.e., 1+2+3+… n Then I= value of n = n.

Space Complexity • Eg. (See Program 1.10 in the textbook.) • C= space for the program + • space for variables a, b, c, abc • = constant • SP(I) = 0 • Thus S(P) = C + SP(I) = constant. float abc(float a, b, c) { return a+b+b*c+(a+b-c)/(a+b)+4.00; }

Iterative Summing (Program 1.11) // Compute the sum of n numbers in A[] iteratively. float sum(float A[], int n) {float tempsum = 0; int i; for (i=0; i<n; i++) tempsum += A[i]; return tempsum; } • In C: -- Pass an array by reference only. • -- Pass other parameters by value. • In PASCAL: -- All parameters can be passed by reference or value.

Iterative Summing (Program 1.11) • Instance characteristic (I) = n(=size of array A[]) • If passed by reference (eg. in C) • Ssum(I) = Ssum(n) = 0 • If passed by value, • Ssum(I) = Ssum(n) = n

Recursive Summing (Program 1.12) // Compute the sum of n numbers in A[] recursively. float rsum(float A[], int n) { if (n) return rsum(A, n-1) + A[n-1]; return A[0]; } • Instance characteristic (I) = n • Each call requires • 4 ∙ ( 1 + 1 + 1) = 12 bytes • How many calls (recursive depth): • rsum(A, n)  rsum(A,n-1)  …  rsum(A, 0) • ==> n+1 calls • Srsum(I) = Srsum(n) = 12 ∙ (n+1)

Time Complexity • Time taken by program P: T(P) = Compile time + Running time = TC+ TP(I) • How to evaluate TP(I) ?? • Add, Sub, Multiply, … take different running time • Use “program step” to estimate running time • “program step” = a program segment whose execution time is independent of instance characteristic I. • Eg. abc=a+b+b*c; -- one program step a=2; -- one program step constant

Iterative Summing (Program 1.11) // Compute the sum of n numbers in A[] iteratively. float sum(float A[], int n) {float tempsum = 0; // 1 step int i; for (i=0; i<n; i++) // n+1 steps tempsum += A[i]; // n steps return tempsum; // 1 step } • Instance charateric (I) = n (=size of array A[]) • Tsum(I) = Tsum(n) = 1 + (n+1) + n + 1 • =2n + 3

Recursive Summing (Program 1.12) // Compute the sum of n numbers in A[] recursively. float rsum(float A[], int n) { if (n) // 1 step return rsum(A, n-1) + A[n-1]; // 1 step return A[0]; // 1 step } • Instance characteristic (I) = n • Recurrence relation for Trsum(n) : • Trsum(0) = 2 • Trsum(n) = 2 +Trsum(n-1) • = 2 +( 2 +Trsum(n-2) ) • = … • = 2n + Trsum(0) = 2n + 2

Matrix Addition (Program 1.16) // Compute the sum of n numbers in A[] iteratively. void add(int a[][MAX_SIZE], b[][MAX_SIZE], c[][MAX_SIZE], int rows, int cols) { int i, j ; for (i=0; i<rows; i++) { //rows+1 steps for (j=0; j<cols; j++) //rows(cols+1) steps c[i][j] = a[i][j]+b[i][j]; //rows(cols) steps } } • Instance charateric (I) = rows(cols) • TP(rows,cols) = (rows+1) + rows(cols+1) + rows(cols) • = 2∙rows∙cols + 2∙rows + 1

Observation on Step Counts • In the previous examples : • Tsum(n) = 2n + 3 steps • Trsum(n) = 2n + 2 steps • Can we say that rsum is faster than sum ? • No. Since the execution time of each step is different. • The step count is useful in that • “How the running time changes with changes in the instance characteristic?”

Growth Rate of Time Complexity • Eg. For sum program, Tsum(n) = 2n + 3 “means”: • when n  10 fold ==>Tsum(n)  10 fold • sum program runs in linear time. • We only want to know the growth rate (called asymptotic time) • Eg. Tsum(n) = 2n + 3 vs. Trsum(n) = 2n + 2 Then clearly Tsum(n) and Trsum(n) have the same growth rate.

Asymptotic Notation (Big-O) • To compare the time complexity of two programs that compute the same function. • To predict the growth rate in run time as the instance characteristic increases. • Eg. Two programs with time complexity: P1: c1 n2 + c2 n P2: c3 n • Let c1 =1, c2 =2, and c3 =100. Then • P1 faster, n2 + 2n ≤ 100n for n ≤ 98 • P2 faster, n2 + 2n > 100n for n > 98

Asymptotic Notation (Big-O) • Eg. Two programs with time complexity: P1: c1 n2 + c2 n P2: c3 n • Let c1 =1, c2 =2, and c3 =1000. Then • P1 faster, n2 + 2n ≤ 100n for n ≤ 998 • P2 faster, n2 + 2n > 100n for n > 998 • No matter what values c1, c2, and c3 are, there will be n beyond which c1 n2 + c2 n > c3 n

Definition of Big-O • Definition of O-notation: f(n) = O(g(n)) iff these exist c, no>0 such that f(n) ≤ cg(n) for all n ≥ no . • Eg. 3n + 2 = O(n) since 3n+2 ≤ 4n for all n ≥ 2 • Eg. 100n + 6 = O(n) since 100n+6 ≤ 101 n for all n ≥ 10 • Eg. 10n2 + 4n + 2 = O(n2) since 10n2 + 4n + 2 ≤ 11 n2for all n ≥5

Observation of Big-O • Leading constants and lower-order terms do not matter. • Can always find a constant large enough to make higher-order term swamp other terms. • Thm 1.2: If f(n) = amnm + … + a1n + ao, then f(n) = O(nm). • Eg. 10n2 + 4n + 2 = O(n2)

Property of Big-O • Thm 1.2: If f(n) = amnm + … + a1n + ao, then f(n) = O(nm). • Pf. f(n) ≤ |am| nm + … + |a1| n + |ao| ≤ nm (|am| + … + |a1| + |ao|) ≤ cnm , wherec=|am| + … + |a1| + |ao| for n ≥ 1 Hence f(n) = O(nm).

More Examples • 0.1 n2- 10n - 6 = O(n2) • n + log n = O(n) • n + n log n = O(n log n) • n2 + log n = O(n2) • 2n + n10000 = O(2n) • n4+ 1000 n3 + n2 = O(n4) • n4+ 1000 n3 + n2 = O(n5) • n4+ 1000 n3 + n2 ≠ O(n3) always 1 We don’t write O(2n2)

Naming Common Functions • O(1) -- constant time • O(log n) -- logarithmic time • O(n) -- linear time • O(n2) -- quadratic time • O(n3) -- cubic time • O(n100) -- polynomial time • O(2n) -- exponential time When n is large enough, the latter terms take more time than the former ones.

More Notes on Big-O • f(n) =O(g(n)) “means” g(n) is an upper bound of f(n) • n = O(n) n = O(n2) n = O(n3) • Big-O is usually used to compute the worst-case running time of a program. • f(n) =O(g(n)) is correct; but O(g(n)) =f(n)is wrong. We want g(n) as small as possible !!

Compute Running Time in Big-O • How to compute the time complexity of a program in big-Oh ? • Compute the total step-count, then take big-Oh. • Take big-Oh on each step, then sum up the big-Oh of all steps.

Rule of Sum • Thm: If f1(n) = O(g1(n)), and f2(n)=O(g2(n)), then f1(n) + f2(n) = O(max(g1(n), g2(n)). • Eg. f1(n) = O(n) f2(n) = O(n2) Then f1(n) + f2(n) = O(n2). • Eg. f1(n) = O(n) f2(2) = O(n) Then f1(n) + f2(n) = O(n). • Used to compute segments of program P1 and P2 if P1 is followed by P2.

Rule of Product • Thm: If f1(n) = O(g1(n)), and f2(n)=O(g2(n)), then f1(n) ∙ f2(n) = O(g1(n)∙ g2(n)). • Eg. f1(n) = O(n) f2(n) = O(n) Then f1(n) ∙ f2(n) = O(n2). • Used in time analysis of nested loops (See next slide …)

Rule of Product for (i=0; i<n; i++) { // O(n) for (j=0; j<n; j++) // O(n) sum := sum + 1; // O(1) } • By rule of product, running time of this program: • f(n) = O(n ∙ n∙ 1) • = O(n2).

Complexity of Binary Search int binsearch(int A[], int x) //search x in A[] { int left=0, right=n-1, mid; while (left <= right) { // O(log2 n) // let A[mid] be the middle element mid = (left+right)/2; // O(1) if (x == A[mid]) return mid; // O(1) if (x < A[mid]) right = mid-1; // O(1) if (x > A[mid]) left = mid+1; // O(1) } return -1; // O(1) }

Complexity of Binary Search • Analysis of the while loop: • Iteration 1: n values to be searched • Iteration 2: n/2 left for searching • Iteration 3: n/4 left for searching • … • Iteraton k+1: n/(2k) left for searching When n/(2k) = 1, searching must finish. That is n = 2k ==> k = log2 n • Hence, worst-case running time of binary search is O(log2 n).

Definition of Big-Ω • Definition of Ω-notation: f(n) = Ω(g(n)) iff these exist c, no>0 such that f(n) ≥ cg(n) for all n ≥ no . • Eg. 3n + 2 = Ω(n) since 3n+2 ≥ 3n for all n ≥ 1 • Eg. 100n + 6 = Ω(n) since 100n+6 ≥ 100 n for all n ≥ 1 • Eg. 10n2 + 4n + 2 = Ω(n2) since 10n2 + 4n + 2 ≥ n2for all n ≥1

Definition of Big-Θ • Definition of Θ-notation: f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n)). • Eg. 3n + 2 = Θ(n) • Eg. 100n + 6 = Θ(n) • Eg. 10n2 + 4n + 2 = Θ(n2)

Plot of Common Function Values

Running Times On Computer

Performance Measurement • Obtain actual space and time requirement when running a program. • How to do time measurement in C ? • Method 1:Use clock(), measured in clock ticks • (See next slides for details …) • Method 2:Use time(), measured in seconds • (See next slides for details …) • To time a short event, it is necessary to repeat it many times, and then take their average.

Performance Measurement • Method 1: Use clock(), measured in clock ticks #include <time.h> void main() { clock_t start = clock(); // main body of program comes here! clock_t stop = clock(); double duration = ((double) (stop-start)) / CLOCKS_PER_SEC; }

Performance Measurement • Method 2: Use time(), measured in seconds #include <time.h> void main() { time_t start = time(NULL); // main body of program comes here! time_t stop = time(NULL); double duration = (double) difftime(stop,start); }

CS 235102 Data Structures ( 資料結構 )

CS 235102 Data Structures ( 資料結構 )

Presentation Transcript

ECE 242 Spring 2003 Data Structures in Java

Data Structures

caBIG Data Structures

Tutorial on Statistical N-Body Problems and Proximity Data Structures

Algorithms and Data Structures for Low-Dimensional Topology

Introduction to Algorithms and Data Structures

CSC 211 Data Structures Lecture 17

Abstract Data Types and Stacks

Kagan Structures

Data Structures Intro

CSE 326 Data Structures Part 6: Priority Queues, AKA Heaps

Linear Data Structures (Stack)

Intro to Computer Science I

241-423 Advanced Data Structures and Algorithms

CS 61b: Final Review

Advanced Data Structures NTUA 2007 R-trees and Grid File

241-423 Advanced Data Structures and Algorithms

Data Manipulation Using MySQL

COMPILER CONSTRUCTION

CS221: Algorithms and Data Structures Lecture #1 Complexity Theory and Asymptotic Analysis

CS 235102 Data Structures ( 資料結構 )

C++ Plus Data Structures