Learn about algorithms, their properties, specification, pseudocode, and performance evaluation in CSCI 3333 Data Structures course by Dr. Bun Yue. Understand theoretical and experimental analysis tools like asymptotic analysis to evaluate algorithm efficiency.
Algorithm Analysis
CSCI 3333 Data Structures
by Dr. Bun Yue, Professor of Computer Science
yue@uhcl.edu
http://sce.uhcl.edu/yue/2013
Acknowledgement • Mr. Charles Moen • Dr. Wei Ding • Ms. Krishani Abeysekera • Dr. Michael Goodrich
Algorithm • “A finite set of instructions that specify a sequence of operations to be carried out in order to solve a specific problem or class of problems.”
Example Algorithm • What do you think about this algorithm?
Some Properties of Algorithms • A deterministic sequential algorithm should satisfy: • Finiteness: the algorithm must complete after a finite number of instructions. • Absence of ambiguity: each step must be clearly defined, with only one interpretation. • Definition of sequence: each step must have a uniquely defined preceding and succeeding step; the first step (start step) and last step (halt step) must be clearly noted. • Input/output: there must be a specified number of input values and result values. • Feasibility: it must be possible to perform each instruction.
Specifying Algorithms • There is no universal formalism. An algorithm may be specified: • In English • In pseudocode • In a programming language (usually not desirable)
Specifying Algorithms • Must be included: • Input: type and meaning • Output: type and meaning • Instruction sequences. • May be included: • Assumptions • Side effects • Pre-conditions and post-conditions
Specifying Algorithms • Like sub-programs, algorithms can be broken down into sub-algorithms to reduce complexity. • High level algorithms may use English more. • Low level algorithms may use pseudocode more.
Pseudocode • High-level description of an algorithm • More structured than English prose • Less detailed than a program • Preferred notation for describing algorithms • Hides program design issues
Example: ArrayMax
Algorithm arrayMax(A, n)
  Input: array A of n integers
  Output: maximum element value of A
  currentMax ← A[0]
  for i ← 1 to n − 1 do
    if A[i] > currentMax then
      currentMax ← A[i]
  return currentMax
• What do you think?
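As a sanity check, the arrayMax pseudocode translates almost line for line into runnable Python (a sketch; the function name array_max is ours, not from the slides):

```python
def array_max(A):
    """Return the maximum element of a non-empty list A,
    following the arrayMax pseudocode above."""
    current_max = A[0]            # currentMax <- A[0]
    for i in range(1, len(A)):    # for i <- 1 to n - 1 do
        if A[i] > current_max:    # found a larger element
            current_max = A[i]
    return current_max
```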
Example Algorithm: BogoSort
Algorithm BogoSort(A)
  do while (A is not sorted)
    randomize(A);
  end do;
• What do you think?
Better Specification
Algorithm BogoSort(A)
  Input: A: an array of comparable elements.
  Output: none
  Side effect: A is sorted.
  while not sorted(A)
    randomize(A);
  end while;
Sorted(A)
Algorithm Sorted(A)
  Input: A: an array of comparable elements.
  Output: a boolean value indicating whether A is sorted (in non-descending order).
  for (i = 0; i < A.size − 1; i++)
    if A[i] > A[i+1] return false;
  end for;
  return true;
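A runnable sketch of both routines in Python (the names is_sorted and bogo_sort are ours; the slides' randomize is rendered with the standard library's random.shuffle):

```python
import random

def is_sorted(A):
    """True if A is in non-descending order (the Sorted(A) check above)."""
    return all(A[i] <= A[i + 1] for i in range(len(A) - 1))

def bogo_sort(A):
    """Repeatedly shuffle A in place until it happens to be sorted.
    Correct, but with terrible expected running time."""
    while not is_sorted(A):
        random.shuffle(A)
```

Only try this on very small arrays; the expected number of shuffles grows factorially with the array size.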
BogoSort • Is correct! • Terrible performance!
Criteria for Judging Algorithms • Correctness • Accuracy (especially for numerical methods and approximation problems) • Performance • Simplicity • Ease of implementation
How Does BogoSort Fare? • Correctness: yes • Accuracy: yes (not really applicable) • Performance: terrible even for small arrays • Simplicity: yes • Ease of implementation: medium
Summation Algorithm • Correctness: ok. • Performance: bad, proportional to N. • Simplicity: not too bad • Ease: not too bad
Summation Algorithm • A better summation algorithm Algorithm Summation(N) Input: N: Natural Number Output: Summation of 1 to N. return N * (N+1) / 2; • Performance: constant time independent of N.
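The contrast can be made concrete by comparing a loop-based summation with the closed form above; both are our sketches, not code from the course:

```python
def summation_loop(n):
    """Naive summation of 1..n: n additions, time proportional to n."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def summation_formula(n):
    """Closed form n(n+1)/2: constant time, independent of n."""
    return n * (n + 1) // 2
```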
Performance of Algorithms • Performance of algorithms can be measured by runtime experiments, called benchmark analysis.
Experimental Studies • Write a program implementing the algorithm • Run the program with inputs of varying size and composition • Use a method like System.currentTimeMillis() to get an accurate measure of the actual running time • Plot the results
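A minimal timing harness in this spirit might look as follows in Python, using time.perf_counter as the analogue of Java's System.currentTimeMillis (the benchmark helper and its interface are our invention):

```python
import time

def benchmark(func, inputs):
    """Run func on each input and record the wall-clock time taken,
    analogous to timing with System.currentTimeMillis() in Java."""
    results = []
    for data in inputs:
        start = time.perf_counter()
        func(data)
        elapsed = time.perf_counter() - start
        results.append((len(data), elapsed))
    return results   # (input size, seconds) pairs, ready to plot

# Time the built-in max on inputs of growing size
timings = benchmark(max, [list(range(n)) for n in (1000, 10000, 100000)])
```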
Limitations of Experiments • It is necessary to implement the algorithm, which may: • be difficult, • introduce implementation related performance issues. • Results may not be indicative of the running time on other inputs not included in the experiment. • In order to compare two algorithms, the same hardware and software environments must be used.
Theoretical Analysis • Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input sizes (usually n) • Takes into account all possible inputs • Allows us to evaluate the speed of an algorithm independent of the hardware/software environment
Analysis Tools (Goodrich, 166) Asymptotic Analysis • Developed by computer scientists for analyzing the running time of algorithms • Describes the running time of an algorithm in terms of the size of the input – in other words, how well it "scales" as the input gets larger • The most common metric used is an upper bound on the algorithm's growth rate • Called the Big-Oh of an algorithm, e.g. O(n)
Analysis Tools (Goodrich web page) Algorithm Analysis • Use asymptotic analysis • Assume that we are using an idealized computer called the Random Access Machine (RAM) • CPU • Unlimited amount of memory • Accessing a memory cell takes one unit of time • Primitive operations take one unit of time • Perform the analysis on the pseudocode
Analysis Tools (Goodrich, 48, 166) Pseudocode • High-level description of an algorithm • Structured • Not as detailed as a program • Easier to understand (written for a human)
Algorithm arrayMax(A, n):
  Input: An array A storing n >= 1 integers.
  Output: The maximum element in A.
  currentMax ← A[0]
  for i ← 1 to n − 1 do
    if currentMax < A[i] then
      currentMax ← A[i]
  return currentMax
Analysis Tools (Goodrich, 48) Pseudocode Details Declaration Algorithm functionName(arg,…) Input:… Output:… Control flow • if…then…[else…] • while…do… • repeat…until… • for…do… Indentation instead of braces Return value return expression Expressions • Assignment (like in Java) • Equality testing (like in Java)
Analysis Tools (Goodrich, 48) Pseudocode Details (continued) Tips for writing an algorithm • Use the correct data structure • Use the correct ADT operations • Use object-oriented syntax • Indent clearly • Use a return statement at the end
Analysis Tools (Goodrich, 164–165) Primitive Operations • To do asymptotic analysis, we can start by counting the primitive operations in an algorithm and adding them up (need not be very accurate). • Primitive operations that we assume will take a constant amount of time: • Assigning a value to a variable • Calling a function • Performing an arithmetic operation • Comparing two numbers • Indexing into an array • Returning from a function • Following a pointer reference
Analysis Tools (Goodrich, 166) Counting Primitive Operations • Inspect the pseudocode to count the primitive operations as a function of the input size (n)
Algorithm arrayMax(A, n):
  currentMax ← A[0]
  for i ← 1 to n − 1 do
    if currentMax < A[i] then
      currentMax ← A[i]
  return currentMax
Counts:
• Array indexing + assignment (currentMax ← A[0]): 2
• Initializing i: 1
• Verifying i < n: n
• Array indexing + comparing: 2(n − 1)
• Array indexing + assignment (worst case): 2(n − 1)
• Incrementing the counter: 2(n − 1)
• Returning: 1
Best case: 2 + 1 + n + 4(n − 1) + 1 = 5n
Worst case: 2 + 1 + n + 6(n − 1) + 1 = 7n − 2
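The best- and worst-case totals can be checked mechanically with a small instrumented version of arrayMax that tallies operations using the per-line costs above (a sketch; the cost assignments mirror this table, not a universal machine model):

```python
def count_arraymax_ops(A):
    """Tally the primitive operations of arrayMax on input A,
    using the per-line costs from the counting table."""
    n = len(A)
    ops = 2           # A[0] indexing + assignment to currentMax
    ops += 1          # initializing i
    current_max = A[0]
    for i in range(1, n):
        ops += 1      # verifying i < n
        ops += 2      # indexing A[i] + comparison
        if A[i] > current_max:
            current_max = A[i]
            ops += 2  # indexing A[i] + assignment (taken branch only)
        ops += 2      # incrementing the counter
    ops += 1          # final, failing test i < n
    ops += 1          # return
    return ops
```

On a descending array the branch is never taken (best case, 5n); on an ascending array it is always taken (worst case, 7n − 2).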
Analysis Tools (Goodrich, 165) Best, Worst, or Average Case • A program may run faster on some input data than on others • Best case, worst case, and average case are terms that characterize the input data • Best case – the data is distributed so that the algorithm runs fastest • Worst case – the data distribution causes the slowest running time • Average case – very difficult to calculate • We will concentrate on analyzing algorithms by identifying the running time for the worst case data
Analysis Tools (Goodrich, 166) Running Time of arrayMax • Worst-case count for arrayMax: 7n − 2 primitive operations • Actual running time depends on the speed of the primitive operations; some of them are faster than others • Let t = time for the slowest primitive operation • Then the worst-case running time of arrayMax is bounded by f(n) = t(7n − 2)
Analysis Tools (Goodrich, 166) Growth Rate of arrayMax • Growth rate of arrayMax is linear • Changing the hardware alters the value of t, so that arrayMax will run faster on a faster computer • Growth rate does not change. Slow PC 10(7n-2) Fast PC 5(7n-2) Fastest PC 1(7n-2)
Analysis Tools Tests of arrayMax • Does it really work that way? • Used the arrayMax algorithm from p. 166, Goodrich • Tested on two PCs (Pentium III, 866 MHz; Pentium 4, 2.4 GHz) • Yes, results on both show a linear growth rate
Big-Oh Notation • Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 such that f(n) ≤ c·g(n) for all n ≥ n0 • Example: 2n + 10 is O(n): 2n + 10 ≤ cn ⇒ (c − 2)n ≥ 10 ⇒ n ≥ 10/(c − 2); pick c = 3 and n0 = 10
Big-Oh Meaning • Big-Oh expresses an upper bound on the growth rate of a function, for sufficiently large values of n. • It shows the asymptotic behavior. • Example: 3n² + n lg n + 2000 = O(n²); also 3n² + n lg n + 2000 = O(n³)
Big-Oh Example • Example: the function n² is not O(n) • n² ≤ cn would require n ≤ c • The inequality cannot hold for all n, since c must be a constant
Big-Oh Proof Example 50n² + 100n lg n + 1500 is O(n²) Proof: need to find c and n0 such that 50n² + 100n lg n + 1500 ≤ cn² for all n ≥ n0. For example, pick c = 151 and n0 = 40. You may complete the proof.
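The chosen constants can be spot-checked numerically over a finite range (this is evidence, not a proof, since only finitely many values of n are tested):

```python
import math

# 2n + 10 <= 3n for all n >= 10  (witnesses c = 3, n0 = 10)
assert all(2 * n + 10 <= 3 * n for n in range(10, 1000))

# 50n^2 + 100 n lg n + 1500 <= 151 n^2 for all n >= 40  (c = 151, n0 = 40)
assert all(50 * n**2 + 100 * n * math.log2(n) + 1500 <= 151 * n**2
           for n in range(40, 1000))
```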
Big-Oh and Growth Rate • The big-Oh notation gives an upper bound on the growth rate of a function • The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) • We can use the big-Oh notation to rank functions according to their growth rate
Big-O Theorems For all the following theorems, assume that f(n) is a function of n and that K is an arbitrary constant. • Theorem 1: K is O(1) • Theorem 2: A polynomial is O(the term containing the highest power of n): f(n) = 7n⁴ + 3n² + 5n + 1000 is O(n⁴) • Theorem 3: K·f(n) is O(f(n)) [that is, constant coefficients can be dropped]: g(n) = 7n⁴ is O(n⁴) • Theorem 4: If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)). [transitivity]
Big-O Theorems • Theorem 5: Each of the following functions is strictly big-O of its successors (listed smaller to larger):
K [constant]
logb(n) [assume log base 2 if no base is shown]
n
n logb(n)
n²
n to higher powers
2ⁿ
3ⁿ
larger constants to the n-th power
n! [n factorial]
nⁿ
For example: f(n) = 3n log(n) is O(n log n) and O(n²) and O(2ⁿ)
Big-O Theorems • Theorem 6: In general, f(n) is big-O of the dominant term of f(n), where "dominant" may usually be determined from Theorem 5.
f(n) = 7n² + 3n log(n) + 5n + 1000 is O(n²)
g(n) = 7n⁴ + 3ⁿ + 10⁶ is O(3ⁿ)
h(n) = 7n(n + log(n)) is O(n²)
• Theorem 7: For any base b, logb(n) is O(log(n)).
Asymptotic Algorithm Analysis • The asymptotic analysis of an algorithm determines the running time in big-Oh notation • To perform the asymptotic analysis • We find the worst-case number of primitive operations executed as a function of the input size • We express this function with big-Oh notation • Example: • We determine that algorithm arrayMax executes at most 7n − 2 primitive operations • We say that algorithm arrayMax "runs in O(n) time" • Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations
Example: Computing Prefix Averages • We further illustrate asymptotic analysis with two algorithms for prefix averages • The i-th prefix average of an array X is the average of the first (i + 1) elements of X: A[i] = (X[0] + X[1] + … + X[i])/(i + 1) • Computing the array A of prefix averages of another array X has applications to financial analysis
Prefix Average (Quadratic) • The following algorithm computes prefix averages in quadratic time by applying the definition
Algorithm prefixAverages1(X, n)
  Input: array X of n integers
  Output: array A of prefix averages of X     #operations
  A ← new array of n integers                 n
  for i ← 0 to n − 1 do                       n
    s ← X[0]                                  n
    for j ← 1 to i do                         1 + 2 + … + (n − 1)
      s ← s + X[j]                            1 + 2 + … + (n − 1)
    A[i] ← s/(i + 1)                          n
  return A                                    1
Arithmetic Progression • The running time of prefixAverages1 is O(1 + 2 + … + n) • The sum of the first n integers is n(n + 1)/2 • There is a simple visual proof of this fact • Thus, algorithm prefixAverages1 runs in O(n²) time
Prefix Averages (Linear) • The following algorithm computes prefix averages in linear time by keeping a running sum
Algorithm prefixAverages2(X, n)
  Input: array X of n integers
  Output: array A of prefix averages of X     #operations
  A ← new array of n integers                 n
  s ← 0                                       1
  for i ← 0 to n − 1 do                       n
    s ← s + X[i]                              n
    A[i] ← s/(i + 1)                          n
  return A                                    1
• Algorithm prefixAverages2 runs in O(n) time
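Both prefix-average algorithms can be rendered as Python sketches and checked against each other (function names are ours):

```python
def prefix_averages1(X):
    """Quadratic version: recompute each prefix sum from scratch."""
    n = len(X)
    A = [0.0] * n
    for i in range(n):
        s = X[0]
        for j in range(1, i + 1):   # inner loop grows with i
            s += X[j]
        A[i] = s / (i + 1)
    return A

def prefix_averages2(X):
    """Linear version: carry a running sum across iterations."""
    A = [0.0] * len(X)
    s = 0
    for i, x in enumerate(X):
        s += x
        A[i] = s / (i + 1)
    return A
```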
Math You Need to Review • Summations • Logarithms and exponents • Proof techniques • Basic probability
• Properties of logarithms:
logb(xy) = logb x + logb y
logb(x/y) = logb x − logb y
logb xᵃ = a logb x
logb a = logx a / logx b
• Properties of exponentials:
a^(b+c) = a^b · a^c
a^(bc) = (a^b)^c
a^b / a^c = a^(b−c)
b = a^(loga b)
b^c = a^(c·loga b)
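The identities above can be verified numerically in Python with math.isclose (a floating-point check on sample values, not a proof):

```python
import math

b, x, y, a, c = 2.0, 8.0, 4.0, 3.0, 5.0

# properties of logarithms
assert math.isclose(math.log(x * y, b), math.log(x, b) + math.log(y, b))
assert math.isclose(math.log(x / y, b), math.log(x, b) - math.log(y, b))
assert math.isclose(math.log(x ** a, b), a * math.log(x, b))
assert math.isclose(math.log(a, b), math.log(a, x) / math.log(b, x))

# properties of exponentials
assert math.isclose(a ** (b + c), a ** b * a ** c)
assert math.isclose((a ** b) ** c, a ** (b * c))
assert math.isclose(a ** b / a ** c, a ** (b - c))
assert math.isclose(b, a ** math.log(b, a))
assert math.isclose(b ** c, a ** (c * math.log(b, a)))
```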
Relatives of Big-Oh • big-Omega (lower bound) • f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0 • big-Theta (order) • f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer constant n0 ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for n ≥ n0
Intuition for Asymptotic Notation Big-Oh • f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n) big-Omega • f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n) big-Theta • f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n)
Relatives of Big-Oh Examples • 5n² is Ω(n²): f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n0; let c = 5 and n0 = 1 • 5n² is also Ω(n): by the same definition, let c = 1 and n0 = 1 • 5n² is Θ(n²): f(n) is Θ(g(n)) if it is Ω(n²) and O(n²). We have already seen the former; for the latter, recall that f(n) is O(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≤ c·g(n) for n ≥ n0. Let c = 5 and n0 = 1.