CS 420 – Design of Algorithms Basic Concepts
Design of Algorithms • We need a mechanism to describe/define algorithms • Independent of the language used to implement the algorithm • Pseudo-code
Algorithms • Algorithm – "any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output" – Cormen, et al.
Algorithms • Algorithm – "a procedure (a finite set of well-defined instructions) for accomplishing some task which, given an initial state, will terminate in a defined end-state." • from Wikipedia.org • http://en.wikipedia.org/wiki/Algorithm
Algorithms • Human Genome Project • Security/encryption for e-commerce • Spacecraft navigation • Pulsar searches • Search Engines
Algorithms • Search engines • Search algorithms • linear search – • read-compare-read-… • run-time is a linear function of n (n = size of database) • suppose the DB has 40,000,000 records • then up to 40,000,000 read-compare cycles • at 1,000 read-compare cycles per second = 40,000 seconds ≈ 667 minutes ≈ 11 hours
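A minimal Python sketch of the read-compare loop described above (the list-based database model and the function name are illustrative):

def linear_search(records, keyword):
    for i, record in enumerate(records):   # read the next record ...
        if record == keyword:              # ... and compare it to the keyword
            return i                       # found: return its position
    return -1                              # not found

In the worst case every record is read and compared once, so the run-time grows linearly with n = len(records).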
Algorithms • Search Google for “house” • 730,000,000 hits in 0.1 seconds
Algorithms • Binary tree search algorithm • The keyword is indexed in a set of binary indexes – is the keyword in the left or right half of the database?

Database
├─ aaa-mon
│   ├─ aaa-jaa
│   └─ jaa-mon
└─ moo-zxy
    ├─ moo-tev
    └─ tew-zxy
Algorithms • Binary search algorithm • So, to search a 40,000,000 record database • for a single term – • T(40,000,000) = log2(40,000,000) • ≈ 26 read-compare cycles • at 1,000 read-compare cycles/sec ≈ 0.026 seconds
Algorithms • Binary Search Algorithm • So, what about 730,000,000 records? • Searching for a single keyword – • about 30 read-compare cycles (log2(730,000,000) ≈ 29.4) • or about 0.03 seconds
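A sketch of the idea in Python, assuming the records are stored sorted in a list (names are illustrative):

def binary_search(records, keyword):
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # index of the middle record
        if records[mid] == keyword:
            return mid                # found
        elif records[mid] < keyword:
            lo = mid + 1              # keyword is in the right half
        else:
            hi = mid - 1              # keyword is in the left half
    return -1                         # not found

Each pass halves the remaining range, so at most about log2(n) read-compare cycles are needed: ~26 for 40,000,000 records, ~30 for 730,000,000.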
Pseudo-code • Like English – easily readable • Clear and consistent • Rough correspondence to a real language implementation • Should give a clear understanding of what the algorithm does
Using Pseudo-code • Use indentation to indicate block structure: statements at the same level of indentation belong to the same block. • Do not use "extra" statements like begin-end • Looping constructs and conditionals are similar to Pascal (while, for, repeat, if-then-else). In for loops the loop counter is persistent – it retains its value after the loop exits
Using Pseudo-code • Use a consistent symbol to indicate comments. Anything on a line after this symbol is a comment, not code • Multiple assignment is allowed • Variables are local to a procedure unless explicitly declared as global • Array elements are specified by the array name followed by indices in square brackets… A[i]
Pseudo-code • .. indicates a range of values: A[1..4] means elements 1, 2, 3, and 4 of array A • Compound data can be represented as objects with attributes or fields. Reference these attributes like array references. For example, a variable holding the length of array A is written length[A]
Pseudo-code • An array reference is a pointer • Parameters are passed by value • assignments to parameters within a procedure are local to the procedure • Boolean operators short-circuit • Be consistent • don't use read in one place and input in another unless they have functionally different meanings
Insertion-Sort Algorithm

INSERTION-SORT(A)
  for j = 2 to length[A]
    do key = A[j]
       C* Insert A[j] into the sorted sequence A[1..j-1]
       i = j - 1
       while i > 0 and A[i] > key
         do A[i+1] = A[i]
            i = i - 1
       A[i+1] = key
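A minimal runnable Python rendering of the same procedure; note that Python lists are 0-indexed, so the loop starts at j = 1 rather than 2:

def insertion_sort(A):
    for j in range(1, len(A)):
        key = A[j]
        # Insert A[j] into the sorted sequence A[0..j-1]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]   # shift larger elements one slot right
            i = i - 1
        A[i + 1] = key
    return A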
Analysis of Algorithms • Analysis may be concerned with any resource • memory • bandwidth • runtime • Need a model for describing the runtime performance of an algorithm • RAM – Random Access Machine
RAM • There are other models but for now… • Assume that all instructions are sequential • All data is accessible in one step • Analyze performance (run-time) in terms of inputs • meaning of inputs varies – size of an array, number of bits, vertices and edges, etc. • Machine independent • Language independent
RAM • Need to base analysis on cost of instruction execution • assign costs (run-time) to each instruction
INSERTION-SORT • Run-time = sum of products of costs (instruction run-times) and execution counts • T(n) = c1·n + c2(n-1) + c4(n-1) + c5·Σ(j=2 to n) tj + c6·Σ(j=2 to n) (tj - 1) + c7·Σ(j=2 to n) (tj - 1) + c8(n-1)
INSERTION-SORT • Best case vs Worst Case • Best case • Input array already sorted • Worst case • Input array sorted in reverse order
INSERTION-SORT • For sake of discussion… • assume that all c = 2 • then, for best case • T(n) = 10n - 8 • n = 1000, T(n) = 9,992 • for worst case … • T(n) = 3n² + 7n - 8 • n = 1000, T(n) = 3,006,992
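The worst-case closed form follows from substituting tj = j into the sum above (in reverse-sorted input, each key is compared against every element of the sorted prefix); with all c = 2:

Σ(j=2 to n) tj = Σ(j=2 to n) j = n(n+1)/2 - 1
Σ(j=2 to n) (tj - 1) = n(n-1)/2

T(n) = 2n + 2(n-1) + 2(n-1) + 2[n(n+1)/2 - 1] + 2[n(n-1)/2] + 2[n(n-1)/2] + 2(n-1)
     = 2n + 6(n-1) + (n² + n - 2) + 2(n² - n)
     = 3n² + 7n - 8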
Insertion-sort Performance • [Chart: run-time T(n) vs. input size n for best and worst case] • Best case is a linear function of n; the worst case is quadratic
So, what are we really interested in? • the big picture • the trend in run-time performance as the problem grows • not concerned about small differences in algorithms • what happens to the algorithm as the problem gets explosively large • the order of growth
Abstractions and assumptions • The cost coefficients will not vary that much… and will not contribute significantly to the growth of run-time performance • so we can set them to a constant • … and we can ignore them • remember the earlier example – • c1 = c2 = … = 2
Abstractions and assumptions • In a polynomial run-time function the order of growth is controlled by the highest-order term • T(n) = 3n² + 7n - 8 • so we can ignore (discard) the lower-order terms • T(n) = 3n²
Abstractions and assumptions • It turns out that for sufficiently large n the coefficient of the highest-order term is not that important in characterizing the order of growth of a run-time function • So, from that perspective, the run-time function of the Insertion-Sort algorithm (worst case) is - • T(n) = n²
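A quick numeric check of this abstraction (a throwaway sketch, not part of the algorithm): for large n the ratio of the full worst-case function to its leading term settles near the constant 3, so n² alone captures the shape of the growth.

for n in (10, 1_000, 100_000):
    full = 3 * n**2 + 7 * n - 8   # full worst-case run-time function
    lead = n**2                   # leading term with coefficient dropped
    print(n, full / lead)         # -> 3.62, 3.006992, 3.0000699...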
Abstractions and assumptions • Are these abstraction assumptions correct? • for small problems – no • but for sufficiently large problems • they do a pretty good job of characterizing the run-time function of algorithms
Design of Algorithms • Incremental approach to algorithm design • Design for a very small case • expand the complexity of the problem and the algorithm • Divide and Conquer • Start with a large (full) problem • Divide it into smaller problems • Solve the smaller problems • Combine the results from the smaller problems
Another look at Sort algorithms • Suppose: • you have an array evenly divisible by two • in each half (left and right) the values are already sorted • but not in order across the whole array • task: sort the array so that it is in order across the entire array
Merge Sorted subarrays • Split the array into two subarrays • Add a marker to each subarray to indicate its end • Set an index to the first value of each subarray • Compare the indexed (pointed-to) values of the two subarrays • If either indexed value is an end-marker: move all remaining values (except the end-marker) from the other subarray to the output array; Stop • Move the smaller of the two values to the output array (sorted); increment the index into that subarray • Go to step 4
Merge(A, p, q, r) • Where A is the array containing the values to be sorted; each half is already sorted from smallest to largest • p is the starting index for array A • q is the ending index for the left side of array A (end of the first half… sort of) • r is the ending index for array A • So, sort the values from p to r from the two halves of array A, where q marks where to split the array into subarrays
Merge(A, p, q, r)
  n1 = q - p + 1
  n2 = r - q
  C* create subarrays L[1..n1+1] and R[1..n2+1]
  for i = 1 to n1
    do L[i] = A[p+i-1]
  for j = 1 to n2
    do R[j] = A[q+j]
  L[n1+1] = ∞    C* end-of-subarray sentinels (the markers from the previous slide)
  R[n2+1] = ∞
  i = 1
  j = 1
  for k = p to r
    do if L[i] <= R[j]
         then A[k] = L[i]
              i = i + 1
         else A[k] = R[j]
              j = j + 1
MERGE-SORT(A, p, r)
  if p < r
    then q = ⌊(p+r)/2⌋
         MERGE-SORT(A, p, q)
         MERGE-SORT(A, q+1, r)
         MERGE(A, p, q, r)
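A runnable Python rendering of MERGE and MERGE-SORT, using float('inf') as the end-of-subarray sentinel and 0-based indices (a sketch; names follow the pseudo-code):

def merge(A, p, q, r):
    L = A[p:q + 1] + [float('inf')]      # left half plus sentinel
    R = A[q + 1:r + 1] + [float('inf')]  # right half plus sentinel
    i = j = 0
    for k in range(p, r + 1):
        if L[i] <= R[j]:
            A[k] = L[i]   # take the smaller front value from L
            i += 1
        else:
            A[k] = R[j]   # take the smaller front value from R
            j += 1

def merge_sort(A, p, r):
    if p < r:
        q = (p + r) // 2          # floor of the midpoint
        merge_sort(A, p, q)       # sort the left half
        merge_sort(A, q + 1, r)   # sort the right half
        merge(A, p, q, r)         # merge the two sorted halves

# Usage: data = [5, 2, 4, 7, 1, 3]; merge_sort(data, 0, len(data) - 1)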
Asymptotic Notation • Big Θ (theta) • Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }
Asymptotic Notation • Big O (oh) • O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0 }
Asymptotic Notation • Big Ω (omega) • Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0 }
Asymptotic Notation • Little o (oh) • o(g(n)) = { f(n) : for any positive constant c > 0 there exists a constant n0 > 0 such that 0 ≤ f(n) < c·g(n) for all n ≥ n0 }
Asymptotic Notation • Little ω (omega) • ω(g(n)) = { f(n) : for any positive constant c > 0 there exists a constant n0 > 0 such that 0 ≤ c·g(n) < f(n) for all n ≥ n0 }
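As a concrete check of the Θ definition against the earlier worst-case function (one valid choice of constants; many others work):

For f(n) = 3n² + 7n - 8 and g(n) = n², take c1 = 3, c2 = 4, n0 = 6:
  0 ≤ 3n² ≤ 3n² + 7n - 8 ≤ 4n²  for all n ≥ 6
(the left inequality needs 7n ≥ 8; the right needs n² ≥ 7n - 8)
so 3n² + 7n - 8 = Θ(n²) – Insertion-Sort's worst-case run-time is Θ(n²).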