Explore the complexity of algorithms and regular expressions in bioinformatics, comparing speeds and bounds for efficient problem-solving. Understand how DNA sequences can be represented by automata (graphs) and grammar expressions. Use ScanProsite for motif analysis.
Algorithm Complexity, Regular Expression • Ka-Lok Ng, Department of Bioinformatics, Asia University
Fast versus Slow Algorithms • A typical CPU runs at a clock speed of ~GHz, i.e. about 10^-9 s per operation • Estimate the running time of an algorithm by the total number of operations it performs, then compare different algorithms that solve the same problem • Suppose algorithm A takes ~11n^3 operations and algorithm B takes ~99n^2 + 7 operations • Algorithm B is faster for large n (here, for n ≥ 10) • A brute-force algorithm is typically an exponential algorithm, in contrast to polynomial algorithms (n^2, n^3, …)
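The comparison above can be checked numerically. The following is a minimal sketch, assuming one operation per nanosecond as on the slide; the function names (`ops_a`, `ops_b`, `first_n_where_b_wins`) are illustrative and not part of the original slides.

```python
# Assumption: one operation takes about 1e-9 s (~1 GHz CPU), as stated on the slide.
SECONDS_PER_OP = 1e-9


def ops_a(n: int) -> int:
    """Operation count of algorithm A (~11 n^3)."""
    return 11 * n**3


def ops_b(n: int) -> int:
    """Operation count of algorithm B (~99 n^2 + 7)."""
    return 99 * n**2 + 7


def first_n_where_b_wins(limit: int = 1000):
    """Return the smallest n in 1..limit for which B needs fewer operations than A."""
    for n in range(1, limit + 1):
        if ops_b(n) < ops_a(n):
            return n
    return None


if __name__ == "__main__":
    print("B needs fewer operations from n =", first_n_where_b_wins())
    for n in (10, 1000, 100000):
        # Estimated running times in seconds under the 1 ns per operation assumption.
        print(n, ops_a(n) * SECONDS_PER_OP, ops_b(n) * SECONDS_PER_OP)
```

Under these assumptions, at n = 1000 algorithm A needs about 11 s while algorithm B needs about 0.1 s.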
Big-O notation (Time complexity) • 5n^2 + 3.2n + 99993 is O(n^2): the dominant term is n^2 • The Big-O relationship establishes an upper bound on the growth of a function • f(n) = O(g(n)): the function f grows no faster than the function g (upper bound) • f(n) = Ω(g(n)): the function f grows no slower than the function g (lower bound) • If an algorithm's running time grows no faster than g and no slower than g, then g is a tight bound • If an algorithm requires 2n log n operations, technically it is an O(n^2) algorithm, although this is a misleadingly loose bound. A tight bound is O(n log n). It is often easier to prove a loose bound than a tight one.
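One way to see these bounds is to look at ratios for large n. The sketch below is a numerical illustration, not a proof; the helper names (`f`, `h`, `ratios`) are hypothetical, and base-2 logarithms are an assumption (the base only changes a constant factor).

```python
import math


def f(n: int) -> float:
    """The example function from the slide: 5n^2 + 3.2n + 99993."""
    return 5 * n**2 + 3.2 * n + 99993


def h(n: int) -> float:
    """Operation count 2 n log n (log base 2 is an assumption)."""
    return 2 * n * math.log2(n)


def ratios(n: int) -> dict:
    """Compare each function against candidate bounds at a given n."""
    return {
        "f/n^2": f(n) / n**2,                      # approaches 5: n^2 is a tight bound for f
        "h/n^2": h(n) / n**2,                      # approaches 0: n^2 is only a loose upper bound for h
        "h/(n log n)": h(n) / (n * math.log2(n)),  # stays at 2: n log n is a tight bound for h
    }


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        print(n, ratios(n))
```

A ratio that settles at a positive constant suggests a tight bound; a ratio that shrinks toward zero suggests the bound is valid but loose.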
Graph as automaton • Consider the following four DNA sequences: ACAATG, ACAAATC, AGAATC, ACCGATC • These four sequences can be represented by a special sort of graph, Figure 3.13, called an automaton. • Remarks: (1) draw arrows and circles, (2) write down the character, (3) loops back to earlier states and self-loops are allowed, and (4) fill in states 1 ~ 8.
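An automaton of this kind can be written down as a transition table and run character by character. The sketch below is one possible automaton that accepts the four sequences; Figure 3.13 is not reproduced here, so the state numbering, the number of states, and the names `TRANSITIONS` and `accepts` are assumptions for illustration only.

```python
# Hypothetical transition table: (current state, character) -> next state.
# This is NOT the automaton of Figure 3.13, only one that accepts the same four sequences.
TRANSITIONS = {
    (0, "A"): 1,
    (1, "G"): 3, (1, "C"): 2,
    (2, "C"): 2,              # self-loop: one or more C
    (2, "G"): 3, (2, "A"): 3, (2, "T"): 4,
    (3, "A"): 3,              # self-loop: zero or more A
    (3, "T"): 4,
    (4, "G"): 5, (4, "C"): 5,
}
START = 0
ACCEPTING = {5}


def accepts(seq: str) -> bool:
    """Follow the arrows for each character; accept if we end in an accepting state."""
    state = START
    for ch in seq:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no arrow for this character from the current state
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for s in ("ACAATG", "ACAAATC", "AGAATC", "ACCGATC", "ACGT"):
        print(s, accepts(s))
```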
Expressions and grammar • The following four DNA sequences, ACAATG, ACAAATC, AGAATC, ACCGATC, can be represented by an expression, a so-called regular expression: A [ G | C+ | C+G ] A* T [ G | C ], where ‘*’ means ‘zero or more occurrences’, ‘+’ means ‘one or more occurrences’ and ‘[|]’ means ‘or’, with alternatives provided on either side of the vertical bar.
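The expression can be tried out with Python's `re` module. This is a sketch under one notational assumption: the slide's `[ … | … ]` grouping is written as a non-capturing group `(?:…|…)`, and `[ G | C ]` as the character class `[GC]`, because square brackets mean a character class in Python's regex syntax.

```python
import re

# Slide notation:  A [ G | C+ | C+G ] A* T [ G | C ]
# Python notation (assumed translation):
PATTERN = re.compile(r"A(?:G|C+|C+G)A*T[GC]")

SEQUENCES = ["ACAATG", "ACAAATC", "AGAATC", "ACCGATC"]


def matches(seq: str) -> bool:
    """True if the whole sequence is described by the expression."""
    return PATTERN.fullmatch(seq) is not None


if __name__ == "__main__":
    for s in SEQUENCES + ["ACGT", "AAATG"]:
        print(s, matches(s))
```

`fullmatch` is used so that the entire sequence must fit the expression, rather than just a substring of it.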
Expressions and grammar Figure 3.13 can be represented by Table 3.3.
Expressions and grammar • Figure 3.13 can be represented by the following six transition rules, • Those four DNA sequences are represented by the six rules.
ScanProsite • http://expasy.org/tools/scanprosite/ • Parameter setting: human, at least 10 hits, show 100 results only
ScanProsite • A-X-[ST](2)-X(0,1)-V motif
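A PROSITE-style pattern such as the one above can be read as a regular expression: `X` is any residue, `[ST]` is S or T, `(2)` is a repeat count and `(0,1)` a repeat range. The sketch below translates this subset of the syntax into a Python regex for local experimentation. It does not call the ScanProsite service; the function name `prosite_to_regex` and the test peptide are hypothetical, and anchors (`<`, `>`) are not handled.

```python
import re

# One pattern element: a residue, [allowed set] or {forbidden set},
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue in ("x", "X"):
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            core = residue                      # single residue or [allowed set]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("A-X-[ST](2)-X(0,1)-V")
    print(regex)
    # Hypothetical peptide, made up for illustration only.
    print(re.findall(regex, "MKAQSSLVAGTTV"))
```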