
Strings

An introduction to the fundamentals of computational biology: strings, graphs, and automata, followed by Turing machines, computability, and complexity classes.

jperkins

Presentation Transcript


  1. Strings • Basic data type in computational biology • A string is an ordered succession of characters or symbols from a finite set called an alphabet • Sequence is synonymous with string • s = AATGCA • Length, |s| = 6, s[1] = A • Empty string

  2. Strings • Substring t is a string of consecutive characters of the parent s • Superstring s is the parent string of substring t • s[i,j] denotes the characters of string s between indices i and j • Concatenation of two strings s and t is st • Prefix and suffix
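
The string notions on the two slides above can be tried out directly. The following is a minimal sketch, assuming that s[i,j] is 1-based and inclusive at both ends (the slides only say "between indices i and j"); the helper names are illustrative and not from the presentation.

```python
def substring(s: str, i: int, j: int) -> str:
    """Return s[i,j] using 1-based, inclusive indices (an assumption)."""
    return s[i - 1:j]


def is_substring(t: str, s: str) -> bool:
    """True if t occurs as consecutive characters of s (s is a superstring of t)."""
    return t in s


def prefixes(s: str) -> list:
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:k] for k in range(len(s) + 1)]


def suffixes(s: str) -> list:
    """All suffixes of s, from s itself down to the empty string."""
    return [s[k:] for k in range(len(s) + 1)]


s = "AATGCA"
t = "GG"
length = len(s)               # |s|
first = substring(s, 1, 1)    # s[1]
middle = substring(s, 2, 4)   # s[2,4]
st = s + t                    # concatenation st
```

Note that Python itself indexes from 0, so the slides' s[1] corresponds to `s[0]`.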

  3. Graphs • A graph consists of two sets • V: the set of nodes or vertices • E: the set of edges (pairs of vertices) • G=(V,E) • Simple graph: No loops • Directed Graphs: Directed Edges • Valence (in and out degree of a vertex) • Weighted Graphs

  4. Connectedness • Cycles: No edge repeated and return to start • Acyclic: no cycles • Complete: Every possible edge • Bipartite: Vertices separate into two disjoint subsets, with every edge joining one subset to the other • Tree: acyclic and connected graph (root, leaves) • Interval Graphs: Collection of intervals of the real line, with an edge wherever the intersection is nonempty
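
As an illustration of the bipartite property, here is a small sketch that tries to split the vertices into two sides with a breadth-first search. The vertex numbering (0 to n-1) and the function name are assumptions made for this example.

```python
from collections import deque


def is_bipartite(n, edges):
    """Try to 2-color an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n                 # side[v] is 0 or 1 once v is placed
    for start in range(n):            # handle every connected component
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return False      # an edge inside one subset
    return True


square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle
triangle = [(0, 1), (1, 2), (2, 0)]         # an odd cycle
```

A graph is bipartite exactly when it has no cycle of odd length, which is why the triangle fails and the square succeeds.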

  5. Graph Problems • Hamiltonian: Cycle with every vertex on it exactly once • Eulerian: Cycle that uses every edge exactly once • Coloring: Minimum number of colors so that no two adjacent vertices have the same color • Matching: Subset M of edges such that no two edges in M share an endpoint • Adjacency Matrix
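
The adjacency matrix, matching, and coloring definitions can be made concrete with a short sketch. It assumes an undirected simple graph on vertices 0 to n-1; the function names are hypothetical. It only checks a given matching or coloring; it does not search for a minimum coloring.

```python
def adjacency_matrix(n, edges):
    """Build the n x n 0/1 adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def degrees(a):
    """Degree of each vertex is the sum of its matrix row."""
    return [sum(row) for row in a]


def is_matching(edge_subset):
    """True if no two edges in the subset share an endpoint."""
    seen = set()
    for u, v in edge_subset:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_proper_coloring(a, colors):
    """True if no two adjacent vertices have the same color."""
    n = len(a)
    return all(colors[u] != colors[v]
               for u in range(n) for v in range(n) if a[u][v])


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = adjacency_matrix(4, cycle4)
```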

  6. Finite Automata • A Finite Collection of States Q • A finite alphabet E of input signals • A function d which for every possible combination of current state and input determines a new state. • Two special states, Initial and Final or Accepting state.

  7. The FA accepts any sequence of symbols that puts it in an accepting state • The set of all such sequences is the language of the automaton • [Figure: automaton with Input, Accept, and Reset labels]
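
The definition of a finite automaton and of acceptance can be sketched in a few lines. The automaton below (binary strings with an even number of 1s) is a hypothetical example chosen for illustration; it is not the machine drawn on the slides.

```python
def accepts(delta, start, accepting, word):
    """Run a deterministic FA: follow delta from the start state over word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]   # new state from (current state, input)
    return state in accepting


# Hypothetical automaton: accepts strings over {0, 1} with an even number of 1s.
DELTA = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}
```

For example, `accepts(DELTA, START, ACCEPTING, "1001")` follows even, odd, odd, odd, even and ends in the accepting state.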

  8. State Transition Diagram • [Figure: state transition diagram with numbered states and transitions labeled 0 and 1]

  9. Regular Expressions • 01(001)*01 • Language accepted by a FA • Pumping Lemma: If L is a regular language, then there is a constant n such that for each word W in L with length >= n, there are words X, Y, Z such that W = XYZ, length of XY <= n, length of Y >= 1, and XY^kZ is in L for every integer k >= 0.
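
The regular expression on this slide can be tested with Python's `re` module. The sketch below also shows the XY^kZ shape from the pumping lemma, using a decomposition picked by hand (X = 01, Y = 001, Z = 01); that choice is an assumption for illustration and says nothing about the actual pumping constant n.

```python
import re

PATTERN = re.compile(r"01(001)*01")


def in_language(word):
    """True if the whole word matches 01(001)*01."""
    return PATTERN.fullmatch(word) is not None


# Hand-picked decomposition W = XYZ of the word 0100101.
x, y, z = "01", "001", "01"
pumped = [x + y * k + z for k in range(4)]   # XY^kZ for k = 0, 1, 2, 3
```

Every word in `pumped` stays in the language, because repeating Y just adds more copies of the starred group.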

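  10. Used to tell when a language is not in a particular class • Let L be the language of all palindromes over {a,b}, e.g. abbababba (symmetric about its midpoint) • Is L regular? • Suppose it is, with pumping constant n, and take W = a^n b a^n (a palindrome, so W is in L) • Write W = XYZ with length of XY <= n and length of Y >= 1, so X and Y consist only of a's and Z ends with b a^n • By the pumping lemma, XY^2Z = a^m b a^n with m > n would be in L • But a^m b a^n is not a palindrome, so it is not in L; hence L is not regular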
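
The argument can be checked mechanically for small n. This sketch tries every split W = XYZ allowed by the pumping lemma for W = a^n b a^n and confirms that pumping Y once always destroys the palindrome. It is a finite sanity check under that setup, not a proof; the function names are illustrative.

```python
def is_palindrome(w):
    return w == w[::-1]


def pumping_always_breaks(n):
    """For W = a^n b a^n, check every split with |XY| <= n and |Y| >= 1."""
    w = "a" * n + "b" + "a" * n
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if is_palindrome(x + y * 2 + z):   # XY^2Z
                return False
    return True
```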

  11. Chomsky Hierarchy

  12. Turing Machine • [Figure: tape of binary symbols under a Read/Write head, with Start and Reset controls]

  13. Turing machine M • x is a string over M’s alphabet E • R/W head over leftmost symbol in x, M in start state • R/W head communicates the symbol on the tape to the control mechanism in the box • M can read a symbol, replace a symbol, and move the tape right or left one cell at a time • If M halts (final state), the string y on the tape is M’s output corresponding to input x • Doesn’t necessarily halt for every x • Computes a partial function f: E* → E* • M is the same thing as its program, which is a set of quintuples • (q, s, q’, s’, d) where q is the current state, s is the current symbol, q’ is the next state, s’ is the symbol to be written, and d is the direction to move • Turing machines compute a particular class of functions over the integers called partial recursive functions
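
The quintuple description translates almost directly into a simulator. The following is a minimal sketch, assuming a single tape, a blank symbol `_`, directions `"L"` and `"R"`, and a step limit to cope with machines that never halt. The bit-flipping machine and all names are hypothetical examples, not taken from the slides.

```python
def run_tm(quintuples, tape, start, halt_states, blank="_", max_steps=10_000):
    """Run a one-tape machine given as (q, s, q', s', d) quintuples.

    Returns the tape contents on halting, or None if no output is produced
    within max_steps (the machine need not halt on every input).
    """
    program = {(q, s): (q2, s2, d) for (q, s, q2, s2, d) in quintuples}
    cells = dict(enumerate(tape))    # sparse tape: position -> symbol
    state, head = start, 0           # head starts over the leftmost symbol
    for _ in range(max_steps):
        if state in halt_states:
            lo, hi = (min(cells), max(cells)) if cells else (0, -1)
            text = "".join(cells.get(i, blank) for i in range(lo, hi + 1))
            return text.strip(blank)
        symbol = cells.get(head, blank)
        if (state, symbol) not in program:
            return None              # no quintuple applies: treated as no output
        state, written, direction = program[(state, symbol)]
        cells[head] = written
        head += 1 if direction == "R" else -1
    return None                      # step limit reached without halting


# Hypothetical machine: complement every bit, halt at the first blank.
FLIP = [
    ("scan", "0", "scan", "1", "R"),
    ("scan", "1", "scan", "0", "R"),
    ("scan", "_", "done", "_", "R"),
]

# Hypothetical machine that never halts: keeps moving right over blanks.
LOOP = [("loop", "_", "loop", "_", "R")]
```

The step limit is only a practical cutoff for the sketch; returning `None` there does not decide whether the machine would eventually halt.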

  14. Church-Turing Thesis • All notions of effective computability are equivalent. • Therefore, all computers are created equal. • Other schemes: Lambda calculus, General Recursive Functions, etc...

  15. Universal Turing Machine • Fixed Program in Finite Control • Program reads a description of a Turing Machine from one tape and simulates its behavior on another tape (two tapes) • Universal Machine U, machine to be simulated T

  16. Fixed program for U is like an interpreter • Tape 1 contains the quintuples defining T • Tape 2 is initially blank; U produces the same output as T here • Given T’s current state and input symbol, find the quintuple (q, s, q’, s’, d) in the description of T that applies • Record the new state q’, write the new symbol s’ on tape 2, move in direction d, read the new symbol on tape 2, and record it beside q’

  17. Halting Problem • What is not effectively computable? • Is there a TM, M, that does the following: • Given an arbitrary TM, T, as input, and an equally arbitrary tape, t, decide whether T halts on t • Equivalent to asking whether T accepts t • Undecidable

  18. Diagonalization

  19. Diagonal Set: _ X X _ X _ Its Complement: X _ _ X _ X The complement of the diagonal differs from every row. Can be extended to infinite sets. Used to show that there are languages that are not accepted by any TM. Therefore, there can be no TM that decides whether arbitrary strings are accepted by arbitrary Turing Machines. Since we can represent TMs by strings, after some work, it follows that there can be no TM that decides the halting problem. Therefore, there are problems that admit no algorithmic solution.
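
The finite version of the diagonal argument is easy to reproduce. In the sketch below the 6 x 6 table is made up for illustration (the original figure is not recoverable from the transcript); only its diagonal was chosen to match the one quoted on the slide.

```python
def diagonal(table):
    """Read the entry in row i, column i for every row."""
    return "".join(row[i] for i, row in enumerate(table))


def complement(row):
    """Swap X and _ in every position."""
    return "".join("_" if c == "X" else "X" for c in row)


# Hypothetical table: row i might record which strings machine i accepts.
TABLE = [
    "_XX_X_",
    "XX____",
    "__XXX_",
    "X_X__X",
    "_X_XXX",
    "XXXXX_",
]

diag = diagonal(TABLE)
anti = complement(diag)
```

Because `anti` disagrees with row i in position i, it cannot equal any row of the table, however the rows are chosen.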

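  20. Complexity Classes • P: problems with efficient (polynomial-time) algorithms • NP: a proposed solution can be checked in polynomial time; for many NP problems no efficient algorithm has been found • Any problem in NP (P is a subset) can be transformed to an NP-complete problem in polynomial time • P = NP?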

  21. Satisfiability (SAT) • Boolean Expression: • (x1+~x3+x4)(~x1+~x2+~x4)(~x2+x3)(~x1+x2+x4) • Which combination of variable values (0 or 1) makes the expression true (1)? • 2^n combinations for n variables • Decision problem: Is the formula satisfiable?
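
The exhaustive approach can be sketched for the expression on this slide. The encoding is an assumption made for the example: each clause is a list of integers, where k stands for xk and -k for ~xk, with + read as OR and adjacent clauses as AND.

```python
from itertools import product


def satisfies(clauses, assignment):
    """True if every clause has at least one literal made true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, n):
    """Try all 2^n assignments; return a satisfying one or None."""
    for values in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        if satisfies(clauses, assignment):
            return assignment
    return None


# (x1+~x3+x4)(~x1+~x2+~x4)(~x2+x3)(~x1+x2+x4)
FORMULA = [[1, -3, 4], [-1, -2, -4], [-2, 3], [-1, 2, 4]]
solution = brute_force_sat(FORMULA, 4)
```

Checking one assignment is fast; it is the 2^n loop over assignments that becomes infeasible as n grows.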

  22. NP-complete • NP: Nondeterministic Polynomial Time • 1971, Cook found a way to transform every problem in NP to a single, complete problem (satisfiability) • Transform in polynomial time • An instance of one problem has a solution if and only if the corresponding instance of the other problem does • Solving any instance of any problem in NP is equivalent to solving some instance of SAT

  23. NP-Complete • P and NP are classes of decision problems (answer yes or no) • Optimization problems (minimize or maximize an objective function) • NP-hard • At least as hard as an NP-complete decision problem

  24. What to do? • Solve efficiently or prove NP-complete • Is X in NP? Check a solution in polynomial time • Reduce a known NP-complete problem Y to X: if X could be solved in polynomial time, then so could Y • Solve on specific, easier instances • Exhaustive search • Approximate in polynomial time • Heuristics • Quantum Computer
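
"Check a solution in polynomial time" can be illustrated with the Hamiltonian cycle problem from the earlier Graph Problems slide. This sketch only verifies a proposed cycle (a certificate); it does not find one. Vertices are assumed to be numbered 0 to n-1, and the function name is hypothetical.

```python
def is_hamiltonian_cycle(n, edges, order):
    """Verify that order visits every vertex once and follows edges throughout."""
    if n < 3 or sorted(order) != list(range(n)):
        return False                   # must list each vertex exactly once
    edge_set = {frozenset(e) for e in edges}
    return all(
        frozenset((order[i], order[(i + 1) % n])) in edge_set
        for i in range(n)              # includes the edge back to the start
    )


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

The check does a single pass over the proposed order, whereas no polynomial-time method is known for finding such a cycle in general.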
