Transformational Grammars and PROSITE Patterns. Roland Miezianko CIS 595 - Bioinformatics Prof. Vucetic. Agenda. Transformational Grammars Definition The Chomsky Hierarchy Finite State Automata FMR-1 Triplet Repeat Region Regular Grammar Example PROSITE Patterns in Regular Grammar Form.
Transformational Grammars and PROSITE Patterns • Roland Miezianko • CIS 595 - Bioinformatics • Prof. Vucetic
Agenda • Transformational Grammars • Definition • The Chomsky Hierarchy • Finite State Automata • FMR-1 Triplet Repeat Region • Regular Grammar Example • PROSITE • Patterns in Regular Grammar Form
Assumptions • Biological sequences have so far been treated as one-dimensional strings of independent, uncorrelated symbols. • To understand secondary structure, we need to address the interactions among base pairs.
Secondary Structures • The 3-D folding of proteins and nucleic acids involves extensive physical interactions between residues that are not adjacent in primary sequence. [1] • We require a model of secondary structure that reflects the interactions among base pairs.
Modeling Strings • General theories for modeling strings of symbols have been developed by computational linguists • Chomsky, 1956 and 1959 • He was interested in how a brain or a computer program could algorithmically determine whether a sentence is grammatical
Transformational Grammars • Transformational Grammars consist of: • Symbols • Abstract Nonterminal Symbols • Terminal Symbols • Rewriting Rules (Productions) • A --> B
Transformational Grammars, Example • Two-letter terminal alphabet: {a, b} • Single nonterminal letter: S • Three productions: S -> aS, S -> bS, S -> e (e = special blank terminal symbol) • Example derivation with our simple grammar: S -> aS -> abS -> abbS -> abb
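The derivation on this slide can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides; the names `PRODUCTIONS`, `derive`, and `in_language` are illustrative, and the blank terminal e is represented by the empty string.

```python
# Productions of the example grammar: S -> aS, S -> bS, S -> e.
# The blank terminal e is written as the empty string (an assumption of this sketch).
PRODUCTIONS = {"S": ["aS", "bS", ""]}


def derive(choices):
    """Rewrite the single nonterminal S using the productions picked by
    `choices` (indices into PRODUCTIONS["S"]) and return every step."""
    current = "S"
    steps = [current]
    for index in choices:
        if "S" not in current:
            break  # no nonterminal left, the derivation is finished
        current = current.replace("S", PRODUCTIONS["S"][index], 1)
        steps.append(current)
    return steps


def in_language(text):
    """This grammar generates every string over {a, b}, including the empty one."""
    return all(symbol in "ab" for symbol in text)


# Choosing S -> aS, S -> bS, S -> bS, S -> e reproduces the slide's derivation.
print(" -> ".join(derive([0, 1, 1, 2])))  # S -> aS -> abS -> abbS -> abb
```

Because every production keeps the nonterminal at the right end of the string, this is a regular grammar, which is why a simple left-to-right check such as `in_language` is enough to recognize it.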
Chomsky Hierarchy • Four types of restrictions on a grammar’s productions result in four classes of grammars. • Regular Grammars • Context-Free Grammars • Context-Sensitive Grammars • Unrestricted Grammars
Chomsky Hierarchy • [Figure: nested classes, from outermost to innermost: unrestricted, context-sensitive, context-free, regular]
Automata • Each grammar class has a corresponding abstract computational device, called an automaton, that parses it • Regular grammar: finite state automaton • Context-free grammar: push-down automaton • Context-sensitive grammar: linear bounded automaton • Unrestricted grammar: Turing machine
FMR-1 Triplet Repeat Region • The FMR-1 gene sequence contains the triplet CGG, repeated a number of times • The number of triplets is highly variable between individuals • An increased copy number is associated with a genetic disease (fragile X syndrome)
FMR-1 Triplet Repeat Region • A finite state automaton (FSA) will match any string from the “language” that contains the strings: • GCG CTG • GCG CGG CTG • GCG CGG CGG CTG • GCG CGG CGG CGG CGG … CTG
FMR-1 Triplet Repeat Region • The regular grammar corresponding to our finite state automaton generates any number of copies of CGG between the flanking GCG and CTG triplets
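The automaton for the triplet repeat region can be sketched as an explicit transition table. This is an illustrative Python reconstruction rather than the automaton drawn on the original slide: the state names are made up, spaces between triplets are assumed to be removed, and, following the list of strings above, zero copies of CGG are accepted.

```python
# Transition table of a deterministic finite state automaton.
# State names are hypothetical; they record how much of a triplet has been read.
TRANSITIONS = {
    ("start", "G"): "G", ("G", "C"): "GC", ("GC", "G"): "loop",  # leading GCG
    ("loop", "C"): "C",                      # every following triplet starts with C
    ("C", "G"): "CG", ("CG", "G"): "loop",   # one CGG copy, then back to the loop
    ("C", "T"): "CT", ("CT", "G"): "accept", # closing CTG
}


def accepts(sequence):
    """Return True if `sequence` is GCG, any number of CGG copies, then CTG."""
    state = "start"
    for base in sequence:
        state = TRANSITIONS.get((state, base))
        if state is None:
            return False  # no transition defined: the string is rejected
    return state == "accept"


for candidate in ["GCGCTG", "GCGCGGCTG", "GCGCGGCGGCTG", "GCGCGG", "GCGCTGA"]:
    print(candidate, accepts(candidate))
```

The same language is described by the regular expression `GCG(CGG)*CTG`; the table form is used here only to make the states and transitions of the automaton visible.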
PROSITE Patterns • The PROSITE database is an example of a biological application of regular grammars • Unlike methods that assign scores to alignments, a PROSITE pattern either matches a sequence or does not.
PROSITE Patterns • A pattern consists of a string of pattern elements separated by dashes and terminated by a period • Single letter: that residue only • [ ]: any one of the enclosed letters • { }: any letter except the enclosed letters • x: any residue • x(y): any y residues in a row
PROSITE Patterns • RNP-1 motif: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM].
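Since PROSITE patterns are regular, they translate directly into regular expressions. The sketch below is an illustrative Python translation that covers only the element types listed on the previous slide (single letters, `[ ]`, `{ }`, `x`, and `x(y)`). The function name `prosite_to_regex` and the two test peptides are hypothetical, not taken from PROSITE.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Handles single letters, [..], {..}, x and a repeat count such as x(3).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its core and an optional repeat count "(n)".
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = match.group(1), match.group(2)
        if core in ("x", "X"):
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # anything but the enclosed letters
        # [..] and single letters are already valid regex syntax.
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


RNP1 = "[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]."
rnp1_regex = prosite_to_regex(RNP1)
print(rnp1_regex)  # [RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYM]

# Hypothetical peptides: the first satisfies every position, the second has
# an excluded residue (E) at position three.
print(bool(re.search(rnp1_regex, "KGFGFVTF")))  # True
print(bool(re.search(rnp1_regex, "KGEGFVTF")))  # False
```

As with the patterns themselves, the result is a yes-or-no answer: a sequence either contains a match or it does not, with no alignment score involved.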
Conclusion • Transformational grammars are useful for building acceptors for sequences of varying length and for matching specific multi-sequence regions. • Grammars higher in the Chomsky hierarchy are more difficult to program and apply.
References • [1] Durbin, R. et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998. • [2] Gibson, G. A Primer of Genome Science. Sinauer Associates, Inc. Publishers, 2002. • [3] Mount, D. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, 2001. • [4] PROSITE Database. http://us.expasy.org/prosite/
Transformational Grammars and PROSITE Patterns • Questions and Answers