Explore the foundational concepts and practical applications of splicing in formal languages and biochemical procedures. Learn about splicing systems in DNA computing, circular DNA structures, and molecular manipulation techniques.
Tom Head – splicing systems
• There is a solid theoretical foundation for splicing as an operation on formal languages.
• In biochemical terms, procedures based on splicing may have some advantages: the DNA is used mostly in its double-stranded form, so many problems of unintentional annealing may be avoided.
• The basic model is a single tube containing an initial population of dsDNA, several restriction enzymes, and a ligase. Mathematically, this is represented as a set of strings (the initial language), a set of cutting operations, and a set of pasting operations.
• The model has been proved to be computationally equivalent to a universal Turing machine.
Tom Head – splicing systems
These techniques are common in the microbiologist's lab and can be used to program a molecular computer. DNA can be:
• synthesized: desired strands can be created
• separated: strands can be sorted and separated by length
• merged: two test tubes of DNA are poured into one to perform a union
• extracted: strands containing a given pattern are pulled out
• melted/annealed: two ssDNA molecules with complementary sequences are broken apart or bonded
• amplified: PCR is used to make copies of DNA strands
• cut: DNA is cut with restriction enzymes
• rejoined: DNA strands with 'sticky ends' are rejoined
• detected: the presence or absence of DNA is confirmed
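A few of these operations can be mimicked on strings to see how they compose into a program. The following is only a toy sketch: it assumes a tube is a multiset of strands (a `Counter` of strings), and the function names `merge`, `extract`, `separate_by_length` and `detect` are illustrative, not taken from the slides. No chemistry (annealing, error rates, yields) is modeled.

```python
from collections import Counter

# Assumption: a "tube" is a multiset of single-stranded sequences.
Tube = Counter


def merge(t1: Tube, t2: Tube) -> Tube:
    """Pour two tubes together (multiset union with added counts)."""
    return t1 + t2


def extract(tube: Tube, pattern: str) -> Tube:
    """Keep only the strands that contain the given pattern."""
    return Counter({s: n for s, n in tube.items() if pattern in s})


def separate_by_length(tube: Tube, length: int) -> Tube:
    """Keep only the strands of exactly the given length."""
    return Counter({s: n for s, n in tube.items() if len(s) == length})


def detect(tube: Tube) -> bool:
    """Report whether any DNA is present in the tube."""
    return sum(tube.values()) > 0


tube = merge(Counter(["ACGT", "GG"]), Counter(["ACGT"]))
print(extract(tube, "CG"))            # strands containing "CG"
print(detect(extract(tube, "TTT")))   # no strand contains "TTT"
```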
Tom Head – splicing systems
• The initial set (finite or infinite) consists of double-stranded DNA molecules.
• Specific classes of enzymatic activities are considered: those of restriction enzymes.
• Recombinant behavior is modeled, and the associated sets are analyzed, by a new formalism called splicing systems.
• Attention is focused on the effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and reassociated to produce further molecules.
Splicing systems: circular DNA and splicing systems
DNA molecules exist not only in linear forms but also in circular forms.
[Figure: linear splicing and circular splicing.]
Splicing in nature
• Restriction sites: G|A and AT|C
• Cut: …ATTG|ACCC… gives …ATTG and ACCC…; …CAAT|CAGG… gives …CAAT and CAGG…
• Ligase: the fragments recombine into …ATTGCAGG… and …CAATACCC…
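Splicing in DNA computing
• $V$: alphabet
• Splicing rule: $r = (u_1, u_2;\ u_3, u_4)$ with $u_1, u_2, u_3, u_4 \in V^*$
• $(x, y) \vdash_r (z, w)$ for $x, y, z, w \in V^*$ if and only if
  $x = x_1 u_1 u_2 x_2$, $y = y_1 u_3 u_4 y_2$,
  $z = x_1 u_1 u_4 y_2$, $w = y_1 u_3 u_2 x_2$,
  for some $x_1, x_2, y_1, y_2 \in V^*$.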
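The splicing relation above can be sketched directly on Python strings. This is a minimal illustration, not an implementation from the slides: the names `splice` and `occurrences` and the tuple encoding of a rule as `(u1, u2, u3, u4)` are assumptions made here. The example reuses the strands and sites from the "Splicing in nature" slide.

```python
from typing import Iterator, List, Tuple

# Assumption: a rule r = (u1, u2; u3, u4) is encoded as a 4-tuple of strings.
Rule = Tuple[str, str, str, str]


def occurrences(s: str, pat: str) -> List[int]:
    """All start indices at which pat occurs in s (overlaps included)."""
    return [i for i in range(len(s) - len(pat) + 1) if s.startswith(pat, i)]


def splice(x: str, y: str, rule: Rule) -> Iterator[Tuple[str, str]]:
    """Yield every pair (z, w) with (x, y) |-_r (z, w).

    x = x1 u1 u2 x2 and y = y1 u3 u4 y2 give
    z = x1 u1 u4 y2 and w = y1 u3 u2 x2.
    """
    u1, u2, u3, u4 = rule
    for i in occurrences(x, u1 + u2):
        x1, x2 = x[:i], x[i + len(u1 + u2):]
        for j in occurrences(y, u3 + u4):
            y1, y2 = y[:j], y[j + len(u3 + u4):]
            yield x1 + u1 + u4 + y2, y1 + u3 + u2 + x2


# Sites G|A and AT|C, as on the "Splicing in nature" slide.
print(list(splice("ATTGACCC", "CAATCAGG", ("G", "A", "AT", "C"))))
# → [('ATTGCAGG', 'CAATACCC')]
```

A rule may match at several positions, so `splice` is a generator over all results rather than a function returning a single pair.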
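Extended H-system
$\gamma = (V, T, A, R)$
• $V$: alphabet
• $T \subseteq V$: terminal alphabet
• $A \subseteq V^*$: set of strings
• $R$: splicing rules
$L(\gamma) = \sigma^*(A) \cap T^*$
If $A, R \in FIN$, then $L(\gamma) \in REG$.
… with permitting context: each rule $(u_1, u_2;\ u_3, u_4) \in R$ carries contexts $C_1, C_2 \subseteq V^*$, with $C_1$ attached to the first string ($u_1 u_2$) and $C_2$ to the second ($u_3 u_4$).
If $A, R \in FIN$, then $L(\gamma) \in RE$.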
Rotation
[Figure: rotation of a string by splicing with permitting contexts. The diagram starts from $h_1 a b s_1 c t_1$, arrives at $h_1 b s_1 c A t_1$, and passes through four numbered splicing steps that use the auxiliary strings $h_A A t_A$, $h_1 a t_1$ and $h_A a t_A$ with the contexts $\{s_1\}$, $\{s_1, t_A\}$, $\{h_A, s_1\}$ and $\{h_1, s_1\}$.]
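Păun's linear splicing operation (1996)
• Rule: $r = u_1 | u_2 \,\$\, u_3 | u_4$
• $(x u_1 u_2 y,\ w u_3 u_4 z) \vdash_r (x u_1 u_4 z,\ w u_3 u_2 y)$
[Figure: three stages. Pattern recognition locates the sites $u_1 u_2$ in $x u_1 u_2 y$ and $u_3 u_4$ in $w u_3 u_4 z$; cut separates $x u_1$ from $u_2 y$ and $w u_3$ from $u_4 z$; paste yields $x u_1 u_4 z$ and $w u_3 u_2 y$.]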
Circular splicing
[Figure: circular splicing. Labels: restriction enzyme 1, restriction enzyme 2, ligase enzymes.]
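Circular languages
• Conjugacy relation on $A^*$: for $w, w' \in A^*$, $w \sim w' \iff w = xy,\ w' = yx$ for some $x, y \in A^*$.
• Example: abaa, baaa, aaab, aaba are conjugates.
• $A^{\circ} = A^*/{\sim}$ is the set of all circular words ${}^{\circ}w = [w]_{\sim}$, $w \in A^*$.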
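The conjugacy class of a word is just its set of rotations, which is easy to compute. In this small sketch the helper names `rotations`, `are_conjugate` and `canonical` are illustrative assumptions. Representing a circular word by its lexicographically least rotation is also a choice made here, not something the slides prescribe.

```python
from typing import Set


def rotations(w: str) -> Set[str]:
    """All conjugates yx of w = xy; the empty word is its own only conjugate."""
    return {w[i:] + w[:i] for i in range(len(w))} or {w}


def are_conjugate(w: str, v: str) -> bool:
    """True when w ~ v, i.e. v is a rotation of w."""
    return v in rotations(w)


def canonical(w: str) -> str:
    """Pick one representative of the circular word: the least rotation."""
    return min(rotations(w))


print(sorted(rotations("abaa")))   # → ['aaab', 'aaba', 'abaa', 'baaa']
print(canonical("baaa"))           # → aaab
```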
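Circular languages
• A circular language $C \subseteq A^{\circ}$ is a set of equivalence classes.
• $Cir(L) = \{ {}^{\circ}w \mid w \in L \}$ is the circularization of $L \subseteq A^*$.
• Any $L \subseteq A^*$ with $Cir(L) = C$ is a linearization of $C$.
• $Lin(C) = \{ w \in A^* \mid {}^{\circ}w \in C \}$ is the full linearization of $C$.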
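For finite languages, circularization and full linearization can be computed explicitly. In the sketch below a circular word is stored as its lexicographically least rotation; that encoding and the function names `cir` and `lin` are assumptions made for illustration.

```python
from typing import Set


def rotations(w: str) -> Set[str]:
    """All rotations of w; the empty word has itself as its only rotation."""
    return {w[i:] + w[:i] for i in range(len(w))} or {w}


def cir(language: Set[str]) -> Set[str]:
    """Cir(L): one canonical representative per circular word of L."""
    return {min(rotations(w)) for w in language}


def lin(circular_language: Set[str]) -> Set[str]:
    """Lin(C): every linear word whose circular word belongs to C."""
    return {r for w in circular_language for r in rotations(w)}


L = {"ab", "ba", "aab"}
C = cir(L)
print(sorted(C))        # → ['aab', 'ab']
print(sorted(lin(C)))   # → ['aab', 'ab', 'aba', 'ba', 'baa']
```

Here `L` is one linearization of `C`, while `lin(C)` is the largest one, and `cir(lin(C)) == C` holds by construction.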
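Circular languages
• Definition: $FA^{\circ} = \{ C \subseteq A^{\circ} \mid \exists L \subseteq A^*,\ Cir(L) = C,\ L \in FA \}$, where $FA$ is a family of languages in the Chomsky hierarchy.
• Theorem [Head, Păun, Pixton]: $C \in Reg^{\circ} \iff Lin(C) \in Reg$.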
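Circular splicing systems ($A$ = finite alphabet, $I \subseteq A^{\circ}$ initial language)
Păun's definition: $SC_{PA} = (A, I, R)$, with rules $R \subseteq A^* | A^* \,\$\, A^* | A^*$.
Given ${}^{\circ}(h u_1 u_2), {}^{\circ}(k u_3 u_4) \in A^{\circ}$ and a rule $r = u_1 | u_2 \,\$\, u_3 | u_4 \in R$, cutting gives the linear words $u_2 h u_1$ and $u_4 k u_3$, and pasting them gives the circular word ${}^{\circ}(u_2 h u_1 u_4 k u_3)$.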
Circular splicing systems Definition A circular splicing language C(SCPA) (i.e. a circular language generated by a splicing system SCPA) is the smallest circular language containing I and closed under the application of the rules in R.
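Păun's circular splicing step and the closure in this definition can be sketched for small cases. The code below is an illustration under stated assumptions, not the authors' construction: circular words are stored as their least rotation, rules are 4-tuples `(u1, u2, u3, u4)`, and the names `splice_circular` and `bounded_closure` are hypothetical. Because each splicing step produces a word whose length is the sum of the two input lengths, cutting the closure off at `max_len` loses no word of length at most `max_len`. The example system $I = \{{}^{\circ}aa, {}^{\circ}1\}$, $R = \{aa | 1 \,\$\, 1 | aa\}$ is the one listed on the "Circular finite splicing languages" slide, with the empty word 1 written as `""`.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4) for u1|u2 $ u3|u4


def rotations(w: str) -> List[str]:
    """All rotations of w; the empty word has itself as its only rotation."""
    return [w[i:] + w[:i] for i in range(len(w))] or [w]


def canonical(w: str) -> str:
    """Least rotation, used as the representative of the circular word."""
    return min(rotations(w))


def splice_circular(w: str, v: str, rule: Rule) -> Set[str]:
    """All circular words obtained from (w, v) by one application of rule."""
    u1, u2, u3, u4 = rule
    results = set()
    for r in set(rotations(w)):
        if not r.endswith(u1 + u2):
            continue
        h = r[:len(r) - len(u1 + u2)]      # r = h u1 u2
        for s in set(rotations(v)):
            if not s.endswith(u3 + u4):
                continue
            k = s[:len(s) - len(u3 + u4)]  # s = k u3 u4
            results.add(canonical(u2 + h + u1 + u4 + k + u3))
    return results


def bounded_closure(initial: Iterable[str], rules: Iterable[Rule], max_len: int) -> Set[str]:
    """Words of length <= max_len in the smallest language containing
    `initial` and closed under `rules`."""
    rules = list(rules)
    lang = {canonical(w) for w in initial if len(w) <= max_len}
    while True:
        new = set()
        for w, v, rule in product(lang, lang, rules):
            new |= {z for z in splice_circular(w, v, rule) if len(z) <= max_len}
        if new <= lang:
            return lang
        lang |= new


print(sorted(bounded_closure({"aa", ""}, [("aa", "", "", "aa")], 6)))
# → ['', 'aa', 'aaaa', 'aaaaaa']
```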
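Other splicing systems ($A$ = finite alphabet, $I \subseteq A^{\circ}$ initial language)
Head's definition: $SC_H = (A, I, T)$, with $T \subseteq A^* \times A^* \times A^*$ a set of triples.
Given ${}^{\circ}(hpxq), {}^{\circ}(kuxv) \in A^{\circ}$ and $(p, x, q), (u, x, v) \in T$, cutting gives the linear words $qhpx$ and $vkux$, and pasting gives ${}^{\circ}(hpx\,vkux\,q)$.
Pixton's definition: $SC_{PI} = (A, I, R)$, with $R \subseteq A^* \times A^* \times A^*$ a set of rules.
Given ${}^{\circ}(h\alpha), {}^{\circ}(h'\alpha') \in A^{\circ}$ and $(\alpha, \alpha'; \beta), (\alpha', \alpha; \beta') \in R$, the result is ${}^{\circ}(h \beta h' \beta')$.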
Problem
Characterize $C(Reg, Fin)$ and $FA^{\circ} \cap C(Fin, Fin)$, where $C(Fin, Fin)$ is the class of circular languages $C = C(SC_{PA})$ generated by $SC_{PA}$ with $I$ and $R$ both finite sets.
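Problem
• Theorem [Păun96]: Let $F \in \{Reg^{\circ}, CF^{\circ}, RE^{\circ}\}$ and let $R$ satisfy additional hypotheses (symmetry, reflexivity, self-splicing). Then $C(F, Fin) \subseteq F$.
• Theorem [Pixton95-96]: Let $F \in \{Reg^{\circ}, CF^{\circ}, RE^{\circ}\}$ and let $R$ be finite and satisfy additional hypotheses (symmetry, reflexivity). Then $C(Reg^{\circ}, Fin) \subseteq Reg^{\circ}$ and $C(F, Reg) \subseteq F$.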
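Circular finite splicing languages
[Figure: diagram relating the classes $CS^{\circ}$, $CF^{\circ}$, $Reg^{\circ}$ and $C(Fin, Fin)$, with the example languages ${}^{\circ}((aa)^*b)$, ${}^{\circ}(a^n b^n)$ and ${}^{\circ}(aa)^*$.]
• $I = {}^{\circ}aa \cup {}^{\circ}1$, $R = \{aa | 1 \,\$\, 1 | aa\}$ generates ${}^{\circ}(aa)^*$.
• $I = {}^{\circ}ab \cup {}^{\circ}1$, $R = \{a | b \,\$\, b | a\}$ generates ${}^{\circ}(a^n b^n)$.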
Finite automata for circular languages
Source: J. Kari and L. Kari, Context-free Recombinations, in: Words, Sequences, Languages: Where Computer Science, Biology and Linguistics Meet, C. Martin-Vide, V. Mitrana (Eds.), Kluwer, the Netherlands.
Finite automata for circular languages
• Definition (K-acceptance): Given a finite automaton $A$, the circular language K-accepted by $A$, $L(A)^{\circ}_K$, is the set of all words ${}^{\circ}w$ such that $A$ has a cycle labeled by $w$.
• The circular/linear language accepted by a finite automaton $A$ is defined as $L(A) \cup L(A)^{\circ}_K$, where $L(A)$ is the linear language accepted by $A$, defined in the usual way.
• Definition: A circular/linear language $L \subseteq \Sigma^* \cup \Sigma^{\circ}$ is regular if there is a finite automaton $A$ that accepts the circular and linear parts of $L$, i.e. that accepts $L \cap \Sigma^*$ and $L \cap \Sigma^{\circ}$.
P-acceptance
The following definition is equivalent to a definition given by Pixton: the circular language accepted by a finite automaton is the set of all words that label a loop containing at least one initial and one final state.
Definition: Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^{\circ}_P$, is the set of all words ${}^{\circ}w$ such that $A$ has a cycle labeled by $w$ that contains at least one final state.
H-acceptance
The circular languages accepted by finite automata under the following definition coincide with the regular circular languages introduced by Head.
Definition: Given a finite automaton $A$, the circular language accepted by $A$, $L(A)^{\circ}_H$, is the set of all words ${}^{\circ}w$ such that $w = uv$ and $vu \in L(A)$.
Pixton has shown that if, in addition, we assume that the family of languages is closed under repetition (i.e., $w^n$ is in the language whenever $w$ is), then H-acceptance and P-acceptance are equivalent.
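H-acceptance reduces to ordinary acceptance of some rotation, which makes it easy to sketch. The code below assumes a deterministic automaton given as a partial transition dictionary; the function names and the example automaton for $(aa)^*b$ are illustrative choices made here, not taken from the slides.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]  # (state, symbol) -> next state (partial)


def dfa_accepts(delta: Delta, start: str, finals: Set[str], word: str) -> bool:
    """Ordinary (linear) acceptance by a deterministic finite automaton."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False          # missing transition: reject
        state = delta[(state, symbol)]
    return state in finals


def h_accepts(delta: Delta, start: str, finals: Set[str], w: str) -> bool:
    """H-acceptance of the circular word of w: some rotation vu of w = uv
    is accepted in the usual linear sense."""
    return any(
        dfa_accepts(delta, start, finals, w[i:] + w[:i])
        for i in range(max(len(w), 1))
    )


# Hypothetical example automaton for the linear language (aa)*b.
delta = {("q0", "a"): "q1", ("q1", "a"): "q0", ("q0", "b"): "qf"}

print(h_accepts(delta, "q0", {"qf"}, "aba"))  # rotation "aab" is accepted → True
print(h_accepts(delta, "q0", {"qf"}, "ab"))   # neither "ab" nor "ba" is accepted → False
```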
K-acceptance Advantages of K-acceptance The same automaton accepts both the linear and circular components of the language
Sources
• T. Head, Circular Suggestions for DNA Computing, in: Pattern Formation in Biology, Vision and Dynamics, Eds. A. Carbone, M. Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335.
• J. Kari, A Cryptosystem Based on Propositional Logic, in: Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.
• Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp. 170-176.
C-D-E-L model
Introducing the CUT-DELETE-EXPAND-LIGATE (C-D-E-L) model
• It combines features of Divide-Delete-Drop (D-D-D) (Leiden) and CUT-EXPAND-LIGATE (C-E-L) (Binghamton).
• This enables us to get an aqueous solution to #3SAT, the counting version of 3SAT, which is known to be in IP.
#3SAT is defined as follows:
• Instance: a propositional formula $F = C_1 \wedge C_2 \wedge \dots \wedge C_m$, where each clause $C_i$, $i = 1, 2, \dots, m$, is of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$ and the $l_{ij}$, $j = 1, 2, 3$, are literals over the set of variables $\{x_1, x_2, \dots, x_n\}$.
• Question: What is the number of truth assignments that satisfy $F$?
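To make the counting question concrete, here is a brute-force reference counter. It is not the DNA algorithm, only a sketch of what the problem asks. The integer encoding of literals (`i` for $x_i$, `-i` for its negation) and the function name `count_satisfying` are assumptions made here.

```python
from itertools import product
from typing import List, Tuple

# Assumption: literal +i stands for x_i and -i for its negation (i >= 1).
Clause = Tuple[int, int, int]


def count_satisfying(clauses: List[Clause], n: int) -> int:
    """Number of truth assignments to x_1..x_n that satisfy every clause."""
    count = 0
    for bits in product([False, True], repeat=n):
        # A literal l is true when the value of x_|l| equals the sign of l.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            count += 1
    return count


# F = (x1 v x2 v x3) ^ (~x1 v ~x2 v ~x3): all assignments except
# all-false and all-true satisfy F.
print(count_satisfying([(1, 2, 3), (-1, -2, -3)], 3))  # → 6
```

The loop visits all $2^n$ assignments, which is the exponential cost that the massively parallel DNA procedure is meant to sidestep.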
Data register molecule
• Standard double-stranded DNA cloning plasmids are commercially available.
• A plasmid is a circular molecule of approximately 3 kb.
• It contains a sub-segment, the MCS (multiple cloning site), of approximately 175 base pairs that can be removed using a pair of restriction enzyme sites that flank the segment.
• The MCS contains pairwise disjoint sites at which restriction enzymes act such that each produces a 5' overhang.
C-D-E-L model
• In C-D-E-L, a segment of the plasmid used is of the form $\dots c_1 s_1 c_1 \dots c_2 s_2 c_2 \dots c_n s_n c_n \dots$, where the $c_i$ are called sites, chosen such that no other subsequence of the plasmid matches this sequence, and the $s_i$ are called stations, $i = 1, \dots, n$.
• In D-D-D, the lengths of the stations are required to be the same.
• In C-D-E-L, however, the lengths of the stations are all different, which is fundamental in solving #3SAT.
• The bio-molecular operations used in C-D-E-L are similar to the operations in C-E-L.
Design
• $x_1, \dots, x_n$: the variables in $F$; $\bar{x}_1, \dots, \bar{x}_n$: their negations
• $s_i$: station associated with $x_i$; $\bar{s}_i$: station associated with $\bar{x}_i$
• $c_i$: site associated with station $s_i$; $\bar{c}_i$: site associated with station $\bar{s}_i$
• $v_i$: length of the station associated with $x_i$, $i = 1, \dots, n$
• $v_{n+j}$: length of the station associated with the literal $\bar{x}_j$, $j = 1, \dots, n$
Choose the stations in such a way that the sequence $[v_1, \dots, v_{2n}]$ satisfies
$\sum_{i=1}^{k} v_i < v_{k+1}, \quad k = 1, \dots, 2n-1,$
i.e. it is a super-increasing (easy) knapsack sequence. From a sum, the sub-sequence that produced it can be recovered efficiently.
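The recovery step works greedily from the largest element down, because in a super-increasing sequence each element exceeds the sum of all earlier ones. The following sketch uses hypothetical names (`is_superincreasing`, `recover_subset`) and small illustrative lengths rather than real station lengths. The check also requires the first element to be positive, which holds for any physical length.

```python
from typing import List, Sequence


def is_superincreasing(v: Sequence[int]) -> bool:
    """True when every element is strictly larger than the sum of all
    earlier elements (and the first element is positive)."""
    total = 0
    for x in v:
        if x <= total:
            return False
        total += x
    return True


def recover_subset(total: int, v: Sequence[int]) -> List[int]:
    """Indices of the unique sub-sequence of v that sums to `total`.

    Greedy scan from the largest element; raises ValueError when no
    sub-sequence of v has that sum.
    """
    chosen = []
    for i in range(len(v) - 1, -1, -1):
        if v[i] <= total:
            chosen.append(i)
            total -= v[i]
    if total != 0:
        raise ValueError("sum is not reachable from this sequence")
    return sorted(chosen)


lengths = [1, 2, 4, 8]                 # illustrative values only
print(is_superincreasing(lengths))     # → True
print(recover_subset(11, lengths))     # 11 = 1 + 2 + 8 → [0, 1, 3]
```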
Solution
• The solution in Cn is analyzed by gel separation.
• If more than one solution is present, they will be of different lengths and will therefore form separate bands.
• By counting the number of bands, we count the number of satisfying assignments.
• Furthermore, the exact assignment can be read from the length of each satisfying assignment.
• This is possible because the station lengths come from an easy knapsack sequence: any subsequence of an easy knapsack sequence has a sum different from the sums of all other subsequences.
Solution
Thus the solution to #3SAT, viz. finding the number of satisfying assignments, is obtained effectively. Moreover, being able to read the truth assignments is a great advantage in breaking the cryptosystem based on propositional logic.
Advantage
Advantage over the previous method of attack
• In the cryptanalytic attack proposed earlier, which modified D-D-D, the DNA algorithm had to be executed for each bit in the crypto-text.
• In the present method, using C-D-E-L (combining features of D-D-D and C-E-L), we apply 3-SAT on P and read any satisfying assignment from the final solution.
• This gives an equivalent public key, which amounts to breaking the cryptosystem.
Splicing systems so far • H-system • Lipton[94-95a-95b] Formalization and generalization of Adleman’s approach to other NP-complete problems. • Ex H-system • Circular H-system • Sticker system • P-system
Splicing systems so far For computational strength • Turing Equivalence Expansion • Finiteness & Regularity • More Operator Formalization To confirm homogeneity • HPP solving & AGL
Operations of DNA molecules • Separating and fusing DNA strands • Lengthening of DNA • Shortening DNA • Cutting DNA • Multiplying DNA
Separating and fusing DNA strands
Denaturation
• separates the single strands without breaking them
• works because hydrogen bonds are weaker than phosphodiester bonds
• is done by heating the DNA (85 °C to 90 °C)
Renaturation
• slowly cooling down
• annealing of matching, separated strands
Enzymes: machinery for nucleotide manipulation
• Enzymes are proteins that catalyze chemical reactions.
• Enzymes are very specific.
• Enzymes speed up chemical reactions extremely efficiently (speedup: $10^{12}$).
• Nature has created a multitude of enzymes that are useful in processing DNA.
Lengthening DNA
• DNA polymerase enzymes add nucleotides to a DNA molecule.
Requirements
• single-stranded template
• primer, bonded to the template
• 3´-hydroxyl end available for extension
Note: Terminal transferase needs no primer.