DNA Structure, Notation, Operations
Vincenzo Manca, Dipartimento di Informatica, Università di Verona
10 Years of Molecular Computing
• 1994 Adleman’s Experiment *
• 1995 Lipton’s Model *
• 1996 Int. Conf. on Math. Linguistics (Marcus)
• 1997 Mangalia (Paun, Head)
• 1998 MFCS Brno (Molecular Computing)
• 1999 (Paun’s WMC)
• 2000 DNA6 Leiden *
• 2001 DNA7 Tampa (FL) : 3-SAT
• 2002 DNA8 Sapporo : DNA Duplication
• 2004 DNA10 Milan : XPCR Extraction
• 2005 DNA11 Ontario : XPCR Recombination
DNA Computing Motto
• Problem: Data and Requirements
• Algorithm: Solutions
• Encode data by DNA strands
• Encode algorithms by biotech procedures
• Decode final strands as solutions
A general schema of a combinatorial problem
• A set of requirements for “assignments”, that is, 0/1 sequences of some length n
• The space of possible solutions has 2^n elements, but only some of them satisfy the requirements
• Encode assignments by DNA strands
• Encode requirements as biotech protocols that filter the strands encoding the true solutions
[Diagram: Space Generation (in linear time) produces the Possible Solutions; Solution Extraction (in linear time!) yields the True Solutions]
New Trends in DNAC
• DNA Self Assembly (Seeman, Winfree, …)
• DNA Automata (Shapiro)
• DNA Algorithms ==> new biotech protocols
A change of perspective
[Diagram relating Biotech Protocols, DNA Computing, Computing, and DNA Algorithms]
In the search for implementing algorithms on DNA, general algorithmic principles are discovered in fundamental biomolecular processes.
Nucleotides
• Mass of a nucleotide: ~330 Dalton
• 1 Dalton = 1.66 × 10^-24 g (1 g of H ≈ 6.02 × 10^23 atoms)
• Distance 1’ to 1’ ≈ 1 nm
[Diagram: nucleotides drawn as sugars with carbons 1’ to 5’, a base (B) at the 1’ carbon and a CH2OH group at 5’; a phosphate (P) links the sugars, and H + OH are released as H2O]
A few grams of DNA can hold as much information as all the electronic data stored in the world.
Strings
• Strings over an alphabet are sequences of symbols of the alphabet: abbabbba
• On strings an associative concatenation operation is defined: α(βγ) = (αβ)γ, and λα = αλ = α for the empty string λ
• A language L is a set of strings
DNA Sequences are Mobile Double Strings
B = {A, T, C, G}
B* = strings over B
[i,j]   ||
s is an α-strand, or s : α, or type(s) = α
α : n, or mult(α) = n
Complementation: α ↦ α^c (involutive)
Reverse: rev (involutive)
Mirror: mir (involutive), mir(α) = rev(α^c)
Reverse and Complementation commute
Hybridization: ||
Pairing: ] [
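The three operations above can be sketched on plain Python strings. This is a minimal illustration, assuming single strands are written as strings over {A, T, C, G}; the function names `comp`, `rev` and `mir` simply mirror the slide's notation and are not taken from any library.

```python
# Watson-Crick complement of each base (A<->T, C<->G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def comp(s: str) -> str:
    """Complementation: replace every base by its complement."""
    return "".join(COMPLEMENT[b] for b in s)

def rev(s: str) -> str:
    """Reverse: read the string backwards."""
    return s[::-1]

def mir(s: str) -> str:
    """Mirror: reverse of the complement, mir(a) = rev(comp(a))."""
    return rev(comp(s))

alpha = "ATTCG"  # hypothetical example strand

# Each operation is involutive: applying it twice gives the input back.
assert comp(comp(alpha)) == alpha
assert rev(rev(alpha)) == alpha
assert mir(mir(alpha)) == alpha

# Reverse and complementation commute.
assert rev(comp(alpha)) == comp(rev(alpha))

print(mir(alpha))  # CGAAT
```

The mirror of a strand is the strand it hybridizes with when both are read in the 5' to 3' direction, which is why `mir` rather than `comp` alone appears in the hybridization rule.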
B = {A, T, C, G}
BB* = strings over B : fraction notation
Axiom : = rev() rev()
ext Overlap --x-- overlapping concatenation Z
-> up   <- down   -> ->/ = ->/
The marvelous form: Bilinearity, Complementarity, Antiparallelism
[Diagram with the 5’ and 3’ ends marked]
Hybridization : α || mir(α)
] [ <==> , mir() ] [ <==> ] [ for some
Pairing : ] [ ==> / rev()
Notation
/ = = ->
α / mir(α) = <α>
/ = rev() = <-
===> <α> = <mir(α)>
BB* is the set of DNA strings , BB* B*
A pool P of DNA molecules is a multiset of strands
i) Set of strands typed by strings: P = {s1 : α1, s2 : α2, …}
ii) Set of strings with multiplicities: P = {α1 : n1, α2 : n2, …}
multP(α1) = n1, multP(α2) = n2
s ∈ P, α ∈ P
Types of DNA Pools are Languages of BB*
Type(T) = { α ∈ BB* | s : α, s ∈ T }
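The multiset view of a pool maps naturally onto a counter. The sketch below is only an illustration under simplifying assumptions: types are plain single-strand strings rather than double strings, and the names `mult`, `type_of` and `mix` are hypothetical helpers chosen to echo the slide's notation.

```python
from collections import Counter

# Hypothetical example pool: each key is a string type,
# each value is the number of strands of that type.
pool = Counter({"ATTCG": 3, "GGCA": 1})

def mult(pool: Counter, alpha: str) -> int:
    """Multiplicity of the type alpha in the pool (0 if absent)."""
    return pool[alpha]

def type_of(pool: Counter) -> set:
    """Type(T): the language of all types with at least one strand in T."""
    return {alpha for alpha, n in pool.items() if n > 0}

def mix(t1: Counter, t2: Counter) -> Counter:
    """Mix two tubes: multiset union, multiplicities add up."""
    return t1 + t2

other = Counter({"GGCA": 2, "TTAA": 5})
merged = mix(pool, other)

assert mult(merged, "GGCA") == 3
assert type_of(merged) == type_of(pool) | type_of(other)
```

Seen this way, an operation such as Mix acts on pools, while its effect on types is an ordinary language operation (here, set union).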
Test Tube Operations in DNAC
• Denature (Melting)
• Renature (Hybridization, Annealing)
• Mix
• Split
• Fish (by Affinity)
• Remove
• Length
• Separate (Gel Electrophoresis)
• Ligate (Ligase)
• Extend (Polymerase)
• Synthesize (Oligos)
• Infix
DNA Ligase
Ligase joins a 5' phosphate to a 3' hydroxyl
More Complex Operations
• Amplification (PCR)
• Sequencing
• Restriction (Restriction Enzymes)
• Cloning (Plasmid Transfection)
PCR with 3’ sticky end
[Diagram: primers a and b with h(a) and h(b); long strands grow linearly, short strands grow exponentially]
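The linear versus exponential growth named on this slide can be checked by counting strands. The sketch below models an idealized standard PCR (not specifically the sticky-end variant of the slide), assuming one double-stranded template, perfect efficiency, and that every strand is copied once per cycle; the function name `pcr_counts` is hypothetical.

```python
def pcr_counts(cycles: int) -> tuple:
    """Count single strands after a number of idealized PCR cycles.

    originals: the two template strands (never increase)
    long:      copies bounded by a primer at one end only
    short:     copies bounded by primers at both ends (the target)
    """
    originals, long_, short = 2, 0, 0
    for _ in range(cycles):
        new_long = originals       # each original yields one long copy
        new_short = long_ + short  # long and short strands yield short copies
        long_ += new_long
        short += new_short
    return originals, long_, short

for n in (1, 2, 3, 4, 10):
    print(n, pcr_counts(n))
```

Under these assumptions the long strands grow as 2n while the short strands grow as 2^(n+1) - 2n - 2, so after a few cycles the pool is dominated by the short target strands.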
PCR Lemma
Given a pool P of type {α} and two primers β and γ that hybridize with the two single strands of α, respectively (pairing ] [). If the extensions e1 and e2 of the two primers along their respective single strands overlap, then an exponential amplification occurs of strands having the blunt form < e1 Z ext e2 >, which appear within the first two steps.
[Diagram: an Operation takes a tube T of type L to a tube T’ of type L’]
Test Tube Operations, Mathematically
• Type(T) = L means that the types of the strands of T constitute the language L
• Given some test tubes with certain types as arguments, an operation provides as results test tubes with other types
DNA Test Tube Machine
Register Machines where:
- Registers are Test Tubes (multisets of strands instead of numbers)
- Operations are DNA Test Tube operations (instead of arithmetic operations)
Adleman’s Problem
Given a graph (of seven nodes), find, if there are any, the paths between two given nodes (0, 6) that pass exactly once through every node (Hamiltonian paths).
Adleman-Lipton’s Extract Model in Combinatorial Problems
• The Generation of all possible solutions in linear time
• The Extraction of true solutions in linear time
Extraction is performed in a number of sub-steps, each of which selects all the strands that include a sub-strand of a given type.
Adleman’s Encoding
[Diagram: node and arc strands, labelled Bj, Ai, Bi, Bj’, Ai’]
Node i = αi βi
Arc ij = mir(βi αj)
i, j = 1, …, 7
|αi| = |βi| = 10
Adleman’s Algorithm
Generation of Hamiltonian paths from v1 to v7
Generate paths of G (hybridization/ligation)
Perform PCR with primers 0, mir(6)
Separate paths of length 140 (7 x 20)
for J := 1 to 7 do
  Select strands where the encoding of node J occurs
output remaining strands
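The generate-and-filter structure of the steps above can be sketched in software. This is only an illustration under stated assumptions: the graph below is a hypothetical seven-node example (not Adleman's original graph), nodes are numbered 0 to 6, and exhaustive enumeration of walks stands in for the random hybridization and ligation that happen in the test tube.

```python
def hamiltonian_paths(n, edges, start, end):
    """Software analogue of the generate-and-filter protocol."""
    succ = {v: [w for (u, w) in edges if u == v] for v in range(n)}

    # Generate + length separation: all walks visiting exactly n nodes
    # (stands in for ligation followed by selecting length n x 20).
    walks = [[v] for v in range(n)]
    for _ in range(n - 1):
        walks = [w + [x] for w in walks for x in succ[w[-1]]]

    # PCR step: keep only walks that begin at `start` and finish at `end`.
    walks = [w for w in walks if w[0] == start and w[-1] == end]

    # Selection steps: for every node, keep the walks that contain it.
    for v in range(n):
        walks = [w for w in walks if v in w]
    return walks

# Hypothetical directed graph on nodes 0..6.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (3, 2), (2, 1), (1, 4)]

for path in hamiltonian_paths(7, EDGES, 0, 6):
    print(path)
```

The number of filtering steps is linear in the number of nodes, as in the laboratory protocol; the exponential cost is hidden in the size of the generated set, which the test tube handles by massive parallelism.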
Mix and Split Method
Generation of the solution space of N variables
Merge X1 and ¬X1 in a tube T
Split T into A and B
For J := 2 To N
  Extend strands of A with XJ
  Extend strands of B with ¬XJ
  Merge A and B into T
  Split T into A and B
Merge A and B
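A small simulation shows why a linear number of mix and split steps yields the whole solution space. The sketch assumes an idealized split that divides the copies of every strand type evenly between the two tubes, and it writes a strand as a string of bits ("1" for XJ, "0" for its negation); the helper names are hypothetical.

```python
from collections import Counter

def split(tube: Counter):
    """Idealized split: each type's copies are divided between two tubes."""
    a = Counter({s: m // 2 for s, m in tube.items()})
    b = Counter({s: m - m // 2 for s, m in tube.items()})
    return a, b

def extend(tube: Counter, bit: str) -> Counter:
    """Append the encoding of one more variable value to every strand."""
    return Counter({s + bit: m for s, m in tube.items()})

def mix_and_split(n: int) -> Counter:
    # Start with enough copies so that every split has something to divide.
    copies = 2 ** (n - 1)
    tube = Counter({"1": copies, "0": copies})  # merge X1 and its negation
    for _ in range(2, n + 1):
        a, b = split(tube)
        tube = extend(a, "1") + extend(b, "0")  # extend, then merge
    return tube

space = mix_and_split(3)
print(sorted(space))
```

After N - 1 rounds the tube contains all 2^N assignments, although only a number of operations linear in N was performed.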
Lipton’s Algorithm 3-SAT(N, M)
• Generate N-space solutions in T
• For J = 1 To M
  • T1 := Extract[T, L(1,J)]
  • T := T - T1
  • T2 := Extract[T, L(2,J)]
  • T := T - T2
  • T3 := Extract[T, L(3,J)]
  • T := Merge(T1, T2)
  • T := Merge(T, T3)
• Detect T
• if T ≠ ∅, then take a clone and sequence it (Solution)
• else “Unsolvable Problem”
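The extraction loop above can be mimicked with sets of assignments. This is a sketch under assumptions not found in the slide: a literal is represented as a pair (variable index, required value), a tube is a set of 0/1 tuples, and the function names are hypothetical.

```python
from itertools import product

def extract(tube: set, literal: tuple) -> set:
    """Extract[T, L]: the assignments of T that make the literal true."""
    var, value = literal
    return {s for s in tube if s[var] == value}

def lipton_3sat(n: int, clauses: list) -> set:
    """Filter the full assignment space clause by clause."""
    tube = set(product((0, 1), repeat=n))  # generate N-space solutions
    for clause in clauses:
        kept, rest = set(), tube
        for literal in clause:
            t = extract(rest, literal)  # T1, T2, T3 in turn
            rest = rest - t             # T := T - Tk
            kept |= t                   # merged at the end of the clause
        tube = kept
    return tube

# Hypothetical formula: (x0 or x1 or x2) and (not x0 or not x1 or not x2)
CLAUSES = [[(0, 1), (1, 1), (2, 1)],
           [(0, 0), (1, 0), (2, 0)]]

solutions = lipton_3sat(3, CLAUSES)
print(sorted(solutions) if solutions else "Unsolvable Problem")
```

Each clause costs three extractions, so the number of steps is linear in the number M of clauses, matching the loop on the slide.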
DNA Extraction
Strands of type α are called α-strands (or instances of α).
A β-strand with β including α as a substring is called an α-superstrand (β is an α-superstring).
Problem: Extract all the γ-superstrands of a pool P
A Formulation of the DNA Extraction Problem
Given an input pool P of heterogeneous DNA strands with the same length and with the same prefix and suffix, and given a string γ,
provide an output pool P[γ] such that all and only the types of γ-superstrands of P are represented in P[γ].
In other words, extraction of the γ-superstrands of P means providing a pool P[γ] such that, for any two strings α, β:
αγβ ∈ P <==> αγβ ∈ P[γ]
i.e. the strings represented in P[γ] are all and only the γ-superstrings belonging to P.
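The specification above only constrains which types are present in the output pool, not how many copies of each. A minimal sketch of that input/output relation, assuming a pool is a counter from string types to multiplicities and using the hypothetical name `extract_superstrands`, could look like this; it says nothing about how the laboratory protocol achieves the result.

```python
from collections import Counter

def extract_superstrands(pool: Counter, gamma: str) -> Counter:
    """Return a pool whose types are exactly the gamma-superstrings of `pool`."""
    return Counter({alpha: n for alpha, n in pool.items()
                    if n > 0 and gamma in alpha})

# Hypothetical pool: same length, same prefix "AA" and suffix "TT".
P = Counter({"AACGCTT": 4, "AAGGGTT": 2, "AACCGTT": 1})
P_gamma = extract_superstrands(P, "CG")

# The defining property: a type is in the output iff it is in P and contains gamma.
for alpha in P:
    assert (alpha in P_gamma) == ("CG" in alpha)

print(sorted(P_gamma))  # ['AACCGTT', 'AACGCTT']
```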
Cross Pairing PCR (XPCR for short)
XPCR provides an efficient method for affix concatenation of double strands (Head’s null context splicing rule) N.B. Genome Sequencing is related to Affix Concatenation Closure
[Diagram: Melting + Hybridization, followed by Polymerase Extension]