DNA Computing: Mathematics with Molecules

Russell Deaton, Professor, Comp. Sci. & Engr., The University of Arkansas, Fayetteville, AR 72701, rdeaton@uark.edu

What is DNA Computing (DNAC)? The use of biological molecules, primarily DNA, DNA analogs, and RNA, for computational purposes.

Why Nucleic Acids? • Density (Adleman, Baum): • DNA: 1 bit per nm^3, 10^20 molecules • Video: 1 bit per 10^12 nm^3 • Efficiency (Adleman): • DNA: 10^19 ops / J • Supercomputer: 10^9 ops / J • Speed (Adleman): • DNA: 10^14 ops per s • Supercomputer: 10^12 ops per s

What makes DNAC possible? • Great advances in molecular biology • PCR (Polymerase Chain Reaction) • DNA Microarrays • New enzymes and proteins • Better understanding of biological molecules • Ability to produce massive numbers of DNA molecules with specified sequence and size • DNA molecules interact through template matching reactions

What is the typical methodology? • Encoding: Map the problem instance onto a set of biological molecules and molecular biology protocols • Molecular Operations: Let the molecules react to form potential solutions • Extraction/Detection: Use protocols to extract the result in molecular form

PHYSICAL STRUCTURE OF DNA [Figure: double helix about 20 Å wide with a 34 Å helical repeat. Labels: 5' and 3' ends (3' OH), sugar-phosphate backbone, nitrogenous base, major groove, minor groove, central axis.]

What is an example? • “Molecular Computation of Solutions to Combinatorial Problems” • Adleman, Science, v. 266, p. 1021.

Algorithm • Generate random paths through the graph. • Keep only those paths that begin with v_in and end with v_out. • If the graph has n vertices, keep only those paths that enter exactly n vertices. • Keep only those paths that enter every vertex at least once. • If any paths remain, say “Yes”; otherwise, say “No”.
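The five steps above can be sketched in software. This is a minimal simulation under stated assumptions, not the laboratory procedure: random path generation stands in for ligation, the edge list is taken from the edges that appear on the path slides, and the names `surviving_paths`, `samples`, and `seed`, as well as the 0.9 continuation probability, are illustrative choices that do not come from the talk.

```python
import random

# Directed edges that appear on the path slides (E0->1 is written as (0, 1)).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (5, 1), (0, 6), (0, 3), (3, 2)]

def surviving_paths(edges, n, v_in, v_out, samples=20000, seed=0):
    """Run the generate-and-filter steps on a graph with vertices 0..n-1."""
    rng = random.Random(seed)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    # Step 1: generate random paths (a software stand-in for ligation).
    pool = set()
    for _ in range(samples):
        path = [rng.randrange(n)]
        # Assumed growth rule: extend with probability 0.9, up to 2n vertices.
        while (len(path) < 2 * n and path[-1] in successors
               and rng.random() < 0.9):
            path.append(rng.choice(successors[path[-1]]))
        pool.add(tuple(path))

    # Step 2: keep paths that begin with v_in and end with v_out.
    pool = {p for p in pool if p[0] == v_in and p[-1] == v_out}
    # Step 3: keep paths that enter exactly n vertices.
    pool = {p for p in pool if len(p) == n}
    # Step 4: keep paths that enter every vertex at least once.
    pool = {p for p in pool if set(p) == set(range(n))}
    return pool

survivors = surviving_paths(EDGES, n=7, v_in=0, v_out=6)
# Step 5: answer "Yes" if any path remains, otherwise "No".
print("Yes" if survivors else "No", sorted(survivors))
```

The random generation step can miss a valid path when the sample is small, just as a test tube with too few molecules can; the filters themselves are deterministic.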

INTER-STRAND HYDROGEN BONDING [Figure: guanine-cytosine and adenine-thymine base pairs joined by hydrogen bonds between partial charges (+) and (-); each base is attached to the sugar-phosphate backbone.]

STRAND HYBRIDIZATION [Figure: strands A and B with complementary strands a and b. Labels: HEAT (100° C), COOL, OR.]

DNA LIGATION [Figure: strand-end labels lost in extraction.] Ligase joins 5' phosphate to 3' hydroxyl.

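Encoding [Figure: three vertex oligos and two edge oligos. Vertex 0: 'GCATGGCC. The other two vertices, 1 and 2: 'AGCTTAGG and 'ATGGCATG. Edge oligos: CCGGTCGA' and CCGGTACC'. Hybridized paths: 'GCATGGCCAGCTTAGG held together by CCGGTCGA', and 'GCATGGCCATGGCATG held together by CCGGTACC'. Each edge oligo is the base-by-base complement of the junction it spans. The digits of the strand-end marks were lost in extraction.]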
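The relationship between the vertex and edge oligos on this slide can be checked in a few lines. This is a sketch that assumes each edge oligo is the base-by-base complement of the second half of one vertex oligo followed by the first half of the next, which is consistent with the sequences shown. The function names `complement` and `edge_oligo` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base Watson-Crick complement, written in the same left-to-right order."""
    return strand.translate(COMPLEMENT)

def edge_oligo(u, v):
    """Edge oligo for u->v: complement of the last half of u plus the first half of v."""
    junction = u[len(u) // 2:] + v[:len(v) // 2]
    return complement(junction)

# Vertex oligos from the slide.
v_start = "GCATGGCC"
v_a = "AGCTTAGG"
v_b = "ATGGCATG"

print(edge_oligo(v_start, v_a))  # CCGGTCGA
print(edge_oligo(v_start, v_b))  # CCGGTACC
```

Because each edge oligo overlaps half of each neighbouring vertex oligo, it can hold the two vertex strands side by side so that ligase can join them.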

Massively Parallel Search • V0 V1 V2 V3 V4 V5 V6 (E0->1 E1->2 E2->3 E3->4 E4->5 E5->6) • V4 V5 V1 V2 (E4->5 E5->1 E1->2) • V0 V6 (E0->6) • V0 V3 V2 V3 V4 V5 V6 (E0->3 E3->2 E2->3 E3->4 E4->5 E5->6)

DNA Polymerase

POLYMERASE CHAIN REACTION

Start = V0, Stop = V6 • V0 V1 V2 V3 V4 V5 V6 (E0->1 E1->2 E2->3 E3->4 E4->5 E5->6) • V4 V5 V1 V2 (E4->5 E5->1 E1->2) • V0 V6 (E0->6) • V0 V3 V2 V3 V4 V5 V6 (E0->3 E3->2 E2->3 E3->4 E4->5 E5->6)

GEL ELECTROPHORESIS - SIZE SORTING [Figure: samples loaded into a gel in buffer between two electrodes; shorter fragments migrate faster, longer fragments slower.]

Right Length • V0 V1 V2 V3 V4 V5 V6 (E0->1 E1->2 E2->3 E3->4 E4->5 E5->6) • V0 V6 (E0->6) • V0 V3 V2 V3 V4 V5 V6 (E0->3 E3->2 E2->3 E3->4 E4->5 E5->6)

ANTIBODY AFFINITY [Figure: complementary strands CACCATGTGAC and GTGGTACACTG, one carrying a biotin (B) label; PMP = paramagnetic-streptavidin particle.] • Add oligo with biotin label • Anneal: heat and cool • Add paramagnetic-streptavidin particles • Bind • Isolate with magnet
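The separation step on this slide can be modelled as a substring search. This is a simplified sketch under stated assumptions: a strand counts as captured only if it contains the exact base-by-base complement of the probe, whereas real hybridization tolerates some mismatches and depends on temperature. The names `affinity_select`, `pool`, and `probe`, and the two pool strands, are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base Watson-Crick complement (no reversal, as drawn on the slide)."""
    return strand.translate(COMPLEMENT)

def affinity_select(pool, probe):
    """Split a pool into strands the probe would capture and strands washed away."""
    target = complement(probe)
    captured = [s for s in pool if target in s]
    washed = [s for s in pool if target not in s]
    return captured, washed

probe = "CACCATGTGAC"  # one of the two strands shown on the slide
pool = [
    "AAGTGGTACACTGTT",  # hypothetical strand containing the probe's complement
    "AACCGGTTAACCGG",   # hypothetical strand without it
]
captured, washed = affinity_select(pool, probe)
print(captured)
print(washed)
```

Repeating the selection once per vertex sequence, each time on the captured fraction, corresponds to keeping only those paths that enter every vertex.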

Every Vertex • V0 V1 V2 V3 V4 V5 V6 (E0->1 E1->2 E2->3 E3->4 E4->5 E5->6) • V0 V3 V2 V3 V4 V5 V6 (E0->3 E3->2 E2->3 E3->4 E4->5 E5->6)

Hamiltonian Path • V0 V1 V2 V3 V4 V5 V6 (E0->1 E1->2 E2->3 E3->4 E4->5 E5->6)

Mismatches

DNA Word Design • Importance of Template-Matching Hybridization Reactions in DNA Computing (DNAC) • Sequence design should implement DNAC architecture. • Planned Hybridizations • Problem Size • Subsequent Processing Reactions • Designed sequences should minimize unplanned “cross-hybridizations.” • Consequences of Bad Designs: Errors and Poor Efficiency

DNA Word Design • The design problem is hard. • As the number of sequences required to represent the problem increases, this constraint increasingly conflicts with the requirement of non-crosshybridization. • How much of DNA sequence space is available for computation?
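One common way to screen a candidate word set is to count mismatches between each word and the reverse complement of every other word. The sketch below uses that Hamming-distance proxy as an assumption; it is not the method of the talk, and real designs typically rely on thermodynamic models of hybridization. The names `cross_hybridizing_pairs` and `min_mismatches` are illustrative, and all words are assumed to have the same length.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Watson-Crick complement read in the opposite direction."""
    return word.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def cross_hybridizing_pairs(words, min_mismatches):
    """Return pairs (a, b) where a is too close to the reverse complement of b.

    A word is also paired with itself, which flags self-complementary words.
    """
    risky = []
    for a, b in combinations_with_replacement(words, 2):
        if hamming(a, reverse_complement(b)) < min_mismatches:
            risky.append((a, b))
    return risky

# Candidate words: the three vertex sequences from the Encoding slide.
words = ["GCATGGCC", "AGCTTAGG", "ATGGCATG"]
print(cross_hybridizing_pairs(words, min_mismatches=6))
```

The number of pairs to check grows quadratically with the number of words, which is one way to see why the design problem becomes harder as the problem size grows.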

Why In Vitro? • In Vitro Selection and Evolution • PCR as a tool for selection • Ability to synthesize huge, random starting populations • Mutagenesis • Oligos manufactured under conditions for use • Use massive parallelism of DNAC to solve the word design problem

Protocol Outline • Start with huge population of random sequences with attached primers. • Anneal rapidly to quench oligos in mismatched configurations. • Using temperature as a control, melt most mismatched pairs. • Amplify and purify • Repeat

Experimental Results

Latest Results

DNA Memories

Overview [Figure: learning and recall. Learning: input DNAs (unknown seq.), Tag1, and random probes yield memory DNA strands with the 3' end complementary to the input DNAs. Recall: new unknown input DNAs and sequences complementary to input DNAs; separates memory DNA strands that match or partially match the new inputs from those that don't match. Output: labeled tag sequence complements.]

Learning • Learning: Information acquired from examples rather than programmed • Protocol to store input DNAs (possibly of unknown sequence) • Higher level representation of the input sequences • Not individual sequence memories but whole populations • Clustering of input sequences in vitro • Massively random and parallel copying or sampling depending on number of inputs and probes

Base-by-Base Amplification [Figure labels: Input DNA, Tag, Probe, Extension.]

Sampling [Figure labels: Input DNA, Tag, Probe, Extension.]

Energy Surface Manipulation through Learning [Figure: two plots of energy versus input sequence, labeled Before Learning and After Learning.]

Tags • Non-Crosshybridizing Sequences • Convenient for Input/Output in absence of input sequence information • Manipulate memory without input sequences • Implement DNA2DNA Computations (Landweber and Lipton, DNA 3)

Recall • Hybridization to retrieve memories • Similar sequence patterns matched • Pattern matching done against whole memory • Single memory associated with a single tag • Memory is a composite of output on multiple tags

Experiments • Test learning and recall with plasmid • Test sensitivity to concentration • Test coverage of input sequence space with: • Plasmids (5k bp) • E. coli (5M bp) • Test sequence resolution of protocols
