Analysis & Synthesis of Omes

DOE, Wed 3-Nov-2004, 11:30 AM
Thanks to: Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu

Synthetic Biology Tools: Systems Biology Loop
Diagram labels: Metabolic optimality; Models; Experimental designs; (Systematic) Data; Flux & Competitive growth; DNA & RNA Polony-Seq; Syntheses & Perturbations; Proteasome targeting; Genome engineering

DOE Synthetic Genomes: Why?
• Cheaper/faster "standard biology", hypothesis testing
• Systems Biology: Multiple simultaneous tests
• Viruses: Aid strain transfer; generate variants, new haplotypes
• Anti-viral vaccines and therapeutics (including variants)
• In vitro: Make products toxic in E. coli.
• Microbes: Interspecific hybrids (e.g. codon usage)
• Structural biology: variants
• Rapid vaccine response to engineered bioterrorism. Cell-mediated immunity + humoral.
• Fix mismatch between genome analysis & synthesis

DOE Synthetic Genomes: Why?
• In vitro, Microbial & Human
• Antimutators
• Artificial ecosystems (laboratory scales)
• Energy aiding pathway improvement
• Industrial production: Enzymes, SingleCellProtein, Protein-drugs
• Remediation: Hybrid genomes (opt. codons), combinatorial pathway (Maxygen & Diversa). Xylose & Oil
• Pharmaceuticals: Combinatorial syntheses
• Nano science: Combinatorial syntheses, complex nanosystems, more general nanoassembly (in reach of polymerases and ribosome-like factories)
• Health research: 10X faster results per current $ (cost/benefit). Hypothesize & test unknown gene combinations. Synthetic standards (arrays, MS, quantitation, etc)
• Agriculture: salt, cold, drought, pest tolerant hybrid genomes

Motif co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data. P = 10^-6 to 10^-11. Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Synthetic testing of DNA motif combinations
RNA ratio (motif- to wild type) for each flanking gene: 1.3, 2.4 (1.3 in ΔargR), 1.1, 1.3, 0.7, 2.5, 0.2, 1.4, 1.4, 3.5
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Synthetic Genomes & Proteomes. Why?
• Test or engineer cis-DNA/RNA-elements
• Access to any protein (complex) including post-transcriptional modifications
• Affinity agents for the above.
• Mass spectrometry standards, protein design
• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, SP6, Roche)
• Toward these goals design a chassis:
  • 115 kbp genome. 150 genes.
  • Nearly all 3D structures known.
  • Comprehensive functional data.

(PURE) translation utility
Removing tRNA-synthetases, translational release-factors, RNases & proteases
Selection of scFvs specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods. 284:147
Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353
High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568
Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.

In vitro genetic codes
tRNAs (amino acid, anticodon): yU UUG; mS UGG; eU CAG
mRNA: 5' ... AUG AAC ACC GUU GAA ... 3'
Peptide: fM N T V E
Codon table panel (first, second and third base): entries for yU, mS and eU
80% average yield per unnatural coupling.
bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid
Forster, et al. (2003) PNAS 100:6353-7
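A minimal sketch of how a redesigned codon table changes the product of the mRNA on this slide. The reassignments (AAC to yU, ACC to mS, GUU to eU) are an interpretation of the slide's anticodon/codon pairing, not a statement of the published code, and the overall yield assumes each unnatural coupling succeeds independently at the quoted 80%. All function and variable names are illustrative.

```python
# Standard-code meanings of the codons shown on the slide.
STANDARD = {"AUG": "fM", "AAC": "N", "ACC": "T", "GUU": "V", "GAA": "E"}

# Assumed reassignments, read from the slide's anticodon/codon pairing.
REASSIGNED = {"AAC": "yU", "ACC": "mS", "GUU": "eU"}

UNNATURAL = set(REASSIGNED.values())
YIELD_PER_UNNATURAL_COUPLING = 0.80  # "80% average yield per unnatural coupling"


def translate(mrna, code):
    """Translate an mRNA string codon by codon with the given codon table."""
    usable = len(mrna) - len(mrna) % 3
    return [code[mrna[i:i + 3]] for i in range(0, usable, 3)]


def expected_yield(peptide):
    """Overall yield, assuming independent unnatural couplings."""
    n_unnatural = sum(1 for residue in peptide if residue in UNNATURAL)
    return YIELD_PER_UNNATURAL_COUPLING ** n_unnatural


mrna = "AUGAACACCGUUGAA"
natural = translate(mrna, STANDARD)
designed = translate(mrna, {**STANDARD, **REASSIGNED})
print(natural)
print(designed)
print(round(expected_yield(designed), 3))
```

Under these assumptions, three unnatural couplings in a row would leave roughly half of the product full length, which is why per-coupling yield matters for longer peptidomimetics.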

Mirror world: enzyme, parasite, & predator resistance & access
2^n diastereomers (n chiral atoms)
L-amino acids & D-ribose (rNTPs, dNTPs)
Transition: EF-Tu, peptidyl transferase, DNA-ligase
D-amino acids & L-ribose (rNTPs, dNTPs)
Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7

Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome, respectively). Forster & Church

Up to 760K Oligos/Chip
18 Mbp for $700 raw (6-18K genes)
<1K: Oxamer. Electrolytic acid/base
8K: Atactic/Xeotron/Invitrogen. Photo-Generated Acid. Sheng, Zhou, Gulari, Gao (U. Houston)
24K: Agilent. Ink-jet standard reagents
48K: Febit
100K: Metrigen
380K: Nimblegen. Photolabile 5' protection. Nuwaysir, Smith, Albert
Tian, Gong, Church

Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) and bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos, then release them.
10 + 50 + 10 => ss-70-mer (chip) => ds-90-mer => ds-50-mer
20-mer PCR primers with restriction sites at the 50-mer junctions
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
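The scaling argument on this slide can be put into numbers. The sketch below assumes the reaction volume is unchanged (so molecule count tracks concentration), that both reacting strands are diluted by the same factor, and that PCR doubles perfectly each cycle. The function names are illustrative only.

```python
import math

CHIP_MOLECULES = 1e6     # molecules per oligo from a chip pool (slide)
USUAL_MOLECULES = 1e12   # molecules per oligo from conventional synthesis (slide)


def relative_bimolecular_rate(conc_fold_decrease):
    """Rate ~ [A][B]: if both partners drop by the same factor, the rate drops by its square."""
    return 1.0 / conc_fold_decrease ** 2


def ideal_pcr_cycles(start, target):
    """Cycles of perfect doubling needed to go from start to at least target copies."""
    return math.ceil(math.log2(target / start))


fold = USUAL_MOLECULES / CHIP_MOLECULES
print(f"concentration decrease: {fold:.0e}-fold")
print(f"relative annealing rate: {relative_bimolecular_rate(fold):.0e}")
print(f"ideal PCR cycles to recover: {ideal_pcr_cycles(CHIP_MOLECULES, USUAL_MOLECULES)}")
```

With these idealized numbers, about 20 doublings would restore the usual amount, which is the motivation for amplifying before release rather than assembling directly from the chip eluate.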

Improve DNA Synthesis Accuracy via mismatch selection
Other mismatch methods: MutS (& H, L)
Tian & Church

Genome assembly
50 75 125 225 425 825 … 100*2^(n-1)
Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates
3. >30 kbp homologous (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters.
Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.
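The listed fragment sizes (50, 75, 125, 225, 425, 825) are consistent with joining two equal-length pieces that share a 25-base overlap at every round, so that each new length is twice the previous one minus 25. That overlap is inferred from the numbers, not stated on the slide. A short sketch under that assumption, with illustrative function names:

```python
def assembly_lengths(start=50, overlap=25, rounds=5):
    """Fragment length after each pairwise joining round (assumed 25-base overlap)."""
    lengths = [start]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap)
    return lengths


def rounds_to_reach(target, start=50, overlap=25):
    """Number of joining rounds until the fragment is at least `target` bases long."""
    length, rounds = start, 0
    while length < target:
        length = 2 * length - overlap
        rounds += 1
    return rounds, length


print(assembly_lengths())        # reproduces the series on the slide
print(rounds_to_reach(115_000))  # rounds needed for a 115 kbp target
```

Because length roughly doubles per round, the number of hierarchical steps grows only logarithmically with genome size, which is why reducing errors per intermediate matters more than reducing the number of rounds.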

All 30S-Ribosomal-protein DNAs (codon re-optimized)
Size labels: 1.7 kb, 0.3 kb
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church

Improving synthesis accuracy 9-fold Tian & Church
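To see why a 9-fold accuracy gain matters, one can estimate the fraction of constructs with no errors as (1 - e)^L for a per-base error rate e and length L. The sketch below assumes independent errors. The baseline rate of 1e-3 per base is a hypothetical placeholder, not a value from these slides; the 1.7 kb and 115 kbp lengths and the 1e-6 goal do appear elsewhere in the deck.

```python
def error_free_fraction(error_rate, length):
    """Probability that a construct of `length` bases has no errors (independent errors assumed)."""
    return (1.0 - error_rate) ** length


HYPOTHETICAL_BASELINE = 1e-3          # placeholder per-base error rate, not from the slides
IMPROVED = HYPOTHETICAL_BASELINE / 9  # after a 9-fold accuracy improvement

for label, rate in [("baseline", HYPOTHETICAL_BASELINE), ("9-fold better", IMPROVED)]:
    print(label, round(error_free_fraction(rate, 1_700), 3))

# The stated goal of <1e-6 errors per base, applied to a 115 kbp genome.
print("goal", round(error_free_fraction(1e-6, 115_000), 3))
```

The error-free fraction falls off exponentially with length, so even modest per-base improvements change how many clones must be screened at each assembly stage.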

Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Western blot based on His-tags
Tian & Church

Why sequence?
• Cancer: mutation sets for individual clones, loss-of-heterozygosity
• Pathogen "weather map", biowarfare sensors
• RNA splicing & chromatin modification patterns.
• Synthetic biology & lab selections
• Antibodies or "aptamers" for any protein
• B & T-cell receptor diversity: Temporal profiling, clinical
• Preventative medicine & genotype–phenotype associations
• Cell-lineage during development
• Phylogenetic footprinting, biodiversity
Shendure et al. 2004 Nature Rev Gen 5, 335.

Sequencing single molecules
Ecosystem studies really need single-cell amplification because of multiple chromosomes (& RNAs). (Even an 80% genome coverage is better than 100 kb BACs)

Single bacterial chromosome amplification
Ratio to unamplified hybridization along the chromosome of Escherichia & Prochlorococcus on Affymetrix chips.

Convergence on non-electrophoretic tag sequencing methods?
• Tag length (bp): EST >400; SAGE 14-26; MPSS 20; 454 100; Polony-Seq 26 (2-ends)
• Single-molecule vs. amplified single molecule.
• Array vs. bead packing vs. random
• Rapid scans vs. long scans (chemically limited, 454)
• Number of immobilized primers:
  • 0: Chetverin'97 "Molecular Colonies"
  • 1: Mitra'99 > Agencourt "Bead Polonies"
  • 2: Kawashima'88, Adams'97 > Lynx/Solexa: "Clusters"
http://arep.med.harvard.edu/Polonator/Plone.htm

Polony Fluorescent In Situ Sequencing Libraries
Diagram labels: Genomic, 1 to 100 kb; 2x20 bp after MmeI (BceAI, AcuI); L R M M; Sequencing primers; Selector bead; PCR bead; emulsion (Dressman et al PNAS 2003)
Greg Porreca, Abraham Rosenbaum

Cleavable dNTP-Fluorophore (& terminators)
Reduce or photo-cleave
Mitra, RD, Shendure, J, Olejnik, J, Olejnik, EK, and Church, GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

0.5% of full gel area Polony-FISSeq: up to 2 billion beads/slide

Polony-FISSeq: up to 2 billion beads/slide Cy5 primer (570nm) ; Cy3 dNTP (666nm) Jay Shendure

Polony FISSeq Stats
• # of bases sequenced (total): 23,703,953
• # bases sequenced (unique): 73
• Avg fold coverage: 324,711 X
• Pixels used per bead (analysis): ~3.6
• Read Length per primer: 14-15 bp
• Insertions: 0.5%
• Deletions: 0.7%
• Substitutions (raw): 4e-5
• Throughput: 360,000 bp/min
• Current capillary sequencing: 1400 bp/min
• (600X speed/cost ratio, ~$5K/1X)
• (This may omit: PCR, homopolymer, context errors)
Shendure
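Two of these figures can be cross-checked from the others. The sketch below recomputes the average fold coverage from the total and unique base counts, and the raw throughput ratio against capillary sequencing. The quoted 600X is a speed/cost ratio, so it presumably folds in a cost factor that is not given here; the raw speed ratio alone comes out lower. Variable names are illustrative.

```python
TOTAL_BASES = 23_703_953       # bases sequenced (total)
UNIQUE_BASES = 73              # bases sequenced (unique)
POLONY_BP_PER_MIN = 360_000    # Polony FISSeq throughput
CAPILLARY_BP_PER_MIN = 1_400   # capillary sequencing throughput

# Average fold coverage = total bases read / unique bases covered.
coverage = TOTAL_BASES / UNIQUE_BASES

# Raw speed ratio only; cost is not included.
speed_ratio = POLONY_BP_PER_MIN / CAPILLARY_BP_PER_MIN

print(f"average fold coverage: {coverage:,.0f} X")
print(f"raw throughput ratio: {speed_ratio:.0f} X")
```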

High accuracy special case: homopolymers (e.g. AAA, CC, etc.)
• Use "compressed" tags, ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• "Wobble sequencing"
All of these work.
• Maintenance of amplification fidelity using linear amplification from initial genomic fragment
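The "compressed" tag idea treats any run of the same base as a single base, so ACG, ACCG and ACCCG all map to the same tag. A minimal sketch using run-length grouping follows; the function names are illustrative and this is not the deck's analysis pipeline.

```python
from itertools import groupby


def compress_homopolymers(seq):
    """Collapse each run of identical bases to one base (ACCCG -> ACG)."""
    return "".join(base for base, _ in groupby(seq))


def run_lengths(seq):
    """Return (base, run length) pairs, the information discarded by compression."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


for tag in ("ACG", "ACCG", "ACCCG"):
    print(tag, "->", compress_homopolymers(tag), run_lengths(tag))
```

Compression trades away run-length information for robustness; the "quantitate incorporation" option on the slide corresponds to recovering the run lengths instead of discarding them.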

"Wobble sequencing" for homopolymers 6 positions * 16 primers * 4 dNTPs => 13 bp (paired ends) CCTCATTCTCT AA + dATP (then C, …) CCTCATTCTCT AC + dATP (then C, …) . . . CCTCATTCTCTnnAA + dATP (then C, …) . . . CCTCATTCTCTnnNNnnNNnnTT + dATP (then C, …) 4.5/64 bp/cycle (for wobble sequencing) vs. 2.5/4 bp/cycle (for simple sequential base-extension)
